Question

I was fixing URLs on a website. One problem was that the URLs contained characters that were sometimes upper-case and sometimes lower-case. The server did not care, but Google did, and indexed the pages as duplicates. Some URLs also contained characters that are simply not allowed in that part of the URL, like commas "," and round brackets "()". Although [round brackets are technically not reserved][1], I still decided to get rid of them by encoding them.

I added a check that tests whether the URL is valid and, if it is not, issues a 301 redirect to the correct URL.

For example, http://www.example.com/articles/SomeGreatArticle(2012).html would get a 301 redirect to http://www.example.com/articles/somegreatarticle%282012%29.html

It works, and it takes a single redirect to reach the correct URL.
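The check described above could look roughly like the following Python sketch. It is only an illustration: the function name `canonical_redirect` and the rule itself (lower-case the path, then percent-encode everything except letters, digits, `/`, `-`, `.`, `_` and `~`) are assumptions, since the question does not show the site's actual code.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def canonical_redirect(url):
    """Return the URL to 301-redirect to, or None if `url` is already canonical.

    Hypothetical helper: the rule below is an assumption, not the site's real code.
    """
    parts = urlsplit(url)
    # Decode first so existing escapes are not double-encoded, then lower-case.
    path = unquote(parts.path).lower()
    # quote() never escapes letters, digits and "_.-~"; "/" is kept via `safe`.
    canonical_path = quote(path, safe="/")
    if canonical_path == parts.path:
        return None
    return urlunsplit(parts._replace(path=canonical_path))
```

Under these assumptions, calling the function on the mixed-case example URL yields the lower-case, encoded form, and calling it again on that result yields `None`, so the server on its own only ever issues one redirect.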

But for a small fraction of the pages (possibly the only pages Google has indexed so far), Google Webmaster Tools started giving me the following error under the Crawl errors > Not followed tab:

Google couldn't follow your URL because it redirected too many times.

Googling for this error in quotes gives me 0 results, and I'm sure I'm not the only one ever to get it, so I would like to know more about it, for example:

  1. How many redirects can a single page do before Google thinks that it's too many?
  2. What are the other possible causes of such an error?

Solution

According to this experiment: http://www.monperrus.net/martin/google+url+encoding

Google has its own character encoding rules: it will always encode some characters and always decode others.

The following characters are never encoded:

-,.@~_*)!$'(

So even if you give Google this URL

http://www.example.com/articles/somegreatarticle%282012%29.html

where the round brackets () are encoded, Google will transform the URL, decode the brackets and follow this URL instead:

http://www.example.com/articles/somegreatarticle(2012).html

What happened in my situation, for a request to:

http://www.example.com/articles/somegreatarticle(2012).html

my server would do a 301 redirect to

http://www.example.com/articles/somegreatarticle%282012%29.html 

while Googlebot would decode the encoded brackets and follow:

http://www.example.com/articles/somegreatarticle(2012).html

get redirected to

http://www.example.com/articles/somegreatarticle%282012%29.html

follow

http://www.example.com/articles/somegreatarticle(2012).html

get redirected to

http://www.example.com/articles/somegreatarticle%282012%29.html

and so on, until it gave up after a couple of tries and showed the "Google couldn't follow your URL because it redirected too many times" error.
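The loop above can be reproduced with a small simulation. This is a sketch under stated assumptions: `crawler_view` models only the "never encoded" list reported by the linked experiment, `server_redirect` is a hypothetical stand-in for the site's check, and `MAX_HOPS = 5` is an arbitrary limit chosen for the demo (the real number of redirects Googlebot tolerates is not given here).

```python
import re
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Characters the linked experiment reports Google never leaves encoded.
NEVER_ENCODED = "-,.@~_*)!$'("
MAX_HOPS = 5  # assumption for the demo, not Google's documented limit


def crawler_view(url):
    """Decode %XX escapes only for characters in NEVER_ENCODED (modelled behaviour)."""
    def decode(match):
        char = chr(int(match.group(1), 16))
        return char if char in NEVER_ENCODED else match.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", decode, url)


def server_redirect(url, safe):
    """Hypothetical server rule: lower-case the path, encode anything not in `safe`."""
    parts = urlsplit(url)
    canonical = quote(unquote(parts.path).lower(), safe=safe)
    if canonical == parts.path:
        return None  # already canonical, serve the page
    return urlunsplit(parts._replace(path=canonical))


def crawl(url, safe):
    """Follow redirects as the modelled crawler; return (final_url or None, hops)."""
    url = crawler_view(url)
    hops = 0
    while hops < MAX_HOPS:
        target = server_redirect(url, safe)
        if target is None:
            return url, hops
        url = crawler_view(target)  # the crawler rewrites the redirect target
        hops += 1
    return None, hops  # gave up: redirected too many times


start = "http://www.example.com/articles/SomeGreatArticle(2012).html"
print(crawl(start, safe="/"))                  # server encodes the brackets
print(crawl(start, safe="/" + NEVER_ENCODED))  # server leaves them alone
```

In this model, the first call never settles and gives up at the hop limit, while the second reaches the lower-case URL after one redirect. That suggests one way out is for the server not to encode characters the crawler will decode again.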

Other tips

I don't know about Google Webmaster Tools, but I have seen a similar error in PHP when there is an infinite redirect loop. Make sure that none of the pages redirects to itself.
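A quick way to check for this offline is to walk a table of redirects and stop at the first URL seen twice. The sketch below is in Python rather than PHP, and the `redirects` mapping is hypothetical data; in practice it would have to be filled from the site's own redirect rules or from recorded responses.

```python
def find_redirect_loop(start, redirects):
    """Return the list of URLs forming a loop reachable from `start`, or None.

    `redirects` maps a URL to the URL it redirects to; URLs that serve
    content directly are simply absent from the mapping.
    """
    seen = []
    url = start
    while url is not None:
        if url in seen:
            # Everything from the first occurrence onwards is the cycle.
            return seen[seen.index(url):]
        seen.append(url)
        url = redirects.get(url)
    return None


# Hypothetical redirect table for illustration.
redirects = {"/a": "/b", "/b": "/a", "/self": "/self", "/old": "/new"}
print(find_redirect_loop("/a", redirects))
print(find_redirect_loop("/self", redirects))
print(find_redirect_loop("/old", redirects))
```

A page redirecting to itself shows up as a loop of length one.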

OK, first of all I would remove the () and , characters from the URLs; it is a fact that Googlebot has a harder time working with these, and they bring no SEO benefit either. Readability for the visitor isn't an issue, so if I were you I would just use a hyphen (-) or an underscore (_). Try not to use any other special characters in your file and folder names.
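As a rough illustration of that renaming, here is a minimal Python sketch. The helper name `slugify_filename` and the exact rule (lower-case the name, turn every run of other characters into a single hyphen, keep the extension) are assumptions made for the example, not something prescribed above.

```python
import posixpath
import re


def slugify_filename(name):
    """Hypothetical helper: keep only a-z, 0-9, "_" and "-" in the file stem."""
    stem, ext = posixpath.splitext(name)
    # Replace each run of disallowed characters with one hyphen.
    stem = re.sub(r"[^a-z0-9_-]+", "-", stem.lower())
    # Drop hyphens left over at either end, e.g. from a trailing ")".
    return stem.strip("-") + ext.lower()


print(slugify_filename("SomeGreatArticle(2012).html"))
```

Pages that are already indexed under the old names would still need a one-time 301 redirect to the new ones.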

You should also clean up your HTML; there are quite a few errors and issues to resolve.

A cleaner source is better for Google, browsers and your visitors.

Beyond that, I couldn't find any definitive problem that Google would have an issue with.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow