I have the following setup in my WordPress robots.txt file. For some reason the Allow part of this isn't working; according to Google Webmaster Tools, it doesn't like the following.

Can anyone tell me why?

Disallow: /blog/author/*
Allow: /blog/author/admin

Thanks! :)


Solution

The trailing * is unnecessary. The robots.txt convention is that the Disallow expression will block any URL that starts with the expression. The original robots.txt specification didn't have wildcards. With wildcards, /blog/author/ and /blog/author/* mean the same thing.

The original robots.txt specification says that bots are to read the robots.txt file and apply the first matching rule. Although the original spec didn't include the Allow directive, early implementors continued to use the "first matching rule" rule. If Googlebot is using that, then it would see the disallow line and assume that it can't crawl /blog/author/admin, because it matches.
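You can reproduce this first-match behavior with Python's standard-library robots.txt parser, which follows the original specification. This is only a sketch of the semantics described above (the `example.com` host is a placeholder); Googlebot's current behavior may differ, since modern implementations tend to prefer the most specific matching rule rather than the first one.

```python
import urllib.robotparser

def can_fetch(rules, path):
    # Parse an in-memory robots.txt and test one URL path
    # against it for any user agent ("*").
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(rules)
    return rp.can_fetch("*", "http://example.com" + path)

# Original order: Disallow comes first, so a first-match parser
# blocks /blog/author/admin even though an Allow line follows.
original = ["User-agent: *",
            "Disallow: /blog/author/",
            "Allow: /blog/author/admin"]

# Reordered: the Allow line is seen first, so it wins for admin,
# while everything else under /blog/author/ stays blocked.
reordered = ["User-agent: *",
             "Allow: /blog/author/admin",
             "Disallow: /blog/author/"]

print(can_fetch(original, "/blog/author/admin"))   # False
print(can_fetch(reordered, "/blog/author/admin"))  # True
print(can_fetch(reordered, "/blog/author/other"))  # False
```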

I would suggest moving the Allow above the Disallow, and removing the asterisk from the Disallow expression.
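Putting both suggestions together, the block would look like this. The `User-agent: *` line is an assumption (the original snippet doesn't show which user-agent block these rules sit in); adjust it to match your file.

```
User-agent: *
Allow: /blog/author/admin
Disallow: /blog/author/
```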

Other tips

I think what you're trying to do in your WordPress robots.txt is the same as in this example, webbingbcn.es/robots.txt, but allowing /wp-admin/.

  • Allow: /wp-admin/
  • Disallow: /author/
Licensed under: CC-BY-SA with attribution