Google Search Console and Robots TXT
-
Hi,
When I use Google Search Console I get this message:
Crawl
Time: Jul 8, 2024, 12:50:05 PM
Crawled as: Google Inspection Tool smartphone
Crawl allowed? No: blocked by robots.txt
Page fetch: Failed: Blocked by robots.txt
Indexing allowed? N/A
But when I check my site's robots.txt page, I get this:
# If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
# Please see https://developer.wordpress.com/docs/firehose/ for more details.
Sitemap: https://scottishdistanceskateboarding.org/sitemap.xml
Sitemap: https://scottishdistanceskateboarding.org/news-sitemap.xml
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /wp-signup.php
Disallow: /press-this.php
Disallow: /remote-login.php
Disallow: /activate/
Disallow: /cgi-bin/
Disallow: /mshots/v1/
Disallow: /next/
Disallow: /public.api/
# This file was generated on Sun, 07 Jul 2024 19:36:07 +0000
Which I think means it’s okay; Google’s Robots check also comes back okay.
Am I missing something obvious?
-
Hi there!
Check to see which page URLs are showing as blocked by robots.txt. There are certain pages which should be blocked from being crawled.
For instance, all pages beginning with /wp-admin/ should be blocked because they are for your site administration and are not public. The exception is that /wp-admin/admin-ajax.php is allowed because it is used to route AJAX requests on WordPress.
So, if your homepage scottishdistanceskateboarding.org is blocked by robots.txt, that’s not good, but if scottishdistanceskateboarding.org/wp-admin is blocked by robots.txt, then that is expected.
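If it helps, here is a rough Python sketch of how those rules apply to a given URL. It is not Google's actual implementation: it assumes the longest-matching-rule precedence described in the robots.txt standard (the most specific rule wins, and Allow wins a tie), it ignores wildcards, and the names RULES and is_allowed are made up for this example. The rules themselves are copied from the robots.txt posted above.

```python
from urllib.parse import urlsplit

# Rules from the "User-agent: *" group of the robots.txt quoted in this thread.
RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/wp-login.php"),
    ("disallow", "/wp-signup.php"),
    ("disallow", "/press-this.php"),
    ("disallow", "/remote-login.php"),
    ("disallow", "/activate/"),
    ("disallow", "/cgi-bin/"),
    ("disallow", "/mshots/v1/"),
    ("disallow", "/next/"),
    ("disallow", "/public.api/"),
]


def is_allowed(url, rules=RULES):
    """Hypothetical helper: True if the URL's path may be crawled.

    Assumes plain prefix matching with longest-match precedence;
    wildcards (* and $) are not handled in this sketch.
    """
    path = urlsplit(url).path or "/"
    best_kind, best_len = "allow", -1  # no matching rule means allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        # A longer (more specific) prefix wins; on equal length, allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and kind == "allow"):
            best_kind, best_len = kind, len(prefix)
    return best_kind == "allow"


site = "https://scottishdistanceskateboarding.org"
print(is_allowed(site + "/"))                         # True: no rule matches the homepage
print(is_allowed(site + "/wp-admin/"))                # False: blocked, as expected
print(is_allowed(site + "/wp-admin/admin-ajax.php"))  # True: the Allow rule is more specific
```

Under these assumptions, the posted robots.txt does not block the homepage, so a "blocked by robots.txt" result for the homepage would have to come from something other than the rules shown here.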
-
Hi,
I hope the images below can explain it a bit more, but I can’t figure out the mistake.
I get this when checking.

This is what Google says.
-
Hi again! Thanks for providing these images!
In the first image, I saw that you tested the domain using a robots.txt validator, and it shows that the Googlebot is allowed on your site’s root domain, scottishdistanceskateboarding.org.
In the second image, I saw that Google Search Console is showing that the root domain is blocked by robots.txt. However, keep in mind that your site was only made public within the last 24 hours. While your site was private, it could not be crawled or indexed, which is why Google still shows the site as blocked. It will take Google some time to pick up the change to your site privacy settings and to crawl and index the site.
I advise waiting a few days and then asking Google to crawl your site again. You can refer to this guide from Google to learn how to troubleshoot issues with site crawling and indexing: https://support.google.com/webmasters/answer/7440203
-