Google Search Console and Robots TXT

  • Unknown's avatar

    Hi,

    When I use Google Search Console, I get this message:

    Crawl

    Time: Jul 8, 2024, 12:50:05 PM
    Crawled as: Google Inspection Tool smartphone
    Crawl allowed?: No: blocked by robots.txt
    Page fetch: Failed: Blocked by robots.txt
    Indexing allowed?: N/A

    But when I check my site's robots.txt page, I get this.

    # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
    # Please see https://developer.wordpress.com/docs/firehose/ for more details.

    Sitemap: https://scottishdistanceskateboarding.org/sitemap.xml
    Sitemap: https://scottishdistanceskateboarding.org/news-sitemap.xml

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    Disallow: /wp-login.php
    Disallow: /wp-signup.php
    Disallow: /press-this.php
    Disallow: /remote-login.php
    Disallow: /activate/
    Disallow: /cgi-bin/
    Disallow: /mshots/v1/
    Disallow: /next/
    Disallow: /public.api/

    # This file was generated on Sun, 07 Jul 2024 19:36:07 +0000

    Which I think means it’s okay; Google’s robots.txt check also comes back okay.
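    For what it’s worth, a quick local check with Python’s built-in robots.txt parser (just a rough sketch, using my site’s URL) should also report the homepage as allowed for Googlebot:

    from urllib.robotparser import RobotFileParser

    # Load the live robots.txt for the site mentioned above.
    rp = RobotFileParser()
    rp.set_url("https://scottishdistanceskateboarding.org/robots.txt")
    rp.read()

    # Ask whether Googlebot may fetch the homepage.
    print(rp.can_fetch("Googlebot", "https://scottishdistanceskateboarding.org/"))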

    Am I missing something obvious?

    The blog I need help with is: (visible only to logged in users)

  • Unknown's avatar

    Hi there!

    Check to see which page URLs are showing as blocked by robots.txt. There are certain pages which should be blocked from being crawled.

    For instance, all pages beginning with /wp-admin/ should be blocked because they are for your site administration and are not public. The exception is that /wp-admin/admin-ajax.php is allowed because it is used to route AJAX requests on WordPress.

    So, if your homepage scottishdistanceskateboarding.org is blocked by robots.txt, that’s not good, but if scottishdistanceskateboarding.org/wp-admin is blocked by robots.txt, then that is expected.
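    If you’d like to check those specific paths yourself, here is a rough sketch using Python’s standard urllib.robotparser against your live robots.txt. One caveat: Python’s parser applies rules in file order rather than using Google’s longest-match rule, so it may not honour the admin-ajax.php Allow exception the way Googlebot does; Google’s own robots.txt tester remains the authoritative check.

    from urllib.robotparser import RobotFileParser

    # Parse the live robots.txt (URL taken from this thread).
    rp = RobotFileParser()
    rp.set_url("https://scottishdistanceskateboarding.org/robots.txt")
    rp.read()

    # The homepage should come back allowed; the admin paths should come back blocked.
    for path in ("/", "/wp-admin/", "/wp-login.php"):
        url = "https://scottishdistanceskateboarding.org" + path
        verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
        print(verdict, url)

    If the homepage shows as allowed here (as it should, given the file you pasted) but Search Console still reports it as blocked, the cause is usually something other than these rules.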

  • Unknown's avatar

    Hi,

    I hope the images below can explain it a bit more, but I can’t figure out the mistake.

    I get this when checking.

    [Image: robots.txt validator result showing Googlebot is allowed on the root domain]

    This is what Google says.

    [Image: Google Search Console report showing the page as blocked by robots.txt]

  • Unknown's avatar

    Hi again! Thanks for providing these images!

    In the first image, I see that you tested the domain with a robots.txt validator, and it shows that Googlebot is allowed on your site’s root domain, scottishdistanceskateboarding.org.

    In the second image, I see that Google Search Console is reporting the root domain as blocked by robots.txt. However, one factor to keep in mind is that your site was only made public within the last 24 hours. While your site was private, it could not be crawled or indexed, which is why Google is still showing it as blocked. It will take Google some time to pick up the change to your site’s privacy setting and to crawl and index the site.

    I’d advise waiting a few days and then requesting that Google crawl your site again. You can refer to this guide from Google on troubleshooting crawling and indexing issues: https://support.google.com/webmasters/answer/7440203

  • The topic ‘Google Search Console and Robots TXT’ is closed to new replies.