Google Search Console and Robots TXT

  • Unknown's avatar

    Hi,

    When I use Google Search Console, I get this message:

    Crawl

    Time: Jul 8, 2024, 12:50:05 PM
    Crawled as: Google Inspection Tool smartphone
    Crawl allowed?: No: blocked by robots.txt
    Page fetch: Failed: Blocked by robots.txt
    Indexing allowed?: N/A

    But when I check my site's robots.txt page, I get this.

    # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
    # Please see https://developer.wordpress.com/docs/firehose/ for more details.

    Sitemap: https://scottishdistanceskateboarding.org/sitemap.xml
    Sitemap: https://scottishdistanceskateboarding.org/news-sitemap.xml

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    Disallow: /wp-login.php
    Disallow: /wp-signup.php
    Disallow: /press-this.php
    Disallow: /remote-login.php
    Disallow: /activate/
    Disallow: /cgi-bin/
    Disallow: /mshots/v1/
    Disallow: /next/
    Disallow: /public.api/

    # This file was generated on Sun, 07 Jul 2024 19:36:07 +0000

    Which I think means it’s okay; Google’s robots.txt check also comes back okay.
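    For what it’s worth, a quick local check with Python’s built-in robots.txt parser (just a rough sketch, using my site’s URL) should also report the homepage as allowed for Googlebot:

    from urllib.robotparser import RobotFileParser

    # Load the live robots.txt for the site mentioned above.
    rp = RobotFileParser()
    rp.set_url("https://scottishdistanceskateboarding.org/robots.txt")
    rp.read()

    # Ask whether Googlebot may fetch the homepage.
    print(rp.can_fetch("Googlebot", "https://scottishdistanceskateboarding.org/"))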

    Am I missing something obvious?

    The blog I need help with is: (visible only to logged in users)

  • Unknown's avatar

    Hi there!

    Check to see which page URLs are showing as blocked by robots.txt. There are certain pages which should be blocked from being crawled.

    For instance, all pages beginning with /wp-admin/ should be blocked because they are for your site administration and are not public. The exception is that /wp-admin/admin-ajax.php is allowed because it is used to route AJAX requests on WordPress.

    So, if your homepage scottishdistanceskateboarding.org is blocked by robots.txt, that’s not good, but if scottishdistanceskateboarding.org/wp-admin is blocked by robots.txt, then that is expected.
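    If you’d like to check those specific paths yourself, here is a rough sketch using Python’s standard urllib.robotparser against your live robots.txt. One caveat: Python’s parser applies rules in file order rather than using Google’s longest-match rule, so it may not honour the admin-ajax.php Allow exception the way Googlebot does; Google’s own robots.txt tester remains the authoritative check.

    from urllib.robotparser import RobotFileParser

    # Parse the live robots.txt (URL taken from this thread).
    rp = RobotFileParser()
    rp.set_url("https://scottishdistanceskateboarding.org/robots.txt")
    rp.read()

    # The homepage should come back allowed; the admin paths should come back blocked.
    for path in ("/", "/wp-admin/", "/wp-login.php"):
        url = "https://scottishdistanceskateboarding.org" + path
        verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
        print(verdict, url)

    If the homepage shows as allowed here (as it should, given the file you pasted) but Search Console still reports it as blocked, the cause is usually something other than these rules.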

  • Unknown's avatar

    Hi,

    I hope the images below can explain it a bit more, but I can’t figure out the mistake.

    I get this when checking.

    [Image: robots.txt validator result showing Googlebot is allowed on the root domain]

    This is what Google says.

    [Image: Google Search Console report showing the page as blocked by robots.txt]

  • Unknown's avatar

    Hi again! Thanks for providing these images!

    In the first image, I see that you tested the domain with a robots.txt validator, and it shows that Googlebot is allowed on your site’s root domain, scottishdistanceskateboarding.org.

    In the second image, I see that Google Search Console is reporting the root domain as blocked by robots.txt. However, one factor to keep in mind is that your site was only made public within the last 24 hours. While your site was private, it could not be crawled or indexed, which is why Google is still showing it as blocked. It will take Google some time to pick up the change to your site’s privacy setting and to crawl and index the site.

    I’d advise waiting a few days and then requesting that Google crawl your site again. You can refer to this guide from Google on troubleshooting crawling and indexing issues: https://support.google.com/webmasters/answer/7440203

  • The topic ‘Google Search Console and Robots TXT’ is closed to new replies.