Blocking Bingbot in robots.txt. Why?
-
Hey there,
why is the Bingbot crawler being blocked on one of my pages?
The others: Fine. But Bing? That’s a huge traffic loss. How do I change that and why has it happened recently?
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /wp-signup.php
Disallow: /press-this.php
Disallow: /remote-login.php
Disallow: /activate/
Disallow: /cgi-bin/
Disallow: /mshots/v1/
Disallow: /next/
Disallow: /public.api/

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: SentiBot
Disallow: /

User-agent: sentibot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Bingbot
Disallow: /

# This file was generated on Sun, 22 Oct 2023 18:59:32 +0000
I’ve noticed this on other sites too, so it’s not just my site.
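For anyone who wants to confirm this on their own site, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules are a shortened copy of the robots.txt in this post, and example.com and /some-post/ are placeholders, not a real site or page from this thread.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the robots.txt quoted in this post (assumption: each
# group sits on its own lines, as in a normally served file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

User-agent: GPTBot
Disallow: /

User-agent: Bingbot
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above let user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


SITE = "https://example.com"  # placeholder domain
for bot in ("Googlebot", "Bingbot", "GPTBot"):
    print(bot, allowed(bot, SITE + "/some-post/"))
```

With these rules, a crawler that has no group of its own (such as Googlebot) falls back to the `User-agent: *` group, while Bingbot matches its own group and is refused for every path.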
-
-
This has happened to me too! And unless this is resolved I am jumping WordPress ship… I want my site to be indexed.
-
Hi @capitanos00ii, I only see one site under your WordPress.com account when I look here: https://wordpress.com/sites/
From what I can see that site has a very different robots.txt output from what you have mentioned: https://roboticaepic.wordpress.com/robots.txt
User-agent: *
Disallow: /
# This file was generated on Fri, 17 Nov 2023 19:34:26 +0000
Can you clarify the site (or sites) you are having this issue on?
-
-
@davidaleksandersen are you sure of the spelling? I did not see any site at velferdsteknolobloggen.no when I visited. Also, I looked up the domain in the .no registry and did not see it listed here: https://www.norid.no/en/domeneoppslag/hvem-har-domenenavnet/

-
Sorry. I had a typo. :(
velferdsteknologibloggen.no is the one.
But the strange thing is that I have many pages of the website indexed. It is only a few critical pages that Bing refuses to index.
I have another domain too that has the same robots.txt file, but there all pages are indexed… :(
-
Hey there,
Many thanks for that clarification.
This is a relatively new change made on our part. Recently, we discovered that Microsoft is using its generic Bingbot crawler to scrape sites, and this currently has detrimental effects. For security and privacy purposes, we can’t go into detail at the moment about the outcome of this scraping.
They have not yet documented a way to block the scraping behavior, so for the moment, we have blocked Bingbot from indexing Simple sites via robots.txt directives, and this is something we’re monitoring closely.
I hope this helps to provide some information!
-
Thanks. But I still do not understand why Bing is managing to index some of the pages without problem but others are not indexed.
This page is not indexed, and I get a message “The inspected URL is known to Bing but has some issues that are preventing indexation. We recommend you follow Bing Webmaster Guidelines to increase your chances of indexation.”
https://velferdsteknologibloggen.no/produkt/
But I do not understand what is wrong with the page. It has been like this for quite some time (I have tried making a new page too). But now it is probably stopped by your overall blocking of Bingbot.
While this page is indexed but has now been blocked by your robots.txt
https://velferdsteknologibloggen.no/hva-er-gdpr-personvern-og-sensitive-personopplysninger/
“The inspected URL has been indexed, despite being blocked from crawling by robots.txt. Please make sure that the block is valid as this might prevent us from showing the description about the page in the search results.”
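One thing that can be checked locally: robots.txt controls crawling rather than indexing, which is why Bing can report a page as indexed "despite being blocked from crawling". The sketch below, using Python's standard urllib.robotparser, assumes the Bingbot group is exactly the `Disallow: /` rule from the first post. It suggests that the robots.txt block treats both of the URLs mentioned here the same way, so the file alone would not explain why one is indexed and the other is not.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumption: the Bingbot group served for the site is just this rule,
# as in the robots.txt quoted at the top of the thread.
BINGBOT_RULES = ["User-agent: Bingbot", "Disallow: /"]

parser = RobotFileParser()
parser.parse(BINGBOT_RULES)

urls = [
    "https://velferdsteknologibloggen.no/produkt/",
    "https://velferdsteknologibloggen.no/hva-er-gdpr-personvern-og-sensitive-personopplysninger/",
]

# Map each path to whether Bingbot may crawl it under these rules.
verdicts = {urlsplit(u).path: parser.can_fetch("Bingbot", u) for u in urls}
for path, may_crawl in verdicts.items():
    print(path, "crawlable by Bingbot:", may_crawl)
```

Both paths come back as not crawlable, so whatever separates the two pages in Bing's index has to come from something other than this rule, for example whether the page was already indexed before the block was added.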
We do not have any of the same issues with other CMSs like HubSpot, so an option is obviously to migrate, but…
-
Thanks for that context. Have they indicated why they’re indexing some pages regardless of robots.txt requests, but not others?
I’ll see if we can get any further updates from this side, too.
By the way, once staff is involved with a thread, you only need to reply for us to see it, so you won’t need to add the modlook tag again; just respond and we’ll see it as we work through other requests.
-
The Bingbot is being used with ChatGPT, to scrape sites and steal content that is then reposted on other sites, which negatively affects your own site’s traffic.
Blocking the bot keeps your content safe, with the caveat that you lose indexing.
-
Sooo… To avoid having my content end up in ChatGPT for others to steal (it does not really work that way anyway…), I cannot be found in search so that my potential clients cannot find me.
If somebody uses my content as a source of inspiration, that is absolutely fine for me. Not being found in search is a 100% showstopper.
-
-
Hi, I have the same problem. No, Bing does not “indicate why they’re indexing some pages regardless of robots.txt requests, but not others”.
It would still be good to know why certain pages do appear there while others don’t.
-
Hi @matjazcrnivec,
Thank you for reporting here. I have relayed your message internally, and we’ll keep you and others updated as we receive new information.
Thank you!
-
I’ve also just noticed that Bingbot is blocked! Replying here to get notified if there is an update / solution!
-
-
This is a terrible decision. I just started paying for WordPress to launch my blog, just to find that it can’t be indexed by Bing. Will definitely move my page if this issue isn’t resolved soon.
I don’t care if my content is included in Common Crawl, GPT 5 training data, Gemini, LLaMA 3 or whatever future LLM you are concerned about.
-
The topic ‘Blocking Bingbot in robots.txt. Why?’ is closed to new replies.