Blocking Bingbot in robots.txt. Why?

capitanos00ii · Member · Nov 12, 2023 at 2:08 pm
Hey there,

why is the Bingbot crawler being blocked on one of my pages?

The others: Fine. But Bing? That’s a huge traffic loss. How do I change that and why has it happened recently?

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /wp-signup.php
Disallow: /press-this.php
Disallow: /remote-login.php
Disallow: /activate/
Disallow: /cgi-bin/
Disallow: /mshots/v1/
Disallow: /next/
Disallow: /public.api/

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: SentiBot
Disallow: /

User-agent: sentibot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Bingbot
Disallow: /

# This file was generated on Sun, 22 Oct 2023 18:59:32 +0000
```

I’ve noticed this on other sites too, so it’s not just my site.
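
For anyone who wants to confirm how a crawler would read rules like these, here is a minimal sketch using Python's standard `urllib.robotparser`. The trimmed `ROBOTS_TXT` sample, the `is_allowed` helper, and the `example.com` URL are placeholders made up for illustration. Python's parser also matches rules a little differently from the search engines' own parsers, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Trimmed sample modeled on the rules quoted above (illustrative only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

User-agent: Bingbot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-post/"  # hypothetical page
    for agent in ("Bingbot", "Googlebot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(agent, verdict)
```

A crawler with its own `User-agent` group follows that group rather than the `*` group, which is why Bingbot ends up blocked here while crawlers without a group of their own are not.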
capitanos00ii · Member · Nov 16, 2023 at 1:11 pm
An answer to this question or a link with information would be very nice.
davidaleksandersen · Member · Nov 17, 2023 at 1:32 pm
This has happened to me too! And unless this is resolved I am jumping WordPress ship… I want my site to be indexed.
staff-totoro · Staff · Nov 17, 2023 at 7:36 pm
Hi @capitanos00ii, I only see one site under your WordPress.com account when I look here: https://wordpress.com/sites/

From what I can see that site has a very different robots.txt output from what you have mentioned: https://roboticaepic.wordpress.com/robots.txt
```
User-agent: *
Disallow: /

# This file was generated on Fri, 17 Nov 2023 19:34:26 +0000
```
Can you clarify the site (or sites) you are having this issue on?
davidaleksandersen · Member · Nov 17, 2023 at 8:42 pm
for me it is velferdsteknolobloggen.no that is suffering
staff-totoro · Staff · Nov 17, 2023 at 10:20 pm
@davidaleksandersen, are you sure of the spelling? I did not see any site at velferdsteknolobloggen.no when I visited. I also looked up the domain in the .no registry and did not see it listed here: https://www.norid.no/en/domeneoppslag/hvem-har-domenenavnet/
davidaleksandersen · Member · Nov 18, 2023 at 8:25 am
Sorry. I had a typo. :(

velferdsteknologibloggen.no is the one.

But the strange thing is that I have many pages of the website indexed. But it is a few critical pages that Bing refuses to index.

I have another domain too that has the same robots.txt file but there all pages are indexed… :(
aleone89 · Staff · Nov 20, 2023 at 4:06 pm
Hey there,

Many thanks for that clarification.

This is a relatively new change made on our part. Recently, we discovered that Microsoft is using its generic Bingbot crawler to scrape sites, and this currently has detrimental effects. At the moment, we can’t go into detail about the consequences of this scraping, for security and privacy reasons.

They have not yet documented a way to block the scraping behavior, so for the moment, we have blocked Bingbot from indexing Simple sites via robots.txt directives, and this is something we’re monitoring closely.

I hope this helps to provide some information!
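
To see which crawlers a given robots.txt shuts out entirely, a rough sketch like the one below may help. The function name `fully_blocked_agents` and the `SAMPLE` text are made up for illustration, and the logic is deliberately simple: it only looks for a literal `Disallow: /` and does not account for `Allow` lines that might partly reopen a site.

```python
def fully_blocked_agents(robots_txt: str) -> list[str]:
    """List user agents whose group contains a site-wide 'Disallow: /'."""
    blocked: list[str] = []
    agents: list[str] = []  # user agents of the group currently being read
    in_rules = False        # True once the current group has seen a rule line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()

        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif key in ("allow", "disallow"):
            in_rules = True
            if key == "disallow" and value == "/":
                for agent in agents:
                    if agent not in blocked:
                        blocked.append(agent)
    return blocked


# Hypothetical sample in the same shape as the file discussed in this thread.
SAMPLE = """\
User-agent: *
Disallow: /wp-admin/

User-agent: GPTBot
Disallow: /

User-agent: Bingbot
Disallow: /
"""

if __name__ == "__main__":
    print(fully_blocked_agents(SAMPLE))
```

Running it against the text of your own site's robots.txt would show at a glance whether Bingbot is in the blocked list.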
davidaleksandersen · Member · Nov 21, 2023 at 7:52 am
Thanks. But I still do not understand why Bing is managing to index some of the pages without problem but others are not indexed.

This page is not indexed, and I get a message “The inspected URL is known to Bing but has some issues that are preventing indexation. We recommend you follow Bing Webmaster Guidelines to increase your chances of indexation.”

https://velferdsteknologibloggen.no/produkt/

But I do not understand what is wrong with the page. It has been like this for quite some time (I have tried making a new page too). And now crawling has probably stopped altogether because of your overall blocking of Bingbot.

Meanwhile, this page is indexed but is now blocked by your robots.txt:

https://velferdsteknologibloggen.no/hva-er-gdpr-personvern-og-sensitive-personopplysninger/

“The inspected URL has been indexed, despite being blocked from crawling by robots.txt. Please make sure that the block is valid as this might prevent us from showing the description about the page in the search results.”

We do not have any of the same issues with other CMS like HubSpot so an option is obviously to migrate, but…
supernovia · Staff · Nov 23, 2023 at 7:08 pm
Thanks for that context. Have they indicated why they’re indexing some pages regardless of robots.txt requests, but not others?

I’ll see if we can get any further updates from this side, too.

By the way, once staff is involved with a thread, you only need to reply for us to see it, so you won’t need to add the modlook tag again; just respond and we’ll see it as we work through other requests.
parkdanil · Member · Nov 23, 2023 at 8:40 pm
The Bingbot is being used with ChatGPT, to scrape sites and steal content that is then reposted on other sites, which negatively affects your own site’s traffic.

Blocking the bot keeps your content safe, with the caveat that you lose indexing.
davidaleksandersen · Member · Nov 24, 2023 at 11:54 am
Sooo… To avoid having my content end up in ChatGPT for others to steal (it does not really work that way anyway…), I cannot be found in search so that my potential clients cannot find me.

If somebody uses my content as a source of inspiration, that is absolutely fine for me. Not being found in search is a 100% showstopper.
supernovia · Staff · Dec 1, 2023 at 2:14 am
Thanks folks. We’ll update you when we have other options.
davidaleksandersen · Member · Dec 1, 2023 at 5:55 pm
yeah.. I have started preparing to move my sites. 🐢
matjazcrnivec · Member · Dec 2, 2023 at 2:52 pm
Hi, I have the same problem. No, Bing does not “indicate why they’re indexing some pages regardless of robots.txt requests, but not others”.

It would still be good to know why certain pages do appear there while others don’t.
staff-doublebassd · Staff · Dec 2, 2023 at 6:37 pm
Hi @matjazcrnivec,

Thank you for reporting here. I have relayed your message internally, and we’ll keep you and others updated as we receive new information.

Thank you!
elkement · Member · Dec 6, 2023 at 5:10 pm
I’ve also just noticed that Bingbot is blocked! Replying here to get notified if there is an update / solution!
fmhs4808 · Member · Dec 7, 2023 at 5:04 pm
I have the same problem. I am not happy.
staff-doublebassd · Staff · Dec 7, 2023 at 5:59 pm
Hi there @fmhs4808,

Thank you for reporting this to us here. We’ll be sure to update you and the rest of the users in this thread once we receive any new information on any other options.
danielmewes7081 · Member · Dec 10, 2023 at 7:46 am
This is a terrible decision. I just started paying for WordPress to launch my blog, just to find that it can’t be indexed by Bing. Will definitely move my page if this issue isn’t resolved soon.

I don’t care if my content is included in Common Crawl, GPT 5 training data, Gemini, LLaMA 3 or whatever future LLM you are concerned about.
