Google reports a huge increase in 404 Page not found errors

  • Just got a notice from Google:
    Google detected a significant increase in the number of URLs that return a 404 (Page Not Found) error. Investigating these errors and fixing them where appropriate ensures that Google can successfully crawl your site’s pages.

    I went to the crawl error page, and indeed, what used to be 100-200 errors a day is now 2,000 a day. Starting on December 21st, the graph has been rising linearly.

    Most of them are links with extraneous data appended to the URL.

    For example, instead of:
    http://disperser.wordpress.com/2012/03/03/the-bear-of-woodland-park

    the link is:
    http://disperser.wordpress.com/2012/03/03/the-bear-of-woodland-park/Disperser.Wordpress.com

    instead of:
    http://disperser.wordpress.com/category/photography

    the link is:
    http://disperser.wordpress.com/category/photography/Disperser.wordpress.com/about
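
    For anyone triaging a similar crawl-error export, the pattern in these examples (the blog's own hostname reappearing as a path segment, with varying capitalization) can be detected mechanically. What follows is only a minimal Python sketch, not a tool that Google or WordPress.com provides. The function name `split_appended_host` is made up for illustration, and the sketch assumes the stray text always starts with the site's own hostname.

```python
from urllib.parse import urlsplit, urlunsplit


def split_appended_host(url):
    """Return (clean_url, stray_suffix) if the site's own hostname
    reappears as a path segment, otherwise None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = parts.path.split("/")
    for i, segment in enumerate(segments):
        # Compare case-insensitively: the stray text shows up with
        # different capitalization in different links.
        if host and segment.lower() == host:
            clean_path = "/".join(segments[:i])
            clean_url = urlunsplit((parts.scheme, parts.netloc, clean_path, "", ""))
            return clean_url, "/".join(segments[i:])
    return None


if __name__ == "__main__":
    # URLs taken from the examples in this post.
    for url in [
        "http://disperser.wordpress.com/2012/03/03/the-bear-of-woodland-park",
        "http://disperser.wordpress.com/2012/03/03/the-bear-of-woodland-park/Disperser.Wordpress.com",
        "http://disperser.wordpress.com/category/photography/Disperser.wordpress.com/about",
    ]:
        print(url, "->", split_appended_host(url))
```

    Running each reported 404 through a check like this separates the errors that share this one cause from any that are genuinely unrelated.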

    Before I try to contact Google (nearly impossible), I wanted to make sure this is not a problem with WordPress.

    If it is a problem with WordPress, how do I fix it? And if it isn’t, does anyone know what the problem might be?

    To my knowledge (or memory), I’ve not made any changes to my site layout for a good long while.

    Any help would be appreciated.

    ejd

  • When I look at the originating sites for these bad links, it’s my own posts, but when I look for where these links may be in the posts, I don’t see them.

    Also, many of the posts flagged as originating the bad link are over a year old . . . they would have shown up before.

    WordPress Support: have there been updates or changes in the last month that would have triggered these problems?

    Also, I downloaded the crawl data, and can send it along if desired.

  • Finally, is there an option to add a sitemap to the blog?

  • Hi there,

    I went to the crawl error page, and indeed, what used to be 100-200 errors a day is now 2,000 a day. Starting on December 21st, the graph has been rising linearly.
    Most of them are links with extraneous data appended to the URL.

    Is it possible to link me directly to the page with this information about your site, or could you make a screenshot of it? Here are instructions for how to make a screenshot: http://en.support.wordpress.com/make-a-screenshot/

    You can upload it to your blog’s Media Library and I can look at it there.

    Finally, is there an option to add a sitemap to the blog?

    This isn’t necessary, as we generate sitemaps for all WordPress.com blogs automatically:

    Find Your Sitemap

  • I uploaded both a snapshot of the Crawl Errors Page, and the downloaded Excel spreadsheet of the links generating the errors.

    The originating pages for all the errors I checked (I did not check all of the thousands of them) are other pages on my blog.

    When I go to those pages, there is no evidence of the “faulty” link anywhere I can see.

    Thanks for your reply. I look forward to some explanation of what this is.

  • I would love to link it, but you would need to log into my Google account, and I’m not prepared to do that right now.

    I’ll research other methods by which it can be shared.

  • Hi there,

    That’s great, the screenshot is all I needed.

    I’m not sure what’s causing that. The odd thing is that the URL on the end of those links varies in capitalization: sometimes it’s Disperser.wordpress.com and sometimes Disperser.Wordpress.com. That makes me wonder if it’s somehow a setting that got added in Webmaster Tools itself.

    I poked around in my webmaster tools to see if there’s anything that would make that happen, but there is nothing obvious. Could you check through your various settings and see if there’s anything you’ve manually added anywhere?

    Meanwhile, I’m checking with our data team to see if this is something coming from our end, or something we could address.

  • There are some which specifically reference:
    Disperser.Wordpress.com/About

    not just the top-level URL.

    I uploaded all my settings (and a new error count) snapshots to my media, but here’s the thing . . .

    I don’t remember setting this up.

    That’s nothing new as I am old and feeble-minded, but it could also be that Google asked me, and I said “yes” because it sounds neat to be called a “webmaster” (just saying that; the real reason is because it might have said I could track usage, but since I cannot add Google Analytics to my blog, I probably forgot about it right after saying ‘yes’)

    However, assuming I did set this up or agreed to have it on, it would have been at least a year ago, and possibly two. I did not change any settings last month, since I, the feeble-minded I, did not remember this was on.

    By the way, I also got a BingMaster errors e-mail (which I have yet to check out since that too is outside my memory).

    I noticed there is a place to report issues to Google, so I could do that if you find nothing at your end.

    Thanks again for your response.

  • Side note . . . did you guys do anything to my account? I just started (with my last few posts) getting notices for my own comments, something that never used to happen (i.e. I respond to a comment, and I get an e-mail with my own comment).

    I’m also getting notices for trackbacks as comments, which is new as well.

  • Hi there,

    Ok, our data team figured out where this is coming from: it’s the link to your site in the text portion of your Gravatar sidebar widget. You’ve entered it as a relative URL, so it gets appended to the URL of each page it appears on, and Google then tries to crawl those nonexistent pages. You’ll need to take out the “rel” in that HTML.
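
    The relative-URL behaviour described here can be reproduced with Python's standard `urllib.parse.urljoin`, which applies the usual relative-reference resolution rules. This is only an illustrative sketch. The exact `href` in the widget is not shown in this thread, so the value `Disperser.Wordpress.com` is inferred from the bad URLs. The trailing slash on the page URL is also an assumption, and `resolve_link` is a made-up helper name.

```python
from urllib.parse import urljoin


def resolve_link(page_url, href):
    """Resolve an href against the URL of the page it appears on."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    # Assumed page URL (with trailing slash) for one of the affected posts.
    page = "http://disperser.wordpress.com/2012/03/03/the-bear-of-woodland-park/"

    # No scheme: the href is treated as a relative path and is
    # appended to the directory of the current page.
    print(resolve_link(page, "Disperser.Wordpress.com"))

    # With a scheme: the href is an absolute URL and is returned as-is.
    print(resolve_link(page, "http://disperser.wordpress.com"))
```

    Under these assumptions, the first call yields the same kind of URL that shows up in the crawl errors, while an `href` written with its `http://` scheme resolves to the blog's home page no matter which page the widget appears on.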

    I just started (my last few posts) getting notices for my own comments, something that never used to happen (i.e. I respond to a comment, and I get an e-mail with my own comment).

    Sometimes this can happen if you accidentally click the “change” link in the comment form before commenting, essentially registering your comment as a logged-out comment. Then, that sticks as the default when you comment in future. To fix this, next time you comment, click “change” and then the WordPress icon:
    https://cloudup.com/cqfj9xjoA1w

    Then you’ll be commenting with your logged-in account again.

  • The CC license . . . I did change that. I used their little script, and did not realize the way it worked. I should have tested the links it created to see where they were going.

    It would not have occurred to me at all, so I thank you for that, and my apologies for wasting your time.

    I did change the comment following your procedure, and I thank you for that as well.

  • Actually, turns out the comment issue was a bug that has since been fixed, so you should be all set there.

    Not a time waste at all – I learned something new with this, had never seen it before. :)

  • The topic ‘Google reports a huge increase in 404 Page not found errors’ is closed to new replies.