security of robots.txt

  • Unknown's avatar

    I checked the robots.txt on my blog, but I find it really strange that there are no disallow rules for admin and static content (all wp-*), like the ones that are recommended for wordpress.org.
    http://codex.wordpress.org/Search_Engine_Optimization_for_WordPress
    http://qeqnes.wordpress.com/2010/04/20/the-ultimate-wordpress-security-guide/

    I understand you are confident in your security, but I would prefer not to find the above content indexed by any search engine.


  • Unknown's avatar

    Hi. That’s a question that staff are probably better suited to answer. I tagged this thread for their attention. If you don’t get an answer by January 3rd, contact them directly:

    Contact

  • Unknown's avatar

    @juju41
    The codex does not apply here at wordpress.com. We have support documentation for the blogs on this wpmu, i.e. the wordpress multi-user blogging platform, located here > http://en.support.wordpress.com

    Yes, it’s possible to block all search engines from your blog so no content is indexed. Simply choose the option numbered 1. here in the privacy settings.

    I would like my blog to be visible to everyone, including search engines (like Google, Bing, Technorati) and archivers – This is the setting used by most blogs. It lets everyone read your blog and allows your blog to be included in search engines and other content sites.
    http://en.support.wordpress.com/settings/privacy-settings/
    video here > http://wordpress.tv/2009/01/14/setting-your-posts-or-entire-blog-to-private.

  • Unknown's avatar

    Since wordpress.COM is a multi-user platform where we all share the same underlying files (we don’t have our own wordpress installation) we have a separate robots.txt file just for our account/blog. There is another master robots.txt file that takes care of the wordpress files side of things.

  • Unknown's avatar

    I removed the modlook tag. Thanks for the answers TSP and timethief

  • Unknown's avatar

    @airodyssey & TSP
    G’day to you both & thanks airodyssey. :)

  • Unknown's avatar

    Thanks for your feedback but that was not really the point.

    I want my blog to be indexed by search engines. I am simply saying there is no reason for them to index the admin part of any blog. So a good robots.txt, IMHO, would block them from the admin paths.
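
    For illustration only, here is a minimal Python sketch of how rules like that would behave on a self-hosted install. The rules and the example.wordpress.com host are made up for the example and are not the actual wordpress.com robots.txt; the matching is done by the standard-library urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules of the kind recommended for self-hosted installs.
# This is NOT the real wordpress.com robots.txt.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.wordpress.com is a placeholder host, not a real blog.
    for path in ("/wp-admin/index.php", "/2010/12/some-post/"):
        url = "http://example.wordpress.com" + path
        print(path, "allowed" if is_allowed(EXAMPLE_RULES, "Googlebot", url) else "blocked")
```

    With these assumed rules, posts stay crawlable while anything under /wp-admin/ is refused to well-behaved crawlers. Note that robots.txt only asks crawlers to stay away; it is not an access control.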

  • Unknown's avatar

    @juju41, all the core theme and wordpress files are in a specific area totally separate from your blog folder. Those folders have their own robots.txt file separate from yours.

    This is a multi-user platform as I said, and that means all the core files are in one central location completely and utterly separate from your blog folder.

    In other words, you DO NOT have a wp-admin, or wp-includes, or wp-content folder in your account here at wordpress.COM.

  • Unknown's avatar

    So, I sent an email to jeff at the Wayback Machine wondering why my blog has not been spidered since 2007, and he said I have to change the robots.txt file, which I never changed to stop it. Why would some be there but not the rest?

    My setting, I thought, was to allow everything: search engines, spiders and archivers. I mean Google spiders me in minutes or seconds every time. Can we even change the local robots.txt?

    My Q: How do I get my blog submitted to the Wayback Machine? Is it off for everyone?
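
    One way to narrow this down is to test a robots.txt against several crawler names at once. Below is a rough Python sketch; the rules are invented for the example, and the assumption that the Wayback Machine crawler identifies itself as ia_archiver is mine, not something confirmed in this thread.

```python
from urllib.robotparser import RobotFileParser

# Invented rules: everyone is allowed except one archiver user agent.
# Assumption: the archive crawler announces itself as "ia_archiver".
EXAMPLE_RULES = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def blocked_agents(rules: str, agents: list[str], url: str) -> list[str]:
    """Return the subset of `agents` that the rules forbid from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Placeholder URL; substitute the blog you want to check.
    url = "http://example.wordpress.com/"
    print(blocked_agents(EXAMPLE_RULES, ["Googlebot", "ia_archiver"], url))
```

    If Google can crawl a blog but an archiver cannot, a per-agent rule like the one assumed above is one possible cause. To test a live file instead of pasted text, RobotFileParser.set_url() followed by read() can replace the parse() call.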

  • Unknown's avatar

    @sugamari
    TSP has provided the correct answer for those who are referring to wordpress.COM free hosted blogs. https://en.forums.wordpress.com/topic/security-of-robotstxt?replies=10#post-542154

    However, we don’t have a clue which blog you are referring to. Please post the URL for the blog in question starting with http://

    Are you aware of the differences between wordpress.COM and wordpress.ORG?
    Please read this > https://en.forums.wordpress.com/topic/please-read-me-first-before-posting?replies=1

  • Unknown's avatar

    http://sugamari.wordpress.com/

    Here is the Wayback Machine archive link:

    http://web.archive.org/web/*/http://sugamari.wordpress.com

    Why was it there and not now? Is everyone not archived there? I also notice Alexa has become really clunky and hard to use almost as if Facebook bought it. I am looking for other archives too.

    The blog is working fine and my robots.txt appears to be the same as everyone else’s.
    http://ismyblogworking.com/sugamari.wordpress.com

    O yeah I think I know the main differences between the .org and .com wordpress systems. I’m still here because I love wordpress.com and the software seems better here and easier to use than wordpress.org although I may be losing braincells from not using them so often – that seems to be or is it alzheimers OMG! I’m just getting old I guess.

    I figured I didn’t need to start a new thread. Your quick reply is awesome, timethief, thanks! But I still don’t know – How do I get my blog submitted to the Wayback Machine?

  • Unknown's avatar

    Thanks for posting your URL for the blog in question. I’m sorry but we Volunteers don’t have an answer to your question. We answer technical support questions only. Please send your question directly to Staff using this link. > http://en.support.wordpress.com/contact/

  • Unknown's avatar

    O I am finding it might be my theme

    http://jigsaw.w3.org/css-validator/

    info Line: 81 http://wordpress.com/?ref=footer
    Status: 302 -> (N/A) Forbidden by robots.txt

    The link was not checked due to robots exclusion rules. Check the link manually.

  • Unknown's avatar

    O boy this is turning out to be some problem! As usual it’s turning out to be my own greatest enemy! Myself. Is it really my theme? It’s the same now as the archive links, am I seeing things? I don’t really want to do any drastic fumbling around in the dark changing my theme around when I could just adjust a line in the css file if that is what the problem is…I’m looking at the theme page – should I just try to get a new theme? That can be a lot of work getting it to WYSIWYG. It does need to be fixed though. If that last post is correct and my theme css is preventing spiders then why is google still crawling it?

    OK I guess I better stop cluttering up this channel. Unless you can help me with this. I appreciate your volunteerism in the biggest way.

  • Unknown's avatar

    OK the theme was Simpla it is now The Journalist v1.9 – the validator said nothing about that error … that was way easier than I thought and the site looks even better! And Now I got to go see if I can get jeff to spider it! If I don’t come back in a few days it’s all resolved…Thanks so much timethief !

  • Unknown's avatar

    Hi, I tried changing to The Journalist, but I’m still seeing the following Javascript on my blog:

    <noscript><p><img class="robots-nocontent" src="http://pixel.quantserve.com/pixel/p-18-mFEk4J448M.gif?labels=language.en%2Ctype.wpcom%2Cwp.loggedin" style="display:none" height="1" width="1" alt="" /></p></noscript>

    The blog is http://singsongsofpraisetohim.wordpress.com. I believe the blog may have been hacked, so I have hardened the password. This blog was being indexed by search engines yesterday, and this line only appeared tonight, and my blog dropped off the search engine pages. It’s very important that this blog be found, because I deleted the old blog that is still indexed (for the time being), and I want users to be redirected here instead.

    Thanks for any help/assistance you can provide.

  • Unknown's avatar

    That is a line relating to the Quantcast stats for your blog. It’s supposed to be there. It has not been hacked.
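
    To see for yourself that a snippet like this carries no indexing directive, you can look for a robots meta tag in the page source. Here is a small Python sketch using the standard-library html.parser; the helper names are made up for the example, and it only checks meta tags, not HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> list[str]:
    """Return every robots meta directive found in `html` (hypothetical helper)."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# The stats pixel quoted in this thread: an image inside <noscript>, no meta tag.
STATS_SNIPPET = (
    '<noscript><p><img class="robots-nocontent" '
    'src="http://pixel.quantserve.com/pixel/p-18-mFEk4J448M.gif" '
    'style="display:none" height="1" width="1" alt="" /></p></noscript>'
)

if __name__ == "__main__":
    print(robots_directives(STATS_SNIPPET))
    print(robots_directives('<meta name="robots" content="noindex, nofollow">'))
```

    An empty result for the stats snippet means it contains no robots meta tag; a page-level tag such as the second, invented example is what would ask search engines not to index a page.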

  • Unknown's avatar

    WordPress.COM blogs are not “hacked.” The only way someone could get into your account would be if they guessed your password, so that is on you.

    Secondly, there is no way for anyone to insert javascript. What you are seeing is the backend javascript for the quantcast web stats. It is on all blogs here.

    Your blog is only 2 days old. Your blog only has posts going back to Feb 6, 2011. It can take six to eight weeks for the search engines to index a new site. You have to just be patient. There is no way to rush the search engines. They will get to you when they get to you. You can sometimes speed up the process by verifying your blog with the major search engines as explained in this support document.

    http://en.support.wordpress.com/webmaster-tools/

    The bottom line though is that they will get to you. You just have to be patient.

  • Unknown's avatar

    Good morning people! :-)

    Have you got any result/answer on the robots.txt file? I am now looking at how to make a good one, but there are so many opinions out there.

    Looking forward to your reply.

  • The topic ‘security of robots.txt’ is closed to new replies.