“Robots.txt”

robotstxt.jpg

So there it is, the source of all we know about the growing AI that makes up Google. The only way to stop the avaricious engine of the Internet from indexing everything in its path is this simple phrase.

It is a fascinating thing to drill down into: the deeper workings of the spiders, bots and agents swarming the web in search of content. How do these things work, and what is really going on? I have read that Google has been building vast server farms, with independent power sources and resources, at certain strategic places around the globe to ‘mirror’ the entire Internet.

Anyway, grand conspiracy theories aside, if you want to check the lexicon of search agents, check out robotstxt!

4 Responses to “Robots.txt”

  1. I understand that's the meta, but robots.txt is actually a file:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /secure/

    Sitemap: http://www.example.com/sitemap.xml

    The above is just an example; you can change it however you need.

    So you really don't need the tag. I don't have any secure parts, so I use this as my meta.
    The Googlebot one I just put in there; you don't need it.

    And my robots.txt file, located at the root of my server, looks like this…

    # Allows all robots

    User-agent: *
    Disallow:

  2. My meta is:

  3. <meta name="robots" content="follow,index">
    <meta name="googlebot" content="index, follow, all">

  4. Hey Joe,

    Thanks fer commenting - excellent advice, I concur!

    best

    tarky7
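For anyone who wants to see how a well-behaved bot might read the rules from the first comment, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules are copied from that comment; the page URLs being checked (`/secure/login.html`, `/index.html`) are made up for illustration, and real crawlers may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Example rules copied from the first comment above.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /secure/

Sitemap: http://www.example.com/sitemap.xml
"""

# The "allow everything" file from the same comment.
ALLOW_ALL_RULES = """\
# Allows all robots

User-agent: *
Disallow:
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


restricted = build_parser(EXAMPLE_RULES)
open_site = build_parser(ALLOW_ALL_RULES)

# Hypothetical page URLs, used only to exercise the rules.
secure_page = "http://www.example.com/secure/login.html"
home_page = "http://www.example.com/index.html"

print(restricted.can_fetch("Googlebot", secure_page))  # False: /secure/ is disallowed
print(restricted.can_fetch("Googlebot", home_page))    # True: no rule matches
print(open_site.can_fetch("Googlebot", secure_page))   # True: empty Disallow allows all
print(restricted.site_maps())                          # sitemap URLs listed in the file (Python 3.8+)
```

Keep in mind that robots.txt is only a request: a polite crawler runs a check like this before fetching a page, but nothing forces a bot to honor it.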
