Question:
How does one go about creating a robots.txt file and placing it on a web site? I hear a lot of talk about robots.txt and how important it is for SEO.
Submitted by Bill R.
Answer:
A robots.txt file has no direct SEO benefit in itself, but using one incorrectly can prevent your website from being crawled properly. That means one or more of your web pages (or your entire site) may not be indexed and ranked by the search engines.
The most important thing to understand about a robots.txt file is that you may not need it at all! Many webmasters create one (incorrectly) and upload it to their website only to find their site missing from Google after the next update.
My recommendation is not to use a robots.txt file unless you absolutely have to. After all, it is impossible to mess up a file that doesn't exist.
When do you need a robots.txt file? Only when you have directories or files that you don't want the robots to crawl. A good example would be if you have a download page for a software package or ebook that you sell.
You obviously wouldn't want to have your download page listed in Google where people could find it and download your item for free.
You can prevent that from happening by blocking that page with a robots.txt file.
Note: Create your robots.txt file in Notepad (or any plain-text editor), then upload it to your server's root directory.
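Search engines look for the file in one fixed place: the root of your domain. If, for example, your site were example.com (a placeholder used throughout this answer), the uploaded file would need to be reachable at:
http://www.example.com/robots.txt
A robots.txt file stored anywhere else will simply be ignored.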
There are several ways to use a robots.txt file, but the simplest, safest, and most effective way is to simply disallow a particular directory.
For example, suppose you have a download page called /software-download.html. You could create a special directory called /secret, place the download page in that directory, and create a robots.txt file with these two lines:
User-agent: *
Disallow: /secret/
The * means that the line(s) that follow apply to all robots (including googlebot). In this case, every robot that honors robots.txt directives (some don't) will skip the /secret directory and all of the files in it.
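For comparison, a file with an empty Disallow line (shown here only as an illustration, in case you later decide nothing needs blocking) tells every robot that the whole site may be crawled:
User-agent: *
Disallow: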
Another way to block crawling of the file is to disallow it by name, like this:
User-agent: *
Disallow: /software-download.html
Note the leading slash: Disallow values are matched against the beginning of the URL path, which always starts with /, so a value without it would never match the page.
If you want to block googlebot (or any other specific robot) while allowing all of the others, you must explicitly name the one you want to exclude. For example, the following robots.txt file would prevent googlebot from crawling the /secret directory while allowing every other robot:
User-agent: googlebot
Disallow: /secret/
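One caution (the combined example below is mine, not part of the rules above): a robot follows only the most specific User-agent group that matches it and ignores the generic * group. So if you give googlebot its own group, repeat any general rules you still want it to obey:
User-agent: *
Disallow: /cgi-bin/

User-agent: googlebot
Disallow: /cgi-bin/
Disallow: /secret/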
You can also disallow crawling of multiple directories and files by adding an entry for each one:
User-agent: *
Disallow: /secret/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /software-download.html
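Once the file is uploaded, it is worth checking that it actually blocks what you think it blocks. Here is a minimal sketch using Python's standard urllib.robotparser module (the example.com URLs are placeholders for your own domain and pages):

from urllib.robotparser import RobotFileParser

# Point the parser at your live robots.txt file
rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

# Blocked URLs should print False; everything else should print True
print(rp.can_fetch("*", "http://www.example.com/secret/"))
print(rp.can_fetch("*", "http://www.example.com/software-download.html"))
print(rp.can_fetch("*", "http://www.example.com/index.html"))

If a URL you meant to block comes back True, the file is not doing its job.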
A robots.txt file can be a powerful tool when used correctly, but used incorrectly it can expose private files to the public and/or ruin your search engine rankings. Keep in mind that the file itself is publicly readable and, as noted above, not every robot obeys it. Rule of thumb: use a robots.txt file only when necessary, and make sure you use it correctly.