Monday, June 06, 2005

A robots.txt File

A robots.txt File: "Using a robots.txt File
04/25/99 by John Pollock


The robots.txt file is a useful way to keep search engines from indexing pages you do not want spidered. Why would you not want a page indexed by a search engine? Perhaps you want to display a page that shows an example of spamming the search engines. This type of page might include repeated keywords, hidden tags with keywords, and other things that could get a page or an entire site banned from a search engine.
An example of such a page is on this server: another one of the articles here, which talks about search engine spammers. To look at the article, see The 'Secrets' of Spamdexers.
The robots.txt file is a good way to prevent this page from getting indexed. However, not every site can use it. The only robots.txt file that the spiders will read is the one in the top-level HTML directory of your server. This means you can use it only if you run your own domain. The spiders will look for the file in a location similar to these:
http://www.pageresource.com/robots.txt
http://www.javascriptcity.com/robots.txt
http://www.mysite.com/robots.txt
A robots.txt file in any other location will not be read by a search engine spider, so the file locations below will not work:
http://www.pageresource.com/html/robots.txt
http://members.someplace.com/you/robots.txt
http://someisp.net/~you/robots.txt
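The location rule above can be sketched in a few lines of Python. This is only an illustration, assuming a spider derives the robots.txt address by keeping the scheme and host of a page and discarding the rest of the path; the helper name `robots_url` is hypothetical and not part of the original article.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the one robots.txt location a spider would check for page_url.

    Assumption: the file is always looked up at the root of the host,
    regardless of which directory the page itself lives in.
    """
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for page in (
        "http://www.pageresource.com/html/index.html",
        "http://members.someplace.com/you/page.html",
        "http://someisp.net/~you/index.html",
    ):
        print(page, "->", robots_url(page))
```

Under this assumption, a page at http://someisp.net/~you/ is governed by the file at the root of someisp.net, which is why a robots.txt placed inside a user directory goes unread.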
Now, if you have your own domain, you can see where to place the file. So let's take a look at exactly what needs to go into the robots.txt file to tell the spider what you want done.
If you want to exclude all the search engine spiders from your entire domain, you would write just the following into the robots.txt file:
Us"
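The quoted excerpt ends at this point. As a hedged illustration only, the sketch below assumes the widely used exclude-all form of the robots exclusion convention (a `User-agent: *` line followed by `Disallow: /`), which may differ from what the full article shows. It checks that form with Python's standard `urllib.robotparser`; the names `EXCLUDE_ALL` and `is_allowed` and the bot name `ExampleBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed exclude-all rules: every spider, every path.
EXCLUDE_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Report whether user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(EXCLUDE_ALL, "ExampleBot", "http://www.mysite.com/index.html"))
```

A spider that honors the file would skip every page on the domain under these rules; spiders that ignore robots.txt are not stopped by it.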

