Deny multiple bots

Prit
31 July 2009, 21:36
Hi Hugo,

Can you tell me how I can block multiple bots.

I tried using individual lines for each bot with "DenyBot". I noticed that only the first line is working.

Hiawatha version: 6.16
Operating System: Ubuntu 9.04
Hugo Leisink
31 July 2009, 22:02
Hmm, looks like you can't. I will fix this in the next version.
Prit
31 July 2009, 22:07
Thanks Hugo.

Also, it would be great if there were a small explanation, with examples, of how to use this feature with some common robots.
As an example, below are some bots that show up in my stats. What would the DenyBot lines be to deny all access for these?
twiceler, Googlebot, MSNBot, Yahoo Slurp, BaiDuSpider
Hugo Leisink
1 August 2009, 17:32
Okay, I was wrong about my own code. It's already possible to specify multiple DenyBot options. I just tested it and it works.
VirtualHost {
    ...
    DenyBot = msnbot:/
    DenyBot = yahoo:/
}

How to use the DenyBot option: look for the User-Agent string in your logfile and use a unique word from that line. For example, when the Google bot visits my website, it uses this User-Agent string:
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

So, I can block the Google bot for my images and css directories by using:
DenyBot = Googlebot:/images,/css
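To check whether such a rule works, you can send a fake User-Agent yourself, for example with curl (example.com and the logo.png path are just placeholders here, not from my setup):

# Pretend to be Googlebot; with the DenyBot rule above, this should return 403 Forbidden
curl -I -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://example.com/images/logo.png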
Prit
3 August 2009, 21:58
Hi Hugo, I have the DenyBot rule for Googlebot in my Hiawatha config file, and I have also blocked all hosts in my robots.txt file. This has been running since yesterday. Today, I noticed that Googlebot checked the robots.txt file again and got a 200 status code.

I thought it should get a 403 status code, since we blocked it in Hiawatha. Any ideas?
Hugo Leisink
3 August 2009, 22:14
What DenyBot line do you have? Can you post the googlebot log line?
Prit
4 August 2009, 18:47
Here are some denybot lines I am using:
DenyBot = Googlebot:/
DenyBot = twiceler:/
DenyBot = MSNBot:/
DenyBot = yahoo:/
DenyBot = BaiDuSpider:/
DenyBot = Ask:/

All of these still seem to be visiting and getting a successful status code for the pages they read, or at least for robots.txt.
Prit
5 August 2009, 10:09
I am not sure what changed, but with the same configuration as in my earlier post, Googlebot is now receiving a 403, while the other bots are still getting a 200 status.
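One guess (I haven't checked whether the DenyBot match is case-sensitive): Googlebot shows up in the logs exactly as "Googlebot", but the other bots identify themselves with strings like "msnbot", "Yahoo! Slurp" and "Baiduspider". If the matching is case-sensitive, lines like these might work better:
DenyBot = msnbot:/
DenyBot = Slurp:/
DenyBot = Baiduspider:/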
Hugo Leisink
6 August 2009, 02:01
Okay, I will take a look at it.
This topic has been closed.