Hi,
I checked my logs and found a bot accessing just about all of the pages on my site without checking robots.txt. I looked the IP up on Google and found it is one of a host of 'bad bots' that normally just scrape content, which in the long term could use up a lot of bandwidth.
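In case anyone wants to run the same check on their own logs, here is a rough sketch of what I mean. It assumes the usual Apache common/combined log format, and the function name and the 20-page threshold are just my own picks for illustration, so adjust as needed.

```python
import re
from collections import defaultdict

# Assumption: Apache common/combined log lines, i.e. the client IP comes
# first and the request line is the first quoted field.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')

def suspicious_ips(lines, min_pages=20):
    """Return {ip: distinct_page_count} for IPs that requested at least
    min_pages distinct paths and never requested /robots.txt."""
    pages = defaultdict(set)   # ip -> set of paths requested
    fetched_robots = set()     # ips that asked for robots.txt at least once
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue           # skip lines that are not in the assumed format
        ip, path = match.groups()
        path = path.split("?", 1)[0]   # ignore any query string
        if path == "/robots.txt":
            fetched_robots.add(ip)
        else:
            pages[ip].add(path)
    return {
        ip: len(paths)
        for ip, paths in pages.items()
        if len(paths) >= min_pages and ip not in fetched_robots
    }
```

To use it, something like `suspicious_ips(open("access.log"))` should work, where `access.log` is a placeholder for wherever your host keeps the raw access log. Note that never fetching robots.txt does not prove a visitor is a bad bot (normal browsers never fetch it either), so the result is only a shortlist to look up by hand.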
I found some information on a way to trap bad bots and stop them scraping content. However, I don't know the technical implications on the unflux server or whether this would be OK, so I am just posting the link to the page explaining it and hope to get some feedback on whether it is OK to try to implement it.
The IP for the spambot was: 63.80.56.36
http://www.kloth.net/internet/bottrap.php (there are two methods, one via .htaccess and the other via PHP. I did see that it involves placing a 1px graphic on the page to trap the bad bots, so this may not be what I am looking for, as the hidden link in the form of the 1px graphic could potentially get my site penalized by search engines. I might just have to put bad bots on a deny list manually instead.)
There is also a discussion on bad bot exclusion at the WebmasterWorld forums, although I don't know much about the methods used: http://www.webmasterworld.com/forum23/1281.htm
Overall, I am just wondering whether it would be better to ban them via cPanel, or whether there is an effective automated method, such as any of those outlined above, that wouldn't adversely affect unflux servers or performance.
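For reference, the manual deny-list route I have in mind would look roughly like the sketch below. It builds the lines to paste into .htaccess, assuming the server runs Apache with the older 2.2-style `Order`/`Allow`/`Deny` directives (I don't know which version unflux uses, so that part is a guess). The helper name is my own.

```python
import ipaddress

def htaccess_deny_block(ips):
    """Build an Apache 2.2-style block that allows everyone except the given IPs."""
    lines = ["Order Allow,Deny", "Allow from all"]
    for ip in ips:
        # Validate first so a typo never ends up in .htaccess;
        # ipaddress.ip_address raises ValueError on malformed input.
        lines.append("Deny from %s" % ipaddress.ip_address(ip))
    return "\n".join(lines)

# The one bad bot IP found in my logs so far.
print(htaccess_deny_block(["63.80.56.36"]))
```

If the server turns out to be on Apache 2.4 without the compatibility module, these directives would need to be swapped for the newer `Require` syntax, so please correct me if this is the wrong form for unflux.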
Thanks.
