Mambo Creating MASSIVE hits ?


johnypneumo
September 2nd, 2004, 09:10
Hello,
I recently moved my Mambo site to another server. On my old server I was getting about 10-15 hits a day.
I moved to my new server about 10 days ago, and it's much faster and smoother. The thing is, in 10 days I've had nearly 26,000 hits and there are regularly 400 guests online! :( I know my site isn't that popular yet.
Has this cropped up before, anywhere else?
Concerned. ;)

SvenErik
September 2nd, 2004, 09:16
It probably means that some search engines are indexing your site, and this MO sounds a lot like Google. I suggest you install the TFSforMAMBO (http://mamboforge.net/projects/tfsformambo/) component so that you can see who all these visitors are.

johnypneumo
September 2nd, 2004, 09:30
Cheers Sven, I will! ;) I've traced the culprit via Urchin: a robot called "msnbot" is responsible for 97.5% of my hits. Is there any way to block robots? I thought Mambo did this.

grutkowski
September 2nd, 2004, 09:41
You can block bots using your robots.txt file.

http://www.clockwatchers.com/robots_main.html

Decent tutorial and a list of known bots - I don't suggest blocking them all, just the ones being pests.
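
As a rough sketch of what that looks like for the msnbot case in this thread: the robots.txt rules below are an assumed minimal file (not taken from the tutorial) that shuts out msnbot and leaves every other crawler alone. The Python around it only uses the standard library's urllib.robotparser to show how a rule-abiding crawler would read those rules; the function name and the /index.php path are just examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block msnbot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, path: str = "/index.php") -> bool:
    """Return True if a rule-abiding crawler with this name may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for bot in ("msnbot", "Googlebot"):
        print(bot, "allowed" if is_allowed(bot) else "blocked")
```

The text inside ROBOTS_TXT is what would go in the robots.txt file at the site root. An empty Disallow line means "nothing is disallowed" for the agents it applies to.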

SvenErik
September 2nd, 2004, 09:44
msnbot is the indexer-bot for Microsoft Networks (http://www.msn.com/).

johnypneumo
September 2nd, 2004, 09:46
I'll go with that, grutkowski ;) I'll have a good read.

johnypneumo
September 2nd, 2004, 09:47
You reckon it's a bad idea to block MSN, Sven? I'm not sure; after all, it's MSN. Take a look now: there are 254 visitors on my site :( and it's all msnbot, lol http://trancecreator.com/Tcv4/index.php
I'll try blocking it.

SvenErik
September 2nd, 2004, 09:51
If it affects the running of your site and slows it down, or increases your bandwidth bill, then go ahead and block it. But doing so to the major search engines will limit the real hits you get, since people won't be able to find your content through search.
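
A possible middle ground, if the goal is to cut load without dropping out of a search index, is the non-standard Crawl-delay directive, which asks a bot to wait a number of seconds between requests. This is a sketch under assumptions: the 120-second value is arbitrary, the function name is made up for illustration, and the directive only helps with crawlers that choose to honor it. It needs Python 3.6 or later for crawl_delay().

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: msnbot may still index the site, but is asked to slow down.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 120

User-agent: *
Disallow:
"""


def requested_delay(user_agent: str):
    """Return the Crawl-delay (seconds) requested for this agent, or None."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    for bot in ("msnbot", "Googlebot"):
        print(bot, requested_delay(bot))
```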

johnypneumo
September 2nd, 2004, 10:05
This code snippet seems to be helping ;)
Who's Online edit (http://www.tim-online.nl/index.php/en/component/option,com_docman/task,view_category/Itemid,78/subcat,4/catid,24/limitstart,0/limit,5/). I'll watch how it goes, though ;)

SvenErik
September 2nd, 2004, 10:12
I started using TFSforMambo at the end of February this year, and here are the Bots/Spiders stats up until now:

9992 70.1% Googlebot (Google)
2175 15.3% Inktomi Slurp
568 4% msnbot
520 3.6% WISENutbot (Looksmart)
337 2.4% Bumblebee (relevare.com)
171 1.2% Fast-Webcrawler (AllTheWeb)
89 0.6% Jeeves
86 0.6% Netcraft Web Server Survey
70 0.5% GigaBot
53 0.4% naverbot
40 0.3% Nutch
37 0.3% Scooter (AltaVista)
31 0.2% BaiDuSpider
23 0.2% IBM_Planetwide
16 0.1% Alexa (IA Archiver)
13 0.1% Unknown robot (identified by spider)
10 0.1% Motor
6 0% Turn It In
5 0% Pompos
5 0% Unknown robot (identified by crawl)
3 0% LinkWalker
2 0% bordermanager
2 0% larbin
2 0% Robozilla
2 0% Unknown robot (identified by robot)
1 0% Voyager
1 0% vspider

And the stats for searches leading to my site:

571 60.5% Google
317 33.6% Kvasir
21 2.2% Msn
19 2% Altavista
12 1.3% Yahoo
4 0.4% Alltheweb

Kvasir (http://www.kvasir.no/) is a Norwegian search engine that is powered by Google, so you could say that searches on Google account for 94.1% of the search hits...
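
The 94.1% figure can be reproduced from the search counts listed in this post. A quick check, counting Kvasir together with Google as described (the dictionary and function names are only for illustration):

```python
# Search-referral counts copied from the stats in this post.
SEARCH_HITS = {
    "Google": 571,
    "Kvasir": 317,
    "Msn": 21,
    "Altavista": 19,
    "Yahoo": 12,
    "Alltheweb": 4,
}


def share(engines, hits=SEARCH_HITS):
    """Percentage of all search hits that came from the given engines."""
    total = sum(hits.values())  # 944 search hits overall
    return round(100 * sum(hits[name] for name in engines) / total, 1)


if __name__ == "__main__":
    print(share(["Google"]))            # Google alone
    print(share(["Google", "Kvasir"]))  # Google plus Google-powered Kvasir
```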

One thing to remember about excluding bots/spiders in the robots.txt file: they don't have to follow the rules in it... Badly behaved bots ignore it completely!

A solution for that is to ban the bad bots in your .htaccess file. Here is what I have in mine:
#
# Block Site Rippers, bad bots and e-mail harvesters
#
SetEnvIfNoCase User-Agent "^Alexibot" bad_bot
SetEnvIfNoCase User-Agent "^Aqua_Products" bad_bot
SetEnvIfNoCase User-Agent "^asterias" bad_bot
SetEnvIfNoCase User-Agent "^b2w" bad_bot
SetEnvIfNoCase User-Agent "^BackDoorBot" bad_bot
SetEnvIfNoCase User-Agent "^BlowFish" bad_bot
SetEnvIfNoCase User-Agent "^Bookmark search tool" bad_bot
SetEnvIfNoCase User-Agent "^BotALot" bad_bot
SetEnvIfNoCase User-Agent "^BuiltBotTough" bad_bot
SetEnvIfNoCase User-Agent "^Bullseye" bad_bot
SetEnvIfNoCase User-Agent "^BunnySlippers" bad_bot
SetEnvIfNoCase User-Agent "^CheeseBot" bad_bot
SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
SetEnvIfNoCase User-Agent "^Copernic" bad_bot
SetEnvIfNoCase User-Agent "^CopyRightCheck" bad_bot
SetEnvIfNoCase User-Agent "^cosmos" bad_bot
SetEnvIfNoCase User-Agent "^Crescent" bad_bot
SetEnvIfNoCase User-Agent "^DittoSpyder" bad_bot
SetEnvIfNoCase User-Agent "^dumbot" bad_bot
SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^Enterprise_Search" bad_bot
SetEnvIfNoCase User-Agent "^EroCrawler" bad_bot
SetEnvIfNoCase User-Agent "^es" bad_bot
SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
SetEnvIfNoCase User-Agent "^FairAd Client" bad_bot
SetEnvIfNoCase User-Agent "^Flaming AttackBot" bad_bot
SetEnvIfNoCase User-Agent "^Foobot" bad_bot
SetEnvIfNoCase User-Agent "^Gaisbot" bad_bot
SetEnvIfNoCase User-Agent "^GetRight" bad_bot
SetEnvIfNoCase User-Agent "^grub" bad_bot
SetEnvIfNoCase User-Agent "^Harvest" bad_bot
SetEnvIfNoCase User-Agent "^Hatena Antenna" bad_bot
SetEnvIfNoCase User-Agent "^hloader" bad_bot
SetEnvIfNoCase User-Agent "^httplib" bad_bot
SetEnvIfNoCase User-Agent "^HTTrack" bad_bot
SetEnvIfNoCase User-Agent "^humanlinks" bad_bot
SetEnvIfNoCase User-Agent "^ia_archiver" bad_bot
SetEnvIfNoCase User-Agent "^InfoNaviRobot" bad_bot
SetEnvIfNoCase User-Agent "^Iron33" bad_bot
SetEnvIfNoCase User-Agent "^JennyBot" bad_bot
SetEnvIfNoCase User-Agent "^Kenjin Spider" bad_bot
SetEnvIfNoCase User-Agent "^Keyword Density" bad_bot
SetEnvIfNoCase User-Agent "^larbin" bad_bot
SetEnvIfNoCase User-Agent "^LexiBot" bad_bot
SetEnvIfNoCase User-Agent "^libWeb/clsHTTP" bad_bot
SetEnvIfNoCase User-Agent "^LinkextractorPro" bad_bot
SetEnvIfNoCase User-Agent "^LinkScan" bad_bot
SetEnvIfNoCase User-Agent "^LinkWalker" bad_bot
SetEnvIfNoCase User-Agent "^LNSpiderguy" bad_bot
SetEnvIfNoCase User-Agent "^looksmart" bad_bot
SetEnvIfNoCase User-Agent "^lwp-trivial" bad_bot
SetEnvIfNoCase User-Agent "^Mata Hari" bad_bot
SetEnvIfNoCase User-Agent "^Microsoft URL Control" bad_bot
SetEnvIfNoCase User-Agent "^MIIxpc" bad_bot
SetEnvIfNoCase User-Agent "^Mister PiX" bad_bot
SetEnvIfNoCase User-Agent "^moget" bad_bot
SetEnvIfNoCase User-Agent "^MSIECrawler" bad_bot
SetEnvIfNoCase User-Agent "^naver" bad_bot
SetEnvIfNoCase User-Agent "^NetAnts" bad_bot
SetEnvIfNoCase User-Agent "^NetMechanic" bad_bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
SetEnvIfNoCase User-Agent "^Offline Explorer" bad_bot
SetEnvIfNoCase User-Agent "^Openbot" bad_bot
SetEnvIfNoCase User-Agent "^Openfind" bad_bot
SetEnvIfNoCase User-Agent "^Oracle Ultra Search" bad_bot
SetEnvIfNoCase User-Agent "^PerMan" bad_bot
SetEnvIfNoCase User-Agent "^ProPowerBot" bad_bot
SetEnvIfNoCase User-Agent "^ProWebWalker" bad_bot
SetEnvIfNoCase User-Agent "^psbot" bad_bot
SetEnvIfNoCase User-Agent "^Python-urllib" bad_bot
SetEnvIfNoCase User-Agent "^QueryN Metasearch" bad_bot
SetEnvIfNoCase User-Agent "^Radiation" bad_bot
SetEnvIfNoCase User-Agent "^RepoMonkey" bad_bot
SetEnvIfNoCase User-Agent "^RMA" bad_bot
SetEnvIfNoCase User-Agent "^scooter" bad_bot
SetEnvIfNoCase User-Agent "^searchpreview" bad_bot
SetEnvIfNoCase User-Agent "^SiteSnagger" bad_bot
SetEnvIfNoCase User-Agent "^sootle" bad_bot
SetEnvIfNoCase User-Agent "^SpankBot" bad_bot
SetEnvIfNoCase User-Agent "^spanner" bad_bot
SetEnvIfNoCase User-Agent "^Stanford" bad_bot
SetEnvIfNoCase User-Agent "^suzuran" bad_bot
SetEnvIfNoCase User-Agent "^Szukacz/1.4 " bad_bot
SetEnvIfNoCase User-Agent "^Teleport" bad_bot
SetEnvIfNoCase User-Agent "^Telesoft" bad_bot
SetEnvIfNoCase User-Agent "^The Intraformant" bad_bot
SetEnvIfNoCase User-Agent "^TheNomad" bad_bot
SetEnvIfNoCase User-Agent "^toCrawl/UrlDispatcher" bad_bot
SetEnvIfNoCase User-Agent "^True_Robot" bad_bot
SetEnvIfNoCase User-Agent "^turingos" bad_bot
SetEnvIfNoCase User-Agent "^URL Control" bad_bot
SetEnvIfNoCase User-Agent "^URL_Spider_Pro" bad_bot
SetEnvIfNoCase User-Agent "^URLy Warning" bad_bot
SetEnvIfNoCase User-Agent "^VCI" bad_bot
SetEnvIfNoCase User-Agent "^Web Image Collector" bad_bot
SetEnvIfNoCase User-Agent "^WebAuto" bad_bot
SetEnvIfNoCase User-Agent "^WebBandit" bad_bot
SetEnvIfNoCase User-Agent "^WebCopier" bad_bot
SetEnvIfNoCase User-Agent "^WebEnhancer" bad_bot
SetEnvIfNoCase User-Agent "^WebmasterWorld" bad_bot
SetEnvIfNoCase User-Agent "^WebSauger" bad_bot
SetEnvIfNoCase User-Agent "^Website Quester" bad_bot
SetEnvIfNoCase User-Agent "^Webster Pro" bad_bot
SetEnvIfNoCase User-Agent "^WebStripper" bad_bot
SetEnvIfNoCase User-Agent "^WebVac" bad_bot
SetEnvIfNoCase User-Agent "^WebZip" bad_bot
SetEnvIfNoCase User-Agent "^Wget" bad_bot
SetEnvIfNoCase User-Agent "^WWW-Collector-E" bad_bot
SetEnvIfNoCase User-Agent "^Xenu's" bad_bot
SetEnvIfNoCase User-Agent "^Zeus" bad_bot

<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
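
Before deploying a list like this, it can help to check which User-Agent strings it would actually catch. The sketch below is not Apache itself: it assumes SetEnvIfNoCase behaves like a case-insensitive regex search and mirrors that with Python's re module on a handful of the patterns above. The function name and the sample User-Agent strings are illustrative only.

```python
import re

# A few patterns copied from the list above; the rest work the same way.
BAD_BOT_PATTERNS = ["^EmailSiphon", "^HTTrack", "^WebZip", "^Wget", "^es"]

# SetEnvIfNoCase matches without regard to case, so compile with IGNORECASE.
_COMPILED = [re.compile(pattern, re.IGNORECASE) for pattern in BAD_BOT_PATTERNS]


def is_bad_bot(user_agent: str) -> bool:
    """Return True if any pattern matches, i.e. the bad_bot variable would be set."""
    return any(rx.search(user_agent) for rx in _COMPILED)


if __name__ == "__main__":
    samples = [
        "Wget/1.9.1",
        "wget/1.9.1",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)",
        "esoteric-crawler/1.0",
    ]
    for ua in samples:
        print("DENY " if is_bad_bot(ua) else "allow", ua)
```

Two things show up in a check like this. Because every pattern starts with "^", a tool that puts its name in the middle of a Mozilla-style string is not caught, and a very short pattern such as "^es" catches any agent whose name merely starts with those letters.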

keliix06
September 2nd, 2004, 10:17
When I first opened my site and it first started to get indexed, msnbot freaked out on it, several thousand hits a day. Since then it has gone back down to a normal level. Just give it some time and it should correct itself.

Websmurf
September 2nd, 2004, 11:34
But the Who's Online module still shows a lot of online visitors if you browse your site with cookies disabled :D

just like spiders would browse it :P