Posted By: elbst23pitt ()
Posted On: 03/14/2005 11:58 pm
|
I recently got this message from my dedicated web host:
You'll find that Google puts about 300 hits a week in your logfile, Yahoo puts about 32,000 hits, and MSN puts about 120,000 hits on it. I'm using the samples below that I place in /tmp/yaho
Basically, the major engines are sending bots to review my sites and it is causing my internal hit tracker to be way off.
I want to stop these bots from coming into my sites, but will that negatively affect my search engine rankings? Is there any way to stop them from coming without affecting my rankings?
Finally, how can I allow them to index my site without my internal tracker counting each crawl as a visit?
|
|
Posted By: g1smd (Staff)
Posted On: 03/15/2005 01:58 pm
|
If the bots can't visit your site then they will not index it.
If they don't index it, then they will not include you in their search results.
You should ban other bots that are email scrapers and any others with malicious intent.
|
|
Posted By: elbst23pitt ()
Posted On: 03/16/2005 09:11 pm
|
How do I block bots with malicious intent?
|
|
Posted By: g1smd (Staff)
Posted On: 03/18/2005 12:07 pm
|
You need to add their user-agent and your suggested permissions to your robots.txt file in the root web folder of your site.
|
|
Posted By: yellowwing ()
Posted On: 03/20/2005 09:37 am
|
Isn't there some kind of server status code to indicate that the page content has not changed since the last visit?
That would cut down on the bandwidth the robots use.
|
|
Posted By: yellowwing ()
Posted On: 03/20/2005 09:56 am
|
I found this in the W3.org site.
"304 Not Modified
If the client has performed a conditional GET request and access is allowed, but the document has not been modified, the server SHOULD respond with this status code"
Can you ask your hosting company to implement this?
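For anyone wondering what the server actually has to decide here, this is a rough Python sketch of the check behind a 304. A client makes a GET conditional by sending an If-Modified-Since header with the date of its cached copy. The function name conditional_get_status and the sample date are made up for illustration, and a real web server normally does this for static files on its own.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get_status(last_modified, if_modified_since=None):
    """Return 304 if the client's cached copy is still current, else 200.

    last_modified must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200  # plain GET: always send the full document
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the document
    if client_time.tzinfo is None:
        # Dates without a usable zone are treated as UTC.
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200


# Hypothetical page that last changed at noon UTC on 14 March 2005.
page_changed = datetime(2005, 3, 14, 12, 0, 0, tzinfo=timezone.utc)
cached_copy_date = format_datetime(page_changed, usegmt=True)

print(conditional_get_status(page_changed))                    # 200
print(conditional_get_status(page_changed, cached_copy_date))  # 304
print(conditional_get_status(page_changed + timedelta(hours=1),
                             cached_copy_date))                # 200
```

A bot that gets a 304 downloads only the response headers, which is where the bandwidth saving comes from. The request still shows up in the logfile, though, so this alone will not fix the hit counts.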
|
|
Posted By: g1smd (Staff)
Posted On: 03/20/2005 09:58 am
|
"If-Modified-Since...."
|
|
Posted By: Dinkar (Staff)
Posted On: 03/20/2005 10:40 am
|
If Yahoo and MSN are hitting your site too often, you can slow them down by using 'Crawl-delay' in robots.txt.
Example: a Crawl-delay of 10 in the section for MSN's bot tells MSN to wait 10 seconds before requesting the next document.
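To check how a crawler would read such a rule, here is a small sketch using Python's standard urllib.robotparser. The robots.txt text inside it is my own guess at what such a file looks like, and "msnbot" is assumed to be the user-agent token MSN's crawler goes by. Keep in mind that Crawl-delay is a nonstandard extension, so each crawler is free to honor or ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "msnbot" is assumed to be the token
# that MSN's crawler identifies itself with.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("msnbot"))               # 10
print(parser.crawl_delay("Googlebot"))            # None: no rule applies
print(parser.can_fetch("msnbot", "/index.html"))  # True: nothing is disallowed
```

The delay only spaces the requests out. The bot can still fetch every page, so indexing should not be affected.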
|
|
Posted By: Dinkar (Staff)
Posted On: 03/20/2005 10:49 am
|
"How do I block bots with malicious intent?"
You have to use an .htaccess file. I don't know much about it, but I have the following code:
Add the code to your .htaccess file and replace {ADD USER AGENT HERE} with the name of the malicious user agent. You need to repeat the code for every user agent.
Examples:
SetEnvIfNoCase User-Agent "indy library" keep_out
SetEnvIfNoCase User-Agent "missigua locator" keep_out
SetEnvIfNoCase User-Agent "FndLnk" keep_out
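If you cannot edit .htaccess, the same idea can be tried inside the application. Below is a rough Python sketch that assumes the site runs as a WSGI app, which is an assumption on my part since nobody in this thread said what the site runs on. The names block_bad_agents, hello_app and status_for are made up for illustration. It matches the same three strings without regard to case and answers with 403 Forbidden.

```python
import re

# Same three patterns as the SetEnvIfNoCase lines, matched case-insensitively.
BLOCKED_AGENTS = re.compile(r"indy library|missigua locator|FndLnk",
                            re.IGNORECASE)


def block_bad_agents(app):
    """Wrap a WSGI app so that blocked user agents get 403 Forbidden."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_AGENTS.search(agent):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


wrapped = block_bad_agents(hello_app)


def status_for(agent):
    """Send a fake request with the given user agent; return the status."""
    seen = []
    wrapped({"HTTP_USER_AGENT": agent},
            lambda status, headers: seen.append(status))
    return seen[0]


print(status_for("Mozilla/4.0 (compatible; Indy Library)"))  # 403 Forbidden
print(status_for("Mozilla/5.0 (Windows; U) Firefox/1.0"))    # 200 OK
```

A bot can send any user-agent string it likes, so this only stops the ones that announce themselves honestly.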
[ Message was edited by: Dinkar 03/20/2005 08:21 pm ]
|
|
Posted By: jsrobinson ()
Posted On: 03/21/2005 05:01 pm
|
I think the problem needs to be looked at from a different perspective: why isn't the web log reporting tool taking the bots into account and removing their hits from the usage stats?
I rewrote a significant portion of my web reporting tool specifically to do this, because I did not want to "limit" search engines' access to sites I host/run. The User-Agent is easily found in logs, and easily accessible from code (PHP/ASP), so this really should not be a huge technological issue for anyone (but then again, I don't know your situation...).
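As a rough idea of what that filtering can look like, here is a Python sketch (not my actual tool). It assumes the logfile is in Apache "combined" format, where the user agent is the last quoted field on each line. The bot marker list and the sample log lines are made up for illustration, and a real list would need to be longer.

```python
import re

# In the combined log format the user agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')

# Assumed substrings that identify the big crawlers; extend as needed.
BOT_MARKERS = ("googlebot", "slurp", "msnbot")


def is_bot(user_agent):
    """True if the user agent contains one of the known bot markers."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def count_hits(lines):
    """Return (visitor_hits, bot_hits) for an iterable of log lines."""
    visitors = bots = 0
    for line in lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match is None:
            continue  # skip lines without a quoted user-agent field
        if is_bot(match.group(1)):
            bots += 1
        else:
            visitors += 1
    return visitors, bots


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [14/Mar/2005:23:58:00 +0000] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [14/Mar/2005:23:58:05 +0000] "GET /a.html HTTP/1.0" 200 900 '
    '"-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.3 - - [14/Mar/2005:23:58:09 +0000] "GET /b.html HTTP/1.0" 200 700 '
    '"-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.4 - - [14/Mar/2005:23:59:00 +0000] "GET / HTTP/1.1" 200 5120 '
    '"http://example.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/1.0"',
]

print(count_hits(SAMPLE_LOG))  # (1, 3)
```

This leaves the bots free to crawl and index everything while the visitor count only includes real browsers.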
|
|
|