elbst23pitt
Joined: Mar 14, 2005
# Posts: 2
|
Posted: 03/14/2005 11:58 pm
I recently got this message from my dedicated web host:
You'll find that Google puts about 300 hits a week in your logfile, Yahoo puts about 32,000 hits, and MSN puts about 120,000 hits on it. I'm using the samples below that I place in /tmp/yaho
Basically, the major engines are sending bots to review my sites and it is causing my internal hit tracker to be way off.
I want to stop these bots from coming into my sites, but will that negatively affect my search engine rankings? Is there any way to stop them from coming without my rankings being affected?
Finally, how can I allow them to index my site without my internal tracker counting each crawl as a visit?
|
 |
g1smd
Moderator
Joined: Jul 28, 2002
# Posts: 10058
|
Posted: 03/15/2005 01:58 pm
If the bots can't visit your site then they will not index it.
If they don't index it, then they will not include you in their search results.
You should ban other bots that are email scrapers and any others with malicious intent.
|
 |
elbst23pitt
Joined: Mar 14, 2005
# Posts: 2
|
Posted: 03/16/2005 09:11 pm
How do I block bots with malicious intent?
|
 |
g1smd
Moderator
Joined: Jul 28, 2002
# Posts: 10058
|
Posted: 03/18/2005 12:07 pm
You need to add their user-agent, along with the permissions you want to give it, to the robots.txt file in the root web folder of your site.
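A minimal sketch of what such a robots.txt entry looks like and how a well-behaved crawler would read it, using Python's standard urllib.robotparser. The agent name "BadBot" and the path are hypothetical, chosen only for illustration. Keep in mind that robots.txt is advisory: compliant crawlers honor it, but a scraper with malicious intent is free to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named agent, allow everyone else.
# "BadBot" is a made-up user-agent used only for this sketch.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
print("BadBot allowed:", parser.can_fetch("BadBot", "/index.html"))
print("Googlebot allowed:", parser.can_fetch("Googlebot", "/index.html"))
```

The empty Disallow line under "User-agent: *" means "nothing is disallowed", so every agent other than the named one keeps full access.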
|
 |
yellowwing
Moderator
Joined: May 21, 2002
# Posts: 2524
|
Posted: 03/20/2005 09:37 am
Isn't there some kind of server code to indicate that the page content has not changed since the last visit?
That would cut down on the robot bandwidth.
|
 |
yellowwing
Moderator
Joined: May 21, 2002
# Posts: 2524
|
Posted: 03/20/2005 09:56 am
I found this on the W3.org site.
"304 Not Modified
If the client has performed a conditional GET request and access is allowed, but the document has not been modified, the server SHOULD respond with this status code"
Can you ask your hosting company to implement this?
|
 |
g1smd
Moderator
Joined: Jul 28, 2002
# Posts: 10058
|
Posted: 03/20/2005 09:58 am
"If-Modified-Since...."
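To make the conditional GET concrete, here is a rough sketch of the decision a server makes when a client sends an If-Modified-Since header. The function name conditional_status is an assumption made for this illustration, not part of any web server; real servers such as Apache already do this for static files.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the document is unchanged since the client's copy, else 200.

    last_modified must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200  # plain GET, not a conditional one
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: fall back to a full response
    if client_time.tzinfo is None:
        # Treat a date without a usable zone as UTC.
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200


page_changed = datetime(2005, 3, 14, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(page_changed, usegmt=True)
print(conditional_status(page_changed, header))
```

A 304 response carries no body, which is where the bandwidth saving for repeat crawler visits comes from.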
|
 |
Dinkar
Moderator
Joined: Aug 12, 2001
# Posts: 4268
|
Posted: 03/20/2005 10:40 am
If Yahoo and MSN are hitting the site too often, you can slow them down by using 'Crawl-delay' in robots.txt.
Example:
User-agent: msnbot
Crawl-delay: 10
This will tell MSN to wait 10 seconds before requesting the next document.
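As a sanity check on a Crawl-delay rule like the one described above, the sketch below reads it back with Python's standard urllib.robotparser (the crawl_delay method needs Python 3.6 or later). The robots.txt content is an assumed example matching the description: a 10 second delay for the msnbot user-agent. Crawl-delay is a non-standard extension, so not every crawler honors it.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: ask msnbot to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10
"""


def crawl_delay_for(robots_text, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.crawl_delay(user_agent)


print("msnbot:", crawl_delay_for(ROBOTS_TXT, "msnbot"))
print("Googlebot:", crawl_delay_for(ROBOTS_TXT, "Googlebot"))
```

Agents with no matching entry get None back, meaning no delay was requested for them.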
|
 |
Dinkar
Moderator
Joined: Aug 12, 2001
# Posts: 4268
|
Posted: 03/20/2005 10:49 am
"How do I block bots with malicious intent?"
You have to use an .htaccess file. I don't know much about it, but I have the following code:
Add the code to your .htaccess file and replace {ADD USER AGENT HERE} with the name of the malicious user agent. You need to repeat the code for every user agent.
Examples:
SetEnvIfNoCase User-Agent "indy library" keep_out
SetEnvIfNoCase User-Agent "missigua locator" keep_out
SetEnvIfNoCase User-Agent "FndLnk" keep_out
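For readers unsure what these lines do: SetEnvIfNoCase treats each quoted string as a regular expression, matches it case-insensitively anywhere in the User-Agent header, and sets the keep_out variable on a match. The variable blocks nothing by itself; in Apache it has to be paired with an access rule that denies requests where keep_out is set (for example "Deny from env=keep_out" in Apache 2.2). The following is only a rough Python analogue of that matching logic, with hypothetical function names, not a replacement for the .htaccess rules.

```python
import re

# Patterns copied from the SetEnvIfNoCase examples in the post.
BLOCKED_AGENT_PATTERNS = ["indy library", "missigua locator", "FndLnk"]
_BLOCKED = [re.compile(p, re.IGNORECASE) for p in BLOCKED_AGENT_PATTERNS]


def keep_out(user_agent):
    """True if any pattern matches anywhere in the User-Agent, ignoring case."""
    return any(rx.search(user_agent) for rx in _BLOCKED)


def status_for(user_agent):
    """403 for a flagged agent, 200 otherwise (mimics a deny-on-keep_out rule)."""
    return 403 if keep_out(user_agent) else 200


print(status_for("Mozilla/3.0 (compatible; Indy Library)"))
print(status_for("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

Because matching is on the User-Agent string alone, a bot that lies about its identity will slip through.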
[ Message was edited by: Dinkar 03/20/2005 08:21 pm ]
|
 |
jsrobinson
Joined: Dec 18, 2004
# Posts: 29
|
Posted: 03/21/2005 05:01 pm
I think the problem needs to be looked at from a different perspective: why isn't the web log reporting tool taking the bots into account and removing their hits from the usage stats?
I rewrote a significant portion of my web reporting tool specifically to do this, because I did not want to "limit" search engines' access to sites I host/run. The User-Agent is easily found in logs and easily accessible from code (PHP/ASP), so this really should not be a huge technological issue for anyone (but then again, I don't know your situation...).
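A minimal sketch of that idea, written in Python rather than PHP/ASP: read the User-Agent from each access log line and leave known crawlers out of the visit count. It assumes the log is in Apache's Combined Log Format; the token list and the sample log lines are illustrative only, not taken from anyone's real logs.

```python
import re

# Illustrative crawler tokens (Google, Yahoo, MSN); extend for your own traffic.
BOT_TOKENS = ("googlebot", "slurp", "msnbot")

# Tail of a Combined Log Format line: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$')


def is_bot(user_agent):
    """True if the User-Agent contains one of the known crawler tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)


def count_visitor_hits(lines):
    """Count hits whose User-Agent is not a known crawler.

    Lines that do not look like Combined Log Format are skipped.
    """
    hits = 0
    for line in lines:
        match = LOG_TAIL.search(line.strip())
        if match and not is_bot(match.group("agent")):
            hits += 1
    return hits


sample_log = [
    '192.0.2.1 - - [14/Mar/2005:23:58:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.2 - - [14/Mar/2005:23:58:01 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.3 - - [14/Mar/2005:23:58:02 +0000] "GET /a HTTP/1.0" 200 100 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp)"',
]
print(count_visitor_hits(sample_log))
```

Filtering at report time leaves the crawlers free to index the site, so rankings are not put at risk.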
|
 |