JimWorld Forums: Robots Exclusion List



Posted By: harjitsingh ()
Posted On: 10/21/2005 11:16 pm

Hello there,

robots.txt is generally uploaded to the root folder, e.g. www.mydomain.com/robots.txt.
How can one exclude certain directories (/images, /admin, etc.) without letting anybody who reads the robots.txt file see which directories are excluded? It might sound funny, but I wanted to know whether some other method exists.

Thanks
HarRy


Posted By: lizardz ()
Posted On: 10/22/2005 03:42 pm

Simple, you can't.

It's a text file, it sits there, search bots request it, they read it.

Theoretically, you could generate it dynamically based on a check of the requesting IP range or something similar, and serve the real version only to search bots. But there's no guarantee your pages wouldn't get spidered in that case: if a bot came in from an IP range you didn't have listed, it would get the robots.txt without the blocks.
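
A rough sketch of the IP-range idea, in Python. Everything named here is an assumption for illustration: the function robots_txt_for, the two rule sets, and the network list, which uses reserved documentation ranges rather than any real crawler's addresses.

```python
import ipaddress

# Hypothetical crawler ranges (reserved documentation addresses, NOT real bot IPs).
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Version with the real exclusions, meant only for listed crawlers.
FULL_RULES = "User-agent: *\nDisallow: /admin/\nDisallow: /images/\n"

# Version everyone else sees: it blocks nothing and reveals nothing.
PUBLIC_RULES = "User-agent: *\nDisallow:\n"


def robots_txt_for(remote_ip: str) -> str:
    """Return the robots.txt body to serve to the given client address."""
    address = ipaddress.ip_address(remote_ip)
    if any(address in network for network in CRAWLER_NETWORKS):
        return FULL_RULES
    return PUBLIC_RULES
```

Note what happens for any address outside the list: it gets PUBLIC_RULES, so a search bot arriving from an unlisted range sees no blocks at all. That is exactly the weakness described above.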

So the practical answer is: if you don't want anyone to be able to see a blocked part of your site, just don't let them in, and don't link to it from the main site. That's how I do it when I don't want a part of my site indexed at all.


Posted By: harjitsingh ()
Posted On: 10/23/2005 09:55 pm

Thank you for your suggestion.

Since the site is dynamic and maintained by a CMS, is there a possibility that robots will trace it looking at the backlinks to the .htm or .php pages?

Thanks HarRy


Posted By: lizardz ()
Posted On: 10/24/2005 12:44 pm

" is there a possibility that robots will trace it looking at the backlinks to the .htm or .php pages"

If I understood this question I might be able to answer it. In general, though, if you can't program and you are running a CMS, then what you get is what you get; you can't change it. If you can program and can change components, then you can get anything you want, within reason of course.

Robots will follow any link to any page not blocked in robots.txt, so if a link exists and is not blocked, then the robot will at some point follow it.
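
To see how a well-behaved robot decides whether a link is blocked, here is a small sketch using Python's standard urllib.robotparser. The rules and URLs are made up for illustration (the /admin and /images directories come from the original question), and is_crawlable is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; a real site would have its own robots.txt.
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /images/
"""


def is_crawlable(url: str, user_agent: str = "*") -> bool:
    """Return True if the example rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, is_crawlable("http://www.mydomain.com/admin/login.php") is False and is_crawlable("http://www.mydomain.com/products.php") is True. This only models robots that choose to obey the file; it does not stop anything that ignores it.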


Posted By: harjitsingh ()
Posted On: 10/26/2005 02:10 am

I was concerned about the exclusion list because only the homepage www.mydomain.com was cached and not the inside pages, which are linked to it.

I have index,follow for the robots meta tag, but the inside pages are still not getting crawled or cached.

Also, when you check for links to the website, it should show the inside pages, but it doesn't.

Can I get some help or guidance on this?

thanks
harRy


Posted By: Logan (Moderator)
Posted On: 11/07/2005 06:15 am

Hi harRy, based on your comments I don't think the robots.txt is a factor. There are many other reasons internal pages may not be indexed. The two most common I can think of are:

1) Lack of link popularity for the URL
2) A URL with multiple parameters (e.g. mypage.php?x=1&name=product&category=1234&anotherparameter=sfruokcn)
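
For the second point, one quick way to spot parameter-heavy URLs in a list of your own pages is to count the query parameters. This is only a sketch: count_query_params and has_many_params are hypothetical helpers, and the cutoff of 2 is an arbitrary assumption, not a rule any search engine publishes.

```python
from urllib.parse import parse_qsl, urlsplit


def count_query_params(url: str) -> int:
    """Count the name=value pairs in the query string of url."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def has_many_params(url: str, limit: int = 2) -> bool:
    """Flag URLs with more than `limit` parameters (limit is an assumed cutoff)."""
    return count_query_params(url) > limit
```

The example URL above has four parameters, so has_many_params would flag it, while a plain mypage.php would not be flagged.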

Tough to say without reviewing the site. Can you reference the site in your profile for those interested in helping?


Posted By: harjitsingh ()
Posted On: 11/08/2005 04:54 am

Here is the website I am talking about
((url removed--put in profile only))

[ Message was edited by: bhartzer 11/25/2005 01:09 pm ]



