Moderator(s): g1smd, Logan

TynesWeb
Joined: Eons Ago
# Posts: 11

Posted: 2006-Jul-21 21:23

I've submitted my robots.txt file to Google Sitemaps, but am told that it doesn't appear to be a valid file, even though Sitemaps does read it and show it. It sure looks valid to me. Can someone take a look and give me some guidance? File pasted below. I'll add the link in my profile.

User-agent: *
Disallow: /SendMail/
Disallow: /search/
Disallow: /reports/
Disallow: /NewFiles/
Disallow: /mp3/
Disallow: /macvpn/
Disallow: /EmailSignature/
Disallow: /counter/
Disallow: /cc2001/
Disallow: /cgi-bin/
Disallow: /bin/
Disallow: /aspnet_client/
Disallow: /_vti_pvt/
Disallow: /swform_text.html
Disallow: /jobsTEST.asp
Disallow: /intro.swf
Disallow: /ButtonBarParseArgs.java
Disallow: /ButtonBar.java
Disallow: /Banner3.java
Disallow: /Banner2.java
Disallow: /apcocite.css
Disallow: /pop.css
Disallow: /survey.css
Disallow: /apco.css



[ Message was edited by: TynesWeb 07/21/2006 01:56 pm ]





bhartzer
Staff
Joined: Jun 08, 2000
# Posts: 7042

Posted: 2006-Jul-21 21:33

I've submitted my robots.txt file to Google Sitemaps

You only need the leading / for folders/directories; the trailing / is not required:

User-agent: *
Disallow: /SendMail
Disallow: /search
Disallow: /reports
Disallow: /NewFiles
Disallow: /mp3
Disallow: /macvpn
Disallow: /EmailSignature
Disallow: /counter
Disallow: /cc2001
Disallow: /cgi-bin
Disallow: /bin
Disallow: /aspnet_client
Disallow: /_vti_pvt
Disallow: /swform_text.html
Disallow: /jobsTEST.asp
Disallow: /intro.swf
Disallow: /ButtonBarParseArgs.java
Disallow: /ButtonBar.java
Disallow: /Banner3.java
Disallow: /Banner2.java
Disallow: /apcocite.css
Disallow: /pop.css
Disallow: /survey.css
Disallow: /apco.css

There are also a few robots.txt validators out there that you can use, and here's a link to the robots.txt standard:
http://www.robotstxt.org/wc/robots.html
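
To give a rough idea of what such a validator looks at, here is a minimal Python sketch of a line-level syntax check. It is only an illustration, not the logic of any real validator: the function name check_robots and the set KNOWN_FIELDS are made up for this example, it only knows the User-agent and Disallow fields from the original standard, and it ignores the rule that blank lines separate records.

```python
KNOWN_FIELDS = {"user-agent", "disallow"}  # assumption: original-standard fields only


def check_robots(text: str) -> list[str]:
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not sep:
            problems.append(f"line {number}: missing ':' separator")
        elif field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field '{field}'")
        elif field == "user-agent":
            seen_agent = True
        elif not seen_agent:
            problems.append(f"line {number}: Disallow before any User-agent")
        elif value and not value.startswith("/"):
            problems.append(f"line {number}: path should start with '/'")
    return problems


sample = "User-agent: *\nDisallow: /SendMail/\nDisallow: /apco.css\n"
print(check_robots(sample) or "no problems found")
```

A check like this accepts both /SendMail/ and /SendMail, since both are syntactically valid paths.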



TynesWeb
Joined: Eons Ago
# Posts: 11

Posted: 2006-Jul-21 21:41

But according to http://www.robotstxt.org/wc/exclusion-admin.html, you DO use the ending "/" for directories/folders. I'll try it without, though.



bhartzer
Staff
Joined: Jun 08, 2000
# Posts: 7042

Posted: 2006-Jul-21 21:51

You're right, it appears that you can use both.

I did find this problem, though, and it appears that there's a bug that Google knows about in their Google Sitemap robots.txt validator.

There's also talk about a problem here.



g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10465

Posted: 2006-Jul-21 21:57

Disallow: /SendMail/ does not disallow /SendMail

When trying to access a folder using www.domain.com/SendMail some servers will return a DirectoryIndex page which lists all the files in that folder - and that page will be indexed by search engines, hence will show the filenames of your "secret" files to all.

Disallow: /SendMail will disallow any URL that starts with www.domain.com/SendMail whether including the trailing / or not.
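
The difference between the two forms can be checked with Python's standard urllib.robotparser module. This is a minimal sketch rather than a statement of how any particular search engine evaluates rules: www.domain.com is the placeholder host used above, form.asp is a hypothetical file name, and the helper name allowed is made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


WITH_SLASH = "User-agent: *\nDisallow: /SendMail/\n"
WITHOUT_SLASH = "User-agent: *\nDisallow: /SendMail\n"

for label, rules in (("/SendMail/", WITH_SLASH), ("/SendMail", WITHOUT_SLASH)):
    # form.asp is a hypothetical file inside the folder.
    for path in ("/SendMail", "/SendMail/", "/SendMail/form.asp"):
        url = "http://www.domain.com" + path
        print(f"Disallow: {label:<11} {path:<20} allowed={allowed(rules, url)}")
```

With the trailing slash, only the bare /SendMail URL stays fetchable; without it, all three URLs are blocked.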



TynesWeb
Joined: Eons Ago
# Posts: 11

Posted: 2006-Jul-21 23:14

g1smd,

Yes, BUT, Disallow: /SendMail/ is valid, is it not? In some instances, I may not WANT to disallow the file name, but may want to disallow the folder.

I really appreciate the help, btw.



Dinkar
Staff
Joined: Aug 12, 2001
# Posts: 4391

Posted: 2006-Jul-21 23:44

Put all the files that you use on the server side and don't want indexed either in a password-protected folder or below the web root. That's the only safe way to stop people from accessing those files.



g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10465

Posted: 2006-Jul-21 23:51

Yes it is valid, but it leaves a hole.

Disallow: /folder/ will still allow the DirectoryIndex page, which lists all the files that exist in the folder, to be displayed and indexed (on some servers). It only disallows URLs whose path starts with /folder/, trailing / included.

Disallow: /folder will disallow the folder, its index page, and any other files in the folder.
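
One trade-off of the shorter form is worth seeing in code. Assuming plain prefix matching as described in the original robots.txt standard (real crawlers may add wildcard extensions), Disallow: /folder also catches sibling files whose names merely begin with "folder". The function is_disallowed and the file names below are hypothetical, chosen only to illustrate the matching.

```python
def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Return True if `path` starts with any non-empty Disallow value."""
    # An empty Disallow value means "allow everything", so it never matches.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


# Hypothetical URL paths on the same site.
paths = ["/folder", "/folder/", "/folder/page.html", "/folder-old.html"]

for rule in ("/folder/", "/folder"):
    blocked = [p for p in paths if is_disallowed(p, [rule])]
    print(f"Disallow: {rule} blocks {blocked}")
```

So the trailing slash leaves the bare /folder URL open, while dropping it may block more than the folder itself.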





[ Message was edited by: g1smd 07/24/2006 12:00 pm ]





TynesWeb
Joined: Eons Ago
# Posts: 11

Posted: 2006-Jul-24 14:39

So, back to my original question... any ideas on why Google Sitemaps claims that my robots.txt file is invalid? I removed the trailing "/" and it has been loaded by GS, but it is still reported as invalid.



bhartzer
Staff
Joined: Jun 08, 2000
# Posts: 7042

Posted: 2006-Jul-24 16:43

any ideas on why Google Sitemaps claims that my robots.txt file is invalid?

Yes. It's not your problem, just like I said earlier:

I did find this problem, though, and it appears that there's a bug that Google knows about in their Google Sitemap robots.txt validator.

There's also talk about a problem here.



TynesWeb
Joined: Eons Ago
# Posts: 11

Posted: 2006-Jul-24 18:02

Finally figured it out with a little more research and by using the robots.txt checker at
http://tool.motoricerca.info/robots-checker.phtml
Solution: The file must be saved as UTF-8. My file was defaulting to UTF-16.
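
For anyone hitting the same thing, here is a rough Python sketch of how to spot and fix the encoding. It assumes the editor wrote a byte-order mark (BOM) at the start of the file, which is common for UTF-16 but not guaranteed; a UTF-16 file without a BOM would not be detected this way. The function names detect_bom and to_utf8 are made up for this example.

```python
import codecs
from typing import Optional


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte-order mark, if any."""
    # UTF-32 LE starts with the same bytes as UTF-16 LE, so check it first.
    boms = (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    )
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return None


def to_utf8(data: bytes) -> bytes:
    """Re-encode robots.txt bytes as UTF-8 without a byte-order mark."""
    encoding = detect_bom(data) or "utf-8"  # assumption: no BOM means UTF-8
    return data.decode(encoding).encode("utf-8")


# Simulate a robots.txt that an editor saved as UTF-16.
utf16_bytes = "User-agent: *\nDisallow: /SendMail\n".encode("utf-16")
fixed = to_utf8(utf16_bytes)
print(detect_bom(utf16_bytes), "->", fixed)
```

On a real file, read it with open(path, "rb"), pass the bytes through to_utf8, and write the result back in binary mode.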



g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10465

Posted: 2006-Jul-24 20:02

Additionally, make sure that there are always a couple of carriage returns after the last disallow line in the file.


© 1995  ·  iWeb, Inc  ·  DBA JimWorld Productions