TynesWeb
Joined: Eons Ago
# Posts: 11
|
Posted: 2006-Jul-21 21:23
I've submitted my robots.txt file to Google Sitemaps, but am told that it doesn't appear to be a valid file, even though Sitemaps does read it and show it. It sure looks valid to me. Can someone take a look and give me some guidance? File pasted below. I'll add the link in my profile.
User-agent: *
Disallow: /SendMail/
Disallow: /search/
Disallow: /reports/
Disallow: /NewFiles/
Disallow: /mp3/
Disallow: /macvpn/
Disallow: /EmailSignature/
Disallow: /counter/
Disallow: /cc2001/
Disallow: /cgi-bin/
Disallow: /bin/
Disallow: /aspnet_client/
Disallow: /_vti_pvt/
Disallow: /swform_text.html
Disallow: /jobsTEST.asp
Disallow: /intro.swf
Disallow: /ButtonBarParseArgs.java
Disallow: /ButtonBar.java
Disallow: /Banner3.java
Disallow: /Banner2.java
Disallow: /apcocite.css
Disallow: /pop.css
Disallow: /survey.css
Disallow: /apco.css
[ Message was edited by: TynesWeb 07/21/2006 01:56 pm ]
|
 |
bhartzer
Staff
Joined: Jun 08, 2000
# Posts: 7042
|
Posted: 2006-Jul-21 21:33
I've submitted my robots.txt file to Google Sitemaps
You only need the leading / for folders/directories, not a trailing one:
User-agent: *
Disallow: /SendMail
Disallow: /search
Disallow: /reports
Disallow: /NewFiles
Disallow: /mp3
Disallow: /macvpn
Disallow: /EmailSignature
Disallow: /counter
Disallow: /cc2001
Disallow: /cgi-bin
Disallow: /bin
Disallow: /aspnet_client
Disallow: /_vti_pvt
Disallow: /swform_text.html
Disallow: /jobsTEST.asp
Disallow: /intro.swf
Disallow: /ButtonBarParseArgs.java
Disallow: /ButtonBar.java
Disallow: /Banner3.java
Disallow: /Banner2.java
Disallow: /apcocite.css
Disallow: /pop.css
Disallow: /survey.css
Disallow: /apco.css
There are also a few robots.txt validators out there that you can use, and here's a link to the robots.txt standard:
http://www.robotstxt.org/wc/robots.html
|
 |
TynesWeb
Joined: Eons Ago
# Posts: 11
|
Posted: 2006-Jul-21 21:41
But according to http://www.robotstxt.org/wc/exclusion-admin.html, you DO use the ending "/" for directories/folders. I'll try it without, though.
|
 |
bhartzer
Staff
Joined: Jun 08, 2000
# Posts: 7042
|
Posted: 2006-Jul-21 21:51
You're right, it appears that you can use both.
I did find this problem, though, and it appears that there's a bug that Google knows about in their Google Sitemap robots.txt validator.
There's also talk about a problem here.
|
 |
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10465
|
Posted: 2006-Jul-21 21:57
Disallow: /SendMail/ does not disallow /SendMail
When trying to access a folder using www.domain.com/SendMail some servers will return a DirectoryIndex page which lists all the files in that folder - and that page will be indexed by search engines, hence will show the filenames of your "secret" files to all.
Disallow: /SendMail will disallow any URL that starts with www.domain.com/SendMail whether including the trailing / or not.
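To make the difference concrete, here is a minimal Python sketch of the plain prefix matching described above. It is not a full robots.txt parser, the helper name is_disallowed is made up for illustration, and /SendMail/form.asp is a hypothetical file path.

```python
def is_disallowed(path, disallow_rules):
    """Simplified check: a path is blocked if it starts with any
    non-empty Disallow value (plain prefix matching, no wildcards)."""
    return any(path.startswith(rule) for rule in disallow_rules if rule)


# Compare the two rule styles against the same set of paths.
# "/SendMail/form.asp" is a hypothetical file used only for illustration.
for rule in ("/SendMail/", "/SendMail"):
    for path in ("/SendMail", "/SendMail/", "/SendMail/form.asp"):
        print(f"Disallow: {rule:<11} path {path:<19} blocked={is_disallowed(path, [rule])}")
```

Note that because this is a prefix match, Disallow: /SendMail would also block any other URL that merely begins with those characters, such as /SendMailer.html.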
|
 |
TynesWeb
Joined: Eons Ago
# Posts: 11
|
Posted: 2006-Jul-21 23:14
g1smd,
Yes, BUT, Disallow: /SendMail/ is valid, is it not? In some instances, I may not WANT to disallow the file name, but may want to disallow the folder.
I really appreciate the help, btw.
|
 |
Dinkar
Staff
Joined: Aug 12, 2001
# Posts: 4391
|
Posted: 2006-Jul-21 23:44
Put any server-side files that you don't want indexed either in a password-protected folder or outside the web root. That's the only safe way to stop people from accessing those files.
|
 |
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10465
|
Posted: 2006-Jul-21 23:51
Yes it is valid, but it leaves a hole.
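Disallow: /folder/ will still allow the URL /folder (without the trailing slash) to be crawled, and on some servers that URL returns the DirectoryIndex page that lists all the files in the folder, which can then be displayed and indexed. The rule only disallows URLs that begin with /folder/, trailing slash included.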
Disallow: /folder will disallow the folder, its index page, and any other files in the folder.
[ Message was edited by: g1smd 07/24/2006 12:00 pm ]
|
 |
TynesWeb
Joined: Eons Ago
# Posts: 11
|
Posted: 2006-Jul-24 14:39
So, back to my original question... any ideas on why Google Sitemaps claims that my robots.txt file is invalid? I removed the trailing "/" and it has been loaded by GS, but it still claims the file is invalid.
|
 |
bhartzer
Staff
Joined: Jun 08, 2000
# Posts: 7042
|
Posted: 2006-Jul-24 16:43
any ideas on why Google Sitemaps claims that my robots.txt file is invalid?
Yes. It's not your problem, just like I said earlier:
I did find this problem, though, and it appears that there's a bug that Google knows about in their Google Sitemap robots.txt validator.
There's also talk about a problem here.
|
 |
TynesWeb
Joined: Eons Ago
# Posts: 11
|
Posted: 2006-Jul-24 18:02
Finally figured it out with a little more research and by using the robots.txt checker at
http://tool.motoricerca.info/robots-checker.phtml
Solution: Must be utf-8. My file was defaulting to utf-16.
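For anyone hitting the same thing, here is a rough Python sketch of how the encoding could be checked and fixed. It assumes the file starts with a byte-order mark (BOM) when it is UTF-16, which is typical for Windows editors but not guaranteed; the function names detect_bom and to_utf8 are made up for this example.

```python
import codecs


def detect_bom(raw: bytes) -> str:
    """Guess the codec from a leading byte-order mark.
    Assumption: a file without any BOM is treated as UTF-8."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    return "utf-8"


def to_utf8(raw: bytes) -> bytes:
    """Decode with the detected codec and re-encode as UTF-8 without a BOM."""
    return raw.decode(detect_bom(raw)).encode("utf-8")


# Simulate a robots.txt that an editor saved as UTF-16 (Python adds the BOM).
sample = "User-agent: *\nDisallow: /search\n".encode("utf-16")
print(detect_bom(sample))
print(to_utf8(sample))
```

To fix a real file, read it with open("robots.txt", "rb"), pass the bytes through to_utf8, and write the result back in binary mode. A UTF-16 file saved without a BOM would not be caught by this check.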
|
 |
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10465
|
Posted: 2006-Jul-24 20:02
Additionally, make sure that there are always a couple of carriage returns after the last disallow line in the file.
|
 |