Direct links vs Relative Links (In: Google)
| Member |
Message |
alonk
Joined: Mar 15, 2006
# Posts: 17
|
Posted: 2006-Mar-15 21:11
I've been discovering cached pages on Google from pages that I removed from my site one or two years ago...
I recently did a Remove URL and updated my robots.txt to reflect this - waiting to see if "pending" will ever become "processed."
Also - very weird - I've found cached pages from a long dead affiliate program:
www.mysite.com/index.shtml?merchantbob
or...
www.mysite.com/index.shtml?merchantjane
Google still has my two-year-old index page cached for these, but when I do a duplicate content check with "www.mysite.com" and "www.mysite.com/index.shtml?merchantbob" it returns 100% duplicate content (even though the current index.shtml page is only 11% the same!)
My question: Can I rely on Google's Remove URL, my new robots.txt & mod_rewrite 301 code to vanquish these old caches or is there something I'm missing?
|
 |
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10465
|
Posted: 2006-Mar-15 23:54
The 301 redirect will get the correct version (www) to be better indexed within a month or so (but do make sure that every page of the site has a unique title and meta description).
The non-www pages will take a lot longer to drop out. If they are supplemental results, then they will take at least three years to disappear. That is a bug with Google.
For the pages that no longer exist, the Removal Tool will "hide" the pages for 3 or 6 months and then they will be put back into the index even though they still do not exist. That is another Google Bug.
If Google ever take an interest in fixing their bugs, then the usage of the 301 redirect and the robots.txt information is exactly what you need on your site to help them fix the problems.
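For anyone wanting to sanity-check those two pieces, here is a rough Python sketch of the logic: working out where a 301 should send a non-www request, and confirming that a robots.txt rule really blocks a removed URL. The host name, the /deadlink rule and the function names are placeholders for illustration only, not the actual site config; on the live site the redirect itself would still be done by the mod_rewrite rules.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Assumption: the www host is the version that should stay indexed.
CANONICAL_HOST = "www.mysite.com"

# Hypothetical robots.txt content blocking one removed path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /deadlink
"""


def redirect_target(url):
    """Return the URL a 301 should point to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare_host = CANONICAL_HOST[len("www."):]
    if host == bare_host:
        # Same path and query string, only the host changes.
        return urlunsplit(
            (parts.scheme, CANONICAL_HOST, parts.path or "/", parts.query, "")
        )
    return None


def is_blocked(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if the given robots.txt text disallows this URL for the agent."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(agent, url)


if __name__ == "__main__":
    print(redirect_target("http://mysite.com/page.html"))
    print(is_blocked("http://www.mysite.com/deadlink?referrer"))
```

This only checks the rules on paper. It says nothing about how quickly Google acts on them.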
|
 |
alonk
Joined: Mar 15, 2006
# Posts: 17
|
Posted: 2006-Mar-16 00:05
Thank you for clarifying that... only, does that mean I will keep being penalized for pseudo "duplicate" content like those old dead links or the non-www pages?
Three years sounds scary.
Also, I thought that <title> and <meta tags> were no longer part of Google's ranking algo... I'm hearing a lot of contradictory info.
How different does each title tag have to be? 3, 4, 5 words... the whole thing?
(I plan on going through and revamping the site this month.)
Thanks for your expertise.
|
 |
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10465
|
Posted: 2006-Mar-16 00:08
The supplemental results aren't harming you, they are just a pain in the neck to be showing old content in the results. Make sure you have a custom 404 error page to catch any visitors clicking those links. Give them a page of useful navigation to get them on their way to the right place.
The title tag and meta description are very important. They need to be different on every page.
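One way to find pages that share a title or meta description is to collect both from every page and group the URLs by value. The sketch below does that in Python with the standard library. The class and function names are made up for this example, and it assumes you already have each page's HTML in a dict keyed by URL.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collects the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicates(pages):
    """pages maps URL -> HTML. Returns the titles and descriptions used on more than one URL."""
    by_title = defaultdict(list)
    by_description = defaultdict(list)
    for url, html in pages.items():
        parser = HeadExtractor()
        parser.feed(html)
        by_title[parser.title.strip()].append(url)
        by_description[parser.description].append(url)
    return {
        "title": {t: urls for t, urls in by_title.items() if len(urls) > 1},
        "description": {d: urls for d, urls in by_description.items() if len(urls) > 1},
    }
```

Anything that shows up in either result is a page pair worth rewriting. Pages with no title or no description at all are grouped under the empty string, so those get flagged too.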
|
 |
alonk
Joined: Mar 15, 2006
# Posts: 17
|
Posted: 2006-Mar-16 00:15
"The supplemental results aren't harming you, they are just a pain in the neck to be showing old content in the results. "
...good. I can sleep tonight (as soon as I rewrite all my title tags.)
I was having trouble with my custom redirect.shtml page returning a 200 to Google - of course, my hosting service had no clue and would not help me. Had to revert to a plain "file not found" to avoid annoying the spiders (and giving the impression that I was automatically redirecting visitors to my home site.)
Right now, that's the least of my troubles... but thanks for the reminder.
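The thing to get right with a custom error page is that visitors see the friendly page while the status line still says 404. On Apache, one common cause of the 200 problem is an ErrorDocument that points at a full http:// URL instead of a local path, which makes the server redirect rather than return the error; whether that was the cause here is only a guess. The tiny WSGI app below is a sketch of the intended behaviour, not the server setup from this thread. The page table and error markup are invented for the example.

```python
# Hypothetical site content: only the home page exists.
PAGES = {"/": "<h1>Home</h1>"}

# Friendly error page with navigation back to a real page.
ERROR_PAGE = "<h1>Page not found</h1><p><a href='/'>Go to the home page</a></p>"


def app(environ, start_response):
    """Serve known pages with 200 and everything else with a real 404."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        status, body = "200 OK", PAGES[path]
    else:
        # The body is helpful for people, but the status stays 404 for spiders.
        status, body = "404 Not Found", ERROR_PAGE
    data = body.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ]
    start_response(status, headers)
    return [data]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("", 8000, app)` and request a path that does not exist.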
|
 |
alonk
Joined: Mar 15, 2006
# Posts: 17
|
Posted: 2006-Apr-30 14:06
Amazingly, 6 weeks later and Google has yet to remove the dead links!
(However, it refuses to index my current index page!)
Instead of removing www.mysite.com/deadlink?referrer
google tried to remove:
www.mysite.com/www.mysite.com/deadlink?referrer
So there's a cache of a page from Nov 2004 but no cache of my index page from today!
It's laughable when google says, "dead links will be removed at the next crawl"
Nov 2004 is a long time ago....
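I can't see what the removal tool does internally, so this is only a guess, but a doubled host like that is what you get when a URL without "http://" is treated as a relative link and resolved against the site root. A short Python illustration using the standard library, with the base URL assumed for the example:

```python
from urllib.parse import urljoin

# Assumption: the tool resolves whatever is entered against the site root.
BASE = "http://www.mysite.com/"


def resolve(entered):
    """Resolve an entered URL the way a relative link would be resolved."""
    return urljoin(BASE, entered)


# No scheme, so the whole string is taken as a relative path.
print(resolve("www.mysite.com/deadlink?referrer"))
# -> http://www.mysite.com/www.mysite.com/deadlink?referrer

# With the scheme it is an absolute URL and comes through unchanged.
print(resolve("http://www.mysite.com/deadlink?referrer"))
# -> http://www.mysite.com/deadlink?referrer
```

If that is what happened, resubmitting the request with the full "http://" address might be worth a try.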
|
 |
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10465
|
Posted: 2006-Apr-30 21:48
I see stuff going back to 2004 January all over the place.
Google has no clue how to fix it. That much is very obvious.
Yes, a Custom 404 Error Page is a Good Thing. Make it happen.
|
 |
alonk
Joined: Mar 15, 2006
# Posts: 17
|
Posted: 2006-Apr-30 22:06
Why can't Google teach their bots to do simple reasoning like:
Googlebot: Let's pull up mysite.com/thislinkhasbeendeadforyears... hmm.. it looks like that link is giving me a 404. hmm... a page that can't be found? Should I repeatedly index it for the next ten years? No! I know! Maybe I should erase the url and cache from my datacenters! I'm a genius!
;-)
Google acts like it has no control over its own datacenters. I realize that there are billions of pages, but a dead link is a dead link is a 404.
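Spelled out as code, the rule I'm wishing for is about this simple. It is a toy Python sketch of my own wishful thinking, with an invented `recrawl` function and a plain dict standing in for the index; it says nothing about how Googlebot actually works.

```python
def recrawl(index, url, status):
    """Update a toy index (URL -> cached copy) after fetching a URL.

    status is the HTTP status code returned by the latest fetch.
    """
    if status in (404, 410):
        # The page is gone, so drop both the URL and its cached copy.
        index.pop(url, None)
        return "removed"
    return "kept"


index = {"http://www.mysite.com/deadlink?referrer": "cache from Nov 2004"}
print(recrawl(index, "http://www.mysite.com/deadlink?referrer", 404))
print(index)
```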
|
 |
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10465
|
Posted: 2006-Apr-30 22:10
Yes, but to keep their index "big" they keep the URL as a Supplemental Result, and continue to show a two-year-old cache for that page.
Nice work when the company has changed telephone number, address, and prices for all their products...
|
 |