Moderator(s): g1smd, Logan

Everyman
Joined: Apr 15, 2001
# Posts: 147

Posted: 2004-Aug-21 21:51

The "URL only" type of listing has been a problem on large sites for the last 18 months. A typical large site (30,000 pages or more) will find that perhaps half or more of the total number of pages Google claims are indexed, are really only listed in this manner.

If you use site:www.mysite.com you get a number from Google.

But if you have one word that is always on every page of your site and is not in the URL, you should use this more restrictive form of the command:

site:www.mysite.com keyword

If you have a large site, you can also try this:

site:www.mysite.com -keyword

That will show you the pages from your site that are listed with the URL only.
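
The three forms of the command above can be written out side by side. This is only a minimal sketch in Python: the function name `site_queries` and the dictionary labels are made up for illustration, and it just builds the query strings without contacting Google.

```python
def site_queries(domain: str, keyword: str) -> dict:
    """Build the three site: queries described above as plain strings."""
    base = f"site:{domain}"
    return {
        "claimed": base,                      # the total Google reports
        "with_keyword": f"{base} {keyword}",  # pages matching the on-page word
        "url_only": f"{base} -{keyword}",     # pages that do not match it
    }


queries = site_queries("www.mysite.com", "keyword")
for label, query in queries.items():
    print(f"{label}: {query}")
```

The keyword has to be a word that is on every page but not in any URL, otherwise the last query says nothing about URL-only listings.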

It's a big problem for a lot of big sites, because even if you have fashioned your filenames to hit on your keyword, the fact that the URL only is listed by Google means you cannot rank well for that page. You have no title, no headlines, no real "juice" behind your keyword.

I believe that Google ran out of docIDs 18 months ago due to the 4-byte integer problem. In order for a page to get indexed these days, another page has to get dropped from the index. That's why a lot of pages show up as URL only -- they are waiting for a docID number. Over the last 18 months, I've seen the same pages alternate several times between a URL listing and an indexed entry. I don't think it has anything to do with the site, or with the content of the page.
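
For reference, the arithmetic behind the "4-byte integer" part of that theory looks like this. The sketch only shows how many distinct values a 4-byte unsigned integer can hold; whether Google's docIDs are actually stored that way is the assumption made in the post, and nothing here confirms it.

```python
import struct

BYTES = 4
distinct_ids = 2 ** (8 * BYTES)      # number of distinct 4-byte values
largest_unsigned = distinct_ids - 1

print(distinct_ids)       # 4294967296
print(largest_unsigned)   # 4294967295

# The largest value still packs into 4 bytes; one more does not fit.
packed = struct.pack("<I", largest_unsigned)
print(len(packed))        # 4
try:
    struct.pack("<I", largest_unsigned + 1)
    overflowed = False
except struct.error:
    overflowed = True
print(overflowed)         # True
```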

What it has to do with is that Google has placed a very low priority on fixing the main index. Ad revenue has been in the driver's seat at the Googleplex for the last 18 months.




Everyman
Joined: Apr 15, 2001
# Posts: 147

Posted: 2004-Aug-22 02:08

Search on Google or Yahoo for:
google docid

The first link gives background on the docID theory.

The notion that Google doesn't like the page seems mistaken. I've seen thousands of pages from my big site go in and out of the URL-only status every few months. There is no relation to the content of the page whatsoever. Currently I have about 55,000 pages indexed, another 30,000 with just the URL listed, and about 40,000 more that don't appear in any form. When Google had their big blowout in April 2003 it hit me hard, because all of a sudden only 35,000 pages were fully indexed, and the remainder that had been indexed the month before were now only URL listings. This was the first time I noticed this. I wasn't the only one. Many other webmasters, particularly from big sites, reported the same thing.

It has always been a problem getting Google to crawl and index a big site. But April 2003 was the first time I noticed the URL-only phenomenon.




wereworm
Joined: Jan 17, 2004
# Posts: 27

Posted: 2004-Aug-22 05:08

That's a scary theory, actually. If it's true, then whenever new spammy pages are getting indexed, they're causing Google to knock legitimate pages off into "URL-only" status, leaving them to be ranked poorly because their content isn't being fully recognized. Also, since spammy pages reproduce so quickly, they'd dominate the top results.



arlindo_correia
Joined: Aug 20, 2004
# Posts: 8

Posted: 2004-Aug-22 08:41

Hypothesis: Google leaves as [URL only] the pages with the fewest visits over a period of time, without taking into consideration either the PR of the page or the PR of the pages linking to it.



doctormd23
Joined: Eons Ago
# Posts: 98

Posted: 2004-Aug-26 15:08

Interesting thread... so I looked on the Google site (Webmaster Section) and found this answer related to the discussion of why URLs only appear in the results. Answer was found on question # 6 on this page: http://www.google.com/webmasters/faq.html#delay

Here it is though...
"6. Where is my page's title?

Unlike many search engines, Googlebot can return results for pages that are known but haven't been crawled yet. Since we haven't looked at those pages yet, their titles aren't shown; the Google results page displays the URL instead."

This may simply be a basic response to a more serious problem as mentioned by Everyman in this thread.

Doc



arlindo_correia
Joined: Aug 20, 2004
# Posts: 8

Posted: 2004-Aug-26 15:55

No! Because pages that now show only the URL, without cache, already had the CACHE for some time.
And when the "activity" of searchers for some page increases, the CACHE returns...




mteasdal
Joined: Eons Ago
# Posts: 376

Posted: 2004-Aug-26 18:22

Last year one of my sites had around 300 products indexed in the top 10 and they were all dynamic .asp pages. They had dynamic meta and did very well until around the April/Nov 2003 Hilltop algo. They went missing in action on Google around then. Now only 1% of the 300 rank, and most of the 300 are listed like this:

www.domainname.com/properties.asp?someid=1975
Similar pages

At one time these pages did VERY well; now they bring little or no traffic. They do well in Yahoo, but nothing close to the traffic Google can bring.

I have a new site, 3 months old, with 9000 pages indexed, and I had not seen this "Similar pages" issue at all until this week. Now it seems to be running through my site like a virus. Stats are starting to show the decline.

What I have found is that my asp pages get hit badly but my html pages do not. My html pages are very similar in content but do not get hit. I figure Google is trying to stop products being spammed with dynamic pages. The html pages on this site are created from a database. The database writes hundreds of html pages to the hard drive. I have done this to see if I can get around this problem, and it seems to be working so far. Html pages rank better than dynamic pages and I am happy with the results. Whatever I do, I cannot get my dynamic asp pages to rank for products, so I am doing the html FOR NOW.
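
Writing static html pages out of a database can be sketched roughly as follows. This is not the poster's actual code (the original site used ASP): it is a minimal Python illustration, and the function name `write_static_pages`, the `product-<id>.html` filename pattern, and the sample rows are all assumptions made for the example.

```python
import html
import tempfile
from pathlib import Path


def write_static_pages(products, out_dir):
    """Write one plain .html file per product record and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for product in products:
        title = html.escape(product["name"])
        body = html.escape(product["description"])
        page = (
            "<html><head><title>{0}</title></head>"
            "<body><h1>{0}</h1><p>{1}</p></body></html>"
        ).format(title, body)
        path = out / f"product-{product['id']}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written


# Hypothetical rows standing in for a database query result.
sample_rows = [
    {"id": 1975, "name": "Tent & Poles", "description": "Two-person tent."},
    {"id": 1976, "name": "Camp Stove", "description": "Single burner."},
]

# Write into a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_static_pages(sample_rows, tmp)
    names = [p.name for p in paths]
    first_page = paths[0].read_text(encoding="utf-8")

print(names)
```

Each record ends up at its own fixed URL with its own title, instead of one script answering for every product through a query string.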




[ Message was edited by: mteasdal 08/26/2004 12:15 pm ]





mteasdal
Joined: Eons Ago
# Posts: 376

Posted: 2004-Aug-26 20:58

After more research on other sites I see html pages are having the same problem if the layouts are the same. Eeekk...

Is this a dup layout issue? Different content but same layout? hmmm Build dynamic layout LOL Hmmm



mteasdal
Joined: Eons Ago
# Posts: 376

Posted: 2004-Aug-26 21:27

Check out eBay and their problems with this. To narrow down the 509,000 results, I have an example that just goes after camping, with 4,070 results. In Google's results pages, click on the next 10; almost all the results after that have this issue.

site:http://listings.ebay.com inurl:camping

Then when I get to the end I click on "repeat the search with the omitted results included." and start again. I find "Results 261 - 270 of about 4,010" have description and cache and all the rest have the "Similar pages" Issue.

Is Google only indexing a percentage?
Is Google looking for a big difference in content or layout?

Whatever it is, I need to find out. Back to researching this...





mteasdal
Joined: Eons Ago
# Posts: 376

Posted: 2004-Aug-26 23:01

I think I found a workaround that seems to be working for this site.
site:subletinthecity.com inurl:property

I figure Google does not like one page dishing up many different results unless a large percentage of the page has different content. This site does something different.

This site dishes up default pages like this
/property/1255
/property/1254
/property/1253

Each path is different, so it does not get the "Similar pages" issue. This is the same layout as my site, so I think I will give it a try. Currently my site has 230 pages with "Similar pages" and only 10 indexed correctly with cache. I hope this workaround will work.
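
The change from a query-string URL to a path-style URL can be sketched like this. It is an illustration only: the function name `to_path_url` is made up, the `someid` parameter is borrowed from the example URL earlier in the thread, and the `/property` prefix follows the paths listed above. The server still has to map each path back to the dynamic page, which is not shown here.

```python
from urllib.parse import parse_qs, urlsplit


def to_path_url(dynamic_url, param="someid", prefix="/property"):
    """Turn page.asp?someid=1255 into /property/1255 (path only)."""
    query = parse_qs(urlsplit(dynamic_url).query)
    values = query.get(param)
    if not values:
        raise ValueError(f"no {param!r} parameter in {dynamic_url!r}")
    return f"{prefix}/{values[0]}"


path_url = to_path_url("http://www.domainname.com/properties.asp?someid=1255")
print(path_url)  # /property/1255
```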





mteasdal
Joined: Eons Ago
# Posts: 376

Posted: 2004-Aug-27 08:58

After a day of researching this and a site recoding I am ready for bed. This may not be the best way but I always say doing something is better than doing nothing. I will let you know if I get better results.



Everyman
Joined: Apr 15, 2001
# Posts: 147

Posted: 2004-Aug-30 03:56

On the site in my profile I added an essay today about this problem. It's called "Google is dying: Death by a billion cuts." Look for the "Google is dying" link on the home page. It's time to publicize this issue, and see if Google will put some additional resources into their main index in response to the publicity. Another question I have, which I don't go into in that essay, is: why did Google start that so-called "Supplemental Index" about one year ago? I don't see the logic behind that at all, unless it's a symptom of the same docID problem.




mteasdal
Joined: Eons Ago
# Posts: 376

Posted: 2004-Aug-30 04:27

Everyman - Thanks for the read. Google has issues!!! I still have hope for my site recode but much less the more I read. *GRIN* Googlebot hit the new pages today so I hope to see results soon.




arlindo_correia
Joined: Aug 20, 2004
# Posts: 8

Posted: 2004-Sep-11 12:36

I have two pages with indexes in my site, linking to almost all of the pages of the site. Both indexes had PR 2. When, some weeks ago, one of the indexes changed to PR 3, all the pages linked from it were upgraded, appearing in Google with CACHE...



mteasdal
Joined: Eons Ago
# Posts: 376

Posted: 2004-Sep-11 18:48

arlindo_correia - I am still trying to figure this out but this is what I feel happened to your site.

PR is passed from the main page to the other pages. Something like this...

MainPage has a PR 3. It passes the PR to the other pages on your site, and the deeper the pages, the less PR they get. If your MainPage has a PR 3, it passes a PR 3 or 2 to other links on the page. Now, from your main page you link to a site map page that has a PR 3; that PR 3 is passed to the links on that page, making your site map page have a PR of something like 2. NOW the pages are all showing "Similar pages" with no cache, so no PR is passed to them. This means your site map page would not lose any PR. Now your "Similar pages" entries do not rank well, or at all. The reason for this is that the same news article you have cut and pasted is out on the web already. That's what I figure.





doctormd23
Joined: Eons Ago
# Posts: 98

Posted: 2004-Sep-11 20:25

Here's the skinny on this topic directly from the Google web site here:
http://www.google.com/webmasters/3.html

They say:
"4. There's no description of my site.

The Google index contains two types of pages--fully indexed and partially indexed pages. Your page is currently partially indexed, which means that although we know about your site, our robots have not read all the content on your page(s) in past crawls. This does not adversely affect your PageRank or your inclusion in our index. It does mean that we don't 'know' what to call your page, so it gets listed with the URL as the title and no description.

We appreciate the frustration this causes webmasters who work hard to make their sites accessible to users. We are working to increase the number of fully indexed pages in our search results to alleviate this problem."



mteasdal
Joined: Eons Ago
# Posts: 376

Posted: 2004-Sep-13 01:44

Google may say that, but I do not feel that is totally true. The reason is that I had around 300 pages, most in the top 10, last year. They were asp pages like this: productpage.asp?id=2323. Every product ranked very well. Then they changed their algo, and I have made many changes over 6 or 8 months with no luck getting any of the pages to rank. Sometime during the last 6 months they dropped their cache and now show "Similar pages"; most of the pages do not rank at all. They just stay "Similar pages" with no description for months. Even after Googlebot visits them they do not get a cache, and I am talking months. I have other sites with html pages that get cached in days. I have changed my folder structure, and in my robots.txt I have told Google not to index my old productpage.asp pages and to go to my new path, in hopes of fixing this, but no luck. I still get the "Similar pages" issue.
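
A robots.txt rule of the kind described can be checked locally before relying on it. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the example.com host, and the new `/products/` path are all hypothetical, since the post does not give the real file or the new folder structure.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the old dynamic page, leave the rest open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /productpage.asp
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

old_url = "http://www.example.com/productpage.asp?id=2323"
new_url = "http://www.example.com/products/2323.html"

old_allowed = parser.can_fetch("Googlebot", old_url)
new_allowed = parser.can_fetch("Googlebot", new_url)
print(old_allowed, new_allowed)  # False True
```

This only confirms which URLs a well-behaved crawler is allowed to fetch under those rules; it says nothing about how a search engine chooses to list them.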

I have seen a MAJOR TRAFFIC INCREASE in the last 2 weeks by doing this...
I have created an html flyer page with 25 product descriptions on each page. I made 15 pages like this and linked them to my main page. In 2 or 3 days they were indexed in Google and they are now my top entry pages. They are 100% like a flyer you would see in your newspaper. The pages don't use my site nav or structure; they are just flyers telling the public about my products. Now these pages have cache and good descriptions. They are also ranking like my asp pages of last year. These pages have only a title, not keywords or description. I have just added keywords and descriptions to the meta to see if it moves their ranking up. Right now, without this, they rank in around the top 10 to 20 for my keywords. If you are having problems with your dynamic pages ranking, I would highly recommend doing this. It is almost like the olden days when Google worked. *SMILE*




drewpers
Joined: Sep 13, 2004
# Posts: 1

Posted: 2004-Sep-13 19:03

Our site was partially indexed for a few months; it even came up to #29 while not having a description. I thought, wow, great, I can't wait till it gets fully indexed. Well, it did on Saturday, and now we are at a whopping 554th ranking. I am so annoyed at Google at this time. So does this mean I must spam the hell out of Google to get ranked, cheat and such? Because it is on my mind.



mteasdal
Joined: Eons Ago
# Posts: 376

Posted: 2004-Sep-13 22:59

drewpers, welcome to the forum!
I know this can drive all of us nuts, that's for sure. Getting dynamic pages to rank is a big problem right now. I went to your site for a quick look and found many of your pages are getting "syntax error occurred". If they are bad links, they need to be corrected with error pages for the bots. I also saw sex= in your URL. BAD BAD BAD. Use another parameter, because you will rank very poorly for such words. Google does not know you are looking for M/F for shirt size. I think that is one reason you are at the 554th ranking. Plus, golf is a very competitive term. Getting links to your site is key. The more you have, the better off you are. Last thing: content is king; the more unique wording, the better off you are with the bots. I hope you don't mind my pointers.



g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10438

Posted: 2004-Sep-19 19:09

>> sex=

Jeez. I'd be entering a numeric quantity in that field. smile


© 1995  ·  iWeb, Inc  ·  DBA JimWorld Productions