neiljones
Joined: Eons Ago
# Posts: 80
Posted: 2004-Dec-29 23:54
I seem to be having difficulty getting Googlebot to spider a new section of my site properly.
I have recently expanded a series of photo galleries, adding thousands of new pictures. The pages follow the same format, layout and structure as thousands of pages that are already in the Google index.
Googlebot has visited, but it has not crawled every one of the new pages. Consequently they are not listed in the search results. I have checked the older pages, which had a small design change, and that change is showing in the cached copies.
I have good rankings for my chosen keywords and a strategy that works once everything gets into the index.
Since there is no real difference between the old and new material, I presume I am offering Google too much to digest in one go and that it will take some time to spider everything.
Am I correct? Does anyone have any suggestions as to how to speed this up? Links to the new content from another site is one idea I have. What do people think? Are there any others?
lizardz
Joined: Nov 12, 2004
# Posts: 1394
Posted: 2004-Dec-30 03:59
High-quality links help. You are probably offering Google too much to digest at once, and my guess is you are also offering it something that is very close to duplicate content, as most galleries are.
It all depends on how you're doing it. If you are using query strings, forget it; you might as well give up. If you are using real URLs, with mod_rewrite, you have a chance, but only if your pages actually have something unique about them. Personally, when I put up galleries I don't even bother trying to make them search-engine friendly; there's no point as far as I'm concerned.
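What mod_rewrite does is let the server answer a static-looking URL by internally handing the request to a script with query-string parameters, so the spider never sees the "?" form. The Python sketch below only illustrates the mapping such a rule performs; the URL scheme /gallery/<category>/<number>.html and the script name gallery.php with parameters cat and photo are hypothetical, not anything from this thread.

```python
import re
from typing import Optional

# Hypothetical clean URL scheme: /gallery/<category>/<photo number>.html
CLEAN_URL = re.compile(r"^/gallery/([a-z0-9-]+)/(\d+)\.html$")


def rewrite(path: str) -> Optional[str]:
    """Map a clean, static-looking path to the internal query-string URL, or None."""
    match = CLEAN_URL.match(path)
    if match is None:
        return None  # not a gallery URL; serve it as-is
    category, number = match.groups()
    # gallery.php, cat and photo are placeholder names, not a real script
    return f"/gallery.php?cat={category}&photo={number}"


print(rewrite("/gallery/landscapes/42.html"))  # /gallery.php?cat=landscapes&photo=42
print(rewrite("/gallery.php?cat=landscapes"))  # None
```

In an Apache .htaccess file the equivalent would be roughly RewriteRule ^gallery/([a-z0-9-]+)/([0-9]+)\.html$ /gallery.php?cat=$1&photo=$2 [L], again with the same hypothetical names. As lizardz says, clean URLs only help if the pages behind them carry something unique.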
neiljones
Joined: Eons Ago
# Posts: 80
Posted: 2004-Dec-30 15:37
I can rule out duplicate content as a problem on the following grounds:
1. Similar content which is older is in the index with no problems.
2. Many of the pages aren't getting spidered, so Google cannot know for certain what the content is without examining it.
Actually, I find galleries effective at attracting people, and since I design them properly there seems to have been no problem getting them into the search results.
The system seems to work differently now. When the old monthly dance happened you would get spidered in full, and the new pages would appear once the dance had settled down.
Now the spider seems to behave quite differently, visiting the site every day and doing a bit at a time. My problem is getting it to pick up the new pages.
neiljones
Joined: Eons Ago
# Posts: 80
Posted: 2004-Dec-30 15:54
Oh, BTW, I forgot to say I do not use query strings. The URLs all show as plain HTML files.
philh
Joined: Sep 14, 2001
# Posts: 3050
Posted: 2004-Dec-30 15:59
Build a site map
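A site map in this sense is just one or more plain HTML pages that link to every page on the site. As a minimal sketch, assuming the pages are available as (URL, title) pairs and an assumed cap of 100 links per site-map page, the pages could be generated like this:

```python
from html import escape


def build_sitemap_pages(links, per_page=100):
    """Split (url, title) pairs into HTML site-map pages of at most per_page links each."""
    pages = []
    for start in range(0, len(links), per_page):
        chunk = links[start:start + per_page]
        items = "\n".join(
            f'<li><a href="{escape(url)}">{escape(title)}</a></li>' for url, title in chunk
        )
        pages.append(f"<html><body><ul>\n{items}\n</ul></body></html>")
    return pages


# Hypothetical site: 250 photo pages
links = [(f"/gallery/photo{i}.html", f"Photo {i}") for i in range(1, 251)]
pages = build_sitemap_pages(links)
print(len(pages))  # 3
```

Each resulting page would still need to be linked from somewhere the spider already visits, such as the home page.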
neiljones
Joined: Eons Ago
# Posts: 80
Posted: 2004-Dec-30 22:02
Thanks, philh, for the suggestion, but I do have the equivalent of a site map. There are sets of link pages which ensure that everything is linked to. Nothing is more than 4 links from the top (at least not in the new material), and the site is now approaching 100,000 pages. The latest upgrade is over 27,000 pages.
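"Nothing is more than 4 links from the top" can be checked mechanically with a breadth-first search over the site's link graph, which gives the minimum number of clicks from the home page to every reachable page. This is a sketch assuming the links are already collected into a dictionary of page to linked pages; the page names below are hypothetical.

```python
from collections import deque


def click_depths(links, start):
    """Breadth-first search: minimum number of clicks from start to every reachable page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit is always the shortest path in BFS
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


site = {
    "/index.html": ["/galleries.html"],
    "/galleries.html": ["/gallery/index1.html"],
    "/gallery/index1.html": ["/gallery/photo1.html", "/gallery/photo2.html"],
}
depths = click_depths(site, "/index.html")
print(max(depths.values()))  # 3
print([page for page, d in depths.items() if d > 4])  # []
```

Pages missing from the result are not reachable from the start page at all, which is worth checking as well as depth.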
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10438
Posted: 2004-Dec-30 22:26
>> I have recently expanded a series of photo galleries, adding thousands of new pictures. The pages follow the same format, layout and structure as thousands of pages that are already in the Google index. <<
Yes, but Google indexes TEXT content. How much text is there with each image? Is it very different to the text on all of the other pages?
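One rough way to answer "is it very different from the text on the other pages?" is to compare the visible text of two pages as sets of overlapping three-word sequences (shingles) and take the Jaccard similarity. This is a common generic near-duplicate measure, not Google's actual method, which is not public; the shingle size of 3 and the sample captions are assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of two texts' shingle sets: 0.0 = nothing shared, 1.0 = identical."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


page_a = "Sunset over the harbour at Whitby. Browse more seaside photographs in our gallery."
page_b = "Fishing boats in the harbour at Whitby. Browse more seaside photographs in our gallery."
print(round(similarity(page_a, page_b), 2))  # 0.64
```

Here a short unique caption next to a shared sentence leaves the two pages about two-thirds identical, which is the kind of figure g1smd is asking about.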
lizardz
Joined: Nov 12, 2004
# Posts: 1394
Posted: 2004-Dec-31 21:48
> The latest upgrade is over 27,000 pages.
Which were put up when? Which have how much unique text content per page? Which have how much duplicate content per page? How many unique characters are there on each output page? How many identical? Aside from the often-discussed lag for Googlebot adding large numbers of new pages on existing sites, what exactly is the page-level differentiation? Obviously you aren't doing very much work to create each page; it's almost all automated, no? Title extracted from what? Same <title> tag on each page except for numerical / category stuff?
I'm interested in the details. Saying that Google hasn't listed your new stuff #1 right away isn't particularly interesting, to be honest, although seeing exactly what you have in terms of structure is.
neiljones
Joined: Eons Ago
# Posts: 80
Posted: 2004-Dec-31 23:26
>Which were put up when?
17th December for the most part; some went up earlier, during November, but they haven't been spidered much either.
>Which have how much unique text content per page?
> Which have how much duplicate content per page?
>How many unique characters are there on each output page?
>How many identical?
I don't have any definite figures on this, since the method I use introduces variations.
Each photograph has a proper descriptive title. There is a section on each page about several other parts of the site, which varies in actual content but is semantically similar across sections of the new material.
I am aware of the risk of the pages looking too similar, because the real meaning is in the photograph, which the spider cannot see, so I have tried to minimise the amount of identical information present.
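Rough figures for "how many unique characters, how many identical" can be produced by counting, for each page, the characters in text lines that appear on no other page versus lines that also appear elsewhere. This sketch assumes the visible text of each page has already been extracted (HTML tags stripped) with one block of text per line; the file names and text are hypothetical.

```python
from collections import Counter


def unique_vs_shared(pages: dict) -> dict:
    """For each page, return (unique_chars, shared_chars) based on whole text lines.

    A line counts as shared if the identical line appears on any other page.
    Repeated lines within one page are counted once.
    """
    line_sets = {
        name: {line.strip() for line in text.splitlines() if line.strip()}
        for name, text in pages.items()
    }
    # How many pages each distinct line appears on
    pages_per_line = Counter(line for lines in line_sets.values() for line in lines)
    report = {}
    for name, lines in line_sets.items():
        unique = sum(len(line) for line in lines if pages_per_line[line] == 1)
        shared = sum(len(line) for line in lines if pages_per_line[line] > 1)
        report[name] = (unique, shared)
    return report


pages = {
    "photo1.html": "Sunset at Whitby\nMore seaside photos",
    "photo2.html": "Boats at Whitby\nMore seaside photos",
}
print(unique_vs_shared(pages))
# {'photo1.html': (16, 19), 'photo2.html': (15, 19)}
```

Matching whole lines is crude: text that varies slightly from page to page, as described above, will count as unique even when it is semantically the same.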
> Aside from the often-discussed lag for Googlebot adding large numbers of new pages on existing sites
Can you direct me to some of these discussions? I am not a regular visitor here; I am too busy working on my site.
> What exactly is the page-level differentiation? Obviously you aren't doing very much work to create each page; it's almost all automated, no?
There is a level of automation, of course, in that the site is database-generated, but quite a bit of the data is input manually. Most of the database work is done off the site, to generate the tables from which the pages are created.
The latest version of my picture-cataloguing program (homemade, of course) allows me to title a photograph in under 10 seconds.
>Title extracted from what? Same <title> tag on each page except for numerical / category stuff?
The title is unique for each page. It is derived from the name of the photograph held in the database.
It sometimes has a number in it, but this is only to distinguish between photographs which coincidentally have the same title.
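Numbering only the names that collide is a small algorithm in its own right. A minimal sketch, assuming photo names come in as a plain list; the actual numbering scheme used on this site is not described in the thread, so the "name N" format here is an assumption.

```python
from collections import Counter


def unique_titles(names):
    """Derive page titles from photo names, numbering only names that occur more than once."""
    totals = Counter(names)
    seen = Counter()
    titles = []
    for name in names:
        if totals[name] == 1:
            titles.append(name)  # already unique: use the name unchanged
        else:
            seen[name] += 1
            titles.append(f"{name} {seen[name]}")
    return titles


print(unique_titles(["Whitby Abbey", "Harbour at Dusk", "Whitby Abbey"]))
# ['Whitby Abbey 1', 'Harbour at Dusk', 'Whitby Abbey 2']
```

One edge case: a photo literally named "Whitby Abbey 1" would collide with a generated title, so a real version would need to check the output for duplicates as well.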
>I'm interested in the details. Saying that Google hasn't listed your new stuff #1 right away isn't particularly interesting, to be honest, although seeing exactly what you have in terms of structure is.
The pages come in groups of 110: 100 picture pages plus 10 index pages, each listing 10 photos.
All are interlinked with text-based links which have at least some semblance of meaning.
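The group layout described above (100 photo pages plus 10 index pages of 10 photos each) can be expressed as a simple chunking step. This sketch only builds the link structure, assuming hypothetical file names indexN.html and photoN.html; it does not reproduce the author's own generator.

```python
def build_group(photo_ids, per_index=10):
    """Lay out one group: index pages listing per_index photos each, plus one page per photo."""
    index_pages = {}  # index page -> photo pages it links to
    photo_pages = {}  # photo page -> index page that links back
    for i in range(0, len(photo_ids), per_index):
        index_name = f"index{i // per_index + 1}.html"
        listed = photo_ids[i:i + per_index]
        index_pages[index_name] = [f"photo{pid}.html" for pid in listed]
        for pid in listed:
            # Each photo page links back to the index page that lists it.
            photo_pages[f"photo{pid}.html"] = index_name
    return index_pages, photo_pages


indexes, photos = build_group(list(range(1, 101)))
print(len(indexes) + len(photos))  # 110
```

Keeping each index page to 10 links means every photo is one click from an index page, which in turn makes the depth of the whole group easy to control.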
lizardz
Joined: Nov 12, 2004
# Posts: 1394
Posted: 2005-Jan-01 21:17
Sounds pretty good. Congrats on programming it yourself; that's what I always prefer to do too.
However, December 17 is nothing. Google took about 3 weeks to add only 2,000 or so real text-content pages I added to a site.
So with 27,000, I don't know. Do us a favor and let us know when you first see them all in the index. It's easy to check by searching for the unique title of any page:
"unique title text in quotation marks"
in Google. Once they all show up, all your pages are indexed. You should also see a big jump in your Googlebot stats, assuming you have access to spider stats; hopefully that's the case.
For other articles on this topic, search for this in Google:
lag adding new pages to google
Some sites require registration to access material.
Take all discussions of this topic, and of any sandbox-type topic [I don't believe these two questions are in fact directly related, although they could be indirectly related], with large grains of salt; unfortunately a lot of the reasoning used in many of these threads isn't very solid, to put it politely.
The main point is this: for the last year Google's primary index has behaved as if it's full. There's a lot of debate about this question, so I'll word it like that. One of these behaviors is a significant lag in adding new pages to the index. Best analogy: you must wait for some water to leave a full glass before you can add more drops. This is how the current Google index behaves, and it's exactly what you are seeing.
neiljones
Joined: Eons Ago
# Posts: 80
Posted: 2005-Jan-04 17:10
Actually, you can check the number of pages in the index by searching for site:domainname.com
This seems to display the number of pages in the index for a given site. According to this, I am gaining 1,000 pages or so a day. I have full access to Analog, and I have been running the results through small Perl scripts to get figures for various things.
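Googlebot activity per day can also be pulled straight from the raw access log. The thread's author used Analog and Perl; this is an equivalent idea sketched in Python, assuming Apache's Combined Log Format and treating any user-agent containing "Googlebot" as the spider (user-agents can be faked, so this is only indicative). The sample log lines are made up.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [date:time zone] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def googlebot_hits_per_day(lines) -> Counter:
    """Count requests per day whose user-agent string mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group(2):
            counts[match.group(1)] += 1
    return counts


sample = [
    '192.0.2.1 - - [04/Jan/2005:10:15:32 +0000] "GET /gallery/photo1.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [04/Jan/2005:10:15:40 +0000] "GET /gallery/photo2.html HTTP/1.1" 200 4980 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [04/Jan/2005:11:02:03 +0000] "GET /index.html HTTP/1.1" 200 8200 "-" "Mozilla/5.0"',
]
print(googlebot_hits_per_day(sample))  # Counter({'04/Jan/2005': 2})
```

Counting distinct new URLs fetched per day, rather than raw hits, would be a closer match to the "big jump" lizardz describes.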
I will try to keep people posted on the results.
My next project is to add loads more text pages. I have a bright idea using some data that happens to be sitting on my hard disk from a previous project, not on the web, that didn't work out too well. This will be a useful comparison. I thought the data was just too difficult to manipulate, but thanks to the wonders of Linux I have just solved that one.
I just have to work out how to link all the wretched stuff together in a way that makes sense: a way that is spiderable, with all the keywords in the links, etc. Still, it is an interesting programming project.