Cookie blocks crawling?

Posted By: kasperseo ()
Posted On: 2005-Jan-27 14:24

Hi,

A quick question: it's not really about PHP, but I couldn't find a more suitable forum, so I hope somebody can help me out here. I have the following links on the homepage of my client (see profile):

<a href="/NL/home/index.html" onclick="javascript:cook('/NL/home/index.html');return false;" class="lng">> Nederlands</a>

Normally Google and others should be able to follow A HREF links with JavaScript in them as long as the destination URL is known, yet none of these pages are indexed. Several other pages that don't have JS cookies in the links pointing to them are indexed fine. Could it be that the specific nature of the JavaScript (placing a cookie in order to recognize the language choice) prevents crawlers from following these links?

Thanks in advance (and sorry if I'm off-topic/off-forum here)!


Posted By: Sinoed ()
Posted On: 2005-Jan-27 17:32

I would have suspected the JavaScript, but I wondered whether something else was wrong with the page. I ran it through a spider test, and you're missing a robots.txt for these pages; adding one might help, although I'm not sure it's related. I was actually able to spider the /nl/ version, so it isn't returning a 404 or anything else that would keep it from being indexed. I would try adding the robots.txt to see if that helps, and then put the JavaScript itself under question. There could be something either in the command itself or in the way it's written that's causing the crawlers to stumble. I'm not sure, though. I would also double-check that you're not missing any elements, like a closing tag or something simple like that. As an example, if the link above is the actual link, it looks like you've got an extra ">" by Nederlands.


Posted By: lizardz ()
Posted On: 2005-Jan-27 18:30

Are the pages duplicate content from other sites? I remember you asking this elsewhere but don't remember your answer. Personally, I'd pull the URL out of the cook() call and pass just 'NL' instead, then construct the URL in the JavaScript library file, like
url = '/' + country + '/home/index.html';

That would be cleaner, and there wouldn't be a second URL in the link, which I think Google is currently looking at and only partly understanding.
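
A minimal sketch of that idea, assuming the library file exposes a helper along these lines. The name buildLanguageUrl and the two-letter check are assumptions for illustration; nothing in this thread shows the real library code.

```javascript
// Hypothetical helper: builds the language home page URL from a country code,
// so the link only needs to pass 'NL' instead of a second full URL.
function buildLanguageUrl(country) {
  // Assumption: country codes are exactly two letters, as in '/NL/'.
  if (!/^[A-Za-z]{2}$/.test(country)) {
    throw new Error('Unexpected country code: ' + country);
  }
  // Same construction as the one-liner above, with the code upper-cased.
  return '/' + country.toUpperCase() + '/home/index.html';
}
```

The onclick handler would then pass only the country code, while the plain href keeps the real URL for crawlers to follow.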

But I still suspect you simply have duplicate content.


Posted By: kasperseo ()
Posted On: 2005-Jan-28 12:04

Thanks for the help. I've again suggested adding a robots.txt.
The intent is in fact to show "> Nederlands" on the page, but to avoid any confusion we advised changing the second ">" in the code to "&gt;".

To solve the "double URL in one href"-problem, I got the suggestion to replace the URL in the js-command with "this.href":

<a href="/NL/home/index.html" onclick="javascript:cook(this.href);return false;" class="lng">&gt; Nederlands</a>
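
One thing to watch with this variant: in browsers this.href returns the fully resolved URL (scheme and host included), not the relative path written in the attribute. Since cook() itself is never shown in this thread, the following is only a guess at how it might cope with that; the function names, the cookie name "lang", and the one-year lifetime are all assumptions.

```javascript
// Hypothetical: extract the language code from either a relative path
// or the absolute URL that this.href yields.
function languageFromHref(href) {
  // Strip "scheme://host" if present, leaving only the path.
  var path = href.replace(/^[a-z]+:\/\/[^\/]+/i, '');
  var match = /^\/([A-Za-z]{2})\//.exec(path);
  return match ? match[1].toUpperCase() : null;
}

// Hypothetical: build the cookie string; "lang" is an assumed cookie name.
function languageCookie(lang, days) {
  var expires = new Date(Date.now() + days * 24 * 60 * 60 * 1000);
  return 'lang=' + encodeURIComponent(lang) +
    '; expires=' + expires.toUTCString() + '; path=/';
}

// Hypothetical cook(): remember the language choice, then follow the link.
function cook(href) {
  var lang = languageFromHref(href);
  if (lang && typeof document !== 'undefined') {
    document.cookie = languageCookie(lang, 365);
  }
  if (typeof window !== 'undefined') {
    window.location.href = href; // needed because the handler returns false
  }
}
```

Crawlers that ignore the onclick still see the plain href, so the link stays followable without JavaScript.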

Lizardz, it could be duplicate content that's causing the indexing problems here. Google, though, assured us by e-mail that none of these pages were penalized because of duplicate content.


Posted By: lizardz ()
Posted On: 2005-Jan-28 22:14

You got an answer from google specifically referring to your site? Congratulations.

Is that all the email said, or did it say that the pages are not penalized because of duplicate content?

Which could still mean that they are penalized, just not for duplicate content.

How old are the pages, how old is the domain?


Posted By: kasperseo ()
Posted On: 2005-Feb-03 11:23

This was the mail:

Thank you for your note. Please be assured that (page x) and (page y) are not currently banned or penalized by Google. We searched for these sites and found that they are currently included in our search results. To see the results of our search, please visit the following links: (link to searches on the exact web domain).

There's a problem with one of the listings: it's listed as "http://website.com" and it should be "http://www.website.com". I've asked them to change it.

I have no idea by the way how old the pages are...

Anyway, I'm very pleased with Google's service: quick and detailed responses. So I'm sure we'll get there in the end.


Posted By: Sinoed ()
Posted On: 2005-Feb-03 15:48

I wonder if it's the way your host is set up? Double-check that requests to 'http://website.com' resolve to the 'www.website.com' version too.
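
As a rough illustration of what "resolve to the www version" could mean on the server side, here is a small sketch that works out where a request for the bare domain should be sent. The function name canonicalRedirect is made up for this example, and 'www.website.com' is just the placeholder host used in this thread; the actual host setup is unknown.

```javascript
// Hypothetical helper: returns the URL a request should be redirected to,
// or null if the request is already on the preferred host (or on an
// unrelated host that this rule should not touch).
function canonicalRedirect(requestUrl, canonicalHost) {
  var url = new URL(requestUrl);
  if (url.hostname === canonicalHost) {
    return null; // already canonical, nothing to do
  }
  var bareHost = canonicalHost.replace(/^www\./, '');
  if (url.hostname !== bareHost) {
    return null; // some other host entirely
  }
  url.hostname = canonicalHost; // keep scheme, path and query as they were
  return url.toString();
}
```

A server would typically answer such a request with a permanent (301) redirect to the returned URL, so that both spellings of the domain end up listed as one.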