Search Engines / Spiders & PHP

Posted By: rol_one ()
Posted On: 2005-May-12 23:35

This may be an obvious question to some, but I recently had my site revamped by a professional developer and there are a number of pages that are now .php as opposed to the basic .htm.

I appreciate that this makes the site a lot easier to modify, but does this mean the pages located in the database are no longer listed and won't be picked up by spiders? Does a website with the same number of .htm pages rank higher, on quality and quantity of content, than one with a mix of .htm and .php?


Would really appreciate your comments and any first-hand experience on this.



Posted By: g1smd (Staff)
Posted On: 2005-May-13 00:31

The extension itself doesn't matter; you can have whatever you like.

However, changing the names will cause your rankings to drop, a lot.

You should set up the server so that DirectoryIndex now points to index.php as the default filename. You might want to redirect all calls for .htm filenames over to the equivalent .php files OR you could run your entire PHP site with the old .htm filenames but parsed for their PHP content OR you could rewrite the URLs. There is a setting for these latter two options right there in the server configuration files.
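To make the first option concrete, here is a minimal sketch of the mapping a redirect would have to perform, written in Python purely as an illustration. On a real site this would normally live in the server configuration, not in a script. The function name `redirect_target` and the example.com URLs are made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

def redirect_target(url):
    """Return the .php URL an old .htm URL should redirect to, or None."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith(".htm"):
        return None  # not an old-style page, so no redirect is needed
    # Swap only the extension; keep host, query string and fragment as they were.
    new_path = parts.path[:-len(".htm")] + ".php"
    return urlunsplit((parts.scheme, parts.netloc, new_path,
                       parts.query, parts.fragment))

if __name__ == "__main__":
    print(redirect_target("http://www.example.com/products/list.htm"))
```

The point of keeping everything except the extension unchanged is that each old URL maps to exactly one new URL, so existing links keep pointing at the equivalent page.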

If you ever link to an index file that is in a folder, then never include the actual filename of the index file in the link. Link to the folder name followed by a trailing / on the URL. Let the server decide what the index file is actually called, find it, and serve it. That way, you don't have to change anything at all when all your index files change from being index.htm to be index.php as the links will still carry on working.
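As a rough illustration of that linking rule, the sketch below rewrites a link that names an index file into a link to its folder. It is Python, used only to show the idea; the list of index filenames and the name `folder_link` are assumptions for the example, not anything your server defines.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed set of index filenames; adjust to whatever your server actually uses.
INDEX_NAMES = {"index.htm", "index.html", "index.php"}

def folder_link(url):
    """Drop a trailing index filename so the link points at the folder."""
    parts = urlsplit(url)
    folder, slash, filename = parts.path.rpartition("/")
    if filename.lower() not in INDEX_NAMES:
        return url  # not an index file, leave the link alone
    # A bare relative link such as "index.htm" becomes "./" (the current folder).
    new_path = folder + "/" if slash else "./"
    return urlunsplit(parts._replace(path=new_path))

if __name__ == "__main__":
    print(folder_link("http://www.example.com/widgets/index.htm"))
```

Links written this way survive a change from index.htm to index.php without being touched.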

Additionally, note that URLs with more than 3 variables or anything that looks like a session ID in them do not get indexed properly. Avoid that whatever you do.
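If you want to check a list of your own URLs against that rule of thumb, something like the following Python sketch would do it. The threshold of 3 comes from the paragraph above; the set of parameter names treated as session IDs is my own guess at common ones (PHPSESSID is PHP's default session name), and `crawl_warnings` is a made-up helper name.

```python
from urllib.parse import urlsplit, parse_qsl

MAX_VARIABLES = 3
# Assumed list of parameter names that look like session IDs; extend as needed.
SESSION_NAMES = {"phpsessid", "sid", "sessionid", "session_id", "jsessionid"}

def crawl_warnings(url):
    """Return a list of reasons a URL might not get indexed properly."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    warnings = []
    if len(pairs) > MAX_VARIABLES:
        warnings.append("more than 3 variables")
    if any(name.lower() in SESSION_NAMES for name, _ in pairs):
        warnings.append("looks like a session ID")
    return warnings

if __name__ == "__main__":
    print(crawl_warnings("http://www.example.com/item.php?id=7&PHPSESSID=abc123"))
```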


Posted By: lizardz ()
Posted On: 2005-May-13 02:10

"but I recently had my site revamped by a professional developer"

Professional? Developer? Hmmm. To me, a professional should know about these types of issues. I haven't made a .php page for years; all my pages are PHP parsed under .htm extensions. It's simply too easy to do that to even consider any other option, and since every page of every site I do is scripted, there's no reason not to always use .htm, or no extension at all, as some people prefer, although I prefer to have an extension if only for old time's sake.

This might only be me, but when I see a site that uses php, asp, aspx, jsp etc. in its URLs, it's not a good first sign. It assumes the same processing language will always be running your website, which may not be the case. Much safer to just stick with plain old htm, which is what the user is always receiving anyway; the php/asp/aspx/jsp part is really only relevant to the server, and isn't necessary at all.

It will be a good day when professional developers understand enough about SEO and good site URLs [good URLs don't change] to stop making these types of mistakes.

By the way, I just checked an old site I did for someone, and they had switched it to shtml, probably to include nav bars or something, thinking they were being clever and not realizing the site was always set up to run as PHP if required. All the backlinks they picked up over the last year are broken, and all pages are now treated as duplicate content by search engines. Sad. I'm starting to think the number of actually professional web developers is MUCH smaller than we believe, by a very large percentage.


Posted By: rol_one ()
Posted On: 2005-May-13 07:41

I guess part of this comes down to the following: if I open a page in Internet Explorer and use View > Source, is the content I see there what the spider sees? Right or wrong?





Posted By: lizardz ()
Posted On: 2005-May-13 10:47

Right. Or you can download Lynx for Windows and surf your site with that. It's a text-only browser that shows you almost exactly what the spider sees; the spider isn't that different a program from Lynx.
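If you'd rather not install anything, a very rough approximation of the text-only view can be had with a few lines of Python using only the standard library. This is just a sketch of the idea (strip the tags, skip scripts and styles, keep the text); it is not how Lynx or any real spider is implemented, and the names `TextOnly` and `visible_text` are made up for the example.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect the text of a page, ignoring tags, scripts and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def visible_text(html):
    """Return the page text as one string, pieces joined by spaces."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

if __name__ == "__main__":
    sample = "<html><body><h1>Blue widgets</h1><p>In stock.</p></body></html>"
    print(visible_text(sample))
```

Paste the output of View > Source into `visible_text` and whatever comes back is roughly the content a text-only reader has to work with.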


Posted By: g1smd (Staff)
Posted On: 2005-May-13 19:55

Server-Side Includes and Server-Side Scripting simply mean that all the work is done in the server. The server assembles the page of HTML before it is delivered to the browser or bot that requested it. The result is identical to the code that would appear on a static page, and will rank the same too.
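A toy illustration of that assembly step, in Python: the template contains an include directive in the usual SSI comment form, the "server" swaps in the fragment, and what goes out is ordinary HTML with no trace of how it was built. The function `assemble_page` and the file name nav.html are invented for the example; a real server does considerably more than a string replace.

```python
def assemble_page(template, includes):
    """Replace each SSI-style include directive with its fragment."""
    page = template
    for name, fragment in includes.items():
        directive = '<!--#include file="%s" -->' % name
        page = page.replace(directive, fragment)
    return page

if __name__ == "__main__":
    template = '<body><!--#include file="nav.html" --><p>Welcome</p></body>'
    nav = '<a href="/">Home</a>'
    # The browser or bot only ever receives the assembled result.
    print(assemble_page(template, {"nav.html": nav}))
```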


Posted By: rol_one ()
Posted On: 2005-May-14 00:57

THANK YOU FOR ALL YOUR HELP ON THIS

smile