primed
Joined: Eons Ago
# Posts: 18
|
Posted: 2003-Feb-05 09:14
Hi all.
I have decided to try my hand at creating an Internet search engine and would like to know what the process entails. Does anyone know where I can find any information on this? Also, are there any scripts already available I can use as a model?
What are the requirements for an Internet search engine from a user viewpoint (webmasters and visitors). Exactly what is right and what is wrong with the current popular search engines?
[url & sig removed]
[ Message was edited by: unreviewed 02/05/2003 10:28 am ]
|
 |
netcservices.co.uk
Joined: Aug 06, 2002
# Posts: 1594
|
Posted: 2003-Feb-05 13:15
Jim
I would be interested in helping with this too. I know of one engine that is being developed at the moment by some of my colleagues, so I may be able to get some information from them (and maybe the search scripts).
[email & sig removed]
[ Message was edited by: unreviewed 02/05/2003 10:29 am ]
|
 |
unreviewed
Joined: Dec 07, 2000
# Posts: 6776
|
Posted: 2003-Feb-05 16:37
primed, you are free to discuss this here. It's a great subject. However, asking for our membership to go to your board to discuss it is kind of silly, don't you think?
[This thread has been moved from HELP! and New Member's to Running Your Own Search Engine or Directory]
|
 |
netcservices.co.uk
Joined: Aug 06, 2002
# Posts: 1594
|
Posted: 2003-Feb-05 16:41
Unreviewed, just wondering why you remove e-mail addresses from posts?
|
 |
primed
Joined: Eons Ago
# Posts: 18
|
Posted: 2003-Feb-05 17:09
Hi Unreviewed.
I was not aware I was breaching any rules. Sorry about that. For info, though: I do not currently have a board on the helpfiles.**.uk website, so I was inviting people to contact me by e-mail. However, I will keep posting here.
cheers
Jim
|
 |
unreviewed
Joined: Dec 07, 2000
# Posts: 6776
|
Posted: 2003-Feb-05 18:48
netcservices.co.uk, we like to keep conversations within the board. Your email was removed because it was out of context after I removed the invitation to our membership to discuss this subject elsewhere.
|
 |
netcservices.co.uk
Joined: Aug 06, 2002
# Posts: 1594
|
Posted: 2003-Feb-05 18:58
Oh okies, if an address is posted and it's relevant then that's okay, right?
I don't want to get in trouble cos I'm a nice guy
If I am out of line grab the nearest herring and whack me round the head with it plz :D
|
 |
unreviewed
Joined: Dec 07, 2000
# Posts: 6776
|
Posted: 2003-Feb-05 19:22
primed, are you planning on indexing the entire Internet or just a small specialized index?
|
 |
netcservices.co.uk
Joined: Aug 06, 2002
# Posts: 1594
|
Posted: 2003-Feb-05 19:15
The scripts and bot stuff I can get hold of index sites that are submitted externally and ones that we put directly into it. Dunno if it's of any use...
|
 |
primed
Joined: Eons Ago
# Posts: 18
|
Posted: 2003-Feb-05 19:27
Hi Unreviewed
I was thinking of a bit of both. On the one hand, an engine which specialises in specific topics based on subject categories. On the other hand, the engine should also give the user the option to search a wider framework (i.e. the entire internet).
One of the problems I find with ALL search engines is the wealth of results they bring back, of which only 5% are actually relevant. My vision is to create a flexible engine which is more accurate and gives the user a lot more control.
For the specialist subject search I was thinking of a MySQL database which links to an Oracle database for the www. This should be quicker for results and more accurate.
Here is what I have so far but it is only a rough outline of the perfect search engine:
What makes the perfect search engine
Things that should be included are:
• Spam reduction/elimination
• Keyword detection
• Public(user) interaction
• Trademark issues
• Dealing with dynamic content
• Category Structures of Unlimited Depth
• Search Engine Friendly Pages (Dynamic or Static)
• Spidering
• URL Validation and Verification
• Completely (100%) Template based
• Statistical Tracking & Reporting
• Customisable API
• Backfill search results from other engines
• Adult / Porn Filtering
• Administration Approval of Submissions
• Moderator System, fully permissionable
• Mass Emailing to List-ees and Subscribers
• Automation of repetitive tasks
• SQL or Oracle Database Powered
• Full Database Control - Oracle and SQL
• OLAP/OLTP Database Ready
• Clusterable
• Pay per Click Engine Ready
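To make one item on the list concrete, "URL Validation and Verification" could start with something as small as the sketch below. It is only an illustration: the function name `is_valid_submission`, the choice to accept only http/https, and the rule that a host must contain a dot are all assumptions, not part of any existing script. It checks the shape of a submitted URL and does not fetch the page.

```python
from urllib.parse import urlsplit

# Assumption: the engine only lists ordinary web pages.
ALLOWED_SCHEMES = {"http", "https"}


def is_valid_submission(url: str) -> bool:
    """Return True if the URL has an allowed scheme and a plausible host name."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs.
        return False
    host = parts.hostname or ""
    # Require a dot so bare words such as "localhost" are rejected.
    return parts.scheme in ALLOWED_SCHEMES and "." in host and " " not in host


if __name__ == "__main__":
    for candidate in ["http://www.aspseek.org/", "ftp://example.com/file", "not a url"]:
        print(candidate, is_valid_submission(candidate))
```

Verification (checking that the page actually responds) would be a separate step on top of this.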
Netcservices - Any scripts you can get which will move toward this will be great!!! I have several core scripts which do different things but the challenge will be in putting them all together as one application.
Cheers
Jim
|
 |
netcservices.co.uk
Joined: Aug 06, 2002
# Posts: 1594
|
Posted: 2003-Feb-05 19:35
One thing I would say from that is not to use a MySQL database, because I think it is a little too unstable.
Dunno exactly what I'd use as an alternative, but then again I'm not a programmer
I think your list of points is valid. Maybe it would be worth trying to get Google to "give" you an old version of their spider on the promise that if you can get these points working well you give it back to them.
Maybe not Google though. It would be better to try someone smaller, as I don't think Google are likely to even think about the idea.
I think the category page structure should be something that resembles Yahoo's, because it's easy to use and there are no problems navigating it...
Another possibility to reduce the level of "garbage" in the engine would be to have a secondary "spider" or similar crawl through the pages you already have listed, removing anything that the other spider missed. Either that or you could review every site's coding yourself
I'd like to help if I can because all this is of interest to me.
If you need anything you know where I am...
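A very rough sketch of the second-pass idea, assuming the listed pages are held as a simple mapping of URL to page text and that "garbage" means a page containing any word from a banned list. Both of those are assumptions made for illustration, as are the names `second_pass` and `banned_terms`; a real filter would need something smarter than word matching.

```python
def second_pass(index: dict[str, str], banned_terms: set[str]) -> tuple[dict[str, str], list[str]]:
    """Re-check pages that are already listed and drop those containing a banned term.

    `banned_terms` is expected in lower case. Returns the cleaned index and
    the list of URLs that were removed.
    """
    kept: dict[str, str] = {}
    removed: list[str] = []
    for url, text in index.items():
        words = set(text.lower().split())
        if words & banned_terms:
            removed.append(url)  # the first spider let this one through
        else:
            kept[url] = text
    return kept, removed


if __name__ == "__main__":
    listed = {
        "http://example.com/a": "Hand made widgets for sale",
        "http://example.com/b": "CASINO casino casino win now",
    }
    clean, dropped = second_pass(listed, {"casino"})
    print(sorted(clean), dropped)
```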
|
 |
primed
Joined: Eons Ago
# Posts: 18
|
Posted: 2003-Feb-05 20:02
Hi netcservices
Great ideas. I think maybe Oracle will be the way to go for the database since it is stable (but pricey) and I already have a copy.
Get in touch with me and we can start working up some ideas. And of course publish our chats on this board here to keep our good friend Unreviewed happy, involved and informed
|
 |
netcservices.co.uk
Joined: Aug 06, 2002
# Posts: 1594
|
Posted: 2003-Feb-05 20:37
Okies erm I think I have your addy stored somewhere ::scratches head and nuts his outlook box....::
|
 |
unreviewed
Joined: Dec 07, 2000
# Posts: 6776
|
Posted: 2003-Feb-05 21:02
"keep our good friend Unreviewed happy, involved and informed"
That's all I ask.
Are you planning on using a shared hosted server or dedicated?
|
 |
primed
Joined: Eons Ago
# Posts: 18
|
Posted: 2003-Feb-05 21:36
Hi Unreviewed
"'keep our good friend Unreviewed happy, involved and informed' That's all I ask. Are you planning on using a shared hosted server or dedicated?"
I think the answer to this has got to be a dedicated server. I have set up a Raq4 for offline testing but envisage hiring/buying a dedicated server once the engine is ready to go live. Speed, database, space, back-ups, bandwidth - they all play a major part to get the instrument finely tuned.
The other big question is whether automated robots or humans should index the submissions. Humans are slower but can be more selective; however, I think robots can play an important part by shortlisting sites which are not correctly set up, have no titles or keywords, etc.
I notice file uploads are not permitted on this forum. I would like us to exchange ideas via documents too so we can build a decent concept. Any thoughts on how best to do this without breaching the forum rules?
Any thoughts? Please feel free to input.
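As a starting point for the robot side, here is a minimal sketch of how a script might flag submissions that have no title or no keywords. It uses only Python's standard `html.parser`; the names `PageAudit` and `audit_page` are made up for this example, and it assumes the page HTML has already been downloaded.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collects the <title> text and the content of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.keywords = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "keywords":
                self.keywords = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def audit_page(html: str) -> list[str]:
    """Return a list of problems found; an empty list means the page passes."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    problems = []
    if not parser.title.strip():
        problems.append("missing title")
    if not parser.keywords.strip():
        problems.append("missing keywords")
    return problems


if __name__ == "__main__":
    print(audit_page("<html><head><title>Widgets</title></head></html>"))
```

Pages that come back with problems could then be passed to a human reviewer rather than rejected outright, which fits the "robots shortlist, humans select" split.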
|
 |
unreviewed
Joined: Dec 07, 2000
# Posts: 6776
|
Posted: 2003-Feb-05 21:51
Let's simplify the "rules" and get that out of the way.
We want people to contribute to the community. The community is about webmasters helping other webmasters.
Post what you will... including links that "give" to the community.
Take a look at http://www.aspseek.org/ and try it here: http://www.aspseek.com/
|
 |
netcservices.co.uk
Joined: Aug 06, 2002
# Posts: 1594
|
Posted: 2003-Feb-05 21:53
That sorts that little problem out
|
 |
netcservices.co.uk
Joined: Aug 06, 2002
# Posts: 1594
|
Posted: 2003-Feb-06 10:32
Well just so everyone knows what is going on.
We now have a search script, but it needs work doing on it. Basically it will find pages but it won't filter them.
Jim, you didn't know this yet but you do now
I'll mail you later and give you some more info from my discussion this morning with Mr Coder.
|
 |
primed
Joined: Eons Ago
# Posts: 18
|
Posted: 2003-Feb-06 10:41
Hi Gareth
Great news!!! I will look forward to it. I did some searching this morning trying to find out what people think about different search engines. Mostly people think they are confusing and don't return accurate results. They also hated the banner ads and pop-ups, so there needs to be a better way of generating finance from a search engine so that it is self-sustaining. Personally I think charging a listing fee would be the way to go. It may be tougher to build up awareness, but you have more control over content and results as well as the revenue side of things.
Thoughts anybody?
Cheers
Jim
|
 |
netcservices.co.uk
Joined: Aug 06, 2002
# Posts: 1594
|
Posted: 2003-Feb-06 11:11
Jim
The ideal engine will have no adverts on it at all.
In reality, if we make this search engine so that all revenue is via PPC, then we don't have to have pop-up ads to finance the free inclusions.
Revenue of course is important, but I'm a great fan of customer service and above all else I would like it to be geared towards listings rather than making a quick buck.
Gareth
|
 |