So, say a person's looking for code to build a crawler.
Nothing fancy, just a basic crawler that you can:
1 - feed it a select list of links to start crawling from (think the main page of different sites)
2 - tell the spider to watch for select keywords (give it a list to look for)
3 - index only the pages where a keyword/phrase is mentioned
4 - specify whether the mention has to appear in the meta tags, the body, the domain, the URL, etc.
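A minimal sketch of items 1-4, written in Python with only the standard library (the post leans PHP, but the logic carries over). The names `PageParser`, `find_matches`, `fetch` and `crawl` are hypothetical. It assumes pages are UTF-8, only follows links that stay on the seed sites' own domains, and treats a keyword match as a case-insensitive substring. It skips robots.txt and request throttling, which a real deployment should add (Python's `urllib.robotparser` can handle the former).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects meta-tag content, visible body text and link targets from one page."""

    def __init__(self):
        super().__init__()
        self.meta, self.body, self.links = [], [], []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("content"):
            self.meta.append(attrs["content"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:  # ignore script/style source code
            self.body.append(data)


def find_matches(url, html, keywords, locations):
    """Return ({keyword: [locations where it appears]}, links) for one page."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    text = {
        "meta": " ".join(parser.meta),
        "body": " ".join(parser.body),
        "domain": urlparse(url).netloc,
        "url": url,
    }
    hits = {}
    for kw in keywords:
        where = [loc for loc in locations if kw.lower() in text[loc].lower()]
        if where:
            hits[kw] = where
    return hits, parser.links


def fetch(url):
    """Download a page as text; any failure is treated as an empty page."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return ""


def crawl(seeds, keywords, locations=("meta", "body", "domain", "url"),
          max_pages=100, fetcher=fetch):
    """Breadth-first crawl from the seed URLs, indexing only pages with a keyword hit."""
    allowed = {urlparse(s).netloc for s in seeds}  # assumption: stay on the seed sites
    queue, seen, index, fetched = deque(seeds), set(seeds), {}, 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        hits, links = find_matches(url, fetcher(url), keywords, locations)
        if hits:
            index[url] = hits  # pages without a mention are crawled but not indexed
        for href in links:
            link = urljoin(url, href).split("#")[0]  # absolute URL, fragment dropped
            parts = urlparse(link)
            if parts.scheme in ("http", "https") and parts.netloc in allowed \
                    and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


if __name__ == "__main__":
    # Offline demo: a fake fetcher serves canned pages instead of the network.
    pages = {
        "http://example.com/": '<head><meta name="description" content="Cheap widgets">'
                               '</head><body><a href="/about">About</a> Welcome</body>',
        "http://example.com/about": "<body>We sell gadgets</body>",
    }
    print(crawl(["http://example.com/"], ["widgets", "gadgets"],
                fetcher=lambda u: pages.get(u, "")))
```

Swapping `fetcher` for a fake, as the demo does, lets you check the matching rules without touching the network. The `locations` argument maps onto item 4: passing `("body",)`, for instance, indexes only pages whose visible text mentions a keyword, while links are still followed from every page.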
Assume the desire is for a crawler that will live on a server and can be given its own IP if absolutely needed. It should be MySQL-friendly and, thus, likely PHP-based, or at least Linux-friendly.
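For the MySQL side, a minimal sketch of a results table in Python. It uses the standard library's `sqlite3` as a stand-in so it runs anywhere; the table name `keyword_hits` and the functions `save_index` and `pages_mentioning` are hypothetical, and it assumes crawl results arrive as a dict mapping each URL to `{keyword: [locations]}`. Pointing it at MySQL should mostly mean passing a connection from a DB-API driver such as PyMySQL instead and changing the `?` placeholders to `%s`.

```python
import sqlite3


def save_index(conn, index):
    """Write {url: {keyword: [locations]}} as one row per (url, keyword, location).

    Sticks to plain DB-API calls (cursor, execute, executemany, commit) so the
    same code can be tried against a MySQL connection (with %s placeholders).
    """
    cur = conn.cursor()
    cur.execute(
        """CREATE TABLE IF NOT EXISTS keyword_hits (
               url      VARCHAR(2048) NOT NULL,
               keyword  VARCHAR(255)  NOT NULL,
               location VARCHAR(16)   NOT NULL,
               found_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    rows = [
        (url, keyword, location)
        for url, hits in index.items()
        for keyword, locations in hits.items()
        for location in locations
    ]
    cur.executemany(
        "INSERT INTO keyword_hits (url, keyword, location) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)  # number of rows written


def pages_mentioning(conn, keyword):
    """Return the distinct URLs where the keyword was found, alphabetically."""
    cur = conn.cursor()
    cur.execute(
        "SELECT DISTINCT url FROM keyword_hits WHERE keyword = ? ORDER BY url",
        (keyword,),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    save_index(conn, {"http://example.com/": {"widgets": ["meta", "body"]}})
    print(pages_mentioning(conn, "widgets"))
```

Storing one row per URL, keyword and location keeps the questions a non-programmer caretaker is likely to ask ("which pages mention X?", "where on the page?") down to a single SELECT.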
...and its caretaker is not a programmer, so this pet spider will need to come with instructions.