
joe_vimal
Joined: Mar 22, 2001
# Posts: 104

Posted: 09/30/2005 12:32 am

Has anyone heard of a scraper script that extracts job ads from many sites and dumps them into a database?

I can find tons of other scripts in all the usual places, but not one along these lines. Writing a script from scratch seems daunting. Any help would be much appreciated.



masidani
Joined: Oct 21, 2005
# Posts: 10

Posted: 10/24/2005 12:24 am

Joe,

I think it would have to be custom-written, I'm afraid. The reason is that the script needs to know how the HTML on each site is written in order to know where to find the job data.

If you visit each of the job sites yourself and look at the HTML source code, you'll see that each one is different. The "screen scraper" program will need to know where to look in each page to find things like job title, salary, location etc., which will be different in each case. Hence it will need to be custom-written.

That said, a Perl program using the LWP::UserAgent library and a few regular expressions will suffice, so long as there are no login/registration procedures etc. that need to be dealt with.
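As a rough illustration of that approach, here is a minimal sketch, not a finished scraper. The site name 'example-jobs', the HTML markup, and the helper names extract_jobs and fetch_page are all invented for the example; a real site would need its own regular expression, written after reading that site's HTML source. Storing the results in a database is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical per-site rules: one regular expression per job site.
# The site name and the markup it matches are assumptions for this sketch.
my %site_rules = (
    'example-jobs' => qr{
        <div\s+class="job">\s*
        <h2>(?<title>[^<]+)</h2>\s*
        <span\s+class="salary">(?<salary>[^<]*)</span>\s*
        <span\s+class="location">(?<location>[^<]*)</span>
    }xs,
);

# Apply one site's rule to a page and return a list of hashrefs,
# one per job ad found.
sub extract_jobs {
    my ($site, $html) = @_;
    my $rule = $site_rules{$site} or die "No rule for site '$site'\n";
    my @jobs;
    while ($html =~ /$rule/g) {
        push @jobs, {
            title    => $+{title},
            salary   => $+{salary},
            location => $+{location},
        };
    }
    return @jobs;
}

# Fetch a page with LWP::UserAgent. The module is loaded only when a
# real fetch is requested, so the offline sample below runs without it.
sub fetch_page {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(timeout => 10, agent => 'job-scraper-sketch/0.1');
    my $response = $ua->get($url);
    die "Fetch failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Invented sample page standing in for a real job site.
my $sample_html = <<'HTML';
<div class="job">
  <h2>Perl Developer</h2>
  <span class="salary">40,000</span>
  <span class="location">London</span>
</div>
<div class="job">
  <h2>Web Designer</h2>
  <span class="salary">30,000</span>
  <span class="location">Leeds</span>
</div>
HTML

# Use the URL given on the command line if there is one, else the sample.
my $html = @ARGV ? fetch_page($ARGV[0]) : $sample_html;
my @jobs = extract_jobs('example-jobs', $html);
printf "%s | %s | %s\n", @{$_}{qw(title salary location)} for @jobs;
```

Run without arguments, it parses the built-in sample page. Each additional site would get its own entry in %site_rules, which is exactly the custom-written part.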

Simon



joe_vimal
Joined: Mar 22, 2001
# Posts: 104

Posted: 10/26/2005 08:02 am

Thanks, Simon. I was afraid I would have to start from the beginning. There are other issues too. Will I be infringing some copyright law if the script scrapes a couple of lines from many sites?





bhartzer
Staff
Joined: Jun 08, 2000
# Posts: 7036

Posted: 10/26/2005 09:44 am

"Will I be infringing some copyright law"

Yes.



joe_vimal
Joined: Mar 22, 2001
# Posts: 104

Posted: 10/27/2005 12:22 am

Thanks, bhartzer. I knew something like this would happen. OK. I have read somewhere that if you quote a couple of lines from another site on your own site and give appropriate credit, you will not be hauled up for violation of copyright. Is this true?

I am sorry I am asking this in a Perl forum.



lizardz
Joined: Nov 12, 2004
# Posts: 1394

Posted: 10/27/2005 12:00 pm

Use of a few lines is fair use, I believe; that's not copyright infringement.

That's why you can quote from somebody's article, for example, but not duplicate the whole article.



excell
Staff
Joined: Mar 19, 2001
# Posts: 14504

Posted: 10/27/2005 12:04 pm

A scraper script is automation of the process of taking content... yes, I would be careful with what you create.



joe_vimal
Joined: Mar 22, 2001
# Posts: 104

Posted: 10/27/2005 10:43 pm

No way, excell. I perfectly understand and abhor the stealing of content from others. What I am interested in is this: we want to populate the database of a job site with enough job offers to make the site attractive to job seekers. Our client does not wish to infringe any laws, and neither do we.

Scraping a line of content from other sites is perfectly acceptable if you don't overdo it. For example, for SEO purposes, many people scrape the search results pages of search engines:

Results 1 - 100 of about 3,640,000 for 'keyword'

In the same way, we use snippets of information from weather sites too, usually with the express consent of the webmasters.

In our case, even a couple of lines might be frowned upon, as the snippet of information has some commercial value.

I am confused. We don't want to be associated with any route that could even remotely land us in trouble. Losing this client in such a case would be preferable. What is the consensus of the ladies and gentlemen here?



© 1995  ·  iWeb, Inc  ·  DBA JimWorld Productions