Extract data from 2 specific categories on the www.superpages.com website and place it into CSV files.
Bidding Time: 28/02/2006 21:00 - 03/03/2006 00:00
Budget: $100-300
Status: Closed
|
|
|
Job Type: |
|
Description: |
1. I don't need the program or the source code. I only need the data, so you can obtain it any way you need to (legally). I have seen others on this site supply a program and source code to do this for $200. I have also researched software that does this for $350-$450 and can be run an unlimited number of times. I am looking in the $100 range for help getting the data only. Ideally, you already have grabber software that does this and won't need to spend a lot of time. For example: http://www.softplatz.com/kw/yellow-pages-grabber/ or http://www.egrabber.com

2. Here are the 2 specific URLs I need the complete contact lists from:
a) http://yellowpages.superpages.com/listings.jsp?CB=1&R=N&STYPE=S&C=&CID=00000480014&cbdt=Jewelers&catID=2947&L=&PS=15&RT=&RS=&RR=&OO=&search=Find+It
b) http://yellowpages.superpages.com/listings.jsp?CB=1&R=N&STYPE=S&C=&CID=00000493919&cbdt=Gift+Shops&catID=15295&L=&PS=15&RT=&RS=&RR=&OO=&search=Find+It

3. Item 2a has 53474 records and item 2b has 89589 records. Please note that I cannot accept data from any yellow pages service other than www.superpages.com.

4. Your spider or grabber program must parse the HTML and extract the business name, city, state, zip code, telephone number, email address (if applicable), and website (if applicable) into a CSV-formatted text file. A field called "category" must also be added before "business name". For item 2a, this field must contain "jewelers-retail" for all records. For item 2b, it must contain "gift shops" for all records.

5. A general cleanup of the data must be done so the fields are as clean as possible. Most important are the phone numbers, which absolutely must be in the format 999-999-9999 and be totally clean (no extra characters such as semicolons, no extra digits, etc.). Before submitting the files to me, merge 2a and 2b into one combined file and purge any records with duplicate phone numbers. If duplicate telephone numbers are found, the records with the least information must be the ones deleted. For example, if 2 records have the same telephone number but one lists a fax number and the other doesn't, delete the one without the fax number. Addresses are less important than phone numbers and email addresses. Even if there are more than 2 business names for the same phone number, pick one at random; just make sure one record is left with that phone number. Finally, I need the data sorted by category first, then by state.

6. The data is to be submitted to me in files of approximately 10000 records each. I would expect a total of 15 files (14 @ 10000 and 1 partial).

My Requirements:
1. You must be easy to contact, either by phone or by answering any e-mail I send you within 12 hours.
2. You must speak and write English well.
3. You can keep any code or program. I only need the data in a format that can be imported into Excel and other software that reads CSV files.
4. I would like this done and e-mailed to me (I am willing to try an FTP download) no later than March 4th. I am on dial-up in a very rural area.
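The cleanup steps in items 5 and 6 (phone formatting, merging and de-duplicating by phone number, sorting by category then state, and splitting into files of about 10000 records) can be sketched in Python as below. This is a minimal sketch, not part of the original posting: it assumes the listings have already been scraped into dictionaries, and the field names (`business_name`, `zip`, etc.), the helper functions, and the rule of counting non-empty fields to decide which duplicate has "the most information" are all hypothetical. The scraping step itself is not shown.

```python
import csv
import io
import re
from itertools import islice

# Hypothetical column order: "category" first, then the fields item 4 asks for.
FIELDS = ["category", "business_name", "city", "state", "zip",
          "phone", "email", "website"]


def normalize_phone(raw):
    """Return a phone number as 999-999-9999, or None if it can't be cleaned.

    Assumes 10-digit North American numbers, optionally prefixed by country code 1.
    """
    digits = re.sub(r"\D", "", raw or "")  # keep digits only (drops spaces, dots, dashes, semicolons, etc.)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def completeness(record):
    """Number of non-empty fields: a hypothetical measure of 'most information'."""
    return sum(1 for value in record.values() if value and str(value).strip())


def dedupe_by_phone(records):
    """Keep one record per cleaned phone number, preferring the most complete one.

    Ties keep the first record seen. Records with an unusable phone are dropped
    (an assumption; they could instead be set aside for manual review).
    """
    best = {}
    for rec in records:
        phone = normalize_phone(rec.get("phone"))
        if phone is None:
            continue
        rec = {**rec, "phone": phone}
        current = best.get(phone)
        if current is None or completeness(rec) > completeness(current):
            best[phone] = rec
    return list(best.values())


def sort_records(records):
    """Sort by category first, then by state."""
    return sorted(records, key=lambda r: (r.get("category") or "", r.get("state") or ""))


def chunks(records, size=10_000):
    """Yield consecutive batches of at most `size` records (one batch per file)."""
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch


def to_csv(records):
    """Render records as CSV text with a header row, in FIELDS order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)  # missing fields are written as empty strings
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical sample records, not real listings.
    sample = [
        {"category": "jewelers-retail", "business_name": "A", "state": "TX",
         "phone": "(555) 123-4567;"},
        {"category": "jewelers-retail", "business_name": "A Jewelers", "state": "TX",
         "phone": "555.123.4567", "email": "a@example.com"},
        {"category": "gift shops", "business_name": "B", "state": "AL",
         "phone": "1-555-987-6543"},
    ]
    cleaned = sort_records(dedupe_by_phone(sample))
    for i, batch in enumerate(chunks(cleaned), start=1):
        print(f"file {i}: {len(batch)} records")
        print(to_csv(batch))
```

Ties between duplicates keep whichever record was seen first, which matches the posting's "pick one randomly" only in the sense that the choice is arbitrary. Each batch from `chunks` would be written to its own CSV file for delivery.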
Operating System: (None)
Database System: (None)
© 2004-2008 ProjectsList.biz