
   Extract data from 2 specific categories on the www.superpages.com website and place it into CSV-format files.

Bidding Time:
28/02/2006 21:00 - 03/03/2006 00:00
Budget:
$100-300
Status:
Closed


Job Type:
N/A
Description:




1. I don't need the program or the source code. I only need the data, so you can
get it any way you need to (legally). I have seen others on this site supply a
program and source code to do this for $200. I have also researched software
that does this for $350-$450 and can do it an unlimited number of times. I am
looking in the $100 range for assistance to get the data only. Maybe
(hopefully) you already have grabber software to do this and don't need to spend
a lot of time. For example: http://www.softplatz.com/kw/yellow-pages-grabber/
or http://www.egrabber.com
2. Here are the 2 specific URLs from which I need the completed contact lists:
a)
http://yellowpages.superpages.com/listings.jsp?CB=1&R=N&STYPE=S&C=&CID=00000480014&cbdt=Jewelers&catID=2947&L=&PS=15&RT=&RS=&RR=&OO=&search=Find+It
b)
http://yellowpages.superpages.com/listings.jsp?CB=1&R=N&STYPE=S&C=&CID=00000493919&cbdt=Gift+Shops&catID=15295&L=&PS=15&RT=&RS=&RR=&OO=&search=Find+It
3. Item 2a has 53474 records and item 2b has 89589 records. Please note that I
cannot accept data from any yellow pages service other than www.superpages.com.
4. Your spider or grabber program must parse the HTML and extract the business
name, city, state, zip code, telephone number, email address (if applicable),
and website (if applicable) into a CSV-formatted text file. Also, a field called
"category" needs to be added before "business name". For item 2a,
this field must contain "jewelers-retail" for all records. For item
2b, this field must contain "gift shops" for all records.
5. A general clean-up of the data must be done so the fields are as clean as
possible. Most important are the phone numbers, which absolutely must be in the
format 999-999-9999 and be totally clean (no extra characters such as semicolons,
no extra digits, etc.). Before submitting the files to me, I need the combined
file (2a and 2b) merged and purged of any records with duplicate phone numbers.
If duplicate telephone numbers are found, the records with the least information
must be the ones that are deleted. For example, if 2 records have the same
telephone number but one lists a fax and the other doesn't, delete the one
without the fax number. Addresses are less important than phone numbers and
email addresses. Even if there are more than 2 business names for the same
phone number, pick one randomly; just make sure one record is left with the
phone number. Finally, I need the data sorted by category first, then by
state.
6. The data is to be submitted to me in files with approximately 10000 records
each. I expect a total of 15 files (14 @ 10000 and 1 partial).
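
For illustration only, the clean-up, purge, sort, and split described in items 5 and 6 could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a required method: the field names, the helper functions, the choice to discard records whose phone number cannot be reduced to exactly 10 digits, and the rule of keeping the first record seen on a tie are all hypothetical and not part of the posting.

```python
import re

# Hypothetical field names; the posting only names the columns informally.
FIELDS = ["category", "business_name", "city", "state", "zip",
          "phone", "email", "website"]


def normalize_phone(raw):
    """Return the number as 999-999-9999, or None if it cannot be cleaned."""
    digits = re.sub(r"\D", "", raw or "")  # strip semicolons, dots, spaces, etc.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # assumption: drop a leading US country code
    if len(digits) != 10:
        return None  # assumption: reject numbers with missing or extra digits
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def completeness(record):
    """Count the non-empty fields; used to decide which duplicate to keep."""
    return sum(1 for f in FIELDS if (record.get(f) or "").strip())


def merge_and_purge(records):
    """Keep one record per phone number (the most complete), then sort."""
    best = {}
    for rec in records:
        phone = normalize_phone(rec.get("phone", ""))
        if phone is None:
            continue  # assumption: records without a usable phone are dropped
        cleaned = {f: (rec.get(f) or "").strip() for f in FIELDS}
        cleaned["phone"] = phone
        kept = best.get(phone)
        # On a tie the first record seen stays, which is an arbitrary pick.
        if kept is None or completeness(cleaned) > completeness(kept):
            best[phone] = cleaned
    # Sort by category first, then by state.
    return sorted(best.values(), key=lambda r: (r["category"], r["state"]))


def split_into_files(records, size=10000):
    """Split the sorted records into chunks of at most `size` records."""
    return [records[i:i + size] for i in range(0, len(records), size)]
```

With the record counts from item 3 (53474 + 89589 = 143063 before purging), `split_into_files` would produce 15 chunks; the count could be lower once duplicate phone numbers are removed.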
My Requirements:
1. You must be easy to contact: either by phone, or by answering any e-mail I
send to you within 12 hours.
2. Must speak and write English well.
3. You can keep any code or program. I only need the data in a format that can
be imported into Excel and other software that reads CSV files.
4. I would like this done and emailed to me (I am willing to try an FTP
download) no later than March 4th. I am on dial-up in a very rural area.
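
As a rough illustration of the delivery format (the "category" column placed before the business name, in a CSV file that Excel can import), a writer along these lines might be used. It is a sketch only: the column names and the `write_csv` helper are hypothetical, and the sample record is made-up data, not taken from www.superpages.com.

```python
import csv
import io

# Hypothetical column names; "category" comes first, as the posting requires.
COLUMNS = ["category", "business_name", "city", "state", "zip",
           "phone", "email", "website"]


def write_csv(records, out):
    """Write records to an open text file, leaving missing fields empty."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS,
                            extrasaction="ignore", restval="")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)


# Usage with an in-memory buffer; a real run would pass
# open("some_file.csv", "w", newline="") instead.
buf = io.StringIO()
write_csv([{"category": "gift shops",
            "business_name": "Example Gifts, Inc.",
            "state": "NY",
            "phone": "212-555-0147"}], buf)
print(buf.getvalue())
```

The `csv` module quotes fields that contain commas (such as the business name above), so the columns stay aligned when the file is opened in Excel.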

Operating System:
(None)
Database System:
(None)
© 2004-2008 ProjectsList.biz