Data crawler to log in and spider inventory data from a distributor website to a CSV file

Bidding Time:
09/10/2005 21:25 - 19/10/2005 00:00
Budget:
$30-100
Status:
Closed

Job Type:
PHP, C/C++, .NET, ASP
Description:



We need to create an automated crawler that will log in to a distributor
warehouse website and download inventory data from tables to a delimited file.

The website we will be crawling is the login/search catalog section of
www.electrograph.com.

I have saved copies of their site locally to demonstrate what needs to be done.
After the project closes we will provide the actual login details for the live
site so the job can be completed.

Walkthrough of what needs to be done:

Login Home Page
http://66.70.17.115/electrograph/index.htm
Go to the main website and log in using the form in the upper left-hand corner
of the page. The user name and password should be configurable.
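
A minimal sketch of preparing the login request, assuming the login box is an ordinary HTML form whose hidden fields must be posted back together with the credentials. The field names `txtUser` and `txtPassword`, the `token` field, and the helper names are hypothetical placeholders; the real names have to be read from the live page.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collects the name/value pair of every <input> element on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            # Inputs without a value attribute are posted as empty strings.
            self.fields[name] = attr.get("value") or ""


def build_login_payload(page_html, username, password,
                        user_field="txtUser", pass_field="txtPassword"):
    """Return the form data to POST: existing inputs plus the credentials.

    user_field and pass_field are assumed names, not taken from the real site.
    """
    parser = FormFieldParser()
    parser.feed(page_html)
    payload = dict(parser.fields)
    payload[user_field] = username
    payload[pass_field] = password
    return payload


# Hypothetical stand-in for the login form markup.
sample = """
<form method="post" action="index.htm">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="txtUser">
  <input type="password" name="txtPassword">
</form>
"""
payload = build_login_payload(sample, "demo-user", "demo-pass")
print(payload)
```

The resulting dictionary would then be posted with any HTTP client that keeps session cookies, so that later category requests are made as the logged-in account.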

Successfully Logged In
http://66.70.17.115/electrograph/step1.htm
After the login has been processed successfully, the page is refreshed and now
includes a "My Account" section in the upper left-hand corner.
Additionally, the "keyword/ Item# search" form is now enabled for our
specific account. When submitted, it displays the pricing and inventory
quantities available for our account.
Currently their website lets you browse the inventory by category and then
paginate through the results (it cannot show all products on one page). We need
to follow each category in the select menu "ddlCategory" individually,
download all the data on the page in the specified format, and continue to the
next page of results if another page exists.
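
As a rough sketch of how the category list could be read from the "ddlCategory" select menu, the following uses Python's standard `html.parser`. The sample markup and the option values `ACC` and `PLA` are made up for illustration; matching on either the `name` or the `id` attribute is also an assumption about how the live page is written.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collects (value, label) pairs from one named <select> element."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.in_select = False
        self.pending = None   # (value, [text fragments]) of the open option
        self.options = []

    def _close_option(self):
        if self.pending is not None:
            value, parts = self.pending
            label = "".join(parts).strip()
            # Fall back to the label when the option has no value attribute.
            self.options.append((label if value is None else value, label))
            self.pending = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "select":
            self.in_select = self.select_name in (attr.get("name"), attr.get("id"))
        elif tag == "option" and self.in_select:
            self._close_option()  # tolerate a missing </option>
            self.pending = (attr.get("value"), [])

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._close_option()
        if tag == "select":
            self.in_select = False

    def handle_data(self, data):
        if self.pending is not None:
            self.pending[1].append(data)


def category_options(page_html, select_name="ddlCategory"):
    """Return the selectable categories, skipping placeholder entries."""
    parser = SelectOptionParser(select_name)
    parser.feed(page_html)
    return [(value, label) for value, label in parser.options if value]


# Hypothetical stand-in for the search form markup.
sample = """
<select name="ddlCategory">
  <option value="">-- Select --</option>
  <option value="ACC">Accessories</option>
  <option value="PLA">Plasma Displays
</select>
"""
categories = category_options(sample)
print(categories)
```

Each returned value would be submitted in turn through the search form, and every result page for that category processed before moving to the next one.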

Crawling first result page of the first category searched
"Accessories"
http://66.70.17.115/electrograph/step2.htm
This page displays the information that we want to store in a delimited file.
We need to trim and store the Model #, Manufacturer, Description, Availability,
and Reseller Price columns. Each table row becomes a new line in the delimited
file.
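
One possible way to pull the five columns out of a result page, sketched with the standard `html.parser`. It assumes the results sit in a plain, non-nested table with one header row and the columns in the order listed above; the sample markup and the key names in `COLUMNS` are illustrative only.

```python
from html.parser import HTMLParser

# Assumed column order; the keys are names chosen for this sketch.
COLUMNS = ["model", "manufacturer", "description", "availability", "price"]


class TableRowParser(HTMLParser):
    """Collects the trimmed text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.row = None    # cells of the row being read, or None
        self.cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row, self.cell = [], None
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            # Join fragments and collapse whitespace (the "trim" step).
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row, self.cell = None, None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def extract_products(page_html, skip_header=True):
    """Return one dict per product row that has exactly five cells."""
    parser = TableRowParser()
    parser.feed(page_html)
    rows = parser.rows[1:] if skip_header else parser.rows
    return [dict(zip(COLUMNS, row)) for row in rows if len(row) == len(COLUMNS)]


# Hypothetical stand-in for the results table markup.
sample = """
<table>
  <tr><th>Model #</th><th>Manufacturer</th><th>Description</th>
      <th>Availability</th><th>Reseller Price</th></tr>
  <tr><td> ACE615 </td><td>ADCOM</td><td>ACE-615 ILS
      SURGE (120V)</td><td> 12 </td><td>315.00</td></tr>
</table>
"""
products = extract_products(sample)
print(products)
```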

Take note of the Availability column: it provides the total quantity in stock
followed by an "I" icon. When you hover over this "I" icon, it displays the
breakdown of which warehouse locations the product is stored in. For example:
18 (the "I" says: 14 - NY, 4 - NV, meaning 14 units in stock in New York and
4 units in stock in Nevada). We need to store both the total quantity available
and the individual location quantities, with one column for each warehouse
location.
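
A small sketch of splitting the hover text into per-warehouse quantities. It assumes the text follows the "quantity - two-letter location" pattern from the example above; how the live page actually stores the hover text is not known here, and `parse_breakdown` is a name chosen for illustration.

```python
import re

# One "<qty> - <location code>" pair, e.g. "14 - NY".
_PAIR = re.compile(r"(\d+)\s*-\s*([A-Za-z]{2})")


def parse_breakdown(text):
    """Map each warehouse code found in the hover text to its quantity."""
    return {loc.upper(): int(qty) for qty, loc in _PAIR.findall(text)}


breakdown = parse_breakdown("14 - NY, 4 - NV")
print(breakdown)                # {'NY': 14, 'NV': 4}
print(sum(breakdown.values()))  # 18, matching the total in the example
```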

Crawling second/additional result page(s) of the first category searched
"Accessories" (page 2+)
http://66.70.17.115/electrograph/step2b.htm
Perform the same process as Step2, downloading and storing all the inventory
data, and continue to the next page if it exists.
(Note: on the saved version of this page that I provided, the JavaScript that
shows the individual warehouse breakdown is not working; it will of course be
working on the live site.)
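
The page-by-page loop for one category could look roughly like the sketch below. The callables `fetch_next` and `parse_rows` are hypothetical hooks standing in for the real HTTP and parsing code, and `max_pages` is an assumed safety limit so a broken "next" link cannot loop forever.

```python
def crawl_category(first_page, fetch_next, parse_rows, max_pages=500):
    """Collect rows from the first result page and every following page.

    fetch_next(page) must return the next page, or None when there is none.
    parse_rows(page) must return the list of rows found on that page.
    """
    rows = []
    page = first_page
    visited = 0
    while page is not None and visited < max_pages:
        rows.extend(parse_rows(page))
        visited += 1
        page = fetch_next(page)
    return rows


# Fake two-page category used only to exercise the loop.
pages = {
    1: {"rows": ["ACE615", "FSD-4100"], "next": 2},
    2: {"rows": ["CMA-0608"], "next": None},
}


def fake_fetch_next(page):
    nxt = page["next"]
    return pages[nxt] if nxt is not None else None


result = crawl_category(pages[1], fake_fetch_next, lambda page: page["rows"])
print(result)  # ['ACE615', 'FSD-4100', 'CMA-0608']
```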

Crawling first result page of additional LARGE category searched "Plasma
Displays"
http://66.70.17.115/electrograph/step3.htm (interim refine page)
http://66.70.17.115/electrograph/step4.htm (actual results page)
For some categories that contain a substantial number of products, clicking
"SEARCH" does not display results right away. Instead it brings you to another
"search plasma displays" form where you can refine your results and search by
attributes. We do not need this; we simply want to select the "GO" button,
which will display all the products under that category in the same manner as
step2.

Crawling second/additional result page(s) of additional LARGE category searched
"Plasma Displays"
http://66.70.17.115/electrograph/step5.htm
Perform the same process as Step2, downloading and storing all the inventory
data, and continue to the next page if it exists.

The end result must be a comma-delimited file.
Example output from parsing the sample page
http://66.70.17.115/electrograph/step2.htm

Model Number, Manufacturer, Description, Reseller Price, Total Available Qty, Location NY Qty, Location NV Qty, Location XX Qty
ACE615, ADCOM, ACE-615 ILS SURGE (120V), 315.00, 12, 12, 0, 0
TRAVEL CS/42"PANASON, CALZONE CASE CO, TRAVEL CASE 42" PANASONIC, 345.33, 0, 0, 0, 0
FSD-4100, CHIEF MANUFACTURING, FSD-4100, 97.39, 0, 0, 0, 0
CMA-0608, CHIEF MANUFACTURING, 6'-8' ADJUSTABLE PLATE, 93.39, 0, 0, 0, 0
RC-1PXL, ELECTROGRAPH SYSTEMS, 24-BUTTON SWITCH PANEL FOR VS-1XL, 104.76, 0, 0, 0, 0
RC-1XL, ELECTROGRAPH SYSTEMS, NEW MODEL NUMBER (WAS VS-1XL) REMO, 104.76, 0, 0, 0, 0
FRAME-O, ELECTROGRAPH SYSTEMS, SINGLE GANG FRAME TO HOLD UP TO 3 W, 245.35, 0, 0, 0, 0
FRAME-W, ELECTROGRAPH SYSTEMS, SINGLE GANG FRAME TO HOLD UP TO 3 W, 14.89, 5, 5, 0, 0

Notice that on the website some products show a quantity and some say
"call for availability". We need to be able to map whatever text is in that
field to a text/numerical equivalent. For example, in this implementation we
define "Call for availability" as 0.
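
The text-to-number mapping might be sketched as a simple lookup table. The names `AVAILABILITY_TEXT` and `availability_to_number` are illustrative, and returning `default` for text that is not in the table is an assumption; the posting only defines the "call for availability" case.

```python
# Lookup table for non-numeric availability text (keys are lower case).
AVAILABILITY_TEXT = {"call for availability": 0}


def availability_to_number(text, text_map=AVAILABILITY_TEXT, default=0):
    """Convert the Availability cell to a number.

    Numeric cells are returned as integers; other text is looked up in
    text_map, and unknown text falls back to default (an assumed policy).
    """
    cleaned = " ".join(text.split()).lower()
    digits = cleaned.replace(",", "")
    if digits.isdigit():
        return int(digits)
    return text_map.get(cleaned, default)


print(availability_to_number("18"))                       # 18
print(availability_to_number(" Call for Availability "))  # 0
```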

Also, because they are always adding and changing warehouse locations, we need
to leave room at the end of each line of the delimited file for new locations.
When text is found in the quantity available field, we look up its equivalent
and apply that value to all the location columns as well. For example:
"call for availability" will result in 0, 0, 0, 0 (Total Quantity Available,
Location 1 Qty, Location 2 Qty, Location 3 Qty). We should make room for up to
10 warehouse locations (0, 0, 0, 0, 0, 0, 0, 0, 0, 0). When a quantity is not
defined for a warehouse that has a column, we will replace it with zero.

In this example, "call for availability" means the product is not in stock, so
we mark the total and all warehouse location columns as 0.
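
A sketch of building the fixed-width quantity columns: the total first, then one column per known warehouse, padded with zeros up to 10 locations. The function name, the `locations` list, and its ordering are assumptions made for illustration.

```python
def quantity_columns(total, breakdown, locations, max_locations=10):
    """Return [total, qty for location 1, ..., qty for location max_locations].

    breakdown maps a warehouse code to its quantity; warehouses missing
    from it, and unused trailing columns, are written as 0.
    """
    if len(locations) > max_locations:
        raise ValueError("more warehouse locations than reserved columns")
    columns = [total]
    for index in range(max_locations):
        if index < len(locations):
            columns.append(breakdown.get(locations[index], 0))
        else:
            columns.append(0)
    return columns


known_locations = ["NY", "NV"]  # assumed column order

# A product with 18 units: 14 in NY and 4 in NV.
print(quantity_columns(18, {"NY": 14, "NV": 4}, known_locations))

# "Call for availability" mapped to 0: total and every location are 0.
print(quantity_columns(0, {}, known_locations))
```

Keeping the column order in one list means a new warehouse only has to be appended to `known_locations`; existing columns stay in place.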

I also need to be able to control the delimiter used in the output file (I have
used a comma in this illustration for ease).
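
Writing the output with a configurable delimiter could be sketched with Python's standard `csv` module. The `write_rows` helper and the shortened header are illustrative only.

```python
import csv
import io


def write_rows(out, header, rows, delimiter=","):
    """Write the header and data rows to a text stream using delimiter."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


buffer = io.StringIO()
write_rows(
    buffer,
    ["Model Number", "Reseller Price", "Total Available Qty"],
    [["ACE615", "315.00", 12]],
    delimiter="|",
)
print(buffer.getvalue())
```

With its default settings the `csv` writer quotes any field that contains the delimiter or a double quote, which matters for descriptions such as TRAVEL CASE 42" PANASONIC when the delimiter is a comma.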

I also need to be able to control the delay between page requests, in
milliseconds.
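
One way to enforce the delay is a small throttle object that is called before every page request. This is a sketch; the `Throttle` class is hypothetical, and the injectable `clock` and `sleep` arguments exist only so the behaviour can be checked without really waiting.

```python
import time


class Throttle:
    """Ensures at least delay_ms milliseconds pass between calls to wait()."""

    def __init__(self, delay_ms, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_ms / 1000.0
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


throttle = Throttle(delay_ms=50)
throttle.wait()  # first request: returns immediately
throttle.wait()  # second request: pauses for roughly 50 ms
```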

A database should not be necessary; a simple config file is fine.
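
As an illustration of what such a config file might contain, the sketch below reads an INI-style file with Python's standard `configparser`. All section and key names are assumptions, not requirements from this posting.

```python
import configparser

# Hypothetical config file contents.
SAMPLE_CONFIG = """
[crawler]
username = demo-user
password = demo-pass
delimiter = ,
delay_ms = 1500
max_locations = 10

[availability_text]
call for availability = 0
"""


def load_settings(text):
    """Parse the config text into a plain dictionary of settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    crawler = parser["crawler"]
    return {
        "username": crawler.get("username"),
        "password": crawler.get("password"),
        "delimiter": crawler.get("delimiter", fallback=","),
        "delay_ms": crawler.getint("delay_ms", fallback=0),
        "max_locations": crawler.getint("max_locations", fallback=10),
        # Availability text (lower-cased by configparser) to its number.
        "availability_text": {
            key: int(value)
            for key, value in parser["availability_text"].items()
        },
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings["delimiter"], settings["delay_ms"])
```

Because `configparser` strips surrounding whitespace from values, a tab or space delimiter would need a different representation in the file, for example a keyword that the crawler translates.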

We need this project completed ASAP. We have several data crawlers that need
to be created, so the winner of this project can expect future work developing
similar crawlers.

Operating System:
(None)
Database System:
(None)
© 2004-2009 ProjectsList.biz