| The Career Path of Freelance Programming Jobs |
Web Crawler |
Bidding Time: 16/04/2006 23:02 - 23/04/2006 23:02
Budget: N/A
Status: Closed
Job Type: |
|
Description: |
Project: a data procurement tool written in PHP that writes its data to a MySQL database.

I require a web-based, trainable robot similar to Zeus from cyber-robotics.com. The robot will be trained on keywords in each page's title, keywords and description tags, and text. Each keyword can be scored in one of 5 categories:

1. Ignore
2. Common
3. Slightly Unique
4. Unique
5. Very Unique

The robot will not follow any links from a site that does not have at least 3 Slightly Unique words and 2 Unique words. I need to be able to change these criteria whenever I wish in the control panel.

The crawler will extract all links from the sites it visits. The data will be written to the attached data structure. The main tables for data entry are jos_mt_links and jos_mt_cats; this is a Mosets Tree data structure. Any extra tables the crawler needs will be designed by you.

The control panel will include the following features:

1. A setting for how deep to crawl sites.
2. If a site is put into a category, all links from that site will be crawled.
3. If a site is deleted, all links from that site are dropped.
4. The ability to show the next 500 links to be visited and to delete any or all of them so they won't be visited.
5. A banned list of URLs that the robot will not visit.
6. An area to enter the next URL, or multiple URLs, to visit.
7. Pause and Continue buttons.
8. The ability to revisit sites and refresh their content.
9. A method to prevent duplicate URLs in the database.

I am open to any ideas on how to achieve the above at the most reasonable cost. If you already have something similar that can be modified, let me know.

Thanks,
Dan
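
The link-following rule in the description (at least 3 Slightly Unique and 2 Unique words, adjustable from the control panel) could be sketched roughly as below. This is only an illustration in Python, not the PHP deliverable the posting asks for. The names `FollowRule`, `count_by_category`, and `should_follow_links` are hypothetical, and two details are assumptions the posting does not settle: each distinct keyword is counted once per page, and a Very Unique word does not count toward the Unique threshold.

```python
import re
from collections import Counter
from dataclasses import dataclass

# The five score categories named in the posting (1 = Ignore ... 5 = Very Unique).
IGNORE, COMMON, SLIGHTLY_UNIQUE, UNIQUE, VERY_UNIQUE = range(1, 6)


@dataclass(frozen=True)
class FollowRule:
    """Thresholds that the control panel would let the operator change."""
    min_slightly_unique: int = 3
    min_unique: int = 2


def count_by_category(text, keyword_scores):
    """Count the distinct trained keywords present in text, grouped by category.

    keyword_scores maps a lowercase keyword to its category (hypothetical
    training data). Assumption: a keyword counts once, however often it repeats.
    """
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return Counter(keyword_scores[w] for w in words if w in keyword_scores)


def should_follow_links(text, keyword_scores, rule=FollowRule()):
    """Return True if the page meets both thresholds and its links may be followed."""
    counts = count_by_category(text, keyword_scores)
    # Counter returns 0 for categories that never appeared.
    return (counts[SLIGHTLY_UNIQUE] >= rule.min_slightly_unique
            and counts[UNIQUE] >= rule.min_unique)
```

Keeping the thresholds in a small `FollowRule` object, rather than hard-coding 3 and 2, is one way to let the values come from a settings table that the control panel edits. In practice `text` would be the concatenated title, keywords tag, description tag, and body text of the fetched page.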
Operating System: Linux
Database System: MySQL
© 2004-2008 ProjectsList.biz