Web Site Scraping
We want to build a service that scrapes web sites in order to maintain an external database and to extract data from dynamic web pages. Each targeted web site has to be entered through a login page.
The service will be started by an external scheduler. The scheduler sends an XML document that contains all the information the service needs. The service shall execute the following steps:
a) receive XML
b) log in through the site's login page
c) maintain the external database
d) extract data
e) send XML
Once the service has finished, it shall report its outcome as XML.
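To make steps a) and e) more concrete, here is a minimal Java sketch of reading a job request and building the final report. The XML schema is given but not reproduced in this description, so the element and attribute names used here (job, id, siteUrl, report, status, message) and the class name ScrapeJob are purely hypothetical placeholders, not the real interface.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical job model; the real field names come from the given XML schema.
class ScrapeJob {
    final String jobId;
    final String siteUrl;

    ScrapeJob(String jobId, String siteUrl) {
        this.jobId = jobId;
        this.siteUrl = siteUrl;
    }

    // Step a) receive XML: parse the scheduler's request into a job object.
    static ScrapeJob parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so the request cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        String url = root.getElementsByTagName("siteUrl").item(0).getTextContent().trim();
        return new ScrapeJob(root.getAttribute("id"), url);
    }

    // Step e) send XML: build the report that goes back to the scheduler.
    static String buildReport(String jobId, boolean success, String message) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("report");
        root.setAttribute("jobId", jobId);
        doc.appendChild(root);

        Element status = doc.createElement("status");
        status.setTextContent(success ? "success" : "failure");
        root.appendChild(status);

        Element msg = doc.createElement("message");
        msg.setTextContent(message);
        root.appendChild(msg);

        // Serialize the DOM tree to a string without the XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Steps b) to d) would run between parse and buildReport. Any exception raised there could be caught and turned into a failure report, so the scheduler always receives an answer.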
Technical details: Communication takes place only via the XML interface. The XML schema is given. We expect an implementation using cURL or Java. It must be possible to run multiple instances on the same machine.
As a contractor you can use a test system for the XML interface. For the third-party web sites you will receive the login data for a user account and screenshot documentation of the manual maintenance procedure for every targeted web site. Please note that we cannot provide a test system for third-party web sites: every change is made on the live site and has to be reverted to the original data.
We want to scrape 250 web sites in succession over the coming months. This is an enquiry for the first package of 25 web sites. After that we need another 10 per month, eventually up to 25 per month.
At the moment we are asking for external development only and will do the ongoing maintenance ourselves. At a later stage we will outsource this work as well.
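As an illustration of the login step and of running several instances side by side, the following Java sketch gives every session its own cookie store, so parallel instances on one machine do not share login state. It assumes a simple form-based login. The class name SiteSession and the form field names username and password are assumptions made for the example, since every target site will define its own login form.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical per-job session: one cookie store and one HTTP client per instance.
class SiteSession {
    // Holds the session cookies the site sets after a successful login.
    final CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ALL);

    final HttpClient client = HttpClient.newBuilder()
            .cookieHandler(cookies)
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(20))
            .build();

    // URL-encodes the credentials as a form body (field names are assumed).
    static String formBody(String user, String password) {
        return "username=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
    }

    // Builds the login POST request; nothing is sent until client.send(...) is called.
    static HttpRequest loginRequest(String loginUrl, String user, String password) {
        return HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(user, password)))
                .build();
    }
}
```

A job would then call session.client.send(SiteSession.loginRequest(url, user, password), HttpResponse.BodyHandlers.ofString()) and reuse the same client for all later requests. Sites that render their login form with JavaScript would need a different approach, such as a headless browser.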