Hi,
I would like to express my keen interest in taking up this work.
I have scraped hundreds of websites, both well-structured and poorly structured, including open sites as well as ones that require automated logins, using whichever technique best suits the site in question.
In general, I handle scraping tasks as follows:
1. Download all pages to be scraped to a local directory using PHP with cURL. I sometimes just use HTTrack if the site is truly open.
2. If the pages are well-structured, use PHP to parse out the required content with regular expressions.
3. If the pages aren't well-structured, use a JS-based solution for DOM-based parsing, making use of the lenient DOM parsing implementation provided by browsers.
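To illustrate step 2, here is a minimal sketch of regex-based extraction from a well-structured page. It is written in Python rather than PHP purely for illustration, and the sample markup, the `ITEM_RE` pattern and the `extract_items` helper are all hypothetical, not taken from any real site.

```python
import re

# Hypothetical sample of a well-structured page saved locally in step 1.
PAGE = """
<div class="item"><span class="title">Blue Widget</span><span class="price">12.50</span></div>
<div class="item"><span class="title">Red Widget</span><span class="price">8.00</span></div>
"""

# One pattern per record; this only works because every record
# follows exactly the same markup.
ITEM_RE = re.compile(
    r'<span class="title">(?P<title>.*?)</span>'
    r'<span class="price">(?P<price>[\d.]+)</span>',
    re.S,
)


def extract_items(html):
    """Return a list of (title, price) tuples found in the page."""
    return [
        (m.group("title"), float(m.group("price")))
        for m in ITEM_RE.finditer(html)
    ]


items = extract_items(PAGE)
```

A pattern like this is quick to write but brittle: any change in attribute order or whitespace between the two spans would make it miss records, which is why I reserve it for consistently structured pages.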
In this case, since the target data includes a field that is itself HTML, I suggest the latter approach (DOM-based parsing), as it is likely to give much better results and is much easier to implement.
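As a rough sketch of why a tolerant parser helps with an HTML-valued field, the example below pulls the inner HTML of one element out of sloppy markup. It uses Python's standard-library `html.parser` as a stand-in for the browser DOM approach described above, so it is not the actual browser-based tooling. The `FieldExtractor` class, the `extract_field` helper, the `desc` class name and the sample page are assumptions made up for the example.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the inner HTML of the first <div> carrying a given class."""

    def __init__(self, css_class):
        # Keep entities such as &amp; unconverted so the field stays valid HTML.
        super().__init__(convert_charrefs=False)
        self.css_class = css_class
        self.depth = 0      # <div> nesting level inside the target, 0 = outside
        self.done = False   # True once the target <div> has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "div" and self.css_class in classes:
                self.depth = 1
            return
        # Only <div> tags are counted, so unclosed <p> or <br> tags
        # inside the field cannot upset the bookkeeping.
        if tag == "div":
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth or self.done:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth and not self.done:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth and not self.done:
            self.parts.append(f"&#{name};")


def extract_field(html, css_class):
    """Return the inner HTML of the first matching <div>, or '' if none."""
    parser = FieldExtractor(css_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Hypothetical page with unclosed <p> and <br> tags inside the target field.
PAGE = (
    '<html><body><div class="desc"><p>First para<br>with <b>bold</b> text'
    '<p>Second &amp; last<div>nested</div></div>'
    '<div class="other">x</div></body></html>'
)

field_html = extract_field(PAGE, "desc")
```

A single regular expression would struggle here because the field contains a nested `<div>` and unclosed tags, whereas a parser-driven approach only needs to track where the target element starts and ends.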
I would love to be given an opportunity to show my expertise in scraping. I can hand over the data in 4 days.
Looking forward to working with you.
Thanks,
Veerer