Build a web scraper to retrieve specific website information on a regular basis
£10-80 GBP
Payment on delivery
Overview
For this project, several pieces of information must be gathered from a website and saved to a MySQL database. The type of information is always the same; only the parameters differ.
In most cases the website has APIs for retrieving the data, and these are the preferred way of getting it.
Please see attached PDF for the exact data required.
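Because the information type stays the same and only the parameters change, each request can be built from a dataset name plus a parameter map. The following is a minimal sketch of that idea, not the website's actual API: the class name `EndpointBuilder`, the base URL, the dataset name and the parameter names are all hypothetical placeholders.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

public class EndpointBuilder {
    // Hypothetical base URL; the real one has to be taken from the target website.
    private final String baseUrl;

    public EndpointBuilder(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /**
     * Builds a request URI for one dataset (e.g. "players") from a map of
     * query parameters. Keys and values are URL-encoded so that IDs or
     * season labels containing special characters stay valid.
     */
    public URI build(String dataset, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        String url = baseUrl + "/" + dataset + (params.isEmpty() ? "" : "?" + query);
        return URI.create(url);
    }
}
```

With this shape, the same `build` call could serve Leagues, Teams, Players and Matches by changing only the dataset name and the IDs passed in the map.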
Requirements
- Project needs to be created in Java.
- The application must use the Selenium framework with Chrome or Firefox, in either windowed or headless mode.
- All calls must be parameterised and should work for all Leagues, Teams, Players, Matches.
- Only the target website's APIs may be used; HTML must not be scraped. In some cases the data is embedded in the page as JSON. If no API is available, it is acceptable to retrieve the data from JavaScript variables or through JavaScript calls.
- The API endpoints appear to respond only after the main pages have been visited, so some protection exists, possibly based on cookies. The application must work around this.
- Data will be saved in a MySQL database with one table per dataset, e.g. Players, Teams, Leagues, Matches.
- IDs must be used in preference to strings. They must be the target website's own IDs, not locally created auto-increment IDs.
- For each ID stored in the database, an associated table must map it to its string value, e.g. Player ID – Player Name.
- In other words, the tables need to be normalised.
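As an illustration of the last few requirements, the sketch below generates MySQL statements for an ID-to-name lookup table whose primary key is the website's own ID rather than an auto-increment column. It is only one possible layout: the class name `SchemaSketch`, the table and column names, and the `BIGINT`/`VARCHAR(255)` types are assumptions, since the actual fields are defined in the attached PDF.

```java
public class SchemaSketch {

    /**
     * DDL for a lookup table that maps a website-assigned ID to its name.
     * The ID column is the primary key and is deliberately NOT auto-generated,
     * because the IDs come from the target website.
     */
    public static String lookupTable(String table, String idColumn, String nameColumn) {
        return "CREATE TABLE IF NOT EXISTS " + table + " (\n"
                + "  " + idColumn + " BIGINT NOT NULL,\n"
                + "  " + nameColumn + " VARCHAR(255) NOT NULL,\n"
                + "  PRIMARY KEY (" + idColumn + ")\n"
                + ")";
    }

    /**
     * Parameterised upsert for use with a JDBC PreparedStatement: inserts the
     * ID/name pair, or refreshes the name if the ID is already stored, so that
     * repeated runs do not create duplicates.
     */
    public static String upsert(String table, String idColumn, String nameColumn) {
        return "INSERT INTO " + table + " (" + idColumn + ", " + nameColumn + ") VALUES (?, ?) "
                + "ON DUPLICATE KEY UPDATE " + nameColumn + " = VALUES(" + nameColumn + ")";
    }
}
```

For example, `SchemaSketch.lookupTable("players", "player_id", "player_name")` yields the Player ID – Player Name table, and other tables would then store only `player_id`. The `VALUES()` form in the upsert is accepted by MySQL but is deprecated in recent 8.0 releases, so a row alias may be preferable there.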
Project ID: #18194755