English-Arabic scraping and identifying (Python script)
$100-500 USD
Paid on delivery
You will be given a TXT file listing 15 newspaper websites that publish articles in both English and Arabic.
Your Python script should do the following:
1) download those websites locally (HTML/text content only, nothing binary); each newspaper will have its own main folder.
2) using a dictionary, identify the parallel texts (that is, which Arabic text corresponds to which English text).
3) extract only the article text for both English and Arabic and place the texts in a separate folder (one folder per language pair).
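
To make step 2 more concrete, here is a minimal sketch of one possible way to match articles with a dictionary. It is only an illustration under stated assumptions, not a required design: the names TOY_DICT, tokenize, overlap_score and pair_articles, the tiny word list, and the 0.5 threshold are all hypothetical. A real script would load a full English-Arabic dictionary and would probably need Arabic normalisation (prefixes, diacritics) as well.

```python
import re

# Hypothetical toy dictionary: English word -> set of Arabic translations.
# A real run would load a full bilingual dictionary from a file instead.
TOY_DICT = {
    "president": {"الرئيس", "رئيس"},
    "economy": {"الاقتصاد", "اقتصاد"},
    "football": {"كرة"},
    "match": {"مباراة"},
}


def tokenize(text):
    """Return the set of lowercased word tokens (works for Latin and Arabic letters)."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(english_text, arabic_text, dictionary):
    """Fraction of dictionary-known English words with a translation in the Arabic text."""
    en_tokens = tokenize(english_text)
    ar_tokens = tokenize(arabic_text)
    known = [word for word in en_tokens if word in dictionary]
    if not known:
        return 0.0
    hits = sum(1 for word in known if dictionary[word] & ar_tokens)
    return hits / len(known)


def pair_articles(english_docs, arabic_docs, dictionary, threshold=0.5):
    """For each English article, pick the best-scoring Arabic article above the threshold.

    Both inputs map a file name to the extracted article text.
    Returns a list of (english_name, arabic_name, score) tuples.
    """
    pairs = []
    for en_name, en_text in english_docs.items():
        best_name, best_score = None, 0.0
        for ar_name, ar_text in arabic_docs.items():
            score = overlap_score(en_text, ar_text, dictionary)
            if score > best_score:
                best_name, best_score = ar_name, score
        if best_name is not None and best_score >= threshold:
            pairs.append((en_name, best_name, best_score))
    return pairs
```

Calling pair_articles with two dictionaries of file name to article text (one per language) returns the matched file names, which could then be written out into the per-language-pair folder from step 3. This greedy approach compares every English article against every Arabic one, so for large sites it may be worth narrowing the candidates first, for example by publication date.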
Project ID: #3196458