Distributed web page scraper (preferably on EC2)

Closed. Posted Aug 26, 2010. Paid on delivery.

The input to your script is a list of about 1M URLs. I want each of these URLs fetched and the result inserted into a database. You do NOT need to crawl recursively; you just need to retrieve the listed URLs.
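The fetch-and-store step on a single machine might look like the minimal sketch below. It assumes SQLite as the database (the listing does not name one) and a hypothetical `pages` table; `fetch` and `scrape_into_db` are illustrative names, not part of Scrapy, Bixo, or any other existing tool. Each URL is fetched exactly once, with no link following.

```python
import sqlite3
import urllib.request
from typing import Callable, Iterable


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download one URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def scrape_into_db(urls: Iterable[str], conn: sqlite3.Connection,
                   fetcher: Callable[[str], bytes] = fetch) -> int:
    """Fetch each URL once (no recursive crawling) and store it.

    Returns the number of URLs fetched successfully. The `pages` table
    layout is a hypothetical choice for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT PRIMARY KEY, body BLOB, error TEXT)"
    )
    stored = 0
    for url in urls:
        try:
            body, error = fetcher(url), None
        except (OSError, ValueError) as exc:
            # urllib network/HTTP errors are OSError subclasses; a malformed
            # URL raises ValueError. Record the failure and keep going.
            body, error = None, repr(exc)
        # INSERT OR REPLACE keeps re-runs idempotent: one row per URL.
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, body, error) VALUES (?, ?, ?)",
            (url, body, error),
        )
        if error is None:
            stored += 1
    conn.commit()
    return stored
```

Passing the fetcher as a parameter lets the storage logic be exercised without network access, and storing failures as rows (rather than aborting) matters at the 1M-URL scale, where some fraction of URLs will time out or return errors.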

I want a distributed scraper. Specifically, I want to pass a parameter N and have the script automatically provision N scrapers, for example N Amazon EC2 instances or instances on some other cloud service. The N instances must not duplicate each other's work.
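One simple way to keep N instances from doing the same work is static sharding: every instance reads the same URL list and keeps only the URLs that hash to its own index. The sketch below assumes each instance is started with a worker index in `[0, N)` and the same N; `worker_for`, `partition`, and `my_shard` are hypothetical helper names. It uses `hashlib` rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and different machines would disagree about ownership.

```python
import hashlib
from typing import Iterable, List


def worker_for(url: str, n_workers: int) -> int:
    """Map a URL to a worker index in [0, n_workers) using a stable hash."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_workers


def partition(urls: Iterable[str], n_workers: int) -> List[List[str]]:
    """Split URLs into n_workers disjoint shards; each URL lands in exactly one."""
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    shards: List[List[str]] = [[] for _ in range(n_workers)]
    for url in urls:
        shards[worker_for(url, n_workers)].append(url)
    return shards


def my_shard(urls: Iterable[str], worker_id: int, n_workers: int) -> List[str]:
    """The subset worker `worker_id` would fetch, computed independently on that worker."""
    return [u for u in urls if worker_for(u, n_workers) == worker_id]
```

Because the assignment is a pure function of the URL and N, the instances need no coordination, and a URL that appears twice in the input always goes to the same worker. The trade-off is that a slow or failed instance leaves its shard unfinished; a shared work queue (for example Amazon SQS) balances load dynamically at the cost of an extra service to provision.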

I don't mind if you write a wrapper script around Scrapy or another existing web scraper. Feel free to do so if you already know Scrapy or Bixo and want to use it.

The script should require very little configuration and, if possible, run with one click. That way, the next time I have a batch of 1M URLs, I can easily run your script again.

Amazon Web Services, Engineering, Java, Linux, Project Management, Python, Script Install, Shell Script, Software Architecture, Software Testing

Project ID: #3680209

About the project

13 proposals. Remote project. Active Dec 16, 2010

13 freelancers bid an average of $217 on this project

ddemidenko

See private message.

$255 USD in 14 days
(72 Reviews)
6.1
johnweavervw

See private message.

$170 USD in 14 days
(55 Reviews)
5.3
mlys

See private message.

$254.15 USD in 14 days
(31 Reviews)
5.4
happytron

See private message.

$212.5 USD in 14 days
(9 Reviews)
4.8
happydotnet

See private message.

$235.45 USD in 14 days
(17 Reviews)
4.3
app2technologies

See private message.

$255 USD in 14 days
(16 Reviews)
3.9
readyfacts

See private message.

$212.5 USD in 14 days
(32 Reviews)
4.2
kwovw

See private message.

$254.15 USD in 14 days
(2 Reviews)
3.9
quintonwebz

See private message.

$204 USD in 14 days
(6 Reviews)
3.6
napoleonmr

See private message.

$255 USD in 14 days
(2 Reviews)
2.8
richmondcd

See private message.

$127.5 USD in 14 days
(2 Reviews)
0.7
woolee

See private message.

$170 USD in 14 days
(0 Reviews)
0.0
bryano

See private message.

$212.5 USD in 14 days
(0 Reviews)
0.0