Scrapy project to download the contents of any given website
$30-250 USD
Paid on delivery
Hello
I need you to write a scrapy project which will download the contents of any given website. It should follow all internal links. It should not follow any external links.
So the user enters a website, e.g. [login to view URL], and your code will go and download that entire website. Something like this:
[login to view URL] '[login to view URL]'
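The "follow internal links, skip external links" rule could be sketched as below. This is only an illustration using the Python standard library, not the finished spider: the function name `is_internal`, the `example.com` host in the usage note, and the decision to treat `www.` and the bare domain as the same site are all assumptions. In a real Scrapy project the same effect is normally achieved by setting `allowed_domains` on the spider or by passing `allow_domains` to a `LinkExtractor`.

```python
from urllib.parse import urljoin, urlparse


def _bare_host(host: str) -> str:
    """Lower-case a host name and drop a leading 'www.' (an assumption)."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def is_internal(link: str, page_url: str, site_host: str) -> bool:
    """Return True if `link`, resolved against `page_url`, stays on `site_host`."""
    absolute = urljoin(page_url, link)  # handles relative links such as "/about/"
    parsed = urlparse(absolute)
    if parsed.scheme not in ("http", "https"):
        return False  # skip mailto:, javascript:, tel: and similar schemes
    return _bare_host(parsed.hostname or "") == _bare_host(site_host)
```

With a hypothetical start site `example.com`, `is_internal("/about/", "http://example.com/", "example.com")` is true, while a link to `http://other.org/` is rejected.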
The content should be saved in three ways:
1. The entire site saved in a folder, with separate files for each page. So let's say the website '[login to view URL]' only has two pages, /about/ (or something like [login to view URL]), and /services/ (or something like [login to view URL]). The HTML contents of these two files should be saved as [login to view URL] and [login to view URL] in a folder called 'testcrawl'. Please also save the CSS and JS files if possible.
2. Very similar to (1) above, except the HTML should be stripped so only the text from each page is saved. It should be in a folder called 'testcrawl-text'.
3. Save the text from all the website's pages in one file only. So if we use the [login to view URL] example, the text (stripped from the HTML) from /about/ and /services/ would be saved in a folder called 'testcrawl-combined' in a single file called textcrawl.txt. I do not need the CSS or JS files for this option.
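A minimal sketch of the three outputs listed above, assuming each page arrives as a URL plus its HTML string (in Scrapy that would be `response.url` and `response.text` inside the spider callback). The helper names `page_filename`, `html_to_text` and `save_page`, the `.txt` extension in 'testcrawl-text', the `index.html` name for the home page, and the use of the standard-library `html.parser` to strip tags are all assumptions for illustration. CSS and JS downloading is not covered here.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


def page_filename(url: str) -> str:
    """Map a page URL to a flat file name, e.g. /about/ -> about.html."""
    path = urlparse(url).path.strip("/")
    if not path:
        return "index.html"  # assumed name for the site root
    last_segment = path.rsplit("/", 1)[-1]
    name = path.replace("/", "_")
    return name if "." in last_segment else name + ".html"


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Strip tags and return the page text, one text fragment per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def save_page(url: str, html: str, root: Path = Path(".")) -> None:
    """Write one page to the three output folders described in the brief."""
    name = page_filename(url)
    text = html_to_text(html)
    targets = {
        root / "testcrawl" / name: html,                             # option 1
        root / "testcrawl-text" / (Path(name).stem + ".txt"): text,  # option 2
    }
    for path, content in targets.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
    # Option 3: append every page's text to a single combined file.
    combined = root / "testcrawl-combined" / "textcrawl.txt"
    combined.parent.mkdir(parents=True, exist_ok=True)
    with combined.open("a", encoding="utf-8") as fh:
        fh.write(text + "\n\n")
```

Because the combined file is opened in append mode, it would need to be deleted between crawls to avoid duplicated text.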
Any questions, just ask.
Note: automated responses telling me about your PHP and graphic design experience will be ignored. Also, please only bid on this project if you have used Scrapy and Python before. I have already had to cancel two previous iterations of this project because the winning bidders eventually admitted they had never used Scrapy before and didn't know how to install it...
Thank you.
I should add that this is a Python 3.x project.
Project ID: #11300174
About the project
18 freelancers have bid on this project, at $201 on average
Dear Sir, I'm very much delighted to let you know that I have done data scraping with PHP-cURL, PhantomJS, Node.js, and Selenium on many sites. I just scraped the data from a website and then wrote the data to a MySQL database...
Hi! I would prefer Python 2.7 if possible. Python 3 is quite "raw". But anyway please contact me and we will discuss your project.
Hello sir, I want to help you with my expertise in advanced data mining, web scraping, dark web research, and Excel. I am able to collect any information from any restricted website or any specific informatio...
Hi, I’m a London-based software engineer with more than 10 years of experience working on enterprise-level systems and consumer applications. I have a lot of experience with Scrapy or hetrix platform for web conten...
I have gone through your requirements. We have done a similar kind of job before. Looking forward to your earliest reply for a project discussion.
Hi, I'd be happy to help you with this project. I'm new to Freelancer but I'm very experienced with Python scripting. Unfortunately I don't know Scrapy very well, but I'm very confident I could do this all in pytho...
I am a developer with extensive experience in Python 2.x and 3.x projects from scripts to web applications. I have used Scrapy in the past to familiarize myself with an employer's websites (e.g. creating site maps and...
Hi, I work in an investment firm in London. My daily job is text mining, which includes using Python to scrape content from the internet and do the analysis. I have done something very similar before. The main concern...
I am very experienced in Python (I have worked with it for 4 years). With Scrapy I admit I am a beginner, but I managed to install it :). I also did some tutorials and they worked very well.