Scrape Data from 5 Different Websites (Need to Learn Scraping Data in PHP and Python)

Cancelled Posted Feb 20, 2016 Paid on delivery

Basically, my father has to go to 5 different websites to watch his online videos. I want to be able to grab any information from any part of a website and, more importantly, grab an array of all the items, then run small loops to extract the first, second, third, fourth and fifth pieces of data from each item in the bigger array. Those pieces go into arrays that, for the moment, are saved to a file on the computer as an HTML file for accessing later.

Now I want someone to teach me the fine art of web scraping so I can put together one webpage based on what I scrape from these websites.

This is a small job, but I need it both for this project and for an upcoming website that could be worth a bit if I get it scraped in time. It is better that I learn scraping in Python and PHP so I can apply it to my own websites. In theory I can use PHP to enter data into a MySQL database; that part is easy once you have the data. I can even learn to hack my own WordPress theme with that data, but I need to get the data before I can do any of that. If someone also knows WordPress plugin integration, that would help me with my projects.

For now, a tutorial on scraping this website in both Python and PHP would be appreciated. The main site is

With Python and Beautiful Soup I can get down to

<div class="section-programs">

<p class="episode" data-keywords="abc3">

<a style="color:#262626" href="/programs/yoohoo-and-friends/ZX6514A015S00" title="Series 1 Ep 41 Stoney Island">YooHoo And Friends</a>

<span style="color:#6f6f6f"> - 15 episodes</span>

with the following code:

from BeautifulSoup import BeautifulSoup

import requests

url ="[url removed, login to view]"

r = [url removed, login to view] (url)

soup = BeautifulSoup([url removed, login to view])

paragraph_number = len([url removed, login to view]('p', attrs={"class":"episode"}))  # paragraph number for looping

current_paragraph = [url removed, login to view]('p', attrs={"class":"episode"})[0]  # current paragraph
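For reference, here is a minimal sketch of how the fields in the markup above could be pulled out of each paragraph. It assumes the newer bs4 package (rather than the older BeautifulSoup import used above) with Python's built-in html.parser, and it works on a hard-coded copy of the snippet instead of a live request. The closing tags in SAMPLE_HTML, the function name parse_episodes and the dictionary keys are illustrative assumptions, not something taken from the real site.

```python
import re

from bs4 import BeautifulSoup  # assumption: bs4 is installed (pip install beautifulsoup4)

# Copy of the markup quoted in the post; the closing </p> and </div> are assumed.
SAMPLE_HTML = """
<div class="section-programs">
<p class="episode" data-keywords="abc3">
<a style="color:#262626" href="/programs/yoohoo-and-friends/ZX6514A015S00" title="Series 1 Ep 41 Stoney Island">YooHoo And Friends</a>
<span style="color:#6f6f6f"> - 15 episodes</span>
</p>
</div>
"""


def parse_episodes(html):
    """Return one dict per <p class="episode"> found in the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for p in soup.find_all("p", class_="episode"):
        link = p.find("a")
        if link is None:
            continue  # skip paragraphs without a program link
        span = p.find("span")
        # The span text looks like " - 15 episodes"; default to 1 when absent.
        match = re.search(r"(\d+)\s+episodes?", span.get_text()) if span else None
        rows.append({
            "channel": p.get("data-keywords", ""),
            "href": link.get("href", ""),
            "name": link.get_text(strip=True),
            "title": link.get("title", ""),
            "episode_count": int(match.group(1)) if match else 1,
        })
    return rows


rows = parse_episodes(SAMPLE_HTML)
for row in rows:
    print(row)
```

Looping over find_all once and reading every field inside the loop avoids having to track a paragraph count and an index separately.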

In PHP I haven't managed to pull anything except the main site [url removed, login to view]. This is one of 5 or so sites I need to scrape, and the basic information is similar to the code above.

From each paragraph I need the abc3 in data-keywords as the channel, and the href with a base URL added to it. I also need the link text, e.g. YooHoo And Friends, with the title data appended to it, giving YooHoo And Friends Series 1 Ep 41. I believe I can change Ep to Episode and the like afterwards, but I need to grab them first.

I also need to grab the span tag, specifically the "15 episodes" part. This number sets the loop count for that link, i.e. how many hidden episodes of that program there are. For every link grabbed from the first list, if the span shows more than 1 episode, the linked page is parsed and the resulting links go into a "got links" list. Links found on those pages are treated the same way: each href is combined with the base URL and compared to the links already collected, and added only if it isn't there yet.

Finally, the programs page has some images I need to grab, such as the channel images. Then a big list is made in the form <a href ="abc links" title="abc program titles">program name series episode etc</a>

All links are then written to a file, and the process moves on to the next site.
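The link-building and de-duplication steps described above could be sketched as follows, using only the Python standard library. BASE_URL is a placeholder because the real site address is not shown in this post, and the names build_links, save_links and the row keys are hypothetical. The sample rows are hard-coded so the sketch runs on its own.

```python
from html import escape
from urllib.parse import urljoin

BASE_URL = "https://example.com"  # placeholder: the real base URL is not given

# Hard-coded sample rows (the second is a deliberate duplicate of the first).
sample_rows = [
    {"channel": "abc3",
     "href": "/programs/yoohoo-and-friends/ZX6514A015S00",
     "name": "YooHoo And Friends",
     "title": "Series 1 Ep 41 Stoney Island"},
    {"channel": "abc3",
     "href": "/programs/yoohoo-and-friends/ZX6514A015S00",
     "name": "YooHoo And Friends",
     "title": "Series 1 Ep 41 Stoney Island"},
]


def build_links(rows, base_url=BASE_URL, seen=None):
    """Turn rows into <a> tags, skipping URLs that were already collected."""
    seen = set() if seen is None else seen
    lines = []
    for row in rows:
        url = urljoin(base_url, row["href"])  # add the base URL to the href
        if url in seen:
            continue  # already in the "got links" set
        seen.add(url)
        title = row["title"].replace(" Ep ", " Episode ")
        label = f'{row["name"]} {title}'.strip()
        lines.append(
            f'<a href="{escape(url, quote=True)}" '
            f'title="{escape(title, quote=True)}">{escape(label)}</a>'
        )
    return lines


def save_links(lines, path):
    """Write one link per line to an HTML file for later viewing."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("<br>\n".join(lines))


links = build_links(sample_rows)
print("\n".join(links))
```

Passing the same seen set to build_links for each site, or for each program page that lists further episodes, keeps one shared record of collected links. Calling save_links(links, "links.html") would then write the result to a file.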

This is basically what I need from someone so I can scrape pages. Getting the main page I can already do in both PHP and Python. In Python I can get down to an array of paragraphs with all the info I need; in PHP I can't even get the first element of the items I need.

So I would appreciate anyone who can teach me PHP and Python scraping.

David Beams

MySQL PHP Python Web Scraping

Project ID: #9715827

About the project

6 bids Remote project Active Feb 27, 2016

6 freelancers have bid $64 on this project

mantislin

Hi sir, I am a scraping expert and have done many similar projects; please check my feedback and you will see. Can you tell me more details? Then I will provide demo data for you. Thanks, Kimi

$250 AUD in 5 days
(309 reviews)
7.4
adeelpirzada

Hi there, I have done scraping on almost half of the World Wide Web, including e-commerce giants (Amazon, eBay, Craigslist), news feeds, social media websites and APIs. I develop my own tools based on client requirements with Muli

$25 AUD in 1 day
(18 reviews)
5.5
Sanky92

A proposal has not yet been provided

$30 AUD in 5 days
(0 reviews)
0.0