
Creating a large dataset by crawling some public website

₹1500-12500 INR

Closed
Posted about 6 years ago

Paid on delivery
I need a large dataset in JSON format to upload into a MongoDB database. The contents can be anything, but should be meaningful. I need between 500 MB and 5 TB of data to be generated. The data will be used for some training demonstrations.

I want someone to write a program that crawls some website for publicly available data (such as books and reviews from an e-commerce site; news articles from news sites; hotels and reviews from a travel site; restaurants and reviews from a food aggregator site; articles from Wikipedia, etc.). I don't need you to send me the data. I need you to write a program I can run at my end to download the data.

The program must store the data in a JSON format that can be directly imported into MongoDB. The structure could be flat JSON documents, or documents that contain embedded documents. Individual documents may range from 100 bytes to 100 KB. No individual document should be bigger than 100 KB. The program should let me choose an approximate data size (such as 500 MB), after which it stops crawling and downloading.

We'll have to discuss together to decide the site from which the data is to be downloaded. There should be no violation of any data access policies of the site. This is very important for me; I don't want us to break any law. I will need an assurance from you on this, and a link to the data access policies of the site, if available.

Once we agree on the site to download the data from, you will write the program, test it at your end, send me some sample data, and, once approved, send me the program to run at my end. If I run into any difficulties while running the program, I would require you to support me.
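To make the output requirements concrete, here is a minimal Python sketch of the storage side only. It assumes the crawler yields one dictionary per document and writes them as newline-delimited JSON, which is the format mongoimport reads by default. The names write_capped_jsonl and MAX_DOC_BYTES are hypothetical, and the crawling itself (site choice, policy checks) is deliberately left out.

```python
import json
from typing import Iterable, TextIO

# Per-document limit taken from the brief (100 KB); the name is illustrative.
MAX_DOC_BYTES = 100 * 1024


def write_capped_jsonl(docs: Iterable[dict], out: TextIO, target_bytes: int) -> int:
    """Write one JSON document per line until about target_bytes are written.

    Documents larger than MAX_DOC_BYTES are skipped. Returns the number of
    UTF-8 bytes written. Because docs is consumed lazily, a generator-based
    crawler stops fetching as soon as the size budget is reached.
    """
    written = 0
    for doc in docs:
        line = json.dumps(doc, ensure_ascii=False) + "\n"
        size = len(line.encode("utf-8"))
        if size > MAX_DOC_BYTES:
            continue  # oversized document: skip it rather than truncate
        if written + size > target_bytes:
            break  # budget reached: stop pulling documents from the crawler
        out.write(line)
        written += size
    return written
```

A possible way to use it, assuming a hypothetical generator crawl() that yields dictionaries: open the output with open("dataset.jsonl", "w", encoding="utf-8", newline="\n"), call write_capped_jsonl(crawl(), f, 500 * 1024 * 1024), and then load the file with mongoimport into the chosen database and collection. Documents may contain nested dictionaries, which become embedded documents in MongoDB.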
Project ID: 16917471

About the project

7 bids
Remote project
Active 6 years ago

7 freelancers are bidding an average of ₹9,253 INR for this job
I am an expert Node.js/JavaScript developer with good experience. I have built a very capable data scraping and crawling script with Node.js. I am interested in working on your project and am also available for ongoing support and development. Please contact me via chat to discuss the details.
₹12,000 INR in 4 days
5.0 (5 reviews)
5.7
I have done many crawling projects. One of my more interesting projects is webdb, a 9.1 GB MongoDB collection of URLs from online search engines. I crawled 2 million words on Google while staying within policy by using proxy servers. I have extensive hands-on experience in Python, Java, and many other languages, and have used Scrapy, Requests, BS, lxml, and Selenium many times. Let's work on this together.
₹5,555 INR in 2 days
5.0 (8 reviews)
3.9
I have more than 10 years of experience in data scraping and extraction. Kindly message me so we can decide the website from which the data will be scraped.
₹12,222 INR in 3 days
4.9 (3 reviews)
1.6
Hi, I am a senior developer from the Czech Republic with 10 years of experience with Python on Windows and Linux, C/C++, and much more. I love precision and I apply it in my work. I am sure I can do my best for you, because I want to start a career as a freelancer and this job would be great for my reputation. Because of this, I can offer you as much of my time as needed, along with all my knowledge and experience. So... let's do it ;) With regards, Jan
₹7,777 INR in 3 days
0.0 (0 reviews)
0.0
Hello Sir, I have read your requirements, and I see that I have already written code similar to what you need. My code downloads tweets that mention famous celebrities and stores them in a text file in JSON format. I can do the same for you with Twitter, Wikipedia, or another website that suits you. Please send me a message in the chat so I can describe it in more detail. Hoping to hear from you soon.
₹5,555 INR in 3 days
0.0 (0 reviews)
0.0

About the client

Mumbai, India
5.0
10
Payment method verified
Member since Jun 7, 2012

Client Verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)