Database Cleanup & Website Ripping

$500-1000 USD

Cancelled
Posted about 17 years ago

Paid on delivery
This project has 2 parts. Part 1 needs to be done first and immediately; it has a budget of about $100 and is a test. Part 2 comes second, has a budget of about $900, and needs to be done within 2 weeks.

The link to download the Part 1 data does not work. If I like your bid, I will send you data to look at. All the fields are aligned; there are just a lot of sheets that need to be merged and cleaned. The actual instructions are attached. READ THEM, IT IS A LOT OF LITTLE THINGS! I have a number of databases that need to be cleaned up, merged and purged. Make sure it is done carefully. You will get data to look at after we talk.

Project Explanation:

A) Files: there are 4 Excel files and a second zip file.
- [login to view URL], [login to view URL], [login to view URL] and [login to view URL]
- RichieMaster, BigData & TheList are the major databases. Double-check the files within [login to view URL] against the BigData file, as they contain 4 of the pages in BigData, cleaned up a bit. Merge those files into the associated page in BigData first.

Website to rip, along with cleaning and merging into the database:
- [login to view URL] is a sample that you will need to use to rip the Who's Who database (<[login to view URL]>), which is another database that I need ripped. The format and data required from it are standard, as in [login to view URL].
- Go through the Who's Who directory there. The company info is available at the top; then click through on personnel to get the data on the individual employees. I need records created for all individual employees listed in the database.
- This is the only database that I need ripped as part of data cleaning. There are about 800 records in it.
## Deliverables

Data Cleaning Directions:

Special instruction for the RichiesMaster file:
- If any record does not contain at least a first name, last name or company name, then please scan the email addresses in that record, and if the email address is sales@domain, marketing@domain or webmaster@domain, remove the record from the file. This ONLY applies to [login to view URL] and only for records that do not contain a name or company name.

General Instructions

Note: Use the Email Address field as the master key field.

1) If in any file there is nothing in the TYPE field, Unknown needs to be entered into the TYPE field for that record. Do not enter Unknown into any field other than TYPE.
2) If in any file there is nothing in the source field, then the name of the Excel file it is in should go in source. For example, if a record is in [login to view URL] and has nothing in the source field, then BigData should go in the source column for that record.
2) If there is nothing in the last name field, check whether the first name field contains Mrs, Mr, Ms or Miss as part of the first name. If any of those prefixes is there, leave it alone and don't attempt to reformat the record.
   a. If there is no prefix, split the FN field where there is a space between names: leave the first name in the FN field and move whatever follows it into the LN field. Basically we are trying to separate first and last names where they are combined.
First and last names are sometimes combined into one field; we want to sort it out as well as possible.
3) If there is no name at all and only an email address is present, then the source should be EmailOnly and not the name of the Excel file.
4) Within [login to view URL] there is a worksheet where the name, company name and title are all in the same field; please separate them into separate fields.
5) Within [login to view URL] there is a worksheet where the address is all in 1 field instead of in 4 fields; please separate it so the address is in 4 fields, not one.
6) All worksheets from all spreadsheets should be cleaned according to the instructions above and merged into a single Access file, with all duplicates removed.
7) Once it is in Access, we need any records containing the following names, company names or domain names removed from the database (i.e., if adotas is in the name, company name or anywhere in the website or email address, remove the record):
   - [login to view URL], [login to view URL], [login to view URL], [login to view URL], vizi, vizimedia, adotas, [login to view URL], adotas, moskowitz, andrew moskowitz, doyle, tom doyle, pace, lord pace, pesach, lattin, ruchit, shah, hecker, vizidirect, icon, iconadsol, icon advertising, paul mush, adbumb, bumb, [login to view URL], hecker, peggy, spolar, spolarized, dunham, marc levy
8) Once the database has been cleaned up, I need the final product in the following formats:
   - an Access database
   - an Excel file with multiple worksheets (each sheet with 100% unique entries)
   - a single .csv file
9) Even though it should be merged together as a master database, any record that does not have anything in the First Name, Last Name or Company Name fields should be classified as EmailOnly in the source column.
   - All records classified as EmailOnly need to be merged in, to make sure there isn't a duplicate record in the files and that there isn't a record with the same email but additional information.
   - However, once the merge/purge is complete, all EmailOnly records should be stored in a different database from the records with more data points.
   - Go through all of the files and try to merge everything together. In the end, all records that do not contain an email address should be saved in a separate database.
   - In the end I actually need 9 files (3 Access databases, 3 Excel files and 3 CSV files): one set with the EmailOnly records, one with the more complete records (there should be no duplicate email addresses between these 2), and one with records that contain no email address.
10) Special Instructions: within [login to view URL] there are 4 files (Mixx, AdTech, Imedia & Era) that do not contain email addresses. Additionally, there are scattered records throughout that do not contain an email. For any record that does not have an email address, try the following to find it:
   - Run a script that searches for the company within the rest of the database (use your intelligence about how to structure it). If another record is found with that company, figure out the email address format and apply it, so that we can "create" email addresses for everyone in those files.
   - If that doesn't work, figure out the website of the company, spider/crawl it for email addresses, and then apply that format to those contacts.

Part 2 - Site Ripping

For site spider contacts, we want to focus on sales, marketing or business development. We only want data ripped from pages accessible from the home page and from direct links off the home page (such as the contact us page), so we don't get stuck with any spam traps. I want the corporate phone number, the corporate address and preferably one contact name and email from sales, marketing or business development.

Everything here should be ripped into separate files as well as one merged file. Do NOT merge this with everything you are cleaning up above.

1) [login to view URL] top 250,000 sites: Alexa rank, whois info, site spider for postal/phone/emails/contacts
2) [login to view URL]: list all 30,000 or so publishers within the site; rip out the list of all publishers plus all data available on each one, whois data, and a site spider for postal/phone/emails/contacts, along with the Alexa rank for each site
3) <[login to view URL]> has a site list of 1200 sites; I need whois and a site spider for postal/phone/emails/contacts, along with the Alexa rank for each site
4) I need all the contacts that can be extracted from [login to view URL], [login to view URL] & [login to view URL], from within the forums only. I do not want any contacts that could be an admin or employee of any of those 3 sites
5) <[login to view URL]>: they have a supplier directory. I want every listing and the information in the supplier profile; then go to the supplier site and gather contact information from a site spider and whois data on each supplier. The directory has contact name, address, phone and website for many of them, so we then need to get only the email address from the website
6) <[login to view URL]>: I need the whole affiliate program directory ripped out

## Platform

8) Create a [login to view URL] account as a publisher. Once you are a publisher, you should be able to send an email to the affiliate manager of every affiliate program in there. I need the name of the affiliate program, the email address of the affiliate manager and the website of the affiliate program (if you can get other contact details from within [login to view URL] on each advertiser, that would be great as well; they have 20,000+ advertisers)
9) <[login to view URL]>: every affiliate program on there should be spidered to grab contact information from the website (postal/company name/website/contact name/phone/email), same as whenever we spider any other site
10) [login to view URL]: do the same here as you did with [login to view URL]
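
To make the per-record rules in the General Instructions (the TYPE default, the source default, the first/last name split and the EmailOnly classification) easier to discuss with bidders, here is a minimal Python sketch. The field names (`first_name`, `last_name`, `company`, `email`, `type`, `source`) are assumptions, since the real column headers are not shown in this posting, and letting EmailOnly override an existing source value is one reading of instructions 3 and 9.

```python
# Hypothetical field names; the actual spreadsheet headers are not given.
PREFIXES = {"mr", "mrs", "ms", "miss"}
NAME_FIELDS = ("first_name", "last_name", "company")
ALL_FIELDS = NAME_FIELDS + ("email", "type", "source")


def clean_record(record, file_name):
    """Return a cleaned copy of one record (a dict of strings or None)."""
    rec = {key: (value or "").strip() for key, value in record.items()}
    for key in ALL_FIELDS:
        rec.setdefault(key, "")

    # Empty TYPE becomes "Unknown" (never used for any other field).
    if not rec["type"]:
        rec["type"] = "Unknown"

    # Split a combined name only when the last name is empty and no
    # Mr/Mrs/Ms/Miss prefix appears anywhere in the first name field.
    if rec["first_name"] and not rec["last_name"]:
        tokens = rec["first_name"].split()
        has_prefix = any(t.rstrip(".").lower() in PREFIXES for t in tokens)
        if len(tokens) > 1 and not has_prefix:
            rec["first_name"] = tokens[0]
            rec["last_name"] = " ".join(tokens[1:])

    # No name and no company but an email present: EmailOnly.
    # Otherwise an empty source falls back to the Excel file name.
    has_name = any(rec[key] for key in NAME_FIELDS)
    if not has_name and rec["email"]:
        rec["source"] = "EmailOnly"
    elif not rec["source"]:
        rec["source"] = file_name
    return rec
```

For example, `clean_record({"first_name": "Jane Doe", "email": "jane@example.com"}, "BigData")` would come back with the name split into Jane / Doe, TYPE set to Unknown and source set to BigData.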
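
The two removal rules in the data cleaning directions (the sales@/marketing@/webmaster@ rule for the RichiesMaster file, and the name/company/domain list in instruction 7) could be checked along these lines. This is only a sketch: the field names are assumptions, and the plain substring match follows the "anywhere in the website or email address" wording literally, which means short terms such as icon or pace would also match longer words and may need manual review.

```python
ROLE_LOCALS = {"sales", "marketing", "webmaster"}
NAME_FIELDS = ("first_name", "last_name", "company")
SEARCH_FIELDS = NAME_FIELDS + ("website", "email")


def is_role_only(rec):
    """RichiesMaster rule: no name or company, and a role mailbox address."""
    if any(rec.get(field) for field in NAME_FIELDS):
        return False
    local_part = (rec.get("email") or "").partition("@")[0].strip().lower()
    return local_part in ROLE_LOCALS


def matches_blocklist(rec, terms):
    """True if any term occurs in the name, company, website or email."""
    # Joining with spaces lets a two-word term such as "tom doyle"
    # match a record whose first and last names are in separate fields.
    haystack = " ".join((rec.get(f) or "") for f in SEARCH_FIELDS).lower()
    return any(term.lower() in haystack for term in terms)
```

A record would then be dropped when `is_role_only(rec)` is true (RichiesMaster only) or when `matches_blocklist(rec, terms)` is true for the list in instruction 7.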
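
The merge/purge described in instructions 6 and 9 uses the email address as the master key and ends with three groups of records: more complete records, EmailOnly records, and records with no email address. A possible sketch of that step is below. The field names are again assumptions, and "keep the first non-empty value per field" is just one simple way to combine duplicates, not something the instructions prescribe.

```python
NAME_FIELDS = ("first_name", "last_name", "company")


def merge_by_email(records):
    """Merge duplicates on lowercased email; return three lists."""
    merged = {}    # normalized email -> combined record
    no_email = []  # records that carry no email address at all
    for rec in records:
        key = (rec.get("email") or "").strip().lower()
        if not key:
            no_email.append(dict(rec))
            continue
        target = merged.setdefault(key, {"email": key})
        for field, value in rec.items():
            if field == "email" or not value:
                continue
            if field == "source" and value == "EmailOnly":
                continue  # EmailOnly is re-derived after merging
            target.setdefault(field, value)  # first non-empty value wins

    complete, email_only = [], []
    for rec in merged.values():
        if any(rec.get(field) for field in NAME_FIELDS):
            complete.append(rec)
        else:
            rec["source"] = "EmailOnly"
            email_only.append(rec)
    return complete, email_only, no_email
```

Because both output lists are built from one dictionary keyed on email, no address can appear in both the complete and the EmailOnly lists.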
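
For the special instruction about records without an email address, one way to "figure out the email address format" from other records at the same company is to test a handful of common patterns and pick the one that matches most often. The pattern list below is an assumption, not part of the instructions, and any address produced this way is a guess that has not been verified.

```python
from collections import Counter

# Hypothetical candidate formats for the part before the @ sign.
PATTERNS = {
    "first.last": lambda f, l: f"{f}.{l}",
    "first_last": lambda f, l: f"{f}_{l}",
    "firstlast": lambda f, l: f"{f}{l}",
    "flast": lambda f, l: f"{f[0]}{l}",
    "first": lambda f, l: f,
}


def infer_format(known):
    """known: (first, last, email) tuples from one company.

    Returns the most common (pattern_name, domain) pair, or None.
    """
    votes = Counter()
    for first, last, email in known:
        local_part, _, domain = email.strip().lower().partition("@")
        f, l = first.strip().lower(), last.strip().lower()
        if not (f and l and domain):
            continue
        for name, build in PATTERNS.items():
            if build(f, l) == local_part:
                votes[(name, domain)] += 1
                break
    return votes.most_common(1)[0][0] if votes else None


def guess_email(first, last, fmt):
    """Apply an inferred format; returns None if a name part is missing."""
    f, l = first.strip().lower(), last.strip().lower()
    if not (f and l and fmt):
        return None
    name, domain = fmt
    return f"{PATTERNS[name](f, l)}@{domain}"
```

If no pattern matches any known record, `infer_format` returns None, which is the point where the second option in the instruction (spidering the company website for addresses) would apply.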
Project ID: 2838651

About the project

Remote project
Active 17 years ago

About the client

ny, United States
3.5
1
Member since Nov 10, 2005
