
Data comparison in batch

$250-750 USD

Closed
Posted about 10 years ago

Paid on delivery
Develop a mechanism and software to identify similar content in a large collection of articles. Input comes from a flat CSV file. The output should indicate which entries are similar, with an indicator of "similarity strength". The programming language is flexible as long as it delivers the result. If you are interested, please send me a message and let me know how you would like to start. I can provide examples and our detailed requirements.
Project ID: 5489126

About the project

27 bids
Remote project
Active 10 years ago

27 freelancers are bidding on this job, with an average bid of $527 USD
Dear sir, I am really interested in developing this project. I have strong programming skills in several languages, so I have many options for developing this application. Thanks and regards, Yasser
$250 USD in 2 days
5.0 (132 reviews)
7.1
Hello. I'm interested in your project since I have experience in searching for similar content in databases. Please give detailed requirements. Thanks.
$600 USD in 20 days
5.0 (33 reviews)
6.1
I can make this as a Python script.
$450 USD in 10 days
4.8 (79 reviews)
6.2
This has a lot to do with the work for my Master's thesis, which was in the field of Artificial Intelligence applied to Linguistics. If the articles are in English, my first idea would be to do part-of-speech tagging first and then compare only the sets of words that are relevant to your purpose, for example proper nouns. After that, something like a modified version of the Lesk algorithm might show some good results. Would you rather this be something that you can run from a server, like a PHP script, or a Windows program? How many articles are there? Can you give me some examples of entries that you consider similar and entries that you don't consider similar? The similarity measure is a subjective function.
$600 USD in 10 days
5.0 (45 reviews)
5.7
I might be able to do this project using locality-sensitive hashing or compressed sensing methods, depending on the details of your dataset. Please send me a few examples and I'll let you know if this is possible.
$333 USD in 10 days
5.0 (43 reviews)
5.9
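The bid above names locality-sensitive hashing without saying which variant it has in mind. As one possible reading, here is a minimal MinHash sketch with banding, using only the Python standard library. The shingle size, the number of hash functions, the number of bands, and all function names are illustrative assumptions, not anything specified in the project or the bid.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of the lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    """Seeded 64-bit hash; each seed acts as a different hash function."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=64):
    """For each hash function, keep the minimum hash over a non-empty set."""
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def candidate_pairs(signatures, bands=16):
    """Pair up documents whose signatures agree on at least one whole band."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Only the candidate pairs need an exact comparison afterwards, which is what makes this kind of approach attractive for a large collection; `estimated_jaccard` could serve as a rough "similarity strength" for those pairs. Empty articles would have to be filtered out before calling `minhash_signature`.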
Hey there, thank you for posting the project overview. It looks very feasible and I am interested in doing it. Next steps: let's discuss the requirements and the data input/output in more detail, and then I will start the work accordingly. I am an Excel/Access VBA automation professional (Data Analyst) with more than 5 years of experience in this domain. Please consider my bid and contact me for further discussion. I am available online to answer any further queries. Thanks, Abhinav
$283 USD in 5 days
5.0 (53 reviews)
5.7
Hi, I have more than 14 years of experience and I am an expert in this kind of work. I have completed more than 200 projects. Please look at the feedback left by my employers to learn more about my work. Waiting for your positive response. Thanks.
$277 USD in 10 days
4.9 (89 reviews)
5.9
I would like to give this a try. Please send me details on "similarity strength" and some sample input data. I will try to come up with a prototype in 2-3 days.
$277 USD in 5 days
4.7 (21 reviews)
5.7
Hi, I've completed many projects before, but I'm not quite sure what you need here. Please provide some examples and your requirements so that I can understand them clearly. Thanks, Zhining
$444 USD in 10 days
5.0 (26 reviews)
4.3
I have carefully read and understood your project requirements. I have 6+ years of total experience, including 2+ years as a Team Lead. I am responsible for managing teams and for writing frameworks and scripts in Python. I have recently completed several Python projects on oDesk, Elance and Freelancer with excellent (5-star) ratings. I am in the top 10% (11th rank) among Python test takers on oDesk, Elance and Freelancer. I assure you of accurate, on-time delivery of high-quality work. Please see my profile and portfolio. I assure you I am the Python developer you are looking for. Looking forward to working with you. Thanks, Vikas
$250 USD in 7 days
4.6 (5 reviews)
4.0
Hello, I have good experience with Python and Ruby and I would like to know more about this project. Can you please send me some more details? Thanks, Vinod
$555 USD in 30 days
5.0 (6 reviews)
3.8
Greetings. You have an interesting project, and I propose to use Perl to develop a web-based data comparison program. I am ready to help you and solve your task on time and within budget.
$530 USD in 25 days
4.8 (6 reviews)
3.6
Hello, I am interested in working with you on this project. I would choose C++ for this project as it is quite a fast programming language. I would need more details about the "similarity strength" and what exactly you would expect it to be. I hope we can have a good experience working on this project. Respectfully, Grig
$388 USD in 10 days
5.0 (9 reviews)
3.6
I have checked this requirement and have some queries, so we need to discuss it. Please tell me how we can start the discussion. To learn more about us, please check your private messages. We have a team of professionals with more than 11 years of experience, so we can manage this work and will give you a quality solution.
$600 USD in 17 days
2.5 (37 reviews)
6.8
Hi, I have been developing in Python for over 10 years and have experience in Natural Language Processing. To accomplish this, I intend to create a statistical model based on the word distribution of your articles and use that as a comparative metric. This will remove the need to do a pairwise comparison of every file (which is impossible for a large database) and will be rather quick. If you have any questions or wish to see some of my work, don't hesitate to ask. Thank you, Chris
$555 USD in 4 days
4.6 (2 reviews)
3.6
I have a master's degree in applied mathematics and have worked with these kinds of problems previously. I will probably use Python or Ruby for the data processing.
$888 USD in 30 days
5.0 (2 reviews)
2.3
I would use Python 3 to do this. We can test it first. It can run on Windows or Linux, and it can also have a GUI to display the results.
$333 USD in 7 days
0.0 (0 reviews)
0.0
Hello, I implemented a similar project a few years ago, but it was a bit more complex. It processed thousands of textual documents on a cluster of distributed computers, so I have plenty of experience with Information Retrieval techniques. Based on your description, I would first automatically clean up each document (remove punctuation etc.) and then extract the plain words. These words are combined into n-grams (for n = 1, ..., m, with a user-defined m) and weighted with "term frequency - inverse document frequency", and finally the documents are compared using cosine similarity. This produces a score from 0 (not similar) to 1 (equal). Based on the score, it is possible to identify all documents DS which are similar to a document D by thresholding the score. And of course it is possible to identify the k most similar documents. I would implement it with Python and the scikit-learn package (BSD license). If you have any questions, do not hesitate to send me a message. Sincerely, Sebastian
$500 USD in 3 days
0.0 (0 reviews)
0.0
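The pipeline described in the bid above (clean-up, word n-grams, TF-IDF weighting, cosine similarity, thresholding) can be sketched roughly as follows. The bid proposes scikit-learn; this sketch deliberately sticks to the standard library so the arithmetic is visible. The smoothed IDF formula, the default threshold of 0.3, and every function name are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, max_n):
    """Return all word n-grams for n = 1..max_n as space-joined strings."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, max_n=2):
    """Build one sparse, L2-normalised TF-IDF vector (a dict) per document."""
    counts = [Counter(ngrams(tokenize(d), max_n)) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    n_docs = len(docs)
    vectors = []
    for c in counts:
        # Smoothed IDF (an assumed variant) so no term gets a zero weight.
        vec = {t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two normalised sparse vectors (0 = unrelated, 1 = equal)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def similar_pairs(docs, threshold=0.3, max_n=2):
    """Return (i, j, score) for every pair whose cosine score reaches the threshold."""
    vectors = tfidf_vectors(docs, max_n)
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return pairs
```

Note that `similar_pairs` compares every pair of documents, so its cost grows quadratically with the number of articles; for a very large collection it would need a candidate-selection step in front of it.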
A proposal has not yet been provided
$444 USD in 10 days
0.0 (0 reviews)
0.0
Hello, I would definitely do it in Python. You could consider putting all the data in a relational database (e.g. SQLite) to speed up search queries, repeat queries, analyze the data, etc., because CSV files are not handy for that.
$555 USD in 3 days
0.0 (0 reviews)
0.0
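As a rough illustration of the suggestion above, the following sketch loads a CSV file into SQLite with Python's built-in `csv` and `sqlite3` modules and stores similarity scores so they can be queried repeatedly. The column names `id` and `body`, the table names, and the function names are hypothetical; the project description does not specify the CSV layout.

```python
import csv
import io
import sqlite3


def load_articles(csv_file, conn):
    """Copy rows from an open CSV file into an `articles` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    reader = csv.DictReader(csv_file)  # assumes a header row with id, body
    rows = ((row["id"], row["body"]) for row in reader)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO articles (id, body) VALUES (?, ?)", rows
        )


def save_scores(conn, scored_pairs):
    """Store (id_a, id_b, score) tuples produced by any similarity method."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS similarity ("
        "id_a TEXT NOT NULL, id_b TEXT NOT NULL, score REAL NOT NULL, "
        "PRIMARY KEY (id_a, id_b))"
    )
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO similarity VALUES (?, ?, ?)", scored_pairs
        )


def most_similar(conn, article_id, limit=5):
    """Return the highest-scoring partners of one article as (other_id, score)."""
    cur = conn.execute(
        "SELECT CASE WHEN id_a = ? THEN id_b ELSE id_a END AS other, score "
        "FROM similarity WHERE id_a = ? OR id_b = ? "
        "ORDER BY score DESC LIMIT ?",
        (article_id, article_id, article_id, limit),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # Tiny in-memory demonstration with made-up rows and scores.
    conn = sqlite3.connect(":memory:")
    load_articles(io.StringIO("id,body\n1,hello world\n2,hello there\n"), conn)
    save_scores(conn, [("1", "2", 0.8)])
    print(most_similar(conn, "1"))
```

Keeping the scores in their own table means the expensive comparison runs once, while lookups such as "the most similar articles to X" become ordinary SQL queries.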

About the client

Hong Kong, Hong Kong
5.0
35
Payment method verified
Member since Feb 23, 2010

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)