Discovering common phrases in multiple blocks of text

Closed. Posted Jul 12, 2006. Paid on delivery.

This project will build a tool that efficiently finds common phrases in a large volume of discrete text blocks. The text blocks will be read from a database, and the set of available blocks will grow continually. The tool must use its knowledge of the existing set of blocks to process incoming blocks efficiently and to find phrases that occur multiple times anywhere in the entire set of text blocks. Most text blocks will be around 1000-6000 words long, although some may be significantly shorter or longer.
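As a rough illustration of the task described above, the following is a minimal sketch in Python (one of the languages the requirements allow). It treats a "phrase" as a run of `n` consecutive words and keeps running counts so that each incoming block is compared against everything already indexed. The names `phrases`, `PhraseIndex`, and `add_block` are hypothetical, the tokenization rule is an assumption, and the in-memory dictionaries would not by themselves satisfy the memory limit in the requirements.

```python
import re
from collections import defaultdict

# Assumption: words are runs of letters, digits, and apostrophes, case-insensitive.
WORD_RE = re.compile(r"[a-z0-9']+")


def phrases(text, n):
    """Yield every run of n consecutive words in text as a tuple."""
    words = WORD_RE.findall(text.lower())
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n])


class PhraseIndex:
    """Hypothetical helper: remembers every phrase seen so far."""

    def __init__(self, phrase_len=3):
        self.phrase_len = phrase_len              # configurable words per phrase
        self.counts = defaultdict(int)            # phrase -> total occurrences
        self.blocks_by_phrase = defaultdict(set)  # phrase -> ids of blocks containing it

    def add_block(self, block_id, text):
        """Index one block and return its phrases that now occur more than once."""
        repeated = set()
        for phrase in phrases(text, self.phrase_len):
            self.counts[phrase] += 1
            self.blocks_by_phrase[phrase].add(block_id)
            if self.counts[phrase] > 1:
                repeated.add(phrase)
        return repeated


index = PhraseIndex(phrase_len=3)
index.add_block(1, "The quick brown fox jumps")
print(index.add_block(2, "A quick brown fox sleeps"))
```

Each block is scanned once and each phrase costs one dictionary update, so the work grows roughly linearly with the number of words processed.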

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

Requirements:

* Use an efficient “Order N” (O(N)) algorithm, which is to say that its processing requirements will scale linearly (or sub-linearly) with the quantity of text being processed

* Be able to run inside of 4GB of RAM regardless of the volume of text being processed

* Capable of operating in a 64-bit Linux environment

* Be capable of using multiple processors concurrently by using multiple threads or processes on each system

* Be capable of running concurrently on multiple systems, preferably using a MySQL database as a control and locking mechanism

* Written in the developer’s choice of Perl, Java, Python, or C++

* Able to scale to millions of text blocks in the dataset

* Ability to stop the application and restart at the same point

* A variable setting for the number of words that define a "phrase"

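Two of the requirements above, bounded memory across multiple workers and the ability to stop and restart at the same point, can be sketched as follows. This is one possible approach under stated assumptions, not a prescribed design. Phrases are assigned to shards by a stable hash so that each worker only holds its own shard, and progress is recorded after every block. The posting prefers MySQL as the control and locking mechanism; here a local JSON file stands in for that, and the names `shard_of`, `load_checkpoint`, `save_checkpoint`, and `process` are hypothetical.

```python
import hashlib
import json
import os
import tempfile


def shard_of(phrase, num_shards):
    """Stable shard number for a phrase (identical on every machine and run)."""
    digest = hashlib.md5(" ".join(phrase).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def load_checkpoint(path):
    """Return the id of the last fully processed block, or 0 if none."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["last_block_id"]


def save_checkpoint(path, block_id):
    """Write progress to a temp file, then rename it over the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"last_block_id": block_id}, f)
    os.replace(tmp, path)  # atomic rename: the checkpoint is never half-written


def process(blocks, checkpoint_path, handle_block):
    """Process (block_id, text) pairs in ascending id order, skipping finished ones."""
    last = load_checkpoint(checkpoint_path)
    for block_id, text in blocks:
        if block_id <= last:
            continue  # already handled before the previous stop
        handle_block(block_id, text)
        save_checkpoint(checkpoint_path, block_id)
```

In this sketch a worker would keep only the phrases whose `shard_of` value matches its own shard number, which caps the memory each process needs. A multi-system deployment would replace the JSON file with a row in the shared database.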

## Platform

Linux - 64-bit

Engineering, Linux, MySQL, PHP, Software Architecture, Software Testing

Project ID: #3641373

About the project

10 bids. Remote project. Active Aug 22, 2006.

10 freelancers bid an average of $1522 on this project

SovDyn

See private message.

$850 USD in 30 days
(83 reviews)
8.2

etags

See private message.

$552.5 USD in 30 days
(45 reviews)
5.9

vw1852498vw

See private message.

$5610 USD in 30 days
(4 reviews)
5.3

javaj2eeoracle

See private message.

$850 USD in 30 days
(8 reviews)
3.4

infostarvw

See private message.

$552.5 USD in 30 days
(3 reviews)
3.3

bytefoundryvw

See private message.

$1062.5 USD in 30 days
(9 reviews)
2.9

vw2141512vw

See private message.

$2975 USD in 30 days
(3 reviews)
0.6

meetinfotech

See private message.

$552.5 USD in 30 days
(0 reviews)
0.0

alex5555vw

See private message.

$1190 USD in 30 days
(0 reviews)
0.0

davidrn

See private message.

$1020 USD in 30 days
(0 reviews)
0.0