I need a web-based app that will accurately OCR scanned PDFs. The PDFs will be uploaded through a web interface. The app will poll a directory of PDFs, and any new PDF that gets dropped into the directory will immediately be OCR'ed into a text file. The OCR engine must be able to deal with scans and images.
Once the file is OCR'ed, the app must search the text for matches of a dynamic list of user-supplied regular expressions and output the results into a CSV file for each PDF.
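To illustrate the regex step, here is a minimal Ruby sketch of the kind of thing I have in mind. The helper name `matches_to_csv`, the two-column layout (pattern name, matched text), and the sample text and patterns are all assumptions made up for illustration; the real CSV layout is open for discussion.

```ruby
require "csv"

# Hypothetical helper (name and CSV layout are assumptions): runs each
# named pattern over the OCR text and returns CSV text, one row per match.
def matches_to_csv(text, patterns)
  CSV.generate do |csv|
    csv << ["pattern", "match"]
    patterns.each do |name, regex|
      # Regexp.last_match[0] is the full match even if the regex has groups.
      text.scan(regex) do
        csv << [name, Regexp.last_match[0]]
      end
    end
  end
end

# Made-up sample text and patterns, for illustration only.
sample_text = "Invoice INV-1001 dated 2020-01-15, see also INV-1002."
sample_patterns = {
  "invoice_number" => /INV-\d+/,
  "date" => /\d{4}-\d{2}-\d{2}/
}

puts matches_to_csv(sample_text, sample_patterns)
```

In the real app the patterns would come from whatever the user enters in the web interface, and the returned string would be written to one CSV file per PDF.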
The polling can be a cron job or a daemon; I don't care which, but you need to show me how to set it up.
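As a rough sketch of what one polling pass could look like in Ruby: the helper names `pending_pdfs` and `process_new_pdfs` are hypothetical, the rule "a PDF is new if no .txt file with the same name sits next to it" is an assumption, and the OCR step is deliberately left as a block so any engine can be plugged in.

```ruby
# Hypothetical helper: lists PDFs in dir that have no matching .txt yet.
# Assumes lowercase ".pdf" extensions and output written next to the PDF.
def pending_pdfs(dir)
  Dir.glob(File.join(dir, "*.pdf")).sort.reject do |pdf|
    File.exist?(pdf.sub(/\.pdf\z/, ".txt"))
  end
end

# One polling pass: hands each new PDF path to the block (the OCR step)
# and writes the text the block returns to a .txt file beside the PDF.
def process_new_pdfs(dir)
  pending_pdfs(dir).each do |pdf|
    text = yield(pdf)
    File.write(pdf.sub(/\.pdf\z/, ".txt"), text)
  end
end

# Daemon-style usage (not run here); the directory and OCR call are placeholders:
#   loop do
#     process_new_pdfs("/path/to/pdfs") { |pdf| run_ocr(pdf) }
#     sleep 5
#   end
```

The same pass could be run once per cron invocation instead of inside a loop. Note that cron fires at most once a minute, so a daemon loop would get closer to "immediately".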
The app can be built in PHP or Rails (preferably Rails).
The web interface must use Bootstrap or [login to view URL].
Before I award you the project, I want to see a sample app that can OCR a scanned PDF. Once I am satisfied that your solution can output decent text that matches the scans, I will award you the project, with 50% paid up front and 50% upon completion and code transfer.