Software to compare PDF files

  • Status: Closed
  • Prize: $600
  • Entries Received: 7
  • Winner: carlquist

Contest Brief

This contest is to compare multiple PDF files based on the similarity of their bounding boxes. This is not an easy contest and requires an understanding of PDF libraries.
There are many PDF libraries available, and it does not matter which one is used.

Features required:
Upload multiple PDF files (potentially many).
Convert the PDFs to PNGs with the bounding boxes drawn on them as squares.
The PNGs are shown with their bounding boxes, and the user selects which bounding boxes are of interest. Multiple bounding boxes can be selected.
The software then searches ALL the original PDFs to find which files have the same bounding boxes.

Matches must be based on either:-
1. Approximate co-ordinates of the bounding boxes and the respective page number. Leaving room for 3% error in placement of bounding boxes.
OR
2. An image match of the area inside the bounding box. This means that, for each match from (1), a further step must convert that bounding box to a PNG file and do an image comparison; if the images are almost identical, it is returned as a match.
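
The two matching rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the winning entry's method: it assumes the 3% tolerance is measured against the page width and height, that all pages share one size, and that the cropped regions are already available as equal-sized grids of grayscale values. The names `Box`, `boxes_match`, `images_similar`, `files_with_all_boxes`, and the `max_mean_diff` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    page: int   # page number the box appears on
    x0: float   # left edge, in page units
    y0: float   # top edge
    x1: float   # right edge
    y1: float   # bottom edge


def boxes_match(a, b, page_w, page_h, tol=0.03):
    """Rule 1: same page, and every edge within tol of the page dimension.

    Assumption: "3% error" is read as 3% of the page width/height.
    """
    if a.page != b.page:
        return False
    dx, dy = tol * page_w, tol * page_h
    return (abs(a.x0 - b.x0) <= dx and abs(a.x1 - b.x1) <= dx
            and abs(a.y0 - b.y0) <= dy and abs(a.y1 - b.y1) <= dy)


def images_similar(p, q, max_mean_diff=8.0):
    """Rule 2: mean absolute difference of two grayscale grids (0-255).

    The threshold is an arbitrary assumption and would need tuning.
    """
    if len(p) != len(q) or any(len(r) != len(s) for r, s in zip(p, q)):
        return False
    count = sum(len(r) for r in p)
    if count == 0:
        return False
    total = sum(abs(u - v) for r, s in zip(p, q) for u, v in zip(r, s))
    return total / count <= max_mean_diff


def files_with_all_boxes(selected, boxes_by_file, page_w, page_h):
    """Names of files that contain a match for every selected box."""
    return [name for name, boxes in boxes_by_file.items()
            if all(any(boxes_match(s, b, page_w, page_h) for b in boxes)
                   for s in selected)]
```

In a two-stage setup, `files_with_all_boxes` would act as the cheap coordinate filter, and `images_similar` would then be run only on the cropped regions of the surviving candidates.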

The end result is that the software shows a list of download links to the PNGs/PDFs of ONLY those files that contain the same bounding boxes.

The winner will be asked to add a module to:-
-Enable the placement of another PNG image over any PDF image and re-write the PDF image. Many github libraries can do this.

-Put the bounding box through Tesseract and do an OCR text search in addition to the simple bounding box co-ordinate comparison. This would produce another criterion to match on.

So the winner can earn a total of $800+ from this contest through the add-on module.

Good Luck.

Serious entries only, please. I have zero patience, so only submit once it is fully working! I suggest you first message me your proposed methodology, and I can then confirm whether your ideas will succeed.

Be quick!




I recommend using https://blueimp.github.io/jQuery-File-Upload/ to save time.

Some other ideas would be to convert the bounding boxes to SVG format and use an existing SVG comparison library.

Public Clarification Board

  • Asianexperts
    • 5 years ago

    hehehe, everyone thought they would get this prize and were disappointed

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    Please do not enter this contest! One contestant is extremely close to winning.

    1. danielvz96
      • 5 years ago

      :( How close? I've already implemented a bounding box finder (it can find anything from the smallest detail to whole paragraphs) and the bulk compare function, and was working on the frontend when I saw this.

  • teachartdevteam
    • 5 years ago

    Hey there! I have a slightly different idea and would be happy to discuss it with you. Basically, what do you think: does it make sense for the user to draw the bounding boxes? Rendering a box for each object over the PDF might not be 100% useful. I have seen tons of PDFs in the past with bad structure and arrangement that contain overlapping objects, and this will result in overlapping bounding boxes. With the current approach a recursive lookup must be implemented: each object must be extracted from the PDF and parsed. Each object must be parsed with a different internal parser (iTextSharp and PDFsharp work that way) just to get details like size and position.

    1. sunnyguptahotels
      Contest Holder
      • 5 years ago

      I see what you are saying. So which library do you propose to use for image comparison? And how would you extract the corresponding area from the other PDFs? Or does it need to compare the selected area as a PNG against the full-page PNGs of every PDF?

    2. sunnyguptahotels
      Contest Holder
      • 5 years ago

      Speed is a big consideration. To do what you are describing, it may be necessary to overlay the page with a 12x16 grid and then find all the grid boxes that the hand-drawn bounding box touches, so that the comparison is done more efficiently. But that seems to add more complexity to the exercise. Adobe Acrobat Reader seems to get the bounding boxes right without much overlap.

  • ITPyramid85
    • 5 years ago

    First, I want to see the quality of the PDFs to judge whether image processing is possible or not. Can you provide the PDF files you have?

    1. sunnyguptahotels
      Contest Holder
      • 5 years ago

      Assume that all the PDFs are generated by the same creation utility. The most obvious example is a bank statement. But I think image comparison is missing the point: we want comparison by bounding box co-ordinates. So the first step is to find the algorithm that Adobe uses to obtain the bounding boxes. Most of the open-source utilities treat every character as a separate co-ordinate.

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    Hi everyone, please ask your questions here so that everyone can see them. If you don't know what a bounding box is in a PDF document, then you should not attempt this contest. I don't have time to educate, sorry. There is no point explaining your experience. This is a guaranteed contest: if you understand the concepts in the brief, then you may submit an entry. It's as simple as that. If you don't understand it, then do the basic work first and return with specific questions.

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    Hi Everyone

  • Codeitsmarts
    • 5 years ago

    Hi, I have read your project description. I have a few queries before I can begin the work. Can we discuss them through chat? I shall endeavor to exceed your expectations.

    I have 5 years of experience in PHP, MySQL, CodeIgniter, WordPress, jQuery, HTML, CSS, Python and many more. Please see my portfolio for artwork samples and my clients' feedback.

    1. http://www.astrologyindubai.com/
    2. http://www.sweetspace9.com/
    3. http://www.ngotiator.com/
    4. http://www.shypon.com/
    5. https://www.pixbrand.in/
    6. http://www.etfmodelsolutions.com/
    7. http://wricitieshub.org/worldtodresource/

    I'm confident that I can complete your project on time and within your budget. I can achieve the results that you are asking for.
    Please initiate a chat for further discussion. I will do my best for you, with positive hope! Regards

  • ITPyramid85
    • 5 years ago

    Also, if you want to do image searching, the images will have to be normalized to a specific size, so the image quality and the number of PDF pages matter, and they will affect the search speed.

  • sprlabs9
    • 5 years ago

    Hi, I would like to discuss. Please drop me a message.

  • dev681999
    • 5 years ago

    I am probably wrong; feel free to correct me.

  • dev681999
    • 5 years ago

    From reading the description, this is what I have understood: you want a website where people can upload PDF files. Each PDF is then converted to a PNG that shows its bounding boxes. These bounding boxes are matched against the boxes in the other uploaded files. The user can then select bounding boxes to download.

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    It can be in PHP, Python, or C#. There must be a web front end to accept the upload of the files, so Java\VB are not suitable.

  • a6jack
    • 5 years ago

    Dear,
    May we know which language (PHP, Python, C#, Java ...) this software should be written in, and whether it will be a website or a desktop app?

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    Please submit a blank entry; then it will allow me to message you.

  • desmondmile03
    • 5 years ago

    Hi, please message me so I can discuss my proposed methodology. Thanks

  • ahsanfaheem3
    • 5 years ago

    Dear contest holder, kindly message me so I can discuss my proposed methodology. Thanks.
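
The 12x16 grid overlay that the contest holder describes in the thread above could look something like the sketch below. It is a minimal illustration, assuming boxes are given as `(x0, y0, x1, y1)` tuples in page units with the origin at the top-left; `touched_cells` and `shares_cell` are hypothetical names, and the idea of using shared cells as a cheap pre-filter before a full comparison is one possible reading of the comment, not a confirmed design.

```python
import math


def touched_cells(box, page_w, page_h, cols=12, rows=16):
    """Return the set of (col, row) grid cells that a box overlaps.

    box is (x0, y0, x1, y1) in page units; the grid divides the page
    into cols x rows equal cells.
    """
    x0, y0, x1, y1 = box
    cell_w = page_w / cols
    cell_h = page_h / rows

    def clamp(i, n):
        return max(0, min(n - 1, i))

    c0 = clamp(int(x0 // cell_w), cols)
    r0 = clamp(int(y0 // cell_h), rows)
    # ceil(...) - 1 keeps a box that ends exactly on a grid line from
    # claiming the next cell over.
    c1 = max(c0, clamp(math.ceil(x1 / cell_w) - 1, cols))
    r1 = max(r0, clamp(math.ceil(y1 / cell_h) - 1, rows))
    return {(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)}


def shares_cell(box_a, box_b, page_w, page_h):
    """Cheap pre-filter: do the two boxes touch at least one common cell?"""
    return bool(touched_cells(box_a, page_w, page_h)
                & touched_cells(box_b, page_w, page_h))
```

Only pairs of boxes for which `shares_cell` is true would then need the more expensive coordinate or image comparison.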
