27 Commits (b91df8a1b8b7a42fb1aa9fc17fd1f76bfb8346c1)

Author SHA1 Message Date
Vinayak Mehta 72c42c74db Remove ocr 2018-09-01 16:23:54 +05:30
Vinayak Mehta 9753889ea2 Add option to specify end in page range 2017-08-16 14:53:15 +05:30
Vinayak Mehta e252e476b9 Add better y-cuts detection 2017-04-25 18:44:53 +05:30
Vinayak Mehta d28e4b8c1e Change default value for iterations 2017-04-21 13:20:48 +05:30
Vinayak Mehta 4da754ddcb [ENH] Add OCR and better joint detection
* Add iterations for dilation

* Add OCRLattice and OCRStream

* Add debug
2017-04-18 18:25:47 +05:30
Vinayak Mehta 4a87a77003 Remove ncols 2017-04-11 15:50:12 +05:30
Vinayak Mehta 8e8f5bbb3b Add zip of csvs option 2017-04-11 14:14:54 +05:30
Vinayak Mehta 72233f25ce Parameterize thresholding blocksize and constant 2017-04-10 21:15:54 +05:30
Vinayak Mehta 84d354ba10 Add deepcopy and debug scripts 2017-04-10 18:59:48 +05:30
Vinayak Mehta 4dd0d2330e Fix shift text 2017-03-21 16:04:55 +05:30
Vinayak Mehta 3651fb2347 Remove ncolumns everywhere 2017-03-01 19:53:48 +05:30
Vinayak Mehta edcf770d93 Remove verbose option 2017-02-07 23:44:01 +05:30
Vinayak Mehta 3eb18ef199 More logs 2017-02-07 22:23:05 +05:30
Vinayak Mehta bc86346154 Don't let processes modify instance attributes 2017-02-07 22:13:33 +05:30
Vinayak Mehta 970256e19d Add OCR support for image based pdfs with lines
* Cosmits

* Remove unnecessary kwargs

* Direct ghostscript call output to /dev/null

* Change char_margin's default value

* Add image attribute in Table and Cell

* Add OCR

* Fix coordinates

* Add table_area

* Add ocr options to cli

* Direct ghostscript call output to /dev/null

* Add ocr docstring

* Add requirements

* Update README
2017-01-07 16:37:56 +05:30
Vinayak Mehta 5c6a74fb2a Add new params 2016-10-18 18:23:35 +05:30
Vinayak Mehta 75c7deffaa Minor Stream fix 2016-09-27 17:27:34 +05:30
Vinayak Mehta 79afb45e2e Support for vertical tables in Stream
* Change var names

* Add test pdf

* Add tests for Lattice rotation

* Add support for vertical tables in Stream, test pdfs

* Add tests for Stream rotation
2016-09-15 20:51:59 +05:30
Vinayak Mehta 757ba0444a Remove jtol 2016-09-13 17:28:21 +05:30
Vinayak Mehta 439059817d Update tests with new API
* Update Lattice tests with new API

* Update Stream tests with new API, fix CLI

* Add table_area test, Stream fixes
2016-09-09 16:56:25 +05:30
Vinayak Mehta d86630e70b Add table_area
[MRG] Add table_area
2016-09-05 18:51:59 +05:30
Vinayak Mehta 0bb6ce0bf9 CLI debug fix 2016-09-01 02:16:58 +05:30
Vinayak Mehta 552f9cf422 Add various metrics to score the quality of a parse
2016-08-30 14:52:49 +05:30
Vinayak Mehta d834faeac8 Fix README
2016-08-09 18:36:43 +05:30
Vinayak Mehta 13568865b5 Add verbose 2016-08-03 13:14:19 +05:30
Vinayak Mehta 57917426e8 Fix docstrings 2016-08-03 13:14:11 +05:30
Vinayak Mehta e9602bb353 Create python package
Add version support

Add new test file

[RFC] First phase

[RFC] Second phase

[RFC] Third phase

Add logging

Update README

Add debug

Add debug, fixes

Add pep8 changes

Add fix

Rename CLI tool

Add csv fix

Update README

Add fix for numpages

Update README

Update requirements.txt

Use yield

Add tuple unpacking fix

Fix n00b mistake

Add check for None

Fix check for None

Fix unicode

Add relative imports
2016-07-29 21:09:39 +05:30