268 Commits (2022a8abc9f3a477c7a8373973a6b5f886b5fff8)

Author SHA1 Message Date
Vinayak Mehta 4da754ddcb [ENH] Add OCR and better joint detection
* Add iterations for dilation

* Add OCRLattice and OCRStream

* Add debug
2017-04-18 18:25:47 +05:30
Vinayak Mehta dd909e2b53 Fix debug script 2017-04-11 20:26:01 +05:30
Vinayak Mehta 7246e1a73d Parallelize pdf split 2017-04-11 18:30:05 +05:30
Vinayak Mehta 4a87a77003 Remove ncols 2017-04-11 15:50:12 +05:30
Vinayak Mehta 8e8f5bbb3b Add zip of csvs option 2017-04-11 14:14:54 +05:30
Vinayak Mehta 72233f25ce Parameterize thresholding blocksize and constant 2017-04-10 21:15:54 +05:30
Vinayak Mehta 8b07aa2702 Minor fixes 2017-04-10 19:08:39 +05:30
Vinayak Mehta 778366b2dd Remove directory 2017-04-10 19:03:43 +05:30
Vinayak Mehta 84d354ba10 Add deepcopy and debug scripts 2017-04-10 18:59:48 +05:30
Vinayak Mehta 4dd0d2330e Fix shift text 2017-03-21 16:04:55 +05:30
Vinayak Mehta 3651fb2347 Remove ncolumns everywhere 2017-03-01 19:53:48 +05:30
Vinayak Mehta edcf770d93 Remove verbose option 2017-02-07 23:44:01 +05:30
Vinayak Mehta 3eb18ef199 More logs 2017-02-07 22:23:05 +05:30
Vinayak Mehta bc86346154 Don't let processes modify instance attributes 2017-02-07 22:13:33 +05:30
Vinayak Mehta 970256e19d Add OCR support for image based pdfs with lines
* Cosmits

* Remove unnecessary kwargs

* Direct ghostscript call output to /dev/null

* Change char_margin's default value

* Add image attribute in Table and Cell

* Add OCR

* Fix coordinates

* Add table_area

* Add ocr options to cli

* Direct ghostscript call output to /dev/null

* Add ocr docstring

* Add requirements

* Update README
2017-01-07 16:37:56 +05:30
Vinayak Mehta 70f626373b Cosmits
* Remove unnecessary kwargs

* Direct ghostscript call output to /dev/null

* Change char_margin's default value
2017-01-07 15:58:45 +05:30
Vinayak Mehta bd1d57a561 Update version 2017-01-07 15:50:20 +05:30
Vinayak Mehta 10eda3f204 Deprecate Stream ncolumns 2016-11-07 21:30:48 +05:30
Vinayak Mehta 72c2a0020f Minor fix 2016-10-20 18:54:06 +05:30
Vinayak Mehta ed44d603f5 Update README 2016-10-18 18:27:24 +05:30
Vinayak Mehta 5c6a74fb2a Add new params 2016-10-18 18:23:35 +05:30
Vinayak Mehta b01edee337 Handle rotation at entry 2016-10-18 15:33:38 +05:30
Vinayak Mehta 2a203a1865 Log warning when len(header) != len(cols) 2016-10-17 18:16:39 +05:30
Vinayak Mehta adb948d363 Fix column parameter 2016-10-13 16:54:45 +05:30
Vinayak Mehta 40d30c1ab9 Add superscript and subscript flagging
* Add superscript flagging

* Add flagging param

* Add np.round to account for rotation error
2016-10-12 19:27:18 +05:30
Vinayak Mehta e8b93a9624 Add headers param 2016-10-12 13:59:10 +05:30
Vinayak Mehta a43d5ca2c7 Replace chars with textlines
* Add split function

* Add split_text and shift_text params

* Change get_rotation

* Move get_column_index to utils

* Add split_text and shift_text

* Fix split_text
2016-10-12 13:17:02 +05:30
Vinayak Mehta 02ef332bd6 Add logo 2016-10-04 20:59:52 +05:30
Vinayak Mehta 52a2876ab1 Fix tarea type conversion 2016-10-04 19:57:53 +05:30
Vinayak Mehta 4b8e96a86a Update docs
* Update README

* Update index.rst

* Update docstrings

* Fix typo

* Edit docs

* Add error messages
2016-10-04 17:50:48 +05:30
Vinayak Mehta d46eeeab1a Change jpg to png 2016-09-27 18:37:38 +05:30
Vinayak Mehta 75c7deffaa Minor Stream fix 2016-09-27 17:27:34 +05:30
Vinayak Mehta 79afb45e2e Support for vertical tables in Stream
* Change var names

* Add test pdf

* Add tests for Lattice rotation

* Add support for vertical tables in Stream, test pdfs

* Add tests for Stream rotation
2016-09-15 20:51:59 +05:30
Vinayak Mehta 8ce7b74671 Replace imagemagick with ghostscript
* Replace imagemagick with ghostscript

* Add quiet option

* Avoid repetition

* Remove Wand requirement

* Replace jpeg with png
2016-09-13 17:35:07 +05:30
Vinayak Mehta 757ba0444a Remove jtol 2016-09-13 17:28:21 +05:30
Vinayak Mehta 439059817d Update tests with new API
* Update Lattice tests with new API

* Update Stream tests with new API, fix CLI

* Add table_area test, Stream fixes
2016-09-09 16:56:25 +05:30
Vinayak Mehta a94c350a7b Fix param flow
* Fix param flow

* Add check for None
2016-09-09 14:52:38 +05:30
Vinayak Mehta 766260d5d9 Remove hybrid.py 2016-09-08 21:17:24 +05:30
Vinayak Mehta 98f47d1bd7 Fix table_bbox when no tarea is given 2016-09-05 21:26:16 +05:30
Vinayak Mehta d86630e70b Add table_area
[MRG] Add table_area
2016-09-05 18:51:59 +05:30
Vinayak Mehta 0bb6ce0bf9 CLI debug fix 2016-09-01 02:16:58 +05:30
Vinayak Mehta b2dd5f68fe Fix vertical text detection in cells
* Fix vertical text detection in cells

* Add Cell instance method

* Change var names
2016-09-01 01:42:27 +05:30
Vinayak Mehta 8d56f15130 Add negative tolerance 2016-08-31 22:25:33 +05:30
Vinayak Mehta 2a55621d05 Fix magic grid extension 2016-08-31 21:06:41 +05:30
Vinayak Mehta 552f9cf422 Add various metrics to score the quality of a parse
2016-08-30 14:52:49 +05:30
Vinayak Mehta 43a009dab4 Add flow images 2016-08-24 16:53:03 +05:30
Vinayak Mehta d834faeac8 Fix README
2016-08-09 18:36:43 +05:30
Vinayak Mehta 7e5804f87d Adds documentation
[MRG] Adds documentation
2016-08-09 17:23:50 +05:30
Vinayak Mehta dda809b286 Fix Makefile spaces to tabs 2016-08-08 17:26:54 +05:30
Vinayak Mehta 8ff04391b7 Add coveragerc and update Makefile 2016-08-08 17:24:13 +05:30