195 Commits (cff7a9698b6462d140979d722fe738a0f16dbe3a)

Author SHA1 Message Date
Vinayak Mehta bef33c75b1 Fix ValueError 2017-04-21 20:15:35 +05:30
Vinayak Mehta fdb4b0d494 Update version 2017-04-21 15:41:32 +05:30
Vinayak Mehta 5c5bd6199c Fix warnings and exceptions 2017-04-21 14:20:33 +05:30
Vinayak Mehta 18e1a799a1 Remove remove_empty 2017-04-21 13:22:37 +05:30
Vinayak Mehta d28e4b8c1e Change default value for iterations 2017-04-21 13:20:48 +05:30
Vinayak Mehta 4da754ddcb [ENH] Add OCR and better joint detection 2017-04-18 18:25:47 +05:30
* Add iterations for dilation
* Add OCRLattice and OCRStream
* Add debug
Vinayak Mehta 7246e1a73d Parallelize pdf split 2017-04-11 18:30:05 +05:30
Vinayak Mehta 4a87a77003 Remove ncols 2017-04-11 15:50:12 +05:30
Vinayak Mehta 72233f25ce Parameterize thresholding blocksize and constant 2017-04-10 21:15:54 +05:30
Vinayak Mehta 84d354ba10 Add deepcopy and debug scripts 2017-04-10 18:59:48 +05:30
Vinayak Mehta 3eb18ef199 More logs 2017-02-07 22:23:05 +05:30
Vinayak Mehta bc86346154 Don't let processes modify instance attributes 2017-02-07 22:13:33 +05:30
Vinayak Mehta 970256e19d Add OCR support for image based pdfs with lines 2017-01-07 16:37:56 +05:30
* Cosmits
* Remove unnecessary kwargs
* Direct ghostscript call output to /dev/null
* Change char_margin's default value
* Add image attribute in Table and Cell
* Add OCR
* Fix coordinates
* Add table_area
* Add ocr options to cli
* Direct ghostscript call output to /dev/null
* Add ocr docstring
* Add requirements
* Update README
Vinayak Mehta 70f626373b Cosmits 2017-01-07 15:58:45 +05:30
* Remove unnecessary kwargs
* Direct ghostscript call output to /dev/null
* Change char_margin's default value
Vinayak Mehta bd1d57a561 Update version 2017-01-07 15:50:20 +05:30
Vinayak Mehta 10eda3f204 Deprecate Stream ncolumns 2016-11-07 21:30:48 +05:30
Vinayak Mehta 72c2a0020f Minor fix 2016-10-20 18:54:06 +05:30
Vinayak Mehta 5c6a74fb2a Add new params 2016-10-18 18:23:35 +05:30
Vinayak Mehta b01edee337 Handle rotation at entry 2016-10-18 15:33:38 +05:30
Vinayak Mehta 2a203a1865 Log warning when len(header) != len(cols) 2016-10-17 18:16:39 +05:30
Vinayak Mehta adb948d363 Fix column parameter 2016-10-13 16:54:45 +05:30
Vinayak Mehta 40d30c1ab9 Add superscript and subscript flagging 2016-10-12 19:27:18 +05:30
* Add superscript flagging
* Add flagging param
* Add np.round to account for rotation error
Vinayak Mehta e8b93a9624 Add headers param 2016-10-12 13:59:10 +05:30
Vinayak Mehta a43d5ca2c7 Replace chars with textlines 2016-10-12 13:17:02 +05:30
* Add split function
* Add split_text and shift_text params
* Change get_rotation
* Move get_column_index to utils
* Add split_text and shift_text
* Fix split_text
Vinayak Mehta 52a2876ab1 Fix tarea type conversion 2016-10-04 19:57:53 +05:30
Vinayak Mehta 4b8e96a86a Update docs 2016-10-04 17:50:48 +05:30
* Update README
* Update index.rst
* Update docstrings
* Fix typo
* Edit docs
* Add error messages
Vinayak Mehta d46eeeab1a Change jpg to png 2016-09-27 18:37:38 +05:30
Vinayak Mehta 75c7deffaa Minor Stream fix 2016-09-27 17:27:34 +05:30
Vinayak Mehta 79afb45e2e Support for vertical tables in Stream 2016-09-15 20:51:59 +05:30
* Change var names
* Add test pdf
* Add tests for Lattice rotation
* Add support for vertical tables in Stream, test pdfs
* Add tests for Stream rotation
Vinayak Mehta 8ce7b74671 Replace imagemagick with ghostscript 2016-09-13 17:35:07 +05:30
* Replace imagemagick with ghostscript
* Add quiet option
* Avoid repetition
* Remove Wand requirement
* Replace jpeg with png
Vinayak Mehta 757ba0444a Remove jtol 2016-09-13 17:28:21 +05:30
Vinayak Mehta 439059817d Update tests with new API 2016-09-09 16:56:25 +05:30
* Update Lattice tests with new API
* Update Stream tests with new API, fix CLI
* Add table_area test, Stream fixes
Vinayak Mehta a94c350a7b Fix param flow 2016-09-09 14:52:38 +05:30
* Fix param flow
* Add check for None
Vinayak Mehta 766260d5d9 Remove hybrid.py 2016-09-08 21:17:24 +05:30
Vinayak Mehta 98f47d1bd7 Fix table_bbox when no tarea is given 2016-09-05 21:26:16 +05:30
Vinayak Mehta d86630e70b Add table_area 2016-09-05 18:51:59 +05:30
[MRG] Add table_area
Vinayak Mehta b2dd5f68fe Fix vertical text detection in cells 2016-09-01 01:42:27 +05:30
* Fix vertical text detection in cells
* Add Cell instance method
* Change var names
Vinayak Mehta 8d56f15130 Add negative tolerance 2016-08-31 22:25:33 +05:30
Vinayak Mehta 2a55621d05 Fix magic grid extension 2016-08-31 21:06:41 +05:30
Vinayak Mehta 552f9cf422 Add various metrics to score the quality of a parse 2016-08-30 14:52:49 +05:30
Add various metrics to score the quality of a parse
Vinayak Mehta 7e5804f87d Adds documentation 2016-08-09 17:23:50 +05:30
[MRG] Adds documentation
Vinayak Mehta 13568865b5 Add verbose 2016-08-03 13:14:19 +05:30
Vinayak Mehta 57917426e8 Fix docstrings 2016-08-03 13:14:11 +05:30
Vinayak Mehta 050107b63d Minor fix 2016-07-29 21:47:20 +05:30
Vinayak Mehta e9602bb353 Create python package 2016-07-29 21:09:39 +05:30
Add version support
Add new test file
[RFC] First phase
[RFC] Second phase
[RFC] Third phase
Add logging
Update README
Add debug
Add debug, fixes
Add pep8 changes
Add fix
Rename CLI tool
Add csv fix
Update README
Add fix for numpages
Update README
Update requirements.txt
Use yield
Add tuple unpacking fix
Fix n00b mistake
Add check for None
Fix check for None
Fix unicode
Add relative imports