Commit Graph

44 Commits (934065ada6af6b29899786678bbf68f75e648859)

Author SHA1 Message Date
Vinayak Mehta ce727d9558 Fix split text bug 2019-03-22 02:28:29 +05:30
Vinayak Mehta 03f301b25c Add table regions support 2019-01-04 19:17:54 +05:30
Vinayak Mehta 9d90cadac0 Fix variable name 2019-01-03 15:47:05 +05:30
Vinayak Mehta f605bd8f94 Fix #239 2019-01-03 14:55:47 +05:30
Vinayak Mehta 62ed4753cd Make python2 compat 2018-12-24 13:10:48 +05:30
Vinayak Mehta 2b3461deab Add support to read from url 2018-12-24 12:55:52 +05:30
Vinayak Mehta 50b4468aff Rename kwargs and add tests 2018-12-21 15:09:37 +05:30
Vinayak Mehta f6aa21c31f Add strip_text 2018-12-20 16:32:16 +05:30
Vinayak Mehta ca6cefa362 Add extra_kwargs 2018-12-17 11:49:05 +05:30
Vinayak Mehta 5e71f0b0e6 Fix #192 2018-12-13 12:50:30 +05:30
Oshawk 90aaba6eec [MRG + 1] Make pep8 (#125)
* Make setup.py pep8

Add new line at end of file, fix bare except, remove unused import.

* Make tests/*.py pep8

Add some newlines at and of files and a visual indent.

* Make docs/*.py pep8

Fix block comments and add new lines at end of files.

* Make camelot/*.py pep8

Fixed unused import, a few weirdly ordered imports, a docstring typo and  many new lines at the end of lines.

* Fix imports

Fix import order and remove a couple more unused imports.

* Fix indents

Fix indentation (no opening delimiter alignment).

* Add newlines
2018-10-05 16:55:43 +05:30
Vinayak Mehta 6e8079df84
[MRG] Add tests for output formats and parser kwargs (#126)
* Remove unused image processing code

* Add opencv back-compat comment

* Add tests for parser special cases

* Fix lattice table area test

* Add tests for output format

* Add openpyxl dep
2018-10-05 16:15:30 +05:30
Vinayak Mehta c5bde5e2ad
[MRG] Add error/warning tests (#113)
* Add unknown flavor test

* Add input kwargs test

* Remove unused utils

* Add unsupported format test

* Add stream unequal tables-columns length test

* Add python3 compat

* Add no tables found test

* Convert util info log to warning
2018-10-02 19:28:42 +05:30
Vinayak Mehta fc0542bd3c
Add Python 3 compatibility (#109)
* Add python3 compat

* Update .gitignore

* Update .gitignore again

* Remove debugging return

* Add unicode_literals import

* Bump version

* Add python3-tk note
2018-09-28 21:58:29 +05:30
Vinayak Mehta 3170a9689f Add flavors 2018-09-23 10:53:32 +05:30
Vinayak Mehta 17ea5f335e Fix docstrings and interlinks 2018-09-11 08:31:37 +05:30
Vinayak Mehta 7bb1aee9b6 Add CLI 2018-09-10 15:16:41 +05:30
Vinayak Mehta d3beaafc99 Add temporary directory context manager 2018-09-09 18:10:55 +05:30
Vinayak Mehta 9a6ed555c8 Fix get_rotation 2018-09-09 10:04:54 +05:30
Vinayak Mehta 9878de4dfc Add docstrings and update docs 2018-09-09 10:00:22 +05:30
Vinayak Mehta 04383920b4 Rename parser keyword arguments 2018-09-08 05:38:43 +05:30
Vinayak Mehta b3f840bba9 Change utils function names 2018-09-07 06:04:45 +05:30
Vinayak Mehta 20acda2259 Fix current logging 2018-09-07 05:53:19 +05:30
Vinayak Mehta b91df8a1b8 Create parsers module 2018-09-06 06:13:58 +05:30
Vinayak Mehta 96af09d9cd Add BaseParser and refactor extract_tables 2018-09-06 05:28:34 +05:30
Vinayak Mehta a4d3165e94 Add docstring stubs 2018-09-05 19:35:46 +05:30
Vinayak Mehta bf63432494 Remove docstrings 2018-09-05 19:04:40 +05:30
Vinayak Mehta 9124e3374c Add properties to Table 2018-09-05 18:20:46 +05:30
Vinayak Mehta e252e476b9 Add better y-cuts detection 2017-04-25 18:44:53 +05:30
Vinayak Mehta 5c5bd6199c Fix warnings and exceptions 2017-04-21 14:20:33 +05:30
Vinayak Mehta 4da754ddcb [ENH] Add OCR and better joint detection
* Add iterations for dilation

* Add OCRLattice and OCRStream

* Add debug
2017-04-18 18:25:47 +05:30
Vinayak Mehta 72233f25ce Parameterize thresholding blocksize and constant 2017-04-10 21:15:54 +05:30
Vinayak Mehta bc86346154 Don't let processes modify instance attributes 2017-02-07 22:13:33 +05:30
Vinayak Mehta 70f626373b Cosmits
* Remove unnecessary kwargs

* Direct ghostscript call output to /dev/null

* Change char_margin's default value
2017-01-07 15:58:45 +05:30
Vinayak Mehta b01edee337 Handle rotation at entry 2016-10-18 15:33:38 +05:30
Vinayak Mehta 2a203a1865 Log warning when len(header) != len(cols) 2016-10-17 18:16:39 +05:30
Vinayak Mehta 40d30c1ab9 Add superscript and subscript flagging
* Add superscript flagging

* Add flagging param

* Add np.round to account for rotation error
2016-10-12 19:27:18 +05:30
Vinayak Mehta a43d5ca2c7 Replace chars with textlines
* Add split function

* Add split_text and shift_text params

* Change get_rotation

* Move get_column_index to utils

* Add split_text and shift_text

* Fix split_text
2016-10-12 13:17:02 +05:30
Vinayak Mehta 4b8e96a86a Update docs
* Update README

* Update index.rst

* Update docstrings

* Fix typo

* Edit docs

* Add error messages
2016-10-04 17:50:48 +05:30
Vinayak Mehta 79afb45e2e Support for vertical tables in Stream
* Change var names

* Add test pdf

* Add tests for Lattice rotation

* Add support for vertical tables in Stream, test pdfs

* Add tests for Stream rotation
2016-09-15 20:51:59 +05:30
Vinayak Mehta d86630e70b Add table_area
[MRG] Add table_area
2016-09-05 18:51:59 +05:30
Vinayak Mehta b2dd5f68fe Fix vertical text detection in cells
* Fix vertical text detection in cells

* Add Cell instance method

* Change var names
2016-09-01 01:42:27 +05:30
Vinayak Mehta 552f9cf422 Add various metrics to score the quality of a parse
Add various metrics to score the quality of a parse
2016-08-30 14:52:49 +05:30
Vinayak Mehta e9602bb353 Create python package
Add version support

Add new test file

[RFC] First phase

[RFC] Second phase

[RFC] Third phase

Add logging

Update README

Add debug

Add debug, fixes

Add pep8 changes

Add fix

Rename CLI tool

Add csv fix

Update README

Add fix for numpages

Update README

Update requirements.txt

Use yield

Add tuple unpacking fix

Fix n00b mistake

Add check for None

Fix check for None

Fix unicode

Add relative imports
2016-07-29 21:09:39 +05:30