27 Commits (b43aca8ff523b8ab9014c516ade123ea1fb85d94)

Author SHA1 Message Date
Frh b43aca8ff5 Merge branch 'master' into hybrid-parser 2020-06-14 08:53:43 -07:00
Frh 9abdd00cec Enable process_background option for hybrid
Trim empty cols and lines
2020-06-11 17:20:37 -07:00
Frh 4a761611bf WIP: Introduce actual hybrid parser
Create a hybrid parser that leverages both lattice and network techniques.
Simplify plotting of PDFs in the lattice case.
Rename "parser.table_bbox" to "parser.table_bbox_parses", since it
represents not a bbox but a dict mapping each bbox to its parsing data.

Still missing: more unit tests, plotting of steps.
2020-06-11 17:20:37 -07:00
Frh edad1efd1b Rename WIP parser to "network"; actual hybrid parser to come 2020-06-11 17:20:37 -07:00
Frh ada4809a59 Improve column detection for hybrid flavor
No longer rely on the mode but on the parsing analysis during network
detection.
Added unit test for complex table with vertical header and mixed
horizontal / vertical text.
2020-06-11 17:20:37 -07:00
Frh 81de841ca0 Plot improvements, address #132
Plot takes an optional axes parameter, allowing notebooks more
flexibility.
Header heuristic in hybrid won't include headers which span the
entire table.
Added unit test for issue #132

Fixes https://github.com/camelot-dev/camelot/issues/132
2020-06-11 17:20:36 -07:00
Frh d3d625a08d Unit test fixes 2020-06-11 17:20:36 -07:00
Frh 13268beb6f Unit test fix 2020-06-11 17:20:36 -07:00
Frh 549ab0ebe6 Unit test fix 2020-06-11 17:20:36 -07:00
Frh 1a47c3df89 Prettier plotting, improve gaps calculation 2020-06-11 17:20:36 -07:00
Frh e0e3ff4e07 Add support for region/area for hybrid 2020-06-11 17:20:36 -07:00
Frh f5fe92c22e Interim check-in, test failing and lots of todos 2020-06-11 17:20:36 -07:00
Frh bd2aab5b2d Fix unit tests, lint, drop Python 2 support
Drop EOL Python 2 support. Resolve unit test discrepancies.
Update unit tests to pass in Travis across all supported Python versions.
Linting.
2020-06-11 17:20:35 -07:00
Vinayak Mehta f725f04223 Remove future imports 2020-05-24 17:33:13 +05:30
Vinayak Mehta 3afb72b872 Fix read_pdf(url) and test data 2020-05-24 17:26:52 +05:30
Dimiter Naydenov 240ea6c411 Fixed strip_text argument getting ignored 2019-07-04 12:12:52 +03:00
Vinayak Mehta d064f716e9 Add lattice test 2019-01-04 20:22:14 +05:30
Vinayak Mehta 50b4468aff Rename kwargs and add tests 2018-12-21 15:09:37 +05:30
Vinayak Mehta 17d48be46e Add test 2018-12-19 18:31:54 +05:30
Vinayak Mehta d83d5fae42 Fix tests
2018-12-13 16:06:48 +05:30
Vinayak Mehta ff4d8ce228 Add test for arabic 2018-12-13 13:13:07 +05:30
Vinayak Mehta 1f71513004 Fix no table found warning and add tests for two tables 2018-11-23 19:28:55 +05:30
Vinayak Mehta bf894116d2 Update test data 2018-11-23 04:25:04 +05:30
Parth P Panchal 32df09ad1c Renames the keyword `table_area` to `table_areas` (#171)
`table_areas` sounds more apt since it is a list and there can be
multiple table areas on a page.

Closes #165
2018-10-24 23:06:53 +05:30
Oshawk 90aaba6eec [MRG + 1] Make pep8 (#125)
* Make setup.py pep8

Add new line at end of file, fix bare except, remove unused import.

* Make tests/*.py pep8

Add some newlines at end of files and a visual indent.

* Make docs/*.py pep8

Fix block comments and add new lines at end of files.

* Make camelot/*.py pep8

Fixed unused import, a few weirdly ordered imports, a docstring typo and many new lines at the end of lines.

* Fix imports

Fix import order and remove a couple more unused imports.

* Fix indents

Fix indentation (no opening delimiter alignment).

* Add newlines
2018-10-05 16:55:43 +05:30
Vinayak Mehta 6e8079df84 [MRG] Add tests for output formats and parser kwargs (#126)
* Remove unused image processing code

* Add opencv back-compat comment

* Add tests for parser special cases

* Fix lattice table area test

* Add tests for output format

* Add openpyxl dep
2018-10-05 16:15:30 +05:30
Vinayak Mehta 9537143fe0 Add pytest-cov
Add fix for coverage

Add source and omit to coveragerc

Update coveragerc

Update coveragerc

Add source to coveragerc

Update coveragerc source

Add init to tests

Fix ImportError

Fix ImportError again
2018-10-02 22:37:38 +05:30