Commit Graph

87 Commits (77d289bd865fa6febcc042eb46a0bd4be7ce35b0)

Author SHA1 Message Date
Frh 77d289bd86 WIP: Introduce actual hybrid parser
Create hybrid parser leverage both lattice and network techniques.
Simplify plotting of pdf in lattice case.
Rename "parser.table_bbox" into "parser.table_bbox_parses", since it
represents not a bbox but a dict of bbox to corresponding parsing data.

Still missing: more unit tests, plotting of steps.
2020-05-04 16:27:01 -07:00
Frh 6711f877bf Rename WIP parser "network", actual Hybrid to come 2020-05-02 16:14:03 -07:00
Frh c7ab3a4c32 Raise tolerance of plot differences 2020-04-30 17:06:45 -07:00
Frh d663dd18fd Fix plotting unit tests
Enforce order of textline plotting for unit test consistency in 3.6.
Create wrapper around camelot plot that enforces backwards consistency
with older versions of matplotlib.
2020-04-30 16:54:37 -07:00
Frh f3aded5b17 Linting 2020-04-29 13:52:58 -07:00
Frh c0903b8ca9 Improve column detection for hybrid flavor
No longer rely on the mode but on the parsing analysis during network
detection.
Added unit test for complex table with vertical header and mixed
horizontal / vertical text.
2020-04-29 11:46:40 -07:00
Frh 918416e7e4 Improve hybrid table body discovery algo
While searching for table body boundaries, exclude rows that include
cells crossing previously discovered rows.
2020-04-28 22:43:55 -07:00
Frh 3220b02ebc Create notebook to help debug hybrid parser algo
Plot vertical col anchors found by hybrid parser
Include vertical text in col/row generation
2020-04-28 12:26:12 -07:00
Frh 6add19ae27 Prep for vertical text improvements
plot.text shows vertical text in red
_generate_columns_and_rows split between hybrid and stream
2020-04-28 11:46:12 -07:00
Frh 016776939e Plot improvements, address 132
Plot takes an optional axes parameter, allowing notebooks more
flexibility.
Header heuristic in hybrid won't include headers which span the
entire table.
Added unit test for issue #132

Fixes https://github.com/camelot-dev/camelot/issues/132
2020-04-25 20:51:00 -07:00
Frh 84ec5c6acd Rename member for clarity, fixed unit test
_textlines_alignments becomes _textline_to_alignments
2020-04-25 17:15:16 -07:00
Frh 22f4287788 Improve edgeplot for hybrid 2020-04-25 13:31:10 -07:00
Frh bb842f21b9 Further refactoring 2020-04-24 21:11:31 -07:00
Frh 5290fb6a7d Refactor out _text_bbox 2020-04-24 15:18:38 -07:00
Frh 3ea8d81900 Update test to reflect different order of edges 2020-04-23 14:45:35 -07:00
Frh ec0ca1e009 Unit test fixes 2020-04-22 15:36:37 -07:00
Frh 6962c714f9 Unit test fix 2020-04-22 14:50:59 -07:00
Frh fab13ee5b8 Unit test fix 2020-04-22 14:25:03 -07:00
Frh 9a82408a9a Prettier plotting, improve gaps calculation 2020-04-22 14:08:22 -07:00
Frh cd338ff4e2 Draw parse constraints for easier debug
* Display regions and areas rectangles
2020-04-21 14:24:44 -07:00
Frh fb69bd9299 Improve hybrid plotting
* plot info passed through debug_info
* display each text edge
2020-04-20 16:54:06 -07:00
Frh 175655d31b Add support for region/area for hybrid 2020-04-20 11:20:59 -07:00
Frh 57c5957bad Interim check-in, test failing and lots of todos 2020-04-19 18:26:38 -07:00
Frh d0bd1cfd1f More linting 2020-04-19 17:35:19 -07:00
Frh 69c7728867 More linting 2020-04-19 17:05:33 -07:00
Frh e59b3f5efb Fix unit test 2020-04-19 16:38:25 -07:00
Frh d520a77bb7 Initial Hybrid parser, for now identical to Stream 2020-04-19 16:27:01 -07:00
Frh d673a3b6e0 Fix unit test with plotting 2020-04-19 15:07:59 -07:00
Frh 697289e409 Refactor base classes and improve plotting
Move common code to base class to reduce duplication
Stream plots display pdf background for better context
2020-04-18 23:03:27 -07:00
Frh 816471e426 Fix unit tests, lint, drop Python 2 support
Drop EOL Python 2 support. Resolve unit test discrepancies.
Update unit tests to pass in Travis across all supported Py.
Linting.
2020-04-18 17:25:47 -07:00
Dimiter Naydenov 0f8cda4793
Merge pull request #5 from camelot-dev/fix-cli-group-name
[MRG] No need to monkey-patch Click.HelpFormatter
2019-07-04 18:26:35 +03:00
Dimiter Naydenov 13616c2fb4 No need to monkey-patch Click.HelpFormatter 2019-07-04 13:13:32 +03:00
Dimiter Naydenov 240ea6c411 Fixed strip_text argument getting ignored 2019-07-04 12:12:52 +03:00
Vinayak Mehta 8866eaa3b6 Fix pytest deprecation warning 2019-07-03 22:07:10 +05:30
Vinayak Mehta 477568dea7 Fix test 2019-05-27 22:29:50 +05:30
Vinayak Mehta de3281c1b6 Add test 2019-05-27 22:18:23 +05:30
Vinayak Mehta 88466b8c4e
Rename _mk_table to _make_table 2019-03-08 21:04:34 +05:30
Sym Roe c019e582bf
Add __lt__ to Table to allow sorting
Refs #277
2019-02-25 09:20:09 +00:00
Vinayak Mehta ab5391c76f Merge branch 'master' of github.com:socialcopsdev/camelot into replace-gs-c-api 2019-01-05 11:22:38 +05:30
Vinayak Mehta d064f716e9 Add lattice test 2019-01-04 20:22:14 +05:30
Vinayak Mehta 03f301b25c Add table regions support 2019-01-04 19:17:54 +05:30
Vinayak Mehta 605ffdd444 Add test 2019-01-03 16:13:41 +05:30
Vinayak Mehta 859610e0dc Add pages test 2019-01-02 16:35:49 +05:30
Vinayak Mehta 2b3461deab Add support to read from url 2018-12-24 12:55:52 +05:30
Vinayak Mehta 27fa226c71 Fix merge conflict 2018-12-22 11:07:24 +05:30
Vinayak Mehta be1f0a2884 Update advanced docs 2018-12-21 16:32:44 +05:30
Vinayak Mehta 50b4468aff Rename kwargs and add tests 2018-12-21 15:09:37 +05:30
Vinayak Mehta a38d52c7b2 Fix plot tests 2018-12-20 15:44:28 +05:30
Vinayak Mehta 17d48be46e Add test 2018-12-19 18:31:54 +05:30
Vinayak Mehta 4938c48853 Remove _errors and ghostscript test 2018-12-18 07:43:52 +05:30