Commit Graph

6 Commits (77d289bd865fa6febcc042eb46a0bd4be7ce35b0)

Author SHA1 Message Date
Frh 77d289bd86 WIP: Introduce actual hybrid parser
Create a hybrid parser that leverages both lattice and network techniques.
Simplify plotting of the PDF in the lattice case.
Rename "parser.table_bbox" to "parser.table_bbox_parses", since it
represents not a bbox but a dict mapping each bbox to its parsing data.

Still missing: more unit tests, plotting of steps.
2020-05-04 16:27:01 -07:00
Frh 6711f877bf Rename WIP parser to "network"; actual hybrid parser to come 2020-05-02 16:14:03 -07:00
Frh c0903b8ca9 Improve column detection for hybrid flavor
No longer rely on the mode but on the parsing analysis during network
detection.
Add unit test for a complex table with a vertical header and mixed
horizontal / vertical text.
2020-04-29 11:46:40 -07:00
Frh 918416e7e4 Improve hybrid table body discovery algo
While searching for table body boundaries, exclude rows that include
cells crossing previously discovered rows.
2020-04-28 22:43:55 -07:00
Frh 3220b02ebc Create notebook to help debug hybrid parser algo
Plot vertical col anchors found by hybrid parser
Include vertical text in col/row generation
2020-04-28 12:26:12 -07:00
Frh a2c5ee7f06 Add parser comparison notebook 2020-04-25 21:55:21 -07:00