Commit Graph

10 Commits (d392000a5f485b93a646bb7da9dbc9ede3af1bb8)

Author SHA1 Message Date
Frh 42f8321c8c Clean up notebooks, address review comments
* Improve explanations of network, hybrid, and lattice parsers
* Remove dead code from parser comparison notebook
* Clean up notebook variables to reduce size and make diffs cleaner
* Revert changes that were peripheral to the core changes
2020-07-03 18:28:24 -07:00
Frh 71805f9333 Fix issues following pass across most test cases
* Clean up the parser comparison notebook
* Address issue where hybrid didn't honor the columns parameter
* Fix dropping of empty rows/columns in hybrid
* Hybrid learns table y-dimensions from lattice
2020-06-16 13:04:53 -07:00
Frh 529ea36904 Updated comparison notebook 2020-06-11 17:20:37 -07:00
Frh 63adfd5468 Hybrid parser fixes
Improve parser comparison notebook to flag identical parses and display
multiple tables correctly.
Fix tolerance parameter inclusion for hybrid.
2020-06-11 17:20:37 -07:00
Frh 4a761611bf WIP: Introduce actual hybrid parser
Create hybrid parser leveraging both lattice and network techniques.
Simplify plotting of pdf in lattice case.
Rename "parser.table_bbox" to "parser.table_bbox_parses", since it
represents not a bbox but a dict mapping each bbox to its parsing data.

Still missing: more unit tests, plotting of steps.
2020-06-11 17:20:37 -07:00
Frh edad1efd1b Rename WIP parser to "network"; actual hybrid parser to come 2020-06-11 17:20:37 -07:00
Frh ada4809a59 Improve column detection for hybrid flavor
No longer rely on the mode but on the parsing analysis during network
detection.
Added unit test for complex table with vertical header and mixed
horizontal / vertical text.
2020-06-11 17:20:37 -07:00
Frh 21dc6a46a0 Improve hybrid table body discovery algo
While searching for table body boundaries, exclude rows that include
cells crossing previously discovered rows.
2020-06-11 17:20:37 -07:00
Frh a04e7702b2 Create notebook to help debug hybrid parser algo
Plot vertical col anchors found by hybrid parser
Include vertical text in col/row generation
2020-06-11 17:20:37 -07:00
Frh f7aafcd05c Add parser comparison notebook 2020-06-11 17:20:36 -07:00