Commit Graph

23 Commits (7fae107560a92ef7130d279835122ad8e49fe80e)

Author SHA1 Message Date
Frh 4a761611bf WIP: Introduce actual hybrid parser
Create hybrid parser leverage both lattice and network techniques.
Simplify plotting of pdf in lattice case.
Rename "parser.table_bbox" into "parser.table_bbox_parses", since it
represents not a bbox but a dict of bbox to corresponding parsing data.

Still missing: more unit tests, plotting of steps.
2020-06-11 17:20:37 -07:00
Frh edad1efd1b Rename WIP parser "network", actual Hybrid to come 2020-06-11 17:20:37 -07:00
Frh 9e385bf8fc Fix plotting unit tests
Enforce order of textline plotting for unit test consistency in 3.6.
Create wrapper around camelot plot that enforces backwards consistency
with older versions of matplotlib.
2020-06-11 17:20:37 -07:00
Frh ada4809a59 Improve column detection for hybrid flavor
No longer rely on the mode but on the parsing analysis during network
detection.
Added unit test for complex table with vertical header and mixed
horizontal / vertical text.
2020-06-11 17:20:37 -07:00
Frh 21dc6a46a0 Improve hybrid table body discovery algo
While searching for table body boundaries, exclude rows that include
cells crossing previously discovered rows.
2020-06-11 17:20:37 -07:00
Frh a04e7702b2 Create notebook to help debug hybrid parser algo
Plot vertical col anchors found by hybrid parser
Include vertical text in col/row generation
2020-06-11 17:20:37 -07:00
Frh 8f5e2bba4d Prep for vertical text improvements
plot.text shows vertical text in red
_generate_columns_and_rows split between hybrid and stream
2020-06-11 17:20:37 -07:00
Frh dbaab66e43 Rename member for clarity, fixed unit test
_textlines_alignments becomes _textline_to_alignments
2020-06-11 17:20:36 -07:00
Frh a0e46916e2 Improve edgeplot for hybrid 2020-06-11 17:20:36 -07:00
Frh c9a73a1ad7 Further refactoring 2020-06-11 17:20:36 -07:00
Frh a401d33fd9 Refactor out _text_bbox 2020-06-11 17:20:36 -07:00
Frh 0b8aac977a Update test to reflect different order of edges 2020-06-11 17:20:36 -07:00
Frh 1a47c3df89 Prettier plotting, improve gaps calculation 2020-06-11 17:20:36 -07:00
Frh d2cf8520cb Draw parse constraints for easier debug
* Display regions and areas rectangles
2020-06-11 17:20:36 -07:00
Frh 1ccaa0630d Improve hybrid plotting
* plot info passed through debug_info
* display each text edge
2020-06-11 17:20:36 -07:00
Frh f9a6543c36 Initial Hybrid parser, for now identical to Stream 2020-06-11 17:20:36 -07:00
Frh 161f71230d Refactor base classes and improve plotting
Move common code to base class to reduce duplication
Stream plots display pdf background for better context
2020-06-11 17:20:36 -07:00
Frh bd2aab5b2d Fix unit tests, lint, drop Python 2 support
Drop EOL Python 2 support. Resolve unit test discrepancies.
Update unit tests to pass in Travis across all supported Py.
Linting.
2020-06-11 17:20:35 -07:00
Vinayak Mehta a38d52c7b2 Fix plot tests 2018-12-20 15:44:28 +05:30
Vinayak Mehta b56d2246ad Add new plot type tests 2018-12-12 08:09:52 +05:30
Vinayak Mehta 619ce2e2a4 Fix grid plot baseline image 2018-12-07 20:22:56 +05:30
Vinayak Mehta db3f8c6897
[MRG] Make matplotlib optional (#190)
* Rename png files

* Convert plot to PlotMethods class and update docs

* Update test

* Update setup.py and docs

* Refactor PlotMethods

* Make matplotlib optional

* Raise ImportError in cli
2018-11-02 23:16:03 +05:30
Suyash Behera c0e9235164 [MRG + 1] Create a new figure and test each plot type #127 (#179)
* [MRG] Create a new figure and test each plot type #127

 - move `plot()` to `plotting.py` as `plot_pdf()`
 - modify plotting functions to return matplotlib figures
 - add `test_plotting.py` and baseline images
 - import `plot_pdf()` in `__init__`
 - update `cli.py` to use `plot_pdf()`
 - update advanced usage docs to reflect changes

* Change matplotlib backend for image comparison tests

* Update plotting and tests
 - use matplotlib rectangle instead of `cv2.rectangle` in
`plot_contour()`
 - set matplotlib backend in `tests/__init__`
 - update contour plot baseline image
 - update `test_plotting` with more checks

* Update plot tests and config
 - remove unnecessary asserts
 - update setup.cfg and makefile with `--mpl`

* Add  to

* Add tolerance

* remove text from baseline plots
update plot tests with `remove_text`

* Change method name, update docs and add pep8

* Update docs
2018-11-02 20:57:02 +05:30