23 Commits (79ea4adcd13e4169f61173825c36b0ca1779b9bf)

Author SHA1 Message Date
Frh 77d289bd86 WIP: Introduce actual hybrid parser
Create hybrid parser leveraging both lattice and network techniques.
Simplify plotting of pdf in lattice case.
Rename "parser.table_bbox" to "parser.table_bbox_parses", since it
represents not a bbox but a dict mapping each bbox to its parsing data.

Still missing: more unit tests, plotting of steps.
2020-05-04 16:27:01 -07:00
Frh 6711f877bf Rename WIP parser to "network"; actual Hybrid to come 2020-05-02 16:14:03 -07:00
Frh d663dd18fd Fix plotting unit tests
Enforce order of textline plotting for unit test consistency in Python 3.6.
Create wrapper around camelot plot that enforces backwards consistency
with older versions of matplotlib.
2020-04-30 16:54:37 -07:00
Frh c0903b8ca9 Improve column detection for hybrid flavor
No longer rely on the mode but on the parsing analysis during network
detection.
Add unit test for complex table with vertical header and mixed
horizontal / vertical text.
2020-04-29 11:46:40 -07:00
Frh 918416e7e4 Improve hybrid table body discovery algo
While searching for table body boundaries, exclude rows that include
cells crossing previously discovered rows.
2020-04-28 22:43:55 -07:00
Frh 3220b02ebc Create notebook to help debug hybrid parser algo
Plot vertical col anchors found by hybrid parser
Include vertical text in col/row generation
2020-04-28 12:26:12 -07:00
Frh 6add19ae27 Prep for vertical text improvements
plot.text shows vertical text in red
_generate_columns_and_rows split between hybrid and stream
2020-04-28 11:46:12 -07:00
Frh 84ec5c6acd Rename member for clarity, fix unit test
_textlines_alignments becomes _textline_to_alignments
2020-04-25 17:15:16 -07:00
Frh 22f4287788 Improve edgeplot for hybrid 2020-04-25 13:31:10 -07:00
Frh bb842f21b9 Further refactoring 2020-04-24 21:11:31 -07:00
Frh 5290fb6a7d Refactor out _text_bbox 2020-04-24 15:18:38 -07:00
Frh 3ea8d81900 Update test to reflect different order of edges 2020-04-23 14:45:35 -07:00
Frh 9a82408a9a Prettier plotting, improve gaps calculation 2020-04-22 14:08:22 -07:00
Frh cd338ff4e2 Draw parse constraints for easier debug
* Display regions and areas rectangles
2020-04-21 14:24:44 -07:00
Frh fb69bd9299 Improve hybrid plotting
* plot info passed through debug_info
* display each text edge
2020-04-20 16:54:06 -07:00
Frh d520a77bb7 Initial Hybrid parser, for now identical to Stream 2020-04-19 16:27:01 -07:00
Frh 697289e409 Refactor base classes and improve plotting
Move common code to base class to reduce duplication
Stream plots display pdf background for better context
2020-04-18 23:03:27 -07:00
Frh 816471e426 Fix unit tests, lint, drop Python 2 support
Drop EOL Python 2 support. Resolve unit test discrepancies.
Update unit tests to pass in Travis across all supported Python versions.
Linting.
2020-04-18 17:25:47 -07:00
Vinayak Mehta a38d52c7b2 Fix plot tests 2018-12-20 15:44:28 +05:30
Vinayak Mehta b56d2246ad Add new plot type tests 2018-12-12 08:09:52 +05:30
Vinayak Mehta 619ce2e2a4 Fix grid plot baseline image 2018-12-07 20:22:56 +05:30
Vinayak Mehta db3f8c6897 [MRG] Make matplotlib optional (#190)
* Rename png files

* Convert plot to PlotMethods class and update docs

* Update test

* Update setup.py and docs

* Refactor PlotMethods

* Make matplotlib optional

* Raise ImportError in cli
2018-11-02 23:16:03 +05:30
Suyash Behera c0e9235164 [MRG + 1] Create a new figure and test each plot type #127 (#179)
* [MRG] Create a new figure and test each plot type #127

 - move `plot()` to `plotting.py` as `plot_pdf()`
 - modify plotting functions to return matplotlib figures
 - add `test_plotting.py` and baseline images
 - import `plot_pdf()` in `__init__`
 - update `cli.py` to use `plot_pdf()`
 - update advanced usage docs to reflect changes

* Change matplotlib backend for image comparison tests

* Update plotting and tests
 - use matplotlib rectangle instead of `cv2.rectangle` in
`plot_contour()`
 - set matplotlib backend in `tests/__init__`
 - update contour plot baseline image
 - update `test_plotting` with more checks

* Update plot tests and config
 - remove unnecessary asserts
 - update setup.cfg and makefile with `--mpl`

* Add  to

* Add tolerance

* Remove text from baseline plots
 - update plot tests with `remove_text`

* Change method name, update docs and add pep8

* Update docs
2018-11-02 20:57:02 +05:30