Frh
1ccaa0630d
Improve hybrid plotting
...
* plot info passed through debug_info
* display each text edge
2020-06-11 17:20:36 -07:00
Frh
e0e3ff4e07
Add support for region/area for hybrid
2020-06-11 17:20:36 -07:00
Frh
f5fe92c22e
Interim check-in, test failing and lots of todos
2020-06-11 17:20:36 -07:00
Frh
c1c9358778
More linting
2020-06-11 17:20:36 -07:00
Frh
931b2f20f6
Try to silence bandit messages on valid asserts
2020-06-11 17:20:36 -07:00
Frh
878ef96fa7
More linting
2020-06-11 17:20:36 -07:00
Frh
07e2e1640d
Linting
2020-06-11 17:20:36 -07:00
Frh
e8e80a8cbb
Fix unit test
2020-06-11 17:20:36 -07:00
Frh
f9a6543c36
Initial Hybrid parser, for now identical to Stream
2020-06-11 17:20:36 -07:00
Frh
64576fd836
More refactoring / linting
2020-06-11 17:20:36 -07:00
Frh
8ed4cdf399
Fix unit test with plotting
2020-06-11 17:20:36 -07:00
Frh
f37ed50fed
More linting, refactor
2020-06-11 17:20:36 -07:00
Frh
20f18b478f
Lint, refactor
2020-06-11 17:20:36 -07:00
Frh
ff2ce6f47c
Further refactor
...
Move common parse error stats computation to base parser
Move copy_spanning_text logic to the table
2020-06-11 17:20:36 -07:00
Frh
37483ca202
Prep work for new hybrid parser introduction
...
Refactor parsers by moving common code to the base class
Maintain Python 3.5 compatibility by removing f"{}"
2020-06-11 17:20:36 -07:00
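
f-strings were introduced in Python 3.6, so code that must still run on 3.5 has to fall back to `str.format`. A minimal sketch of that kind of rewrite follows; the function name and the label layout are hypothetical and not taken from the camelot code base.

```python
def table_label(page, order):
    """Build a label for a table found on a page (hypothetical helper)."""
    # Python 3.6+ only: f"page-{page}-table-{order}"
    # Python 3.5 compatible equivalent:
    return "page-{}-table-{}".format(page, order)


if __name__ == "__main__":
    print(table_label(1, 2))
```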
Frh
161f71230d
Refactor base classes and improve plotting
...
Move common code to base class to reduce duplication
Stream plots display pdf background for better context
2020-06-11 17:20:36 -07:00
Frh
bd2aab5b2d
Fix unit tests, lint, drop Python 2 support
...
Drop EOL Python 2 support. Resolve unit test discrepancies.
Update unit tests to pass in Travis across all supported Python versions.
Linting.
2020-06-11 17:20:35 -07:00
Vinayak Mehta
5efbcdcebb
Update requirements.txt
2020-05-24 19:04:50 +05:30
Vinayak Mehta
189fe58bf2
Update requirements.txt
2020-05-24 19:01:03 +05:30
Vinayak Mehta
1575ec1bf0
Add .readthedocs.yml
2020-05-24 18:56:33 +05:30
Vinayak Mehta
d5d6a5962b
Bump version and update HISTORY.md
2020-05-24 18:36:13 +05:30
Vinayak Mehta
420d5aa624
Merge pull request #146 from camelot-dev/add-python38-travis
...
[MRG] Fix test data and drop python2 support
2020-05-24 18:31:27 +05:30
Vinayak Mehta
a22fa63c4e
Fix syntax errors
2020-05-24 18:19:48 +05:30
Vinayak Mehta
52b2a595b4
Add f-strings and remove python3.5 test job
2020-05-24 18:14:43 +05:30
Vinayak Mehta
afa1ba7c1f
Fix test indent
2020-05-24 17:38:48 +05:30
Vinayak Mehta
f725f04223
Remove future imports
2020-05-24 17:33:13 +05:30
Vinayak Mehta
3afb72b872
Fix read_pdf(url) and test data
2020-05-24 17:26:52 +05:30
Vinayak Mehta
6dd9b6ce01
Create FUNDING.yml
2020-05-24 16:14:43 +05:30
Vinayak Mehta
fc1b6f6227
Add python38 test job for travis
2020-05-24 15:27:48 +05:30
Frh
ba5169b33d
Enable process_background option for hybrid
...
Trim empty cols and lines
2020-05-08 15:08:12 -07:00
Frh
ae429fc248
Hybrid parser fixes
...
Improve parser comparison notebook to flag identical parses, display
multiple tables correctly
Fix tolerance parameter inclusion for hybrid.
2020-05-04 18:52:11 -07:00
Frh
79ea4adcd1
Add baseline test for hybrid
...
Fix first split merge issue
2020-05-04 17:41:57 -07:00
Frh
77d289bd86
WIP: Introduce actual hybrid parser
...
Create hybrid parser leveraging both lattice and network techniques.
Simplify plotting of pdf in lattice case.
Rename "parser.table_bbox" to "parser.table_bbox_parses", since it
represents not a bbox but a dict mapping each bbox to its parsing data.
Still missing: more unit tests, plotting of steps.
2020-05-04 16:27:01 -07:00
Frh
6711f877bf
Rename WIP parser "network", actual Hybrid to come
2020-05-02 16:14:03 -07:00
Frh
c7ab3a4c32
Raise tolerance of plot differences
2020-04-30 17:06:45 -07:00
Frh
d663dd18fd
Fix plotting unit tests
...
Enforce order of textline plotting for unit test consistency in 3.6.
Create wrapper around camelot plot that enforces backwards consistency
with older versions of matplotlib.
2020-04-30 16:54:37 -07:00
Frh
f3aded5b17
Linting
2020-04-29 13:52:58 -07:00
Frh
8a63e8e794
Minor linting
2020-04-29 12:31:02 -07:00
Frh
c0903b8ca9
Improve column detection for hybrid flavor
...
No longer rely on the mode but on the parsing analysis during network
detection.
Added unit test for complex table with vertical header and mixed
horizontal / vertical text.
2020-04-29 11:46:40 -07:00
Frh
04fc542dc3
Fix off by one error in column identification
2020-04-29 09:45:55 -07:00
Frh
918416e7e4
Improve hybrid table body discovery algo
...
While searching for table body boundaries, exclude rows that include
cells crossing previously discovered rows.
2020-04-28 22:43:55 -07:00
Frh
3220b02ebc
Create notebook to help debug hybrid parser algo
...
Plot vertical col anchors found by hybrid parser
Include vertical text in col/row generation
2020-04-28 12:26:12 -07:00
Frh
6add19ae27
Prep for vertical text improvements
...
plot.text shows vertical text in red
_generate_columns_and_rows split between hybrid and stream
2020-04-28 11:46:12 -07:00
Frh
c51c24a416
Linting
2020-04-25 22:47:23 -07:00
Frh
a2c5ee7f06
Add parser comparison notebook
2020-04-25 21:55:21 -07:00
Frh
30a0b2e4bc
Add parser comparison notebook to help with visualization
2020-04-25 21:55:01 -07:00
Frh
56dd31090c
Remove another f-string
2020-04-25 21:33:15 -07:00
Frh
2624010197
Remove f-strings, fix url based unit tests
...
f-strings fail unit tests in Python <3.7, replaced them with .format.
Made download_url send a Mozilla/5.0 User-Agent to restore unit tests, since
the targeted server was returning 403.
2020-04-25 21:14:56 -07:00
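
Some servers reject the default `urllib` User-Agent with a 403, which is the kind of failure the commit above works around. The sketch below shows one way to send a browser-like User-Agent using only the standard library. The helper names `build_request` and `fetch_bytes` are hypothetical, and this is not the actual body of camelot's `download_url`.

```python
from urllib.request import Request, urlopen


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request carrying a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_bytes(url, timeout=10):
    """Download the resource at `url` and return its raw bytes."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Keeping the request construction separate from the network call lets the header logic be checked in a unit test without touching a remote server.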
Frh
016776939e
Plot improvements, address 132
...
Plot takes an optional axes parameter, allowing notebooks more
flexibility.
Header heuristic in hybrid won't include headers which span the
entire table.
Added unit test for issue #132
Fixes https://github.com/camelot-dev/camelot/issues/132
2020-04-25 20:51:00 -07:00
Frh
84ec5c6acd
Rename member for clarity, fixed unit test
...
_textlines_alignments becomes _textline_to_alignments
2020-04-25 17:15:16 -07:00