Commit Graph

22 Commits (26240101974cde1dc5aa87c61e7a58a2804fe401)

Author SHA1 Message Date
Frh 2624010197 Remove f-strings, fix url based unit tests
f-strings fail unit tests in Python <3.7, removed them for .format.
Made download_url send a Mozilla/5.0 User-Agent to restore unit tests, since
the targeted server was returning 403.
2020-04-25 21:14:56 -07:00
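
The two changes in this commit can be sketched with the standard library alone. This is a minimal illustration, not the project's code: `temp_pdf_name` and `build_request` are hypothetical helpers and the URL is a placeholder. Only the `Mozilla/5.0` User-Agent value and the switch from f-strings to `.format` come from the commit message.

```python
from urllib.request import Request


def temp_pdf_name(stem):
    # str.format instead of an f-string, so the module still parses on
    # interpreters that predate f-string support.
    return "{}.pdf".format(stem)


def build_request(url):
    # Hypothetical helper: attach a browser-like User-Agent, since some
    # servers answer 403 to urllib's default agent string.
    headers = {"User-Agent": "Mozilla/5.0"}
    return Request(url, None, headers)


request = build_request("https://example.com/sample.pdf")
# Pass `request` to urllib.request.urlopen(...) to perform the download.
```

Building the `Request` separately from opening it keeps the header logic testable without network access.
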
Frh 016776939e Plot improvements, address 132
Plot takes an optional axes parameter, allowing notebooks more
flexibility.
Header heuristic in hybrid won't include headers which span the
entire table.
Added unit test for issue #132

Fixes https://github.com/camelot-dev/camelot/issues/132
2020-04-25 20:51:00 -07:00
Frh 84ec5c6acd Rename member for clarity, fixed unit test
_textlines_alignments becomes _textline_to_alignments
2020-04-25 17:15:16 -07:00
Frh 22f4287788 Improve edgeplot for hybrid 2020-04-25 13:31:10 -07:00
Frh bb842f21b9 Further refactoring 2020-04-24 21:11:31 -07:00
Frh f42557ab8b Common parent TextBaseParser for Stream and Hybrid 2020-04-24 15:54:58 -07:00
Frh 5290fb6a7d Refactor out _text_bbox 2020-04-24 15:18:38 -07:00
Frh 8ad9e569cf Further simplification 2020-04-24 12:48:51 -07:00
Frh efe81292ca Enforce text_edge as subcase of text_alignment
TextNetworks is a list of TextAlignments
2020-04-24 12:42:13 -07:00
Frh 58b2c1d0fd Define TextEdge as a bounded TextAlignment 2020-04-23 18:26:55 -07:00
Frh 5db49d4fde More refactoring across stream and hybrid.
Stream now much faster, whole test is 72s instead of 92s
2020-04-23 14:42:13 -07:00
Frh adb14d3522 Refactoring TextEdges code across hybrid and stream 2020-04-23 12:55:09 -07:00
Frh 414708d8c7 Move generic code to utils 2020-04-22 19:08:06 -07:00
Frh 36d5a09ad6 Refactor common code hybrid / stream 2020-04-22 17:33:15 -07:00
Frh 489e996bd8 Address last unit test 2020-04-22 16:02:49 -07:00
Frh df3d28837d Loosen cells header expansion algorithm
Accept cells if they're at least 50% within the table's bounds.
2020-04-22 14:24:47 -07:00
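
The 50% rule in this commit could look roughly like the sketch below. It assumes that "within the table's bounds" is measured as a share of the cell's area and that boxes are `(x0, y0, x1, y1)` tuples. Both are assumptions, as are the names `fraction_within` and `accept_cell`; the actual parser may measure overlap along a single axis instead.

```python
def fraction_within(cell, table):
    """Return the share of the cell's area that lies inside the table.

    Both arguments are (x0, y0, x1, y1) tuples with x0 <= x1 and y0 <= y1.
    """
    cx0, cy0, cx1, cy1 = cell
    tx0, ty0, tx1, ty1 = table
    # Width and height of the intersection, clamped at zero when disjoint.
    overlap_w = max(0.0, min(cx1, tx1) - max(cx0, tx0))
    overlap_h = max(0.0, min(cy1, ty1) - max(cy0, ty0))
    cell_area = (cx1 - cx0) * (cy1 - cy0)
    if cell_area <= 0:
        return 0.0
    return (overlap_w * overlap_h) / cell_area


def accept_cell(cell, table, threshold=0.5):
    # Keep the cell when at least `threshold` of it overlaps the table.
    return fraction_within(cell, table) >= threshold
```

With a table at `(0, 0, 100, 100)`, a cell at `(80, 10, 120, 30)` has exactly half its area inside and is accepted, while a cell at `(90, 10, 130, 30)` has only a quarter inside and is rejected.
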
Frh 9a82408a9a Prettier plotting, improve gaps calculation 2020-04-22 14:08:22 -07:00
Frh fb69bd9299 Improve hybrid plotting
* plot info passed through debug_info
* display each text edge
2020-04-20 16:54:06 -07:00
Frh 175655d31b Add support for region/area for hybrid 2020-04-20 11:20:59 -07:00
Frh 57c5957bad Interim check-in, test failing and lots of todos 2020-04-19 18:26:38 -07:00
Frh 89fe090ec4 Linting 2020-04-19 16:40:14 -07:00
Frh d520a77bb7 Initial Hybrid parser, for now identical to Stream 2020-04-19 16:27:01 -07:00