Commit Graph

224 Commits (016776939ea2dc59da58f6b5e38fe079928985bb)

Author SHA1 Message Date
Frh 016776939e Plot improvements, address 132
Plot takes an optional axes parameter, allowing notebooks more
flexibility.
Header heuristic in hybrid won't include headers which span the
entire table.
Added unit test for issue #132

Fixes https://github.com/camelot-dev/camelot/issues/132
2020-04-25 20:51:00 -07:00
Frh 84ec5c6acd Rename member for clarity, fixed unit test
_textlines_alignments becomes _textline_to_alignments
2020-04-25 17:15:16 -07:00
Frh 22f4287788 Improve edgeplot for hybrid 2020-04-25 13:31:10 -07:00
Frh bb842f21b9 Further refactoring 2020-04-24 21:11:31 -07:00
Frh f42557ab8b Common parent TextBaseParser for Stream and Hybrid 2020-04-24 15:54:58 -07:00
Frh 5290fb6a7d Refactor out _text_bbox 2020-04-24 15:18:38 -07:00
Frh 8ad9e569cf Further simplification 2020-04-24 12:48:51 -07:00
Frh efe81292ca Enforce text_edge as subcase of text_alignment
TextNetworks is a list of TextAlignments
2020-04-24 12:42:13 -07:00
Frh 58b2c1d0fd Define TextEdge as a bounded TextAlignment 2020-04-23 18:26:55 -07:00
Frh 5db49d4fde More refactoring across stream and hybrid.
Stream now much faster, whole test is 72s instead of 92s
2020-04-23 14:42:13 -07:00
Frh adb14d3522 Refactoring TextEdges code across hybrid and stream 2020-04-23 12:55:09 -07:00
Frh 414708d8c7 Move generic code to utils 2020-04-22 19:08:06 -07:00
Frh 36d5a09ad6 Refactor common code hybrid / stream 2020-04-22 17:33:15 -07:00
Frh 489e996bd8 Address last unit test 2020-04-22 16:02:49 -07:00
Frh 7b0ac03f8e Prefer showing diffs at the row level 2020-04-22 14:50:45 -07:00
Frh df3d28837d Loosen cells header expansion algorithm
Accept cells if they're at least 50% within the table's bounds.
2020-04-22 14:24:47 -07:00
Frh 0be58de1cb Fix in table diff 2020-04-22 14:23:52 -07:00
Frh 9a82408a9a Prettier plotting, improve gaps calculation 2020-04-22 14:08:22 -07:00
Frh cd338ff4e2 Draw parse constraints for easier debug
* Display regions and areas rectangles
2020-04-21 14:24:44 -07:00
Frh ad27a11d35 Refactor code in plotting 2020-04-21 13:57:12 -07:00
Frh fb69bd9299 Improve hybrid plotting
* plot info passed through debug_info
* display each text edge
2020-04-20 16:54:06 -07:00
Frh 175655d31b Add support for region/area for hybrid 2020-04-20 11:20:59 -07:00
Frh 57c5957bad Interim check-in, test failing and lots of todos 2020-04-19 18:26:38 -07:00
Frh 69c7728867 More linting 2020-04-19 17:05:33 -07:00
Frh 89fe090ec4 Linting 2020-04-19 16:40:14 -07:00
Frh d520a77bb7 Initial Hybrid parser, for now identical to Stream 2020-04-19 16:27:01 -07:00
Frh 58823e57e9 More refactoring / linting 2020-04-19 15:41:45 -07:00
Frh c27a8026d6 More linting, refactor 2020-04-19 14:42:18 -07:00
Frh 50f11867af Lint, refactor 2020-04-19 14:30:32 -07:00
Frh cff7a9698b Further refactor
Move common parse error stats computation to base parser
Move copy_spanning_text logic to the table
2020-04-19 13:28:17 -07:00
Frh 583868756a Prep work for new hybrid parser introduction
Refactor parsers by moving common code to the base class
Maintain Python 3.5 compatibility by removing f"{}"
2020-04-19 11:32:22 -07:00
Frh 697289e409 Refactor base classes and improve plotting
Move common code to base class to reduce duplication
Stream plots display pdf background for better context
2020-04-18 23:03:27 -07:00
Frh 816471e426 Fix unit tests, lint, drop Python 2 support
Drop EOL Python 2 support. Resolve unit test discrepancies.
Update unit tests to pass in Travis across all supported Py.
Linting.
2020-04-18 17:25:47 -07:00
Dimiter Naydenov b2929a9e92 Merge pull request #34 from KOLANICH/win_ghostscript_callback_fix
Fixed calling convention of callback functions
2019-07-24 13:39:18 +03:00
KOLANICH 5687fbc8b2 Fixed calling convention of callback functions 2019-07-16 21:08:34 +03:00
KOLANICH 9e356b1b0a Fixed library discovery on Windows 2019-07-16 21:07:23 +03:00
Vinayak Mehta 0efb3ca1b0 Update HISTORY.md and bump version 2019-07-07 16:07:28 +05:30
Vinayak Mehta a97b50ef21 Update flavor kwargs 2019-07-06 22:59:51 +05:30
Dimiter Naydenov 0f8cda4793 Merge pull request #5 from camelot-dev/fix-cli-group-name
[MRG] No need to monkey-patch Click.HelpFormatter
2019-07-04 18:26:35 +03:00
Dimiter Naydenov 13616c2fb4 No need to monkey-patch Click.HelpFormatter 2019-07-04 13:13:32 +03:00
Dimiter Naydenov 240ea6c411 Fixed strip_text argument getting ignored 2019-07-04 12:12:52 +03:00
Vinayak Mehta 16ddd10644 Update image_processing.py 2019-07-04 00:06:46 +05:30
Vinayak Mehta 2115a0e177 Blacken code 2019-07-03 23:47:42 +05:30
Vinayak Mehta de3281c1b6 Add test 2019-05-27 22:18:23 +05:30
Vinayak Mehta b2a8348f13 Fix #312 2019-05-26 17:13:59 +05:30
Vinayak Mehta 355ae818a0 Merge branch 'master' into fix-split-bug 2019-04-20 21:06:47 +05:30
Vinayak Mehta ce727d9558 Fix split text bug 2019-03-22 02:28:29 +05:30
Sym Roe 8446271aa4 Always sort TableList after reading PDF 2019-02-25 09:48:47 +00:00
Sym Roe c019e582bf Add __lt__ to Table to allow sorting
Refs #277
2019-02-25 09:20:09 +00:00
yatintaluja 6c4b468800 Fix #245 2019-01-16 16:33:17 +05:30