Commit Graph

197 Commits (6d62c849544f6dc635e63ac62efea505a047b1a0)

Author SHA1 Message Date
Frh 6d62c84954 Revert "Rename table_bbox (singular) to table_areas"
This reverts commit 49d3f0f3aa.
2020-04-10 17:41:55 -07:00
Frh 49d3f0f3aa Rename table_bbox (singular) to table_areas
The object is an index of bounding boxes, in some cases given by users.
It's already called areas in one section of the code; this rename makes the naming systematic.
2020-04-10 16:34:30 -07:00
Frh 467c4a3de0 Moved duplicated common code to base objects
* Move table initialization common areas to BaseParser
* Stop relying on intermediate file name for source page index
* Create table comparison utility function to help in debugging
* Generate pdf as images in stream mode plots
* Fix pylint errors
2020-04-10 16:02:00 -07:00
Francois Huet f54e1563e1 Lint and address PDFMiner version impact on tests 2020-04-06 12:47:23 -07:00
Francois Huet f0b2cffb17 Replace constant padding with expansion heuristic
Fixed all unit tests.
Removed constant padding added around tables in the last step of the
initial discovery mode of the stream algorithm.
Replaced it with a heuristic that attempts to expand the table up while
respecting columns identified so far.
Updated unit tests to reflect new behavior, improved rejection of
extraneous information in a few cases.
Added unit test covering a use case where the header has vertical text.
Made improvements to better support vertical text in tables.
2020-04-05 17:05:06 -07:00
Francois Huet 00d5d2ede4 [WIP] Remove heuristic of 5* row height
Removed the heuristic that pads height by 5x the row height.
Updated the 4 unit tests that got better results based on it.
Still to do: fix the 6 unit tests that got broken, plus my new target.
2020-04-04 14:09:12 -07:00
Dimiter Naydenov b2929a9e92 Merge pull request #34 from KOLANICH/win_ghostscript_callback_fix
Fixed calling convention of callback functions
2019-07-24 13:39:18 +03:00
KOLANICH 5687fbc8b2 Fixed calling convention of callback functions 2019-07-16 21:08:34 +03:00
KOLANICH 9e356b1b0a Fixed library discovery on Windows 2019-07-16 21:07:23 +03:00
Vinayak Mehta 0efb3ca1b0 Update HISTORY.md and bump version 2019-07-07 16:07:28 +05:30
Vinayak Mehta a97b50ef21 Update flavor kwargs 2019-07-06 22:59:51 +05:30
Dimiter Naydenov 0f8cda4793 Merge pull request #5 from camelot-dev/fix-cli-group-name
[MRG] No need to monkey-patch Click.HelpFormatter
2019-07-04 18:26:35 +03:00
Dimiter Naydenov 13616c2fb4 No need to monkey-patch Click.HelpFormatter 2019-07-04 13:13:32 +03:00
Dimiter Naydenov 240ea6c411 Fixed strip_text argument getting ignored 2019-07-04 12:12:52 +03:00
Vinayak Mehta 16ddd10644 Update image_processing.py 2019-07-04 00:06:46 +05:30
Vinayak Mehta 2115a0e177 Blacken code 2019-07-03 23:47:42 +05:30
Vinayak Mehta de3281c1b6 Add test 2019-05-27 22:18:23 +05:30
Vinayak Mehta b2a8348f13 Fix #312 2019-05-26 17:13:59 +05:30
Vinayak Mehta 355ae818a0 Merge branch 'master' into fix-split-bug 2019-04-20 21:06:47 +05:30
Vinayak Mehta ce727d9558 Fix split text bug 2019-03-22 02:28:29 +05:30
Sym Roe 8446271aa4 Always sort TableList after reading PDF 2019-02-25 09:48:47 +00:00
Sym Roe c019e582bf Add __lt__ to Table to allow sorting
Refs #277
2019-02-25 09:20:09 +00:00
yatintaluja 6c4b468800 Fix #245 2019-01-16 16:33:17 +05:30
yatintaluja 5330620ea2 Bump version 2019-01-16 16:30:05 +05:30
Vinayak Mehta 45ae980988 Bump version 2019-01-06 13:00:08 +05:30
Vinayak Mehta 215e5ea2a5 Move ghostscript import 2019-01-06 01:50:54 +05:30
Vinayak Mehta 9d38b2f5af Bump version 2019-01-05 13:23:31 +05:30
Vinayak Mehta ab5391c76f Merge branch 'master' of github.com:socialcopsdev/camelot into replace-gs-c-api 2019-01-05 11:22:38 +05:30
Vinayak Mehta 506cec7f6b Add sqlite support 2019-01-05 01:50:27 +05:30
Vinayak Mehta f94777038a Update stream table regions logic 2019-01-04 20:27:53 +05:30
Vinayak Mehta eaca147b9d Apply mask at threshold level 2019-01-04 20:15:41 +05:30
Vinayak Mehta 03f301b25c Add table regions support 2019-01-04 19:17:54 +05:30
Vinayak Mehta 605ffdd444 Add test 2019-01-03 16:13:41 +05:30
Vinayak Mehta 9d90cadac0 Fix variable name 2019-01-03 15:47:05 +05:30
Vinayak Mehta f605bd8f94 Fix #239 2019-01-03 14:55:47 +05:30
Vinayak Mehta 7a0acd7929 Update CLI 2019-01-02 16:36:25 +05:30
Vinayak Mehta 859610e0dc Add pages test 2019-01-02 16:35:49 +05:30
Vinayak Mehta ea5747c5c4 Bump version 2018-12-24 15:51:29 +05:30
Vinayak Mehta 62ed4753cd Make python2 compat 2018-12-24 13:10:48 +05:30
Vinayak Mehta 2b3461deab Add support to read from url 2018-12-24 12:55:52 +05:30
Vinayak Mehta 27fa226c71 Fix merge conflict 2018-12-22 11:07:24 +05:30
Vinayak Mehta 50b4468aff Rename kwargs and add tests 2018-12-21 15:09:37 +05:30
Vinayak Mehta f6aa21c31f Add strip_text 2018-12-20 16:32:16 +05:30
Vinayak Mehta 3f5af18738 Add resolution 2018-12-20 15:01:29 +05:30
Vinayak Mehta e0090fbb0a Add edge close tolerance 2018-12-20 13:58:54 +05:30
Vinayak Mehta 48b2dce633 Update advanced docs 2018-12-19 18:19:39 +05:30
Vinayak Mehta 736fb25b56 Change gs resolution 2018-12-18 20:47:09 +05:30
Vinayak Mehta 9e79a795b8 Add GhostscriptError 2018-12-18 09:14:21 +05:30
Vinayak Mehta 4938c48853 Remove _errors and ghostscript test 2018-12-18 07:43:52 +05:30
Vinayak Mehta 9879a87c6f Add ghostscript 2018-12-17 19:09:57 +05:30