75 Commits (6d62c849544f6dc635e63ac62efea505a047b1a0)

Author SHA1 Message Date
Frh 6d62c84954 Revert "Rename table_bbox (singular) to table_areas"
This reverts commit 49d3f0f3aa.
2020-04-10 17:41:55 -07:00
Frh 49d3f0f3aa Rename table_bbox (singular) to table_areas
The object is an index of bounding boxes, in some cases given by users.
It's called areas in one section of the code; this makes the naming systematic.
2020-04-10 16:34:30 -07:00
Frh 270c76a3e7 Pylinting of data file 2020-04-10 16:16:37 -07:00
Frh 467c4a3de0 Moved duplicated common code to base objects
* Move table initialization common areas to BaseParser
* Stop relying on intermediate file name for source page index
* Create table comparison utility function to help in debugging
* Generate pdf as images in stream mode plots
* Fix pylint errors
2020-04-10 16:02:00 -07:00
Frh dff9f5cd82 Further linting fixes flagged by DeepSource 2020-04-06 18:41:10 -07:00
Francois Huet 52e67a7f7c Fix plotting unit tests impacted by MatplotLib 2020-04-06 14:38:47 -07:00
Francois Huet f54e1563e1 Lint and address PDFMiner version impact on tests 2020-04-06 12:47:23 -07:00
Francois Huet f0b2cffb17 Replace constant padding with expansion heuristic
Fixed all unit tests.
Removed constant padding added around tables in the last step of the
initial discovery mode of the stream algorithm.
Replaced it with a heuristic that attempts to expand the table up while
respecting columns identified so far.
Updated unit tests to reflect new behavior, improved rejection of
extraneous information in a few cases.
Added unit test covering a use case where the header has vertical text.
Made improvements to better support vertical text in tables.
2020-04-05 17:05:06 -07:00
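The expansion heuristic described in f0b2cffb17 can be illustrated with a minimal sketch. This is not the code from the commit: the names `fits_columns` and `expand_up`, the span representation, and the stopping rule are all assumptions made for illustration. It assumes PDF-style coordinates (y grows upward) and stops at the first row whose text straddles a column boundary.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (x0, x1) horizontal extent of a text line or column


def fits_columns(row: List[Span], columns: List[Span]) -> bool:
    """True if every text span in the row lies inside a single column (hypothetical rule)."""
    return all(
        any(c0 <= x0 and x1 <= c1 for c0, c1 in columns)
        for x0, x1 in row
    )


def expand_up(top: float,
              rows_above: List[Tuple[float, List[Span]]],
              columns: List[Span]) -> float:
    """Return the new top edge after absorbing rows compatible with the columns.

    rows_above holds (row_top_y, spans) pairs located above the current top edge.
    """
    # Visit the nearest row first, then move further up the page.
    for row_top, spans in sorted(rows_above, key=lambda r: r[0]):
        if not fits_columns(spans, columns):
            break  # e.g. a heading paragraph spanning several columns
        top = row_top
    return top


columns = [(0.0, 10.0), (20.0, 30.0)]
rows = [
    (110.0, [(1.0, 9.0), (21.0, 29.0)]),  # header row aligned with both columns
    (120.0, [(2.0, 8.0)]),                # single cell in the first column
    (130.0, [(5.0, 25.0)]),               # straddles the column gap: stop here
    (140.0, [(1.0, 9.0)]),                # never reached
]
new_top = expand_up(100.0, rows, columns)
```

Compared with a constant padding, a rule of this kind grows the table only as far as the surrounding text agrees with the column structure, which matches the commit's stated goal of rejecting extraneous information.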
Francois Huet 00d5d2ede4 [WIP] Remove heuristic of 5* row height
Removed the heuristic that pads height by 5x the row height.
Updated the 4 unit tests that got better results based on it.
Still to do: fix the 6 unit tests that got broken, plus my new target.
2020-04-04 14:09:12 -07:00
Francois Huet 912efd2c9b Add failing test case for vertical headers
The unit test represents an issue I'm trying to address.
2020-04-04 13:45:16 -07:00
Francois Huet fafdebb02e Updated reference plots for unit tests to pass
Reviewed differences; all appear minor.
2020-04-04 13:39:26 -07:00
Francois Huet 0af212c483 Fix layout_kwargs unit test 2020-04-04 13:22:21 -07:00
Francois Huet 0f17658f48 Make unit test stream_split_text pass
TODO: the expectations of the test were and are still wrong.
It shouldn't include the header.
2020-04-04 12:44:51 -07:00
Francois Huet 73bedef7b7 Fix expectations for two tables unit test
Removed extraneous header and footer expectations.
Fixed a minor space discrepancy that's inconsequential.
TODO: the expectation of the test is still wrong. It shouldn't include the heading paragraph.
2020-04-04 12:38:45 -07:00
Francois Huet 50fea25567 Fix expectations for health pdf test.
What: Removed the page header from the test expectation.
Why: the page header isn't part of the table.
2020-04-04 12:19:36 -07:00
Milton Arango 8e28a0cac0 Moved the version tests to test_common PR #94
Applied black formatting
2019-11-14 20:26:20 -05:00
Milton Arango 0d1db4b09e Unit Tests for the Version Generation
Unit tests for the __version__.py generate_version method.
2019-10-26 15:41:41 -05:00
Joel Nothman 9eb15c09dc Use assert_frame_equal for more informative errors in tests 2019-08-06 11:38:44 +10:00
Dimiter Naydenov 0f8cda4793 Merge pull request #5 from camelot-dev/fix-cli-group-name
[MRG] No need to monkey-patch Click.HelpFormatter
2019-07-04 18:26:35 +03:00
Dimiter Naydenov 13616c2fb4 No need to monkey-patch Click.HelpFormatter 2019-07-04 13:13:32 +03:00
Dimiter Naydenov 240ea6c411 Fixed strip_text argument getting ignored 2019-07-04 12:12:52 +03:00
Vinayak Mehta 8866eaa3b6 Fix pytest deprecation warning 2019-07-03 22:07:10 +05:30
Vinayak Mehta 477568dea7 Fix test 2019-05-27 22:29:50 +05:30
Vinayak Mehta de3281c1b6 Add test 2019-05-27 22:18:23 +05:30
Vinayak Mehta 88466b8c4e Rename _mk_table to _make_table 2019-03-08 21:04:34 +05:30
Sym Roe c019e582bf Add __lt__ to Table to allow sorting
Refs #277
2019-02-25 09:20:09 +00:00
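As background for c019e582bf: in Python, defining `__lt__` is enough for `sorted()` and `list.sort()` to order instances. The sketch below uses a hypothetical stand-in class, not the library's `Table`; the `page` and `order` attributes and the page-then-order comparison are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical stand-in for a parsed table; not the library's class."""
    page: int   # assumed: page number the table was found on
    order: int  # assumed: position of the table within that page

    def __lt__(self, other: "Table") -> bool:
        # Compare by page first, then by position on the page.
        return (self.page, self.order) < (other.page, other.order)


tables = [Table(2, 1), Table(1, 2), Table(1, 1)]
ordered = sorted(tables)  # sorted() only needs __lt__
```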
Vinayak Mehta ab5391c76f Merge branch 'master' of github.com:socialcopsdev/camelot into replace-gs-c-api 2019-01-05 11:22:38 +05:30
Vinayak Mehta d064f716e9 Add lattice test 2019-01-04 20:22:14 +05:30
Vinayak Mehta 03f301b25c Add table regions support 2019-01-04 19:17:54 +05:30
Vinayak Mehta 605ffdd444 Add test 2019-01-03 16:13:41 +05:30
Vinayak Mehta 859610e0dc Add pages test 2019-01-02 16:35:49 +05:30
Vinayak Mehta 2b3461deab Add support to read from url 2018-12-24 12:55:52 +05:30
Vinayak Mehta 27fa226c71 Fix merge conflict 2018-12-22 11:07:24 +05:30
Vinayak Mehta be1f0a2884 Update advanced docs 2018-12-21 16:32:44 +05:30
Vinayak Mehta 50b4468aff Rename kwargs and add tests 2018-12-21 15:09:37 +05:30
Vinayak Mehta a38d52c7b2 Fix plot tests 2018-12-20 15:44:28 +05:30
Vinayak Mehta 17d48be46e Add test 2018-12-19 18:31:54 +05:30
Vinayak Mehta 4938c48853 Remove _errors and ghostscript test 2018-12-18 07:43:52 +05:30
Vinayak Mehta d83d5fae42 Fix tests
2018-12-13 16:06:48 +05:30
Vinayak Mehta ff4d8ce228 Add test for arabic 2018-12-13 13:13:07 +05:30
Vinayak Mehta 591cfd5291 Change kwarg name 2018-12-12 10:15:04 +05:30
Vinayak Mehta e50f9c8847 Change suppress_warnings to verbose 2018-12-12 09:58:34 +05:30
Vinayak Mehta b56d2246ad Add new plot type tests 2018-12-12 08:09:52 +05:30
Vinayak Mehta 619ce2e2a4 Fix grid plot baseline image 2018-12-07 20:22:56 +05:30
Vinayak Mehta 1f71513004 Fix no table found warning and add tests for two tables 2018-11-23 19:28:55 +05:30
Vinayak Mehta bf894116d2 Update test data 2018-11-23 04:25:04 +05:30
Vinayak Mehta db3f8c6897 [MRG] Make matplotlib optional (#190)
* Rename png files

* Convert plot to PlotMethods class and update docs

* Update test

* Update setup.py and docs

* Refactor PlotMethods

* Make matplotlib optional

* Raise ImportError in cli
2018-11-02 23:16:03 +05:30
Suyash Behera c0e9235164 [MRG + 1] Create a new figure and test each plot type #127 (#179)
* [MRG] Create a new figure and test each plot type #127

 - move `plot()` to `plotting.py` as `plot_pdf()`
 - modify plotting functions to return matplotlib figures
 - add `test_plotting.py` and baseline images
 - import `plot_pdf()` in `__init__`
 - update `cli.py` to use `plot_pdf()`
 - update advanced usage docs to reflect changes

* Change matplotlib backend for image comparison tests

* Update plotting and tests
 - use matplotlib rectangle instead of `cv2.rectangle` in
`plot_contour()`
 - set matplotlib backend in `tests/__init__`
 - update contour plot baseline image
 - update `test_plotting` with more checks

* Update plot tests and config
 - remove unnecessary asserts
 - update setup.cfg and makefile with `--mpl`

* Add  to

* Add tolerance

* remove text from baseline plots
update plot tests with `remove_text`

* Change method name, update docs and add pep8

* Update docs
2018-11-02 20:57:02 +05:30
rbares 429640feea [MRG + 1] Add basic support for encrypted PDF files (#180)
* [MRG] Add basic support for encrypted PDF files

Update API and CLI to accept ASCII passwords to decrypt PDFs
encrypted by algorithm code 1 or 2 (limited by support from PyPDF2).
Update documentation and unit tests accordingly.

Example document health_protected.pdf generated as follows:
qpdf --encrypt userpass ownerpass 128 -- health.pdf health_protected.pdf

Issue #162

* Support encrypted PDF files in python3

Issue #162

* Address review comments

Explicitly check passwords for None rather than falsey.
Correct read_pdf documentation for Owner/User password.

Issue #162

* Correct API documentation changes for consistency

Issue #162

* Move error tests from test_common to test_errors

Issue #162

* Add qpdf example

* Remove password is not None check

* Fix merge conflict

* Fix pages example
2018-10-28 22:01:10 +05:30
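One of the review comments in 429640feea asks for passwords to be checked against None rather than by truthiness. The sketch below shows why the two checks differ; the function names are hypothetical and this is not the code from the pull request (a later bullet in the same commit removes the check again).

```python
from typing import Optional


def password_given_falsey(password: Optional[str]) -> bool:
    # Truthiness check: treats "" the same as "no password given".
    return bool(password)


def password_given_explicit(password: Optional[str]) -> bool:
    # Explicit check: only None means "no password given".
    return password is not None


empty = ""
falsey_result = password_given_falsey(empty)      # empty password is ignored
explicit_result = password_given_explicit(empty)  # empty password is honored
```

The distinction matters whenever an empty string is a legitimate value, as can be the case for a PDF user password.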
Parth P Panchal 32df09ad1c Renames the keyword `table_area` to `table_areas` (#171)
`table_areas` sounds more apt since it is a list and there can be
multiple table areas on a page.

Closes #165
2018-10-24 23:06:53 +05:30