Commit Graph

24 Commits (6d62c849544f6dc635e63ac62efea505a047b1a0)

Author SHA1 Message Date
Frh 270c76a3e7 Pylinting of data file 2020-04-10 16:16:37 -07:00
Frh 467c4a3de0 Moved duplicated common code to base objects
* Move table initialization common areas to BaseParser
* Stop relying on intermediate file name for source page index
* Create table comparison utility function to help in debugging
* Generate pdf as images in stream mode plots
* Fix pylint errors
2020-04-10 16:02:00 -07:00
Frh dff9f5cd82 Further linting fixes flagged by DeepSource 2020-04-06 18:41:10 -07:00
Francois Huet 52e67a7f7c Fix plotting unit tests impacted by Matplotlib 2020-04-06 14:38:47 -07:00
Francois Huet f54e1563e1 Lint and address PDFMiner version impact on tests 2020-04-06 12:47:23 -07:00
Francois Huet f0b2cffb17 Replace constant padding with expansion heuristic
Fixed all unit tests.
Removed constant padding added around tables in the last step of the
initial discovery mode of the stream algorithm.
Replaced it with a heuristic that attempts to expand the table up while
respecting columns identified so far.
Updated unit tests to reflect new behavior, improved rejection of
extraneous information in a few cases.
Added unit test covering a use case where the header has vertical text.
Made improvements to better support vertical text in tables.
2020-04-05 17:05:06 -07:00
Francois Huet 00d5d2ede4 [WIP] Remove heuristic of 5* row height
Removed the heuristic that pads height by 5x the row height.
Updated the 4 unit tests that got better results based on it.
Still to do: fix the 6 unit tests that got broken, plus my new target.
2020-04-04 14:09:12 -07:00
Francois Huet 912efd2c9b Add failing test case for vertical headers
The unit test represents an issue I'm trying to address.
2020-04-04 13:45:16 -07:00
Francois Huet 0af212c483 Fix layout_kwargs unit test 2020-04-04 13:22:21 -07:00
Francois Huet 0f17658f48 Make unit test stream_split_text pass
TODO: the expectations of the test were and are still wrong.
It shouldn't include the header.
2020-04-04 12:44:51 -07:00
Francois Huet 73bedef7b7 Fix expectations for two tables unit test
Removed extraneous header and footer expectations.
Fixed a minor, inconsequential spacing discrepancy.
TODO: the expectation of the test is still wrong. It shouldn't include the heading paragraph.
2020-04-04 12:38:45 -07:00
Francois Huet 50fea25567 Fix expectations for health pdf test.
What: Removed the page header from the test expectation.
Why: the page header isn't part of the table.
2020-04-04 12:19:36 -07:00
Dimiter Naydenov 240ea6c411 Fixed strip_text argument getting ignored 2019-07-04 12:12:52 +03:00
Vinayak Mehta d064f716e9 Add lattice test 2019-01-04 20:22:14 +05:30
Vinayak Mehta 50b4468aff Rename kwargs and add tests 2018-12-21 15:09:37 +05:30
Vinayak Mehta 17d48be46e Add test 2018-12-19 18:31:54 +05:30
Vinayak Mehta d83d5fae42 Fix tests
2018-12-13 16:06:48 +05:30
Vinayak Mehta ff4d8ce228 Add test for Arabic 2018-12-13 13:13:07 +05:30
Vinayak Mehta 1f71513004 Fix no table found warning and add tests for two tables 2018-11-23 19:28:55 +05:30
Vinayak Mehta bf894116d2 Update test data 2018-11-23 04:25:04 +05:30
Parth P Panchal 32df09ad1c Renames the keyword `table_area` to `table_areas` (#171)
`table_areas` sounds more apt since it is a list and there can be
multiple table areas on a page.

Closes #165
2018-10-24 23:06:53 +05:30
Oshawk 90aaba6eec [MRG + 1] Make pep8 (#125)
* Make setup.py pep8

Add new line at end of file, fix bare except, remove unused import.

* Make tests/*.py pep8

Add some newlines at end of files and a visual indent.

* Make docs/*.py pep8

Fix block comments and add new lines at end of files.

* Make camelot/*.py pep8

Fixed an unused import, a few weirdly ordered imports, a docstring typo, and many newlines at the end of files.

* Fix imports

Fix import order and remove a couple more unused imports.

* Fix indents

Fix indentation (no opening delimiter alignment).

* Add newlines
2018-10-05 16:55:43 +05:30
Vinayak Mehta 6e8079df84 [MRG] Add tests for output formats and parser kwargs (#126)
* Remove unused image processing code

* Add opencv back-compat comment

* Add tests for parser special cases

* Fix lattice table area test

* Add tests for output format

* Add openpyxl dep
2018-10-05 16:15:30 +05:30
Vinayak Mehta 9537143fe0 Add pytest-cov
Add fix for coverage

Add source and omit to coveragerc

Update coveragerc

Update coveragerc

Add source to coveragerc

Update coveragerc source

Add init to tests

Fix ImportError

Fix ImportError again
2018-10-02 22:37:38 +05:30