Frh
4b3eee4b05
Linting
2020-06-11 17:20:37 -07:00
Frh
55fd459634
Minor linting
2020-06-11 17:20:37 -07:00
Frh
ada4809a59
Improve column detection for hybrid flavor
...
No longer rely on the mode but on the parsing analysis during network
detection.
Added unit test for complex table with vertical header and mixed
horizontal / vertical text.
2020-06-11 17:20:37 -07:00
Frh
e31e978ebe
Fix off-by-one error in column identification
2020-06-11 17:20:37 -07:00
Frh
21dc6a46a0
Improve hybrid table body discovery algo
...
While searching for table body boundaries, exclude rows that include
cells crossing previously discovered rows.
2020-06-11 17:20:37 -07:00
Frh
e1572a10c9
Linting
2020-06-11 17:20:36 -07:00
Frh
9eb4f65fc9
Remove f-strings, fix url based unit tests
...
f-strings fail unit tests in Python <3.7; replaced them with .format.
Made download_url identify as Mozilla/5.0 to restore unit tests, since
the targeted server was returning 403.
2020-06-11 17:20:36 -07:00
Frh
a401d33fd9
Refactor out _text_bbox
2020-06-11 17:20:36 -07:00
Frh
7ad5b843ab
Move generic code to utils
2020-06-11 17:20:36 -07:00
Frh
14cd328644
Refactor common code hybrid / stream
2020-06-11 17:20:36 -07:00
Frh
db645627ff
Prefer showing diffs at the row level
2020-06-11 17:20:36 -07:00
Frh
a2a831110e
Fix in table diff
2020-06-11 17:20:36 -07:00
Frh
1a47c3df89
Prettier plotting, improve gaps calculation
2020-06-11 17:20:36 -07:00
Frh
e0e3ff4e07
Add support for region/area for hybrid
2020-06-11 17:20:36 -07:00
Frh
64576fd836
More refactoring / linting
2020-06-11 17:20:36 -07:00
Frh
f37ed50fed
More linting, refactor
2020-06-11 17:20:36 -07:00
Frh
20f18b478f
Lint, refactor
2020-06-11 17:20:36 -07:00
Frh
37483ca202
Prep work for new hybrid parser introduction
...
Refactor parsers by moving common code to the base class
Maintain Python 3.5 compatibility by removing f"{}"
2020-06-11 17:20:36 -07:00
Frh
161f71230d
Refactor base classes and improve plotting
...
Move common code to base class to reduce duplication
Stream plots display pdf background for better context
2020-06-11 17:20:36 -07:00
Frh
bd2aab5b2d
Fix unit tests, lint, drop Python 2 support
...
Drop EOL Python 2 support. Resolve unit test discrepancies.
Update unit tests to pass in Travis across all supported Python versions.
Linting.
2020-06-11 17:20:35 -07:00
Vinayak Mehta
a97b50ef21
Update flavor kwargs
2019-07-06 22:59:51 +05:30
Dimiter Naydenov
240ea6c411
Fixed strip_text argument getting ignored
2019-07-04 12:12:52 +03:00
Vinayak Mehta
2115a0e177
Blacken code
2019-07-03 23:47:42 +05:30
Vinayak Mehta
ce727d9558
Fix split text bug
2019-03-22 02:28:29 +05:30
Vinayak Mehta
03f301b25c
Add table regions support
2019-01-04 19:17:54 +05:30
Vinayak Mehta
9d90cadac0
Fix variable name
2019-01-03 15:47:05 +05:30
Vinayak Mehta
f605bd8f94
Fix #239
2019-01-03 14:55:47 +05:30
Vinayak Mehta
62ed4753cd
Make python2 compat
2018-12-24 13:10:48 +05:30
Vinayak Mehta
2b3461deab
Add support to read from url
2018-12-24 12:55:52 +05:30
Vinayak Mehta
50b4468aff
Rename kwargs and add tests
2018-12-21 15:09:37 +05:30
Vinayak Mehta
f6aa21c31f
Add strip_text
2018-12-20 16:32:16 +05:30
Vinayak Mehta
ca6cefa362
Add extra_kwargs
2018-12-17 11:49:05 +05:30
Vinayak Mehta
5e71f0b0e6
Fix #192
2018-12-13 12:50:30 +05:30
Oshawk
90aaba6eec
[MRG + 1] Make pep8 (#125)
...
* Make setup.py pep8
Add new line at end of file, fix bare except, remove unused import.
* Make tests/*.py pep8
Add some newlines at end of files and a visual indent.
* Make docs/*.py pep8
Fix block comments and add new lines at end of files.
* Make camelot/*.py pep8
Fixed unused import, a few weirdly ordered imports, a docstring typo and many new lines at the end of lines.
* Fix imports
Fix import order and remove a couple more unused imports.
* Fix indents
Fix indentation (no opening delimiter alignment).
* Add newlines
2018-10-05 16:55:43 +05:30
Vinayak Mehta
6e8079df84
[MRG] Add tests for output formats and parser kwargs (#126)
...
* Remove unused image processing code
* Add opencv back-compat comment
* Add tests for parser special cases
* Fix lattice table area test
* Add tests for output format
* Add openpyxl dep
2018-10-05 16:15:30 +05:30
Vinayak Mehta
c5bde5e2ad
[MRG] Add error/warning tests (#113)
...
* Add unknown flavor test
* Add input kwargs test
* Remove unused utils
* Add unsupported format test
* Add stream unequal tables-columns length test
* Add python3 compat
* Add no tables found test
* Convert util info log to warning
2018-10-02 19:28:42 +05:30
Vinayak Mehta
fc0542bd3c
Add Python 3 compatibility (#109)
...
* Add python3 compat
* Update .gitignore
* Update .gitignore again
* Remove debugging return
* Add unicode_literals import
* Bump version
* Add python3-tk note
2018-09-28 21:58:29 +05:30
Vinayak Mehta
3170a9689f
Add flavors
2018-09-23 10:53:32 +05:30
Vinayak Mehta
17ea5f335e
Fix docstrings and interlinks
2018-09-11 08:31:37 +05:30
Vinayak Mehta
7bb1aee9b6
Add CLI
2018-09-10 15:16:41 +05:30
Vinayak Mehta
d3beaafc99
Add temporary directory context manager
2018-09-09 18:10:55 +05:30
Vinayak Mehta
9a6ed555c8
Fix get_rotation
2018-09-09 10:04:54 +05:30
Vinayak Mehta
9878de4dfc
Add docstrings and update docs
2018-09-09 10:00:22 +05:30
Vinayak Mehta
04383920b4
Rename parser keyword arguments
2018-09-08 05:38:43 +05:30
Vinayak Mehta
b3f840bba9
Change utils function names
2018-09-07 06:04:45 +05:30
Vinayak Mehta
20acda2259
Fix current logging
2018-09-07 05:53:19 +05:30
Vinayak Mehta
b91df8a1b8
Create parsers module
2018-09-06 06:13:58 +05:30
Vinayak Mehta
96af09d9cd
Add BaseParser and refactor extract_tables
2018-09-06 05:28:34 +05:30
Vinayak Mehta
a4d3165e94
Add docstring stubs
2018-09-05 19:35:46 +05:30
Vinayak Mehta
bf63432494
Remove docstrings
2018-09-05 19:04:40 +05:30