Vinayak Mehta
ec21904595
Merge pull request #219 from Arnie97/master
[MRG] Add line_overlap and boxes_flow to LAParams
2021-06-15 03:53:40 +05:30
Vinayak Mehta
2c59e7b0f7
Blacken code
2021-06-15 03:29:35 +05:30
Arnie97
0dee385578
Add line_overlap and boxes_flow to LAParams
2020-12-17 22:12:24 +08:00
Eduardo Gonzalez Lopez de Murillas
7695d35449
Fix #15 extraction of cell data discarding overlapping text boxes
2020-10-27 18:06:57 +01:00
pevisscher
aae2c6b3d4
use correct re.sub signature
`text_strip` currently passes the regex flags in the position of the `count` parameter of `re.sub`. The flags are hardcoded to `re.UNICODE` (value 32), so only the first 32 matches are replaced.
See https://docs.python.org/3/library/re.html#re.sub for the signature.
2020-08-24 16:51:06 +02:00
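The pitfall described in commit aae2c6b3d4 can be reproduced in isolation. The snippet below is a minimal sketch, not the project's actual `text_strip` code: the input string and the pattern are made up for illustration, and only the standard-library `re.sub(pattern, repl, string, count=0, flags=0)` signature is assumed.

```python
import re

# Hypothetical input: 40 characters that should all be stripped.
text = "a" * 40

# Buggy call: the fourth positional argument of re.sub is `count`, not `flags`,
# so re.UNICODE (integer value 32) caps the number of replacements at 32.
buggy = re.sub(r"a", "", text, re.UNICODE)

# Corrected call: pass the flags by keyword so every match is replaced.
fixed = re.sub(r"a", "", text, flags=re.UNICODE)

print(len(buggy))  # 8 characters survive the capped substitution
print(len(fixed))  # 0 characters remain
```

Passing `flags` by keyword is the safer habit in general, since it cannot be mistaken for `count`.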
Vinayak Mehta
fbe576ffcb
Revert the changes in v0.8.1
2020-07-27 17:38:14 +05:30
Vinayak Mehta
a13e2f6f1f
Change error name and update pdfminer.six version
2020-07-21 21:21:01 +05:30
Vinayak Mehta
52b2a595b4
Add f-strings and remove python3.5 test job
2020-05-24 18:14:43 +05:30
Vinayak Mehta
f725f04223
Remove future imports
2020-05-24 17:33:13 +05:30
Vinayak Mehta
3afb72b872
Fix read_pdf(url) and test data
2020-05-24 17:26:52 +05:30
Vinayak Mehta
a97b50ef21
Update flavor kwargs
2019-07-06 22:59:51 +05:30
Dimiter Naydenov
240ea6c411
Fixed strip_text argument getting ignored
2019-07-04 12:12:52 +03:00
Vinayak Mehta
2115a0e177
Blacken code
2019-07-03 23:47:42 +05:30
Vinayak Mehta
ce727d9558
Fix split text bug
2019-03-22 02:28:29 +05:30
Vinayak Mehta
03f301b25c
Add table regions support
2019-01-04 19:17:54 +05:30
Vinayak Mehta
9d90cadac0
Fix variable name
2019-01-03 15:47:05 +05:30
Vinayak Mehta
f605bd8f94
Fix #239
2019-01-03 14:55:47 +05:30
Vinayak Mehta
62ed4753cd
Make python2 compat
2018-12-24 13:10:48 +05:30
Vinayak Mehta
2b3461deab
Add support to read from url
2018-12-24 12:55:52 +05:30
Vinayak Mehta
50b4468aff
Rename kwargs and add tests
2018-12-21 15:09:37 +05:30
Vinayak Mehta
f6aa21c31f
Add strip_text
2018-12-20 16:32:16 +05:30
Vinayak Mehta
ca6cefa362
Add extra_kwargs
2018-12-17 11:49:05 +05:30
Vinayak Mehta
5e71f0b0e6
Fix #192
2018-12-13 12:50:30 +05:30
Oshawk
90aaba6eec
[MRG + 1] Make pep8 (#125)
* Make setup.py pep8
Add new line at end of file, fix bare except, remove unused import.
* Make tests/*.py pep8
Add some newlines at end of files and a visual indent.
* Make docs/*.py pep8
Fix block comments and add new lines at end of files.
* Make camelot/*.py pep8
Fixed unused import, a few weirdly ordered imports, a docstring typo and many new lines at the end of files.
* Fix imports
Fix import order and remove a couple more unused imports.
* Fix indents
Fix indentation (no opening delimiter alignment).
* Add newlines
2018-10-05 16:55:43 +05:30
Vinayak Mehta
6e8079df84
[MRG] Add tests for output formats and parser kwargs (#126)
* Remove unused image processing code
* Add opencv back-compat comment
* Add tests for parser special cases
* Fix lattice table area test
* Add tests for output format
* Add openpyxl dep
2018-10-05 16:15:30 +05:30
Vinayak Mehta
c5bde5e2ad
[MRG] Add error/warning tests (#113)
* Add unknown flavor test
* Add input kwargs test
* Remove unused utils
* Add unsupported format test
* Add stream unequal tables-columns length test
* Add python3 compat
* Add no tables found test
* Convert util info log to warning
2018-10-02 19:28:42 +05:30
Vinayak Mehta
fc0542bd3c
Add Python 3 compatibility (#109)
* Add python3 compat
* Update .gitignore
* Update .gitignore again
* Remove debugging return
* Add unicode_literals import
* Bump version
* Add python3-tk note
2018-09-28 21:58:29 +05:30
Vinayak Mehta
3170a9689f
Add flavors
2018-09-23 10:53:32 +05:30
Vinayak Mehta
17ea5f335e
Fix docstrings and interlinks
2018-09-11 08:31:37 +05:30
Vinayak Mehta
7bb1aee9b6
Add CLI
2018-09-10 15:16:41 +05:30
Vinayak Mehta
d3beaafc99
Add temporary directory context manager
2018-09-09 18:10:55 +05:30
Vinayak Mehta
9a6ed555c8
Fix get_rotation
2018-09-09 10:04:54 +05:30
Vinayak Mehta
9878de4dfc
Add docstrings and update docs
2018-09-09 10:00:22 +05:30
Vinayak Mehta
04383920b4
Rename parser keyword arguments
2018-09-08 05:38:43 +05:30
Vinayak Mehta
b3f840bba9
Change utils function names
2018-09-07 06:04:45 +05:30
Vinayak Mehta
20acda2259
Fix current logging
2018-09-07 05:53:19 +05:30
Vinayak Mehta
b91df8a1b8
Create parsers module
2018-09-06 06:13:58 +05:30
Vinayak Mehta
96af09d9cd
Add BaseParser and refactor extract_tables
2018-09-06 05:28:34 +05:30
Vinayak Mehta
a4d3165e94
Add docstring stubs
2018-09-05 19:35:46 +05:30
Vinayak Mehta
bf63432494
Remove docstrings
2018-09-05 19:04:40 +05:30
Vinayak Mehta
9124e3374c
Add properties to Table
2018-09-05 18:20:46 +05:30
Vinayak Mehta
e252e476b9
Add better y-cuts detection
2017-04-25 18:44:53 +05:30
Vinayak Mehta
5c5bd6199c
Fix warnings and exceptions
2017-04-21 14:20:33 +05:30
Vinayak Mehta
4da754ddcb
[ENH] Add OCR and better joint detection
* Add iterations for dilation
* Add OCRLattice and OCRStream
* Add debug
2017-04-18 18:25:47 +05:30
Vinayak Mehta
72233f25ce
Parameterize thresholding blocksize and constant
2017-04-10 21:15:54 +05:30
Vinayak Mehta
bc86346154
Don't let processes modify instance attributes
2017-02-07 22:13:33 +05:30
Vinayak Mehta
70f626373b
Cosmetics
* Remove unnecessary kwargs
* Direct ghostscript call output to /dev/null
* Change char_margin's default value
2017-01-07 15:58:45 +05:30
Vinayak Mehta
b01edee337
Handle rotation at entry
2016-10-18 15:33:38 +05:30
Vinayak Mehta
2a203a1865
Log warning when len(header) != len(cols)
2016-10-17 18:16:39 +05:30
Vinayak Mehta
40d30c1ab9
Add superscript and subscript flagging
* Add superscript flagging
* Add flagging param
* Add np.round to account for rotation error
2016-10-12 19:27:18 +05:30