Vinayak Mehta
42d7a4ac02
Add import os
2018-09-06 06:15:13 +05:30
Vinayak Mehta
b91df8a1b8
Create parsers module
2018-09-06 06:13:58 +05:30
Vinayak Mehta
d0005101a7
Add BaseParser docstring stub
2018-09-06 05:55:05 +05:30
Vinayak Mehta
96af09d9cd
Add BaseParser and refactor extract_tables
2018-09-06 05:28:34 +05:30
Vinayak Mehta
a4d3165e94
Add docstring stubs
2018-09-05 19:35:46 +05:30
Vinayak Mehta
bf63432494
Remove docstrings
2018-09-05 19:04:40 +05:30
Vinayak Mehta
08cbababca
Add properties to GeometryList
2018-09-05 19:00:30 +05:30
Vinayak Mehta
73e52939f5
Add parsing_report property
2018-09-05 18:50:10 +05:30
Vinayak Mehta
9124e3374c
Add properties to Table
2018-09-05 18:20:46 +05:30
Vinayak Mehta
b9d77cb983
Decouple debug geometry from tables
2018-09-05 15:18:31 +05:30
Vinayak Mehta
941994f0bf
Make present code work with new API
2018-09-04 23:34:49 +05:30
Vinayak Mehta
e3aabb720f
Add stream and lattice to parsers
2018-09-04 21:28:37 +05:30
Vinayak Mehta
5d29f0c21c
Move Pdf class to core as FileHandler
2018-09-04 07:02:30 +05:30
Vinayak Mehta
c689735da2
Move cell and table to core
2018-09-04 03:49:43 +05:30
Vinayak Mehta
72c42c74db
Remove ocr
2018-09-01 16:23:54 +05:30
Vinayak Mehta
861ed0b64e
Fix lattice fill
2017-05-05 15:02:29 +05:30
Vinayak Mehta
e252e476b9
Add better y-cuts detection
2017-04-25 18:44:53 +05:30
Vinayak Mehta
76e1d32417
Add minor fix
...
Minor fix
2017-04-24 16:53:54 +05:30
Vinayak Mehta
bef33c75b1
Fix ValueError
2017-04-21 20:15:35 +05:30
Vinayak Mehta
fdb4b0d494
Update version
2017-04-21 15:41:32 +05:30
Vinayak Mehta
5c5bd6199c
Fix warnings and exceptions
2017-04-21 14:20:33 +05:30
Vinayak Mehta
18e1a799a1
Remove remove_empty
2017-04-21 13:22:37 +05:30
Vinayak Mehta
d28e4b8c1e
Change default value for iterations
2017-04-21 13:20:48 +05:30
Vinayak Mehta
4da754ddcb
[ENH] Add OCR and better joint detection
...
* Add iterations for dilation
* Add OCRLattice and OCRStream
* Add debug
2017-04-18 18:25:47 +05:30
Vinayak Mehta
7246e1a73d
Parallelize pdf split
2017-04-11 18:30:05 +05:30
Vinayak Mehta
4a87a77003
Remove ncols
2017-04-11 15:50:12 +05:30
Vinayak Mehta
72233f25ce
Parameterize thresholding blocksize and constant
2017-04-10 21:15:54 +05:30
Vinayak Mehta
84d354ba10
Add deepcopy and debug scripts
2017-04-10 18:59:48 +05:30
Vinayak Mehta
3eb18ef199
More logs
2017-02-07 22:23:05 +05:30
Vinayak Mehta
bc86346154
Don't let processes modify instance attributes
2017-02-07 22:13:33 +05:30
Vinayak Mehta
970256e19d
Add OCR support for image based pdfs with lines
...
* Cosmits
* Remove unnecessary kwargs
* Direct ghostscript call output to /dev/null
* Change char_margin's default value
* Add image attribute in Table and Cell
* Add OCR
* Fix coordinates
* Add table_area
* Add ocr options to cli
* Direct ghostscript call output to /dev/null
* Add ocr docstring
* Add requirements
* Update README
2017-01-07 16:37:56 +05:30
Vinayak Mehta
70f626373b
Cosmits
...
* Remove unnecessary kwargs
* Direct ghostscript call output to /dev/null
* Change char_margin's default value
2017-01-07 15:58:45 +05:30
Vinayak Mehta
bd1d57a561
Update version
2017-01-07 15:50:20 +05:30
Vinayak Mehta
10eda3f204
Deprecate Stream ncolumns
2016-11-07 21:30:48 +05:30
Vinayak Mehta
72c2a0020f
Minor fix
2016-10-20 18:54:06 +05:30
Vinayak Mehta
5c6a74fb2a
Add new params
2016-10-18 18:23:35 +05:30
Vinayak Mehta
b01edee337
Handle rotation at entry
2016-10-18 15:33:38 +05:30
Vinayak Mehta
2a203a1865
Log warning when len(header) != len(cols)
2016-10-17 18:16:39 +05:30
Vinayak Mehta
adb948d363
Fix column parameter
2016-10-13 16:54:45 +05:30
Vinayak Mehta
40d30c1ab9
Add superscript and subscript flagging
...
* Add superscript flagging
* Add flagging param
* Add np.round to account for rotation error
2016-10-12 19:27:18 +05:30
Vinayak Mehta
e8b93a9624
Add headers param
2016-10-12 13:59:10 +05:30
Vinayak Mehta
a43d5ca2c7
Replace chars with textlines
...
* Add split function
* Add split_text and shift_text params
* Change get_rotation
* Move get_column_index to utils
* Add split_text and shift_text
* Fix split_text
2016-10-12 13:17:02 +05:30
Vinayak Mehta
52a2876ab1
Fix tarea type conversion
2016-10-04 19:57:53 +05:30
Vinayak Mehta
4b8e96a86a
Update docs
...
* Update README
* Update index.rst
* Update docstrings
* Fix typo
* Edit docs
* Add error messages
2016-10-04 17:50:48 +05:30
Vinayak Mehta
d46eeeab1a
Change jpg to png
2016-09-27 18:37:38 +05:30
Vinayak Mehta
75c7deffaa
Minor Stream fix
2016-09-27 17:27:34 +05:30
Vinayak Mehta
79afb45e2e
Support for vertical tables in Stream
...
* Change var names
* Add test pdf
* Add tests for Lattice rotation
* Add support for vertical tables in Stream, test pdfs
* Add tests for Stream rotation
2016-09-15 20:51:59 +05:30
Vinayak Mehta
8ce7b74671
Replace imagemagick with ghostscript
...
* Replace imagemagick with ghostscript
* Add quiet option
* Avoid repetition
* Remove Wand requirement
* Replace jpeg with png
2016-09-13 17:35:07 +05:30
Vinayak Mehta
757ba0444a
Remove jtol
2016-09-13 17:28:21 +05:30
Vinayak Mehta
439059817d
Update tests with new API
...
* Update Lattice tests with new API
* Update Stream tests with new API, fix CLI
* Add table_area test, Stream fixes
2016-09-09 16:56:25 +05:30