Vinayak Mehta
5d29f0c21c
Move Pdf class to core as FileHandler
2018-09-04 07:02:30 +05:30
Vinayak Mehta
c689735da2
Move cell and table to core
2018-09-04 03:49:43 +05:30
Vinayak Mehta
72c42c74db
Remove ocr
2018-09-01 16:23:54 +05:30
Vinayak Mehta
861ed0b64e
Fix lattice fill
2017-05-05 15:02:29 +05:30
Vinayak Mehta
e252e476b9
Add better y-cuts detection
2017-04-25 18:44:53 +05:30
Vinayak Mehta
76e1d32417
Add minor fix
...
Minor fix
2017-04-24 16:53:54 +05:30
Vinayak Mehta
bef33c75b1
Fix ValueError
2017-04-21 20:15:35 +05:30
Vinayak Mehta
fdb4b0d494
Update version
2017-04-21 15:41:32 +05:30
Vinayak Mehta
5c5bd6199c
Fix warnings and exceptions
2017-04-21 14:20:33 +05:30
Vinayak Mehta
18e1a799a1
Remove remove_empty
2017-04-21 13:22:37 +05:30
Vinayak Mehta
d28e4b8c1e
Change default value for iterations
2017-04-21 13:20:48 +05:30
Vinayak Mehta
4da754ddcb
[ENH] Add OCR and better joint detection
...
* Add iterations for dilation
* Add OCRLattice and OCRStream
* Add debug
2017-04-18 18:25:47 +05:30
Vinayak Mehta
7246e1a73d
Parallelize pdf split
2017-04-11 18:30:05 +05:30
Vinayak Mehta
4a87a77003
Remove ncols
2017-04-11 15:50:12 +05:30
Vinayak Mehta
72233f25ce
Parameterize thresholding blocksize and constant
2017-04-10 21:15:54 +05:30
Vinayak Mehta
84d354ba10
Add deepcopy and debug scripts
2017-04-10 18:59:48 +05:30
Vinayak Mehta
3eb18ef199
More logs
2017-02-07 22:23:05 +05:30
Vinayak Mehta
bc86346154
Don't let processes modify instance attributes
2017-02-07 22:13:33 +05:30
Vinayak Mehta
970256e19d
Add OCR support for image based pdfs with lines
...
* Cosmits
* Remove unnecessary kwargs
* Direct ghostscript call output to /dev/null
* Change char_margin's default value
* Add image attribute in Table and Cell
* Add OCR
* Fix coordinates
* Add table_area
* Add ocr options to cli
* Add ocr docstring
* Add requirements
* Update README
2017-01-07 16:37:56 +05:30
Vinayak Mehta
70f626373b
Cosmits
...
* Remove unnecessary kwargs
* Direct ghostscript call output to /dev/null
* Change char_margin's default value
2017-01-07 15:58:45 +05:30
Vinayak Mehta
bd1d57a561
Update version
2017-01-07 15:50:20 +05:30
Vinayak Mehta
10eda3f204
Deprecate Stream ncolumns
2016-11-07 21:30:48 +05:30
Vinayak Mehta
72c2a0020f
Minor fix
2016-10-20 18:54:06 +05:30
Vinayak Mehta
5c6a74fb2a
Add new params
2016-10-18 18:23:35 +05:30
Vinayak Mehta
b01edee337
Handle rotation at entry
2016-10-18 15:33:38 +05:30
Vinayak Mehta
2a203a1865
Log warning when len(header) != len(cols)
2016-10-17 18:16:39 +05:30
Vinayak Mehta
adb948d363
Fix column parameter
2016-10-13 16:54:45 +05:30
Vinayak Mehta
40d30c1ab9
Add superscript and subscript flagging
...
* Add superscript flagging
* Add flagging param
* Add np.round to account for rotation error
2016-10-12 19:27:18 +05:30
Vinayak Mehta
e8b93a9624
Add headers param
2016-10-12 13:59:10 +05:30
Vinayak Mehta
a43d5ca2c7
Replace chars with textlines
...
* Add split function
* Add split_text and shift_text params
* Change get_rotation
* Move get_column_index to utils
* Add split_text and shift_text
* Fix split_text
2016-10-12 13:17:02 +05:30
Vinayak Mehta
52a2876ab1
Fix tarea type conversion
2016-10-04 19:57:53 +05:30
Vinayak Mehta
4b8e96a86a
Update docs
...
* Update README
* Update index.rst
* Update docstrings
* Fix typo
* Edit docs
* Add error messages
2016-10-04 17:50:48 +05:30
Vinayak Mehta
d46eeeab1a
Change jpg to png
2016-09-27 18:37:38 +05:30
Vinayak Mehta
75c7deffaa
Minor Stream fix
2016-09-27 17:27:34 +05:30
Vinayak Mehta
79afb45e2e
Support for vertical tables in Stream
...
* Change var names
* Add test pdf
* Add tests for Lattice rotation
* Add support for vertical tables in Stream, test pdfs
* Add tests for Stream rotation
2016-09-15 20:51:59 +05:30
Vinayak Mehta
8ce7b74671
Replace imagemagick with ghostscript
...
* Replace imagemagick with ghostscript
* Add quiet option
* Avoid repetition
* Remove Wand requirement
* Replace jpeg with png
2016-09-13 17:35:07 +05:30
Vinayak Mehta
757ba0444a
Remove jtol
2016-09-13 17:28:21 +05:30
Vinayak Mehta
439059817d
Update tests with new API
...
* Update Lattice tests with new API
* Update Stream tests with new API, fix CLI
* Add table_area test, Stream fixes
2016-09-09 16:56:25 +05:30
Vinayak Mehta
a94c350a7b
Fix param flow
...
* Fix param flow
* Add check for None
2016-09-09 14:52:38 +05:30
Vinayak Mehta
766260d5d9
Remove hybrid.py
2016-09-08 21:17:24 +05:30
Vinayak Mehta
98f47d1bd7
Fix table_bbox when no tarea is given
2016-09-05 21:26:16 +05:30
Vinayak Mehta
d86630e70b
Add table_area
...
[MRG] Add table_area
2016-09-05 18:51:59 +05:30
Vinayak Mehta
b2dd5f68fe
Fix vertical text detection in cells
...
* Fix vertical text detection in cells
* Add Cell instance method
* Change var names
2016-09-01 01:42:27 +05:30
Vinayak Mehta
8d56f15130
Add negative tolerance
2016-08-31 22:25:33 +05:30
Vinayak Mehta
2a55621d05
Fix magic grid extension
2016-08-31 21:06:41 +05:30
Vinayak Mehta
552f9cf422
Add various metrics to score the quality of a parse
...
Add various metrics to score the quality of a parse
2016-08-30 14:52:49 +05:30
Vinayak Mehta
7e5804f87d
Adds documentation
...
[MRG] Adds documentation
2016-08-09 17:23:50 +05:30
Vinayak Mehta
13568865b5
Add verbose
2016-08-03 13:14:19 +05:30
Vinayak Mehta
57917426e8
Fix docstrings
2016-08-03 13:14:11 +05:30
Vinayak Mehta
050107b63d
Minor fix
2016-07-29 21:47:20 +05:30