Vinayak Mehta
094be1a1dd
Add better table detection image
2018-09-12 02:29:25 +05:30
Vinayak Mehta
dc533e73e2
Add agstat to benchmark
2018-09-12 02:05:34 +05:30
Vinayak Mehta
17ea5f335e
Fix docstrings and interlinks
2018-09-11 08:31:37 +05:30
Vinayak Mehta
656808b8e2
Fix setup.py
2018-09-11 08:31:37 +05:30
Vinayak Mehta
118aac47bc
Merge pull request #99 from socialcopsdev/cli
...
Add CLI
2018-09-10 16:06:14 +05:30
Vinayak Mehta
544e0c9c3f
Update CLI help and README
2018-09-10 16:05:51 +05:30
Vinayak Mehta
7bb1aee9b6
Add CLI
2018-09-10 15:16:41 +05:30
Vinayak Mehta
1b013178a8
Add docstrings to table to_format methods
2018-09-09 18:41:40 +05:30
Vinayak Mehta
d3beaafc99
Add temporary directory context manager
2018-09-09 18:10:55 +05:30
Vinayak Mehta
9a6ed555c8
Fix get_rotation
2018-09-09 10:04:54 +05:30
Vinayak Mehta
9878de4dfc
Add docstrings and update docs
2018-09-09 10:00:22 +05:30
Vinayak Mehta
c91a9bb36d
Add future import
2018-09-09 05:36:07 +05:30
Vinayak Mehta
7c3e531b07
Port tests
2018-09-09 05:29:24 +05:30
Vinayak Mehta
04383920b4
Rename parser keyword arguments
2018-09-08 05:38:43 +05:30
Vinayak Mehta
e615580e55
Fix plot_geometry
2018-09-07 06:25:13 +05:30
Vinayak Mehta
b3f840bba9
Change utils function names
2018-09-07 06:04:45 +05:30
Vinayak Mehta
20acda2259
Fix current logging
2018-09-07 05:53:19 +05:30
Vinayak Mehta
09ac8f4640
Add property n to TableList
2018-09-07 05:17:09 +05:30
Vinayak Mehta
0c329634e7
Add export to TableList and Table
2018-09-07 05:13:34 +05:30
Vinayak Mehta
557189da24
Refactor core
2018-09-06 07:42:41 +05:30
Vinayak Mehta
ffeb853c55
Rename plot.py to plotting.py
2018-09-06 06:21:54 +05:30
Vinayak Mehta
42d7a4ac02
Add import os
2018-09-06 06:15:13 +05:30
Vinayak Mehta
b91df8a1b8
Create parsers module
2018-09-06 06:13:58 +05:30
Vinayak Mehta
d0005101a7
Add BaseParser docstring stub
2018-09-06 05:55:05 +05:30
Vinayak Mehta
96af09d9cd
Add BaseParser and refactor extract_tables
2018-09-06 05:28:34 +05:30
Vinayak Mehta
a4d3165e94
Add docstring stubs
2018-09-05 19:35:46 +05:30
Vinayak Mehta
bf63432494
Remove docstrings
2018-09-05 19:04:40 +05:30
Vinayak Mehta
08cbababca
Add properties to GeometryList
2018-09-05 19:00:30 +05:30
Vinayak Mehta
73e52939f5
Add parsing_report property
2018-09-05 18:50:10 +05:30
Vinayak Mehta
9124e3374c
Add properties to Table
2018-09-05 18:20:46 +05:30
Vinayak Mehta
b9d77cb983
Decouple debug geometry from tables
2018-09-05 15:18:31 +05:30
Vinayak Mehta
941994f0bf
Make present code work with new API
2018-09-04 23:34:49 +05:30
Vinayak Mehta
e3aabb720f
Add stream and lattice to parsers
2018-09-04 21:28:37 +05:30
Vinayak Mehta
5d29f0c21c
Move Pdf class to core as FileHandler
2018-09-04 07:02:30 +05:30
Vinayak Mehta
c689735da2
Move cell and table to core
2018-09-04 03:49:43 +05:30
Vinayak Mehta
72c42c74db
Remove ocr
2018-09-01 16:23:54 +05:30
Vinayak Mehta
861ed0b64e
Fix lattice fill
2017-05-05 15:02:29 +05:30
Vinayak Mehta
e252e476b9
Add better y-cuts detection
2017-04-25 18:44:53 +05:30
Vinayak Mehta
76e1d32417
Add minor fix
...
Minor fix
2017-04-24 16:53:54 +05:30
Vinayak Mehta
bef33c75b1
Fix ValueError
2017-04-21 20:15:35 +05:30
Vinayak Mehta
fdb4b0d494
Update version
2017-04-21 15:41:32 +05:30
Vinayak Mehta
5c5bd6199c
Fix warnings and exceptions
2017-04-21 14:20:33 +05:30
Vinayak Mehta
18e1a799a1
Remove remove_empty
2017-04-21 13:22:37 +05:30
Vinayak Mehta
d28e4b8c1e
Change default value for iterations
2017-04-21 13:20:48 +05:30
Vinayak Mehta
4da754ddcb
[ENH] Add OCR and better joint detection
...
* Add iterations for dilation
* Add OCRLattice and OCRStream
* Add debug
2017-04-18 18:25:47 +05:30
Vinayak Mehta
7246e1a73d
Parallelize pdf split
2017-04-11 18:30:05 +05:30
Vinayak Mehta
4a87a77003
Remove ncols
2017-04-11 15:50:12 +05:30
Vinayak Mehta
72233f25ce
Parameterize thresholding blocksize and constant
2017-04-10 21:15:54 +05:30
Vinayak Mehta
84d354ba10
Add deepcopy and debug scripts
2017-04-10 18:59:48 +05:30
Vinayak Mehta
3eb18ef199
More logs
2017-02-07 22:23:05 +05:30
Vinayak Mehta
bc86346154
Don't let processes modify instance attributes
2017-02-07 22:13:33 +05:30
Vinayak Mehta
970256e19d
Add OCR support for image based pdfs with lines
...
* Cosmits
* Remove unnecessary kwargs
* Direct ghostscript call output to /dev/null
* Change char_margin's default value
* Add image attribute in Table and Cell
* Add OCR
* Fix coordinates
* Add table_area
* Add ocr options to cli
* Add ocr docstring
* Add requirements
* Update README
2017-01-07 16:37:56 +05:30
Vinayak Mehta
70f626373b
Cosmits
...
* Remove unnecessary kwargs
* Direct ghostscript call output to /dev/null
* Change char_margin's default value
2017-01-07 15:58:45 +05:30
Vinayak Mehta
bd1d57a561
Update version
2017-01-07 15:50:20 +05:30
Vinayak Mehta
10eda3f204
Deprecate Stream ncolumns
2016-11-07 21:30:48 +05:30
Vinayak Mehta
72c2a0020f
Minor fix
2016-10-20 18:54:06 +05:30
Vinayak Mehta
5c6a74fb2a
Add new params
2016-10-18 18:23:35 +05:30
Vinayak Mehta
b01edee337
Handle rotation at entry
2016-10-18 15:33:38 +05:30
Vinayak Mehta
2a203a1865
Log warning when len(header) != len(cols)
2016-10-17 18:16:39 +05:30
Vinayak Mehta
adb948d363
Fix column parameter
2016-10-13 16:54:45 +05:30
Vinayak Mehta
40d30c1ab9
Add superscript and subscript flagging
...
* Add superscript flagging
* Add flagging param
* Add np.round to account for rotation error
2016-10-12 19:27:18 +05:30
Vinayak Mehta
e8b93a9624
Add headers param
2016-10-12 13:59:10 +05:30
Vinayak Mehta
a43d5ca2c7
Replace chars with textlines
...
* Add split function
* Add split_text and shift_text params
* Change get_rotation
* Move get_column_index to utils
* Add split_text and shift_text
* Fix split_text
2016-10-12 13:17:02 +05:30
Vinayak Mehta
52a2876ab1
Fix tarea type conversion
2016-10-04 19:57:53 +05:30
Vinayak Mehta
4b8e96a86a
Update docs
...
* Update README
* Update index.rst
* Update docstrings
* Fix typo
* Edit docs
* Add error messages
2016-10-04 17:50:48 +05:30
Vinayak Mehta
d46eeeab1a
Change jpg to png
2016-09-27 18:37:38 +05:30
Vinayak Mehta
75c7deffaa
Minor Stream fix
2016-09-27 17:27:34 +05:30
Vinayak Mehta
79afb45e2e
Support for vertical tables in Stream
...
* Change var names
* Add test pdf
* Add tests for Lattice rotation
* Add support for vertical tables in Stream, test pdfs
* Add tests for Stream rotation
2016-09-15 20:51:59 +05:30
Vinayak Mehta
8ce7b74671
Replace imagemagick with ghostscript
...
* Replace imagemagick with ghostscript
* Add quiet option
* Avoid repetition
* Remove Wand requirement
* Replace jpeg with png
2016-09-13 17:35:07 +05:30
Vinayak Mehta
757ba0444a
Remove jtol
2016-09-13 17:28:21 +05:30
Vinayak Mehta
439059817d
Update tests with new API
...
* Update Lattice tests with new API
* Update Stream tests with new API, fix CLI
* Add table_area test, Stream fixes
2016-09-09 16:56:25 +05:30
Vinayak Mehta
a94c350a7b
Fix param flow
...
* Fix param flow
* Add check for None
2016-09-09 14:52:38 +05:30
Vinayak Mehta
766260d5d9
Remove hybrid.py
2016-09-08 21:17:24 +05:30
Vinayak Mehta
98f47d1bd7
Fix table_bbox when no tarea is given
2016-09-05 21:26:16 +05:30
Vinayak Mehta
d86630e70b
Add table_area
...
[MRG] Add table_area
2016-09-05 18:51:59 +05:30
Vinayak Mehta
b2dd5f68fe
Fix vertical text detection in cells
...
* Fix vertical text detection in cells
* Add Cell instance method
* Change var names
2016-09-01 01:42:27 +05:30
Vinayak Mehta
8d56f15130
Add negative tolerance
2016-08-31 22:25:33 +05:30
Vinayak Mehta
2a55621d05
Fix magic grid extension
2016-08-31 21:06:41 +05:30
Vinayak Mehta
552f9cf422
Add various metrics to score the quality of a parse
...
Add various metrics to score the quality of a parse
2016-08-30 14:52:49 +05:30
Vinayak Mehta
7e5804f87d
Adds documentation
...
[MRG] Adds documentation
2016-08-09 17:23:50 +05:30
Vinayak Mehta
13568865b5
Add verbose
2016-08-03 13:14:19 +05:30
Vinayak Mehta
57917426e8
Fix docstrings
2016-08-03 13:14:11 +05:30
Vinayak Mehta
050107b63d
Minor fix
2016-07-29 21:47:20 +05:30
Vinayak Mehta
e9602bb353
Create python package
...
Add version support
Add new test file
[RFC] First phase
[RFC] Second phase
[RFC] Third phase
Add logging
Update README
Add debug
Add debug, fixes
Add pep8 changes
Add fix
Rename CLI tool
Add csv fix
Update README
Add fix for numpages
Update README
Update requirements.txt
Use yield
Add tuple unpacking fix
Fix n00b mistake
Add check for None
Fix check for None
Fix unicode
Add relative imports
2016-07-29 21:09:39 +05:30