Commit Graph

71 Commits (c4d3cac4fb0f00965a8faf5c806f29def3e18029)

Author SHA1 Message Date
Vinayak Mehta 04383920b4 Rename parser keyword arguments 2018-09-08 05:38:43 +05:30
Vinayak Mehta e615580e55 Fix plot_geometry 2018-09-07 06:25:13 +05:30
Vinayak Mehta b3f840bba9 Change utils function names 2018-09-07 06:04:45 +05:30
Vinayak Mehta 20acda2259 Fix current logging 2018-09-07 05:53:19 +05:30
Vinayak Mehta 09ac8f4640 Add property n to TableList 2018-09-07 05:17:09 +05:30
Vinayak Mehta 0c329634e7 Add export to TableList and Table 2018-09-07 05:13:34 +05:30
Vinayak Mehta 557189da24 Refactor core 2018-09-06 07:42:41 +05:30
Vinayak Mehta ffeb853c55 Rename plot.py to plotting.py 2018-09-06 06:21:54 +05:30
Vinayak Mehta 42d7a4ac02 Add import os 2018-09-06 06:15:13 +05:30
Vinayak Mehta b91df8a1b8 Create parsers module 2018-09-06 06:13:58 +05:30
Vinayak Mehta d0005101a7 Add BaseParser docstring stub 2018-09-06 05:55:05 +05:30
Vinayak Mehta 96af09d9cd Add BaseParser and refactor extract_tables 2018-09-06 05:28:34 +05:30
Vinayak Mehta a4d3165e94 Add docstring stubs 2018-09-05 19:35:46 +05:30
Vinayak Mehta bf63432494 Remove docstrings 2018-09-05 19:04:40 +05:30
Vinayak Mehta 08cbababca Add properties to GeometryList 2018-09-05 19:00:30 +05:30
Vinayak Mehta 73e52939f5 Add parsing_report property 2018-09-05 18:50:10 +05:30
Vinayak Mehta 9124e3374c Add properties to Table 2018-09-05 18:20:46 +05:30
Vinayak Mehta b9d77cb983 Decouple debug geometry from tables 2018-09-05 15:18:31 +05:30
Vinayak Mehta 941994f0bf Make present code work with new API 2018-09-04 23:34:49 +05:30
Vinayak Mehta e3aabb720f Add stream and lattice to parsers 2018-09-04 21:28:37 +05:30
Vinayak Mehta 5d29f0c21c Move Pdf class to core as FileHandler 2018-09-04 07:02:30 +05:30
Vinayak Mehta c689735da2 Move cell and table to core 2018-09-04 03:49:43 +05:30
Vinayak Mehta 72c42c74db Remove ocr 2018-09-01 16:23:54 +05:30
Vinayak Mehta 861ed0b64e Fix lattice fill 2017-05-05 15:02:29 +05:30
Vinayak Mehta e252e476b9 Add better y-cuts detection 2017-04-25 18:44:53 +05:30
Vinayak Mehta 76e1d32417 Add minor fix
Minor fix
2017-04-24 16:53:54 +05:30
Vinayak Mehta bef33c75b1 Fix ValueError 2017-04-21 20:15:35 +05:30
Vinayak Mehta fdb4b0d494 Update version 2017-04-21 15:41:32 +05:30
Vinayak Mehta 5c5bd6199c Fix warnings and exceptions 2017-04-21 14:20:33 +05:30
Vinayak Mehta 18e1a799a1 Remove remove_empty 2017-04-21 13:22:37 +05:30
Vinayak Mehta d28e4b8c1e Change default value for iterations 2017-04-21 13:20:48 +05:30
Vinayak Mehta 4da754ddcb [ENH] Add OCR and better joint detection
* Add iterations for dilation

* Add OCRLattice and OCRStream

* Add debug
2017-04-18 18:25:47 +05:30
Vinayak Mehta 7246e1a73d Parallelize pdf split 2017-04-11 18:30:05 +05:30
Vinayak Mehta 4a87a77003 Remove ncols 2017-04-11 15:50:12 +05:30
Vinayak Mehta 72233f25ce Parameterize thresholding blocksize and constant 2017-04-10 21:15:54 +05:30
Vinayak Mehta 84d354ba10 Add deepcopy and debug scripts 2017-04-10 18:59:48 +05:30
Vinayak Mehta 3eb18ef199 More logs 2017-02-07 22:23:05 +05:30
Vinayak Mehta bc86346154 Don't let processes modify instance attributes 2017-02-07 22:13:33 +05:30
Vinayak Mehta 970256e19d Add OCR support for image based pdfs with lines
* Cosmits

* Remove unnecessary kwargs

* Direct ghostscript call output to /dev/null

* Change char_margin's default value

* Add image attribute in Table and Cell

* Add OCR

* Fix coordinates

* Add table_area

* Add ocr options to cli

* Direct ghostscript call output to /dev/null

* Add ocr dostring

* Add requirements

* Update README
2017-01-07 16:37:56 +05:30
Vinayak Mehta 70f626373b Cosmits
* Remove unnecessary kwargs

* Direct ghostscript call output to /dev/null

* Change char_margin's default value
2017-01-07 15:58:45 +05:30
Vinayak Mehta bd1d57a561 Update version 2017-01-07 15:50:20 +05:30
Vinayak Mehta 10eda3f204 Deprecate Stream ncolumns 2016-11-07 21:30:48 +05:30
Vinayak Mehta 72c2a0020f Minor fix 2016-10-20 18:54:06 +05:30
Vinayak Mehta 5c6a74fb2a Add new params 2016-10-18 18:23:35 +05:30
Vinayak Mehta b01edee337 Handle rotation at entry 2016-10-18 15:33:38 +05:30
Vinayak Mehta 2a203a1865 Log warning when len(header) != len(cols) 2016-10-17 18:16:39 +05:30
Vinayak Mehta adb948d363 Fix column parameter 2016-10-13 16:54:45 +05:30
Vinayak Mehta 40d30c1ab9 Add superscript and subscript flagging
* Add superscript flagging

* Add flagging param

* Add np.round to account for rotation error
2016-10-12 19:27:18 +05:30
Vinayak Mehta e8b93a9624 Add headers param 2016-10-12 13:59:10 +05:30
Vinayak Mehta a43d5ca2c7 Replace chars with textlines
* Add split function

* Add split_text and shift_text params

* Change get_rotation

* Move get_column_index to utils

* Add split_text and shift_text

* Fix split_text
2016-10-12 13:17:02 +05:30