151 Commits (2b3461deab161e6be7af6de75a1e7ad50cae2fa1)

Author SHA1 Message Date
Vinayak Mehta c5bde5e2ad [MRG] Add error/warning tests (#113)
* Add unknown flavor test

* Add input kwargs test

* Remove unused utils

* Add unsupported format test

* Add stream unequal tables-columns length test

* Add python3 compat

* Add no tables found test

* Convert util info log to warning
2018-10-02 19:28:42 +05:30
Vinayak Mehta fc0542bd3c Add Python 3 compatibility (#109)
* Add python3 compat

* Update .gitignore

* Update .gitignore again

* Remove debugging return

* Add unicode_literals import

* Bump version

* Add python3-tk note
2018-09-28 21:58:29 +05:30
Vinayak Mehta dfb0d4fb4c Fix TableList repr 2018-09-27 04:42:23 +05:30
Vinayak Mehta 759e635a3c Bump version 2018-09-25 12:32:01 +05:30
Vinayak Mehta 7731497a5b Fix relative links
Fix broken links
2018-09-24 22:15:43 +05:30
Vinayak Mehta be2733ebd2 Add utf8 header 2018-09-24 16:27:26 +05:30
Vinayak Mehta 93b4dabcc2 Update CLI 2018-09-24 01:00:30 +05:30
Vinayak Mehta a70befe528 Update docs 2018-09-23 14:04:21 +05:30
Vinayak Mehta 959a252aa3 Fix CLI 2018-09-23 12:45:01 +05:30
Vinayak Mehta 7aaa7b2460 Deprecate debug and add plot docstrings 2018-09-23 11:56:40 +05:30
Vinayak Mehta 71d91fbebd Fix plot_text 2018-09-23 11:45:20 +05:30
Vinayak Mehta 3170a9689f Add flavors 2018-09-23 10:53:32 +05:30
Vinayak Mehta 021aca8f97 Update __version__.py 2018-09-15 03:34:04 +05:30
Vinayak Mehta a4fcdc7781 Add advanced guide illustrations 2018-09-13 21:12:25 +05:30
Vinayak Mehta 3a980a46c1 Add quickstart 2018-09-13 15:50:30 +05:30
Vinayak Mehta 0ba3469d21 Add Stream benchmarks 2018-09-12 07:21:35 +05:30
Vinayak Mehta b276909a4f Add Lattice benchmarks 2018-09-12 05:58:22 +05:30
Vinayak Mehta 094be1a1dd Add better table detection image 2018-09-12 02:29:25 +05:30
Vinayak Mehta dc533e73e2 Add agstat to benchmark 2018-09-12 02:05:34 +05:30
Vinayak Mehta 17ea5f335e Fix docstrings and interlinks 2018-09-11 08:31:37 +05:30
Vinayak Mehta 656808b8e2 Fix setup.py 2018-09-11 08:31:37 +05:30
Vinayak Mehta 118aac47bc Merge pull request #99 from socialcopsdev/cli
Add CLI
2018-09-10 16:06:14 +05:30
Vinayak Mehta 544e0c9c3f Update CLI help and README 2018-09-10 16:05:51 +05:30
Vinayak Mehta 7bb1aee9b6 Add CLI 2018-09-10 15:16:41 +05:30
Vinayak Mehta 1b013178a8 Add docstrings to table to_format methods 2018-09-09 18:41:40 +05:30
Vinayak Mehta d3beaafc99 Add temporary directory context manager 2018-09-09 18:10:55 +05:30
Vinayak Mehta 9a6ed555c8 Fix get_rotation 2018-09-09 10:04:54 +05:30
Vinayak Mehta 9878de4dfc Add docstrings and update docs 2018-09-09 10:00:22 +05:30
Vinayak Mehta c91a9bb36d Add future import 2018-09-09 05:36:07 +05:30
Vinayak Mehta 7c3e531b07 Port tests 2018-09-09 05:29:24 +05:30
Vinayak Mehta 04383920b4 Rename parser keyword arguments 2018-09-08 05:38:43 +05:30
Vinayak Mehta e615580e55 Fix plot_geometry 2018-09-07 06:25:13 +05:30
Vinayak Mehta b3f840bba9 Change utils function names 2018-09-07 06:04:45 +05:30
Vinayak Mehta 20acda2259 Fix current logging 2018-09-07 05:53:19 +05:30
Vinayak Mehta 09ac8f4640 Add property n to TableList 2018-09-07 05:17:09 +05:30
Vinayak Mehta 0c329634e7 Add export to TableList and Table 2018-09-07 05:13:34 +05:30
Vinayak Mehta 557189da24 Refactor core 2018-09-06 07:42:41 +05:30
Vinayak Mehta ffeb853c55 Rename plot.py to plotting.py 2018-09-06 06:21:54 +05:30
Vinayak Mehta 42d7a4ac02 Add import os 2018-09-06 06:15:13 +05:30
Vinayak Mehta b91df8a1b8 Create parsers module 2018-09-06 06:13:58 +05:30
Vinayak Mehta d0005101a7 Add BaseParser docstring stub 2018-09-06 05:55:05 +05:30
Vinayak Mehta 96af09d9cd Add BaseParser and refactor extract_tables 2018-09-06 05:28:34 +05:30
Vinayak Mehta a4d3165e94 Add docstring stubs 2018-09-05 19:35:46 +05:30
Vinayak Mehta bf63432494 Remove docstrings 2018-09-05 19:04:40 +05:30
Vinayak Mehta 08cbababca Add properties to GeometryList 2018-09-05 19:00:30 +05:30
Vinayak Mehta 73e52939f5 Add parsing_report property 2018-09-05 18:50:10 +05:30
Vinayak Mehta 9124e3374c Add properties to Table 2018-09-05 18:20:46 +05:30
Vinayak Mehta b9d77cb983 Decouple debug geometry from tables 2018-09-05 15:18:31 +05:30
Vinayak Mehta 941994f0bf Make present code work with new API 2018-09-04 23:34:49 +05:30
Vinayak Mehta e3aabb720f Add stream and lattice to parsers 2018-09-04 21:28:37 +05:30
Vinayak Mehta 5d29f0c21c Move Pdf class to core as FileHandler 2018-09-04 07:02:30 +05:30
Vinayak Mehta c689735da2 Move cell and table to core 2018-09-04 03:49:43 +05:30
Vinayak Mehta 72c42c74db Remove ocr 2018-09-01 16:23:54 +05:30
Vinayak Mehta 861ed0b64e Fix lattice fill 2017-05-05 15:02:29 +05:30
Vinayak Mehta e252e476b9 Add better y-cuts detection 2017-04-25 18:44:53 +05:30
Vinayak Mehta 76e1d32417 Add minor fix
Minor fix
2017-04-24 16:53:54 +05:30
Vinayak Mehta bef33c75b1 Fix ValueError 2017-04-21 20:15:35 +05:30
Vinayak Mehta fdb4b0d494 Update version 2017-04-21 15:41:32 +05:30
Vinayak Mehta 5c5bd6199c Fix warnings and exceptions 2017-04-21 14:20:33 +05:30
Vinayak Mehta 18e1a799a1 Remove remove_empty 2017-04-21 13:22:37 +05:30
Vinayak Mehta d28e4b8c1e Change default value for iterations 2017-04-21 13:20:48 +05:30
Vinayak Mehta 4da754ddcb [ENH] Add OCR and better joint detection
* Add iterations for dilation

* Add OCRLattice and OCRStream

* Add debug
2017-04-18 18:25:47 +05:30
Vinayak Mehta 7246e1a73d Parallelize pdf split 2017-04-11 18:30:05 +05:30
Vinayak Mehta 4a87a77003 Remove ncols 2017-04-11 15:50:12 +05:30
Vinayak Mehta 72233f25ce Parameterize thresholding blocksize and constant 2017-04-10 21:15:54 +05:30
Vinayak Mehta 84d354ba10 Add deepcopy and debug scripts 2017-04-10 18:59:48 +05:30
Vinayak Mehta 3eb18ef199 More logs 2017-02-07 22:23:05 +05:30
Vinayak Mehta bc86346154 Don't let processes modify instance attributes 2017-02-07 22:13:33 +05:30
Vinayak Mehta 970256e19d Add OCR support for image based pdfs with lines
* Cosmits

* Remove unnecessary kwargs

* Direct ghostscript call output to /dev/null

* Change char_margin's default value

* Add image attribute in Table and Cell

* Add OCR

* Fix coordinates

* Add table_area

* Add ocr options to cli

* Direct ghostscript call output to /dev/null

* Add ocr dostring

* Add requirements

* Update README
2017-01-07 16:37:56 +05:30
Vinayak Mehta 70f626373b Cosmits
* Remove unnecessary kwargs

* Direct ghostscript call output to /dev/null

* Change char_margin's default value
2017-01-07 15:58:45 +05:30
Vinayak Mehta bd1d57a561 Update version 2017-01-07 15:50:20 +05:30
Vinayak Mehta 10eda3f204 Deprecate Stream ncolumns 2016-11-07 21:30:48 +05:30
Vinayak Mehta 72c2a0020f Minor fix 2016-10-20 18:54:06 +05:30
Vinayak Mehta 5c6a74fb2a Add new params 2016-10-18 18:23:35 +05:30
Vinayak Mehta b01edee337 Handle rotation at entry 2016-10-18 15:33:38 +05:30
Vinayak Mehta 2a203a1865 Log warning when len(header) != len(cols) 2016-10-17 18:16:39 +05:30
Vinayak Mehta adb948d363 Fix column parameter 2016-10-13 16:54:45 +05:30
Vinayak Mehta 40d30c1ab9 Add superscript and subscript flagging
* Add superscript flagging

* Add flagging param

* Add np.round to account for rotation error
2016-10-12 19:27:18 +05:30
Vinayak Mehta e8b93a9624 Add headers param 2016-10-12 13:59:10 +05:30
Vinayak Mehta a43d5ca2c7 Replace chars with textlines
* Add split function

* Add split_text and shift_text params

* Change get_rotation

* Move get_column_index to utils

* Add split_text and shift_text

* Fix split_text
2016-10-12 13:17:02 +05:30
Vinayak Mehta 52a2876ab1 Fix tarea type conversion 2016-10-04 19:57:53 +05:30
Vinayak Mehta 4b8e96a86a Update docs
* Update README

* Update index.rst

* Update docstrings

* Fix typo

* Edit docs

* Add error messages
2016-10-04 17:50:48 +05:30
Vinayak Mehta d46eeeab1a Change jpg to png 2016-09-27 18:37:38 +05:30
Vinayak Mehta 75c7deffaa Minor Stream fix 2016-09-27 17:27:34 +05:30
Vinayak Mehta 79afb45e2e Support for vertical tables in Stream
* Change var names

* Add test pdf

* Add tests for Lattice rotation

* Add support for vertical tables in Stream, test pdfs

* Add tests for Stream rotation
2016-09-15 20:51:59 +05:30
Vinayak Mehta 8ce7b74671 Replace imagemagick with ghostscript
* Replace imagemagick with ghostscript

* Add quiet option

* Avoid repetition

* Remove Wand requirement

* Replace jpeg with png
2016-09-13 17:35:07 +05:30
Vinayak Mehta 757ba0444a Remove jtol 2016-09-13 17:28:21 +05:30
Vinayak Mehta 439059817d Update tests with new API
* Update Lattice tests with new API

* Update Stream tests with new API, fix CLI

* Add table_area test, Stream fixes
2016-09-09 16:56:25 +05:30
Vinayak Mehta a94c350a7b Fix param flow
* Fix param flow

* Add check for None
2016-09-09 14:52:38 +05:30
Vinayak Mehta 766260d5d9 Remove hybrid.py 2016-09-08 21:17:24 +05:30
Vinayak Mehta 98f47d1bd7 Fix table_bbox when no tarea is given 2016-09-05 21:26:16 +05:30
Vinayak Mehta d86630e70b Add table_area
[MRG] Add table_area
2016-09-05 18:51:59 +05:30
Vinayak Mehta b2dd5f68fe Fix vertical text detection in cells
* Fix vertical text detection in cells

* Add Cell instance method

* Change var names
2016-09-01 01:42:27 +05:30
Vinayak Mehta 8d56f15130 Add negative tolerance 2016-08-31 22:25:33 +05:30
Vinayak Mehta 2a55621d05 Fix magic grid extension 2016-08-31 21:06:41 +05:30
Vinayak Mehta 552f9cf422 Add various metrics to score the quality of a parse
Add various metrics to score the quality of a parse
2016-08-30 14:52:49 +05:30
Vinayak Mehta 7e5804f87d Adds documentation
[MRG] Adds documentation
2016-08-09 17:23:50 +05:30
Vinayak Mehta 13568865b5 Add verbose 2016-08-03 13:14:19 +05:30
Vinayak Mehta 57917426e8 Fix docstrings 2016-08-03 13:14:11 +05:30
Vinayak Mehta 050107b63d Minor fix 2016-07-29 21:47:20 +05:30