Commit Graph

103 Commits (09ac8f46403a51658cfddf63d62f6c5eef9fc621)

Author SHA1 Message Date
Vinayak Mehta 09ac8f4640 Add property n to TableList 2018-09-07 05:17:09 +05:30
Vinayak Mehta c482869e73 Merge branch 'refactor' of github.com:socialcopsdev/camelot into refactor 2018-09-07 05:14:28 +05:30
Vinayak Mehta 0c329634e7 Add export to TableList and Table 2018-09-07 05:13:34 +05:30
Vinayak Mehta ed1641e26f Update README.md 2018-09-07 02:34:59 +05:30
Vinayak Mehta 557189da24 Refactor core 2018-09-06 07:42:41 +05:30
Vinayak Mehta ffeb853c55 Rename plot.py to plotting.py 2018-09-06 06:21:54 +05:30
Vinayak Mehta 42d7a4ac02 Add import os 2018-09-06 06:15:13 +05:30
Vinayak Mehta b91df8a1b8 Create parsers module 2018-09-06 06:13:58 +05:30
Vinayak Mehta d0005101a7 Add BaseParser docstring stub 2018-09-06 05:55:05 +05:30
Vinayak Mehta 96af09d9cd Add BaseParser and refactor extract_tables 2018-09-06 05:28:34 +05:30
Vinayak Mehta a4d3165e94 Add docstring stubs 2018-09-05 19:35:46 +05:30
Vinayak Mehta bf63432494 Remove docstrings 2018-09-05 19:04:40 +05:30
Vinayak Mehta 08cbababca Add properties to GeometryList 2018-09-05 19:00:30 +05:30
Vinayak Mehta 73e52939f5 Add parsing_report property 2018-09-05 18:50:10 +05:30
Vinayak Mehta 9124e3374c Add properties to Table 2018-09-05 18:20:46 +05:30
Vinayak Mehta b9d77cb983 Decouple debug geometry from tables 2018-09-05 15:18:31 +05:30
Vinayak Mehta 941994f0bf Make present code work with new API 2018-09-04 23:34:49 +05:30
Vinayak Mehta e3aabb720f Add stream and lattice to parsers 2018-09-04 21:28:37 +05:30
Vinayak Mehta 5d29f0c21c Move Pdf class to core as FileHandler 2018-09-04 07:02:30 +05:30
Vinayak Mehta 0c9e21d881 Update README 2018-09-04 03:53:30 +05:30
Vinayak Mehta c689735da2 Move cell and table to core 2018-09-04 03:49:43 +05:30
Vinayak Mehta ae64264d3e Update README and requirements 2018-09-02 19:04:24 +05:30
Vinayak Mehta d65ee180e5 Update README 2018-09-01 16:26:15 +05:30
Vinayak Mehta 72c42c74db Remove ocr 2018-09-01 16:23:54 +05:30
Vinayak Mehta 9753889ea2 Add option to specify end in page range 2017-08-16 14:53:15 +05:30
Vinayak Mehta 861ed0b64e Fix lattice fill 2017-05-05 15:02:29 +05:30
Vinayak Mehta e252e476b9 Add better y-cuts detection 2017-04-25 18:44:53 +05:30
Vinayak Mehta 76e1d32417 Add minor fix
Minor fix
2017-04-24 16:53:54 +05:30
Vinayak Mehta bef33c75b1 Fix ValueError 2017-04-21 20:15:35 +05:30
Vinayak Mehta fdb4b0d494 Update version 2017-04-21 15:41:32 +05:30
Vinayak Mehta 5c5bd6199c Fix warnings and exceptions 2017-04-21 14:20:33 +05:30
Vinayak Mehta 18e1a799a1 Remove remove_empty 2017-04-21 13:22:37 +05:30
Vinayak Mehta d28e4b8c1e Change default value for iterations 2017-04-21 13:20:48 +05:30
Vinayak Mehta 4b3e7fb6f6 Add debug script 2017-04-18 18:32:18 +05:30
Vinayak Mehta ae83972f80 Update README 2017-04-18 18:27:38 +05:30
Vinayak Mehta 4da754ddcb [ENH] Add OCR and better joint detection
* Add iterations for dilation

* Add OCRLattice and OCRStream

* Add debug
2017-04-18 18:25:47 +05:30
Vinayak Mehta dd909e2b53 Fix debug script 2017-04-11 20:26:01 +05:30
Vinayak Mehta 7246e1a73d Parallelize pdf split 2017-04-11 18:30:05 +05:30
Vinayak Mehta 4a87a77003 Remove ncols 2017-04-11 15:50:12 +05:30
Vinayak Mehta 8e8f5bbb3b Add zip of csvs option 2017-04-11 14:14:54 +05:30
Vinayak Mehta 72233f25ce Parameterize thresholding blocksize and constant 2017-04-10 21:15:54 +05:30
Vinayak Mehta 8b07aa2702 Minor fixes 2017-04-10 19:08:39 +05:30
Vinayak Mehta 778366b2dd Remove directory 2017-04-10 19:03:43 +05:30
Vinayak Mehta 84d354ba10 Add deepcopy and debug scripts 2017-04-10 18:59:48 +05:30
Vinayak Mehta 4dd0d2330e Fix shift text 2017-03-21 16:04:55 +05:30
Vinayak Mehta 3651fb2347 Remove ncolumns everywhere 2017-03-01 19:53:48 +05:30
Vinayak Mehta edcf770d93 Remove verbose option 2017-02-07 23:44:01 +05:30
Vinayak Mehta 3eb18ef199 More logs 2017-02-07 22:23:05 +05:30
Vinayak Mehta bc86346154 Don't let processes modify instance attributes 2017-02-07 22:13:33 +05:30
Vinayak Mehta 970256e19d Add OCR support for image based pdfs with lines
* Cosmits

* Remove unnecessary kwargs

* Direct ghostscript call output to /dev/null

* Change char_margin's default value

* Add image attribute in Table and Cell

* Add OCR

* Fix coordinates

* Add table_area

* Add ocr options to cli

* Direct ghostscript call output to /dev/null

* Add ocr docstring

* Add requirements

* Update README
2017-01-07 16:37:56 +05:30