Commit Graph

215 Commits (4eba7b6486a4cfdf22f8b66837943a6c9ebf6980)

Author SHA1 Message Date
gison93 4366313484 Clarify example for argument pages in read_pdf (#177) 2018-10-28 14:41:04 +05:30
Vinicius Mesel 39cf65ffef [MRG + 1] Convert filename to lowercase to check for extension (#169)
* Creates a new variable that stores a lowercase version of the filename

* Remove variable
2018-10-24 23:53:54 +05:30
Parth P Panchal 32df09ad1c Renames the keyword `table_area` to `table_areas` (#171)
`table_areas` sounds more apt since it is a list and there can be
multiple table areas on a page.

Closes #165
2018-10-24 23:06:53 +05:30
Vinayak Mehta a78ef7f841 [MRG] Use find_executable for gs and raise error if not found (#166)
* Use find_executable for gs and raise error if not found

* Remove unused variable

* Add test

* Use pytest monkeypatch
2018-10-23 21:12:43 +05:30
Parth P Panchal 61963aabb6 [MRG + 1] Add __main__ (#159)
* Renames camelot.cli to camelot.__main__

Closes #154

* Keep __main__ and cli separate

* Monkey patch click HelpFormatter
2018-10-23 15:01:20 +05:30
Jonathan Lloyd 3def4a5aea [MRG + 1] Add suppress_warnings flag (#155)
* Add suppress_warnings flag

* Add --quiet flag to cli (to suppress warnings)

* Remove TODO and update comment
2018-10-19 16:55:00 +05:30
Vinayak Mehta 45e7f7570e Bump version 2018-10-08 03:54:21 +05:30
Vinayak Mehta fe68328ef2 Move opencv-python to extras_require (#134) 2018-10-08 01:10:48 +05:30
Vinayak Mehta 2527512f63 Replace gs subprocess call (Wand experiment)
Replace gs subprocess call

Update requirements.txt
2018-10-07 13:39:44 +05:30
Vinayak Mehta 9b2fc53e58 Bump version 2018-10-05 20:22:46 +05:30
Vaibhav Mule c53ea795fd [MRG + 1] Add tests for repr (#128)
* add tests for repr

* remove repr for Cell

* add round for repr of Cell

* change decimal places to 2

* change tests for 2 decimal places
2018-10-05 20:19:24 +05:30
Oshawk 90aaba6eec [MRG + 1] Make pep8 (#125)
* Make setup.py pep8

Add new line at end of file, fix bare except, remove unused import.

* Make tests/*.py pep8

Add some newlines at the end of files and a visual indent.

* Make docs/*.py pep8

Fix block comments and add new lines at end of files.

* Make camelot/*.py pep8

Fix an unused import, a few oddly ordered imports, a docstring typo and many missing newlines at the end of files.

* Fix imports

Fix import order and remove a couple more unused imports.

* Fix indents

Fix indentation (no opening delimiter alignment).

* Add newlines
2018-10-05 16:55:43 +05:30
Vinayak Mehta 6e8079df84 [MRG] Add tests for output formats and parser kwargs (#126)
* Remove unused image processing code

* Add opencv back-compat comment

* Add tests for parser special cases

* Fix lattice table area test

* Add tests for output format

* Add openpyxl dep
2018-10-05 16:15:30 +05:30
Vinayak Mehta cf7823f33c [MRG] Add ghostscript fix for windows (#124)
* Add ghostscript fix for windows

* Add python2 fix

* Update install.rst
2018-10-05 02:06:37 +05:30
Vinayak Mehta c5bde5e2ad [MRG] Add error/warning tests (#113)
* Add unknown flavor test

* Add input kwargs test

* Remove unused utils

* Add unsupported format test

* Add stream unequal tables-columns length test

* Add python3 compat

* Add no tables found test

* Convert util info log to warning
2018-10-02 19:28:42 +05:30
Vinayak Mehta fc0542bd3c Add Python 3 compatibility (#109)
* Add python3 compat

* Update .gitignore

* Update .gitignore again

* Remove debugging return

* Add unicode_literals import

* Bump version

* Add python3-tk note
2018-09-28 21:58:29 +05:30
Vinayak Mehta dfb0d4fb4c Fix TableList repr 2018-09-27 04:42:23 +05:30
Vinayak Mehta 759e635a3c Bump version 2018-09-25 12:32:01 +05:30
Vinayak Mehta 7731497a5b Fix relative links
Fix broken links
2018-09-24 22:15:43 +05:30
Vinayak Mehta be2733ebd2 Add utf8 header 2018-09-24 16:27:26 +05:30
Vinayak Mehta 93b4dabcc2 Update CLI 2018-09-24 01:00:30 +05:30
Vinayak Mehta a70befe528 Update docs 2018-09-23 14:04:21 +05:30
Vinayak Mehta 959a252aa3 Fix CLI 2018-09-23 12:45:01 +05:30
Vinayak Mehta 7aaa7b2460 Deprecate debug and add plot docstrings 2018-09-23 11:56:40 +05:30
Vinayak Mehta 71d91fbebd Fix plot_text 2018-09-23 11:45:20 +05:30
Vinayak Mehta 3170a9689f Add flavors 2018-09-23 10:53:32 +05:30
Vinayak Mehta 021aca8f97 Update __version__.py 2018-09-15 03:34:04 +05:30
Vinayak Mehta a4fcdc7781 Add advanced guide illustrations 2018-09-13 21:12:25 +05:30
Vinayak Mehta 3a980a46c1 Add quickstart 2018-09-13 15:50:30 +05:30
Vinayak Mehta 0ba3469d21 Add Stream benchmarks 2018-09-12 07:21:35 +05:30
Vinayak Mehta b276909a4f Add Lattice benchmarks 2018-09-12 05:58:22 +05:30
Vinayak Mehta 094be1a1dd Add better table detection image 2018-09-12 02:29:25 +05:30
Vinayak Mehta dc533e73e2 Add agstat to benchmark 2018-09-12 02:05:34 +05:30
Vinayak Mehta 17ea5f335e Fix docstrings and interlinks 2018-09-11 08:31:37 +05:30
Vinayak Mehta 656808b8e2 Fix setup.py 2018-09-11 08:31:37 +05:30
Vinayak Mehta 118aac47bc Merge pull request #99 from socialcopsdev/cli
Add CLI
2018-09-10 16:06:14 +05:30
Vinayak Mehta 544e0c9c3f Update CLI help and README 2018-09-10 16:05:51 +05:30
Vinayak Mehta 7bb1aee9b6 Add CLI 2018-09-10 15:16:41 +05:30
Vinayak Mehta 1b013178a8 Add docstrings to table to_format methods 2018-09-09 18:41:40 +05:30
Vinayak Mehta d3beaafc99 Add temporary directory context manager 2018-09-09 18:10:55 +05:30
Vinayak Mehta 9a6ed555c8 Fix get_rotation 2018-09-09 10:04:54 +05:30
Vinayak Mehta 9878de4dfc Add docstrings and update docs 2018-09-09 10:00:22 +05:30
Vinayak Mehta c91a9bb36d Add future import 2018-09-09 05:36:07 +05:30
Vinayak Mehta 7c3e531b07 Port tests 2018-09-09 05:29:24 +05:30
Vinayak Mehta 04383920b4 Rename parser keyword arguments 2018-09-08 05:38:43 +05:30
Vinayak Mehta e615580e55 Fix plot_geometry 2018-09-07 06:25:13 +05:30
Vinayak Mehta b3f840bba9 Change utils function names 2018-09-07 06:04:45 +05:30
Vinayak Mehta 20acda2259 Fix current logging 2018-09-07 05:53:19 +05:30
Vinayak Mehta 09ac8f4640 Add property n to TableList 2018-09-07 05:17:09 +05:30
Vinayak Mehta 0c329634e7 Add export to TableList and Table 2018-09-07 05:13:34 +05:30
Vinayak Mehta 557189da24 Refactor core 2018-09-06 07:42:41 +05:30
Vinayak Mehta ffeb853c55 Rename plot.py to plotting.py 2018-09-06 06:21:54 +05:30
Vinayak Mehta 42d7a4ac02 Add import os 2018-09-06 06:15:13 +05:30
Vinayak Mehta b91df8a1b8 Create parsers module 2018-09-06 06:13:58 +05:30
Vinayak Mehta d0005101a7 Add BaseParser docstring stub 2018-09-06 05:55:05 +05:30
Vinayak Mehta 96af09d9cd Add BaseParser and refactor extract_tables 2018-09-06 05:28:34 +05:30
Vinayak Mehta a4d3165e94 Add docstring stubs 2018-09-05 19:35:46 +05:30
Vinayak Mehta bf63432494 Remove docstrings 2018-09-05 19:04:40 +05:30
Vinayak Mehta 08cbababca Add properties to GeometryList 2018-09-05 19:00:30 +05:30
Vinayak Mehta 73e52939f5 Add parsing_report property 2018-09-05 18:50:10 +05:30
Vinayak Mehta 9124e3374c Add properties to Table 2018-09-05 18:20:46 +05:30
Vinayak Mehta b9d77cb983 Decouple debug geometry from tables 2018-09-05 15:18:31 +05:30
Vinayak Mehta 941994f0bf Make present code work with new API 2018-09-04 23:34:49 +05:30
Vinayak Mehta e3aabb720f Add stream and lattice to parsers 2018-09-04 21:28:37 +05:30
Vinayak Mehta 5d29f0c21c Move Pdf class to core as FileHandler 2018-09-04 07:02:30 +05:30
Vinayak Mehta c689735da2 Move cell and table to core 2018-09-04 03:49:43 +05:30
Vinayak Mehta 72c42c74db Remove ocr 2018-09-01 16:23:54 +05:30
Vinayak Mehta 861ed0b64e Fix lattice fill 2017-05-05 15:02:29 +05:30
Vinayak Mehta e252e476b9 Add better y-cuts detection 2017-04-25 18:44:53 +05:30
Vinayak Mehta 76e1d32417 Add minor fix
Minor fix
2017-04-24 16:53:54 +05:30
Vinayak Mehta bef33c75b1 Fix ValueError 2017-04-21 20:15:35 +05:30
Vinayak Mehta fdb4b0d494 Update version 2017-04-21 15:41:32 +05:30
Vinayak Mehta 5c5bd6199c Fix warnings and exceptions 2017-04-21 14:20:33 +05:30
Vinayak Mehta 18e1a799a1 Remove remove_empty 2017-04-21 13:22:37 +05:30
Vinayak Mehta d28e4b8c1e Change default value for iterations 2017-04-21 13:20:48 +05:30
Vinayak Mehta 4da754ddcb [ENH] Add OCR and better joint detection
* Add iterations for dilation

* Add OCRLattice and OCRStream

* Add debug
2017-04-18 18:25:47 +05:30
Vinayak Mehta 7246e1a73d Parallelize pdf split 2017-04-11 18:30:05 +05:30
Vinayak Mehta 4a87a77003 Remove ncols 2017-04-11 15:50:12 +05:30
Vinayak Mehta 72233f25ce Parameterize thresholding blocksize and constant 2017-04-10 21:15:54 +05:30
Vinayak Mehta 84d354ba10 Add deepcopy and debug scripts 2017-04-10 18:59:48 +05:30
Vinayak Mehta 3eb18ef199 More logs 2017-02-07 22:23:05 +05:30
Vinayak Mehta bc86346154 Don't let processes modify instance attributes 2017-02-07 22:13:33 +05:30
Vinayak Mehta 970256e19d Add OCR support for image based pdfs with lines
* Cosmits

* Remove unnecessary kwargs

* Direct ghostscript call output to /dev/null

* Change char_margin's default value

* Add image attribute in Table and Cell

* Add OCR

* Fix coordinates

* Add table_area

* Add ocr options to cli

* Direct ghostscript call output to /dev/null

* Add ocr docstring

* Add requirements

* Update README
2017-01-07 16:37:56 +05:30
Vinayak Mehta 70f626373b Cosmits
* Remove unnecessary kwargs

* Direct ghostscript call output to /dev/null

* Change char_margin's default value
2017-01-07 15:58:45 +05:30
Vinayak Mehta bd1d57a561 Update version 2017-01-07 15:50:20 +05:30
Vinayak Mehta 10eda3f204 Deprecate Stream ncolumns 2016-11-07 21:30:48 +05:30
Vinayak Mehta 72c2a0020f Minor fix 2016-10-20 18:54:06 +05:30
Vinayak Mehta 5c6a74fb2a Add new params 2016-10-18 18:23:35 +05:30
Vinayak Mehta b01edee337 Handle rotation at entry 2016-10-18 15:33:38 +05:30
Vinayak Mehta 2a203a1865 Log warning when len(header) != len(cols) 2016-10-17 18:16:39 +05:30
Vinayak Mehta adb948d363 Fix column parameter 2016-10-13 16:54:45 +05:30
Vinayak Mehta 40d30c1ab9 Add superscript and subscript flagging
* Add superscript flagging

* Add flagging param

* Add np.round to account for rotation error
2016-10-12 19:27:18 +05:30
Vinayak Mehta e8b93a9624 Add headers param 2016-10-12 13:59:10 +05:30
Vinayak Mehta a43d5ca2c7 Replace chars with textlines
* Add split function

* Add split_text and shift_text params

* Change get_rotation

* Move get_column_index to utils

* Add split_text and shift_text

* Fix split_text
2016-10-12 13:17:02 +05:30
Vinayak Mehta 52a2876ab1 Fix tarea type conversion 2016-10-04 19:57:53 +05:30
Vinayak Mehta 4b8e96a86a Update docs
* Update README

* Update index.rst

* Update docstrings

* Fix typo

* Edit docs

* Add error messages
2016-10-04 17:50:48 +05:30
Vinayak Mehta d46eeeab1a Change jpg to png 2016-09-27 18:37:38 +05:30
Vinayak Mehta 75c7deffaa Minor Stream fix 2016-09-27 17:27:34 +05:30
Vinayak Mehta 79afb45e2e Support for vertical tables in Stream
* Change var names

* Add test pdf

* Add tests for Lattice rotation

* Add support for vertical tables in Stream, test pdfs

* Add tests for Stream rotation
2016-09-15 20:51:59 +05:30
Vinayak Mehta 8ce7b74671 Replace imagemagick with ghostscript
* Replace imagemagick with ghostscript

* Add quiet option

* Avoid repetition

* Remove Wand requirement

* Replace jpeg with png
2016-09-13 17:35:07 +05:30
Vinayak Mehta 757ba0444a Remove jtol 2016-09-13 17:28:21 +05:30
Vinayak Mehta 439059817d Update tests with new API
* Update Lattice tests with new API

* Update Stream tests with new API, fix CLI

* Add table_area test, Stream fixes
2016-09-09 16:56:25 +05:30
Vinayak Mehta a94c350a7b Fix param flow
* Fix param flow

* Add check for None
2016-09-09 14:52:38 +05:30
Vinayak Mehta 766260d5d9 Remove hybrid.py 2016-09-08 21:17:24 +05:30
Vinayak Mehta 98f47d1bd7 Fix table_bbox when no tarea is given 2016-09-05 21:26:16 +05:30
Vinayak Mehta d86630e70b Add table_area
[MRG] Add table_area
2016-09-05 18:51:59 +05:30
Vinayak Mehta b2dd5f68fe Fix vertical text detection in cells
* Fix vertical text detection in cells

* Add Cell instance method

* Change var names
2016-09-01 01:42:27 +05:30
Vinayak Mehta 8d56f15130 Add negative tolerance 2016-08-31 22:25:33 +05:30
Vinayak Mehta 2a55621d05 Fix magic grid extension 2016-08-31 21:06:41 +05:30
Vinayak Mehta 552f9cf422 Add various metrics to score the quality of a parse
2016-08-30 14:52:49 +05:30
Vinayak Mehta 7e5804f87d Adds documentation
[MRG] Adds documentation
2016-08-09 17:23:50 +05:30
Vinayak Mehta 13568865b5 Add verbose 2016-08-03 13:14:19 +05:30
Vinayak Mehta 57917426e8 Fix docstrings 2016-08-03 13:14:11 +05:30
Vinayak Mehta 050107b63d Minor fix 2016-07-29 21:47:20 +05:30
Vinayak Mehta e9602bb353 Create python package
Add version support

Add new test file

[RFC] First phase

[RFC] Second phase

[RFC] Third phase

Add logging

Update README

Add debug

Add debug, fixes

Add pep8 changes

Add fix

Rename CLI tool

Add csv fix

Update README

Add fix for numpages

Update README

Update requirements.txt

Use yield

Add tuple unpacking fix

Fix n00b mistake

Add check for None

Fix check for None

Fix unicode

Add relative imports
2016-07-29 21:09:39 +05:30