gison93
4366313484
Clarify example for argument pages in read_pdf (#177)
2018-10-28 14:41:04 +05:30
Vinicius Mesel
39cf65ffef
[MRG + 1] Convert filename to lowercase to check for extension (#169)
...
* Creates a new variable that stores a lowercase version of the filename
* Remove variable
2018-10-24 23:53:54 +05:30
Parth P Panchal
32df09ad1c
Renames the keyword `table_area` to `table_areas` (#171)
...
`table_areas` sounds more apt since it is a list and there can be
multiple table areas on a page.
Closes #165
2018-10-24 23:06:53 +05:30
Vinayak Mehta
a78ef7f841
[MRG] Use find_executable for gs and raise error if not found (#166)
...
* Use find_executable for gs and raise error if not found
* Remove unused variable
* Add test
* Use pytest monkeypatch
2018-10-23 21:12:43 +05:30
Parth P Panchal
61963aabb6
[MRG + 1] Add __main__ (#159)
...
* Renames camelot.cli to camelot.__main__
Closes #154
* Keep __main__ and cli separate
* Monkey patch click HelpFormatter
2018-10-23 15:01:20 +05:30
Jonathan Lloyd
3def4a5aea
[MRG + 1] Add suppress_warnings flag (#155)
...
* Add suppress_warnings flag
* Add --quiet flag to cli (to suppress warnings)
* Remove TODO and update comment
2018-10-19 16:55:00 +05:30
Vinayak Mehta
45e7f7570e
Bump version
2018-10-08 03:54:21 +05:30
Vinayak Mehta
fe68328ef2
Move opencv-python to extra_requires (#134)
2018-10-08 01:10:48 +05:30
Vinayak Mehta
2527512f63
Replace gs subprocess call (Wand experiment)
...
Replace gs subprocess call
Update requirements.txt
2018-10-07 13:39:44 +05:30
Vinayak Mehta
9b2fc53e58
Bump version
2018-10-05 20:22:46 +05:30
Vaibhav Mule
c53ea795fd
[MRG + 1] Add tests for repr (#128)
...
* add tests for repr
* remove repr for Cell
* add round for repr of Cell
* change decimal places to 2
* change tests for 2 decimal places
2018-10-05 20:19:24 +05:30
Oshawk
90aaba6eec
[MRG + 1] Make pep8 (#125)
...
* Make setup.py pep8
Add new line at end of file, fix bare except, remove unused import.
* Make tests/*.py pep8
Add some newlines at end of files and a visual indent.
* Make docs/*.py pep8
Fix block comments and add new lines at end of files.
* Make camelot/*.py pep8
Fixed unused import, a few weirdly ordered imports, a docstring typo and many new lines at the end of files.
* Fix imports
Fix import order and remove a couple more unused imports.
* Fix indents
Fix indentation (no opening delimiter alignment).
* Add newlines
2018-10-05 16:55:43 +05:30
Vinayak Mehta
6e8079df84
[MRG] Add tests for output formats and parser kwargs (#126)
...
* Remove unused image processing code
* Add opencv back-compat comment
* Add tests for parser special cases
* Fix lattice table area test
* Add tests for output format
* Add openpyxl dep
2018-10-05 16:15:30 +05:30
Vinayak Mehta
cf7823f33c
[MRG] Add ghostscript fix for windows (#124)
...
* Add ghostscript fix for windows
* Add python2 fix
* Update install.rst
2018-10-05 02:06:37 +05:30
Vinayak Mehta
c5bde5e2ad
[MRG] Add error/warning tests (#113)
...
* Add unknown flavor test
* Add input kwargs test
* Remove unused utils
* Add unsupported format test
* Add stream unequal tables-columns length test
* Add python3 compat
* Add no tables found test
* Convert util info log to warning
2018-10-02 19:28:42 +05:30
Vinayak Mehta
fc0542bd3c
Add Python 3 compatibility (#109)
...
* Add python3 compat
* Update .gitignore
* Update .gitignore again
* Remove debugging return
* Add unicode_literals import
* Bump version
* Add python3-tk note
2018-09-28 21:58:29 +05:30
Vinayak Mehta
dfb0d4fb4c
Fix TableList repr
2018-09-27 04:42:23 +05:30
Vinayak Mehta
759e635a3c
Bump version
2018-09-25 12:32:01 +05:30
Vinayak Mehta
7731497a5b
Fix relative links
...
Fix broken links
2018-09-24 22:15:43 +05:30
Vinayak Mehta
be2733ebd2
Add utf8 header
2018-09-24 16:27:26 +05:30
Vinayak Mehta
93b4dabcc2
Update CLI
2018-09-24 01:00:30 +05:30
Vinayak Mehta
a70befe528
Update docs
2018-09-23 14:04:21 +05:30
Vinayak Mehta
959a252aa3
Fix CLI
2018-09-23 12:45:01 +05:30
Vinayak Mehta
7aaa7b2460
Deprecate debug and add plot docstrings
2018-09-23 11:56:40 +05:30
Vinayak Mehta
71d91fbebd
Fix plot_text
2018-09-23 11:45:20 +05:30
Vinayak Mehta
3170a9689f
Add flavors
2018-09-23 10:53:32 +05:30
Vinayak Mehta
021aca8f97
Update __version__.py
2018-09-15 03:34:04 +05:30
Vinayak Mehta
a4fcdc7781
Add advanced guide illustrations
2018-09-13 21:12:25 +05:30
Vinayak Mehta
3a980a46c1
Add quickstart
2018-09-13 15:50:30 +05:30
Vinayak Mehta
0ba3469d21
Add Stream benchmarks
2018-09-12 07:21:35 +05:30
Vinayak Mehta
b276909a4f
Add Lattice benchmarks
2018-09-12 05:58:22 +05:30
Vinayak Mehta
094be1a1dd
Add better table detection image
2018-09-12 02:29:25 +05:30
Vinayak Mehta
dc533e73e2
Add agstat to benchmark
2018-09-12 02:05:34 +05:30
Vinayak Mehta
17ea5f335e
Fix docstrings and interlinks
2018-09-11 08:31:37 +05:30
Vinayak Mehta
656808b8e2
Fix setup.py
2018-09-11 08:31:37 +05:30
Vinayak Mehta
118aac47bc
Merge pull request #99 from socialcopsdev/cli
...
Add CLI
2018-09-10 16:06:14 +05:30
Vinayak Mehta
544e0c9c3f
Update CLI help and README
2018-09-10 16:05:51 +05:30
Vinayak Mehta
7bb1aee9b6
Add CLI
2018-09-10 15:16:41 +05:30
Vinayak Mehta
1b013178a8
Add docstrings to table to_format methods
2018-09-09 18:41:40 +05:30
Vinayak Mehta
d3beaafc99
Add temporary directory context manager
2018-09-09 18:10:55 +05:30
Vinayak Mehta
9a6ed555c8
Fix get_rotation
2018-09-09 10:04:54 +05:30
Vinayak Mehta
9878de4dfc
Add docstrings and update docs
2018-09-09 10:00:22 +05:30
Vinayak Mehta
c91a9bb36d
Add future import
2018-09-09 05:36:07 +05:30
Vinayak Mehta
7c3e531b07
Port tests
2018-09-09 05:29:24 +05:30
Vinayak Mehta
04383920b4
Rename parser keyword arguments
2018-09-08 05:38:43 +05:30
Vinayak Mehta
e615580e55
Fix plot_geometry
2018-09-07 06:25:13 +05:30
Vinayak Mehta
b3f840bba9
Change utils function names
2018-09-07 06:04:45 +05:30
Vinayak Mehta
20acda2259
Fix current logging
2018-09-07 05:53:19 +05:30
Vinayak Mehta
09ac8f4640
Add property n to TableList
2018-09-07 05:17:09 +05:30
Vinayak Mehta
0c329634e7
Add export to TableList and Table
2018-09-07 05:13:34 +05:30
Vinayak Mehta
557189da24
Refactor core
2018-09-06 07:42:41 +05:30
Vinayak Mehta
ffeb853c55
Rename plot.py to plotting.py
2018-09-06 06:21:54 +05:30
Vinayak Mehta
42d7a4ac02
Add import os
2018-09-06 06:15:13 +05:30
Vinayak Mehta
b91df8a1b8
Create parsers module
2018-09-06 06:13:58 +05:30
Vinayak Mehta
d0005101a7
Add BaseParser docstring stub
2018-09-06 05:55:05 +05:30
Vinayak Mehta
96af09d9cd
Add BaseParser and refactor extract_tables
2018-09-06 05:28:34 +05:30
Vinayak Mehta
a4d3165e94
Add docstring stubs
2018-09-05 19:35:46 +05:30
Vinayak Mehta
bf63432494
Remove docstrings
2018-09-05 19:04:40 +05:30
Vinayak Mehta
08cbababca
Add properties to GeometryList
2018-09-05 19:00:30 +05:30
Vinayak Mehta
73e52939f5
Add parsing_report property
2018-09-05 18:50:10 +05:30
Vinayak Mehta
9124e3374c
Add properties to Table
2018-09-05 18:20:46 +05:30
Vinayak Mehta
b9d77cb983
Decouple debug geometry from tables
2018-09-05 15:18:31 +05:30
Vinayak Mehta
941994f0bf
Make present code work with new API
2018-09-04 23:34:49 +05:30
Vinayak Mehta
e3aabb720f
Add stream and lattice to parsers
2018-09-04 21:28:37 +05:30
Vinayak Mehta
5d29f0c21c
Move Pdf class to core as FileHandler
2018-09-04 07:02:30 +05:30
Vinayak Mehta
c689735da2
Move cell and table to core
2018-09-04 03:49:43 +05:30
Vinayak Mehta
72c42c74db
Remove ocr
2018-09-01 16:23:54 +05:30
Vinayak Mehta
861ed0b64e
Fix lattice fill
2017-05-05 15:02:29 +05:30
Vinayak Mehta
e252e476b9
Add better y-cuts detection
2017-04-25 18:44:53 +05:30
Vinayak Mehta
76e1d32417
Add minor fix
...
Minor fix
2017-04-24 16:53:54 +05:30
Vinayak Mehta
bef33c75b1
Fix ValueError
2017-04-21 20:15:35 +05:30
Vinayak Mehta
fdb4b0d494
Update version
2017-04-21 15:41:32 +05:30
Vinayak Mehta
5c5bd6199c
Fix warnings and exceptions
2017-04-21 14:20:33 +05:30
Vinayak Mehta
18e1a799a1
Remove remove_empty
2017-04-21 13:22:37 +05:30
Vinayak Mehta
d28e4b8c1e
Change default value for iterations
2017-04-21 13:20:48 +05:30
Vinayak Mehta
4da754ddcb
[ENH] Add OCR and better joint detection
...
* Add iterations for dilation
* Add OCRLattice and OCRStream
* Add debug
2017-04-18 18:25:47 +05:30
Vinayak Mehta
7246e1a73d
Parallelize pdf split
2017-04-11 18:30:05 +05:30
Vinayak Mehta
4a87a77003
Remove ncols
2017-04-11 15:50:12 +05:30
Vinayak Mehta
72233f25ce
Parameterize thresholding blocksize and constant
2017-04-10 21:15:54 +05:30
Vinayak Mehta
84d354ba10
Add deepcopy and debug scripts
2017-04-10 18:59:48 +05:30
Vinayak Mehta
3eb18ef199
More logs
2017-02-07 22:23:05 +05:30
Vinayak Mehta
bc86346154
Don't let processes modify instance attributes
2017-02-07 22:13:33 +05:30
Vinayak Mehta
970256e19d
Add OCR support for image based pdfs with lines
...
* Cosmits
* Remove unnecessary kwargs
* Direct ghostscript call output to /dev/null
* Change char_margin's default value
* Add image attribute in Table and Cell
* Add OCR
* Fix coordinates
* Add table_area
* Add ocr options to cli
* Direct ghostscript call output to /dev/null
* Add ocr docstring
* Add requirements
* Update README
2017-01-07 16:37:56 +05:30
Vinayak Mehta
70f626373b
Cosmits
...
* Remove unnecessary kwargs
* Direct ghostscript call output to /dev/null
* Change char_margin's default value
2017-01-07 15:58:45 +05:30
Vinayak Mehta
bd1d57a561
Update version
2017-01-07 15:50:20 +05:30
Vinayak Mehta
10eda3f204
Deprecate Stream ncolumns
2016-11-07 21:30:48 +05:30
Vinayak Mehta
72c2a0020f
Minor fix
2016-10-20 18:54:06 +05:30
Vinayak Mehta
5c6a74fb2a
Add new params
2016-10-18 18:23:35 +05:30
Vinayak Mehta
b01edee337
Handle rotation at entry
2016-10-18 15:33:38 +05:30
Vinayak Mehta
2a203a1865
Log warning when len(header) != len(cols)
2016-10-17 18:16:39 +05:30
Vinayak Mehta
adb948d363
Fix column parameter
2016-10-13 16:54:45 +05:30
Vinayak Mehta
40d30c1ab9
Add superscript and subscript flagging
...
* Add superscript flagging
* Add flagging param
* Add np.round to account for rotation error
2016-10-12 19:27:18 +05:30
Vinayak Mehta
e8b93a9624
Add headers param
2016-10-12 13:59:10 +05:30
Vinayak Mehta
a43d5ca2c7
Replace chars with textlines
...
* Add split function
* Add split_text and shift_text params
* Change get_rotation
* Move get_column_index to utils
* Add split_text and shift_text
* Fix split_text
2016-10-12 13:17:02 +05:30
Vinayak Mehta
52a2876ab1
Fix tarea type conversion
2016-10-04 19:57:53 +05:30
Vinayak Mehta
4b8e96a86a
Update docs
...
* Update README
* Update index.rst
* Update docstrings
* Fix typo
* Edit docs
* Add error messages
2016-10-04 17:50:48 +05:30
Vinayak Mehta
d46eeeab1a
Change jpg to png
2016-09-27 18:37:38 +05:30
Vinayak Mehta
75c7deffaa
Minor Stream fix
2016-09-27 17:27:34 +05:30
Vinayak Mehta
79afb45e2e
Support for vertical tables in Stream
...
* Change var names
* Add test pdf
* Add tests for Lattice rotation
* Add support for vertical tables in Stream, test pdfs
* Add tests for Stream rotation
2016-09-15 20:51:59 +05:30
Vinayak Mehta
8ce7b74671
Replace imagemagick with ghostscript
...
* Replace imagemagick with ghostscript
* Add quiet option
* Avoid repetition
* Remove Wand requirement
* Replace jpeg with png
2016-09-13 17:35:07 +05:30
Vinayak Mehta
757ba0444a
Remove jtol
2016-09-13 17:28:21 +05:30
Vinayak Mehta
439059817d
Update tests with new API
...
* Update Lattice tests with new API
* Update Stream tests with new API, fix CLI
* Add table_area test, Stream fixes
2016-09-09 16:56:25 +05:30
Vinayak Mehta
a94c350a7b
Fix param flow
...
* Fix param flow
* Add check for None
2016-09-09 14:52:38 +05:30
Vinayak Mehta
766260d5d9
Remove hybrid.py
2016-09-08 21:17:24 +05:30
Vinayak Mehta
98f47d1bd7
Fix table_bbox when no tarea is given
2016-09-05 21:26:16 +05:30
Vinayak Mehta
d86630e70b
Add table_area
...
[MRG] Add table_area
2016-09-05 18:51:59 +05:30
Vinayak Mehta
b2dd5f68fe
Fix vertical text detection in cells
...
* Fix vertical text detection in cells
* Add Cell instance method
* Change var names
2016-09-01 01:42:27 +05:30
Vinayak Mehta
8d56f15130
Add negative tolerance
2016-08-31 22:25:33 +05:30
Vinayak Mehta
2a55621d05
Fix magic grid extension
2016-08-31 21:06:41 +05:30
Vinayak Mehta
552f9cf422
Add various metrics to score the quality of a parse
...
Add various metrics to score the quality of a parse
2016-08-30 14:52:49 +05:30
Vinayak Mehta
7e5804f87d
Adds documentation
...
[MRG] Adds documentation
2016-08-09 17:23:50 +05:30
Vinayak Mehta
13568865b5
Add verbose
2016-08-03 13:14:19 +05:30
Vinayak Mehta
57917426e8
Fix docstrings
2016-08-03 13:14:11 +05:30
Vinayak Mehta
050107b63d
Minor fix
2016-07-29 21:47:20 +05:30
Vinayak Mehta
e9602bb353
Create python package
...
Add version support
Add new test file
[RFC] First phase
[RFC] Second phase
[RFC] Third phase
Add logging
Update README
Add debug
Add debug, fixes
Add pep8 changes
Add fix
Rename CLI tool
Add csv fix
Update README
Add fix for numpages
Update README
Update requirements.txt
Use yield
Add tuple unpacking fix
Fix n00b mistake
Add check for None
Fix check for None
Fix unicode
Add relative imports
2016-07-29 21:09:39 +05:30