A Python library to extract tabular data from PDFs
 
 
Vinayak Mehta dda809b286 Fix Makefile spaces to tabs 2016-08-08 17:26:54 +05:30
camelot Add verbose 2016-08-03 13:14:19 +05:30
docs Fix Makefile spaces to tabs 2016-08-08 17:26:54 +05:30
tests Create python package 2016-07-29 21:09:39 +05:30
tools Add verbose 2016-08-03 13:14:19 +05:30
.coveragerc Add coveragerc and update Makefile 2016-08-08 17:24:13 +05:30
.gitignore Add Makefile 2016-08-08 16:32:05 +05:30
Makefile Fix Makefile spaces to tabs 2016-08-08 17:26:54 +05:30
README.md Create python package 2016-07-29 21:09:39 +05:30
requirements.txt Create python package 2016-07-29 21:09:39 +05:30
setup.py Create python package 2016-07-29 21:09:39 +05:30

README.md

camelot

Dependencies

Currently, camelot works under Python 2.7.

The required dependencies include NumPy, OpenCV, and ImageMagick.
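
As a quick sanity check before installing, the sketch below reports which of these dependencies cannot be found. It is not part of camelot. It assumes NumPy and OpenCV import as `numpy` and `cv2`, and that ImageMagick provides a `convert` executable on the `PATH`. The helper names `find_executable` and `missing_dependencies` are hypothetical.

```python
import importlib
import os


def find_executable(name):
    """Return the full path of `name` on PATH, or None if it is absent."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_dependencies(modules=("numpy", "cv2"), executables=("convert",)):
    """List the modules and executables that cannot be found."""
    missing = []
    for module in modules:
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(module)
    for executable in executables:
        # Assumption: the executable name has no platform-specific suffix.
        if find_executable(executable) is None:
            missing.append(executable)
    return missing
```

Calling `missing_dependencies()` returns an empty list when everything is found, and otherwise the names that still need to be installed.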

Install

Make sure you have the required dependencies installed on your system. If you're working in a virtual environment, copy the cv2.so file from your system's site-packages to the virtualenv's site-packages. After that, cd into the project directory and run the following command:

python setup.py install
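
The cv2.so copy step for virtual environments can also be scripted. The following is a minimal sketch, not something camelot ships. `copy_cv2` is a hypothetical helper, and it assumes the compiled OpenCV module sits directly in the system site-packages directory under a name matching `cv2*.so`.

```python
import glob
import os
import shutil


def copy_cv2(system_site_packages, venv_site_packages):
    """Copy OpenCV's compiled module(s) into a virtualenv; return new paths."""
    pattern = os.path.join(system_site_packages, "cv2*.so")
    copied = []
    for source in sorted(glob.glob(pattern)):
        destination = os.path.join(venv_site_packages, os.path.basename(source))
        shutil.copy2(source, destination)  # copy2 also keeps file metadata
        copied.append(destination)
    return copied
```

Both arguments are directories that depend on your system and virtualenv layout. An empty return value means no matching file was found in the source directory.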

Usage

from camelot import *

extractor = Lattice(Pdf("/path/to/pdf", pagenos=[{'start': 2, 'end': 4}]))
tables = extractor.get_tables()

camelot parses tables from PDFs!

usage:
 camelot.py [options] <method> [<args>...]

options:
 -h, --help                      Show this screen.
 -v, --version                   Show version.
 -p, --pages <pageno>            Comma-separated list of page numbers.
                                 Example: -p 1,3-6,10  [default: 1]
 -f, --format <format>           Output format. (csv,tsv,html,json,xlsx) [default: csv]
 -l, --log                       Print log to file.
 -o, --output <directory>        Output directory.

camelot methods:
 lattice  Looks for lines between data.
 stream   Looks for spaces between data.

See 'camelot <method> -h' for more information on a specific method.
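
The `-p` option takes a string such as `1,3-6,10`, while the Python usage above passes `pagenos` as a list of `start`/`end` dictionaries. The sketch below shows one way to turn the former into the latter. `parse_pages` is a hypothetical helper, not camelot's own parser, and representing a single page as a range whose start equals its end is an assumption.

```python
def parse_pages(spec):
    """Turn a string such as '1,3-6,10' into a list of start/end dicts."""
    pagenos = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = part.split("-", 1)
        else:
            start = end = part  # assumption: a single page is a one-page range
        start, end = int(start), int(end)
        if start < 1 or end < start:
            raise ValueError("invalid page range: %r" % part)
        pagenos.append({"start": start, "end": end})
    return pagenos
```

With this helper, `parse_pages("2-4")` produces `[{'start': 2, 'end': 4}]`, the same value used in the `Pdf(...)` call in the usage example.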

Development

Code

You can check the latest sources with the command:

git clone https://github.com/socialcopsdev/camelot.git

Contributing

The preferred way to contribute to camelot is to fork this repository, and then submit a "pull request" (PR):

  1. Create an account on GitHub if you don't already have one.
  2. Fork the project repository: click on the Fork button near the top of the page. This creates a copy of the code under your account on the GitHub server.
  3. Clone this copy to your local disk.
  4. Create a branch to hold your changes:
git checkout -b my-feature

and start making changes. Never work in the master branch!
  5. Work on this copy, on your computer, using Git to do the version control. When you're done editing, do:

$ git add modified_files
$ git commit

to record your changes in Git, then push them to GitHub with:

$ git push -u origin my-feature

Finally, go to the web page of your fork of the camelot repo, and click Pull request to send your changes to the maintainers for review.

Testing

License