camelot

Camelot is a Python 2.7 library and command-line tool for getting tables out of PDF files.

Usage

from camelot.pdf import Pdf
from camelot.lattice import Lattice

# Run the lattice method on the PDF pages listed in pagenos
extractor = Lattice(Pdf("/path/to/pdf", pagenos=[{'start': 2, 'end': 4}]))
tables = extractor.get_tables()
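The pagenos argument takes a list of dicts with 'start' and 'end' keys. If your page numbers are in the comma-separated form the command-line tool accepts (for example 1,3-6,10), a small helper can build that list. This is a hypothetical sketch, not part of Camelot: parse_page_spec is a made-up name, and it assumes 'end' is inclusive, which you should check against the Pdf class before relying on it.

```python
def parse_page_spec(spec):
    """Turn a page spec like "1,3-6,10" into [{'start': .., 'end': ..}, ...].

    Hypothetical helper; assumes 'end' is inclusive, so "3-6" becomes
    {'start': 3, 'end': 6} and a single page "10" becomes {'start': 10, 'end': 10}.
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
        else:
            start = end = int(part)
        if start > end:
            raise ValueError("invalid page range: %s" % part)
        ranges.append({"start": start, "end": end})
    return ranges
```

Under that assumption, parse_page_spec("2-4") returns [{'start': 2, 'end': 4}], the same list used in the usage example above.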

Camelot also comes with a command-line tool that lets you specify the output format (csv, tsv, html, json or xlsx), the page numbers to parse, and the directory where the output files should be written. By default, output files are written to the same directory as the PDF.

camelot parses tables from PDFs!

usage:
 camelot.py [options]  [...]

options:
 -h, --help                      Show this screen.
 -v, --version                   Show version.
 -p, --pages <pageno>            Comma-separated list of page numbers.
                                 Example: -p 1,3-6,10  [default: 1]
 -f, --format <format>           Output format. (csv,tsv,html,json,xlsx) [default: csv]
 -l, --log                       Print log to file.
 -o, --output <directory>        Output directory.

camelot methods:
 lattice  Looks for lines between data.
 stream   Looks for spaces between data.

See 'camelot <method> -h' for more information on a specific method.
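The two methods differ in what they treat as a cell boundary. The toy sketch below only illustrates those one-line descriptions and is not Camelot's implementation, which this README does not describe. It assumes hypothetical, already-extracted inputs: for stream, each word's horizontal extent as an (x0, x1) pair; for lattice, the page as a grid of 0/1 pixels where 1 is ink. The function names are made up.

```python
def stream_columns(words, min_gap=1.0):
    """Stream idea: columns are separated by vertical bands of whitespace.

    Merges the horizontal extents (x0, x1) of all words; any gap of at
    least min_gap that no word crosses is treated as a column separator.
    """
    spans = []
    for x0, x1 in sorted(words):
        if spans and x0 - spans[-1][1] < min_gap:
            # Overlaps (or nearly touches) the current column: extend it.
            spans[-1][1] = max(spans[-1][1], x1)
        else:
            # Whitespace gap found: start a new column.
            spans.append([x0, x1])
    return [tuple(s) for s in spans]


def lattice_lines(grid, min_fill=0.9):
    """Lattice idea: cells are bounded by ruling lines drawn on the page.

    Returns the indices of rows and columns whose share of ink pixels is
    at least min_fill, i.e. the ones that look like table lines.
    """
    height, width = len(grid), len(grid[0])
    rows = [r for r in range(height) if sum(grid[r]) >= min_fill * width]
    cols = [c for c in range(width)
            if sum(grid[r][c] for r in range(height)) >= min_fill * height]
    return rows, cols
```

For example, stream_columns([(10, 40), (12, 35), (60, 90), (62, 100), (130, 150)]) finds three columns, (10, 40), (60, 100) and (130, 150), while lattice_lines on a 5x5 grid with a border and one middle line in each direction returns ([0, 2, 4], [0, 2, 4]).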

Dependencies

Currently, camelot works under Python 2.7.

The required dependencies include numpy, OpenCV and ImageMagick.

Installation

Make sure you have the latest versions of pip and setuptools. You can update them with:

pip install -U pip setuptools

We strongly recommend that you use a virtual environment to install Camelot. If you don't want to use a virtual environment, then skip the next section.

Installing virtualenvwrapper

You'll need to install virtualenvwrapper.

pip install virtualenvwrapper

or

sudo pip install virtualenvwrapper

After installing virtualenvwrapper, add the following lines to your .bashrc and source it.

export WORKON_HOME=$HOME/.virtualenvs
source /usr/bin/virtualenvwrapper.sh

The path to virtualenvwrapper.sh could be different on your system.

Finally, create a virtual environment:

mkvirtualenv camelot

Installing dependencies

numpy can be installed using pip.

pip install numpy

OpenCV and ImageMagick can be installed using your system's default package manager.

Linux

  • Arch Linux
sudo pacman -S opencv imagemagick
  • Ubuntu
sudo apt-get install libopencv-dev python-opencv imagemagick

OS X

brew install homebrew/science/opencv imagemagick

If you're working in a virtualenv, you'll need to create a symbolic link to the OpenCV shared object file so that cv2 can be imported from inside the virtualenv:

sudo ln -s /path/to/system/site-packages/cv2.so ~/path/to/virtualenv/site-packages/cv2.so
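To confirm that the symlink worked and that the other dependencies are visible from inside the virtualenv, a small check script can help. This is a hypothetical helper, not part of Camelot: it only tries to import numpy and cv2 and looks for ImageMagick's command-line tools on PATH, assuming they are named convert (ImageMagick 6) or magick (ImageMagick 7). It is written to run under both Python 2.7 and Python 3.

```python
import importlib

try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.7


def check_dependencies():
    """Report which of Camelot's listed dependencies can be found."""
    status = {}
    for module in ("numpy", "cv2"):
        try:
            importlib.import_module(module)
            status[module] = True
        except ImportError:
            status[module] = False
    # Assumption: ImageMagick 6 installs `convert`, ImageMagick 7 installs `magick`.
    status["imagemagick"] = any(which(cmd) for cmd in ("convert", "magick"))
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_dependencies().items()):
        print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```

Run it with the virtualenv's python; a MISSING entry for cv2 usually means the symlink points at the wrong site-packages directory.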

Finally, cd into the project directory and install with:

make install

Development

Code

You can check out the latest sources with the command:

git clone https://github.com/socialcopsdev/camelot.git

Contributing

See the Contributing doc.

Testing

make test

License

BSD License