# camelot
Camelot is a Python 2.7 library and command-line tool for getting tables out of PDF files.
## Usage
```python
from camelot.pdf import Pdf
from camelot.lattice import Lattice

# Extract tables from pages 2 to 4 using the lattice method
extractor = Lattice(Pdf("/path/to/pdf", pagenos=[{'start': 2, 'end': 4}]))
tables = extractor.get_tables()
```
Camelot also comes with a command-line tool that lets you specify the output format (csv, tsv, html, json or xlsx), the page numbers to parse, and the directory where the output files should be written. By default, the output files are written to the same directory as the PDF.
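The `-p` option takes a comma-separated page spec such as `1,3-6,10`, while the Python API takes `pagenos` as a list of `{'start': ..., 'end': ...}` dicts. The helper below is a hypothetical sketch (`parse_pages` is not part of Camelot) of converting one form into the other. It assumes that `end` is inclusive and that a single page `n` maps to `{'start': n, 'end': n}`. The README does not confirm either assumption.

```python
def parse_pages(spec):
    """Hypothetical helper: turn a page spec like "1,3-6,10" into a list of
    {'start': ..., 'end': ...} dicts (assumes 'end' is inclusive)."""
    ranges = []
    for part in spec.split(','):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if '-' in part:
            start, end = part.split('-', 1)
            start, end = int(start), int(end)
        else:
            start = end = int(part)  # a single page is a one-page range
        if start < 1 or end < start:
            raise ValueError("invalid page range: %r" % part)
        ranges.append({'start': start, 'end': end})
    return ranges


# pagenos is a list of three ranges: 1-1, 3-6 and 10-10
pagenos = parse_pages("1,3-6,10")
```

Under the same assumptions, `Pdf("/path/to/pdf", pagenos=parse_pages("2-4"))` would pass the same range as the usage example.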
```
camelot parses tables from PDFs!

usage:
 camelot.py [options] [...]

options:
 -h, --help                Show this screen.
 -v, --version             Show version.
 -p, --pages <pageno>      Comma-separated list of page numbers.
                           Example: -p 1,3-6,10  [default: 1]
 -f, --format <format>     Output format. (csv,tsv,html,json,xlsx) [default: csv]
 -l, --log                 Print log to file.
 -o, --output <directory>  Output directory.

camelot methods:
 lattice  Looks for lines between data.
 stream   Looks for spaces between data.

See 'camelot -h' for more information on a specific method.
```
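The `stream` method is described only as looking for spaces between data. The toy sketch below is a rough illustration of that idea, not Camelot's implementation, which this README does not describe. It treats character positions that are blank on every line of fixed-width text as column separators. `find_gaps`, `split_rows` and the `min_gap` threshold are hypothetical names chosen for this example.

```python
def find_gaps(lines, min_gap=2):
    """Toy whitespace detection: return (start, end) spans of character
    positions that are blank in every line and at least min_gap wide."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    # A position is "empty" only if every line has a space there.
    empty = [all(row[i] == ' ' for row in padded) for i in range(width)]
    gaps, start = [], None
    for i, is_empty in enumerate(empty):
        if is_empty and start is None:
            start = i
        elif not is_empty and start is not None:
            # Skip leading indentation (start == 0) and gaps that are too narrow.
            if start > 0 and i - start >= min_gap:
                gaps.append((start, i))
            start = None
    return gaps


def split_rows(lines, min_gap=2):
    """Cut every line at the shared gaps and strip each cell."""
    gaps = find_gaps(lines, min_gap)
    edges = [0] + [pos for gap in gaps for pos in gap] + [None]
    spans = list(zip(edges[0::2], edges[1::2]))  # text between the gaps
    return [[line[a:b].strip() for a, b in spans] for line in lines]


table_lines = [
    "Name    Qty  Price",
    "Apple   3    1.20",
    "Pear    10   0.80",
]
rows = split_rows(table_lines)
```

Requiring a gap of at least `min_gap` characters keeps a single space inside a cell, such as "New York", from splitting that cell in two.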
## Dependencies
Camelot currently works under Python 2.7.
Its required dependencies include numpy, OpenCV and ImageMagick.
## Installation
Make sure you have the latest versions of pip and setuptools. You can update them with:

```
pip install -U pip setuptools
```
We strongly recommend that you use a virtual environment to install Camelot. If you don't want to use a virtual environment, then skip the next section.
### Installing virtualenvwrapper
You'll need to install virtualenvwrapper:

```
pip install virtualenvwrapper
```

or

```
sudo pip install virtualenvwrapper
```
After installing virtualenvwrapper, add the following lines to your `.bashrc` and source it:

```
export WORKON_HOME=$HOME/.virtualenvs
source /usr/bin/virtualenvwrapper.sh
```

The path to `virtualenvwrapper.sh` could be different on your system.
Finally, create a virtual environment:

```
mkvirtualenv camelot
```
### Installing dependencies
numpy can be installed using pip:

```
pip install numpy
```
OpenCV and ImageMagick can be installed using your system's default package manager.
#### Linux
- Arch Linux

  ```
  sudo pacman -S opencv imagemagick
  ```

- Ubuntu

  ```
  sudo apt-get install libopencv-dev python-opencv imagemagick
  ```
#### OS X

```
brew install homebrew/science/opencv imagemagick
```
If you're working in a virtualenv, you'll need to symlink the OpenCV shared object file (`cv2.so`) into the virtualenv's `site-packages` directory:

```
sudo ln -s /path/to/system/site-packages/cv2.so ~/path/to/virtualenv/site-packages/cv2.so
```
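Once the packages are installed and, in a virtualenv, `cv2.so` is linked, you can confirm that Python can see them by trying to import them. The script below is a hypothetical helper and is not part of Camelot. It assumes that ImageMagick provides a `convert` command (ImageMagick 6) or a `magick` command (ImageMagick 7) on your `PATH`.

```python
import importlib

try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.7


def check_dependencies():
    """Hypothetical helper: report whether each dependency can be found."""
    status = {}
    for module in ('numpy', 'cv2'):  # cv2 is the OpenCV Python module
        try:
            importlib.import_module(module)
            status[module] = True
        except ImportError:
            status[module] = False
    # Assumption: ImageMagick 6 installs `convert`, ImageMagick 7 installs `magick`.
    status['imagemagick'] = bool(which('convert') or which('magick'))
    return status


if __name__ == '__main__':
    for name, ok in sorted(check_dependencies().items()):
        print('%-12s %s' % (name, 'found' if ok else 'MISSING'))
```

If `cv2` is reported missing inside the virtualenv but is found outside it, the symlink step above is the likely culprit.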
Finally, `cd` into the project directory and install Camelot:

```
make install
```
## Development
### Code
You can check out the latest sources with:

```
git clone https://github.com/socialcopsdev/camelot.git
```
### Contributing
See the Contributing doc.
### Testing
```
make test
```
## License
BSD License