A Python library to extract tabular data from PDFs
 
 
Commit 7f1ea11ff1 by Vinayak Mehta: Update README (2018-09-09 10:08:49 +05:30)

Camelot: PDF Table Parsing for Humans

Camelot is a Python 2.7 library and command-line tool for extracting tabular data from PDF files.

Usage

>>> import camelot
>>> tables = camelot.read_pdf("foo.pdf")
>>> tables
<TableList n=2>
>>> tables.export("foo.csv", f="csv", compress=True)  # f may also be "json", "excel" or "html"; compress=True zips the output
>>> tables[0]
<Table shape=(3,4)>
>>> tables[0].to_csv("foo.csv")  # also to_json, to_excel, to_html
>>> tables[0].parsing_report
{
    "accuracy": 96,
    "whitespace": 80,
    "order": 1,
    "page": 1
}
>>> df = tables[0].df  # the table as a pandas DataFrame
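The parsing report can help decide which extracted tables to keep. The sketch below is a hypothetical helper, not part of Camelot: it keeps only tables whose report meets an accuracy threshold and stays under a whitespace limit. It assumes only that each table exposes a parsing_report dict with the "accuracy" and "whitespace" keys shown above; the default thresholds are arbitrary.

```python
from collections import namedtuple


def select_tables(tables, min_accuracy=90, max_whitespace=50):
    """Return the tables whose parsing report passes both thresholds.

    Hypothetical helper: it relies only on each table having a
    parsing_report dict with "accuracy" and "whitespace" keys, as in
    the Usage example. The default thresholds are arbitrary choices.
    """
    selected = []
    for table in tables:
        report = table.parsing_report
        if (report["accuracy"] >= min_accuracy
                and report["whitespace"] <= max_whitespace):
            selected.append(table)
    return selected


if __name__ == "__main__":
    # Stand-in objects so the sketch runs without Camelot or a PDF file.
    FakeTable = namedtuple("FakeTable", ["name", "parsing_report"])
    tables = [
        FakeTable("t1", {"accuracy": 96, "whitespace": 80, "order": 1, "page": 1}),
        FakeTable("t2", {"accuracy": 99, "whitespace": 12, "order": 2, "page": 1}),
    ]
    print([t.name for t in select_tables(tables)])  # ['t2']
```

With Camelot installed, you would pass it the tables returned by camelot.read_pdf, assuming the TableList can be iterated like a list.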

Dependencies

Besides its Python packages, Camelot needs two system-level dependencies: Tk and Ghostscript.

Installation

Make sure you have the latest versions of pip and setuptools. You can update them with:

pip install -U pip setuptools

Installing dependencies

Tk and Ghostscript can be installed with your system's package manager.

Linux

  • Ubuntu
    sudo apt-get install python-opencv python-tk ghostscript
  • Arch Linux
    sudo pacman -S tk ghostscript

OS X

brew install tcl-tk ghostscript
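Once Tk and Ghostscript are installed, you can confirm that Python can see them with the sketch below. It is an illustrative check, not part of Camelot: it only looks for a Ghostscript executable on PATH and tries to import Python's Tk bindings. The executable names searched for are assumptions ("gs" on Linux and OS X, the console binaries on Windows), and the script does not check versions or how Camelot itself locates Ghostscript.

```python
from __future__ import print_function

import os
import sys


def find_executable(name):
    """Return the full path of an executable called `name` on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries for simplicity
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def find_ghostscript():
    """Look for a Ghostscript command-line executable on PATH."""
    # Assumed names: "gs" on Linux/OS X, console binaries on Windows.
    for name in ("gs", "gswin64c.exe", "gswin32c.exe"):
        path = find_executable(name)
        if path is not None:
            return path
    return None


def has_tk():
    """Return True if this interpreter can import its Tk bindings."""
    try:
        if sys.version_info[0] >= 3:
            import tkinter  # noqa: F401
        else:
            import Tkinter  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("Ghostscript:", find_ghostscript() or "not found")
    print("Tk bindings importable:", has_tk())
```

The script is written to run under both Python 2.7 and Python 3, so it can be used with the same interpreter that will run Camelot.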

Finally, cd into the project directory and install Camelot with:

python setup.py install

Development

Code

You can check out the latest sources with:

git clone https://github.com/socialcopsdev/camelot.git

Contributing

See the Contributing guidelines.

Testing

python setup.py test

License

BSD License