A Python library to extract tabular data from PDFs


Vinayak Mehta fcef880e6c Merge pull request #98 from socialcopsdev/temp_dir Add temporary directory context manager		2018-09-09 18:11:33 +05:30
camelot	Add temporary directory context manager	2018-09-09 18:10:55 +05:30
docs	Add docstrings and update docs	2018-09-09 10:00:22 +05:30
tests	Port tests	2018-09-09 05:29:24 +05:30
.coveragerc	Add coveragerc and update Makefile	2016-08-08 17:24:13 +05:30
.gitignore	Add docstrings and update docs	2018-09-09 10:00:22 +05:30
README.md	Update README	2018-09-09 10:09:26 +05:30
requirements-dev.txt	Add docstrings and update docs	2018-09-09 10:00:22 +05:30
requirements.txt	Add docstrings and update docs	2018-09-09 10:00:22 +05:30
setup.cfg	Add setup.cfg	2018-09-09 05:41:42 +05:30
setup.py	Add docstrings and update docs	2018-09-09 10:00:22 +05:30

README.md

Camelot: PDF Table Parsing for Humans

Camelot is a Python 2.7 library and command-line tool for extracting tabular data from PDF files.

Usage

>>> import camelot
>>> tables = camelot.read_pdf("foo.pdf")
>>> tables
<TableList n=2>
>>> tables.export("foo.csv", f="csv", compress=True)  # f can also be "json", "excel" or "html"
>>> tables[0]
<Table shape=(3,4)>
>>> tables[0].to_csv("foo.csv")  # or to_json, to_excel, to_html
>>> tables[0].parsing_report
{
    "accuracy": 96,
    "whitespace": 80,
    "order": 1,
    "page": 1
}
>>> df = tables[0].df
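The parsing_report gives each table a quality signal. One way to use it is to keep only tables whose reported accuracy meets a threshold before exporting them. The sketch below is a minimal illustration under stated assumptions: the helper name filter_by_accuracy and the default threshold of 90 are hypothetical, not part of Camelot, and the code relies only on each table exposing a parsing_report dict with an "accuracy" key, as in the session above.

```python
from collections import namedtuple


def filter_by_accuracy(tables, min_accuracy=90):
    """Return the tables whose reported accuracy is at least min_accuracy.

    Hypothetical helper, not part of Camelot. It assumes only that each
    table has a ``parsing_report`` dict with an "accuracy" key (as in the
    session above) and that ``tables`` can be iterated.
    """
    return [t for t in tables if t.parsing_report["accuracy"] >= min_accuracy]


if __name__ == "__main__":
    # Stand-in objects so the sketch runs without a PDF; real code would
    # pass the result of camelot.read_pdf("foo.pdf") instead.
    FakeTable = namedtuple("FakeTable", ["name", "parsing_report"])
    tables = [
        FakeTable("page 1", {"accuracy": 96}),
        FakeTable("page 2", {"accuracy": 42}),
    ]
    for table in filter_by_accuracy(tables):
        print(table.name)  # prints "page 1"
```

Each table that passes the filter can then be exported individually with to_csv, as shown above.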

Dependencies

Besides the Python packages listed in requirements.txt, Camelot needs two system dependencies: tk and ghostscript.

Installation

Make sure you have the latest versions of pip and setuptools. You can update them with:

pip install -U pip setuptools

Installing dependencies

tk and ghostscript can be installed using your system's default package manager.

Linux

Ubuntu

sudo apt-get install python-tk ghostscript

Arch Linux

sudo pacman -S tk ghostscript

OS X

brew install tcl-tk ghostscript
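Before running python setup.py install, it can help to confirm that both system dependencies are visible from Python. The sketch below is a hypothetical helper, not part of Camelot: it tries to import the Tk bindings and looks for a Ghostscript executable on PATH. The executable names it checks ("gs", plus the Windows console builds "gswin64c" and "gswin32c") are assumptions about a typical install.

```python
import importlib
import sys

try:
    from shutil import which  # Python 3.3+
except ImportError:  # Python 2.7 fallback
    from distutils.spawn import find_executable as which


def check_system_dependencies():
    """Report whether Tk bindings and a Ghostscript executable are available.

    Hypothetical helper, not part of Camelot. The Ghostscript executable
    names below are assumptions about common installs.
    """
    # The Tk module is called Tkinter on Python 2 and tkinter on Python 3.
    tk_module = "Tkinter" if sys.version_info[0] == 2 else "tkinter"
    try:
        importlib.import_module(tk_module)
        has_tk = True
    except ImportError:
        has_tk = False

    # Ghostscript is usually "gs" on Linux/OS X and gswin*c on Windows.
    gs_names = ("gs", "gswin64c", "gswin32c")
    has_gs = any(which(name) is not None for name in gs_names)
    return {"tk": has_tk, "ghostscript": has_gs}


if __name__ == "__main__":
    for name, found in sorted(check_system_dependencies().items()):
        print("{}: {}".format(name, "found" if found else "MISSING"))
```

If either entry reports MISSING, install it with the package manager command for your platform listed above.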

Finally, cd into the project directory and install with:

python setup.py install

Development

Code

You can check out the latest sources with:

git clone https://github.com/socialcopsdev/camelot.git

Contributing

See the Contributing guidelines.

Testing

python setup.py test

License

BSD License