A Python library to extract tabular data from PDFs
 
 
Vinayak Mehta 81bdef4173 Update README 2018-09-12 18:41:52 +05:30
camelot Add Stream benchmarks 2018-09-12 07:21:35 +05:30
docs Add Stream benchmarks 2018-09-12 07:21:35 +05:30
tests Port tests 2018-09-09 05:29:24 +05:30
.coveragerc Add coveragerc and update Makefile 2016-08-08 17:24:13 +05:30
.gitignore Add docstrings and update docs 2018-09-09 10:00:22 +05:30
LICENSE Add LICENSE and _templates 2018-09-11 18:47:29 +05:30
README.md Update README 2018-09-12 18:41:52 +05:30
requirements-dev.txt Fix setup.py 2018-09-11 08:31:37 +05:30
requirements.txt Fix setup.py 2018-09-11 08:31:37 +05:30
setup.cfg Add setup.cfg 2018-09-09 05:41:42 +05:30
setup.py Fix setup.py 2018-09-11 08:31:37 +05:30

README.md

Camelot: PDF Table Parsing for Humans

Camelot is a Python library and command-line tool for extracting tables from PDF files.

Usage

>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
>>> tables
<TableList n=2>
>>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html
>>> tables[0]
<Table shape=(3,4)>
>>> tables[0].parsing_report
{
    'accuracy': 96,
    'whitespace': 80,
    'order': 1,
    'page': 1
}
>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html
>>> tables[0].df # get a pandas dataframe!
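The parsing report above gives each table an accuracy score, which can be used to drop weak detections before exporting. The helper below, `select_accurate_tables`, is a hypothetical sketch and not part of Camelot. It assumes only what the example shows: each table has a `parsing_report` dict with an `'accuracy'` key, and `tables` can be iterated like a list (it supports indexing, as in `tables[0]`). The `SimpleNamespace` objects are stand-ins so the sketch runs without a PDF. Their numbers are illustrative, not real Camelot output.

```python
from types import SimpleNamespace
from typing import Iterable, List


def select_accurate_tables(tables: Iterable, min_accuracy: float = 90.0) -> List:
    """Return the tables whose parsing report meets a minimum accuracy.

    Hypothetical helper: assumes each table exposes a ``parsing_report``
    dict with an ``'accuracy'`` key, as in the README example.
    """
    selected = []
    for table in tables:
        report = getattr(table, "parsing_report", None) or {}
        # Tables without an accuracy score are skipped rather than kept.
        accuracy = report.get("accuracy")
        if accuracy is not None and accuracy >= min_accuracy:
            selected.append(table)
    return selected


if __name__ == "__main__":
    # Stand-in objects; with Camelot you would pass the result of
    # camelot.read_pdf(...) instead.
    fake_tables = [
        SimpleNamespace(parsing_report={"accuracy": 96, "whitespace": 80, "order": 1, "page": 1}),
        SimpleNamespace(parsing_report={"accuracy": 72, "whitespace": 35, "order": 2, "page": 1}),
    ]
    kept = select_accurate_tables(fake_tables, min_accuracy=90)
    print([t.parsing_report["order"] for t in kept])  # [1]
```

Skipping tables that have no accuracy score is a deliberate choice here, so that a missing report never passes the filter silently.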

There's a command-line tool too!

Installation

After installing the dependencies, you can install Camelot with pip:

$ pip install camelot-py
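A quick post-install check can confirm that Python finds the package. The package is published as `camelot-py` but imported as `camelot`, as in the Usage example. The sketch below uses only the standard library's `importlib.util.find_spec`. The helper name `is_importable` is hypothetical. It only checks that the module can be located and does not verify any of the separately installed dependencies.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # pip package name: camelot-py; import name: camelot.
    if is_importable("camelot"):
        print("camelot is installed")
    else:
        print("camelot not found; try: pip install camelot-py")
```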

Documentation

The documentation is available at link.

Development

The Contributor's Guide has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.

Source code

You can check out the latest sources with the command:

$ git clone https://www.github.com/socialcopsdev/camelot.git

Setting up development environment

You can install the development dependencies with the command:

$ pip install "camelot-py[dev]"

Testing

After installation, you can run tests using:

$ python setup.py test

License

This project is licensed under the MIT License.