Camelot: PDF Table Parsing for Humans
Camelot is a Python library which makes it easy for anyone to extract tables from PDF files!
Usage
>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
>>> tables
<TableList n=2>
>>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html
>>> tables[0]
<Table shape=(3,4)>
>>> tables[0].parsing_report
{
    'accuracy': 96,
    'whitespace': 80,
    'order': 1,
    'page': 1
}
>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html
>>> tables[0].df # get a pandas DataFrame!
There's a command-line interface too!
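The parsing_report shown above can be used to decide which extracted tables are worth keeping. The following is a minimal sketch, not part of Camelot: filter_by_accuracy and the threshold of 90 are hypothetical, and the helper only assumes that each table exposes a dict-like parsing_report with an 'accuracy' key. Stand-in objects are used so that it runs without a PDF.

```python
from types import SimpleNamespace


def filter_by_accuracy(tables, min_accuracy=90):
    """Keep tables whose parsing_report accuracy is at least min_accuracy.

    Hypothetical helper, not part of Camelot. It only assumes each item
    exposes a dict-like `parsing_report` with an 'accuracy' key.
    """
    return [
        t for t in tables
        if t.parsing_report.get("accuracy", 0) >= min_accuracy
    ]


# Stand-in objects (made-up values) so the sketch runs without a PDF.
# With Camelot installed, the result of camelot.read_pdf('foo.pdf')
# would be passed instead, assuming it can be iterated like a list.
demo_tables = [
    SimpleNamespace(
        name="t0",
        parsing_report={"accuracy": 96, "whitespace": 80, "order": 1, "page": 1},
    ),
    SimpleNamespace(
        name="t1",
        parsing_report={"accuracy": 61, "whitespace": 40, "order": 2, "page": 1},
    ),
]

kept = filter_by_accuracy(demo_tables, min_accuracy=90)
print([t.name for t in kept])  # prints ['t0']
```

A fixed threshold is only one possible policy. Depending on the documents, whitespace may be a more useful signal than accuracy.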
Installation
After installing the dependencies, you can install Camelot with pip:
$ pip install camelot-py
Documentation
The documentation is available at link.
Development
The Contributor's Guide has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.
Source code
You can check out the latest sources with the command:
$ git clone https://www.github.com/socialcopsdev/camelot.git
Setting up development environment
You can install the development dependencies with the command:
$ pip install camelot-py[dev]
Testing
After installation, you can run tests using:
$ python setup.py test
License
This project is licensed under the MIT License.