Camelot: PDF Table Parsing for Humans
Camelot is a Python 2.7 library and command-line tool for extracting tabular data from PDF files.
Usage
>>> import camelot
>>> tables = camelot.read_pdf("foo.pdf")
>>> tables
<TableList n=2>
>>> tables.export("foo.csv", f="csv", compress=True) # json, excel, html
>>> tables[0]
<Table shape=(3,4)>
>>> tables[0].to_csv("foo.csv") # to_json, to_excel, to_html
>>> tables[0].parsing_report
{
    "accuracy": 96,
    "whitespace": 80,
    "order": 1,
    "page": 1
}
>>> df = tables[0].df
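The parsing report can be used to decide which tables are worth exporting. The following is a minimal sketch, not part of Camelot itself: `select_accurate` and its `min_accuracy` threshold are hypothetical names introduced here, and the second sample report is made-up data shaped like the report shown above.

```python
def select_accurate(reports, min_accuracy=90):
    """Return the indices of reports whose "accuracy" meets the threshold.

    Hypothetical helper, not a Camelot API. Each report is a dict shaped
    like the parsing_report shown in the usage example.
    """
    return [
        index
        for index, report in enumerate(reports)
        if report.get("accuracy", 0) >= min_accuracy
    ]


# Illustrative reports; only the first mirrors the values shown above.
reports = [
    {"accuracy": 96, "whitespace": 80, "order": 1, "page": 1},
    {"accuracy": 71, "whitespace": 35, "order": 2, "page": 1},
]

good = select_accurate(reports)
print(good)  # [0]

# With Camelot installed, the same helper could be fed real reports
# (assuming the table list can be iterated as well as indexed):
#   tables = camelot.read_pdf("foo.pdf")
#   keep = select_accurate([t.parsing_report for t in tables])
#   for i in keep:
#       tables[i].to_csv("foo-{}.csv".format(i))
```

Keeping the filter separate from the Camelot calls means the threshold can be tuned and tested without a PDF at hand.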
Dependencies
The dependencies include tk and ghostscript.
Installation
Make sure you have the latest versions of pip and setuptools. You can update them with:
pip install -U pip setuptools
Installing dependencies
tk and ghostscript can be installed using your system's default package manager.
Linux
- Ubuntu
sudo apt-get install python-tk ghostscript
- Arch Linux
sudo pacman -S tk ghostscript
OS X
brew install tcl-tk ghostscript
Finally, cd into the project directory and install with:
python setup.py install
Development
Code
You can check out the latest sources with the command:
git clone https://github.com/socialcopsdev/camelot.git
Contributing
See Contributing guidelines.
Testing
python setup.py test
License
BSD License