camelot-py

A Python library to extract tabular data from PDFs

Vinayak Mehta 47da8606a6 Remove .pyc files		2016-07-19 17:02:03 +05:30
README.md	First commit 🔥	2016-07-19 17:02:03 +05:30
basic.py	First commit 🔥	2016-07-19 17:02:03 +05:30
camelot.py	First commit 🔥	2016-07-19 17:02:03 +05:30
cell.py	First commit 🔥	2016-07-19 17:02:03 +05:30
morph_transform.py	First commit 🔥	2016-07-19 17:02:03 +05:30
pdf.py	First commit 🔥	2016-07-19 17:02:03 +05:30
spreadsheet.py	First commit 🔥	2016-07-19 17:02:03 +05:30
table.py	First commit 🔥	2016-07-19 17:02:03 +05:30

Camelot

usage: python2 camelot.py [options] pdf_file

Parse yo pdf!

positional arguments:
  file

optional arguments:
  -h, --help            show this help message and exit
  -p PAGES [PAGES ...]  Specify the page numbers and/or page ranges to be
                        parsed. Example: -p="1 3-5 9". (default: -p="1")
  -f FORMAT             Output format (csv/xlsx). Example: -f="xlsx"
                        (default: -f="csv")
  -spreadsheet          Extract data stored in pdfs with ruling lines.
  -guess                [Experimental] Guess the values in empty cells.
  -s [SCALE]            Scaling factor. Large scaling factor leads to smaller
                        lines being detected. (default: 15)
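The page specification accepted by -p mixes single pages and ranges. As a rough illustration of how a string such as "1 3-5 9" could be expanded into individual page numbers, here is a minimal sketch. The function name `parse_pages` is hypothetical and is not taken from camelot.py; the actual parsing in the tool may differ.

```python
def parse_pages(spec):
    """Expand a page spec such as "1 3-5 9" into a sorted list of page numbers.

    Hypothetical helper for illustration only; not part of camelot.py.
    """
    pages = set()
    for token in spec.split():
        if "-" in token:
            # A token like "3-5" is an inclusive range.
            start, end = token.split("-", 1)
            start, end = int(start), int(end)
            if start > end:
                raise ValueError("invalid page range: %r" % token)
            pages.update(range(start, end + 1))
        else:
            # A bare token like "9" is a single page.
            pages.add(int(token))
    return sorted(pages)


print(parse_pages("1 3-5 9"))  # [1, 3, 4, 5, 9]
```

Collecting pages in a set means overlapping entries such as "3-5 4" yield each page only once.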
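The note on -s says that a larger scaling factor detects smaller lines. One common way to get that behavior, assumed here and not confirmed by this README, is to find ruling lines with a morphological opening whose structuring element is as long as the image dimension divided by the scale. Lines shorter than the element are erased, so a larger scale means a shorter element and shorter lines survive. The sketch below only computes those lengths; `line_kernel_lengths` is a hypothetical name.

```python
def line_kernel_lengths(width, height, scale=15):
    """Return (horizontal, vertical) structuring-element lengths in pixels.

    Assumption for illustration: each length is the image dimension
    divided by the scaling factor. Not taken from camelot.py.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Clamp to 1 so very large scales still give a valid element.
    return max(1, width // scale), max(1, height // scale)


# For a 1500 x 2000 pixel page image:
print(line_kernel_lengths(1500, 2000, scale=15))  # (100, 133)
print(line_kernel_lengths(1500, 2000, scale=30))  # (50, 66)
```

Under this assumption, doubling the scale from 15 to 30 halves the minimum line length that is kept, which matches the behavior described for -s.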

Under construction...