A Python library to extract tabular data from PDFs
 
 
Vinayak Mehta 47da8606a6 Remove .pyc files 2016-07-19 17:02:03 +05:30
README.md First commit 🔥 2016-07-19 17:02:03 +05:30
basic.py First commit 🔥 2016-07-19 17:02:03 +05:30
camelot.py First commit 🔥 2016-07-19 17:02:03 +05:30
cell.py First commit 🔥 2016-07-19 17:02:03 +05:30
morph_transform.py First commit 🔥 2016-07-19 17:02:03 +05:30
pdf.py First commit 🔥 2016-07-19 17:02:03 +05:30
spreadsheet.py First commit 🔥 2016-07-19 17:02:03 +05:30
table.py First commit 🔥 2016-07-19 17:02:03 +05:30

README.md

Camelot

usage: python2 camelot.py [options] pdf_file

Parse yo pdf!

positional arguments:

file

optional arguments:

-h, --help show this help message and exit

-p PAGES [PAGES ...] Specify the page numbers and/or page ranges to be parsed. Example: -p="1 3-5 9". (default: -p="1")
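To make the `-p` format concrete, here is a minimal sketch of how a page specification such as `"1 3-5 9"` could be expanded into individual page numbers. The function name `parse_pages` is hypothetical and is not taken from `camelot.py`; the handling of duplicates and reversed ranges is an assumption, since the help text does not describe it.

```python
def parse_pages(spec):
    """Expand a page spec such as "1 3-5 9" into a sorted list of page numbers.

    Hypothetical helper for illustration; not Camelot's own parser.
    """
    pages = set()
    for token in spec.split():
        if "-" in token:
            # A token like "3-5" is treated as an inclusive range.
            start, end = token.split("-", 1)
            start, end = int(start), int(end)
            if start > end:
                raise ValueError("invalid page range: %r" % token)
            pages.update(range(start, end + 1))
        else:
            # A bare token like "9" is a single page.
            pages.add(int(token))
    return sorted(pages)


print(parse_pages("1 3-5 9"))  # [1, 3, 4, 5, 9]
```

Collecting pages in a set means overlapping entries such as `"3-5 4"` yield each page only once.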

-f FORMAT Output format (csv/xlsx). Example: -f="xlsx" (default: -f="csv")

-spreadsheet Extract data stored in pdfs with ruling lines.

-guess [Experimental] Guess the values in empty cells.

-s [SCALE] Scaling factor. A larger scaling factor lets smaller lines be detected. (default: 15)
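One way to see why a larger scaling factor would pick up smaller lines is the sketch below. It assumes the minimum line length is the image width divided by the scale, which is a guess at the relationship and is not confirmed by this README. It also swaps in a plain run-length scan over a 0/1 grid in place of whatever image processing Camelot actually performs, and the name `find_horizontal_lines` is hypothetical.

```python
def find_horizontal_lines(image, scale=15):
    """Return (row, start_col, end_col) for each horizontal run of set pixels
    that is at least `width // scale` pixels long.

    `image` is a list of equal-length rows of 0/1 values.
    Assumption: minimum length = width // scale (not confirmed by the source).
    """
    width = len(image[0])
    min_length = max(1, width // scale)
    lines = []
    for r, row in enumerate(image):
        start = None
        # The trailing 0 is a sentinel that closes a run reaching the row end.
        for c, pixel in enumerate(list(row) + [0]):
            if pixel and start is None:
                start = c
            elif not pixel and start is not None:
                if c - start >= min_length:
                    lines.append((r, start, c - 1))
                start = None
    return lines


# A 30-pixel-wide image with one long run (20 px) and one short run (3 px).
image = [
    [1] * 20 + [0] * 10,
    [0] * 5 + [1] * 3 + [0] * 22,
]

print(find_horizontal_lines(image, scale=2))   # [(0, 0, 19)]
print(find_horizontal_lines(image, scale=15))  # [(0, 0, 19), (1, 5, 7)]
```

With `scale=2` the threshold is 15 pixels, so only the long run qualifies; with `scale=15` it drops to 2 pixels and the short run is reported as well.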

Under construction...