A Python library to extract tabular data from PDFs
 
 
Vinayak Mehta f6869a9af4 Improve grid detection and add more options 2016-07-19 17:02:03 +05:30
.gitignore Improve grid detection and add more options 2016-07-19 17:02:03 +05:30
README.md Improve grid detection and add more options 2016-07-19 17:02:03 +05:30
basic.py First commit 🔥 2016-07-19 17:02:03 +05:30
camelot.py Improve grid detection and add more options 2016-07-19 17:02:03 +05:30
cell.py First commit 🔥 2016-07-19 17:02:03 +05:30
morph_transform.py Improve grid detection and add more options 2016-07-19 17:02:03 +05:30
pdf.py First commit 🔥 2016-07-19 17:02:03 +05:30
spreadsheet.py Improve grid detection and add more options 2016-07-19 17:02:03 +05:30
table.py Improve grid detection and add more options 2016-07-19 17:02:03 +05:30

README.md

Camelot

usage: python2 camelot.py [options] pdf_file

Parse yo pdf!

positional arguments:

file

optional arguments:

-h, --help show this help message and exit

-p PAGES [PAGES ...] Specify the page numbers and/or page ranges to be parsed. Example: -p="1 3-5 9", -p="all" (default: -p="1")
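
To make the `-p` syntax concrete, here is a minimal sketch of how a page specification such as `"1 3-5 9"` or `"all"` could be expanded into page numbers. The function name `expand_pages` and the `total_pages` parameter are hypothetical and are not taken from camelot.py; the actual parsing in the script may differ.

```python
def expand_pages(spec, total_pages=None):
    """Expand a page spec such as "1 3-5 9" into a sorted list of page numbers.

    Hypothetical helper for illustration; not part of camelot.py.
    """
    if spec.strip() == "all":
        # "all" needs the document length, which the caller must supply.
        if total_pages is None:
            raise ValueError("total_pages is required when spec is 'all'")
        return list(range(1, total_pages + 1))

    pages = set()
    for token in spec.split():
        if "-" in token:
            # A range such as "3-5" is inclusive on both ends.
            start, end = token.split("-", 1)
            pages.update(range(int(start), int(end) + 1))
        else:
            pages.add(int(token))
    return sorted(pages)


print(expand_pages("1 3-5 9"))
```

With the example from the help text, `"1 3-5 9"` expands to pages 1, 3, 4, 5 and 9.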

-f FORMAT Output format (csv/xlsx). Example: -f="xlsx" (default: -f="csv")

-spreadsheet Extract data stored in PDFs with ruling lines. (default: False)

-F ORIENTATION Fill the values in empty cells. Example: -F="h", -F="v", -F="hv" (default: None)
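
The help text does not spell out what filling does, so the following is only a sketch under stated assumptions: `"h"` copies the value from the cell to the left into an empty cell, `"v"` copies the value from the cell above, and `"hv"` applies the horizontal pass first and then the vertical one. The function `fill_empty` is hypothetical, and it assumes a rectangular table given as a list of rows of strings.

```python
def fill_empty(rows, orientation=None):
    """Return a copy of rows with empty cells filled.

    Hypothetical helper for illustration; the semantics of "h" and "v"
    are assumptions, not confirmed behaviour of camelot.py.
    """
    filled = [list(row) for row in rows]
    if orientation is None:
        return filled

    if "h" in orientation:
        # Horizontal pass: an empty cell takes the value to its left.
        for row in filled:
            for j in range(1, len(row)):
                if row[j] == "":
                    row[j] = row[j - 1]

    if "v" in orientation:
        # Vertical pass: an empty cell takes the value from the row above.
        for i in range(1, len(filled)):
            for j in range(len(filled[i])):
                if filled[i][j] == "":
                    filled[i][j] = filled[i - 1][j]

    return filled


table = [["Year", "2015", ""],
         ["", "10", "20"]]
print(fill_empty(table, "hv"))
```

Under these assumptions, merged header cells that come out of extraction as one value followed by blanks would be repeated across the span they originally covered.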

-s [SCALE] Scaling factor. A larger scaling factor allows smaller lines to be detected. (default: 15)
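
One way to read the `-s` description is that the shortest detectable ruling line is the page image's extent divided by the scaling factor. That relationship is an assumption made here for illustration and is not confirmed by the README; `min_line_length` is a hypothetical helper, not a function from morph_transform.py.

```python
def min_line_length(image_extent, scale=15):
    """Shortest line (in pixels) that would be detected along one axis.

    Assumes the detector ignores lines shorter than image_extent // scale.
    Hypothetical helper for illustration only.
    """
    if scale <= 0:
        raise ValueError("scale must be a positive integer")
    return image_extent // scale


# Compare the default scale with a larger one on a 1500-pixel-tall image.
for scale in (15, 30):
    print(scale, min_line_length(1500, scale))
```

Under this assumption, doubling the scaling factor halves the minimum line length, which matches the help text's statement that a larger factor picks up smaller lines.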

Under construction...