camelot-py

A Python library to extract tabular data from PDFs


Latest commit f6869a9af4 by Vinayak Mehta (2016-07-19 17:02:03 +05:30): Improve grid detection and add more options

Files: .gitignore, README.md, basic.py, camelot.py, cell.py, morph_transform.py, pdf.py, spreadsheet.py, table.py

Camelot

usage: python2 camelot.py [options] pdf_file

Parse yo pdf!

positional arguments:
file The PDF file to parse.

optional arguments:
-h, --help show this help message and exit

-p PAGES [PAGES ...] Specify the page numbers and/or page ranges to be parsed. Example: -p="1 3-5 9", -p="all" (default: -p="1")
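The -p syntax mixes single pages and page ranges. Below is a minimal sketch of how such a spec could be expanded into page numbers, assuming ranges are inclusive and that "all" means every page in the document. `parse_pages` and its `num_pages` argument are hypothetical, not code from camelot.py.

```python
def parse_pages(spec, num_pages):
    """Expand a page spec such as "1 3-5 9" or "all" into sorted page numbers.

    Hypothetical helper: `num_pages` (the document's page count) is an
    assumption needed to expand "all"; ranges are treated as inclusive.
    """
    if spec.strip() == "all":
        return list(range(1, num_pages + 1))
    pages = set()
    for token in spec.split():
        if "-" in token:
            # "3-5" -> 3, 4, 5
            start, end = (int(n) for n in token.split("-", 1))
            pages.update(range(start, end + 1))
        else:
            pages.add(int(token))
    # A set removes duplicates such as "2 2-3"; return pages in order.
    return sorted(pages)


print(parse_pages("1 3-5 9", num_pages=10))  # [1, 3, 4, 5, 9]
print(parse_pages("all", num_pages=3))       # [1, 2, 3]
```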

-f FORMAT Output format (csv/xlsx). Example: -f="xlsx" (default: -f="csv")

-spreadsheet Extract tables from PDFs whose cells are demarcated by ruling lines. (default: False)
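With ruling lines, the lines themselves can define the table grid. The following hypothetical sketch illustrates that idea. It assumes the x positions of vertical lines and the y positions of horizontal lines have already been detected (y grows downward, as in image coordinates) and that text arrives as (x, y, text) chunks. `build_grid` and `assign_text` are illustrative names, not functions from this repository.

```python
def build_grid(xs, ys):
    """Return cell boxes (x0, y0, x1, y1) between adjacent ruling lines.

    xs: x positions of vertical lines; ys: y positions of horizontal lines.
    Rows run top to bottom, assuming y grows downward.
    """
    xs, ys = sorted(xs), sorted(ys)
    return [[(xs[c], ys[r], xs[c + 1], ys[r + 1])
             for c in range(len(xs) - 1)]
            for r in range(len(ys) - 1)]


def assign_text(grid, chunks):
    """Place each (x, y, text) chunk into the cell whose box contains it."""
    table = [["" for _ in row] for row in grid]
    for x, y, text in chunks:
        for r, row in enumerate(grid):
            for c, (x0, y0, x1, y1) in enumerate(row):
                # Half-open intervals, so a point lands in at most one cell.
                if x0 <= x < x1 and y0 <= y < y1:
                    table[r][c] = (table[r][c] + " " + text).strip()
    return table


grid = build_grid([0, 50, 100], [0, 20, 40])
chunks = [(10, 5, "Name"), (60, 5, "Age"), (10, 25, "Ada"), (60, 25, "36")]
print(assign_text(grid, chunks))  # [['Name', 'Age'], ['Ada', '36']]
```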

-F ORIENTATION Fill empty cells along the given orientation: horizontally (h), vertically (v), or both (hv). Example: -F="h", -F="v", -F="hv" (default: None)
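One plausible reading of -F is sketched below. It assumes filling copies the nearest non-empty value from the left (h) or from above (v), and that hv applies the horizontal pass first. `fill_empty` is a hypothetical helper that operates on a rectangular list of rows. The repository's actual fill logic may differ.

```python
def fill_empty(table, orientation):
    """Return a copy of `table` with empty cells filled from a neighbour.

    Hypothetical sketch: "h" copies from the left, "v" copies from above,
    "hv" does both (horizontal first); None leaves the table unchanged.
    """
    orientation = orientation or ""
    filled = [row[:] for row in table]  # don't mutate the caller's table
    if "h" in orientation:
        for row in filled:
            for c in range(1, len(row)):
                if row[c] == "":
                    # row[c - 1] is already filled, so values propagate rightward.
                    row[c] = row[c - 1]
    if "v" in orientation:
        for r in range(1, len(filled)):
            for c in range(len(filled[r])):
                if filled[r][c] == "":
                    filled[r][c] = filled[r - 1][c]
    return filled


table = [["A", "", ""], ["", "B", ""]]
print(fill_empty(table, "hv"))  # [['A', 'A', 'A'], ['A', 'B', 'B']]
```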

-s [SCALE] Scaling factor for line detection. A larger scaling factor causes smaller lines to be detected. (default: 15)
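The link between the scaling factor and line length can be illustrated with a morphological opening: erosion followed by dilation using a 1 x k line kernel. This sketch assumes k = image width // scale, which is consistent with the help text's claim that a larger scale detects smaller lines. The function name and the 0/1 list-of-rows image format are assumptions, and morph_transform.py may work differently.

```python
def horizontal_lines(image, scale=15):
    """Keep only horizontal ink runs at least (width // scale) pixels long.

    `image` is a list of rows of 0/1 pixels (1 = ink). Assumed kernel
    length: k = width // scale, so a larger scale gives a shorter kernel.
    """
    width = len(image[0])
    k = max(1, width // scale)
    result = []
    for row in image:
        out = [0] * width
        for start in range(width - k + 1):
            # Erosion: the kernel fits if all k pixels from `start` are ink.
            if all(row[start:start + k]):
                # Dilation: mark every pixel covered by the fitted kernel.
                for c in range(start, start + k):
                    out[c] = 1
        result.append(out)
    return result


image = [
    [1] * 30,            # a 30-pixel line
    [0] * 30,
    [1] * 5 + [0] * 25,  # a 5-pixel line
]
print([sum(r) for r in horizontal_lines(image, scale=3)])   # k = 10: [30, 0, 0]
print([sum(r) for r in horizontal_lines(image, scale=15)])  # k = 2: [30, 0, 5]
```

With scale=3 the 10-pixel kernel cannot fit inside the short line, so only the long line survives. With scale=15 both lines are kept.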

Under construction...