Briq/camelot-py

Vinayak Mehta 3d2c9b5435 Update CLI doc

2018-09-14 02:58:04 +05:30


Camelot: PDF Table Parsing for Humans

Camelot is a Python library that makes it easy for anyone to extract tables from PDF files!

Here's how you can extract tables from PDF files. Check out the PDF used in this example here.

>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf', mesh=True)
>>> tables
<TableList tables=1>
>>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html
>>> tables[0]
<Table shape=(7, 7)>
>>> tables[0].parsing_report
{
    'accuracy': 99.02,
    'whitespace': 12.24,
    'order': 1,
    'page': 1
}
>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html
>>> tables[0].df # get a pandas DataFrame!

Cycle Name	KI (1/km)	Distance (mi)	Percent Fuel Savings
			Improved Speed	Decreased Accel	Eliminate Stops	Decreased Idle
2012_2	3.30	1.3	5.9%	9.5%	29.2%	17.4%
2145_1	0.68	11.2	2.4%	0.1%	9.5%	2.7%
4234_1	0.59	58.7	8.5%	1.3%	8.5%	3.3%
2032_2	0.17	57.8	21.7%	0.3%	2.7%	1.2%
4171_1	0.07	173.9	58.1%	1.6%	2.1%	0.5%
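In the DataFrame above, the two header rows arrive as ordinary data rows and every cell is a string. One way to fold them into an analysis workflow is sketched below. This is not part of Camelot. The helper names `parse_cell` and `promote_two_level_header` are hypothetical, and the code assumes that the spanning header cell ("Percent Fuel Savings") leaves the cells it covers empty, as the output above suggests.

```python
import re
from typing import List, Tuple, Union

Cell = Union[str, float]

# Plain numbers such as "3.30" or percentages such as "5.9%".
# Labels like "2012_2" deliberately do not match (float() would accept them,
# because it allows underscores between digits).
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?%?")


def parse_cell(value: str) -> Cell:
    """Turn '5.9%' into 5.9 and '3.30' into 3.3; return other text stripped."""
    text = value.strip()
    if _NUMBER.fullmatch(text):
        return float(text.rstrip("%"))
    return text


def promote_two_level_header(rows: List[List[str]]) -> Tuple[List[str], List[List[Cell]]]:
    """Use the first two rows as column names and parse the remaining rows.

    Hypothetical helper: assumes a spanning header cell leaves the cells it
    covers empty, so a non-empty second-row label wins over the first row.
    """
    top, sub = rows[0], rows[1]
    columns = [s.strip() or t.strip() for t, s in zip(top, sub)]
    records = [[parse_cell(cell) for cell in row] for row in rows[2:]]
    return columns, records
```

Usage might look like `columns, records = promote_two_level_header(tables[0].df.values.tolist())` followed by `pandas.DataFrame(records, columns=columns)`. This flat scheme drops the "Percent Fuel Savings" group label; a pandas MultiIndex would be the choice if that grouping matters.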

There's a command-line interface too!

Why Camelot?

You are in control: Unlike other libraries and tools, which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)
Metrics: Bad tables can be discarded based on metrics like accuracy and whitespace, without ever having to manually look at each table.
Each table is available as a pandas DataFrame (through its df attribute), which enables seamless integration into data analysis workflows.
Export to multiple formats, including json, excel and html.
Simple and Elegant API, written in Python!
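The metrics point above can be made concrete with a small filter over the parsing reports. This is a minimal sketch, not a Camelot feature: `keep_good_tables` is a hypothetical helper, the default thresholds are illustrative rather than recommended values, and it assumes that the table list can be iterated and that each table exposes a `parsing_report` dict with `accuracy` and `whitespace` keys, as in the example at the top of this README.

```python
from typing import Iterable, List


def keep_good_tables(tables: Iterable, min_accuracy: float = 90.0,
                     max_whitespace: float = 30.0) -> List:
    """Keep tables whose parsing_report meets both thresholds.

    Hypothetical helper; the default thresholds are illustrative only.
    Each table is assumed to expose a parsing_report dict with
    'accuracy' and 'whitespace' percentages.
    """
    kept = []
    for table in tables:
        report = table.parsing_report
        # Higher accuracy is better; a large whitespace share hints at a poor parse.
        if report["accuracy"] >= min_accuracy and report["whitespace"] <= max_whitespace:
            kept.append(table)
    return kept
```

For example, `keep_good_tables(camelot.read_pdf('foo.pdf', mesh=True))` would keep the table from the earlier example, since its report (accuracy 99.02, whitespace 12.24) passes the default thresholds.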

Installation

After installing the dependencies, you can simply use pip to install Camelot:

$ pip install camelot-py

Documentation

The documentation is available at link.

Development

The contribution guidelines have detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.

Source code

You can check out the latest sources with the command:

$ git clone https://www.github.com/socialcopsdev/camelot.git

Setting up a development environment

You can install the development dependencies with the command:

$ pip install camelot-py[dev]

Testing

After installation, you can run tests using:

$ python setup.py test

License

This project is licensed under the MIT License.