Update README and index.rst

pull/196/head
Vinayak Mehta 2020-09-08 00:35:32 +05:30
parent 6b42094db5
commit 2a7a4f5b34
No known key found for this signature in database
GPG Key ID: 2170CDB940114C1D
2 changed files with 37 additions and 66 deletions

View File

@ -10,13 +10,13 @@
[![image](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black) [![image](https://img.shields.io/badge/continous%20quality-deepsource-lightgrey)](https://deepsource.io/gh/camelot-dev/camelot/?ref=repository-badge)
**Camelot** is a Python library that makes it easy for *anyone* to extract tables from PDF files!
**Camelot** is a Python library that can help you extract tables from PDFs!
**Note:** You can also check out [Excalibur](https://github.com/camelot-dev/excalibur), which is a web interface for Camelot!
**Note:** You can also check out [Excalibur](https://github.com/camelot-dev/excalibur), the web interface to Camelot!
---
**Here's how you can extract tables from PDF files.** Check out the PDF used in this example [here](https://github.com/camelot-dev/camelot/blob/master/docs/_static/pdf/foo.pdf).
**Here's how you can extract tables from PDFs.** You can check out the PDF used in this example [here](https://github.com/camelot-dev/camelot/blob/master/docs/_static/pdf/foo.pdf).
<pre>
>>> import camelot
@ -46,24 +46,27 @@
| 2032_2 | 0.17 | 57.8 | 21.7% | 0.3% | 2.7% | 1.2% |
| 4171_1 | 0.07 | 173.9 | 58.1% | 1.6% | 2.1% | 0.5% |
There's a [command-line interface](https://camelot-py.readthedocs.io/en/master/user/cli.html) too!
Camelot also comes packaged with a [command-line interface](https://camelot-py.readthedocs.io/en/master/user/cli.html)!
**Note:** Camelot only works with text-based PDFs and not scanned documents. (As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)
## Why Camelot?
- **You are in control.**: Unlike other libraries and tools which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)
- *Bad* tables can be discarded based on **metrics** like accuracy and whitespace, without ever having to manually look at each table.
- Each table is a **pandas DataFrame**, which seamlessly integrates into [ETL and data analysis workflows](https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873).
- **Export** to multiple formats, including JSON, Excel, HTML and Sqlite.
- **Configurability**: Camelot gives you control over the table extraction process with its [tweakable settings](https://camelot-py.readthedocs.io/en/master/user/advanced.html).
- **Metrics**: Bad tables can be discarded based on metrics like accuracy and whitespace, without having to manually look at each table.
- **Output**: Each table is extracted into a **pandas DataFrame**, which seamlessly integrates into [ETL and data analysis workflows](https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873). You can also export tables to multiple formats, which include CSV, JSON, Excel, HTML and Sqlite.
See [comparison with other PDF table extraction libraries and tools](https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).
See [comparison with similar libraries and tools](https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).
## Support the development
If Camelot has helped you, please consider supporting its development with a one-time or monthly donation [on OpenCollective](https://opencollective.com/camelot).
## Installation
### Using conda
The easiest way to install Camelot is to install it with [conda](https://conda.io/docs/), which is a package manager and environment management system for the [Anaconda](http://docs.continuum.io/anaconda/) distribution.
The easiest way to install Camelot is with [conda](https://conda.io/docs/), which is a package manager and environment management system for the [Anaconda](http://docs.continuum.io/anaconda/) distribution.
<pre>
$ conda install -c conda-forge camelot-py
@ -71,7 +74,7 @@ $ conda install -c conda-forge camelot-py
### Using pip
After [installing the dependencies](https://camelot-py.readthedocs.io/en/master/user/install-deps.html) ([tk](https://packages.ubuntu.com/bionic/python/python-tk) and [ghostscript](https://www.ghostscript.com/)), you can simply use pip to install Camelot:
After [installing the dependencies](https://camelot-py.readthedocs.io/en/master/user/install-deps.html) ([tk](https://packages.ubuntu.com/bionic/python/python-tk) and [ghostscript](https://www.ghostscript.com/)), you can also just use pip to install Camelot:
<pre>
$ pip install "camelot-py[cv]"
@ -94,40 +97,16 @@ $ pip install ".[cv]"
## Documentation
Great documentation is available at [http://camelot-py.readthedocs.io/](http://camelot-py.readthedocs.io/).
## Development
The [Contributor's Guide](https://camelot-py.readthedocs.io/en/master/dev/contributing.html) has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.
### Source code
You can check the latest sources with:
<pre>
$ git clone https://www.github.com/camelot-dev/camelot
</pre>
### Setting up a development environment
You can install the development dependencies easily, using pip:
<pre>
$ pip install "camelot-py[dev]"
</pre>
### Testing
After installation, you can run tests using:
<pre>
$ python setup.py test
</pre>
The documentation is available at [http://camelot-py.readthedocs.io/](http://camelot-py.readthedocs.io/).
## Wrappers
- [camelot-php](https://github.com/randomstate/camelot-php) provides a [PHP](https://www.php.net/) wrapper on Camelot.
## Contributing
The [Contributor's Guide](https://camelot-py.readthedocs.io/en/master/dev/contributing.html) has detailed information about contributing issues, documentation, code, and tests.
## Versioning
Camelot uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. For the changelog, you can check out [HISTORY.md](https://github.com/camelot-dev/camelot/blob/master/HISTORY.md).
@ -135,9 +114,3 @@ Camelot uses [Semantic Versioning](https://semver.org/). For the available versi
## License
This project is licensed under the MIT License, see the [LICENSE](https://github.com/camelot-dev/camelot/blob/master/LICENSE) file for details.
## Support the development
You can support our work on Camelot with a one-time or monthly donation [on OpenCollective](https://opencollective.com/camelot). Organizations who use camelot can also sponsor the project for an acknowledgement on [our documentation site](https://camelot-py.readthedocs.io/en/master/) and this README.
Special thanks to all the users, organizations and contributors that support Camelot!

View File

@ -36,15 +36,15 @@ Release v\ |version|. (:ref:`Installation <install>`)
.. image:: https://img.shields.io/badge/continous%20quality-deepsource-lightgrey
:target: https://deepsource.io/gh/camelot-dev/camelot/?ref=repository-badge
**Camelot** is a Python library that makes it easy for *anyone* to extract tables from PDF files!
**Camelot** is a Python library that can help you extract tables from PDFs!
.. note:: You can also check out `Excalibur`_, which is a web interface for Camelot!
.. note:: You can also check out `Excalibur`_, the web interface to Camelot!
.. _Excalibur: https://github.com/camelot-dev/excalibur
----
**Here's how you can extract tables from PDF files.** Check out the PDF used in this example `here`_.
**Here's how you can extract tables from PDFs.** You can check out the PDF used in this example `here`_.
.. _here: _static/pdf/foo.pdf
@ -70,7 +70,7 @@ Release v\ |version|. (:ref:`Installation <install>`)
.. csv-table::
:file: _static/csv/foo.csv
There's a :ref:`command-line interface <cli>` too!
Camelot also comes packaged with a :ref:`command-line interface <cli>`!
.. note:: Camelot only works with text-based PDFs and not scanned documents. (As Tabula `explains`_, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)
@ -79,27 +79,27 @@ There's a :ref:`command-line interface <cli>` too!
Why Camelot?
------------
- **You are in control.** Unlike other libraries and tools which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)
- *Bad* tables can be discarded based on **metrics** like accuracy and whitespace, without ever having to manually look at each table.
- Each table is a **pandas DataFrame**, which seamlessly integrates into `ETL and data analysis workflows`_.
- **Export** to multiple formats, including JSON, Excel and HTML.
See `comparison with other PDF table extraction libraries and tools`_.
- **Configurability**: Camelot gives you control over the table extraction process with its :ref:`tweakable settings <advanced>`.
- **Metrics**: Bad tables can be discarded based on metrics like accuracy and whitespace, without having to manually look at each table.
- **Output**: Each table is extracted into a **pandas DataFrame**, which seamlessly integrates into `ETL and data analysis workflows`_. You can also export tables to multiple formats, which include CSV, JSON, Excel, HTML and Sqlite.
.. _ETL and data analysis workflows: https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873
.. _comparison with other PDF table extraction libraries and tools: https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools
Support us on OpenCollective
----------------------------
See `comparison with similar libraries and tools`_.
If Camelot helped you extract tables from PDFs, please consider supporting its development by `becoming a backer or a sponsor on OpenCollective`_!
.. _comparison with similar libraries and tools: https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools
.. _becoming a backer or a sponsor on OpenCollective: https://opencollective.com/camelot
Support the development
-----------------------
If Camelot has helped you, please consider supporting its development with a one-time or monthly donation `on OpenCollective`_!
.. _on OpenCollective: https://opencollective.com/camelot
The User Guide
--------------
This part of the documentation begins with some background information about why Camelot was created, takes a small dip into the implementation details and then focuses on step-by-step instructions for getting the most out of Camelot.
This part of the documentation begins with some background information about why Camelot was created, takes you through some implementation details, and then focuses on step-by-step instructions for getting the most out of Camelot.
.. toctree::
:maxdepth: 2
@ -115,8 +115,7 @@ This part of the documentation begins with some background information about why
The API Documentation/Guide
---------------------------
If you are looking for information on a specific function, class, or method,
this part of the documentation is for you.
If you are looking for information on a specific function, class, or method, this part of the documentation is for you.
.. toctree::
:maxdepth: 2
@ -126,8 +125,7 @@ this part of the documentation is for you.
The Contributor Guide
---------------------
If you want to contribute to the project, this part of the documentation is for
you.
If you want to contribute to the project, this part of the documentation is for you.
.. toctree::
:maxdepth: 2