Bump version and update docs

pull/249/head
Vinayak Mehta 2021-07-07 04:29:23 +05:30
parent d39ca4502b
commit f43235934b
4 changed files with 42 additions and 1 deletions

@ -6,6 +6,7 @@ master
**Improvements**
- Add pdftopng for image conversion and use ghostscript as fallback. [#198](https://github.com/camelot-dev/camelot/pull/198) by Vinayak Mehta.
- Add markdown export format. [#222](https://github.com/camelot-dev/camelot/pull/222/) by [Lucas Cimon](https://github.com/Lucas-C).
**Documentation**

@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
-VERSION = (0, 9, 0)
+VERSION = (0, 10, 0)
 PRERELEASE = None  # alpha, beta or rc
 REVISION = None

@ -623,3 +623,29 @@ To deal with such cases, you can tweak PDFMiner's `LAParams kwargs <https://gith
::
    >>> tables = camelot.read_pdf('foo.pdf', layout_kwargs={'detect_vertical': False})
.. _image-conversion-backend:
Use alternate image conversion backends
---------------------------------------
When using the :ref:`Lattice <lattice>` flavor, Camelot uses `pdftopng <https://github.com/vinayak-mehta/pdftopng>`_ to convert PDF pages to images for line recognition. This should work out of the box on most operating systems. However, if you get an error, you can supply your own image conversion backend to Camelot::
    >>> class ConversionBackend(object):
    >>>     def convert(self, pdf_path, png_path):
    >>>         # read pdf page from pdf_path
    >>>         # convert pdf page to image
    >>>         # write image to png_path
    >>>         pass
    >>>
    >>> tables = camelot.read_pdf(filename, backend=ConversionBackend())
.. note:: If image conversion using ``pdftopng`` fails, Camelot falls back to ``ghostscript`` to try image conversion again, and if that fails, it raises an error.
If you want to be explicit about which image conversion backend Camelot should use, you can pass an instance of it like this::
    >>> from camelot.backends.poppler_backend import PopplerBackend
    >>> from camelot.backends.ghostscript_backend import GhostscriptBackend
    >>>
    >>> tables = camelot.read_pdf(filename, backend=PopplerBackend())
    >>> tables = camelot.read_pdf(filename, backend=GhostscriptBackend())
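The note above describes a try-one-then-the-other fallback. If you want the same behaviour with backends of your own choosing, one possible sketch is a small wrapper that exposes the same ``convert`` method and tries each wrapped backend in turn. ``FallbackBackend`` is a hypothetical name used for illustration; it is not part of Camelot and is not how Camelot implements its built-in fallback.

```python
class FallbackBackend:
    """Try several conversion backends in order.

    Hypothetical helper for illustration only; not part of Camelot.
    Each wrapped object is assumed to expose convert(pdf_path, png_path).
    """

    def __init__(self, *backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = backends

    def convert(self, pdf_path, png_path):
        errors = []
        for backend in self.backends:
            try:
                backend.convert(pdf_path, png_path)
                return  # stop at the first backend that succeeds
            except Exception as exc:  # deliberately broad: any failure triggers the next backend
                errors.append(f"{type(backend).__name__}: {exc}")
        # Every backend failed, so surface all collected error messages at once.
        raise RuntimeError(
            "all image conversion backends failed: " + "; ".join(errors)
        )
```

Assuming the imports shown above, such a wrapper could be passed as ``backend=FallbackBackend(PopplerBackend(), GhostscriptBackend())``.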

@ -54,3 +54,17 @@ For more details, check out this code snippet from `@anakin87 <https://github.co
pages_string = str(chunk).replace("[", "").replace("]", "")
tables = camelot.read_pdf(filepath, pages=pages_string, **params)
tables.export(f"{export_path}/tables.csv")
How can I supply my own image conversion backend to Lattice?
------------------------------------------------------------
When using the :ref:`Lattice <lattice>` flavor, you can supply your own :ref:`image conversion backend <image-conversion-backend>` by creating a class with a ``convert`` method as follows::
    >>> class ConversionBackend(object):
    >>>     def convert(self, pdf_path, png_path):
    >>>         # read pdf page from pdf_path
    >>>         # convert pdf page to image
    >>>         # write image to png_path
    >>>         pass
    >>>
    >>> tables = camelot.read_pdf(filename, backend=ConversionBackend())
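To make the skeleton above more concrete, here is a minimal sketch of a backend that shells out to Poppler's ``pdftoppm`` command-line tool. The names ``PdftoppmBackend`` and ``build_pdftoppm_command`` are hypothetical and not part of Camelot. The sketch assumes that ``pdftoppm`` is installed and on your ``PATH``, that ``png_path`` ends in ``.png``, and that the PDF handed to ``convert`` contains the single page to render.

```python
import os
import subprocess


def build_pdftoppm_command(pdf_path, png_path, resolution=300):
    """Build the pdftoppm argument list (hypothetical helper, not part of Camelot)."""
    # pdftoppm appends ".png" itself, so pass the output path without its extension.
    prefix, _ = os.path.splitext(png_path)
    return [
        "pdftoppm",
        "-png",          # write PNG output
        "-singlefile",   # render only the first page, with no page-number suffix
        "-r", str(resolution),
        pdf_path,
        prefix,
    ]


class PdftoppmBackend:
    """Hypothetical conversion backend that calls the pdftoppm CLI."""

    def __init__(self, resolution=300):
        self.resolution = resolution

    def convert(self, pdf_path, png_path):
        command = build_pdftoppm_command(pdf_path, png_path, self.resolution)
        # check=True raises CalledProcessError if pdftoppm exits with an error.
        subprocess.run(command, check=True)
```

An instance would be supplied in the same way as above, for example ``backend=PdftoppmBackend(resolution=300)``.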