Bump version and update docs

parent d39ca4502b
commit f43235934b

@@ -6,6 +6,7 @@ master

**Improvements**

- Add pdftopng for image conversion and use ghostscript as fallback. [#198](https://github.com/camelot-dev/camelot/pull/198) by Vinayak Mehta.
- Add markdown export format. [#222](https://github.com/camelot-dev/camelot/pull/222/) by [Lucas Cimon](https://github.com/Lucas-C).

**Documentation**

@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
 
-VERSION = (0, 9, 0)
+VERSION = (0, 10, 0)
 PRERELEASE = None # alpha, beta or rc
 REVISION = None

@@ -623,3 +623,29 @@ To deal with such cases, you can tweak PDFMiner's `LAParams kwargs <https://gith
::

    >>> tables = camelot.read_pdf('foo.pdf', layout_kwargs={'detect_vertical': False})

.. _image-conversion-backend:

Use alternate image conversion backends
---------------------------------------

When using the :ref:`Lattice <lattice>` flavor, Camelot uses `pdftopng <https://github.com/vinayak-mehta/pdftopng>`_ to convert PDF pages to images for line recognition. This should work out of the box on most operating systems. However, if you get an error, you can supply your own image conversion backend to Camelot::

    >>> class ConversionBackend(object):
    >>>     def convert(self, pdf_path, png_path):
    >>>         # read pdf page from pdf_path
    >>>         # convert pdf page to image
    >>>         # write image to png_path
    >>>         pass
    >>>
    >>> tables = camelot.read_pdf(filename, backend=ConversionBackend())

.. note:: If image conversion with ``pdftopng`` fails, Camelot retries the conversion with ``ghostscript`` and raises an error only if that also fails.
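
The same try-then-fall-back idea can be applied to custom backends in user code. The following is a minimal sketch, not Camelot's internal implementation: ``FallbackBackend``, ``AlwaysFails`` and ``Records`` are hypothetical names introduced here for illustration, and the only assumption about a backend is that it exposes ``convert(pdf_path, png_path)`` as described above.

```python
class FallbackBackend:
    """Hypothetical wrapper: try several backends in order until one succeeds."""

    def __init__(self, *backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = backends

    def convert(self, pdf_path, png_path):
        errors = []
        for backend in self.backends:
            try:
                backend.convert(pdf_path, png_path)
                return  # first success wins
            except Exception as exc:  # a backend may raise any exception type
                errors.append(f"{type(backend).__name__}: {exc}")
        # Every backend failed: report all collected errors at once.
        raise RuntimeError("image conversion failed: " + "; ".join(errors))


# Stand-in backends used only to demonstrate the control flow.
class AlwaysFails:
    def convert(self, pdf_path, png_path):
        raise OSError("converter not installed")


class Records:
    def __init__(self):
        self.calls = []

    def convert(self, pdf_path, png_path):
        self.calls.append((pdf_path, png_path))


recorder = Records()
chain = FallbackBackend(AlwaysFails(), recorder)
chain.convert("page-1.pdf", "page-1.png")  # falls through to the recorder
```

Because the wrapper itself has a ``convert`` method, an instance of it could be passed as ``backend=`` in the same way as any other custom backend.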

To be explicit about which image conversion backend Camelot should use, pass an instance of that backend::

    >>> from camelot.backends.poppler_backend import PopplerBackend
    >>> from camelot.backends.ghostscript_backend import GhostscriptBackend
    >>>
    >>> tables = camelot.read_pdf(filename, backend=PopplerBackend())
    >>> tables = camelot.read_pdf(filename, backend=GhostscriptBackend())

@@ -54,3 +54,17 @@ For more details, check out this code snippet from `@anakin87 <https://github.co
    pages_string = str(chunk).replace("[", "").replace("]", "")
    tables = camelot.read_pdf(filepath, pages=pages_string, **params)
    tables.export(f"{export_path}/tables.csv")

How can I supply my own image conversion backend to Lattice?
------------------------------------------------------------

When using the :ref:`Lattice <lattice>` flavor, you can supply your own :ref:`image conversion backend <image-conversion-backend>` by creating a class with a ``convert`` method as follows::

    >>> class ConversionBackend(object):
    >>>     def convert(self, pdf_path, png_path):
    >>>         # read pdf page from pdf_path
    >>>         # convert pdf page to image
    >>>         # write image to png_path
    >>>         pass
    >>>
    >>> tables = camelot.read_pdf(filename, backend=ConversionBackend())
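
As one way to fill in the placeholder comments above, the sketch below shells out to the ``pdftoppm`` command-line tool from poppler-utils. Everything in it is an assumption made for illustration: ``PdftoppmBackend`` and ``build_pdftoppm_command`` are hypothetical names, the 300 dpi default is arbitrary, ``pdftoppm`` must be installed and on ``PATH``, and ``png_path`` is assumed to end in ``.png``.

```python
import os
import subprocess


def build_pdftoppm_command(pdf_path, png_path, dpi=300):
    """Build the argument list for converting the first page to one PNG."""
    # pdftoppm appends ".png" itself, so pass the output path without it.
    out_prefix, _ = os.path.splitext(png_path)
    return ["pdftoppm", "-png", "-singlefile", "-r", str(dpi), pdf_path, out_prefix]


class PdftoppmBackend:
    """Hypothetical custom backend exposing the convert(pdf_path, png_path) method."""

    def __init__(self, dpi=300):
        self.dpi = dpi

    def convert(self, pdf_path, png_path):
        # check=True raises CalledProcessError if pdftoppm exits with an error.
        subprocess.run(
            build_pdftoppm_command(pdf_path, png_path, self.dpi), check=True
        )
```

An instance would then be passed in the same way as above, for example ``camelot.read_pdf(filename, backend=PdftoppmBackend())``. Keeping the command construction in a separate function makes it possible to check the arguments without running the external tool.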