Merge pull request #146 from eamanu/Add_usage_examples_in_the_cli_documentation
[MRG + 1] Add CLI usage examples
commit
d918293fea
|
|
@@ -24,6 +24,12 @@ To process background lines, you can pass ``process_background=True``.
|
||||||
>>> tables = camelot.read_pdf('background_lines.pdf', process_background=True)
|
>>> tables = camelot.read_pdf('background_lines.pdf', process_background=True)
|
||||||
>>> tables[1].df
|
>>> tables[1].df
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot lattice -back background_lines.pdf
|
||||||
|
|
||||||
.. csv-table::
|
.. csv-table::
|
||||||
:file: ../_static/csv/background_lines.csv
|
:file: ../_static/csv/background_lines.csv
|
||||||
|
|
||||||
|
|
@@ -63,6 +69,12 @@ Let's plot all the text present on the table's PDF page.
|
||||||
>>> camelot.plot(tables[0], kind='text')
|
>>> camelot.plot(tables[0], kind='text')
|
||||||
>>> plt.show()
|
>>> plt.show()
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot lattice -plot text foo.pdf
|
||||||
|
|
||||||
.. figure:: ../_static/png/plot_text.png
|
.. figure:: ../_static/png/plot_text.png
|
||||||
:height: 674
|
:height: 674
|
||||||
:width: 1366
|
:width: 1366
|
||||||
|
|
@@ -84,6 +96,12 @@ Let's plot the table (to see if it was detected correctly or not). This plot typ
|
||||||
>>> camelot.plot(tables[0], kind='grid')
|
>>> camelot.plot(tables[0], kind='grid')
|
||||||
>>> plt.show()
|
>>> plt.show()
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot lattice -plot grid foo.pdf
|
||||||
|
|
||||||
.. figure:: ../_static/png/plot_table.png
|
.. figure:: ../_static/png/plot_table.png
|
||||||
:height: 674
|
:height: 674
|
||||||
:width: 1366
|
:width: 1366
|
||||||
|
|
@@ -103,6 +121,12 @@ Now, let's plot all table boundaries present on the table's PDF page.
|
||||||
>>> camelot.plot(tables[0], kind='contour')
|
>>> camelot.plot(tables[0], kind='contour')
|
||||||
>>> plt.show()
|
>>> plt.show()
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot lattice -plot contour foo.pdf
|
||||||
|
|
||||||
.. figure:: ../_static/png/plot_contour.png
|
.. figure:: ../_static/png/plot_contour.png
|
||||||
:height: 674
|
:height: 674
|
||||||
:width: 1366
|
:width: 1366
|
||||||
|
|
@@ -120,6 +144,12 @@ Cool, let's plot all line segments present on the table's PDF page.
|
||||||
>>> camelot.plot(tables[0], kind='line')
|
>>> camelot.plot(tables[0], kind='line')
|
||||||
>>> plt.show()
|
>>> plt.show()
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot lattice -plot line foo.pdf
|
||||||
|
|
||||||
.. figure:: ../_static/png/plot_line.png
|
.. figure:: ../_static/png/plot_line.png
|
||||||
:height: 674
|
:height: 674
|
||||||
:width: 1366
|
:width: 1366
|
||||||
|
|
@@ -137,6 +167,12 @@ Finally, let's plot all line intersections present on the table's PDF page.
|
||||||
>>> camelot.plot(tables[0], kind='joint')
|
>>> camelot.plot(tables[0], kind='joint')
|
||||||
>>> plt.show()
|
>>> plt.show()
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot lattice -plot joint foo.pdf
|
||||||
|
|
||||||
.. figure:: ../_static/png/plot_joint.png
|
.. figure:: ../_static/png/plot_joint.png
|
||||||
:height: 674
|
:height: 674
|
||||||
:width: 1366
|
:width: 1366
|
||||||
|
|
@@ -154,6 +190,12 @@ You can also visualize the textedges found on a page by specifying ``kind='texte
|
||||||
>>> camelot.plot(tables[0], kind='textedge')
|
>>> camelot.plot(tables[0], kind='textedge')
|
||||||
>>> plt.show()
|
>>> plt.show()
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot stream -plot textedge foo.pdf
|
||||||
|
|
||||||
.. figure:: ../_static/png/plot_textedge.png
|
.. figure:: ../_static/png/plot_textedge.png
|
||||||
:height: 674
|
:height: 674
|
||||||
:width: 1366
|
:width: 1366
|
||||||
|
|
@@ -175,6 +217,12 @@ Table areas that you want Camelot to analyze can be passed as a list of comma-se
|
||||||
>>> tables = camelot.read_pdf('table_areas.pdf', flavor='stream', table_areas=['316,499,566,337'])
|
>>> tables = camelot.read_pdf('table_areas.pdf', flavor='stream', table_areas=['316,499,566,337'])
|
||||||
>>> tables[0].df
|
>>> tables[0].df
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot stream -T 316,499,566,337 table_areas.pdf
|
||||||
|
|
||||||
.. csv-table::
|
.. csv-table::
|
||||||
:file: ../_static/csv/table_areas.csv
|
:file: ../_static/csv/table_areas.csv
|
||||||
|
|
||||||
|
|
@@ -196,6 +244,12 @@ Let's get back to the *x* coordinates we got from plotting the text that exists
|
||||||
>>> tables = camelot.read_pdf('column_separators.pdf', flavor='stream', columns=['72,95,209,327,442,529,566,606,683'])
|
>>> tables = camelot.read_pdf('column_separators.pdf', flavor='stream', columns=['72,95,209,327,442,529,566,606,683'])
|
||||||
>>> tables[0].df
|
>>> tables[0].df
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot stream -C 72,95,209,327,442,529,566,606,683 column_separators.pdf
|
||||||
|
|
||||||
.. csv-table::
|
.. csv-table::
|
||||||
|
|
||||||
"...","...","...","...","...","...","...","...","...","..."
|
"...","...","...","...","...","...","...","...","...","..."
|
||||||
|
|
@@ -215,6 +269,12 @@ To deal with cases like the output from the previous section, you can pass ``spl
|
||||||
>>> tables = camelot.read_pdf('column_separators.pdf', flavor='stream', columns=['72,95,209,327,442,529,566,606,683'], split_text=True)
|
>>> tables = camelot.read_pdf('column_separators.pdf', flavor='stream', columns=['72,95,209,327,442,529,566,606,683'], split_text=True)
|
||||||
>>> tables[0].df
|
>>> tables[0].df
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot -split stream -C 72,95,209,327,442,529,566,606,683 column_separators.pdf
|
||||||
|
|
||||||
.. csv-table::
|
.. csv-table::
|
||||||
|
|
||||||
"...","...","...","...","...","...","...","...","...","..."
|
"...","...","...","...","...","...","...","...","...","..."
|
||||||
|
|
@@ -242,6 +302,12 @@ You can solve this by passing ``flag_size=True``, which will enclose the supersc
|
||||||
>>> tables = camelot.read_pdf('superscript.pdf', flavor='stream', flag_size=True)
|
>>> tables = camelot.read_pdf('superscript.pdf', flavor='stream', flag_size=True)
|
||||||
>>> tables[0].df
|
>>> tables[0].df
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot -flag stream superscript.pdf
|
||||||
|
|
||||||
.. csv-table::
|
.. csv-table::
|
||||||
|
|
||||||
"...","...","...","...","...","...","...","...","...","...","..."
|
"...","...","...","...","...","...","...","...","...","...","..."
|
||||||
|
|
@@ -274,6 +340,12 @@ You can pass ``row_close_tol=<+int>`` to group the rows closer together, as show
|
||||||
>>> tables = camelot.read_pdf('group_rows.pdf', flavor='stream', row_close_tol=10)
|
>>> tables = camelot.read_pdf('group_rows.pdf', flavor='stream', row_close_tol=10)
|
||||||
>>> tables[0].df
|
>>> tables[0].df
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot stream -r 10 group_rows.pdf
|
||||||
|
|
||||||
.. csv-table::
|
.. csv-table::
|
||||||
|
|
||||||
"Clave","Nombre Entidad","Clave","","Nombre Municipio","Clave","Nombre Localidad"
|
"Clave","Nombre Entidad","Clave","","Nombre Municipio","Clave","Nombre Localidad"
|
||||||
|
|
@@ -317,6 +389,12 @@ Clearly, the smaller lines separating the headers, couldn't be detected. Let's t
|
||||||
>>> camelot.plot(tables[0], kind='grid')
|
>>> camelot.plot(tables[0], kind='grid')
|
||||||
>>> plt.show()
|
>>> plt.show()
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot lattice -scale 40 -plot grid short_lines.pdf
|
||||||
|
|
||||||
.. figure:: ../_static/png/short_lines_2.png
|
.. figure:: ../_static/png/short_lines_2.png
|
||||||
:alt: An improved plot of the PDF table with short lines
|
:alt: An improved plot of the PDF table with short lines
|
||||||
:align: left
|
:align: left
|
||||||
|
|
@@ -380,6 +458,12 @@ No surprises there — it did remain in place (observe the strings "2400" and "A
|
||||||
>>> tables = camelot.read_pdf('short_lines.pdf', line_size_scaling=40, shift_text=['r', 'b'])
|
>>> tables = camelot.read_pdf('short_lines.pdf', line_size_scaling=40, shift_text=['r', 'b'])
|
||||||
>>> tables[0].df
|
>>> tables[0].df
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot lattice -scale 40 -shift r -shift b short_lines.pdf
|
||||||
|
|
||||||
.. csv-table::
|
.. csv-table::
|
||||||
|
|
||||||
"Investigations","No. ofHHs","Age/Sex/Physiological Group","Preva-lence","C.I*","RelativePrecision","Sample sizeper State"
|
"Investigations","No. ofHHs","Age/Sex/Physiological Group","Preva-lence","C.I*","RelativePrecision","Sample sizeper State"
|
||||||
|
|
@@ -425,6 +509,12 @@ We don't need anything else. Now, let's pass ``copy_text=['v']`` to copy text in
|
||||||
>>> tables = camelot.read_pdf('copy_text.pdf', copy_text=['v'])
|
>>> tables = camelot.read_pdf('copy_text.pdf', copy_text=['v'])
|
||||||
>>> tables[0].df
|
>>> tables[0].df
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot lattice -copy v copy_text.pdf
|
||||||
|
|
||||||
.. csv-table::
|
.. csv-table::
|
||||||
|
|
||||||
"Sl. No.","Name of State/UT","Name of District","Disease/ Illness","No. of Cases","No. of Deaths","Date of start of outbreak","Date of reporting","Current Status","..."
|
"Sl. No.","Name of State/UT","Name of District","Disease/ Illness","No. of Cases","No. of Deaths","Date of start of outbreak","Date of reporting","Current Status","..."
|
||||||
|
|
|
||||||
|
|
@@ -15,7 +15,7 @@ You can print the help for the interface by typing ``camelot --help`` in your fa
|
||||||
|
|
||||||
Options:
|
Options:
|
||||||
--version Show the version and exit.
|
--version Show the version and exit.
|
||||||
-v, --verbose Verbose.
|
-q, --quiet TEXT Suppress logs and warnings.
|
||||||
-p, --pages TEXT Comma-separated page numbers. Example: 1,3,4
|
-p, --pages TEXT Comma-separated page numbers. Example: 1,3,4
|
||||||
or 1,4-end.
|
or 1,4-end.
|
||||||
-pw, --password TEXT Password for decryption.
|
-pw, --password TEXT Password for decryption.
|
||||||
|
|
|
||||||
|
|
@@ -70,6 +70,12 @@ You can also export all tables at once, using the :class:`tables <camelot.core.T
|
||||||
|
|
||||||
>>> tables.export('foo.csv', f='csv')
|
>>> tables.export('foo.csv', f='csv')
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot --format csv --output foo.csv lattice foo.pdf
|
||||||
|
|
||||||
This will export all tables as CSV files at the path specified. Alternatively, you can use ``f='json'``, ``f='excel'`` or ``f='html'``.
|
This will export all tables as CSV files at the path specified. Alternatively, you can use ``f='json'``, ``f='excel'`` or ``f='html'``.
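In the CLI example above, ``--format`` and ``--output`` come before the flavor subcommand (``lattice``). Here is a minimal sketch of a helper that assembles such a command line. ``camelot_argv`` is a hypothetical name used only for illustration and is not part of Camelot; it uses only flags that appear in the examples in this diff.

```python
import shlex


def camelot_argv(flavor, pdf, fmt=None, output=None, pages=None, password=None):
    """Build an argv list for the camelot CLI (hypothetical helper, not Camelot API).

    Top-level options go before the flavor subcommand, mirroring the
    examples in the documentation.
    """
    argv = ["camelot"]
    options = (
        ("--pages", pages),
        ("--password", password),
        ("--format", fmt),
        ("--output", output),
    )
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    # The flavor subcommand and the PDF path come last.
    argv += [flavor, pdf]
    return argv


cmd = camelot_argv("lattice", "foo.pdf", fmt="csv", output="foo.csv")
print(shlex.join(cmd))
```

The resulting list can be passed to ``subprocess.run`` if the ``camelot`` executable is on your path.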
|
||||||
|
|
||||||
.. note:: The :meth:`export() <camelot.core.TableList.export>` method exports files with a ``page-*-table-*`` suffix. In the example above, the single table in the list will be exported to ``foo-page-1-table-1.csv``. If the list contains multiple tables, multiple CSV files will be created. To avoid filling up your path with multiple files, you can use ``compress=True``, which will create a single ZIP file at your path with all the CSV files.
|
.. note:: The :meth:`export() <camelot.core.TableList.export>` method exports files with a ``page-*-table-*`` suffix. In the example above, the single table in the list will be exported to ``foo-page-1-table-1.csv``. If the list contains multiple tables, multiple CSV files will be created. To avoid filling up your path with multiple files, you can use ``compress=True``, which will create a single ZIP file at your path with all the CSV files.
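To make the ``page-*-table-*`` suffix concrete, here is a small sketch that derives the per-table file name from the path you pass in. ``export_name`` is a hypothetical helper written for illustration; it only mirrors the naming described in the note and is not Camelot's own code.

```python
import os


def export_name(path, page, order):
    """Return the per-table file name for an export path (illustrative only)."""
    root, ext = os.path.splitext(path)
    # Insert the page and table numbers between the base name and the extension.
    return f"{root}-page-{page}-table-{order}{ext}"


print(export_name("foo.csv", 1, 1))  # foo-page-1-table-1.csv
```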
|
||||||
|
|
@ -85,6 +91,12 @@ By default, Camelot only uses the first page of the PDF to extract tables. To sp
|
||||||
|
|
||||||
>>> camelot.read_pdf('your.pdf', pages='1,2,3')
|
>>> camelot.read_pdf('your.pdf', pages='1,2,3')
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot --pages 1,2,3 lattice your.pdf
|
||||||
|
|
||||||
The ``pages`` keyword argument accepts pages as a comma-separated string of page numbers. You can also specify page ranges, for example ``pages='1,4-10,20-30'`` or ``pages='1,4-10,20-end'``.
|
The ``pages`` keyword argument accepts pages as a comma-separated string of page numbers. You can also specify page ranges, for example ``pages='1,4-10,20-30'`` or ``pages='1,4-10,20-end'``.
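As a rough illustration of how a page string such as ``1,4-10,20-end`` could be read, the sketch below expands it into explicit page numbers. ``expand_pages`` and its ``last_page`` parameter are hypothetical names chosen for this example; this is not how Camelot is known to implement it, only one way to interpret the format.

```python
def expand_pages(spec, last_page):
    """Expand a page string like '1,4-6,9-end' into a sorted list of page numbers.

    ``last_page`` stands in for the number of pages in the PDF and is
    what the keyword ``end`` resolves to (an assumption of this sketch).
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            first = int(start)
            last = last_page if end == "end" else int(end)
            pages.update(range(first, last + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(expand_pages("1,4-6,9-end", last_page=10))  # [1, 4, 5, 6, 9, 10]
```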
|
||||||
|
|
||||||
Reading encrypted PDFs
|
Reading encrypted PDFs
|
||||||
|
|
@ -98,6 +110,12 @@ To extract tables from encrypted PDF files you must provide a password when call
|
||||||
>>> tables
|
>>> tables
|
||||||
<TableList n=1>
|
<TableList n=1>
|
||||||
|
|
||||||
|
.. tip::
|
||||||
|
Here's how you can do the same with the :ref:`command-line interface <cli>`.
|
||||||
|
::
|
||||||
|
|
||||||
|
$ camelot --password userpass lattice foo.pdf
|
||||||
|
|
||||||
Currently Camelot only supports PDFs encrypted with ASCII passwords and algorithm `code 1 or 2`_. An exception is thrown if the PDF cannot be read. This may be due to no password being provided, an incorrect password, or an unsupported encryption algorithm.
|
Currently Camelot only supports PDFs encrypted with ASCII passwords and algorithm `code 1 or 2`_. An exception is thrown if the PDF cannot be read. This may be due to no password being provided, an incorrect password, or an unsupported encryption algorithm.
|
||||||
|
|
||||||
Further encryption support may be added in the future. In the meantime, if your PDF files use unsupported encryption algorithms, you are advised to remove encryption before calling :meth:`read_pdf() <camelot.read_pdf>`. This can be achieved with third-party tools such as `QPDF`_.
|
Further encryption support may be added in the future. In the meantime, if your PDF files use unsupported encryption algorithms, you are advised to remove encryption before calling :meth:`read_pdf() <camelot.read_pdf>`. This can be achieved with third-party tools such as `QPDF`_.
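One way to remove encryption with QPDF from Python is sketched below. It assumes the ``qpdf`` executable is installed and on your path; the helper names ``qpdf_decrypt_argv`` and ``decrypt`` and the file names are hypothetical, and only QPDF's ``--password`` and ``--decrypt`` options are used.

```python
import subprocess


def qpdf_decrypt_argv(src, dst, password):
    """Build the qpdf command that writes a decrypted copy of ``src`` to ``dst``."""
    return ["qpdf", f"--password={password}", "--decrypt", src, dst]


def decrypt(src, dst, password):
    """Run qpdf; raises CalledProcessError if qpdf exits with an error."""
    subprocess.run(qpdf_decrypt_argv(src, dst, password), check=True)


# Only print the command here, so the example runs without qpdf installed.
print(qpdf_decrypt_argv("foo.pdf", "foo-decrypted.pdf", "userpass"))
```

The decrypted copy can then be passed to ``camelot.read_pdf`` without a password.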
|
||||||
|
|
|
||||||