.. _quickstart:

Quickstart
==========

In a hurry to extract tables from PDFs? This document gives a quick introduction to help you get started with Camelot.

Read the PDF
------------

Reading a PDF to extract tables with Camelot is very simple.

Begin by importing the Camelot module::

    >>> import camelot

Now, let's try to read a PDF. (You can check out the PDF used in this example `here`_.) Since the PDF has a table with clearly demarcated lines, we will use the :ref:`Lattice <lattice>` method here, which needs no extra keyword argument.

.. note:: :ref:`Lattice <lattice>` is used by default. You can use :ref:`Stream <stream>` with ``flavor='stream'``.

.. _here: ../_static/pdf/foo.pdf

::

    >>> tables = camelot.read_pdf('foo.pdf')
    >>> tables
    <TableList n=1>

Now, we have a :class:`TableList <camelot.core.TableList>` object called ``tables``, which is a list of :class:`Table <camelot.core.Table>` objects. We can get everything we need from this object.

We can access each table using its index. From the code snippet above, we can see that the ``tables`` object has only one table, since ``n=1``. Let's access the table using the index ``0`` and take a look at its ``shape``.

::

    >>> tables[0]
    <Table shape=(7, 7)>

Let's print the parsing report.

::

    >>> print(tables[0].parsing_report)
    {
        'accuracy': 99.02,
        'whitespace': 12.24,
        'order': 1,
        'page': 1
    }

Woah! The accuracy is top-notch and there is little whitespace, which means the table was most likely extracted correctly. You can access the table as a pandas DataFrame by using the :class:`table <camelot.core.Table>` object's ``df`` property.

::

    >>> tables[0].df

.. csv-table::
  :file: ../_static/csv/foo.csv

Looks good! You can now export the table as a CSV file using its :meth:`to_csv() <camelot.core.Table.to_csv>` method. Alternatively, you can use the :meth:`to_json() <camelot.core.Table.to_json>`, :meth:`to_excel() <camelot.core.Table.to_excel>` or :meth:`to_html() <camelot.core.Table.to_html>` methods to export the table as JSON, Excel and HTML files respectively.

::

    >>> tables[0].to_csv('foo.csv')

This will export the table as a CSV file at the path specified. In this case, it is ``foo.csv`` in the current directory.

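When a PDF yields several tables, the parsing report can help decide which ones are worth exporting. The sketch below shows one possible way to screen tables by the ``accuracy`` and ``whitespace`` values from the report printed earlier. The ``select_tables`` helper and its thresholds are hypothetical and not part of Camelot, and the stand-in objects only mimic the ``parsing_report`` attribute so that the snippet runs without a PDF.

```python
from types import SimpleNamespace


def select_tables(tables, min_accuracy=90.0, max_whitespace=50.0):
    """Return the tables whose parsing report passes both thresholds.

    Works with any iterable of objects exposing a ``parsing_report`` dict
    that has 'accuracy' and 'whitespace' keys. The default thresholds are
    illustrative assumptions, not values defined by Camelot.
    """
    selected = []
    for table in tables:
        report = table.parsing_report
        if (report['accuracy'] >= min_accuracy
                and report['whitespace'] <= max_whitespace):
            selected.append(table)
    return selected


# Stand-in objects with made-up reports; with Camelot you would pass the
# ``tables`` object returned by camelot.read_pdf() instead.
demo_tables = [
    SimpleNamespace(parsing_report={'accuracy': 99.02, 'whitespace': 12.24,
                                    'order': 1, 'page': 1}),
    SimpleNamespace(parsing_report={'accuracy': 41.5, 'whitespace': 80.0,
                                    'order': 2, 'page': 1}),
]

good = select_tables(demo_tables)
print([t.parsing_report['order'] for t in good])
```

With a real PDF, each selected table could then be written out with its ``to_csv()`` method, as shown above.
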
You can also export all tables at once, using the :class:`tables <camelot.core.TableList>` object's :meth:`export() <camelot.core.TableList.export>` method.

::

    >>> tables.export('foo.csv', f='csv')

This will export all tables as CSV files at the path specified. Alternatively, you can use ``f='json'``, ``f='excel'`` or ``f='html'``.

.. note:: The :meth:`export() <camelot.core.TableList.export>` method exports files with a ``page-*-table-*`` suffix. In the example above, the single table in the list will be exported to ``foo-page-1-table-1.csv``. If the list contains multiple tables, multiple CSV files will be created. To avoid filling up your path with multiple files, you can use ``compress=True``, which will create a single ZIP file at your path with all the CSV files.

.. note:: Camelot handles rotated PDF pages automatically. As an exercise, try to extract the table out of `this PDF`_.

.. _this PDF: ../_static/pdf/rotated.pdf

Specify page numbers
--------------------

By default, Camelot only uses the first page of the PDF to extract tables. To specify multiple pages, you can use the ``pages`` keyword argument::

    >>> camelot.read_pdf('your.pdf', pages='1,2,3')

The ``pages`` keyword argument accepts pages as a comma-separated string of page numbers. You can also specify page ranges, for example ``pages='1,4-10,20-30'`` or ``pages='1,4-10,20-end'``.

------------------------

Ready for more? Check out the :ref:`advanced <advanced>` section.