Merge pull request #2 from camelot-dev/master

Update fork
2020-12-08 18:37:55 +01:00 · 2020-12-08 18:37:55 +01:00 · 644e17edec
parent eadc54ad25 7709e58d64
commit 644e17edec
18 changed files with 358 additions and 190 deletions
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@ -0,0 +1,48 @@
 ---
 name: Bug report
 about: Please follow this template to submit bug reports.
 title: ''
 labels: bug
 assignees: ''
 ---
 <!-- Please read the filing issues section of the contributor's guide first: https://camelot-py.readthedocs.io/en/master/dev/contributing.html -->
 **Describe the bug**
 A clear and concise description of what the bug is.
 **Steps to reproduce the bug**
 Steps used to install `camelot`:
 1. Add step here (you can add more steps too)
 Steps to reproduce the behavior:
 1. Add step here (you can add more steps too)
 **Expected behavior**
 A clear and concise description of what you expected to happen.
 **Code**
 Add the Camelot code snippet that you used.
 ```
 import camelot
 # add your code here
 ```
 **PDF**
 Add the PDF file that you want to extract tables from.
 **Screenshots**
 If applicable, add screenshots to help explain your problem.
 **Environment**
 - OS: [e.g. MacOS]
 - Python version:
 - Numpy version:
 - OpenCV version:
 - Ghostscript version:
 - Camelot version:
 **Additional context**
 Add any other context about the problem here.
--- a/9
+++ b/9
@ -1,12 +1,7 @@
 MIT License
-Modifications:
+Copyright (c) 2019-2020 Camelot Developers
-
+Copyright (c) 2018-2019 Peeply Private Ltd (Singapore)
 Copyright (c) 2019 Camelot Developers
 Original project:
 Copyright (c) 2018 Peeply Private Ltd (Singapore)
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
--- a/README.md
+++ b/README.md
@ -10,13 +10,13 @@
 [![image](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black) [![image](https://img.shields.io/badge/continous%20quality-deepsource-lightgrey)](https://deepsource.io/gh/camelot-dev/camelot/?ref=repository-badge)
-**Camelot** is a Python library that makes it easy for *anyone* to extract tables from PDF files!
+**Camelot** is a Python library that can help you extract tables from PDFs!
-**Note:** You can also check out [Excalibur](https://github.com/camelot-dev/excalibur), which is a web interface for Camelot!
+**Note:** You can also check out [Excalibur](https://github.com/camelot-dev/excalibur), the web interface to Camelot!
 ---
-**Here's how you can extract tables from PDF files.** Check out the PDF used in this example [here](https://github.com/camelot-dev/camelot/blob/master/docs/_static/pdf/foo.pdf).
+**Here's how you can extract tables from PDFs.** You can check out the PDF used in this example [here](https://github.com/camelot-dev/camelot/blob/master/docs/_static/pdf/foo.pdf).
 <pre>
 >>> import camelot
@ -46,24 +46,27 @@
 | 2032_2     | 0.17      | 57.8          | 21.7%                | 0.3%            | 2.7%            | 1.2%           |
 | 4171_1     | 0.07      | 173.9         | 58.1%                | 1.6%            | 2.1%            | 0.5%           |
-There's a [command-line interface](https://camelot-py.readthedocs.io/en/master/user/cli.html) too!
+Camelot also comes packaged with a [command-line interface](https://camelot-py.readthedocs.io/en/master/user/cli.html)!
 **Note:** Camelot only works with text-based PDFs and not scanned documents. (As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)
 ## Why Camelot?
- **You are in control.**: Unlike other libraries and tools which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)
+- **Configurability**: Camelot gives you control over the table extraction process with its [tweakable settings](https://camelot-py.readthedocs.io/en/master/user/advanced.html).
- *Bad* tables can be discarded based on **metrics** like accuracy and whitespace, without ever having to manually look at each table.
+- **Metrics**: Bad tables can be discarded based on metrics like accuracy and whitespace, without having to manually look at each table.
- Each table is a **pandas DataFrame**, which seamlessly integrates into [ETL and data analysis workflows](https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873).
+- **Output**: Each table is extracted into a **pandas DataFrame**, which seamlessly integrates into [ETL and data analysis workflows](https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873). You can also export tables to multiple formats, which include CSV, JSON, Excel, HTML and Sqlite.
 - **Export** to multiple formats, including JSON, Excel, HTML and Sqlite.
-See [comparison with other PDF table extraction libraries and tools](https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).
+See [comparison with similar libraries and tools](https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).
 ## Support the development
 If Camelot has helped you, please consider supporting its development with a one-time or monthly donation [on OpenCollective](https://opencollective.com/camelot).
 ## Installation
 ### Using conda
-The easiest way to install Camelot is to install it with [conda](https://conda.io/docs/), which is a package manager and  environment management system for the [Anaconda](http://docs.continuum.io/anaconda/) distribution.
+The easiest way to install Camelot is with [conda](https://conda.io/docs/), which is a package manager and environment management system for the [Anaconda](http://docs.continuum.io/anaconda/) distribution.
 <pre>
 $ conda install -c conda-forge camelot-py
@ -71,7 +74,7 @@ $ conda install -c conda-forge camelot-py
 ### Using pip
-After [installing the dependencies](https://camelot-py.readthedocs.io/en/master/user/install-deps.html) ([tk](https://packages.ubuntu.com/bionic/python/python-tk) and [ghostscript](https://www.ghostscript.com/)), you can simply use pip to install Camelot:
+After [installing the dependencies](https://camelot-py.readthedocs.io/en/master/user/install-deps.html) ([tk](https://packages.ubuntu.com/bionic/python/python-tk) and [ghostscript](https://www.ghostscript.com/)), you can also just use pip to install Camelot:
 <pre>
 $ pip install "camelot-py[cv]"
@ -94,40 +97,16 @@ $ pip install ".[cv]"
 ## Documentation
-Great documentation is available at [http://camelot-py.readthedocs.io/](http://camelot-py.readthedocs.io/).
+The documentation is available at [http://camelot-py.readthedocs.io/](http://camelot-py.readthedocs.io/).
 ## Development
 The [Contributor's Guide](https://camelot-py.readthedocs.io/en/master/dev/contributing.html) has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.
 ### Source code
 You can check the latest sources with:
 <pre>
 $ git clone https://www.github.com/camelot-dev/camelot
 </pre>
 ### Setting up a development environment
 You can install the development dependencies easily, using pip:
 <pre>
 $ pip install "camelot-py[dev]"
 </pre>
 ### Testing
 After installation, you can run tests using:
 <pre>
 $ python setup.py test
 </pre>
 ## Wrappers
 - [camelot-php](https://github.com/randomstate/camelot-php) provides a [PHP](https://www.php.net/) wrapper on Camelot.
 ## Contributing
 The [Contributor's Guide](https://camelot-py.readthedocs.io/en/master/dev/contributing.html) has detailed information about contributing issues, documentation, code, and tests.
 ## Versioning
 Camelot uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. For the changelog, you can check out [HISTORY.md](https://github.com/camelot-dev/camelot/blob/master/HISTORY.md).
@ -135,9 +114,3 @@ Camelot uses [Semantic Versioning](https://semver.org/). For the available versi
 ## License
 This project is licensed under the MIT License, see the [LICENSE](https://github.com/camelot-dev/camelot/blob/master/LICENSE) file for details.
 ## Support the development
 You can support our work on Camelot with a one-time or monthly donation [on OpenCollective](https://opencollective.com/camelot). Organizations who use camelot can also sponsor the project for an acknowledgement on [our documentation site](https://camelot-py.readthedocs.io/en/master/) and this README.
 Special thanks to all the users, organizations and contributors that support Camelot!
--- a/camelot/handlers.py
+++ b/camelot/handlers.py
@ -70,7 +70,8 @@ class PDFHandler(object):
        if pages == "1":
            page_numbers.append({"start": 1, "end": 1})
        else:
-            infile = PdfFileReader(open(filepath, "rb"), strict=False)
+            instream = open(filepath, "rb")
            infile = PdfFileReader(instream, strict=False)
            if infile.isEncrypted:
                infile.decrypt(self.password)
            if pages == "all":
@ -84,6 +85,7 @@ class PDFHandler(object):
                        page_numbers.append({"start": int(a), "end": int(b)})
                    else:
                        page_numbers.append({"start": int(r), "end": int(r)})
            instream.close()
        P = []
        for p in page_numbers:
            P.extend(range(p["start"], p["end"] + 1))
@ -122,7 +124,8 @@ class PDFHandler(object):
            if rotation != "":
                fpath_new = "".join([froot.replace("page", "p"), "_rotated", fext])
                os.rename(fpath, fpath_new)
-                infile = PdfFileReader(open(fpath_new, "rb"), strict=False)
+                instream = open(fpath_new, "rb")
                infile = PdfFileReader(instream, strict=False)
                if infile.isEncrypted:
                    infile.decrypt(self.password)
                outfile = PdfFileWriter()
@ -134,6 +137,7 @@ class PDFHandler(object):
                outfile.addPage(p)
                with open(fpath, "wb") as f:
                    outfile.write(f)
                instream.close()
    def parse(
        self, flavor="lattice", suppress_stdout=False, layout_kwargs={}, **kwargs
--- a/camelot/parsers/stream.py
+++ b/camelot/parsers/stream.py
@ -121,6 +121,7 @@ class Stream(BaseParser):
        row_y = 0
        rows = []
        temp = []
        for t in text:
            # is checking for upright necessary?
            # if t.get_text().strip() and all([obj.upright for obj in t._objs if
@ -131,8 +132,10 @@ class Stream(BaseParser):
                    temp = []
                    row_y = t.y0
                temp.append(t)
        rows.append(sorted(temp, key=lambda t: t.x0))
-        __ = rows.pop(0)  # TODO: hacky
+        if len(rows) > 1:
            __ = rows.pop(0)  # TODO: hacky
        return rows
    @staticmethod
@ -345,43 +348,46 @@ class Stream(BaseParser):
        else:
            # calculate mode of the list of number of elements in
            # each row to guess the number of columns
-            ncols = max(set(elements), key=elements.count)
+            if not len(elements):
-            if ncols == 1:
+                cols = [(text_x_min, text_x_max)]
-                # if mode is 1, the page usually contains not tables
+            else:
-                # but there can be cases where the list can be skewed,
+                ncols = max(set(elements), key=elements.count)
-                # try to remove all 1s from list in this case and
+                if ncols == 1:
-                # see if the list contains elements, if yes, then use
+                    # if mode is 1, the page usually contains not tables
-                # the mode after removing 1s
+                    # but there can be cases where the list can be skewed,
-                elements = list(filter(lambda x: x != 1, elements))
+                    # try to remove all 1s from list in this case and
-                if len(elements):
+                    # see if the list contains elements, if yes, then use
-                    ncols = max(set(elements), key=elements.count)
+                    # the mode after removing 1s
-                else:
+                    elements = list(filter(lambda x: x != 1, elements))
-                    warnings.warn(
+                    if len(elements):
-                        f"No tables found in table area {table_idx + 1}"
+                        ncols = max(set(elements), key=elements.count)
                    else:
                        warnings.warn(
                            f"No tables found in table area {table_idx + 1}"
                        )
                cols = [(t.x0, t.x1) for r in rows_grouped if len(r) == ncols for t in r]
                cols = self._merge_columns(sorted(cols), column_tol=self.column_tol)
                inner_text = []
                for i in range(1, len(cols)):
                    left = cols[i - 1][1]
                    right = cols[i][0]
                    inner_text.extend(
                        [
                            t
                            for direction in self.t_bbox
                            for t in self.t_bbox[direction]
                            if t.x0 > left and t.x1 < right
                        ]
                    )
-            cols = [(t.x0, t.x1) for r in rows_grouped if len(r) == ncols for t in r]
+                outer_text = [
-            cols = self._merge_columns(sorted(cols), column_tol=self.column_tol)
+                    t
-            inner_text = []
+                    for direction in self.t_bbox
-            for i in range(1, len(cols)):
+                    for t in self.t_bbox[direction]
-                left = cols[i - 1][1]
+                    if t.x0 > cols[-1][1] or t.x1 < cols[0][0]
-                right = cols[i][0]
+                ]
-                inner_text.extend(
+                inner_text.extend(outer_text)
-                    [
+                cols = self._add_columns(cols, inner_text, self.row_tol)
-                        t
+                cols = self._join_columns(cols, text_x_min, text_x_max)
                        for direction in self.t_bbox
                        for t in self.t_bbox[direction]
                        if t.x0 > left and t.x1 < right
                    ]
                )
            outer_text = [
                t
                for direction in self.t_bbox
                for t in self.t_bbox[direction]
                if t.x0 > cols[-1][1] or t.x1 < cols[0][0]
            ]
            inner_text.extend(outer_text)
            cols = self._add_columns(cols, inner_text, self.row_tol)
            cols = self._join_columns(cols, text_x_min, text_x_max)
        return cols, rows
--- a/camelot/utils.py
+++ b/camelot/utils.py
@ -353,7 +353,7 @@ def text_in_bbox(bbox, text):
    Returns
    -------
    t_bbox : list
-        List of PDFMiner text objects that lie inside table.
+        List of PDFMiner text objects that lie inside table, discarding the overlapping ones
    """
    lb = (bbox[0], bbox[1])
@ -364,7 +364,97 @@ def text_in_bbox(bbox, text):
        if lb[0] - 2 <= (t.x0 + t.x1) / 2.0 <= rt[0] + 2
        and lb[1] - 2 <= (t.y0 + t.y1) / 2.0 <= rt[1] + 2
    ]
-    return t_bbox
+
    # Avoid duplicate text by discarding overlapping boxes
    rest = {t for t in t_bbox}
    for ba in t_bbox:
        for bb in rest.copy():
            if ba == bb:
                continue
            if bbox_intersect(ba, bb):
                # if the intersection is larger than 80% of ba's size, we keep the longest
                if (bbox_intersection_area(ba, bb) / bbox_area(ba)) > 0.8:
                    if bbox_longer(bb, ba):
                        rest.discard(ba)
    unique_boxes = list(rest)
    return unique_boxes
 def bbox_intersection_area(ba, bb) -> float:
    """Returns area of the intersection of the bounding boxes of two PDFMiner objects.
    Parameters
    ----------
    ba : PDFMiner text object
    bb : PDFMiner text object
    Returns
    -------
    intersection_area : float
        Area of the intersection of the bounding boxes of both objects
    """
    x_left = max(ba.x0, bb.x0)
    y_top = min(ba.y1, bb.y1)
    x_right = min(ba.x1, bb.x1)
    y_bottom = max(ba.y0, bb.y0)
    if x_right < x_left or y_bottom > y_top:
        return 0.0
    intersection_area = (x_right - x_left) * (y_top - y_bottom)
    return intersection_area
 def bbox_area(bb) -> float:
    """Returns area of the bounding box of a PDFMiner object.
    Parameters
    ----------
    bb : PDFMiner text object
    Returns
    -------
    area : float
        Area of the bounding box of the object
    """
    return (bb.x1 - bb.x0) * (bb.y1 - bb.y0)
 def bbox_intersect(ba, bb) -> bool:
    """Returns True if the bounding boxes of two PDFMiner objects intersect.
    Parameters
    ----------
    ba : PDFMiner text object
    bb : PDFMiner text object
    Returns
    -------
    overlaps : bool
        True if the bounding boxes intersect
    """
    return ba.x1 >= bb.x0 and bb.x1 >= ba.x0 and ba.y1 >= bb.y0 and bb.y1 >= ba.y0
 def bbox_longer(ba, bb) -> bool:
    """Returns True if the bounding box of the first PDFMiner object is longer or equal to the second.
    Parameters
    ----------
    ba : PDFMiner text object
    bb : PDFMiner text object
    Returns
    -------
    longer : bool
        True if the bounding box of the first object is longer or equal
    """
    return (ba.x1 - ba.x0) >= (bb.x1 - bb.x0)
 def merge_close_lines(ar, line_tol=2):
@ -411,7 +501,7 @@ def text_strip(text, strip=""):
        return text
    stripped = re.sub(
-        fr"[{''.join(map(re.escape, strip))}]", "", text, re.UNICODE
+        fr"[{''.join(map(re.escape, strip))}]", "", text, flags=re.UNICODE
    )
    return stripped
--- a/docs/conf.py
+++ b/docs/conf.py
@ -63,7 +63,7 @@ master_doc = 'index'
 # General information about the project.
 project = u'Camelot'
-copyright = u'2019, Camelot Developers'
+copyright = u'2020, Camelot Developers'
 author = u'Vinayak Mehta'
 # The version info for the project you're documenting, acts as replacement for
--- a/docs/index.rst
+++ b/docs/index.rst
@ -36,15 +36,15 @@ Release v\ |version|. (:ref:`Installation <install>`)
 .. image:: https://img.shields.io/badge/continous%20quality-deepsource-lightgrey
    :target: https://deepsource.io/gh/camelot-dev/camelot/?ref=repository-badge
-**Camelot** is a Python library that makes it easy for *anyone* to extract tables from PDF files!
+**Camelot** is a Python library that can help you extract tables from PDFs!
-.. note:: You can also check out `Excalibur`_, which is a web interface for Camelot!
+.. note:: You can also check out `Excalibur`_, the web interface to Camelot!
 .. _Excalibur: https://github.com/camelot-dev/excalibur
 ----
-**Here's how you can extract tables from PDF files.** Check out the PDF used in this example `here`_.
+**Here's how you can extract tables from PDFs.** You can check out the PDF used in this example `here`_.
 .. _here: _static/pdf/foo.pdf
@ -70,7 +70,7 @@ Release v\ |version|. (:ref:`Installation <install>`)
 .. csv-table::
  :file: _static/csv/foo.csv
-There's a :ref:`command-line interface <cli>` too!
+Camelot also comes packaged with a :ref:`command-line interface <cli>`!
 .. note:: Camelot only works with text-based PDFs and not scanned documents. (As Tabula `explains`_, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)
@ -79,27 +79,27 @@ There's a :ref:`command-line interface <cli>` too!
 Why Camelot?
 ------------
- **You are in control.** Unlike other libraries and tools which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)
+- **Configurability**: Camelot gives you control over the table extraction process with its :ref:`tweakable settings <advanced>`.
- *Bad* tables can be discarded based on **metrics** like accuracy and whitespace, without ever having to manually look at each table.
+- **Metrics**: Bad tables can be discarded based on metrics like accuracy and whitespace, without having to manually look at each table.
- Each table is a **pandas DataFrame**, which seamlessly integrates into `ETL and data analysis workflows`_.
+- **Output**: Each table is extracted into a **pandas DataFrame**, which seamlessly integrates into `ETL and data analysis workflows`_. You can also export tables to multiple formats, which include CSV, JSON, Excel, HTML and Sqlite.
 - **Export** to multiple formats, including JSON, Excel and HTML.
 See `comparison with other PDF table extraction libraries and tools`_.
 .. _ETL and data analysis workflows: https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873
 .. _comparison with other PDF table extraction libraries and tools: https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools
-Support us on OpenCollective
+See `comparison with similar libraries and tools`_.
 ----------------------------
-If Camelot helped you extract tables from PDFs, please consider supporting its development by `becoming a backer or a sponsor on OpenCollective`_!
+.. _comparison with similar libraries and tools: https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools
-.. _becoming a backer or a sponsor on OpenCollective: https://opencollective.com/camelot
+Support the development
 -----------------------
 If Camelot has helped you, please consider supporting its development with a one-time or monthly donation `on OpenCollective`_!
 .. _on OpenCollective: https://opencollective.com/camelot
 The User Guide
 --------------
-This part of the documentation begins with some background information about why Camelot was created, takes a small dip into the implementation details and then focuses on step-by-step instructions for getting the most out of Camelot.
+This part of the documentation begins with some background information about why Camelot was created, takes you through some implementation details, and then focuses on step-by-step instructions for getting the most out of Camelot.
 .. toctree::
   :maxdepth: 2
@ -115,8 +115,7 @@ This part of the documentation begins with some background information about why
 The API Documentation/Guide
 ---------------------------
-If you are looking for information on a specific function, class, or method,
+If you are looking for information on a specific function, class, or method, this part of the documentation is for you.
 this part of the documentation is for you.
 .. toctree::
   :maxdepth: 2
@ -126,8 +125,7 @@ this part of the documentation is for you.
 The Contributor Guide
 ---------------------
-If you want to contribute to the project, this part of the documentation is for
+If you want to contribute to the project, this part of the documentation is for you.
 you.
 .. toctree::
   :maxdepth: 2
--- a/docs/user/install-deps.rst
+++ b/docs/user/install-deps.rst
@ -3,72 +3,59 @@
 Installation of dependencies
 ============================
-The dependencies `Tkinter`_ and `ghostscript`_ can be installed using your system's package manager. You can run one of the following, based on your OS.
+The dependencies `Ghostscript <https://www.ghostscript.com>`_ and `Tkinter <https://wiki.python.org/moin/TkInter>`_ can be installed using your system's package manager or by running their installer.
 .. _Tkinter: https://wiki.python.org/moin/TkInter
 .. _ghostscript: https://www.ghostscript.com
 OS-specific instructions
 ------------------------
-For Ubuntu
+Ubuntu
-^^^^^^^^^^
+^^^^^^
 ::
-    $ apt install python-tk ghostscript
+    $ apt install ghostscript python3-tk
-Or for Python 3::
+MacOS
-
+^^^^^
    $ apt install python3-tk ghostscript
 For macOS
 ^^^^^^^^^
 ::
-    $ brew install tcl-tk ghostscript
+    $ brew install ghostscript tcl-tk
-For Windows
+Windows
-^^^^^^^^^^^
+^^^^^^^
-For Tkinter, you can download the `ActiveTcl Community Edition`_ from ActiveState. For ghostscript, you can get the installer at the `ghostscript downloads page`_.
+For Ghostscript, you can get the installer at their `downloads page <https://www.ghostscript.com/download/gsdnld.html>`_. And for Tkinter, you can download the `ActiveTcl Community Edition <https://www.activestate.com/activetcl/downloads>`_ from ActiveState.
-.. _ActiveTcl Community Edition: https://www.activestate.com/activetcl/downloads
+Checks to see if dependencies are installed correctly
-.. _ghostscript downloads page: https://www.ghostscript.com/download/gsdnld.html
+-----------------------------------------------------
 .. _as shown here: https://java.com/en/download/help/path.xml
-Checks to see if dependencies were installed correctly
+You can run the following checks to see if the dependencies were installed correctly.
 ------------------------------------------------------
-You can do the following checks to see if the dependencies were installed correctly.
+For Ghostscript
 ^^^^^^^^^^^^^^^
 Open the Python REPL and run the following:
 For Ubuntu/MacOS::
    >>> from ctypes.util import find_library
    >>> find_library("gs")
    "libgs.so.9"
 For Windows::
    >>> from ctypes.util import find_library
    >>> find_library("".join(("gsdll", str(ctypes.sizeof(ctypes.c_voidp) * 8), ".dll"))
    <name-of-ghostscript-library-on-windows>
 **Check:** The output of the ``find_library`` function should not be empty.
 If the output is empty, then it's possible that the Ghostscript library is not available one of the ``LD_LIBRARY_PATH``/``DYLD_LIBRARY_PATH``/``PATH`` variables depending on your operating system. In this case, you may have to modify one of those path variables.
 For Tkinter
 ^^^^^^^^^^^
-Launch Python, and then at the prompt, type::
+Launch Python and then import Tkinter::
    >>> import Tkinter
 Or in Python 3::
    >>> import tkinter
-If you have Tkinter, Python will not print an error message, and if not, you will see an ``ImportError``.
+**Check:** Importing ``tkinter`` should not raise an import error.
 For ghostscript
 ^^^^^^^^^^^^^^^
 Run the following to check the ghostscript version.
 For Ubuntu/macOS::
    $ gs -version
 For Windows::
    C:\> gswin64c.exe -version
 Or for Windows 32-bit::
    C:\> gswin32c.exe -version
 If you have ghostscript, you should see the ghostscript version and copyright information.
--- a/docs/user/install.rst
+++ b/docs/user/install.rst
@ -5,42 +5,35 @@ Installation of Camelot
 This part of the documentation covers the steps to install Camelot.
-Using conda
+After :ref:`installing the dependencies <install_deps>`, which include `Ghostscript <https://www.ghostscript.com>`_ and `Tkinter <https://wiki.python.org/moin/TkInter>`_, you can use one of the following methods to install Camelot:
 -----------
-The easiest way to install Camelot is to install it with `conda`_, which is a package manager and environment management system for the `Anaconda`_ distribution.
+.. warning:: The ``lattice`` flavor will fail to run if Ghostscript is not installed. You may run into errors as shown in `issue #193 <https://github.com/camelot-dev/camelot/issues/193>`_.
 ::
-    $ conda install -c conda-forge camelot-py
+pip
 ---
-.. note:: Camelot is available for Python 2.7, 3.5, 3.6 and 3.7 on Linux, macOS and Windows. For Windows, you will need to install ghostscript which you can get from their `downloads page`_.
+To install Camelot from PyPI using ``pip``, please include the extra ``cv`` requirement as shown::
 .. _conda: https://conda.io/docs/
 .. _Anaconda: http://docs.continuum.io/anaconda/
 .. _downloads page: https://www.ghostscript.com/download/gsdnld.html
 .. _conda-forge: https://conda-forge.org/
 Using pip
 ---------
 After :ref:`installing the dependencies <install_deps>`, which include `Tkinter`_ and `ghostscript`_, you can simply use pip to install Camelot::
    $ pip install "camelot-py[cv]"
-.. _Tkinter: https://wiki.python.org/moin/TkInter
+conda
-.. _ghostscript: https://www.ghostscript.com
+-----
 `conda`_ is a package manager and environment management system for the `Anaconda <https://anaconda.org>`_ distribution. It can be used to install Camelot from the ``conda-forge`` channel::
    $ conda install -c conda-forge camelot-py
 From the source code
 --------------------
-After :ref:`installing the dependencies <install_deps>`, you can install from the source by:
+After :ref:`installing the dependencies <install_deps>`, you can install Camelot from source by:
 1. Cloning the GitHub repository.
 ::
    $ git clone https://www.github.com/camelot-dev/camelot
-2. Then simply using pip again.
+2. And then simply using pip again.
 ::
    $ cd camelot
--- a/tests/data.py
+++ b/tests/data.py
@ -2798,3 +2798,51 @@ data_stream_layout_kwargs = [
    ["A.O.P Cornas", ""],
    ["Domaine Lionnet « Terre Brûlée » 2012", "15 €"],
 ]
 data_stream_duplicated_text = [
    ['', '2012 BETTER VARIETIES Harvest Report for Minnesota Central  [ MNCE ]', '', '', '', '', '', '', '', '',
     'ALL SEASON TEST'],
    ['', 'Doug Toreen, Renville County, MN 55310          [ BIRD ISLAND ]', '', '', '', '', '', '', '', '',
     '1.3 - 2.0 MAT. GROUP'],
    ['PREV. CROP/HERB:', 'Corn / Surpass, Roundup', '', '', '', '', '', '', '', '', 'S2MNCE01'],
    ['SOIL DESCRIPTION:', '', 'Canisteo clay loam, mod. well drained, non-irrigated', '', '', '', '', '', '', '', ''],
    ['SOIL CONDITIONS:', '', 'High P, high K, 6.7 pH, 3.9% OM, Low SCN', '', '', '', '', '', '', '', '30" ROW SPACING'],
    ['TILLAGE/CULTIVATION:', 'conventional w/ fall till', '', '', '', '', '', '', '', '', ''],
    ['PEST MANAGEMENT:', 'Roundup twice', '', '', '', '', '', '', '', '', ''],
    ['SEEDED - RATE:', 'May 15', '140 000 /A', '', '', '', '', '', '', 'TOP 30 for YIELD of 63 TESTED', ''],
    ['HARVESTED - STAND:', 'Oct 3', '122 921 /A', '', '', '', '', '', '', 'AVERAGE of (3) REPLICATIONS', ''],
    ['', '', '', '', 'SCN', 'Seed', 'Yield', 'Moisture', 'Lodging', 'Stand', 'Gross'],
    ['Company/Brand', 'Product/Brand†', 'Technol.†', 'Mat.', 'Resist.', 'Trmt.†', 'Bu/A', '%', '%', '(x 1000)',
     'Income'], ['Kruger', 'K2 1901', 'RR2Y', '1.9', 'R', 'Ac,PV', '56.4', '7.6', '0', '126.3', '$846'],
    ['Stine', '19RA02 §', 'RR2Y', '1.9', 'R', 'CMB', '55.3', '7.6', '0', '120.0', '$830'],
    ['Wensman', 'W 3190NR2', 'RR2Y', '1.9', 'R', 'Ac', '54.5', '7.6', '0', '119.5', '$818'],
    ['Hefty', 'H17Y12', 'RR2Y', '1.7', 'MR', 'I', '53.7', '7.7', '0', '124.4', '$806'],
    ['Dyna-Gro', 'S15RY53', 'RR2Y', '1.5', 'R', 'Ac', '53.6', '7.7', '0', '126.8', '$804'],
    ['LG Seeds', 'C2050R2', 'RR2Y', '2.1', 'R', 'Ac', '53.6', '7.7', '0', '123.9', '$804'],
    ['Titan Pro', '19M42', 'RR2Y', '1.9', 'R', 'CMB', '53.6', '7.7', '0', '121.0', '$804'],
    ['Stine', '19RA02 (2) §', 'RR2Y', '1.9', 'R', 'CMB', '53.4', '7.7', '0', '123.9', '$801'],
    ['Asgrow', 'AG1832 §', 'RR2Y', '1.8', 'MR', 'Ac,PV', '52.9', '7.7', '0', '122.0', '$794'],
    ['Prairie Brand', 'PB-1566R2', 'RR2Y', '1.5', 'R', 'CMB', '52.8', '7.7', '0', '122.9', '$792'],
    ['Channel', '1901R2', 'RR2Y', '1.9', 'R', 'Ac,PV', '52.8', '7.6', '0', '123.4', '$791'],
    ['Titan Pro', '20M1', 'RR2Y', '2.0', 'R', 'Am', '52.5', '7.5', '0', '124.4', '$788'],
    ['Kruger', 'K2-2002', 'RR2Y', '2.0', 'R', 'Ac,PV', '52.4', '7.9', '0', '125.4', '$786'],
    ['Channel', '1700R2', 'RR2Y', '1.7', 'R', 'Ac,PV', '52.3', '7.9', '0', '123.9', '$784'],
    ['Hefty', 'H16Y11', 'RR2Y', '1.6', 'MR', 'I', '51.4', '7.6', '0', '123.9', '$771'],
    ['Anderson', '162R2Y', 'RR2Y', '1.6', 'R', 'None', '51.3', '7.5', '0', '119.5', '$770'],
    ['Titan Pro', '15M22', 'RR2Y', '1.5', 'R', 'CMB', '51.3', '7.8', '0', '125.4', '$769'],
    ['Dairyland', 'DSR-1710R2Y', 'RR2Y', '1.7', 'R', 'CMB', '51.3', '7.7', '0', '122.0', '$769'],
    ['Hefty', 'H20R3', 'RR2Y', '2.0', 'MR', 'I', '50.5', '8.2', '0', '121.0', '$757'],
    ['Prairie Brand', 'PB 1743R2', 'RR2Y', '1.7', 'R', 'CMB', '50.2', '7.7', '0', '125.8', '$752'],
    ['Gold Country', '1741', 'RR2Y', '1.7', 'R', 'Ac', '50.1', '7.8', '0', '123.9', '$751'],
    ['Trelay', '20RR43', 'RR2Y', '2.0', 'R', 'Ac,Ex', '49.9', '7.6', '0', '127.8', '$749'],
    ['Hefty', 'H14R3', 'RR2Y', '1.4', 'MR', 'I', '49.7', '7.7', '0', '122.9', '$746'],
    ['Prairie Brand', 'PB-2099NRR2', 'RR2Y', '2.0', 'R', 'CMB', '49.6', '7.8', '0', '126.3', '$743'],
    ['Wensman', 'W 3174NR2', 'RR2Y', '1.7', 'R', 'Ac', '49.3', '7.6', '0', '122.5', '$740'],
    ['Kruger', 'K2 1602', 'RR2Y', '1.6', 'R', 'Ac,PV', '48.7', '7.6', '0', '125.4', '$731'],
    ['NK Brand', 'S18-C2 §', 'RR2Y', '1.8', 'R', 'CMB', '48.7', '7.7', '0', '126.8', '$731'],
    ['Kruger', 'K2 1902', 'RR2Y', '1.9', 'R', 'Ac,PV', '48.7', '7.5', '0', '124.4', '$730'],
    ['Prairie Brand', 'PB-1823R2', 'RR2Y', '1.8', 'R', 'None', '48.5', '7.6', '0', '121.0', '$727'],
    ['Gold Country', '1541', 'RR2Y', '1.5', 'R', 'Ac', '48.4', '7.6', '0', '110.4', '$726'],
    ['', '', '', '', '', 'Test Average  =', '47.6', '7.7', '0', '122.9', '$713'],
    ['', '', '', '', '', 'LSD (0.10)  =', '5.7', '0.3', 'ns', '37.8', '566.4']
 ]
--- a/tests/files/birdisland.pdf
+++ b/tests/files/birdisland.pdf
--- a/tests/files/blank.pdf
+++ b/tests/files/blank.pdf
--- a/tests/files/empty.pdf
+++ b/tests/files/empty.pdf
--- a/tests/files/only_page_number.pdf
+++ b/tests/files/only_page_number.pdf
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@ -160,8 +160,8 @@ def test_cli_output_format():
 def test_cli_quiet():
    with TemporaryDirectory() as tempdir:
-        infile = os.path.join(testdir, "blank.pdf")
+        infile = os.path.join(testdir, "empty.pdf")
-        outfile = os.path.join(tempdir, "blank.csv")
+        outfile = os.path.join(tempdir, "empty.csv")
        runner = CliRunner()
        result = runner.invoke(
--- a/tests/test_common.py
+++ b/tests/test_common.py
@ -314,3 +314,11 @@ def test_version_generation_with_prerelease_revision():
        generate_version(version, prerelease=prerelease, revision=revision)
        == "0.7.3-alpha.2"
    )
 def test_stream_duplicated_text():
    df = pd.DataFrame(data_stream_duplicated_text)
    filename = os.path.join(testdir, "birdisland.pdf")
    tables = camelot.read_pdf(filename, flavor="stream")
    assert_frame_equal(df, tables[0].df)
--- a/tests/test_errors.py
+++ b/tests/test_errors.py
@ -55,15 +55,33 @@ def test_image_warning():
            )
-def test_no_tables_found():
+def test_lattice_no_tables_on_page():
-    filename = os.path.join(testdir, "blank.pdf")
+    filename = os.path.join(testdir, "empty.pdf")
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        with pytest.raises(UserWarning) as e:
-            tables = camelot.read_pdf(filename)
+            tables = camelot.read_pdf(filename, flavor="lattice")
        assert str(e.value) == "No tables found on page-1"
 def test_stream_no_tables_on_page():
    filename = os.path.join(testdir, "empty.pdf")
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        with pytest.raises(UserWarning) as e:
            tables = camelot.read_pdf(filename, flavor="stream")
        assert str(e.value) == "No tables found on page-1"
 def test_stream_no_tables_in_area():
    filename = os.path.join(testdir, "only_page_number.pdf")
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        with pytest.raises(UserWarning) as e:
            tables = camelot.read_pdf(filename, flavor="stream")
        assert str(e.value) == "No tables found in table area 1"
 def test_no_tables_found_logs_suppressed():
    filename = os.path.join(testdir, "foo.pdf")
    with warnings.catch_warnings():
@ -77,7 +95,7 @@ def test_no_tables_found_logs_suppressed():
 def test_no_tables_found_warnings_suppressed():
-    filename = os.path.join(testdir, "blank.pdf")
+    filename = os.path.join(testdir, "empty.pdf")
    with warnings.catch_warnings():
        # the test should fail if any warning is thrown
        warnings.simplefilter("error")