Add doc fixes

2018-09-24 16:26:35 +05:30 · 2018-09-24 16:26:35 +05:30 · 3600025a22
parent 36b1dee5d9
commit 3600025a22
5 changed files with 27 additions and 28 deletions
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -3,8 +3,8 @@
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.
-Camelot: PDF Table Parsing for Humans
+Camelot: PDF Table Extraction for Humans
-=====================================
+========================================
 Release v\ |version|. (:ref:`Installation <install>`)
@@ -63,10 +63,10 @@ Why Camelot?
 - **Export** to multiple formats, including json, excel and html.
 - Simple and Elegant API, written in **Python**!
-See `comparison with other PDF parsing libraries and tools`_.
+See `comparison with other PDF table extraction libraries and tools`_.
 .. _ETL and data analysis workflows: https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873
-.. _comparison with other PDF parsing libraries and tools: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Parsing-libraries-and-tools
+.. _comparison with other PDF table extraction libraries and tools: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools
 The User Guide
 --------------
--- a/docs/user/advanced.rst
+++ b/docs/user/advanced.rst
@@ -77,7 +77,7 @@ This, as we shall later see, is very helpful with :ref:`Stream <stream>`, for no
 table
 ^^^^^
-Let's plot the table (to see if it was detected correctly or not). This geometry type, along with contour, line and joint is useful for debugging and improving the parsing output, in case the table wasn't detected correctly. More on that later.
+Let's plot the table (to see if it was detected correctly or not). This geometry type, along with contour, line and joint is useful for debugging and improving the extraction output, in case the table wasn't detected correctly. More on that later.
 ::
@@ -220,7 +220,7 @@ In this case, the text that `other tools`_ return, will be ``24.912``. This is h
 You can solve this by passing ``flag_size=True``, which will enclose the superscripts and subscripts with ``<s></s>``, based on font size, as shown below.
-.. _other tools: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Parsing-libraries-and-tools
+.. _other tools: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools
 ::
--- a/docs/user/cli.rst
+++ b/docs/user/cli.rst
@@ -9,27 +9,26 @@ You can print the help for the interface, by typing ``camelot --help`` in your f
 ::
  $ camelot --help
  Usage: camelot [OPTIONS] COMMAND [ARGS]...
  Camelot: PDF Table Extraction for Humans
  Options:
    --version                       Show the version and exit.
-    -p, --pages TEXT                Comma-separated page numbers to parse.
+    -p, --pages TEXT                Comma-separated page numbers. Example: 1,3,4
-                                    Example: 1,3,4 or 1,4-end
+                                    or 1,4-end.
    -o, --output TEXT               Output file path.
    -f, --format [csv|json|excel|html]
                                    Output file format.
-    -z, --zip                       Whether or not to create a ZIP archive.
+    -z, --zip                       Create ZIP archive.
-    -split, --split_text            Whether or not to split text if it spans
+    -split, --split_text            Split text that spans across multiple cells.
-                                    across multiple cells.
+    -flag, --flag_size              Flag text based on font size. Useful to
-    -flag, --flag_size              (inactive) Whether or not to flag text which
+                                    detect super/subscripts.
                                    has uncommon size. (Useful to detect
                                    super/subscripts)
    -M, --margins <FLOAT FLOAT FLOAT>...
-                                    char_margin, line_margin, word_margin for
+                                    PDFMiner char_margin, line_margin and
-                                    PDFMiner.
+                                    word_margin.
    --help                          Show this message and exit.
  Commands:
-    lattice  Use lines between text to parse table.
+    lattice  Use lines between text to parse the table.
-    stream   Use spaces between text to parse table.
+    stream   Use spaces between text to parse the table.
--- a/docs/user/intro.rst
+++ b/docs/user/intro.rst
@@ -14,8 +14,8 @@ Sadly, a lot of open data is given out as tables which are trapped inside PDF fi
 .. _PostScript: http://www.planetpdf.com/planetpdf/pdfs/warnock_camelot.pdf
-Why another PDF Table Parsing library?
+Why another PDF Table Extraction library?
--------------------------------------
+-----------------------------------------
 There are both open (`Tabula`_, `pdf-table-extract`_) and closed-source (`smallpdf`_, `PDFTables`_) tools that are widely used, to extract tables from PDF files. They either give a nice output, or fail miserably. There is no in-between. This is not helpful, since everything in the real world, including PDF table extraction, is fuzzy, leading to creation of adhoc table extraction scripts for each different type of PDF that the user wants to parse.
@@ -27,7 +27,7 @@ Here is a `comparison`_ of Camelot's output with outputs from other open-source
 .. _pdf-table-extract: https://github.com/ashima/pdf-table-extract
 .. _PDFTables: https://pdftables.com/
 .. _Smallpdf: https://smallpdf.com
-.. _comparison: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Parsing-libraries-and-tools
+.. _comparison: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools
 What's in a name?
 -----------------
--- a/docs/user/quickstart.rst
+++ b/docs/user/quickstart.rst
@@ -5,10 +5,10 @@ Quickstart
 In a hurry to extract tables from PDFs? This document gives a good introduction to help you get started with using Camelot.
-Parse a PDF
+Read the PDF
-----------
+------------
-Parsing a PDF to extract tables with Camelot is very simple.
+Reading a PDF to extract tables with Camelot is very simple.
 Begin by importing the Camelot module::
@@ -47,7 +47,7 @@ Let's print the parsing report.
        'page': 1
    }
-Woah! The accuracy is top-notch and whitespace is less, that means the table was parsed correctly (most probably). You can access the table as a pandas DataFrame by using the :class:`table <camelot.core.Table>` object's ``df`` property.
+Woah! The accuracy is top-notch and whitespace is less, that means the table was extracted correctly (most probably). You can access the table as a pandas DataFrame by using the :class:`table <camelot.core.Table>` object's ``df`` property.
 ::
@@ -81,7 +81,7 @@ This will export all tables as CSV files at the path specified. Alternatively, y
 Specify page numbers
 --------------------
-By default, Camelot only parses the first page of the PDF. To specify multiple pages, you can use the ``pages`` keyword argument::
+By default, Camelot only uses the first page of the PDF to extract tables. To specify multiple pages, you can use the ``pages`` keyword argument::
    >>> camelot.read_pdf('your.pdf', pages='1,2,3')
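
Note on the ``pages`` syntax touched in this commit (``1,3,4`` or ``1,4-end``): the sketch below shows one way such a comma-separated spec could be expanded into concrete page numbers. It is only an illustration under stated assumptions. ``expand_pages`` and its ``total_pages`` argument are hypothetical names, not part of Camelot's API, and Camelot's own handling may differ.

```python
def expand_pages(spec, total_pages):
    """Expand a page spec such as '1,4-end' into a sorted list of page numbers.

    Hypothetical helper for illustration only; not Camelot's implementation.
    """
    pages = set()
    for part in spec.split(','):
        part = part.strip()
        if not part:
            continue
        if '-' in part:
            # A range like '4-end' or '2-5'; 'end' stands for the last page.
            start, end = part.split('-', 1)
            first = int(start)
            last = total_pages if end.strip() == 'end' else int(end)
            pages.update(range(first, last + 1))
        else:
            pages.add(int(part))
    # Keep only pages that exist in the document.
    return sorted(p for p in pages if 1 <= p <= total_pages)


print(expand_pages('1,3,4', 10))
print(expand_pages('1,4-end', 6))
```

The helper uses a set so that overlapping entries (for example ``1,1-3``) yield each page once, and it silently drops out-of-range pages; raising an error instead would be an equally reasonable choice.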