From 3600025a22d383f61d5e61f6f71206e6b1134eb5 Mon Sep 17 00:00:00 2001
From: Vinayak Mehta
Date: Mon, 24 Sep 2018 16:26:35 +0530
Subject: [PATCH] Add doc fixes

---
 docs/index.rst           |  8 ++++----
 docs/user/advanced.rst   |  4 ++--
 docs/user/cli.rst        | 27 +++++++++++++--------------
 docs/user/intro.rst      |  6 +++---
 docs/user/quickstart.rst | 10 +++++-----
 5 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/docs/index.rst b/docs/index.rst
index 418bbb5..7cc87c5 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -3,8 +3,8 @@
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.

-Camelot: PDF Table Parsing for Humans
-=====================================
+Camelot: PDF Table Extraction for Humans
+========================================

 Release v\ |version|. (:ref:`Installation `)

@@ -63,10 +63,10 @@ Why Camelot?
 - **Export** to multiple formats, including json, excel and html.
 - Simple and Elegant API, written in **Python**!

-See `comparison with other PDF parsing libraries and tools`_.
+See `comparison with other PDF table extraction libraries and tools`_.

 .. _ETL and data analysis workflows: https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873
-.. _comparison with other PDF parsing libraries and tools: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Parsing-libraries-and-tools
+.. _comparison with other PDF table extraction libraries and tools: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools

 The User Guide
 --------------
diff --git a/docs/user/advanced.rst b/docs/user/advanced.rst
index cf30ba4..e73d3e4 100644
--- a/docs/user/advanced.rst
+++ b/docs/user/advanced.rst
@@ -77,7 +77,7 @@ This, as we shall later see, is very helpful with :ref:`Stream `, for no
 table
 ^^^^^

-Let's plot the table (to see if it was detected correctly or not). This geometry type, along with contour, line and joint is useful for debugging and improving the parsing output, in case the table wasn't detected correctly. More on that later.
+Let's plot the table (to see if it was detected correctly or not). This geometry type, along with contour, line and joint is useful for debugging and improving the extraction output, in case the table wasn't detected correctly. More on that later.

 ::

@@ -220,7 +220,7 @@ In this case, the text that `other tools`_ return, will be ``24.912``. This is h

 You can solve this by passing ``flag_size=True``, which will enclose the superscripts and subscripts with ````, based on font size, as shown below.

-.. _other tools: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Parsing-libraries-and-tools
+.. _other tools: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools

 ::

diff --git a/docs/user/cli.rst b/docs/user/cli.rst
index 4b35d21..a90d8f9 100644
--- a/docs/user/cli.rst
+++ b/docs/user/cli.rst
@@ -9,27 +9,26 @@ You can print the help for the interface, by typing ``camelot --help`` in your f

 ::

-  $ camelot --help
   Usage: camelot [OPTIONS] COMMAND [ARGS]...

+    Camelot: PDF Table Extraction for Humans
+
   Options:
     --version                       Show the version and exit.
-    -p, --pages TEXT                Comma-separated page numbers to parse.
-                                    Example: 1,3,4 or 1,4-end
-    -o, --output TEXT               Output filepath.
+    -p, --pages TEXT                Comma-separated page numbers. Example: 1,3,4
+                                    or 1,4-end.
+    -o, --output TEXT               Output file path.
     -f, --format [csv|json|excel|html]
                                     Output file format.
-    -z, --zip                       Whether or not to create a ZIP archive.
-    -split, --split_text            Whether or not to split text if it spans
-                                    across multiple cells.
-    -flag, --flag_size              (inactive) Whether or not to flag text which
-                                    has uncommon size. (Useful to detect
-                                    super/subscripts)
+    -z, --zip                       Create ZIP archive.
+    -split, --split_text            Split text that spans across multiple cells.
+    -flag, --flag_size              Flag text based on font size. Useful to
+                                    detect super/subscripts.
     -M, --margins ...
-                                    char_margin, line_margin, word_margin for
-                                    PDFMiner.
+                                    PDFMiner char_margin, line_margin and
+                                    word_margin.
     --help                          Show this message and exit.

   Commands:
-    lattice  Use lines between text to parse table.
-    stream   Use spaces between text to parse table.
\ No newline at end of file
+    lattice  Use lines between text to parse the table.
+    stream   Use spaces between text to parse the table.
\ No newline at end of file
diff --git a/docs/user/intro.rst b/docs/user/intro.rst
index da73a56..46447fa 100644
--- a/docs/user/intro.rst
+++ b/docs/user/intro.rst
@@ -14,8 +14,8 @@ Sadly, a lot of open data is given out as tables which are trapped inside PDF fi

 .. _PostScript: http://www.planetpdf.com/planetpdf/pdfs/warnock_camelot.pdf

-Why another PDF Table Parsing library?
---------------------------------------
+Why another PDF Table Extraction library?
+-----------------------------------------

 There are both open (`Tabula`_, `pdf-table-extract`_) and closed-source (`smallpdf`_, `PDFTables`_) tools that are widely used, to extract tables from PDF files. They either give a nice output, or fail miserably. There is no in-between. This is not helpful, since everything in the real world, including PDF table extraction, is fuzzy, leading to creation of adhoc table extraction scripts for each different type of PDF that the user wants to parse.

@@ -27,7 +27,7 @@ Here is a `comparison`_ of Camelot's output with outputs from other open-source
 .. _pdf-table-extract: https://github.com/ashima/pdf-table-extract
 .. _PDFTables: https://pdftables.com/
 .. _Smallpdf: https://smallpdf.com
-.. _comparison: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Parsing-libraries-and-tools
+.. _comparison: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools

 What's in a name?
 -----------------
diff --git a/docs/user/quickstart.rst b/docs/user/quickstart.rst
index c28fc08..d713746 100644
--- a/docs/user/quickstart.rst
+++ b/docs/user/quickstart.rst
@@ -5,10 +5,10 @@ Quickstart

 In a hurry to extract tables from PDFs? This document gives a good introduction to help you get started with using Camelot.

-Parse a PDF
------------
+Read the PDF
+------------

-Parsing a PDF to extract tables with Camelot is very simple.
+Reading a PDF to extract tables with Camelot is very simple.

 Begin by importing the Camelot module::

@@ -47,7 +47,7 @@ Let's print the parsing report.
         'page': 1
     }

-Woah! The accuracy is top-notch and whitespace is less, that means the table was parsed correctly (most probably). You can access the table as a pandas DataFrame by using the :class:`table ` object's ``df`` property.
+Woah! The accuracy is top-notch and whitespace is less, that means the table was extracted correctly (most probably). You can access the table as a pandas DataFrame by using the :class:`table ` object's ``df`` property.

 ::

@@ -81,7 +81,7 @@ This will export all tables as CSV files at the path specified. Alternatively, y
 Specify page numbers
 --------------------

-By default, Camelot only parses the first page of the PDF. To specify multiple pages, you can use the ``pages`` keyword argument::
+By default, Camelot only uses the first page of the PDF to extract tables. To specify multiple pages, you can use the ``pages`` keyword argument::

     >>> camelot.read_pdf('your.pdf', pages='1,2,3')