diff --git a/README.md b/README.md
index e0462cf..4975fb1 100644
--- a/README.md
+++ b/README.md
@@ -4,8 +4,6 @@ Camelot is a Python library and command-line tool for extracting tables from PDF
 ## Usage
-### API
-
>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
@@ -22,139 +20,51 @@ Camelot is a Python library and command-line tool for extracting tables from PDF
'page': 1
}
>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html
->>> tables[0].df
+>>> tables[0].df # get a pandas dataframe!
-### Command-line interface
-
-
-$ camelot --help
-Usage: camelot [OPTIONS] FILEPATH
-
-Options:
-  -p, --pages TEXT                Comma-separated page numbers to parse.
-                                  Example: 1,3,4 or 1,4-end
-  -o, --output TEXT               Output filepath.
-  -f, --format [csv|json|excel|html]
-                                  Output file format.
-  -z, --zip                       Whether or not to create a ZIP archive.
-  -m, --mesh                      Whether or not to use Lattice method of
-                                  parsing. Stream is used by default.
-  -T, --table_area TEXT           Table areas (x1,y1,x2,y2) to process.
-                                  x1, y1
-                                  -> left-top and x2, y2 -> right-bottom
-  -split, --split_text            Whether or not to split text if it spans
-                                  across multiple cells.
-  -flag, --flag_size              (inactive) Whether or not to flag text which
-                                  has uncommon size. (Useful to detect
-                                  super/subscripts)
-  -M, --margins <FLOAT FLOAT FLOAT>...
-                                  char_margin, line_margin, word_margin for
-                                  PDFMiner.
-  -C, --columns TEXT              x-coordinates of column separators.
-  -r, --row_close_tol INTEGER     Rows will be formed by combining text
-                                  vertically within this tolerance.
-  -c, --col_close_tol INTEGER     Columns will be formed by combining text
-                                  horizontally within this tolerance.
-  -back, --process_background     (with --mesh) Whether or not to process
-                                  lines that are in background.
-  -scale, --line_size_scaling INTEGER
-                                  (with --mesh) Factor by which the page
-                                  dimensions will be divided to get smallest
-                                  length of detected lines.
-  -copy, --copy_text [h|v]        (with --mesh) Specify direction in which
-                                  text will be copied over in a spanning cell.
-  -shift, --shift_text [l|r|t|b]  (with --mesh) Specify direction in which
-                                  text in a spanning cell should flow.
-  -l, --line_close_tol INTEGER    (with --mesh) Tolerance parameter used to
-                                  merge close vertical lines and close
-                                  horizontal lines.
-  -j, --joint_close_tol INTEGER   (with --mesh) Tolerance parameter used to
-                                  decide whether the detected lines and points
-                                  lie close to each other.
-  -block, --threshold_blocksize INTEGER
-                                  (with --mesh) For adaptive thresholding,
-                                  size of a pixel neighborhood that is used to
-                                  calculate a threshold value for the pixel:
-                                  3, 5, 7, and so on.
-  -const, --threshold_constant INTEGER
-                                  (with --mesh) For adaptive thresholding,
-                                  constant subtracted from the mean or
-                                  weighted mean.
-                                  Normally, it is positive but
-                                  may be zero or negative as well.
-  -I, --iterations INTEGER        (with --mesh) Number of times for
-                                  erosion/dilation is applied.
-  -G, --geometry_type [text|table|contour|joint|line]
-                                  Plot geometry found on pdf page for
-                                  debugging.
-                                  text: Plot text objects. (Useful to get
-                                  table_area and columns coordinates)
-                                  table: Plot parsed table.
-                                  contour (with --mesh): Plot detected rectangles.
-                                  joint (with --mesh): Plot detected line intersections.
-                                  line (with --mesh): Plot detected lines.
-  --help                          Show this message and exit.
-
-
-## Dependencies
-
-The dependencies include [tk](https://wiki.tcl.tk/3743) and [ghostscript](https://www.ghostscript.com/).
+There's a [command-line tool]() too!
 ## Installation
-Make sure you have the most updated versions for `pip` and `setuptools`. You can update them by
+After [installing dependencies](), you can simply use pip:
-$ pip install -U pip setuptools
+$ pip install camelot-py
-### Installing dependencies
+## Documentation
-tk and ghostscript can be installed using your system's default package manager.
-
-#### Linux
-
-* Ubuntu
-
-
-$ sudo apt-get install python-tk ghostscript
-
-
-* Arch Linux
-
-
-$ sudo pacman -S tk ghostscript
-
-
-#### OS X
-
-
-$ brew install tcl-tk ghostscript
-
-
-Finally, `cd` into the project directory and install by
-
-
-$ python setup.py install
-
+The documentation is available at [link]().
 ## Development
-### Code
+The [Contributor's Guide]() has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.
+
+### Source code
 You can check the latest sources with the command:
-$ git clone https://github.com/socialcopsdev/camelot.git
+$ git clone https://www.github.com/socialcopsdev/camelot.git
-### Contributing
+### Setting up development environment
-See [Contributing guidelines]().
+You can install the development dependencies with the command:
+
+
+$ pip install camelot-py[dev]
+### Testing
+After installation, you can run tests using:
+
 $ python setup.py test
-
\ No newline at end of file
+
+
+## License
+
+This project is licensed under the [MIT License](LICENSE).
\ No newline at end of file