# camelot

Camelot is a Python 2.7 library and command-line tool for getting tables out of PDF files.

## Usage


    >>> import camelot
    >>> tables = camelot.read_pdf("foo.pdf")
    >>> tables
    <TableList n=2>
    >>> tables.to_csv(zip=True) # to_json, to_excel, to_html
    >>> tables[0]
    <Table shape=(3,4)>
    >>> tables[0].parsing_report
    {
        "accuracy": 96,
        "whitespace": 80,
        "time_taken": 0.5,
        "page": 1
    }
    >>> df = tables[0].df
    >>> tables[0].to_csv("foo.csv") # to_json, to_excel, to_html

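
The parsing report shown above can be used to decide which tables are worth exporting. The following is a minimal sketch, not part of camelot itself: `filter_by_accuracy` and the threshold of 90 are hypothetical, and `FakeTable` is only a stand-in so that the snippet runs without a PDF. The only thing it assumes about a table is the `parsing_report` dictionary with an `"accuracy"` key.

```python
def filter_by_accuracy(tables, min_accuracy=90):
    """Return the tables whose parsing report meets the accuracy threshold."""
    kept = []
    for table in tables:
        report = table.parsing_report  # dict shaped like the one printed above
        if report["accuracy"] >= min_accuracy:
            kept.append(table)
    return kept


class FakeTable(object):
    """Stand-in for a parsed table (hypothetical, for illustration only)."""

    def __init__(self, accuracy, page):
        self.parsing_report = {"accuracy": accuracy, "page": page}


# Keep only the tables that were parsed with high accuracy.
sample_tables = [FakeTable(96, 1), FakeTable(42, 2)]
good_tables = filter_by_accuracy(sample_tables)
```

With a real `TableList`, the same function could be called on the result of `camelot.read_pdf` before exporting each remaining table.
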
Camelot comes with a CLI where you can specify page numbers, output format, output directory etc. By default, the output files are placed in the same directory as the PDF.

    Camelot: PDF parsing made simpler!

    usage:
     camelot [options] <method> [<args>...]

    options:
     -h, --help                Show this screen.
     -v, --version             Show version.
     -V, --verbose             Verbose.
     -p, --pages <pageno>      Comma-separated list of page numbers.
                               Example: -p 1,3-6,10  [default: 1]
     -P, --parallel            Parallelize the parsing process.
     -f, --format <format>     Output format. (csv,tsv,html,json,xlsx) [default: csv]
     -l, --log                 Log to file.
     -o, --output <directory>  Output directory.
     -M, --cmargin <cmargin>   Char margin. Chars closer than cmargin are
                               grouped together to form a word. [default: 2.0]
     -L, --lmargin <lmargin>   Line margin. Lines closer than lmargin are
                               grouped together to form a textbox. [default: 0.5]
     -W, --wmargin <wmargin>   Word margin. Insert blank spaces between chars
                               if distance between words is greater than word
                               margin. [default: 0.1]
     -J, --split_text          Split text lines if they span across multiple cells.
     -K, --flag_size           Flag substring if its size differs from the whole string.
                               Useful for super and subscripts.
     -X, --print-stats         List stats on the parsing process.
     -Y, --save-stats          Save stats to a file.
     -Z, --plot <dist>         Plot distributions. (page,all,rc)

    camelot methods:
     lattice  Looks for lines between data.
     stream   Looks for spaces between data.

    See 'camelot <method> -h' for more information on a specific method.

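
The `-p` option takes a comma-separated list in which each entry is either a single page or a range such as `3-6`. The sketch below shows one way such a string could be expanded into page numbers. It is not camelot's own parser: `expand_pages` is a hypothetical helper, and it assumes ranges are inclusive on both ends.

```python
def expand_pages(spec):
    """Expand a page spec such as "1,3-6,10" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = part.split("-", 1)
            start, end = int(start), int(end)
            if start > end:
                raise ValueError("invalid page range: %r" % part)
            # Ranges are assumed to include both endpoints.
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```

For the example in the help text, `expand_pages("1,3-6,10")` gives `[1, 3, 4, 5, 6, 10]`.
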
## Dependencies

Currently, camelot works under Python 2.7.

The required dependencies include [numpy](http://www.numpy.org/), [OpenCV](http://opencv.org/) and [ghostscript](https://www.ghostscript.com/).

## Installation

Make sure you have the latest versions of `pip` and `setuptools`. You can update them with:

    pip install -U pip setuptools

### Installing dependencies

numpy can be installed using `pip`. OpenCV and ghostscript can be installed using your system's default package manager.

#### Linux

* Arch Linux

      sudo pacman -S opencv tk ghostscript

* Ubuntu

      sudo apt-get install python-opencv python-tk ghostscript

#### OS X

    brew install homebrew/science/opencv ghostscript

Finally, `cd` into the project directory and install with:

    make install

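
After installing, it can be useful to confirm that the Python dependencies are importable. The following is a minimal sketch rather than anything shipped with camelot: `missing_modules` is a hypothetical helper, and it assumes OpenCV's Python bindings are imported as `cv2`. Ghostscript is an external program and is not checked here.

```python
def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# numpy and OpenCV's Python bindings (assumed to be importable as cv2).
REQUIRED_MODULES = ["numpy", "cv2"]

if __name__ == "__main__":
    # An empty list means both modules were found.
    print(missing_modules(REQUIRED_MODULES))
```
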
## Development

### Code

You can check out the latest sources with the command:

    git clone https://github.com/socialcopsdev/camelot.git

### Contributing

See [Contributing doc]().

### Testing

    make test

## License

BSD License