Merge branch 'master' into format-markdown

pull/222/head
Vinayak Mehta 2021-06-28 00:32:00 +05:30 committed by GitHub
commit acb8f005c2
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
23 changed files with 831 additions and 313 deletions

@ -10,20 +10,25 @@ assignees: ''
<!-- Please read the filing issues section of the contributor's guide first: https://camelot-py.readthedocs.io/en/master/dev/contributing.html --> <!-- Please read the filing issues section of the contributor's guide first: https://camelot-py.readthedocs.io/en/master/dev/contributing.html -->
**Describe the bug** **Describe the bug**
A clear and concise description of what the bug is.
<!-- A clear and concise description of what the bug is. -->
**Steps to reproduce the bug** **Steps to reproduce the bug**
Steps used to install `camelot`:
1. Add step here (you can add more steps too)
Steps to reproduce the behavior: <!-- Steps used to install `camelot`:
1. Add step here (you can add more steps too) 1. Add step here (you can add more steps too) -->
<!-- Steps to be used to reproduce behavior:
1. Add step here (you can add more steps too) -->
**Expected behavior** **Expected behavior**
A clear and concise description of what you expected to happen.
<!-- A clear and concise description of what you expected to happen. -->
**Code** **Code**
Add the Camelot code snippet that you used.
<!-- Add the Camelot code snippet that you used. -->
``` ```
import camelot import camelot
@ -31,18 +36,22 @@ import camelot
``` ```
**PDF** **PDF**
Add the PDF file that you want to extract tables from.
<!-- Add the PDF file that you want to extract tables from. -->
**Screenshots** **Screenshots**
If applicable, add screenshots to help explain your problem.
<!-- If applicable, add screenshots to help explain your problem. -->
**Environment** **Environment**
- OS: [e.g. MacOS]
- Python version: - OS: [e.g. macOS]
- Numpy version: - Python version:
- OpenCV version: - Numpy version:
- Ghostscript version: - OpenCV version:
- Camelot version: - Ghostscript version:
- Camelot version:
**Additional context** **Additional context**
Add any other context about the problem here.
<!-- Add any other context about the problem here. -->

.github/workflows/tests.yml vendored 100644

@ -0,0 +1,23 @@
name: tests
on: [pull_request]
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.6, 3.7, 3.8, 3.9]
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install camelot with dependencies
run: |
make install
- name: Test with pytest
run: |
make test

@ -1,29 +0,0 @@
sudo: true
language: python
cache: pip
addons:
apt:
update: true
install:
- make install
jobs:
include:
- stage: test
script:
- make test
python: '3.6'
- stage: test
script:
- make test
python: '3.7'
dist: xenial
- stage: test
script:
- make test
python: '3.8'
dist: xenial
- stage: coverage
python: '3.8'
script:
- make test
- codecov --verbose

@ -4,6 +4,33 @@ Release History
master master
------ ------
- Add faq section. [#216](https://github.com/camelot-dev/camelot/pull/216) by [Stefano Fiorucci](https://github.com/anakin87).
0.9.0 (2021-06-15)
------------------
**Bugfixes**
- Fix use of resolution argument to generate image with ghostscript. [#231](https://github.com/camelot-dev/camelot/pull/231) by [Tiago Samaha Cordeiro](https://github.com/tiagosamaha).
- [#15](https://github.com/camelot-dev/camelot/issues/15) Fix duplicate strings being assigned to the same cell. [#206](https://github.com/camelot-dev/camelot/pull/206) by [Eduardo Gonzalez Lopez de Murillas](https://github.com/edugonza).
- Save plot when filename is specified. [#121](https://github.com/camelot-dev/camelot/pull/121) by [Jens Diemer](https://github.com/jedie).
- Close file streams explicitly. [#202](https://github.com/camelot-dev/camelot/pull/202) by [Martin Abente Lahaye](https://github.com/tchx84).
- Use correct re.sub signature. [#186](https://github.com/camelot-dev/camelot/pull/186) by [pevisscher](https://github.com/pevisscher).
- [#183](https://github.com/camelot-dev/camelot/issues/183) Fix UnicodeEncodeError when using Stream flavor by adding encoding kwarg to `to_html`. [#188](https://github.com/camelot-dev/camelot/pull/188) by [Stefano Fiorucci](https://github.com/anakin87).
- [#179](https://github.com/camelot-dev/camelot/issues/179) Fix `max() arg is an empty sequence` error on PDFs with blank pages. [#189](https://github.com/camelot-dev/camelot/pull/189) by Vinayak Mehta.
**Improvements**
- Add `line_overlap` and `boxes_flow` to `LAParams`. [#219](https://github.com/camelot-dev/camelot/pull/219) by [Arnie97](https://github.com/Arnie97).
- [Add bug report template.](https://github.com/camelot-dev/camelot/commit/0a3944e54d133b701edfe9c7546ff11289301ba8)
- Move from [Travis to GitHub Actions](https://github.com/camelot-dev/camelot/pull/241).
- Update `.readthedocs.yml` and [remove requirements.txt](https://github.com/camelot-dev/camelot/commit/7ab5db39d07baa4063f975e9e00f6073340e04c1#diff-cde814ef2f549dc093f5b8fc533b7e8f47e7b32a8081e0760e57d5c25a1139d9).
**Documentation**
- [#193](https://github.com/camelot-dev/camelot/issues/193) Add better checks to confirm proper installation of ghostscript. [#196](https://github.com/camelot-dev/camelot/pull/196) by [jimhall](https://github.com/jimhall).
- Update `advanced.rst` plotting examples. [#119](https://github.com/camelot-dev/camelot/pull/119) by [Jens Diemer](https://github.com/jedie).
0.8.2 (2020-07-27) 0.8.2 (2020-07-27)
------------------ ------------------

@ -1,6 +1,6 @@
MIT License MIT License
Copyright (c) 2019-2020 Camelot Developers Copyright (c) 2019-2021 Camelot Developers
Copyright (c) 2018-2019 Peeply Private Ltd (Singapore) Copyright (c) 2018-2019 Peeply Private Ltd (Singapore)
Permission is hereby granted, free of charge, to any person obtaining a copy Permission is hereby granted, free of charge, to any person obtaining a copy

@ -4,11 +4,10 @@
# Camelot: PDF Table Extraction for Humans # Camelot: PDF Table Extraction for Humans
[![Build Status](https://travis-ci.org/camelot-dev/camelot.svg?branch=master)](https://travis-ci.org/camelot-dev/camelot) [![Documentation Status](https://readthedocs.org/projects/camelot-py/badge/?version=master)](https://camelot-py.readthedocs.io/en/master/) ![Build Status](https://github.com/camelot-dev/camelot/actions/workflows/tests.yml/badge.svg) [![Documentation Status](https://readthedocs.org/projects/camelot-py/badge/?version=master)](https://camelot-py.readthedocs.io/en/master/)
[![codecov.io](https://codecov.io/github/camelot-dev/camelot/badge.svg?branch=master&service=github)](https://codecov.io/github/camelot-dev/camelot?branch=master) [![codecov.io](https://codecov.io/github/camelot-dev/camelot/badge.svg?branch=master&service=github)](https://codecov.io/github/camelot-dev/camelot?branch=master)
[![image](https://img.shields.io/pypi/v/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![image](https://img.shields.io/pypi/l/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![image](https://img.shields.io/pypi/pyversions/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![Gitter chat](https://badges.gitter.im/camelot-dev/Lobby.png)](https://gitter.im/camelot-dev/Lobby) [![image](https://img.shields.io/pypi/v/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![image](https://img.shields.io/pypi/l/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![image](https://img.shields.io/pypi/pyversions/camelot-py.svg)](https://pypi.org/project/camelot-py/) [![Gitter chat](https://badges.gitter.im/camelot-dev/Lobby.png)](https://gitter.im/camelot-dev/Lobby)
[![image](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black) [![image](https://img.shields.io/badge/continous%20quality-deepsource-lightgrey)](https://deepsource.io/gh/camelot-dev/camelot/?ref=repository-badge) [![image](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)
**Camelot** is a Python library that can help you extract tables from PDFs! **Camelot** is a Python library that can help you extract tables from PDFs!
@ -50,10 +49,12 @@ Camelot also comes packaged with a [command-line interface](https://camelot-py.r
**Note:** Camelot only works with text-based PDFs and not scanned documents. (As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".) **Note:** Camelot only works with text-based PDFs and not scanned documents. (As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)
You can check out some frequently asked questions [here](https://camelot-py.readthedocs.io/en/master/user/faq.html).
## Why Camelot? ## Why Camelot?
- **Configurability**: Camelot gives you control over the table extraction process with its [tweakable settings](https://camelot-py.readthedocs.io/en/master/user/advanced.html). - **Configurability**: Camelot gives you control over the table extraction process with [tweakable settings](https://camelot-py.readthedocs.io/en/master/user/advanced.html).
- **Metrics**: Bad tables can be discarded based on metrics like accuracy and whitespace, without having to manually look at each table. - **Metrics**: You can discard bad tables based on metrics like accuracy and whitespace, without having to manually look at each table.
- **Output**: Each table is extracted into a **pandas DataFrame**, which seamlessly integrates into [ETL and data analysis workflows](https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873). You can also export tables to multiple formats, which include CSV, JSON, Excel, HTML and Sqlite. - **Output**: Each table is extracted into a **pandas DataFrame**, which seamlessly integrates into [ETL and data analysis workflows](https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873). You can also export tables to multiple formats, which include CSV, JSON, Excel, HTML and Sqlite.
See [comparison with similar libraries and tools](https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools). See [comparison with similar libraries and tools](https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).

@ -1,6 +1,6 @@
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
VERSION = (0, 8, 2) VERSION = (0, 9, 0)
PRERELEASE = None # alpha, beta or rc PRERELEASE = None # alpha, beta or rc
REVISION = None REVISION = None

@ -55,7 +55,9 @@ class TextEdge(object):
x = round(self.x, 2) x = round(self.x, 2)
y0 = round(self.y0, 2) y0 = round(self.y0, 2)
y1 = round(self.y1, 2) y1 = round(self.y1, 2)
return f"<TextEdge x={x} y0={y0} y1={y1} align={self.align} valid={self.is_valid}>" return (
f"<TextEdge x={x} y0={y0} y1={y1} align={self.align} valid={self.is_valid}>"
)
def update_coords(self, x, y0, edge_tol=50): def update_coords(self, x, y0, edge_tol=50):
"""Updates the text edge's x and bottom y coordinates and sets """Updates the text edge's x and bottom y coordinates and sets
@ -102,8 +104,7 @@ class TextEdges(object):
return None return None
def add(self, textline, align): def add(self, textline, align):
"""Adds a new text edge to the current dict. """Adds a new text edge to the current dict."""
"""
x = self.get_x_coord(textline, align) x = self.get_x_coord(textline, align)
y0 = textline.y0 y0 = textline.y0
y1 = textline.y1 y1 = textline.y1
@ -111,8 +112,7 @@ class TextEdges(object):
self._textedges[align].append(te) self._textedges[align].append(te)
def update(self, textline): def update(self, textline):
"""Updates an existing text edge in the current dict. """Updates an existing text edge in the current dict."""
"""
for align in ["left", "right", "middle"]: for align in ["left", "right", "middle"]:
x_coord = self.get_x_coord(textline, align) x_coord = self.get_x_coord(textline, align)
idx = self.find(x_coord, align) idx = self.find(x_coord, align)
@ -304,8 +304,7 @@ class Cell(object):
@property @property
def bound(self): def bound(self):
"""The number of sides on which the cell is bounded. """The number of sides on which the cell is bounded."""
"""
return self.top + self.bottom + self.left + self.right return self.top + self.bottom + self.left + self.right
@ -361,8 +360,7 @@ class Table(object):
@property @property
def data(self): def data(self):
"""Returns two-dimensional list of strings in table. """Returns two-dimensional list of strings in table."""
"""
d = [] d = []
for row in self.cells: for row in self.cells:
d.append([cell.text.strip() for cell in row]) d.append([cell.text.strip() for cell in row])
@ -383,8 +381,7 @@ class Table(object):
return report return report
def set_all_edges(self): def set_all_edges(self):
"""Sets all table edges to True. """Sets all table edges to True."""
"""
for row in self.cells: for row in self.cells:
for cell in row: for cell in row:
cell.left = cell.right = cell.top = cell.bottom = True cell.left = cell.right = cell.top = cell.bottom = True
@ -526,8 +523,7 @@ class Table(object):
return self return self
def set_border(self): def set_border(self):
"""Sets table border edges to True. """Sets table border edges to True."""
"""
for r in range(len(self.rows)): for r in range(len(self.rows)):
self.cells[r][0].left = True self.cells[r][0].left = True
self.cells[r][len(self.cols) - 1].right = True self.cells[r][len(self.cols) - 1].right = True

@ -81,8 +81,7 @@ class __Ghostscript(object):
def Ghostscript(*args, **kwargs): def Ghostscript(*args, **kwargs):
"""Factory function for setting up a Ghostscript instance """Factory function for setting up a Ghostscript instance"""
"""
global __instance__ global __instance__
# Ghostscript only supports a single instance # Ghostscript only supports a single instance
if __instance__ is None: if __instance__ is None:

@ -167,9 +167,7 @@ class PDFHandler(object):
with TemporaryDirectory() as tempdir: with TemporaryDirectory() as tempdir:
for p in self.pages: for p in self.pages:
self._save_page(self.filepath, p, tempdir) self._save_page(self.filepath, p, tempdir)
pages = [ pages = [os.path.join(tempdir, f"page-{p}.pdf") for p in self.pages]
os.path.join(tempdir, f"page-{p}.pdf") for p in self.pages
]
parser = Lattice(**kwargs) if flavor == "lattice" else Stream(**kwargs) parser = Lattice(**kwargs) if flavor == "lattice" else Stream(**kwargs)
for p in pages: for p in pages:
t = parser.extract_tables( t = parser.extract_tables(

@ -6,8 +6,7 @@ from ..utils import get_page_layout, get_text_objects
class BaseParser(object): class BaseParser(object):
"""Defines a base parser. """Defines a base parser."""
"""
def _generate_layout(self, filename, layout_kwargs): def _generate_layout(self, filename, layout_kwargs):
self.filename = filename self.filename = filename

@ -211,8 +211,8 @@ class Lattice(BaseParser):
from ..ext.ghostscript import Ghostscript from ..ext.ghostscript import Ghostscript
self.imagename = "".join([self.rootname, ".png"]) self.imagename = "".join([self.rootname, ".png"])
gs_call = "-q -sDEVICE=png16m -o {} -r300 {}".format( gs_call = "-q -sDEVICE=png16m -o {} -r{} {}".format(
self.imagename, self.filename self.imagename, self.resolution, self.filename
) )
gs_call = gs_call.encode().split() gs_call = gs_call.encode().split()
null = open(os.devnull, "wb") null = open(os.devnull, "wb")
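The Lattice hunk threads `self.resolution` into the Ghostscript command string instead of hardcoding `-r300`. A minimal standalone sketch of just that string-building step follows; `build_gs_args` is a hypothetical helper name, not part of Camelot, and it only reproduces the format, encode and split logic shown in the hunk.

```python
def build_gs_args(imagename, filename, resolution=300):
    """Build the Ghostscript argument list the way the hunk does (hypothetical helper).

    The command is formatted as a string, encoded to bytes and split on
    whitespace, mirroring `gs_call.encode().split()` in the diff.
    """
    gs_call = "-q -sDEVICE=png16m -o {} -r{} {}".format(
        imagename, resolution, filename
    )
    return gs_call.encode().split()


args = build_gs_args("page-1.png", "page-1.pdf", resolution=150)
print(args)
# [b'-q', b'-sDEVICE=png16m', b'-o', b'page-1.png', b'-r150', b'page-1.pdf']
```

Because the formatted string is split on whitespace, a file path containing spaces ends up as several separate arguments; building the argument list element by element instead of splitting a formatted string would sidestep that.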

@ -65,7 +65,7 @@ class Stream(BaseParser):
edge_tol=50, edge_tol=50,
row_tol=2, row_tol=2,
column_tol=0, column_tol=0,
**kwargs **kwargs,
): ):
self.table_regions = table_regions self.table_regions = table_regions
self.table_areas = table_areas self.table_areas = table_areas
@ -362,10 +362,10 @@ class Stream(BaseParser):
if len(elements): if len(elements):
ncols = max(set(elements), key=elements.count) ncols = max(set(elements), key=elements.count)
else: else:
warnings.warn( warnings.warn(f"No tables found in table area {table_idx + 1}")
f"No tables found in table area {table_idx + 1}" cols = [
) (t.x0, t.x1) for r in rows_grouped if len(r) == ncols for t in r
cols = [(t.x0, t.x1) for r in rows_grouped if len(r) == ncols for t in r] ]
cols = self._merge_columns(sorted(cols), column_tol=self.column_tol) cols = self._merge_columns(sorted(cols), column_tol=self.column_tol)
inner_text = [] inner_text = []
for i in range(1, len(cols)): for i in range(1, len(cols)):
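In the Stream hunk, the expected column count is the most common number of text objects per grouped row (`max(set(elements), key=elements.count)`), and only rows with exactly that many objects contribute `(x0, x1)` spans. The self-contained sketch below reproduces that step; `Text` is a hypothetical stand-in for pdfminer text objects (only `x0` and `x1` are used), and `infer_column_spans` is an illustrative name, not Camelot's API.

```python
import warnings
from collections import namedtuple

# Hypothetical stand-in for a pdfminer text object; only x0/x1 are used here.
Text = namedtuple("Text", ["x0", "x1"])


def infer_column_spans(rows_grouped, table_idx=0):
    """Return (ncols, sorted spans) following the Stream hunk's logic (sketch)."""
    elements = [len(r) for r in rows_grouped]
    if elements:
        # Most common row length; ties resolve to whichever value max() sees first.
        ncols = max(set(elements), key=elements.count)
    else:
        ncols = None
        warnings.warn(f"No tables found in table area {table_idx + 1}")
    # Only rows with exactly ncols text objects contribute (x0, x1) spans.
    cols = [(t.x0, t.x1) for r in rows_grouped if len(r) == ncols for t in r]
    return ncols, sorted(cols)


rows = [
    [Text(10, 40), Text(60, 90), Text(110, 140)],
    [Text(12, 38), Text(61, 88), Text(111, 139)],
    [Text(10, 140)],  # e.g. a title row spanning the whole table
]
ncols, spans = infer_column_spans(rows)
print(ncols)  # 3
print(len(spans))  # 6
```

The single-object title row is ignored because its length differs from the mode. Camelot then merges overlapping spans with `_merge_columns`, which this sketch leaves out.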

@ -34,13 +34,9 @@ class PlotMethods(object):
raise ImportError("matplotlib is required for plotting.") raise ImportError("matplotlib is required for plotting.")
if table.flavor == "lattice" and kind in ["textedge"]: if table.flavor == "lattice" and kind in ["textedge"]:
raise NotImplementedError( raise NotImplementedError(f"Lattice flavor does not support kind='{kind}'")
f"Lattice flavor does not support kind='{kind}'"
)
elif table.flavor == "stream" and kind in ["joint", "line"]: elif table.flavor == "stream" and kind in ["joint", "line"]:
raise NotImplementedError( raise NotImplementedError(f"Stream flavor does not support kind='{kind}'")
f"Stream flavor does not support kind='{kind}'"
)
plot_method = getattr(self, kind) plot_method = getattr(self, kind)
fig = plot_method(table) fig = plot_method(table)
@ -48,7 +44,7 @@ class PlotMethods(object):
if filename is not None: if filename is not None:
fig.savefig(filename) fig.savefig(filename)
return None return None
return fig return fig
def text(self, table): def text(self, table):
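The plotting hunk hardcodes which `kind` values each flavor rejects and, when `filename` is given, saves the figure and returns `None`. The same checks could be expressed as a lookup table; the sketch below is an illustrative refactoring, not Camelot's code. `UNSUPPORTED_KINDS`, `check_plot_kind` and `finish_plot` are hypothetical names, and the error messages are kept identical to the diff.

```python
# Kinds each flavor cannot plot, per the checks in the hunk above.
UNSUPPORTED_KINDS = {"lattice": {"textedge"}, "stream": {"joint", "line"}}


def check_plot_kind(flavor, kind):
    """Raise NotImplementedError for unsupported flavor/kind pairs (sketch)."""
    if kind in UNSUPPORTED_KINDS.get(flavor, set()):
        raise NotImplementedError(
            f"{flavor.capitalize()} flavor does not support kind='{kind}'"
        )


def finish_plot(fig, filename=None):
    """Save and return None when a filename is given, else return the figure."""
    if filename is not None:
        fig.savefig(filename)
        return None
    return fig
```

Keeping the unsupported pairs in one dict means a new flavor or plot kind only needs a table entry rather than another `elif` branch.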

@ -838,23 +838,27 @@ def compute_whitespace(d):
def get_page_layout( def get_page_layout(
filename, filename,
line_overlap=0.5,
char_margin=1.0, char_margin=1.0,
line_margin=0.5, line_margin=0.5,
word_margin=0.1, word_margin=0.1,
boxes_flow=0.5,
detect_vertical=True, detect_vertical=True,
all_texts=True, all_texts=True,
): ):
"""Returns a PDFMiner LTPage object and page dimension of a single """Returns a PDFMiner LTPage object and page dimension of a single
page pdf. See https://euske.github.io/pdfminer/ to get definitions page pdf. To get the definitions of kwargs, see
of kwargs. https://pdfminersix.rtfd.io/en/latest/reference/composable.html.
Parameters Parameters
---------- ----------
filename : string filename : string
Path to pdf file. Path to pdf file.
line_overlap : float
char_margin : float char_margin : float
line_margin : float line_margin : float
word_margin : float word_margin : float
boxes_flow : float
detect_vertical : bool detect_vertical : bool
all_texts : bool all_texts : bool
@ -870,11 +874,15 @@ def get_page_layout(
parser = PDFParser(f) parser = PDFParser(f)
document = PDFDocument(parser) document = PDFDocument(parser)
if not document.is_extractable: if not document.is_extractable:
raise PDFTextExtractionNotAllowed(f"Text extraction is not allowed: {filename}") raise PDFTextExtractionNotAllowed(
f"Text extraction is not allowed: {filename}"
)
laparams = LAParams( laparams = LAParams(
line_overlap=line_overlap,
char_margin=char_margin, char_margin=char_margin,
line_margin=line_margin, line_margin=line_margin,
word_margin=word_margin, word_margin=word_margin,
boxes_flow=boxes_flow,
detect_vertical=detect_vertical, detect_vertical=detect_vertical,
all_texts=all_texts, all_texts=all_texts,
) )
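With `line_overlap` and `boxes_flow` added, `get_page_layout` exposes seven layout keywords, and `BaseParser._generate_layout` receives a `layout_kwargs` dict for them. Assuming that dict is expanded as keyword arguments into `get_page_layout`, any key a caller leaves out falls back to the signature defaults. The sketch below mimics that merge in plain Python; `resolve_layout_kwargs` is a hypothetical helper, and the defaults are copied from the hunk's signature.

```python
def resolve_layout_kwargs(layout_kwargs=None):
    """Merge user overrides onto the get_page_layout defaults (sketch).

    The defaults are copied from the get_page_layout signature in the hunk;
    the merged dict is what would end up in pdfminer's LAParams.
    """
    defaults = {
        "line_overlap": 0.5,
        "char_margin": 1.0,
        "line_margin": 0.5,
        "word_margin": 0.1,
        "boxes_flow": 0.5,
        "detect_vertical": True,
        "all_texts": True,
    }
    overrides = layout_kwargs or {}
    unknown = set(overrides) - set(defaults)
    if unknown:
        # get_page_layout has no **kwargs, so an unknown key would raise TypeError there too.
        raise TypeError(f"unexpected layout kwargs: {sorted(unknown)}")
    return {**defaults, **overrides}


print(resolve_layout_kwargs({"char_margin": 2.0})["char_margin"])  # 2.0
```

In pdfminer.six, passing `boxes_flow=None` disables advanced layout analysis, so a merged dict may legitimately carry `None` for that key.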

@ -1,7 +1,19 @@
# flasky pygments style based on tango style # flasky pygments style based on tango style
from pygments.style import Style from pygments.style import Style
from pygments.token import Keyword, Name, Comment, String, Error, \ from pygments.token import (
Number, Operator, Generic, Whitespace, Punctuation, Other, Literal Keyword,
Name,
Comment,
String,
Error,
Number,
Operator,
Generic,
Whitespace,
Punctuation,
Other,
Literal,
)
class FlaskyStyle(Style): class FlaskyStyle(Style):
@ -11,76 +23,67 @@ class FlaskyStyle(Style):
styles = { styles = {
# No corresponding class for the following: # No corresponding class for the following:
# Text: "", # class: '' # Text: "", # class: ''
Whitespace: "underline #f8f8f8", # class: 'w' Whitespace: "underline #f8f8f8", # class: 'w'
Error: "#a40000 border:#ef2929", # class: 'err' Error: "#a40000 border:#ef2929", # class: 'err'
Other: "#000000", # class 'x' Other: "#000000", # class 'x'
Comment: "italic #8f5902", # class: 'c'
Comment: "italic #8f5902", # class: 'c' Comment.Preproc: "noitalic", # class: 'cp'
Comment.Preproc: "noitalic", # class: 'cp' Keyword: "bold #004461", # class: 'k'
Keyword.Constant: "bold #004461", # class: 'kc'
Keyword: "bold #004461", # class: 'k' Keyword.Declaration: "bold #004461", # class: 'kd'
Keyword.Constant: "bold #004461", # class: 'kc' Keyword.Namespace: "bold #004461", # class: 'kn'
Keyword.Declaration: "bold #004461", # class: 'kd' Keyword.Pseudo: "bold #004461", # class: 'kp'
Keyword.Namespace: "bold #004461", # class: 'kn' Keyword.Reserved: "bold #004461", # class: 'kr'
Keyword.Pseudo: "bold #004461", # class: 'kp' Keyword.Type: "bold #004461", # class: 'kt'
Keyword.Reserved: "bold #004461", # class: 'kr' Operator: "#582800", # class: 'o'
Keyword.Type: "bold #004461", # class: 'kt' Operator.Word: "bold #004461", # class: 'ow' - like keywords
Punctuation: "bold #000000", # class: 'p'
Operator: "#582800", # class: 'o'
Operator.Word: "bold #004461", # class: 'ow' - like keywords
Punctuation: "bold #000000", # class: 'p'
# because special names such as Name.Class, Name.Function, etc. # because special names such as Name.Class, Name.Function, etc.
# are not recognized as such later in the parsing, we choose them # are not recognized as such later in the parsing, we choose them
# to look the same as ordinary variables. # to look the same as ordinary variables.
Name: "#000000", # class: 'n' Name: "#000000", # class: 'n'
Name.Attribute: "#c4a000", # class: 'na' - to be revised Name.Attribute: "#c4a000", # class: 'na' - to be revised
Name.Builtin: "#004461", # class: 'nb' Name.Builtin: "#004461", # class: 'nb'
Name.Builtin.Pseudo: "#3465a4", # class: 'bp' Name.Builtin.Pseudo: "#3465a4", # class: 'bp'
Name.Class: "#000000", # class: 'nc' - to be revised Name.Class: "#000000", # class: 'nc' - to be revised
Name.Constant: "#000000", # class: 'no' - to be revised Name.Constant: "#000000", # class: 'no' - to be revised
Name.Decorator: "#888", # class: 'nd' - to be revised Name.Decorator: "#888", # class: 'nd' - to be revised
Name.Entity: "#ce5c00", # class: 'ni' Name.Entity: "#ce5c00", # class: 'ni'
Name.Exception: "bold #cc0000", # class: 'ne' Name.Exception: "bold #cc0000", # class: 'ne'
Name.Function: "#000000", # class: 'nf' Name.Function: "#000000", # class: 'nf'
Name.Property: "#000000", # class: 'py' Name.Property: "#000000", # class: 'py'
Name.Label: "#f57900", # class: 'nl' Name.Label: "#f57900", # class: 'nl'
Name.Namespace: "#000000", # class: 'nn' - to be revised Name.Namespace: "#000000", # class: 'nn' - to be revised
Name.Other: "#000000", # class: 'nx' Name.Other: "#000000", # class: 'nx'
Name.Tag: "bold #004461", # class: 'nt' - like a keyword Name.Tag: "bold #004461", # class: 'nt' - like a keyword
Name.Variable: "#000000", # class: 'nv' - to be revised Name.Variable: "#000000", # class: 'nv' - to be revised
Name.Variable.Class: "#000000", # class: 'vc' - to be revised Name.Variable.Class: "#000000", # class: 'vc' - to be revised
Name.Variable.Global: "#000000", # class: 'vg' - to be revised Name.Variable.Global: "#000000", # class: 'vg' - to be revised
Name.Variable.Instance: "#000000", # class: 'vi' - to be revised Name.Variable.Instance: "#000000", # class: 'vi' - to be revised
Number: "#990000", # class: 'm'
Number: "#990000", # class: 'm' Literal: "#000000", # class: 'l'
Literal.Date: "#000000", # class: 'ld'
Literal: "#000000", # class: 'l' String: "#4e9a06", # class: 's'
Literal.Date: "#000000", # class: 'ld' String.Backtick: "#4e9a06", # class: 'sb'
String.Char: "#4e9a06", # class: 'sc'
String: "#4e9a06", # class: 's' String.Doc: "italic #8f5902", # class: 'sd' - like a comment
String.Backtick: "#4e9a06", # class: 'sb' String.Double: "#4e9a06", # class: 's2'
String.Char: "#4e9a06", # class: 'sc' String.Escape: "#4e9a06", # class: 'se'
String.Doc: "italic #8f5902", # class: 'sd' - like a comment String.Heredoc: "#4e9a06", # class: 'sh'
String.Double: "#4e9a06", # class: 's2' String.Interpol: "#4e9a06", # class: 'si'
String.Escape: "#4e9a06", # class: 'se' String.Other: "#4e9a06", # class: 'sx'
String.Heredoc: "#4e9a06", # class: 'sh' String.Regex: "#4e9a06", # class: 'sr'
String.Interpol: "#4e9a06", # class: 'si' String.Single: "#4e9a06", # class: 's1'
String.Other: "#4e9a06", # class: 'sx' String.Symbol: "#4e9a06", # class: 'ss'
String.Regex: "#4e9a06", # class: 'sr' Generic: "#000000", # class: 'g'
String.Single: "#4e9a06", # class: 's1' Generic.Deleted: "#a40000", # class: 'gd'
String.Symbol: "#4e9a06", # class: 'ss' Generic.Emph: "italic #000000", # class: 'ge'
Generic.Error: "#ef2929", # class: 'gr'
Generic: "#000000", # class: 'g' Generic.Heading: "bold #000080", # class: 'gh'
Generic.Deleted: "#a40000", # class: 'gd' Generic.Inserted: "#00A000", # class: 'gi'
Generic.Emph: "italic #000000", # class: 'ge' Generic.Output: "#888", # class: 'go'
Generic.Error: "#ef2929", # class: 'gr' Generic.Prompt: "#745334", # class: 'gp'
Generic.Heading: "bold #000080", # class: 'gh' Generic.Strong: "bold #000000", # class: 'gs'
Generic.Inserted: "#00A000", # class: 'gi' Generic.Subheading: "bold #800080", # class: 'gu'
Generic.Output: "#888", # class: 'go' Generic.Traceback: "bold #a40000", # class: 'gt'
Generic.Prompt: "#745334", # class: 'gp'
Generic.Strong: "bold #000000", # class: 'gs'
Generic.Subheading: "bold #800080", # class: 'gu'
Generic.Traceback: "bold #a40000", # class: 'gt'
} }

@ -22,8 +22,8 @@ import sys
# sys.path.insert(0, os.path.abspath('..')) # sys.path.insert(0, os.path.abspath('..'))
# Insert Camelot's path into the system. # Insert Camelot's path into the system.
sys.path.insert(0, os.path.abspath('..')) sys.path.insert(0, os.path.abspath(".."))
sys.path.insert(0, os.path.abspath('_themes')) sys.path.insert(0, os.path.abspath("_themes"))
import camelot import camelot
@ -38,33 +38,33 @@ import camelot
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones. # ones.
extensions = [ extensions = [
'sphinx.ext.autodoc', "sphinx.ext.autodoc",
'sphinx.ext.napoleon', "sphinx.ext.napoleon",
'sphinx.ext.intersphinx', "sphinx.ext.intersphinx",
'sphinx.ext.todo', "sphinx.ext.todo",
'sphinx.ext.viewcode', "sphinx.ext.viewcode",
] ]
# Add any paths that contain templates here, relative to this directory. # Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates'] templates_path = ["_templates"]
# The suffix(es) of source filenames. # The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string: # You can specify multiple suffix as a list of string:
# #
# source_suffix = ['.rst', '.md'] # source_suffix = ['.rst', '.md']
source_suffix = '.rst' source_suffix = ".rst"
# The encoding of source files. # The encoding of source files.
# #
# source_encoding = 'utf-8-sig' # source_encoding = 'utf-8-sig'
# The master toctree document. # The master toctree document.
master_doc = 'index' master_doc = "index"
# General information about the project. # General information about the project.
project = u'Camelot' project = u"Camelot"
copyright = u'2020, Camelot Developers' copyright = u"2021, Camelot Developers"
author = u'Vinayak Mehta' author = u"Vinayak Mehta"
# The version info for the project you're documenting, acts as replacement for # The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the # |version| and |release|, also used in various other places throughout the
@ -94,7 +94,7 @@ language = None
# List of patterns, relative to source directory, that match files and # List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files. # directories to ignore when looking for source files.
# This patterns also effect to html_static_path and html_extra_path # This patterns also effect to html_static_path and html_extra_path
exclude_patterns = ['_build'] exclude_patterns = ["_build"]
# The reST default role (used for this markup: `text`) to use for all # The reST default role (used for this markup: `text`) to use for all
# documents. # documents.
@ -114,7 +114,7 @@ add_module_names = True
# show_authors = False # show_authors = False
# The name of the Pygments (syntax highlighting) style to use. # The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'flask_theme_support.FlaskyStyle' pygments_style = "flask_theme_support.FlaskyStyle"
# A list of ignored prefixes for module index sorting. # A list of ignored prefixes for module index sorting.
# modindex_common_prefix = [] # modindex_common_prefix = []
@ -130,18 +130,18 @@ todo_include_todos = True
# The theme to use for HTML and HTML Help pages. See the documentation for # The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes. # a list of builtin themes.
html_theme = 'alabaster' html_theme = "alabaster"
# Theme options are theme-specific and customize the look and feel of a theme # Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the # further. For a list of options available for each theme, see the
# documentation. # documentation.
html_theme_options = { html_theme_options = {
'show_powered_by': False, "show_powered_by": False,
'github_user': 'camelot-dev', "github_user": "camelot-dev",
'github_repo': 'camelot', "github_repo": "camelot",
'github_banner': True, "github_banner": True,
'show_related': False, "show_related": False,
'note_bg': '#FFF59C' "note_bg": "#FFF59C",
} }
# Add any paths that contain custom themes here, relative to this directory. # Add any paths that contain custom themes here, relative to this directory.
@ -164,12 +164,12 @@ html_theme_options = {
# The name of an image file (relative to this directory) to use as a favicon of # The name of an image file (relative to this directory) to use as a favicon of
# the docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # the docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large. # pixels large.
html_favicon = '_static/favicon.ico' html_favicon = "_static/favicon.ico"
# Add any paths that contain custom static files (such as style sheets) here, # Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files, # relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css". # so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static'] html_static_path = ["_static"]
# Add any extra paths that contain custom files (such as robots.txt or # Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied # .htaccess) here, relative to this directory. These files are copied
@ -189,10 +189,21 @@ html_use_smartypants = True
# Custom sidebar templates, maps document names to template names. # Custom sidebar templates, maps document names to template names.
html_sidebars = { html_sidebars = {
'index': ['sidebarintro.html', 'relations.html', 'sourcelink.html', "index": [
'searchbox.html', 'hacks.html'], "sidebarintro.html",
'**': ['sidebarlogo.html', 'localtoc.html', 'relations.html', "relations.html",
'sourcelink.html', 'searchbox.html', 'hacks.html'] "sourcelink.html",
"searchbox.html",
"hacks.html",
],
"**": [
"sidebarlogo.html",
"localtoc.html",
"relations.html",
"sourcelink.html",
"searchbox.html",
"hacks.html",
],
} }
# Additional templates that should be rendered to pages, maps page names to # Additional templates that should be rendered to pages, maps page names to
@ -249,34 +260,30 @@ html_show_copyright = True
# html_search_scorer = 'scorer.js' # html_search_scorer = 'scorer.js'
# Output file base name for HTML help builder. # Output file base name for HTML help builder.
htmlhelp_basename = 'Camelotdoc' htmlhelp_basename = "Camelotdoc"
# -- Options for LaTeX output --------------------------------------------- # -- Options for LaTeX output ---------------------------------------------
latex_elements = { latex_elements = {
# The paper size ('letterpaper' or 'a4paper'). # The paper size ('letterpaper' or 'a4paper').
# #
# 'papersize': 'letterpaper', # 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
# The font size ('10pt', '11pt' or '12pt'). #
# # 'pointsize': '10pt',
# 'pointsize': '10pt', # Additional stuff for the LaTeX preamble.
#
# Additional stuff for the LaTeX preamble. # 'preamble': '',
# # Latex figure (float) alignment
# 'preamble': '', #
# 'figure_align': 'htbp',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
} }
# Grouping the document tree into LaTeX files. List of tuples # Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, # (source start file, target name, title,
# author, documentclass [howto, manual, or own class]). # author, documentclass [howto, manual, or own class]).
latex_documents = [ latex_documents = [
(master_doc, 'Camelot.tex', u'Camelot Documentation', (master_doc, "Camelot.tex", u"Camelot Documentation", u"Vinayak Mehta", "manual"),
u'Vinayak Mehta', 'manual'),
] ]
# The name of an image file (relative to this directory) to place at the top of # The name of an image file (relative to this directory) to place at the top of
@ -316,10 +323,7 @@ latex_documents = [
# One entry per manual page. List of tuples # One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section). # (source start file, name, description, authors, manual section).
man_pages = [ man_pages = [(master_doc, "Camelot", u"Camelot Documentation", [author], 1)]
(master_doc, 'Camelot', u'Camelot Documentation',
[author], 1)
]
# If true, show URL addresses after external links. # If true, show URL addresses after external links.
# #
@ -332,9 +336,15 @@ man_pages = [
# (source start file, target name, title, author, # (source start file, target name, title, author,
# dir menu entry, description, category) # dir menu entry, description, category)
texinfo_documents = [ texinfo_documents = [
(master_doc, 'Camelot', u'Camelot Documentation', (
author, 'Camelot', 'One line description of project.', master_doc,
'Miscellaneous'), "Camelot",
u"Camelot Documentation",
author,
"Camelot",
"One line description of project.",
"Miscellaneous",
),
] ]
# Documents to append as an appendix to all manuals. # Documents to append as an appendix to all manuals.
@ -356,6 +366,6 @@ texinfo_documents = [
# Example configuration for intersphinx: refer to the Python standard library. # Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = { intersphinx_mapping = {
'https://docs.python.org/2': None, "https://docs.python.org/2": None,
'http://pandas.pydata.org/pandas-docs/stable': None "http://pandas.pydata.org/pandas-docs/stable": None,
} }

@ -110,6 +110,7 @@ This part of the documentation begins with some background information about why
user/how-it-works user/how-it-works
user/quickstart user/quickstart
user/advanced user/advanced
user/faq
user/cli user/cli
The API Documentation/Guide The API Documentation/Guide

@ -618,7 +618,7 @@ Tweak layout generation
Camelot is built on top of PDFMiner's functionality of grouping characters on a page into words and sentences. In some cases (such as `#170 <https://github.com/camelot-dev/camelot/issues/170>`_ and `#215 <https://github.com/camelot-dev/camelot/issues/215>`_), PDFMiner can group characters that should belong to the same sentence into separate sentences. Camelot is built on top of PDFMiner's functionality of grouping characters on a page into words and sentences. In some cases (such as `#170 <https://github.com/camelot-dev/camelot/issues/170>`_ and `#215 <https://github.com/camelot-dev/camelot/issues/215>`_), PDFMiner can group characters that should belong to the same sentence into separate sentences.
To deal with such cases, you can tweak PDFMiner's `LAParams kwargs <https://github.com/euske/pdfminer/blob/master/pdfminer/layout.py#L33>`_ to improve layout generation, by passing the keyword arguments as a dict using ``layout_kwargs`` in :meth:`read_pdf() <camelot.read_pdf>`. To know more about the parameters you can tweak, you can check out `PDFMiner docs <https://euske.github.io/pdfminer/>`_. To deal with such cases, you can tweak PDFMiner's `LAParams kwargs <https://github.com/euske/pdfminer/blob/master/pdfminer/layout.py#L33>`_ to improve layout generation, by passing the keyword arguments as a dict using ``layout_kwargs`` in :meth:`read_pdf() <camelot.read_pdf>`. To know more about the parameters you can tweak, you can check out `PDFMiner docs <https://pdfminersix.rtfd.io/en/latest/reference/composable.html>`_.
:: ::

docs/user/faq.rst 100644
@ -0,0 +1,56 @@
.. _faq:

Frequently Asked Questions
==========================

This part of the documentation answers some common questions. To add questions, please open an issue `here <https://github.com/camelot-dev/camelot/issues/new>`_.

Does Camelot work with image-based PDFs?
----------------------------------------

**No**, Camelot only works with text-based PDFs, not scanned documents. (As Tabula `explains <https://github.com/tabulapdf/tabula#why-tabula>`_, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)

How to reduce memory usage for long PDFs?
-----------------------------------------

During table extraction from long PDF documents, RAM usage can grow significantly.

A simple workaround is to divide the extraction into chunks and save the extracted data to disk at the end of every chunk.

For more details, check out this code snippet from `@anakin87 <https://github.com/anakin87>`_:
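::

    import camelot


    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in range(0, len(l), n):
            yield l[i : i + n]


    def extract_tables(filepath, pages="1", chunk_size=50, export_path=".", params=None):
        """
        Divide the extraction work into chunks of ``chunk_size`` pages. At the
        end of every chunk, save data on disk and free RAM.

        filepath : str
            Filepath or URL of the PDF file.
        pages : str, optional (default: '1')
            Comma-separated page numbers.
            Example: '1,3,4' or '1,4-end' or 'all'.
        chunk_size : int, optional (default: 50)
            Number of pages processed per chunk.
        export_path : str, optional (default: '.')
            Directory where the extracted tables are saved as CSV files.
        params : dict, optional (default: None)
            Extra keyword arguments passed to camelot.read_pdf().
        """
        params = params or {}

        # get list of pages from camelot.handlers.PDFHandler
        # (_get_pages is a private method and may change between versions)
        handler = camelot.handlers.PDFHandler(filepath)
        page_list = handler._get_pages(filepath, pages=pages)

        # chunk pages list
        page_chunks = list(chunks(page_list, chunk_size))

        # extraction and export
        for chunk in page_chunks:
            pages_string = ",".join(str(page) for page in chunk)
            tables = camelot.read_pdf(filepath, pages=pages_string, **params)
            # export() writes one file per table, named after its page and
            # order (e.g. tables-page-1-table-1.csv), so chunks don't collide
            tables.export(f"{export_path}/tables.csv")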

@ -43,8 +43,9 @@ For Ubuntu/MacOS::
For Windows:: For Windows::
>>> import ctypes
>>> from ctypes.util import find_library >>> from ctypes.util import find_library
>>> find_library("".join(("gsdll", str(ctypes.sizeof(ctypes.c_voidp) * 8), ".dll")) >>> find_library("".join(("gsdll", str(ctypes.sizeof(ctypes.c_voidp) * 8), ".dll")))
<name-of-ghostscript-library-on-windows> <name-of-ghostscript-library-on-windows>
**Check:** The output of the ``find_library`` function should not be empty. **Check:** The output of the ``find_library`` function should not be empty.

setup.py
@ -6,39 +6,38 @@ from setuptools import find_packages
here = os.path.abspath(os.path.dirname(__file__)) here = os.path.abspath(os.path.dirname(__file__))
about = {} about = {}
with open(os.path.join(here, 'camelot', '__version__.py'), 'r') as f: with open(os.path.join(here, "camelot", "__version__.py"), "r") as f:
exec(f.read(), about) exec(f.read(), about)
with open('README.md', 'r') as f: with open("README.md", "r") as f:
readme = f.read() readme = f.read()
requires = [ requires = [
'chardet>=3.0.4', "chardet>=3.0.4",
'click>=6.7', "click>=6.7",
'numpy>=1.13.3', "numpy>=1.13.3",
'openpyxl>=2.5.8', "openpyxl>=2.5.8",
'pandas>=0.23.4', "pandas>=0.23.4",
'pdfminer.six>=20200726', "pdfminer.six>=20200726",
'PyPDF2>=1.26.0', "PyPDF2>=1.26.0",
'tabulate' "tabulate>=0.8.9",
] ]
cv_requires = [ cv_requires = ["opencv-python>=3.4.2.17"]
'opencv-python>=3.4.2.17'
]
plot_requires = [ plot_requires = [
'matplotlib>=2.2.3', "matplotlib>=2.2.3",
] ]
dev_requires = [ dev_requires = [
'codecov>=2.0.15', "codecov>=2.0.15",
'pytest>=5.4.3', "pytest>=5.4.3",
'pytest-cov>=2.10.0', "pytest-cov>=2.10.0",
'pytest-mpl>=0.11', "pytest-mpl>=0.11",
'pytest-runner>=5.2', "pytest-runner>=5.2",
'Sphinx>=3.1.2' "Sphinx>=3.1.2",
"sphinx-autobuild>=2021.3.14",
] ]
all_requires = cv_requires + plot_requires all_requires = cv_requires + plot_requires
@ -46,36 +45,39 @@ dev_requires = dev_requires + all_requires
def setup_package(): def setup_package():
metadata = dict(name=about['__title__'], metadata = dict(
version=about['__version__'], name=about["__title__"],
description=about['__description__'], version=about["__version__"],
long_description=readme, description=about["__description__"],
long_description_content_type="text/markdown", long_description=readme,
url=about['__url__'], long_description_content_type="text/markdown",
author=about['__author__'], url=about["__url__"],
author_email=about['__author_email__'], author=about["__author__"],
license=about['__license__'], author_email=about["__author_email__"],
packages=find_packages(exclude=('tests',)), license=about["__license__"],
install_requires=requires, packages=find_packages(exclude=("tests",)),
extras_require={ install_requires=requires,
'all': all_requires, extras_require={
'cv': cv_requires, "all": all_requires,
'dev': dev_requires, "cv": cv_requires,
'plot': plot_requires "dev": dev_requires,
}, "plot": plot_requires,
entry_points={ },
'console_scripts': [ entry_points={
'camelot = camelot.cli:cli', "console_scripts": [
], "camelot = camelot.cli:cli",
}, ],
classifiers=[ },
# Trove classifiers classifiers=[
# Full list: https://pypi.python.org/pypi?%3Aaction=list_classifiers # Trove classifiers
'License :: OSI Approved :: MIT License', # Full list: https://pypi.python.org/pypi?%3Aaction=list_classifiers
'Programming Language :: Python :: 3.6', "License :: OSI Approved :: MIT License",
'Programming Language :: Python :: 3.7', "Programming Language :: Python :: 3.6",
'Programming Language :: Python :: 3.8' "Programming Language :: Python :: 3.7",
]) "Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
],
)
try: try:
from setuptools import setup from setuptools import setup
@ -85,5 +87,5 @@ def setup_package():
setup(**metadata) setup(**metadata)
if __name__ == '__main__': if __name__ == "__main__":
setup_package() setup_package()

@ -2800,49 +2800,467 @@ data_stream_layout_kwargs = [
] ]
data_stream_duplicated_text = [ data_stream_duplicated_text = [
['', '2012 BETTER VARIETIES Harvest Report for Minnesota Central [ MNCE ]', '', '', '', '', '', '', '', '', [
'ALL SEASON TEST'], "",
['', 'Doug Toreen, Renville County, MN 55310 [ BIRD ISLAND ]', '', '', '', '', '', '', '', '', "2012 BETTER VARIETIES Harvest Report for Minnesota Central [ MNCE ]",
'1.3 - 2.0 MAT. GROUP'], "",
['PREV. CROP/HERB:', 'Corn / Surpass, Roundup', '', '', '', '', '', '', '', '', 'S2MNCE01'], "",
['SOIL DESCRIPTION:', '', 'Canisteo clay loam, mod. well drained, non-irrigated', '', '', '', '', '', '', '', ''], "",
['SOIL CONDITIONS:', '', 'High P, high K, 6.7 pH, 3.9% OM, Low SCN', '', '', '', '', '', '', '', '30" ROW SPACING'], "",
['TILLAGE/CULTIVATION:', 'conventional w/ fall till', '', '', '', '', '', '', '', '', ''], "",
['PEST MANAGEMENT:', 'Roundup twice', '', '', '', '', '', '', '', '', ''], "",
['SEEDED - RATE:', 'May 15', '140 000 /A', '', '', '', '', '', '', 'TOP 30 for YIELD of 63 TESTED', ''], "",
['HARVESTED - STAND:', 'Oct 3', '122 921 /A', '', '', '', '', '', '', 'AVERAGE of (3) REPLICATIONS', ''], "",
['', '', '', '', 'SCN', 'Seed', 'Yield', 'Moisture', 'Lodging', 'Stand', 'Gross'], "ALL SEASON TEST",
['Company/Brand', 'Product/Brand†', 'Technol.†', 'Mat.', 'Resist.', 'Trmt.†', 'Bu/A', '%', '%', '(x 1000)', ],
'Income'], ['Kruger', 'K2 1901', 'RR2Y', '1.9', 'R', 'Ac,PV', '56.4', '7.6', '0', '126.3', '$846'], [
['Stine', '19RA02 §', 'RR2Y', '1.9', 'R', 'CMB', '55.3', '7.6', '0', '120.0', '$830'], "",
['Wensman', 'W 3190NR2', 'RR2Y', '1.9', 'R', 'Ac', '54.5', '7.6', '0', '119.5', '$818'], "Doug Toreen, Renville County, MN 55310 [ BIRD ISLAND ]",
['Hefty', 'H17Y12', 'RR2Y', '1.7', 'MR', 'I', '53.7', '7.7', '0', '124.4', '$806'], "",
['Dyna-Gro', 'S15RY53', 'RR2Y', '1.5', 'R', 'Ac', '53.6', '7.7', '0', '126.8', '$804'], "",
['LG Seeds', 'C2050R2', 'RR2Y', '2.1', 'R', 'Ac', '53.6', '7.7', '0', '123.9', '$804'], "",
['Titan Pro', '19M42', 'RR2Y', '1.9', 'R', 'CMB', '53.6', '7.7', '0', '121.0', '$804'], "",
['Stine', '19RA02 (2) §', 'RR2Y', '1.9', 'R', 'CMB', '53.4', '7.7', '0', '123.9', '$801'], "",
['Asgrow', 'AG1832 §', 'RR2Y', '1.8', 'MR', 'Ac,PV', '52.9', '7.7', '0', '122.0', '$794'], "",
['Prairie Brand', 'PB-1566R2', 'RR2Y', '1.5', 'R', 'CMB', '52.8', '7.7', '0', '122.9', '$792'], "",
['Channel', '1901R2', 'RR2Y', '1.9', 'R', 'Ac,PV', '52.8', '7.6', '0', '123.4', '$791'], "",
['Titan Pro', '20M1', 'RR2Y', '2.0', 'R', 'Am', '52.5', '7.5', '0', '124.4', '$788'], "1.3 - 2.0 MAT. GROUP",
['Kruger', 'K2-2002', 'RR2Y', '2.0', 'R', 'Ac,PV', '52.4', '7.9', '0', '125.4', '$786'], ],
['Channel', '1700R2', 'RR2Y', '1.7', 'R', 'Ac,PV', '52.3', '7.9', '0', '123.9', '$784'], [
['Hefty', 'H16Y11', 'RR2Y', '1.6', 'MR', 'I', '51.4', '7.6', '0', '123.9', '$771'], "PREV. CROP/HERB:",
['Anderson', '162R2Y', 'RR2Y', '1.6', 'R', 'None', '51.3', '7.5', '0', '119.5', '$770'], "Corn / Surpass, Roundup",
['Titan Pro', '15M22', 'RR2Y', '1.5', 'R', 'CMB', '51.3', '7.8', '0', '125.4', '$769'], "",
['Dairyland', 'DSR-1710R2Y', 'RR2Y', '1.7', 'R', 'CMB', '51.3', '7.7', '0', '122.0', '$769'], "",
['Hefty', 'H20R3', 'RR2Y', '2.0', 'MR', 'I', '50.5', '8.2', '0', '121.0', '$757'], "",
['Prairie Brand', 'PB 1743R2', 'RR2Y', '1.7', 'R', 'CMB', '50.2', '7.7', '0', '125.8', '$752'], "",
['Gold Country', '1741', 'RR2Y', '1.7', 'R', 'Ac', '50.1', '7.8', '0', '123.9', '$751'], "",
['Trelay', '20RR43', 'RR2Y', '2.0', 'R', 'Ac,Ex', '49.9', '7.6', '0', '127.8', '$749'], "",
['Hefty', 'H14R3', 'RR2Y', '1.4', 'MR', 'I', '49.7', '7.7', '0', '122.9', '$746'], "",
['Prairie Brand', 'PB-2099NRR2', 'RR2Y', '2.0', 'R', 'CMB', '49.6', '7.8', '0', '126.3', '$743'], "",
['Wensman', 'W 3174NR2', 'RR2Y', '1.7', 'R', 'Ac', '49.3', '7.6', '0', '122.5', '$740'], "S2MNCE01",
['Kruger', 'K2 1602', 'RR2Y', '1.6', 'R', 'Ac,PV', '48.7', '7.6', '0', '125.4', '$731'], ],
['NK Brand', 'S18-C2 §', 'RR2Y', '1.8', 'R', 'CMB', '48.7', '7.7', '0', '126.8', '$731'], [
['Kruger', 'K2 1902', 'RR2Y', '1.9', 'R', 'Ac,PV', '48.7', '7.5', '0', '124.4', '$730'], "SOIL DESCRIPTION:",
['Prairie Brand', 'PB-1823R2', 'RR2Y', '1.8', 'R', 'None', '48.5', '7.6', '0', '121.0', '$727'], "",
['Gold Country', '1541', 'RR2Y', '1.5', 'R', 'Ac', '48.4', '7.6', '0', '110.4', '$726'], "Canisteo clay loam, mod. well drained, non-irrigated",
['', '', '', '', '', 'Test Average =', '47.6', '7.7', '0', '122.9', '$713'], "",
['', '', '', '', '', 'LSD (0.10) =', '5.7', '0.3', 'ns', '37.8', '566.4'] "",
"",
"",
"",
"",
"",
"",
],
[
"SOIL CONDITIONS:",
"",
"High P, high K, 6.7 pH, 3.9% OM, Low SCN",
"",
"",
"",
"",
"",
"",
"",
'30" ROW SPACING',
],
[
"TILLAGE/CULTIVATION:",
"conventional w/ fall till",
"",
"",
"",
"",
"",
"",
"",
"",
"",
],
["PEST MANAGEMENT:", "Roundup twice", "", "", "", "", "", "", "", "", ""],
[
"SEEDED - RATE:",
"May 15",
"140 000 /A",
"",
"",
"",
"",
"",
"",
"TOP 30 for YIELD of 63 TESTED",
"",
],
[
"HARVESTED - STAND:",
"Oct 3",
"122 921 /A",
"",
"",
"",
"",
"",
"",
"AVERAGE of (3) REPLICATIONS",
"",
],
["", "", "", "", "SCN", "Seed", "Yield", "Moisture", "Lodging", "Stand", "Gross"],
[
"Company/Brand",
"Product/Brand†",
"Technol.†",
"Mat.",
"Resist.",
"Trmt.†",
"Bu/A",
"%",
"%",
"(x 1000)",
"Income",
],
[
"Kruger",
"K2 1901",
"RR2Y",
"1.9",
"R",
"Ac,PV",
"56.4",
"7.6",
"0",
"126.3",
"$846",
],
[
"Stine",
"19RA02 §",
"RR2Y",
"1.9",
"R",
"CMB",
"55.3",
"7.6",
"0",
"120.0",
"$830",
],
[
"Wensman",
"W 3190NR2",
"RR2Y",
"1.9",
"R",
"Ac",
"54.5",
"7.6",
"0",
"119.5",
"$818",
],
["Hefty", "H17Y12", "RR2Y", "1.7", "MR", "I", "53.7", "7.7", "0", "124.4", "$806"],
[
"Dyna-Gro",
"S15RY53",
"RR2Y",
"1.5",
"R",
"Ac",
"53.6",
"7.7",
"0",
"126.8",
"$804",
],
[
"LG Seeds",
"C2050R2",
"RR2Y",
"2.1",
"R",
"Ac",
"53.6",
"7.7",
"0",
"123.9",
"$804",
],
[
"Titan Pro",
"19M42",
"RR2Y",
"1.9",
"R",
"CMB",
"53.6",
"7.7",
"0",
"121.0",
"$804",
],
[
"Stine",
"19RA02 (2) §",
"RR2Y",
"1.9",
"R",
"CMB",
"53.4",
"7.7",
"0",
"123.9",
"$801",
],
[
"Asgrow",
"AG1832 §",
"RR2Y",
"1.8",
"MR",
"Ac,PV",
"52.9",
"7.7",
"0",
"122.0",
"$794",
],
[
"Prairie Brand",
"PB-1566R2",
"RR2Y",
"1.5",
"R",
"CMB",
"52.8",
"7.7",
"0",
"122.9",
"$792",
],
[
"Channel",
"1901R2",
"RR2Y",
"1.9",
"R",
"Ac,PV",
"52.8",
"7.6",
"0",
"123.4",
"$791",
],
[
"Titan Pro",
"20M1",
"RR2Y",
"2.0",
"R",
"Am",
"52.5",
"7.5",
"0",
"124.4",
"$788",
],
[
"Kruger",
"K2-2002",
"RR2Y",
"2.0",
"R",
"Ac,PV",
"52.4",
"7.9",
"0",
"125.4",
"$786",
],
[
"Channel",
"1700R2",
"RR2Y",
"1.7",
"R",
"Ac,PV",
"52.3",
"7.9",
"0",
"123.9",
"$784",
],
["Hefty", "H16Y11", "RR2Y", "1.6", "MR", "I", "51.4", "7.6", "0", "123.9", "$771"],
[
"Anderson",
"162R2Y",
"RR2Y",
"1.6",
"R",
"None",
"51.3",
"7.5",
"0",
"119.5",
"$770",
],
[
"Titan Pro",
"15M22",
"RR2Y",
"1.5",
"R",
"CMB",
"51.3",
"7.8",
"0",
"125.4",
"$769",
],
[
"Dairyland",
"DSR-1710R2Y",
"RR2Y",
"1.7",
"R",
"CMB",
"51.3",
"7.7",
"0",
"122.0",
"$769",
],
["Hefty", "H20R3", "RR2Y", "2.0", "MR", "I", "50.5", "8.2", "0", "121.0", "$757"],
[
"Prairie Brand",
"PB 1743R2",
"RR2Y",
"1.7",
"R",
"CMB",
"50.2",
"7.7",
"0",
"125.8",
"$752",
],
[
"Gold Country",
"1741",
"RR2Y",
"1.7",
"R",
"Ac",
"50.1",
"7.8",
"0",
"123.9",
"$751",
],
[
"Trelay",
"20RR43",
"RR2Y",
"2.0",
"R",
"Ac,Ex",
"49.9",
"7.6",
"0",
"127.8",
"$749",
],
["Hefty", "H14R3", "RR2Y", "1.4", "MR", "I", "49.7", "7.7", "0", "122.9", "$746"],
[
"Prairie Brand",
"PB-2099NRR2",
"RR2Y",
"2.0",
"R",
"CMB",
"49.6",
"7.8",
"0",
"126.3",
"$743",
],
[
"Wensman",
"W 3174NR2",
"RR2Y",
"1.7",
"R",
"Ac",
"49.3",
"7.6",
"0",
"122.5",
"$740",
],
[
"Kruger",
"K2 1602",
"RR2Y",
"1.6",
"R",
"Ac,PV",
"48.7",
"7.6",
"0",
"125.4",
"$731",
],
[
"NK Brand",
"S18-C2 §",
"RR2Y",
"1.8",
"R",
"CMB",
"48.7",
"7.7",
"0",
"126.8",
"$731",
],
[
"Kruger",
"K2 1902",
"RR2Y",
"1.9",
"R",
"Ac,PV",
"48.7",
"7.5",
"0",
"124.4",
"$730",
],
[
"Prairie Brand",
"PB-1823R2",
"RR2Y",
"1.8",
"R",
"None",
"48.5",
"7.6",
"0",
"121.0",
"$727",
],
[
"Gold Country",
"1541",
"RR2Y",
"1.5",
"R",
"Ac",
"48.4",
"7.6",
"0",
"110.4",
"$726",
],
["", "", "", "", "", "Test Average =", "47.6", "7.7", "0", "122.9", "$713"],
["", "", "", "", "", "LSD (0.10) =", "5.7", "0.3", "ns", "37.8", "566.4"],
] ]