Update faq

pull/216/head
Vinayak Mehta 2021-06-28 00:15:43 +05:30
parent ba5be43005
commit 2aaa913c40
2 changed files with 34 additions and 28 deletions

@@ -109,6 +109,7 @@ This part of the documentation begins with some background information about why
    user/install
    user/how-it-works
    user/quickstart
+   user/faq
    user/advanced
    user/cli

@@ -1,46 +1,51 @@
.. _faq:

Frequently Asked Questions
==========================

This part of the documentation answers some common questions. To add questions, please open an issue `here <https://github.com/camelot-dev/camelot/issues/new>`_.

How to reduce memory usage for long PDFs?
-----------------------------------------

During table extraction from long PDF documents, RAM usage can grow significantly.
A simple workaround is to divide the extraction into chunks, and save extracted data to disk at the end of every chunk.
For more details, check out this code snippet from `@anakin87 <https://github.com/anakin87>`_:

::

    import camelot

    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in range(0, len(l), n):
            yield l[i : i + n]

    def extract_tables(filepath, pages, chunk_size=50, export_path=".", params={}):
        """
        Divide the extraction work into chunks of chunk_size pages. At the
        end of every chunk, save extracted data to disk and free RAM.
        """
        # get list of pages from camelot.handlers.PDFHandler
        handler = camelot.handlers.PDFHandler(filepath)
        page_list = handler._get_pages(filepath, pages=pages)

        # chunk pages list
        page_chunks = list(chunks(page_list, chunk_size))

        # extraction and export
        for chunk in page_chunks:
            pages_string = str(chunk).replace("[", "").replace("]", "")
            tables = camelot.read_pdf(filepath, pages=pages_string, **params)
            tables.export(f"{export_path}/tables.csv")
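
A rough usage sketch, where the file name and the ``flavor`` value are only placeholders:

::

    # hypothetical call: extract tables from every page of a long PDF,
    # 50 pages at a time, exporting each chunk's tables as CSV files
    extract_tables("long-document.pdf", pages="all", chunk_size=50,
                   export_path=".", params={"flavor": "lattice"})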