uv. However, uv and venv are not required.
To work with all supported file types, run:
.txt), HTML files (.html), XML files (.xml), and emails (.eml, .msg, and .p7s) by default.
To further conserve disk space and reduce code dependencies, you can run the following command instead, replacing <extra> with the appropriate extra for the target file type:
- all-docs (for all supported file types in this list)
- csv (for .csv files only)
- docx (for .doc and .docx files only)
- epub (for .epub files only)
- image (for all supported image file types: .bmp, .heic, .jpeg, .png, and .tiff)
- md (for .md files only)
- odt (for .odt files only)
- org (for .org files only)
- pdf (for .pdf files only)
- pptx (for .ppt and .pptx files only)
- rst (for .rst files only)
- rtf (for .rtf files only)
- tsv (for .tsv files only)
- xlsx (for .xls and .xlsx files only)
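
When the target files are known in advance, the matching extra can be looked up from the file extension. The following is a minimal sketch that only transcribes the list above into a lookup table. The names `EXTRA_FOR_EXTENSION` and `extra_for` are hypothetical helpers, not part of any library, and returning `None` for unlisted extensions is an assumption made for illustration.

```python
from pathlib import Path
from typing import Optional

# Extension-to-extra mapping, transcribed from the list of extras above.
EXTRA_FOR_EXTENSION = {
    ".csv": "csv",
    ".doc": "docx", ".docx": "docx",
    ".epub": "epub",
    ".bmp": "image", ".heic": "image", ".jpeg": "image",
    ".png": "image", ".tiff": "image",
    ".md": "md",
    ".odt": "odt",
    ".org": "org",
    ".pdf": "pdf",
    ".ppt": "pptx", ".pptx": "pptx",
    ".rst": "rst",
    ".rtf": "rtf",
    ".tsv": "tsv",
    ".xls": "xlsx", ".xlsx": "xlsx",
}


def extra_for(filename: str) -> Optional[str]:
    """Return the extra listed for this file's extension, or None if unlisted."""
    suffix = Path(filename).suffix.lower()  # compare case-insensitively
    return EXTRA_FOR_EXTENSION.get(suffix)
```

For example, `extra_for("slides.ppt")` yields `pptx`, which is the value to substitute for <extra>.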
- libmagic-dev (for filetype detection)
- poppler-utils and tesseract-ocr (for images and PDFs), and tesseract-lang (for additional language support)
- libreoffice (for Microsoft Office documents)
- pandoc (for .epub, .odt, and .rtf files. For .rtf files, you must have version 2.14.2 or newer. Running this script will install the correct version for you.)
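
Before processing files, it can help to confirm that these system dependencies are present and that pandoc meets the 2.14.2 minimum for .rtf files. The sketch below is one way to do that, not an official check. The executable names (`tesseract`, `pdftotext`, `soffice`) are assumptions about how these packages are usually exposed on PATH, and the helper names are hypothetical.

```python
import re
import shutil
from typing import List, Optional, Tuple

# Package name -> executable name. The executable names are assumptions and
# may differ across operating systems and distributions.
REQUIRED_TOOLS = {
    "pandoc": "pandoc",
    "tesseract-ocr": "tesseract",
    "poppler-utils": "pdftotext",
    "libreoffice": "soffice",
}

MIN_PANDOC_FOR_RTF = (2, 14, 2)  # minimum stated in the list above


def missing_tools() -> List[str]:
    """Return the packages whose executable is not found on PATH."""
    return [pkg for pkg, exe in REQUIRED_TOOLS.items() if shutil.which(exe) is None]


def pandoc_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Parse the first line of `pandoc --version`, e.g. 'pandoc 2.14.2'."""
    match = re.match(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def pandoc_supports_rtf(version_output: str) -> bool:
    """True if the reported pandoc version is 2.14.2 or newer."""
    version = pandoc_version(version_output)
    return version is not None and version >= MIN_PANDOC_FOR_RTF
```

To use it, pass the text printed by `pandoc --version` (captured, for instance, with `subprocess.run`) to `pandoc_supports_rtf`.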

