pdfly

A command-line tool to extract (meta)data from PDFs and manipulate PDF files at scale.

Author: py-pdf

Since: 2022-04-09

Detailed Introduction

pdfly is a lightweight command-line tool designed to extract metadata and content from PDF files and perform common PDF manipulations. It offers configurable parsing and export options, making it easy to integrate into automation scripts, CI pipelines, or batch-processing workflows.

Main Features

Extraction capabilities for metadata, text, and structured document information.
Batch and scriptable operations suitable for CI/CD or automation tasks.
Extensible configuration for custom output formats and processing steps.
Open-source under the BSD-3-Clause license for broad reuse.

Use Cases

Ideal for large-scale PDF analysis, archival indexing, post-OCR processing, and automated data extraction pipelines. Developers can call pdfly from scripts or CI jobs to include PDF processing as part of a document workflow.
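As a rough illustration of calling pdfly from a script, the sketch below runs one pdfly subcommand over every PDF in a folder and collects the output. It assumes pdfly is installed and on PATH, and that the `extract-text` subcommand takes a PDF path as its argument; confirm the available subcommands with `pdfly --help`. The helper names `build_command` and `run_batch` are hypothetical and not part of pdfly.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, subcommand="extract-text"):
    """Return the argument list for one pdfly invocation.

    The default subcommand name is an assumption; check `pdfly --help`.
    """
    return ["pdfly", subcommand, str(pdf_path)]


def run_batch(folder, subcommand="extract-text", runner=subprocess.run):
    """Run pdfly on every *.pdf file in `folder`.

    Returns a dict mapping file name to captured stdout for the
    invocations that exited with status 0. `runner` is injectable so
    the function can be exercised without pdfly installed.
    """
    results = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        proc = runner(
            build_command(pdf, subcommand),
            capture_output=True,  # collect stdout/stderr instead of printing
            text=True,            # decode output as str
            check=False,          # inspect returncode ourselves
        )
        if proc.returncode == 0:
            results[pdf.name] = proc.stdout
    return results
```

In a CI job, something like `run_batch("docs/")` could feed the extracted text into a later indexing or validation step. Files for which pdfly exits with a non-zero status are skipped here rather than aborting the whole batch, which is one possible design choice for large archives.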

Technical Features

pdfly is implemented in Python and exposes a command-line interface (CLI) and programmable APIs. It builds on established PDF parsing libraries for compatibility and reliability. The source code is hosted on GitHub and the documentation on Read the Docs.
