pdfly

A command-line tool to extract (meta)data from PDFs and manipulate PDF files at scale.

py-pdf · Since 2022-04-09


A compact command-line toolkit for extracting and manipulating PDF files.

Detailed Introduction

pdfly is a lightweight command-line tool designed to extract metadata and content from PDF files and perform common PDF manipulations. It offers configurable parsing and export options, making it easy to integrate into automation scripts, CI pipelines, or batch-processing workflows.

Main Features

Extraction capabilities for metadata, text, and structured document information.
Batch and scriptable operations suitable for CI/CD or automation tasks.
Extensible configuration for custom output formats and processing steps.
Open-source under the BSD-3-Clause license for broad reuse.

Use Cases

Ideal for large-scale PDF analysis, archival indexing, post-OCR processing, and automated data extraction pipelines. Developers can call pdfly from scripts or CI jobs to include PDF processing as part of a document workflow.
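As a rough illustration of calling pdfly from a script, the sketch below builds one command line per PDF in a folder and runs them in sequence. It makes several assumptions: that the `pdfly` executable is on the PATH, and that a `meta` subcommand taking a file path is available (confirm with `pdfly --help` for the installed version). The helper names `build_commands` and `run_batch` are hypothetical and not part of pdfly.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(folder, subcommand="meta", executable="pdfly"):
    """Return one argv list per PDF file in `folder`, sorted by path.

    The default subcommand name "meta" is an assumption; adjust it to
    whatever the installed pdfly version reports in its help output.
    """
    pdfs = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf")
    return [[executable, subcommand, str(p)] for p in pdfs]


def run_batch(folder, subcommand="meta", executable="pdfly"):
    """Run each command and map file path -> (return code, captured stdout)."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} not found on PATH")
    results = {}
    for argv in build_commands(folder, subcommand, executable):
        # Capture output as text so a CI job can log or parse it later.
        done = subprocess.run(argv, capture_output=True, text=True)
        results[argv[-1]] = (done.returncode, done.stdout)
    return results
```

In a CI job, something like `run_batch("./reports")` could be followed by a check that every return code is zero. Keeping command construction separate from execution makes the batch logic testable without pdfly installed.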

Technical Features

pdfly is implemented in Python and exposes a command-line interface (CLI) and programmable APIs. It builds on established PDF parsing libraries for compatibility and reliability. The source code is hosted on GitHub, and the documentation is published on Read the Docs.

