Pipet

Pipet is a developer-focused command-line web scraping and data extraction tool supporting HTML, JSON and Playwright query modes.

Author: bjesus

Since: 2024-08-31

Detailed Introduction

Pipet, created by bjesus, is designed as a lightweight "Swiss Army knife" for scraping from the command line. It supports three query modes: HTML (CSS selectors), JSON (GJSON syntax), and Playwright (JavaScript executed in the page). Resources are fetched with curl or, in Playwright mode, with a headless browser. Scraping workflows are described in .pipet query files, and extracted values can be piped through Unix tools or rendered with templates, so results are available as plain text, JSON, or templated output for direct use in terminals, automation scripts, or CI pipelines.

Main Features

Multi-mode scraping: native support for HTML, JSON, and Playwright query types.
CLI-first design: integrates seamlessly with curl, pipes, and common Unix tools.
Flexible outputs: supports plain text, JSON export, and template rendering.
Multiple distribution channels: releases, Homebrew, AUR, and Nix packages available.

Use Cases

Monitor web pages for changes and trigger notifications or commands on updates.
Extract structured data from complex pages or APIs and export to CSV/JSON for analysis.
Rapidly prototype and validate scraping rules during development and testing.
Integrate lightweight scraping into CI/automation pipelines to fetch live metrics or statuses.
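
The change-monitoring use case above can be sketched as a digest comparison between successive scrape results. This is a minimal illustration of the idea only, not Pipet's own implementation; the function names, the callback, and the sample data are all hypothetical.

```python
import hashlib
from typing import Callable, Optional


def content_digest(text: str) -> str:
    """Return a stable SHA-256 fingerprint of the scraped text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_for_change(
    new_text: str,
    previous_digest: Optional[str],
    on_change: Callable[[str], None],
) -> str:
    """Compare new output with the last digest and call on_change if it differs.

    The first observation (previous_digest is None) only records a baseline.
    """
    digest = content_digest(new_text)
    if previous_digest is not None and digest != previous_digest:
        on_change(new_text)
    return digest


# Simulated scrape results from three successive polls (hypothetical data).
polls = ["price: 10", "price: 10", "price: 12"]
notifications = []
last = None
for output in polls:
    last = check_for_change(output, last, notifications.append)
```

Storing only the digest keeps the state small; in a real pipeline the callback would send a notification or run a command instead of appending to a list.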

Technical Features

Implemented in Go, producing standalone, low-overhead binaries with fast startup.
Uses curl for plain HTTP retrieval and Playwright for pages that need JavaScript rendering in a browser.
Uses GJSON for efficient JSON path queries, simplifying processing of nested API responses.
Well-documented repository with examples for installation and common usage patterns.
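
To give a feel for the dotted-path style of JSON querying mentioned above, here is a simplified sketch in Python. It is not GJSON (a Go library with a richer syntax that includes wildcards, array queries, and modifiers); it handles only plain keys and numeric array indexes, and the helper name and sample response are hypothetical.

```python
import json
from typing import Any


def get_path(document: Any, path: str) -> Any:
    """Walk a dotted path such as "user.emails.0" through parsed JSON.

    Returns None when any segment is missing.
    """
    current = document
    for segment in path.split("."):
        if isinstance(current, dict):
            if segment not in current:
                return None
            current = current[segment]
        elif isinstance(current, list):
            # Array elements are addressed by a non-negative integer segment.
            if not segment.isdigit() or int(segment) >= len(current):
                return None
            current = current[int(segment)]
        else:
            return None
    return current


# Hypothetical API response used only for illustration.
response = json.loads(
    '{"user": {"name": "Ada", "emails": ["ada@example.com", "a@example.org"]}}'
)
name = get_path(response, "user.name")
second_email = get_path(response, "user.emails.1")
missing = get_path(response, "user.phone.0")
```

A single path string replaces a chain of nested lookups and existence checks, which is what makes this style convenient for deeply nested API responses.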
