Detailed Introduction
Pipet, created by bjesus, is a command-line web scraping and data extraction tool designed as a lightweight “Swiss Army knife” scraper. It supports three query modes: HTML (CSS selectors), JSON (GJSON syntax), and Playwright (client-side JavaScript execution). Resources are fetched with curl or, in Playwright mode, with a Playwright-driven browser. Scraping workflows are described in .pipet query scripts, and Pipet combines Unix-style pipes with template rendering to emit results as plain text, JSON, or template-rendered text for direct consumption in terminals, automation scripts, or CI pipelines.
Main Features
- Multi-mode scraping: native support for HTML, JSON, and Playwright query types.
- CLI-first design: integrates seamlessly with curl, pipes, and common Unix tools.
- Flexible outputs: supports plain text, JSON export, and template rendering.
- Multiple distribution channels: releases, Homebrew, AUR, and Nix packages available.
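To make the template-rendering output mode more concrete, here is a minimal sketch in Go. It assumes the templates behave like Go's standard text/template package, which would be a natural fit for a Go tool but is not confirmed by the text above. The Story type, its fields, and the sample data are hypothetical stand-ins for scraped records, not part of Pipet.

```go
package main

import (
	"os"
	"strings"
	"text/template"
)

// Story is a hypothetical record standing in for one scraped item.
type Story struct {
	Title string
	Site  string
}

// render applies a text/template string to a slice of scraped records
// and returns the rendered text.
func render(tmpl string, stories []Story) (string, error) {
	t, err := template.New("out").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, stories); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	// Hypothetical data; a real run would get this from a scrape.
	stories := []Story{
		{Title: "First story", Site: "example.com"},
		{Title: "Second story", Site: "example.org"},
	}
	// One bullet line per record.
	out, err := render("{{range .}}- {{.Title}} ({{.Site}})\n{{end}}", stories)
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out)
}
```

Keeping rendering separate from extraction means the same scraped records can be emitted as plain lines, JSON, or a custom layout without touching the queries.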
Use Cases
- Monitor web pages for changes and trigger notifications or commands on updates.
- Extract structured data from complex pages or APIs and export to CSV/JSON for analysis.
- Rapidly prototype and validate scraping rules during development and testing.
- Integrate lightweight scraping into CI/automation pipelines to fetch live metrics or statuses.
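The change-monitoring use case above can be sketched as follows. This is an illustrative approach, not Pipet's own implementation: it fingerprints each extracted result with SHA-256 and reports a change when the fingerprint differs from the previous one. The Watcher type and the sample results are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint returns a hex-encoded SHA-256 digest of the extracted text.
func fingerprint(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// Watcher remembers the fingerprint of the last result it saw.
type Watcher struct {
	last string
	seen bool
}

// Changed reports whether content differs from the previous call.
// The first call only records a baseline and returns false.
func (w *Watcher) Changed(content string) bool {
	fp := fingerprint(content)
	changed := w.seen && fp != w.last
	w.last, w.seen = fp, true
	return changed
}

func main() {
	var w Watcher
	// Hypothetical successive scrape results.
	for _, result := range []string{"price: 10", "price: 10", "price: 12"} {
		if w.Changed(result) {
			// A real pipeline would send a notification or run a command here.
			fmt.Println("changed:", result)
		}
	}
}
```

Comparing digests rather than full page bodies keeps the stored state small, and hashing only the extracted fields avoids false alarms from unrelated page churn such as ads or timestamps.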
Technical Features
- Implemented in Go, producing standalone, low-overhead binaries with fast startup.
- Integrates curl and Playwright as backends for robust resource retrieval across scenarios.
- Uses GJSON for efficient JSON path queries, simplifying processing of nested API responses.
- Well-documented repository with examples for installation and common usage patterns.
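To illustrate why path queries simplify work with nested API responses, the sketch below resolves a dot-separated path against a JSON document using only Go's standard library. It is a simplified stand-in for the idea, not the GJSON library and not Pipet's code: it handles only object keys and numeric array indexes, and the lookup function and sample document are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// lookup walks a dot-separated path through decoded JSON.
// Object keys are matched by name; numeric segments index into arrays.
// The second return value is false if the JSON is invalid or the path
// does not resolve.
func lookup(doc []byte, path string) (any, bool) {
	var cur any
	if err := json.Unmarshal(doc, &cur); err != nil {
		return nil, false
	}
	for _, seg := range strings.Split(path, ".") {
		switch node := cur.(type) {
		case map[string]any:
			next, ok := node[seg]
			if !ok {
				return nil, false
			}
			cur = next
		case []any:
			i, err := strconv.Atoi(seg)
			if err != nil || i < 0 || i >= len(node) {
				return nil, false
			}
			cur = node[i]
		default:
			// A scalar was reached before the path was exhausted.
			return nil, false
		}
	}
	return cur, true
}

func main() {
	// Hypothetical API response.
	doc := []byte(`{"current":[{"feels_like":"7","humidity":"81"}]}`)
	if v, ok := lookup(doc, "current.0.feels_like"); ok {
		fmt.Println(v)
	}
}
```

A single path string such as current.0.feels_like replaces several layers of manual type assertions, which is the convenience a path-query syntax brings to nested responses.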