arXiv Paper Curator

A six-week, hands-on course by Jam With AI that teaches building a production-ready Retrieval-Augmented Generation system.

Author: Jam With AI

Since: 2025-08-06

Detailed Introduction

The arXiv Paper Curator is a six-week, hands-on course from Jam With AI aimed at engineers and researchers who want to build production-grade Retrieval-Augmented Generation (RAG) systems. The curriculum walks learners through infrastructure setup, paper ingestion and PDF parsing, OpenSearch/BM25 indexing, intelligent chunking and hybrid retrieval, local LLM integration (Ollama), and interactive interfaces with monitoring and caching. See the course page for more details.

Main Features

Week-by-week practical path from infrastructure (Docker, FastAPI, PostgreSQL) to production monitoring (Langfuse, Redis).
Full RAG engineering: BM25 keyword search foundation, progressive introduction of embedding-based retrieval and hybrid fusion.
Production-minded implementations: intelligent chunking, index optimization, responses streamed over Server-Sent Events (SSE), and a Gradio-based UI.
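
To make the "BM25 keyword search foundation" concrete, here is a minimal sketch of BM25 scoring in plain Python. It is an illustration only, not the course's code: in the course OpenSearch computes relevance internally, and the function name `bm25_scores`, the whitespace tokenizer, and the sample documents are assumptions made for this example. The parameter values k1=1.2 and b=0.75 are the commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with BM25 (hypothetical helper).

    Assumes a non-empty list of non-empty documents and naive
    lowercase/whitespace tokenization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            tf = term_freq[term]
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    "hybrid retrieval with bm25",
    "vision transformers for images",
    "bm25 bm25 keyword search baseline",
]
scores = bm25_scores("bm25 retrieval", docs)
```

Documents that share no term with the query score zero, which is why keyword search alone can miss paraphrased matches and why the course adds embedding-based retrieval later.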

Use Cases

Academic research assistant: automatically fetch and index arXiv papers for exploratory search and Q&A.
Enterprise document search: convert document collections into QA-ready knowledge stores supporting hybrid retrieval and source attribution.
Teaching and engineering practice: a reproducible codebase and notebooks to learn production RAG architecture.

Technical Features

Retrieval-first design: emphasizes BM25 and keyword search as the backbone of production systems, augmented with vectors when needed.
Hybrid retrieval and chunking: section-aware chunking combined with Reciprocal Rank Fusion (RRF) of keyword and vector results improves both precision and semantic recall.
Local generation and observability: supports Ollama local LLMs, Gradio UI, Langfuse tracing, and Redis caching for performance and maintainability.
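
The RRF fusion mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the course's implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the two sample rankings are hypothetical, and k=60 is the constant commonly used for RRF. Each document receives 1 / (k + rank) from every ranked list it appears in, and the sums decide the fused order.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list (hypothetical helper)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword (BM25) query and a vector query.
bm25_hits = ["paper_a", "paper_b", "paper_c"]
vector_hits = ["paper_c", "paper_a", "paper_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and vector similarities, which live on different scales. Documents found by both retrievers rise above documents found by only one.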
