
arXiv Paper Curator

A six-week, hands-on course by Jam With AI that teaches how to build a production-ready Retrieval-Augmented Generation system.

Detailed Introduction

The arXiv Paper Curator is a six-week, hands-on course from Jam With AI aimed at engineers and researchers who want to build production-grade Retrieval-Augmented Generation (RAG) systems. The curriculum walks learners through infrastructure setup, arXiv paper ingestion and PDF parsing, BM25 indexing with OpenSearch, intelligent chunking and hybrid retrieval, local LLM integration via Ollama, and interactive interfaces with monitoring and caching. See the course page for more details.
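BM25 is the ranking function behind the keyword-search stage mentioned above. The sketch below is a minimal, pure-Python illustration of Okapi BM25 scoring, not the course's code; in the course, OpenSearch computes these scores internally. The function name `bm25_scores`, the naive whitespace tokenizer, and the parameter values k1 = 1.2 and b = 0.75 (common defaults) are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase/whitespace tokenizer; search engines use configurable analyzers instead.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (illustrative sketch)."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in set(tokenize(query)):  # each distinct query term contributes once
            if term not in tf:
                continue
            # Lucene-style IDF with +1 inside the log so it never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            numerator = tf[term] * (k1 + 1)
            denominator = tf[term] + k1 * (1 - b + b * dl / avgdl)
            score += idf * numerator / denominator
        scores.append(score)
    return scores


docs = [
    "retrieval augmented generation with bm25",
    "transformer language models",
    "hybrid retrieval combines bm25 and embeddings",
]
print(bm25_scores("bm25 retrieval", docs))
```

Documents without any query term score zero, and among documents with equal term counts the shorter one ranks higher, which is the length normalization controlled by `b`.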

Main Features

  • Week-by-week practical path from infrastructure (Docker, FastAPI, PostgreSQL) to production monitoring (Langfuse, Redis).
  • Full RAG engineering: a BM25 keyword-search foundation, followed by the progressive introduction of embedding-based retrieval and hybrid fusion.
  • Production-minded implementations: intelligent chunking, index optimization, streaming responses over Server-Sent Events (SSE), and a Gradio-based UI.
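One common form of the intelligent chunking listed above is section-aware chunking: split a parsed paper at its headings, then cut long sections into overlapping word windows so that no chunk crosses a section boundary. The sketch below assumes, hypothetically, that the PDF parser emits Markdown-style headings such as `## Results`; the function name `section_chunks` and the `max_words`/`overlap` values are illustrative and not taken from the course.

```python
import re

# Hypothetical convention: a line such as "## Results" starts a new section.
HEADING = re.compile(r"^(#{1,6} .*)$", re.MULTILINE)


def section_chunks(text: str, max_words: int = 200, overlap: int = 30) -> list[dict]:
    """Split text at headings, then cut long sections into overlapping word windows."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    # Splitting with a capturing group keeps the headings:
    # [preamble, heading1, body1, heading2, body2, ...]
    pieces = HEADING.split(text)
    sections = [("Preamble", pieces[0])]
    for i in range(1, len(pieces), 2):
        sections.append((pieces[i].lstrip("#").strip(), pieces[i + 1]))

    chunks = []
    step = max_words - overlap
    for title, body in sections:
        words = body.split()
        # Windows stay inside one section, so every chunk keeps its section label.
        for start in range(0, len(words), step):
            window = words[start:start + max_words]
            chunks.append({"section": title, "text": " ".join(window)})
            if start + max_words >= len(words):
                break
    return chunks


paper = "# Intro\nRAG combines retrieval and generation.\n# Method\n" + " ".join(
    f"w{i}" for i in range(25)
)
for chunk in section_chunks(paper, max_words=10, overlap=2):
    print(chunk["section"], "|", chunk["text"])
```

The overlap repeats a few words across neighboring windows so a sentence cut at a boundary still appears whole in one chunk, and the section label can be stored alongside each chunk for filtering or source attribution.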

Use Cases

  • Academic research assistant: automatically fetch and index arXiv papers for exploratory search and Q&A.
  • Enterprise document search: convert document collections into QA-ready knowledge stores supporting hybrid retrieval and source attribution.
  • Teaching and engineering practice: a reproducible codebase and notebooks to learn production RAG architecture.
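For the research-assistant use case, papers have to be fetched before they can be indexed. A minimal sketch, assuming the public arXiv API at `http://export.arxiv.org/api/query`, which returns an Atom feed, is shown below; it is not necessarily how the course implements ingestion. The helper names (`arxiv_query_url`, `parse_entries`, `fetch_recent`) and the choice of fields are illustrative, and only `fetch_recent` touches the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # namespace used by the arXiv Atom feed


def arxiv_query_url(category: str, max_results: int = 10, start: int = 0) -> str:
    """Build a query URL for the newest papers in an arXiv category, e.g. "cs.AI"."""
    params = {
        "search_query": f"cat:{category}",
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_entries(atom_xml: bytes) -> list[dict]:
    """Extract id, title, and abstract from each <entry> in an Atom feed."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        papers.append({
            "id": entry.findtext(f"{ATOM}id", "").strip(),
            # Titles in the feed can contain line breaks; collapse all whitespace.
            "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
            "summary": entry.findtext(f"{ATOM}summary", "").strip(),
        })
    return papers


def fetch_recent(category: str, max_results: int = 10) -> list[dict]:
    """Network call; arXiv's API guidelines ask clients to space out repeated requests."""
    with urlopen(arxiv_query_url(category, max_results)) as response:
        return parse_entries(response.read())


print(arxiv_query_url("cs.AI", max_results=5))
```

Separating URL construction and feed parsing from the network call keeps both pieces testable offline; the parsed abstracts (or the PDFs they link to) would then go through parsing, chunking, and indexing.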

Technical Features

  • Retrieval-first design: treats BM25 keyword search as the backbone of production systems, augmented with vector search when needed.
  • Hybrid retrieval and chunking: section-aware chunking and hybrid fusion with Reciprocal Rank Fusion (RRF) aim to combine the precision of keyword matching with the semantic recall of embeddings.
  • Local generation and observability: supports Ollama local LLMs, Gradio UI, Langfuse tracing, and Redis caching for performance and maintainability.
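The RRF-based hybrid fusion mentioned above merges ranked lists, such as one from BM25 and one from vector search, using only rank positions: each document scores the sum of 1 / (k + rank) over the lists it appears in. A minimal sketch follows; the function name and the document IDs are illustrative, and k = 60 is the constant commonly used in RRF rather than a value confirmed for this course.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a document missing from a list gets nothing from it.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_hits = ["a", "b", "c"]    # hypothetical keyword-search ranking
vector_hits = ["b", "c", "d"]  # hypothetical embedding-search ranking
for doc_id, score in reciprocal_rank_fusion([bm25_hits, vector_hits]):
    print(doc_id, round(score, 5))
```

Because RRF ignores raw scores, it needs no normalization between BM25 scores and cosine similarities, which live on different scales; documents found by both retrievers (here `b` and `c`) rise to the top.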
Resource Info
📚 RAG 💾 Data 🛠️ Dev Tools 📖 Tutorial 🌱 Open Source