Detailed Introduction
The arXiv Paper Curator is a six-week, hands-on course from Jam With AI aimed at engineers and researchers who want to build production-grade Retrieval-Augmented Generation (RAG) systems. The curriculum walks learners through infrastructure setup, paper ingestion and PDF parsing, OpenSearch/BM25 indexing, intelligent chunking and hybrid retrieval, local LLM integration (Ollama), and interactive interfaces with monitoring and caching. See the course page for more details.
Main Features
- Week-by-week practical path from infrastructure (Docker, FastAPI, PostgreSQL) to production monitoring (Langfuse, Redis).
- End-to-end RAG engineering: a BM25 keyword-search foundation, followed by the progressive introduction of embedding-based retrieval and hybrid fusion.
- Production-minded implementations: intelligent chunking, index optimization, responses streamed over Server-Sent Events (SSE), and a Gradio-based UI.
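To make the BM25 keyword-search foundation mentioned above more concrete, here is a minimal, illustrative sketch of BM25 scoring in pure Python. It is not the course's code: in the course, OpenSearch computes BM25 internally. The function name `bm25_scores`, the toy documents, and the parameter values `k1=1.2` and `b=0.75` (commonly used defaults) are assumptions for illustration, and the IDF formula shown is one common variant.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with BM25.

    `docs` is a non-empty list of token lists. Hypothetical helper,
    written for illustration only.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is length-normalized (b).
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Toy corpus (assumed data, pre-tokenized by whitespace).
docs = [
    "hybrid retrieval for scientific papers".split(),
    "keyword search with bm25 ranking".split(),
    "local llm generation".split(),
]
scores = bm25_scores(["bm25", "search"], docs)
best = max(range(len(docs)), key=lambda i: scores[i])
```

Only the second document contains the query terms, so it is the only one with a non-zero score. A real index would also apply analyzers (lowercasing, stemming, stop-word removal) before scoring.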
Use Cases
- Academic research assistant: automatically fetch and index arXiv papers for exploratory search and Q&A.
- Enterprise document search: convert document collections into QA-ready knowledge stores supporting hybrid retrieval and source attribution.
- Teaching and engineering practice: a reproducible codebase and notebooks to learn production RAG architecture.
Technical Features
- Retrieval-first design: emphasizes BM25 and keyword search as the backbone of production systems, augmented with vectors when needed.
- Hybrid retrieval and chunking: section-aware chunking and hybrid fusion via Reciprocal Rank Fusion (RRF) combine keyword precision with semantic recall.
- Local generation and observability: supports Ollama local LLMs, Gradio UI, Langfuse tracing, and Redis caching for performance and maintainability.
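As a rough illustration of the hybrid fusion step listed above, the sketch below merges a keyword ranking and a vector ranking with Reciprocal Rank Fusion. It is a minimal sketch under stated assumptions, not the course's implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the smoothing constant `k=60` (a value often used for RRF) are all chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Hypothetical helper for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Assumed example inputs: top hits from a BM25 query and a vector query.
bm25_hits = ["paper_a", "paper_b", "paper_c"]
vector_hits = ["paper_c", "paper_a", "paper_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['paper_a', 'paper_c', 'paper_b', 'paper_d']
```

Because RRF uses only rank positions, it does not need BM25 scores and vector similarities to be on a comparable scale, which is one reason it is a popular choice for hybrid search. Documents found by both retrievers (`paper_a`, `paper_c`) rise above those found by only one.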