Core Concepts
The mental model before you build.
Pipeline stages
RAG-Forge treats a RAG pipeline as a sequence of discrete stages. Each stage has its own CLI command, its own configuration knobs, and its own observability span.
| Stage | Command | What it does |
|---|---|---|
| Ingest & parse | index --source | Pulls documents, extracts text and structure |
| Chunk | chunk | Splits parsed content into retrieval units |
| Index | index | Embeds chunks and writes them to a vector store |
| Query | query | Retrieves chunks and generates an answer |
| Evaluate | audit, assess | Measures retrieval and answer quality against a golden set |
You can run stages independently (e.g. chunk alone for experimentation) or in the standard index → query flow.
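The stage boundaries in the table above can be pictured as plain functions that hand data to one another. The sketch below is a toy illustration, not RAG-Forge's code: the function names mirror the stage names, but every signature, the Chunk type, and the word-overlap "index" (standing in for embeddings and a vector store) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    text: str


def parse(raw_docs: dict[str, str]) -> dict[str, str]:
    """Ingest & parse: collapse runs of whitespace in each raw document."""
    return {doc_id: " ".join(text.split()) for doc_id, text in raw_docs.items()}


def chunk(parsed: dict[str, str], size: int = 5) -> list[Chunk]:
    """Chunk: split each document into windows of `size` words."""
    chunks: list[Chunk] = []
    for doc_id, text in parsed.items():
        words = text.split()
        for start in range(0, len(words), size):
            chunks.append(Chunk(doc_id, " ".join(words[start:start + size])))
    return chunks


def index(chunks: list[Chunk]) -> dict[str, list[Chunk]]:
    """Index: toy inverted index keyed by lowercase word.

    A real pipeline would embed chunks and write them to a vector store.
    """
    inverted: dict[str, list[Chunk]] = {}
    for c in chunks:
        for word in set(c.text.lower().split()):
            inverted.setdefault(word, []).append(c)
    return inverted


def query(inverted: dict[str, list[Chunk]], question: str) -> list[Chunk]:
    """Query: return chunks that share at least one word with the question."""
    hits: list[Chunk] = []
    for word in question.lower().split():
        for c in inverted.get(word, []):
            if c not in hits:
                hits.append(c)
    return hits
```

Because each stage only depends on the previous stage's output, chunk(parse(docs)) can be run and inspected on its own, which is the same property that lets the chunk command be used alone for experimentation.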
Configuration
Every scaffolded project keeps its RAG-Forge configuration in the [tool.rag-forge] section of its pyproject.toml. This section is the single source of truth for chunking strategy, retrieval settings, and provider choices. The one exception is evaluation: thresholds for the CI audit gate live alongside the golden set in eval/config.yaml. RAG-Forge validates the configuration with Pydantic at init time, so misconfigurations fail fast, before any documents are touched.
The RAG Maturity Model (RMM)
The RMM is a 0-to-5 scoring ladder for RAG pipelines. RAG-Forge measures your pipeline against it on every audit run, turning “improve RAG quality” into a concrete, measurable goal with a clear next step at each level. See RAG Maturity Model for the full breakdown.
Provider model — bring your own keys
RAG-Forge is provider-agnostic. Embedding, generation, and reranking providers are all user-configurable. Every command that calls an external model accepts an explicit provider flag (e.g. --embedding openai, --generator claude). You are never locked to a specific vendor, and local/free options (Ollama, mock providers) are always available.
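One common way to get this kind of provider independence is a small interface plus a name-to-implementation registry, so that a flag value such as the one passed to --embedding selects an implementation at runtime. The sketch below shows the pattern under assumed names: EmbeddingProvider, MockEmbedding, and get_embedding_provider are illustrative and not taken from the RAG-Forge codebase.

```python
import hashlib
from typing import Protocol


class EmbeddingProvider(Protocol):
    """Anything with an `embed` method of this shape counts as a provider."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class MockEmbedding:
    """Offline, deterministic provider: derives a small vector from a hash."""

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale the first `dim` bytes of the digest into the range [0, 1].
            vectors.append([b / 255 for b in digest[: self.dim]])
        return vectors


# Registry keyed by the name a user would pass on the command line.
EMBEDDING_PROVIDERS = {"mock": MockEmbedding}


def get_embedding_provider(name: str) -> EmbeddingProvider:
    """Look up a provider by name and fail with a helpful message if unknown."""
    try:
        factory = EMBEDDING_PROVIDERS[name]
    except KeyError:
        known = ", ".join(sorted(EMBEDDING_PROVIDERS))
        raise ValueError(f"unknown embedding provider {name!r}; known: {known}") from None
    return factory()
```

Adding a vendor then means registering one more class. Code that calls embed does not change, and the mock entry keeps tests runnable without API keys.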
Observability by default
Every pipeline stage emits an OpenTelemetry span with standardised attributes. Plug in Langfuse or any OTEL-compatible backend and you get request-level tracing with zero extra instrumentation code. See Observability.
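To show what a per-stage span carries, the sketch below records a name, a set of attributes, and a duration for each stage. It deliberately does not use the OpenTelemetry API: it is a dependency-free stand-in that mimics the shape of a span. The rag.<stage> naming and the attribute keys are assumptions, not RAG-Forge's standardised attributes.

```python
import time
from contextlib import contextmanager

# In-memory sink standing in for an exporter to an OTEL-compatible backend.
FINISHED_SPANS: list[dict] = []


@contextmanager
def stage_span(stage: str, **attributes):
    """Record one span-like dict per pipeline stage."""
    span = {"name": f"rag.{stage}", "attributes": dict(attributes)}
    start = time.perf_counter()
    try:
        yield span
    except Exception as exc:
        # Tag the span with the failure type, then let the error propagate.
        span["attributes"]["error.type"] = type(exc).__name__
        raise
    finally:
        span["duration_s"] = time.perf_counter() - start
        FINISHED_SPANS.append(span)


# Usage: wrap a stage and attach attributes discovered while it runs.
with stage_span("chunk", strategy="fixed") as span:
    pieces = ["a b", "c d"]
    span["attributes"]["chunk.count"] = len(pieces)
```

With a real tracer, the context manager would be replaced by the tracer's own span context and the list by a configured exporter. The calling code inside each stage would look much the same.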
Language boundaries
The CLI and MCP server are TypeScript (Node 20+). All RAG logic — chunking, retrieval, evaluation, tracing — is Python 3.11+. The CLI delegates to Python via a subprocess bridge (uv run python -m <module>). This means you can use the CLI from any Node project without caring about Python internals, and you can also call the Python packages directly if you prefer.
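The Python half of a subprocess bridge like this can be as small as a module that parses command-line arguments and writes a machine-readable result to stdout for the caller to read. The sketch below assumes a JSON-over-stdout contract and a hypothetical chunking module with --text and --size flags. The source does not describe the actual wire format or module names.

```python
import argparse
import json
import sys


def run(argv: list[str]) -> str:
    """Handle one bridge call: parse arguments, do the work, return JSON text."""
    parser = argparse.ArgumentParser(prog="python -m hypothetical_module")
    parser.add_argument("--text", required=True, help="content to split")
    parser.add_argument("--size", type=int, default=5, help="words per chunk")
    args = parser.parse_args(argv)

    words = args.text.split()
    chunks = [
        " ".join(words[i:i + args.size])
        for i in range(0, len(words), args.size)
    ]
    # JSON on stdout is an assumed contract; it keeps parsing trivial for the caller.
    return json.dumps({"chunks": chunks})


def main() -> None:
    """Entry point; a real module would call this under `if __name__ == "__main__":`."""
    sys.stdout.write(run(sys.argv[1:]) + "\n")
```

Keeping the logic in run, separate from the entry point, also lets Python users import and call the same function directly without going through the CLI.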