
rag-forge query

Execute a RAG query against the indexed pipeline.

Synopsis

rag-forge query <question> [options]

Description

query takes a natural-language question, retrieves the most relevant chunks from the vector store, and passes them to a language model to generate a grounded answer. It prints the answer, the model used, and the retrieved source chunks with their relevance scores.

The command supports three retrieval strategies: dense (vector similarity only), sparse (BM25 keyword matching), and hybrid (a Reciprocal Rank Fusion blend of both). Hybrid retrieval requires --sparse-index-path to point to the sparse index created during rag-forge index; --alpha sets the dense/sparse weighting.
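To make the hybrid strategy concrete, here is a minimal Python sketch of an alpha-weighted Reciprocal Rank Fusion. It is an illustration, not rag-forge's implementation: the function name weighted_rrf, the smoothing constant k=60, and the choice to weight the dense list by alpha and the sparse list by 1 - alpha are all assumptions, since this page does not specify how --alpha is applied.

```python
from collections import defaultdict


def weighted_rrf(dense_ranked, sparse_ranked, alpha=0.6, k=60):
    """Fuse two ranked lists of chunk IDs with alpha-weighted RRF.

    Assumption: alpha weights the dense list and (1 - alpha) the sparse
    list; k is the usual RRF smoothing constant. Neither detail is
    confirmed by the rag-forge documentation.
    """
    scores = defaultdict(float)
    # Each list contributes weight / (k + rank), with ranks starting at 1.
    for rank, chunk_id in enumerate(dense_ranked, start=1):
        scores[chunk_id] += alpha / (k + rank)
    for rank, chunk_id in enumerate(sparse_ranked, start=1):
        scores[chunk_id] += (1.0 - alpha) / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


if __name__ == "__main__":
    dense = ["a", "b", "c"]   # hypothetical chunk IDs from vector search
    sparse = ["c", "a", "d"]  # hypothetical chunk IDs from BM25
    print(weighted_rrf(dense, sparse, alpha=0.6))
```

Under these assumptions, a chunk that appears near the top of both lists outranks one that appears in only a single list, and moving alpha toward 1.0 makes the fused order follow the dense ranking more closely.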

Optional guardrails (--input-guard, --output-guard) can block queries or answers that violate safety rules. If a query is blocked, the command exits gracefully with a warning rather than an error.

Semantic caching (--cache) avoids duplicate LLM calls for queries that are sufficiently similar to one already answered; --cache-similarity sets the similarity threshold. Agent mode (--agent-mode) enables multi-query decomposition, in which complex questions are broken into sub-queries before retrieval.
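The caching behaviour described above can be sketched as a lookup over stored query embeddings. This is a hedged illustration only: the SemanticCache class, its get and put methods, and the in-memory list are hypothetical and do not describe rag-forge's internals. The defaults mirror the documented --cache-similarity (0.95) and --cache-ttl (3600 seconds) values.

```python
import math
import time


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by query embedding."""

    def __init__(self, similarity=0.95, ttl=3600.0):
        self.similarity = similarity  # minimum cosine similarity for a hit
        self.ttl = ttl                # entry lifetime in seconds
        self._entries = []            # (embedding, answer, stored_at)

    def put(self, embedding, answer, now=None):
        now = time.time() if now is None else now
        self._entries.append((list(embedding), answer, now))

    def get(self, embedding, now=None):
        now = time.time() if now is None else now
        # Drop entries older than the TTL before searching.
        self._entries = [e for e in self._entries if now - e[2] < self.ttl]
        best = max(self._entries,
                   key=lambda e: cosine(embedding, e[0]),
                   default=None)
        if best is not None and cosine(embedding, best[0]) >= self.similarity:
            return best[1]  # cache hit: reuse the stored answer
        return None         # cache miss: caller would invoke the LLM


if __name__ == "__main__":
    cache = SemanticCache()
    cache.put([1.0, 0.0], "Refunds are issued within 30 days.", now=0)
    print(cache.get([1.0, 0.05], now=10))  # near-identical query embedding
```

In this sketch, a higher similarity threshold trades fewer cache hits for a lower risk of returning an answer to a question that only looks similar.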

Arguments

Argument    Required    Description
question    Yes         The question to ask the RAG pipeline

Options

Flag                               Default    Description
-k, --top-k <number>               5          Number of chunks to retrieve
-e, --embedding <provider>         mock       Embedding provider: openai | local | mock
-g, --generator <provider>         mock       Generation provider: claude | openai | mock
-c, --collection <name>            rag-forge  Collection name to query
--strategy <type>                  dense      Retrieval strategy: dense | sparse | hybrid
--alpha <number>                   0.6        RRF alpha weighting for hybrid retrieval (0.0–1.0)
--reranker <type>                  none       Reranker: none | cohere | bge-local
--sparse-index-path <path>                    Path to BM25 sparse index
--input-guard                                 Enable input security guard
--output-guard                                Enable output security guard
--faithfulness-threshold <number>  0.85       Faithfulness score threshold (0.0–1.0)
--rate-limit <number>              60         Max queries per minute
--cache                                       Enable semantic query caching
--cache-ttl <seconds>              3600       Cache TTL in seconds
--cache-similarity <threshold>     0.95       Cosine similarity threshold for cache hits
--agent-mode                                  Enable multi-query decomposition

Examples

Basic query with mock providers

rag-forge query "What is the refund policy?"

Production query with Claude and OpenAI embeddings

rag-forge query "Summarise the onboarding process" \
  --embedding openai \
  --generator claude \
  --collection my-project

Hybrid retrieval with reranking

rag-forge query "How do I reset my password?" \
  --strategy hybrid \
  --sparse-index-path ./bm25.pkl \
  --reranker cohere \
  --top-k 10

Agentic multi-query decomposition

rag-forge query "Compare our pricing tiers and explain the enterprise SLA" \
  --agent-mode \
  --generator claude