rag-forge chunk

Preview chunking without indexing.

Synopsis

rag-forge chunk [options]

Description

chunk runs the parse and chunking stages, then prints statistics without writing anything to the vector store. The statistics are the total chunk count, the average, minimum, and maximum chunk size in tokens, the total token count, and a sample of chunk previews. Like parse, it is a dry-run diagnostic tool.
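To make the reported figures concrete, here is a minimal sketch of how such statistics could be derived from a list of chunks. It is not rag-forge's implementation: the function names chunk_stats and count_tokens are hypothetical, and whitespace-separated words stand in for tokens, so real token counts from a model tokenizer will differ.

```python
from statistics import mean


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    # A real tokenizer would give different counts.
    return len(text.split())


def chunk_stats(chunks: list[str], preview_chars: int = 40, samples: int = 2) -> dict:
    """Summarize chunk sizes and keep a few truncated previews."""
    sizes = [count_tokens(c) for c in chunks]
    if not sizes:
        return {"chunks": 0, "total_tokens": 0, "avg": 0, "min": 0, "max": 0, "previews": []}
    return {
        "chunks": len(sizes),
        "total_tokens": sum(sizes),
        "avg": mean(sizes),
        "min": min(sizes),
        "max": max(sizes),
        # Previews are the first `samples` chunks, cut to `preview_chars` characters.
        "previews": [c[:preview_chars] for c in chunks[:samples]],
    }


stats = chunk_stats(["alpha beta gamma", "delta epsilon", "zeta eta theta iota"])
print(stats)
```

A wide gap between the minimum and maximum sizes in output like this is usually the first sign that a strategy or chunk size is a poor fit for the content.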

Use chunk to tune your chunking strategy and chunk size before committing to a full index run. Different strategies suit different content types: fixed is predictable, recursive handles most prose well and is the default, semantic groups sentences that are related in meaning, structural respects document structure (headings, code fences), and llm-driven uses a language model to decide split points.
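The difference between the fixed and recursive strategies can be illustrated with a small sketch. This is a generic illustration of the two ideas, not the code rag-forge runs: fixed_chunks and recursive_chunks are hypothetical names, tokens are approximated by whitespace-separated words, and the separator list is an assumption. The recursive version here also drops the separators it splits on and does not merge small pieces back together, which a production splitter typically would.

```python
def fixed_chunks(text: str, chunk_size: int) -> list[str]:
    """Cut the text every `chunk_size` words, ignoring sentence or paragraph boundaries."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def recursive_chunks(text: str, chunk_size: int,
                     separators: tuple[str, ...] = ("\n\n", "\n", ". ")) -> list[str]:
    """Split on the coarsest separator first; recurse into pieces that are still too large."""
    if len(text.split()) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No natural boundary left: fall back to a fixed-size cut.
        return fixed_chunks(text, chunk_size)
    head, rest = separators[0], separators[1:]
    out: list[str] = []
    for piece in text.split(head):
        out.extend(recursive_chunks(piece, chunk_size, rest))
    return out


text = "Intro line one.\n\nFirst sentence here. Second sentence follows here.\n\nTail."
print(fixed_chunks(text, 4))
print(recursive_chunks(text, 4))
```

In this sample the fixed cut produces chunks that straddle paragraph boundaries, while the recursive split keeps each paragraph or sentence intact at the cost of more uneven sizes.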

The command delegates to the Python rag_forge_core.cli module.

Options

Flag                       Default     Description
-s, --source <directory>   ./docs      Source directory to chunk
--strategy <type>          recursive   Chunking strategy: fixed | recursive | semantic | structural | llm-driven
--chunk-size <tokens>      -           Target chunk size in tokens

Examples

Preview with default settings

rag-forge chunk

Try semantic chunking on a custom source

rag-forge chunk --source ./data --strategy semantic

Tune chunk size

rag-forge chunk --strategy fixed --chunk-size 256