rag-forge chunk
Preview chunking without indexing.
Synopsis
rag-forge chunk [options]
Description
chunk runs the parse and chunking stages and prints statistics (total chunk count, average/min/max token sizes, total token count, and a sample of chunk previews) without writing anything to the vector store. Like parse, it is a dry-run diagnostic tool.
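To make the reported statistics concrete, here is a minimal sketch of how such a summary could be computed from a list of chunks. It is not rag-forge's implementation: `chunk_stats` and its parameters are hypothetical names, and tokens are approximated by whitespace-separated words rather than a real tokenizer.

```python
from statistics import mean

def chunk_stats(chunks, preview_chars=40, sample=2):
    """Summarize text chunks (hypothetical helper, not part of rag-forge).

    Token counts are approximated by whitespace-separated words.
    """
    sizes = [len(c.split()) for c in chunks]
    return {
        "chunk_count": len(chunks),
        "total_tokens": sum(sizes),
        "avg_tokens": mean(sizes) if sizes else 0,
        "min_tokens": min(sizes, default=0),
        "max_tokens": max(sizes, default=0),
        # Show only the start of the first few chunks.
        "previews": [c[:preview_chars] for c in chunks[:sample]],
    }

chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
stats = chunk_stats(chunks)
print(stats)
```

A wide gap between the minimum and maximum sizes in a summary like this is usually the first hint that the strategy or chunk size needs adjusting.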
Use chunk to tune your chunking strategy and chunk size before committing to a full index run. Different strategies suit different content types: fixed is predictable, recursive handles most prose well and is the default, semantic groups semantically related sentences, structural respects document structure (headings, code fences), and llm-driven uses a language model to decide split points.
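The difference between strategies is easiest to see on a small input. The sketch below contrasts a fixed-size split with a heading-aware split. It only illustrates the general idea: `fixed_chunks` and `heading_chunks` are hypothetical helpers, tokens are approximated by whitespace-separated words, and rag-forge's actual strategies may behave differently.

```python
import re

def fixed_chunks(text, chunk_size):
    """Split into consecutive windows of at most chunk_size whitespace tokens."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]

def heading_chunks(text):
    """Split before each Markdown heading line so every chunk starts at a heading."""
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [p.strip() for p in parts if p.strip()]

doc = "# Intro\nRAG pipelines parse then chunk.\n# Usage\nRun the preview first."

fixed = fixed_chunks(doc, 4)
structural = heading_chunks(doc)

for label, result in (("fixed", fixed), ("structural", structural)):
    print(label, len(result), result)
```

With this input the fixed split yields four chunks, and the second one ends with a stray `#` that has been separated from its section. The heading-aware split yields two chunks, one per section. This is the kind of trade-off the preview is meant to surface.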
The command delegates to the Python rag_forge_core.cli module.
Options
| Flag | Default | Description |
|---|---|---|
| `-s, --source <directory>` | `./docs` | Source directory to chunk |
| `--strategy <type>` | `recursive` | Chunking strategy: `fixed`, `recursive`, `semantic`, `structural`, or `llm-driven` |
| `--chunk-size <tokens>` | — | Target chunk size in tokens |
Examples
Preview with default settings
rag-forge chunk
Try semantic chunking on a custom source
rag-forge chunk --source ./data --strategy semantic
Tune chunk size
rag-forge chunk --strategy fixed --chunk-size 256
Related commands
rag-forge parse: run just the extraction stage
rag-forge index: run the full pipeline after you are happy with chunk output