# Prompt Caching
Prompt caching means the model provider can reuse unchanged prompt prefixes (usually system/developer instructions and other stable context) across turns instead of re-processing them every time. The first matching request writes cache tokens (cacheWrite), and later matching requests can read them back (cacheRead).
Why this matters: lower token cost, faster responses, and more predictable performance for long-running sessions. Without caching, repeated prompts pay the full prompt cost on every turn even when most input did not change.
This page covers all cache-related knobs that affect prompt reuse and token cost.
For Anthropic pricing details, see: https://docs.anthropic.com/docs/build-with-claude/prompt-caching
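To make the savings concrete, here is a rough back-of-the-envelope sketch. The multipliers are illustrative assumptions (providers typically price cache writes at a premium over plain input tokens and cache reads at a steep discount); use the pricing page above for real numbers:

```python
# Illustrative cache-savings arithmetic, in plain-input-token cost units.
# write_mult/read_mult are ASSUMED values for the sketch, not provider pricing.
def turn_cost(stable_tokens, fresh_tokens, cached, write_mult=1.25, read_mult=0.1):
    """Relative input cost of one turn."""
    if cached:
        return stable_tokens * read_mult + fresh_tokens
    return stable_tokens * write_mult + fresh_tokens  # first turn writes the cache

# Hypothetical 10-turn session: 8,000 stable prefix tokens, 500 fresh per turn.
uncached = sum(8_000 + 500 for _ in range(10))  # no caching: full prompt every turn
cached = turn_cost(8_000, 500, cached=False) + sum(
    turn_cost(8_000, 500, cached=True) for _ in range(9)
)
print(f"uncached: {uncached:,.0f}  cached: {cached:,.0f}")  # -> uncached: 85,000  cached: 22,200
```

Under these assumed multipliers the cached session costs roughly a quarter of the uncached one, and the gap grows with prefix size and session length.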
## Primary knobs
### `cacheRetention` (per-agent)

Set cache retention in per-agent params:

```yaml
agents:
  list:
    - id: "research"
      params:
        cacheRetention: "short" # none | short | long
    - id: "alerts"
      params:
        cacheRetention: "none"
```

### Legacy `cacheControlTtl`
Legacy values are still accepted and mapped:
- `5m` → `short`
- `1h` → `long`

Prefer `cacheRetention` for new config.
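For reference, a legacy config using `cacheControlTtl` (agent id here is illustrative) is treated as if it had set the mapped `cacheRetention` value:

```yaml
# Legacy form (still accepted):
agents:
  list:
    - id: "research"
      params:
        cacheControlTtl: "1h" # mapped to cacheRetention: "long"
```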
### `contextPruning.mode: "cache-ttl"`
Prunes old tool-result context after cache TTL windows so post-idle requests do not re-cache oversized history.
```yaml
agents:
  defaults:
    contextPruning:
      mode: "cache-ttl"
      ttl: "1h"
```

See Session Pruning for full behavior.
### Heartbeat keep-warm
Heartbeat can keep cache windows warm and reduce repeated cache writes after idle gaps.
```yaml
agents:
  defaults:
    heartbeat:
      every: "55m"
```

Per-agent heartbeat is supported at `agents.list[].heartbeat`.
## Provider behavior
### Anthropic (direct API)
- `cacheRetention` is supported.
- When no explicit `cacheRetention` is set for an Anthropic model ref, RemoteClaw defaults to `"short"`.
### Amazon Bedrock
- Anthropic Claude model refs on Bedrock support `cacheRetention`; the CLI agent passes the value through to the Bedrock API.
- Non-Anthropic Bedrock models do not support prompt caching; `cacheRetention` has no effect on them.
### OpenRouter Anthropic models
For `openrouter/anthropic/*` model refs, the CLI agent applies Anthropic `cache_control` markers to system/developer prompt blocks to improve prompt-cache reuse.
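As a sketch of what that marking looks like at the Anthropic API level (the `cache_control` field shape follows Anthropic's Messages API; the instruction text is a placeholder):

```python
# Sketch of an Anthropic-style system block carrying a cache_control marker.
# Marking a stable block lets the provider cache the prompt prefix up to and
# including that block, so later turns can read it back instead of re-processing.
system_blocks = [
    {
        "type": "text",
        "text": "You are a research assistant...",  # stable instructions
        "cache_control": {"type": "ephemeral"},     # mark prefix as cacheable
    }
]
```

Volatile content (timestamps, per-turn state) should stay out of marked blocks, since any change to the prefix invalidates the cache.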
### Other providers
Whether `cacheRetention` has any effect depends on the CLI agent and the underlying model API. For providers without prompt-caching support, the setting is silently ignored.
## Tuning patterns
### Mixed traffic (recommended default)
Keep a long-lived baseline on your main agent, and disable caching on bursty notifier agents:
```yaml
agents:
  list:
    - id: "research"
      default: true
      params:
        cacheRetention: "long"
      heartbeat:
        every: "55m"
    - id: "alerts"
      params:
        cacheRetention: "none"
```

### Cost-first baseline
- Set baseline `cacheRetention: "short"`.
- Enable `contextPruning.mode: "cache-ttl"`.
- Keep heartbeat below your TTL only for agents that benefit from warm caches.
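The three bullets above combine into a config along these lines (agent id and the 55m heartbeat are illustrative, and this sketch assumes per-agent `params` defaults can be set under `agents.defaults`):

```yaml
agents:
  defaults:
    params:
      cacheRetention: "short"
    contextPruning:
      mode: "cache-ttl"
      ttl: "1h"
  list:
    - id: "research"
      heartbeat:
        every: "55m" # below the 1h TTL; only where warm caches pay off
```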
## Cache diagnostics
RemoteClaw exposes dedicated cache-trace diagnostics for agent runs.
### `diagnostics.cacheTrace` config
```yaml
diagnostics:
  cacheTrace:
    enabled: true
    filePath: "~/.remoteclaw/logs/cache-trace.jsonl" # optional
    includeMessages: false # default true
    includePrompt: false # default true
    includeSystem: false # default true
```

Defaults:

- `filePath`: `$REMOTECLAW_STATE_DIR/logs/cache-trace.jsonl`
- `includeMessages`: `true`
- `includePrompt`: `true`
- `includeSystem`: `true`
### Env toggles (one-off debugging)
- `REMOTECLAW_CACHE_TRACE=1` enables cache tracing.
- `REMOTECLAW_CACHE_TRACE_FILE=/path/to/cache-trace.jsonl` overrides the output path.
- `REMOTECLAW_CACHE_TRACE_MESSAGES=0|1` toggles full message payload capture.
- `REMOTECLAW_CACHE_TRACE_PROMPT=0|1` toggles prompt text capture.
- `REMOTECLAW_CACHE_TRACE_SYSTEM=0|1` toggles system prompt capture.
### What to inspect
- Cache trace events are JSONL and include staged snapshots like `session:loaded`, `prompt:before`, `stream:context`, and `session:after`.
- Per-turn cache token impact is visible in normal usage surfaces via `cacheRead` and `cacheWrite` (for example, `/usage full` and session usage summaries).
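For a quick offline look at a trace file, a short script can tally `cacheRead`/`cacheWrite` across events. The event field layout here (`usage` object with those two keys) is an assumption for illustration; adjust it to match what your trace file actually contains:

```python
import json

# Tally cache token totals from cache-trace JSONL lines.
# NOTE: the {"usage": {"cacheRead": ..., "cacheWrite": ...}} shape is assumed
# for this sketch; inspect a real trace line and adapt the field names.
def tally_cache_tokens(lines):
    reads = writes = 0
    for line in lines:
        event = json.loads(line)
        usage = event.get("usage", {})
        reads += usage.get("cacheRead", 0)
        writes += usage.get("cacheWrite", 0)
    return reads, writes

# Hypothetical sample: first turn writes the cache, second turn reads it back.
sample = [
    '{"stage": "prompt:before", "usage": {"cacheRead": 0, "cacheWrite": 8000}}',
    '{"stage": "prompt:before", "usage": {"cacheRead": 8000, "cacheWrite": 0}}',
]
print(tally_cache_tokens(sample))  # -> (8000, 8000)
```

A healthy steady state shows `cacheRead` dominating after the first turn; persistent `cacheWrite` suggests the prefix keeps changing.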
## Quick troubleshooting
- High `cacheWrite` on most turns: check for volatile system-prompt inputs and verify that your model/provider supports prompt caching.
- No effect from `cacheRetention`: confirm the setting is present in the agent's `params` block and that the CLI agent and provider support it.
- Non-Anthropic Bedrock models ignore `cacheRetention`; this is expected.
Related docs: