Inference CLI
remoteclaw infer is the canonical headless surface for provider-backed inference workflows.
It intentionally exposes capability families, not raw gateway RPC names and not raw agent tool ids.
Turn infer into a skill
Copy and paste this to an agent:
    Read https://docs.remoteclaw.org/cli/infer, then create a skill that routes my common workflows to `remoteclaw infer`.
    Focus on model runs, image generation, video generation, audio transcription, TTS, web search, and embeddings.
A good infer-based skill should:
- map common user intents to the correct infer subcommand
- include a few canonical infer examples for the workflows it covers
- prefer remoteclaw infer ... in examples and suggestions
- avoid re-documenting the entire infer surface inside the skill body
Typical infer-focused skill coverage:
- remoteclaw infer model run
- remoteclaw infer image generate
- remoteclaw infer audio transcribe
- remoteclaw infer tts convert
- remoteclaw infer web search
- remoteclaw infer embedding create
Why use infer
remoteclaw infer provides one consistent CLI for provider-backed inference tasks inside RemoteClaw.
Benefits:
- Use the providers and models already configured in RemoteClaw instead of wiring up one-off wrappers for each backend.
- Keep model, image, audio transcription, TTS, video, web, and embedding workflows under one command tree.
- Use a stable --json output shape for scripts, automation, and agent-driven workflows.
- Prefer a first-party RemoteClaw surface when the task is fundamentally “run inference.”
- Use the normal local path without requiring the gateway for most infer commands.
Command tree
    remoteclaw infer
      list
      inspect
      model
        run
        list
        inspect
        providers
        auth login
        auth logout
        auth status
      image
        generate
        edit
        describe
        describe-many
        providers
      audio
        transcribe
        providers
      tts
        convert
        voices
        providers
        status
        enable
        disable
        set-provider
      video
        generate
        describe
        providers
      web
        search
        fetch
        providers
      embedding
        create
        providers
Common tasks
This table maps common inference tasks to the corresponding infer command.
| Task | Command | Notes |
|---|---|---|
| Run a text/model prompt | remoteclaw infer model run --prompt "..." --json | Uses the normal local path by default |
| Generate an image | remoteclaw infer image generate --prompt "..." --json | Use image edit when starting from an existing file |
| Describe an image file | remoteclaw infer image describe --file ./image.png --json | --model must be <provider/model> |
| Transcribe audio | remoteclaw infer audio transcribe --file ./memo.m4a --json | --model must be <provider/model> |
| Synthesize speech | remoteclaw infer tts convert --text "..." --output ./speech.mp3 --json | tts status is gateway-oriented |
| Generate a video | remoteclaw infer video generate --prompt "..." --json | |
| Describe a video file | remoteclaw infer video describe --file ./clip.mp4 --json | --model must be <provider/model> |
| Search the web | remoteclaw infer web search --query "..." --json | |
| Fetch a web page | remoteclaw infer web fetch --url https://example.com --json | |
| Create embeddings | remoteclaw infer embedding create --text "..." --json | |
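For a skill or script that routes intents to commands, the table above can be encoded as a small lookup. The following Python sketch is illustrative only: the task keys (for example "web.search") and the helper name build_argv are hypothetical labels, chosen to resemble the capability value shown in the JSON envelope; only the commands and flags come from the table.

```python
from typing import Dict, List, Optional, Tuple

# task key -> (capability family, subcommand, flag that carries the main input).
# The keys are hypothetical labels; the commands and flags mirror the table.
TASKS: Dict[str, Tuple[str, str, str]] = {
    "model.run": ("model", "run", "--prompt"),
    "image.generate": ("image", "generate", "--prompt"),
    "image.describe": ("image", "describe", "--file"),
    "audio.transcribe": ("audio", "transcribe", "--file"),
    "tts.convert": ("tts", "convert", "--text"),
    "video.generate": ("video", "generate", "--prompt"),
    "video.describe": ("video", "describe", "--file"),
    "web.search": ("web", "search", "--query"),
    "web.fetch": ("web", "fetch", "--url"),
    "embedding.create": ("embedding", "create", "--text"),
}


def build_argv(task: str, value: str, extra: Optional[List[str]] = None) -> List[str]:
    """Return the argv list for one infer call, always ending in --json."""
    family, subcommand, flag = TASKS[task]  # raises KeyError for an unmapped task
    return ["remoteclaw", "infer", family, subcommand, flag, value, *(extra or []), "--json"]
```

The returned list can be handed to subprocess.run as-is, without a shell, so prompts containing quotes or spaces need no extra escaping. Additional flags such as --output for tts convert go in through extra.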
Behavior
- remoteclaw infer ... is the primary CLI surface for these workflows.
- Use --json when the output will be consumed by another command or script.
- Use --provider or --model provider/model when a specific backend is required.
- For image describe, audio transcribe, and video describe, --model must use the form <provider/model>.
- Stateless execution commands default to local.
- Gateway-managed state commands default to gateway.
- The normal local path does not require the gateway to be running.
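The backend-selection rules above can be checked before a command is launched. This is a minimal Python sketch, not part of RemoteClaw: the function names are hypothetical, and the check assumes only that <provider/model> means a non-empty provider, a slash, and a non-empty model name.

```python
from typing import List, Optional

# Commands for which this page requires --model in <provider/model> form.
STRICT_MODEL_COMMANDS = {
    ("image", "describe"),
    ("audio", "transcribe"),
    ("video", "describe"),
}


def is_provider_model(ref: str) -> bool:
    """True when ref looks like provider/model (assumed shape, see above)."""
    provider, slash, model = ref.partition("/")
    return bool(provider and slash and model)


def backend_flags(
    family: str,
    subcommand: str,
    provider: Optional[str] = None,
    model: Optional[str] = None,
) -> List[str]:
    """Build the optional --provider / --model flags for one infer command."""
    flags: List[str] = []
    if provider:
        flags += ["--provider", provider]
    if model:
        strict = (family, subcommand) in STRICT_MODEL_COMMANDS
        if strict and not is_provider_model(model):
            raise ValueError(
                f"{family} {subcommand} needs --model as <provider/model>, got {model!r}"
            )
        flags += ["--model", model]
    return flags
```

For example, backend_flags("audio", "transcribe", model="whisper-1") raises ValueError, while model="openai/whisper-1" yields the two-element flag list. Commands outside the strict set are passed through unchecked, since this page does not state a required form for them.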
Model
Use model for provider-backed text inference and model/provider inspection.
    remoteclaw infer model run --prompt "Reply with exactly: smoke-ok" --json
    remoteclaw infer model run --prompt "Summarize this changelog entry" --provider openai --json
    remoteclaw infer model providers --json
    remoteclaw infer model inspect --name gpt-5.4 --json
Notes:
- model run reuses the agent runtime so provider/model overrides behave like normal agent execution.
- model auth login, model auth logout, and model auth status manage saved provider auth state.
Image
Use image for generation, edit, and description.
    remoteclaw infer image generate --prompt "friendly lobster illustration" --json
    remoteclaw infer image generate --prompt "cinematic product photo of headphones" --json
    remoteclaw infer image describe --file ./photo.jpg --json
    remoteclaw infer image describe --file ./ui-screenshot.png --model openai/gpt-4.1-mini --json
Notes:
- Use image edit when starting from existing input files.
- For image describe, --model must be <provider/model>.
Audio
Use audio for file transcription.
    remoteclaw infer audio transcribe --file ./memo.m4a --json
    remoteclaw infer audio transcribe --file ./team-sync.m4a --language en --prompt "Focus on names and action items" --json
    remoteclaw infer audio transcribe --file ./memo.m4a --model openai/whisper-1 --json
Notes:
- audio transcribe is for file transcription, not realtime session management.
- --model must be <provider/model>.
TTS
Use tts for speech synthesis and TTS provider state.
    remoteclaw infer tts convert --text "hello from remoteclaw" --output ./hello.mp3 --json
    remoteclaw infer tts convert --text "Your build is complete" --output ./build-complete.mp3 --json
    remoteclaw infer tts providers --json
    remoteclaw infer tts status --json
Notes:
- tts status defaults to gateway because it reflects gateway-managed TTS state.
- Use tts providers, tts voices, and tts set-provider to inspect and configure TTS behavior.
Video
Use video for generation and description.
    remoteclaw infer video generate --prompt "cinematic sunset over the ocean" --json
    remoteclaw infer video generate --prompt "slow drone shot over a forest lake" --json
    remoteclaw infer video describe --file ./clip.mp4 --json
    remoteclaw infer video describe --file ./clip.mp4 --model openai/gpt-4.1-mini --json
Notes:
- --model must be <provider/model> for video describe.
Web
Use web for search and fetch workflows.
    remoteclaw infer web search --query "RemoteClaw docs" --json
    remoteclaw infer web search --query "RemoteClaw infer web providers" --json
    remoteclaw infer web fetch --url https://docs.remoteclaw.org/cli/infer --json
    remoteclaw infer web providers --json
Notes:
- Use web providers to inspect available, configured, and selected providers.
Embedding
Use embedding for vector creation and embedding provider inspection.
    remoteclaw infer embedding create --text "friendly lobster" --json
    remoteclaw infer embedding create --text "customer support ticket: delayed shipment" --model openai/text-embedding-3-large --json
    remoteclaw infer embedding providers --json
JSON output
Infer commands normalize JSON output under a shared envelope:
    {
      "ok": true,
      "capability": "image.generate",
      "transport": "local",
      "provider": "openai",
      "model": "gpt-image-1",
      "attempts": [],
      "outputs": []
    }
Top-level fields are stable:
- ok
- capability
- transport
- provider
- model
- attempts
- outputs
- error
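Because the top-level fields are stable, a script can load the envelope into a typed structure and branch on ok. The Python sketch below is one possible way to do that; the class and function names are hypothetical, and the contents of attempts, outputs, and error are left opaque because this page does not describe their inner shape.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class InferEnvelope:
    ok: bool
    capability: Optional[str]
    transport: Optional[str]
    provider: Optional[str]
    model: Optional[str]
    attempts: List[Any]  # entry shape not documented here, kept opaque
    outputs: List[Any]   # entry shape not documented here, kept opaque
    error: Any           # None when the key is absent


def parse_envelope(text: str) -> InferEnvelope:
    """Parse the stdout of an infer command that was run with --json."""
    data = json.loads(text)
    if not isinstance(data, dict) or "ok" not in data:
        raise ValueError("not an infer JSON envelope")
    return InferEnvelope(
        ok=bool(data["ok"]),
        capability=data.get("capability"),
        transport=data.get("transport"),
        provider=data.get("provider"),
        model=data.get("model"),
        attempts=list(data.get("attempts") or []),
        outputs=list(data.get("outputs") or []),
        error=data.get("error"),
    )


# The sample envelope from this section, as a single JSON string.
SAMPLE = (
    '{"ok": true, "capability": "image.generate", "transport": "local", '
    '"provider": "openai", "model": "gpt-image-1", "attempts": [], "outputs": []}'
)

envelope = parse_envelope(SAMPLE)
print(envelope.capability, envelope.transport, envelope.ok)
# prints: image.generate local True
```

Reading only the top-level keys keeps the script tolerant of whatever each capability places inside outputs.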
Common pitfalls
    # Bad
    remoteclaw infer media image generate --prompt "friendly lobster"

    # Good
    remoteclaw infer image generate --prompt "friendly lobster"

    # Bad
    remoteclaw infer audio transcribe --file ./memo.m4a --model whisper-1 --json

    # Good
    remoteclaw infer audio transcribe --file ./memo.m4a --model openai/whisper-1 --json
Notes
remoteclaw capability ... is an alias for remoteclaw infer ....
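The alias and the first pitfall (an extra media segment) can both be caught by a small pre-flight check. The Python sketch below is an assumption-laden illustration: the tree is transcribed by hand from the Command tree section and may lag behind the real CLI, and check_infer_argv is a hypothetical helper, not something RemoteClaw ships.

```python
from typing import Dict, List, Set

# Transcribed from the Command tree section; "auth" stands for the
# auth login / auth logout / auth status group under model.
INFER_TOP_LEVEL: Set[str] = {"list", "inspect"}
INFER_TREE: Dict[str, Set[str]] = {
    "model": {"run", "list", "inspect", "providers", "auth"},
    "image": {"generate", "edit", "describe", "describe-many", "providers"},
    "audio": {"transcribe", "providers"},
    "tts": {"convert", "voices", "providers", "status", "enable", "disable", "set-provider"},
    "video": {"generate", "describe", "providers"},
    "web": {"search", "fetch", "providers"},
    "embedding": {"create", "providers"},
}


def check_infer_argv(argv: List[str]) -> List[str]:
    """Validate an argv list and rewrite the capability alias to infer."""
    if len(argv) < 3 or argv[0] != "remoteclaw" or argv[1] not in ("infer", "capability"):
        raise ValueError("not a remoteclaw infer invocation")
    head, rest = argv[2], argv[3:]
    if head not in INFER_TOP_LEVEL:
        if head not in INFER_TREE:
            raise ValueError(f"unknown capability family: {head!r}")
        if not rest or rest[0] not in INFER_TREE[head]:
            raise ValueError(f"unknown or missing subcommand for {head!r}")
    return ["remoteclaw", "infer", head, *rest]
```

With this check, ["remoteclaw", "capability", "image", "generate", ...] is rewritten to the infer form, while ["remoteclaw", "infer", "media", "image", "generate", ...] raises ValueError because media is not a capability family.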