How ClawPipe Works
Every prompt passes through a pipeline of optimization stages. Most stages reduce cost, latency, or both; a few (RAG, Swarm) trade extra spend for higher quality or reliability. Stages that are not needed are skipped automatically.
1. Agent Booster
Skip the LLM entirely
The Booster applies deterministic transforms to resolve prompts that don't need AI. Math expressions, date calculations, JSON formatting, unit conversions, and UUID generation are handled instantly at zero cost.
When it fires: Every request passes through the Booster first. If it can resolve the prompt, the pipeline short-circuits and returns immediately.
Cost impact: 100% savings on boosted requests. Typical apps see 5-15% of prompts resolved by the Booster.
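As a sketch of the short-circuit idea, a deterministic resolver might look like the following. The function name and patterns here are illustrative assumptions, not ClawPipe's actual API; only two of the transform classes are shown.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical resolver: returns an answer for prompts that need no LLM,
// or null to let the request continue down the pipeline.
function tryBoost(prompt: string): string | null {
  const p = prompt.trim();
  // Pure arithmetic expressions, e.g. "12 * (3 + 4)"
  if (/^[\d\s+\-*/().]+$/.test(p) && /\d/.test(p)) {
    try {
      return String(Function(`"use strict"; return (${p});`)());
    } catch {
      return null; // malformed expression: not boostable after all
    }
  }
  // UUID generation
  if (/^generate (a )?uuid$/i.test(p)) return randomUUID();
  return null; // everything else needs a model
}
```

A `null` return is the "skip" signal: the request falls through to the next stage untouched.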
2. RAG Pipeline
Retrieval-augmented generation
Retrieves relevant documents from a pluggable vector store and prepends them as context. Supports any embedding provider and configurable document limits.
When it fires: When a RAG retriever is configured. Runs before packing to ensure retrieved documents are also compressed.
Cost impact: Increases input tokens but dramatically improves response quality, reducing the need for follow-up calls.
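In miniature, the retrieve-and-prepend step works like this sketch. The in-memory store, cosine ranking, and prompt template are assumptions made for illustration; the real pipeline uses a pluggable vector store and embedding provider.

```typescript
// A document paired with its precomputed embedding.
type Doc = { text: string; embedding: number[] };

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank the store by similarity to the prompt, keep the top `limit`
// documents, and prepend them as context.
function withContext(prompt: string, promptEmb: number[], store: Doc[], limit = 3): string {
  const top = [...store]
    .sort((x, y) => cosine(promptEmb, y.embedding) - cosine(promptEmb, x.embedding))
    .slice(0, limit)
    .map(d => d.text);
  return `Context:\n${top.join("\n")}\n\nQuestion: ${prompt}`;
}
```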
3. Context Packer
Compress context windows
Deduplicates repeated content, strips boilerplate, and compresses the remaining prompt. Typical savings: 20-60% fewer input tokens.
When it fires: On every request where enablePacker is true (default). Runs after RAG so retrieved documents are also compressed.
Cost impact: 20-60% reduction in input tokens. The savings percentage is reported in result.meta.contextSavings.
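One packing pass in miniature: line-level deduplication plus blank-line stripping. The savings figure below is character-based purely for illustration; the real packer reports token savings in result.meta.contextSavings.

```typescript
// Drop blank lines and exact repeats, then report how much was saved.
// (Function name and savings metric are assumptions for this sketch.)
function packContext(context: string): { packed: string; savings: number } {
  const seen = new Set<string>();
  const kept = context.split("\n").filter(line => {
    const key = line.trim();
    if (key === "" || seen.has(key)) return false; // drop blanks and repeats
    seen.add(key);
    return true;
  });
  const packed = kept.join("\n");
  return { packed, savings: 1 - packed.length / context.length };
}
```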
4. Semantic Cache
Prompt deduplication
Two-layer cache: hash-based exact matching and embedding-based semantic matching. Similar prompts return cached results in milliseconds. Configurable TTL and LRU eviction.
When it fires: After packing, before routing. Checks both the local in-memory cache and an optional KV-backed persistent cache.
Cost impact: 100% savings on cache hits. Cache hit rates of 15-40% are typical for production workloads.
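The two-layer lookup can be sketched as follows: an exact string match first, then an embedding-similarity scan. The 0.95 threshold, class shape, and method names are assumptions; TTL and LRU eviction are omitted for brevity.

```typescript
type CacheEntry = { embedding: number[]; response: string };

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private exact = new Map<string, string>();
  private entries: CacheEntry[] = [];

  get(prompt: string, embedding: number[], threshold = 0.95): string | null {
    const hit = this.exact.get(prompt); // layer 1: exact match
    if (hit !== undefined) return hit;
    for (const e of this.entries) {     // layer 2: semantic match
      if (cosine(embedding, e.embedding) >= threshold) return e.response;
    }
    return null; // miss: the request continues to routing
  }

  set(prompt: string, embedding: number[], response: string): void {
    this.exact.set(prompt, response);
    this.entries.push({ embedding, response });
  }
}
```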
5. Self-Learning Router
Smart model selection
Analyzes prompt complexity and selects the cheapest model that meets quality requirements. Considers cost, latency, and quality scores. Routing weights are updated after every call based on actual outcomes.
When it fires: On every non-cached, non-boosted request. Uses learned weights from the Learner stage.
Cost impact: 30-70% savings by routing simple prompts to cheaper models. Complex prompts still go to premium models.
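A hypothetical routing sketch: score the prompt's complexity, then pick the cheapest model whose quality score clears a bar that scales with that complexity. The model table, prices, heuristic, and bar are all illustrative assumptions; the real router uses learned weights from the Learner stage.

```typescript
type Model = { name: string; costPer1k: number; quality: number };

// Crude complexity proxy: long prompts and code/reasoning keywords
// read as complex. Returns a value in [0, 1].
function complexity(prompt: string): number {
  const keyword = /```|\bprove\b|\brefactor\b|\bdebug\b/i.test(prompt) ? 0.5 : 0;
  return Math.min(1, prompt.length / 2000 + keyword);
}

// Cheapest model that meets the quality bar; fall back to the full
// pool if nothing qualifies.
function route(prompt: string, models: Model[]): Model {
  const requiredQuality = 0.5 + 0.5 * complexity(prompt);
  const eligible = models.filter(m => m.quality >= requiredQuality);
  const pool = eligible.length > 0 ? eligible : models;
  return pool.reduce((a, b) => (a.costPer1k <= b.costPer1k ? a : b));
}
```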
6. Swarm Orchestration
Multi-model consensus
Fans out a prompt to N models in parallel. Four strategies: first (fastest response), vote (majority consensus), best (highest quality score), merge (combined responses).
When it fires: Only when explicitly configured with a Swarm instance. Used for high-stakes prompts where accuracy matters more than cost.
Cost impact: Increases cost (N model calls) but improves reliability and accuracy for critical decisions.
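The fan-out plus vote strategy can be sketched like this. The caller signature and tie-breaking rule are assumptions; the other strategies would differ only in how the collected answers are reduced.

```typescript
type Caller = (prompt: string) => Promise<string>;

// Reduce step for the "vote" strategy: most frequent answer wins
// (first-seen answer wins ties in this sketch).
function majorityVote(answers: string[]): string {
  const counts = new Map<string, number>();
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1);
  let winner = answers[0];
  let best = 0;
  for (const [answer, n] of counts) {
    if (n > best) { winner = answer; best = n; }
  }
  return winner;
}

// Fan out to all callers in parallel, then reduce.
async function swarmVote(prompt: string, callers: Caller[]): Promise<string> {
  const answers = await Promise.all(callers.map(call => call(prompt)));
  return majorityVote(answers);
}
```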
7. Multi-Provider Gateway
One API, every provider
Dispatches to OpenAI, Anthropic, DeepSeek, Groq, Mistral, and local models through a unified interface. Includes circuit breaker protection with configurable failure thresholds and automatic recovery.
When it fires: On every request that reaches the call stage. Handles provider-specific API formats, streaming, and error handling.
Cost impact: No direct savings, but automatic failover prevents downtime-related costs.
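The circuit breaker mentioned above follows standard breaker mechanics, sketched here in minimal form: trip open after N consecutive failures, then allow a probe call once the recovery window passes. The specific numbers and names are assumptions, not ClawPipe's defaults.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(private threshold = 3, private recoveryMs = 30_000) {}

  // Closed: allow calls. Open: block until the recovery window has
  // elapsed, then allow a single half-open probe.
  canCall(now: number = Date.now()): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.recoveryMs;
  }

  // A success closes the breaker; enough consecutive failures trip it.
  record(success: boolean, now: number = Date.now()): void {
    if (success) { this.failures = 0; this.openedAt = null; return; }
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}
```

When a provider's breaker is open, the gateway fails over to the next configured provider instead of waiting out the outage.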
8. Learner
Continuous improvement
Tracks every call's outcome (latency, token usage, cost) and updates the Router's weights. Weights are persisted to D1 so routing improves across sessions and deployments.
When it fires: After every successful gateway call. Updates are batched and persisted periodically.
Cost impact: Indirect savings through improved routing accuracy over time. Routing quality improves with volume.
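One plausible shape for the weight update is a per-model exponential moving average over observed outcomes, sketched below. The update rule, alpha, and weight shape are assumptions; the section above states only that weights are updated after every call and persisted to D1.

```typescript
type Outcome = { latencyMs: number; costUsd: number };
type Weights = Record<string, Outcome>;

// Blend each new observation into the running average for that model.
// The first observation for a model seeds its average.
function learn(weights: Weights, model: string, outcome: Outcome, alpha = 0.2): Weights {
  const prev = weights[model] ?? outcome;
  return {
    ...weights,
    [model]: {
      latencyMs: prev.latencyMs + alpha * (outcome.latencyMs - prev.latencyMs),
      costUsd: prev.costUsd + alpha * (outcome.costUsd - prev.costUsd),
    },
  };
}
```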
Enterprise Controls
In addition to the pipeline stages, ClawPipe provides enterprise-grade controls that run alongside the pipeline:
- Budget caps — hard and soft USD limits; hard caps reject calls that would exceed them, soft caps allow the call but emit a warning
- Rate limiting — per-project daily call limits matching your pricing tier
- Circuit breaker — automatic provider isolation on repeated failures
- Allowlist / denylist — restrict which providers and models can be used
- Audit logging — timestamped logs of every action for compliance
- Pipeline tracing — stage-by-stage timing data exportable as Perfetto JSON
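The budget-cap control above can be sketched as a pre-call check. The names and the projected-spend rule are assumptions for illustration.

```typescript
type Budget = { softUsd: number; hardUsd: number };
type Verdict = "ok" | "warn" | "reject";

// Decide the fate of the next call from projected total spend.
function checkBudget(spentUsd: number, nextCallUsd: number, budget: Budget): Verdict {
  const projected = spentUsd + nextCallUsd;
  if (projected > budget.hardUsd) return "reject"; // hard cap: block the call
  if (projected > budget.softUsd) return "warn";   // soft cap: allow, but warn
  return "ok";
}
```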