How ClawPipe Works

Every prompt passes through a pipeline of optimization stages. Each stage independently reduces cost, latency, or both. Stages that are not needed are skipped automatically.

Booster → RAG → Pack → Cache → Route → Swarm → Call → Learn

1. Agent Booster

Skip the LLM entirely

The Booster applies deterministic transforms to resolve prompts that don't need AI. Math expressions, date calculations, JSON formatting, unit conversions, and UUID generation are handled instantly at zero cost.

When it fires: Every request passes through the Booster first. If it can resolve the prompt, the pipeline short-circuits and returns immediately.

Cost impact: 100% savings on boosted requests. Typical apps see 5-15% of prompts resolved by the Booster.
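A minimal sketch of the boosting idea: try a deterministic transform, and fall through to the rest of the pipeline on a miss. The function name and the single arithmetic pattern are illustrative, not ClawPipe's actual API.

```typescript
// Hypothetical booster sketch: resolve simple arithmetic without an LLM.
function tryBoost(prompt: string): string | null {
  // Match simple binary arithmetic like "12 * 7" or "what is 2 + 3"
  const math = prompt.match(/(-?\d+(?:\.\d+)?)\s*([+\-*/])\s*(-?\d+(?:\.\d+)?)/);
  if (math) {
    const x = Number(math[1]);
    const y = Number(math[3]);
    const op = math[2];
    const result =
      op === "+" ? x + y :
      op === "-" ? x - y :
      op === "*" ? x * y : x / y;
    return String(result);
  }
  return null; // not boostable: continue down the pipeline
}
```

A `null` return is the short-circuit signal here; a real booster would carry many more transforms (dates, JSON, units, UUIDs) behind the same try-or-pass interface.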

2. RAG Pipeline

Retrieval-augmented generation

Retrieves relevant documents from a pluggable vector store and prepends them as context. Supports any embedding provider and configurable document limits.

When it fires: When a RAG retriever is configured. Runs before packing to ensure retrieved documents are also compressed.

Cost impact: Increases input tokens but dramatically improves response quality, reducing the need for follow-up calls.
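The retrieval step can be sketched as cosine-similarity ranking over stored embeddings, with the top-k documents prepended to the prompt. The `Doc` shape, function names, and prompt template are assumptions for illustration.

```typescript
// Hypothetical RAG sketch: rank documents by cosine similarity, prepend top-k.
interface Doc { text: string; embedding: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(query: number[], docs: Doc[], k: number): Doc[] {
  return [...docs]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k); // configurable document limit
}

function withContext(prompt: string, docs: Doc[]): string {
  const context = docs.map(d => d.text).join("\n");
  return `Context:\n${context}\n\nQuestion: ${prompt}`;
}
```

A pluggable vector store would replace the in-memory `docs` array, and any embedding provider can produce the `query` vector.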

3. Context Packer

Compress context windows

Removes redundancy, deduplicates repeated content, strips boilerplate, and compresses the prompt. Typical savings: 20-60% fewer input tokens.

When it fires: On every request where enablePacker is true (default). Runs after RAG so retrieved documents are also compressed.

Cost impact: 20-60% reduction in input tokens. The savings percentage is reported in result.meta.contextSavings.
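One of the packer's transforms, line-level deduplication with whitespace collapsing, can be sketched as follows; the returned `savings` ratio mirrors the idea behind `result.meta.contextSavings`, though the real packer applies several more transforms.

```typescript
// Hypothetical packer sketch: dedupe repeated lines, collapse whitespace,
// and report the fraction of characters saved.
function pack(context: string): { packed: string; savings: number } {
  const seen = new Set<string>();
  const lines = context
    .split("\n")
    .map(l => l.trim().replace(/\s+/g, " "))
    .filter(l => {
      if (l === "" || seen.has(l)) return false; // drop blanks and repeats
      seen.add(l);
      return true;
    });
  const packed = lines.join("\n");
  return { packed, savings: 1 - packed.length / context.length };
}
```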

4. Semantic Cache

Prompt deduplication

Two-layer cache: hash-based exact matching and embedding-based semantic matching. Similar prompts return cached results in milliseconds. Configurable TTL and LRU eviction.

When it fires: After packing, before routing. Checks both local in-memory cache and optional KV-backed persistent cache.

Cost impact: 100% savings on cache hits. Cache hit rates of 15-40% are typical for production workloads.
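The two-layer lookup can be sketched as an exact-key check followed by an embedding-similarity scan above a threshold. The class shape and the 0.95 threshold are illustrative; the real cache adds TTL and LRU eviction on top.

```typescript
// Hypothetical two-layer cache sketch: exact match first, semantic second.
interface Entry { key: string; embedding: number[]; response: string; }

class SemanticCache {
  private entries: Entry[] = [];
  constructor(private threshold = 0.95) {}

  get(key: string, embedding: number[]): string | null {
    // Layer 1: exact (hash-style) match on the normalized key
    const exact = this.entries.find(e => e.key === key);
    if (exact) return exact.response;
    // Layer 2: semantic match by embedding similarity
    for (const e of this.entries) {
      if (cosineSim(embedding, e.embedding) >= this.threshold) return e.response;
    }
    return null; // miss: proceed to routing
  }

  set(key: string, embedding: number[], response: string): void {
    this.entries.push({ key, embedding, response });
  }
}

function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

In production the entries array would be backed by the local in-memory cache plus the optional KV persistent layer.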

5. Self-Learning Router

Smart model selection

Analyzes prompt complexity and selects the cheapest model that meets quality requirements. Considers cost, latency, and quality scores. Routing weights are updated after every call based on actual outcomes.

When it fires: On every non-cached, non-boosted request. Uses learned weights from the Learner stage.

Cost impact: 30-70% savings by routing simple prompts to cheaper models. Complex prompts still go to premium models.
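The routing decision can be sketched as: score prompt complexity, derive a quality floor, then take the cheapest model that clears it. The model names, prices, scores, and the complexity heuristic below are made-up placeholders, not ClawPipe's learned weights.

```typescript
// Hypothetical router sketch: cheapest model that meets a quality floor.
interface Model { name: string; costPer1k: number; quality: number; }

const MODELS: Model[] = [
  { name: "small",  costPer1k: 0.1, quality: 0.6  },
  { name: "medium", costPer1k: 0.5, quality: 0.8  },
  { name: "large",  costPer1k: 3.0, quality: 0.95 },
];

function complexity(prompt: string): number {
  // Crude heuristic: longer prompts and reasoning markers score higher
  const len = Math.min(prompt.length / 2000, 1);
  const markers = /prove|analyze|refactor|step by step/i.test(prompt) ? 0.3 : 0;
  return Math.min(len + markers, 1);
}

function route(prompt: string): Model {
  const required = 0.5 + complexity(prompt) * 0.45; // quality floor
  const eligible = MODELS
    .filter(m => m.quality >= required)
    .sort((a, b) => a.costPer1k - b.costPer1k);
  return eligible[0] ?? MODELS[MODELS.length - 1]; // fall back to premium
}
```

The self-learning part replaces the static `quality` scores with weights updated by the Learner after each call.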

6. Swarm Orchestration

Multi-model consensus

Fans out a prompt to N models in parallel. Four strategies: first (fastest response), vote (majority consensus), best (highest quality score), merge (combine responses).

When it fires: Only when explicitly configured with a Swarm instance. Used for high-stakes prompts where accuracy matters more than cost.

Cost impact: Increases cost (N model calls) but improves reliability and accuracy for critical decisions.
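The "vote" strategy can be sketched as a parallel fan-out followed by a majority count; the caller signature is an assumption, not ClawPipe's Swarm interface.

```typescript
// Hypothetical swarm "vote" sketch: fan out in parallel, return the
// most frequent response.
type Caller = (prompt: string) => Promise<string>;

async function swarmVote(prompt: string, callers: Caller[]): Promise<string> {
  const responses = await Promise.all(callers.map(c => c(prompt)));
  const counts = new Map<string, number>();
  for (const r of responses) counts.set(r, (counts.get(r) ?? 0) + 1);
  // Majority consensus: highest count wins
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```

The other strategies swap only the reduction step: `first` races with `Promise.race`, `best` sorts by a quality score, `merge` concatenates or summarizes.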

7. Multi-Provider Gateway

One API, every provider

Dispatches to OpenAI, Anthropic, DeepSeek, Groq, Mistral, and local models through a unified interface. Includes circuit breaker protection with configurable failure thresholds and automatic recovery.

When it fires: On every request that reaches the call stage. Handles provider-specific API formats, streaming, and error handling.

Cost impact: No direct savings, but automatic failover prevents downtime-related costs.
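The circuit-breaker protection can be sketched as a per-provider counter that opens after a failure threshold and half-opens after a cooldown; the threshold and cooldown values here are illustrative, standing in for the configurable ones.

```typescript
// Hypothetical circuit-breaker sketch: open after N consecutive failures,
// allow a trial call after the cooldown elapses.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(private threshold = 3, private cooldownMs = 30_000) {}

  canCall(now: number = Date.now()): boolean {
    if (this.openedAt === null) return true; // closed: calls allowed
    if (now - this.openedAt >= this.cooldownMs) {
      this.openedAt = null; // half-open: permit a recovery probe
      this.failures = 0;
      return true;
    }
    return false; // open: isolate the provider
  }

  recordSuccess(): void { this.failures = 0; }

  recordFailure(now: number = Date.now()): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}
```

When a provider's breaker is open, the gateway fails over to the next eligible provider instead of returning an error.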

8. Learner

Continuous improvement

Tracks every call's outcome (latency, token usage, cost) and updates the Router's weights. Weights are persisted to D1 so routing improves across sessions and deployments.

When it fires: After every successful gateway call. Updates are batched and persisted periodically.

Cost impact: Indirect savings through improved routing accuracy over time. Routing quality improves with volume.
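One way to picture the weight update is an exponential moving average over a per-model reward derived from each call's outcome. The reward formula, the `alpha` smoothing factor, and the `Outcome` shape are assumptions for illustration; in production the resulting weights would be the ones persisted to D1.

```typescript
// Hypothetical learner sketch: EMA of a per-model reward from call outcomes.
interface Outcome { model: string; costUsd: number; latencyMs: number; ok: boolean; }

class Learner {
  private weights = new Map<string, number>();
  constructor(private alpha = 0.2) {}

  update(o: Outcome): void {
    // Reward cheap, fast, successful calls; failed calls score zero
    const reward = o.ok ? 1 / (1 + o.costUsd + o.latencyMs / 1000) : 0;
    const prev = this.weights.get(o.model) ?? reward;
    this.weights.set(o.model, (1 - this.alpha) * prev + this.alpha * reward);
  }

  weight(model: string): number { return this.weights.get(model) ?? 0; }
}
```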

Enterprise Controls

In addition to the pipeline stages, ClawPipe provides enterprise-grade controls that run alongside the pipeline:

  • Budget caps — hard and soft USD limits that reject or warn on exceeded budgets
  • Rate limiting — per-project daily call limits matching your pricing tier
  • Circuit breaker — automatic provider isolation on repeated failures
  • Allowlist / denylist — restrict which providers and models can be used
  • Audit logging — timestamped logs of every action for compliance
  • Pipeline tracing — stage-by-stage timing data exportable as Perfetto JSON
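The hard/soft budget-cap behavior from the list above can be sketched as a pre-call check: reject when the projected spend would cross the hard limit, warn past the soft limit. The `Budget` shape and return values are illustrative assumptions.

```typescript
// Hypothetical budget-cap sketch: hard limit rejects, soft limit warns.
interface Budget { spentUsd: number; softUsd: number; hardUsd: number; }

function checkBudget(b: Budget, nextCostUsd: number): "ok" | "warn" | "reject" {
  const projected = b.spentUsd + nextCostUsd;
  if (projected > b.hardUsd) return "reject"; // hard cap: block the call
  if (projected > b.softUsd) return "warn";   // soft cap: allow with warning
  return "ok";
}
```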