Alchymos
MCP-native Edge Caching

Caching for AI agents: faster, cheaper, visible.

The simplest way to improve performance, reduce infrastructure costs, and increase reliability by offloading your AI agents' repetitive traffic.

npm install @alchymos/sdk

Interactive demo coming soon

Experience Alchymos in action

Multi-layer Caching

Simple Process, Powerful Results

Get started in minutes and see the difference our platform can make for your business.

Using CDN only

Alchymos provides edge caching for global low-latency access, ideal for read-heavy, mostly static outputs. It complements in-server memory caches, delivering faster responses with acceptable eventual consistency for most AI/MCP tools.

Full layered caching

Alchymos combines in-memory, regional/distributed, and CDN edge caching to deliver ultra-low latency at global scale. Dynamic outputs flow through Redis while static or read-heavy data sits at the edge, with TTL and optional invalidation ensuring acceptable consistency.

Full layered caching with multi-region

Alchymos layers caching from local in-memory to regional Redis and global CDN/edge to minimize latency. Most requests are served from cache, with the origin backend only hit on full misses — keeping responses fast and server load low.
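The layered lookup described above can be sketched as a fallthrough: each layer is checked in order of latency, and only a full miss reaches the origin, which then backfills every layer. This is an illustrative sketch using in-process maps as stand-ins for memory, Redis, and edge tiers, not the Alchymos SDK itself.

```typescript
// Each "layer" stands in for one tier: in-memory, regional Redis, CDN edge.
type Layer = { name: string; store: Map<string, string> };

const memory: Layer = { name: "memory", store: new Map() };
const regional: Layer = { name: "regional", store: new Map() };
const edge: Layer = { name: "edge", store: new Map() };
const layers = [memory, regional, edge];

// Stand-in for the origin MCP backend (hypothetical).
function origin(key: string): string {
  return `origin-result-for-${key}`;
}

function get(key: string): { value: string; source: string } {
  // Check layers from cheapest to most distant.
  for (const layer of layers) {
    const hit = layer.store.get(key);
    if (hit !== undefined) return { value: hit, source: layer.name };
  }
  // Full miss: hit the origin, then populate every layer on the way back.
  const value = origin(key);
  for (const layer of layers) layer.store.set(key, value);
  return { value, source: "origin" };
}

const first = get("tool:weather?city=Oslo");  // full miss, served by origin
const second = get("tool:weather?city=Oslo"); // served from in-memory layer
console.log(first.source, second.source);
```

Backfilling all layers on a miss is what keeps the origin hit rate low: every subsequent request, from any region, lands on a warm tier.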

Alchymos Mechanics

Say goodbye to unnecessary agent load and say hello to savings, all while delivering lightning-fast agent performance.

Global edge distribution

Serve cached MCP responses from regions close to the user/agent to minimize latency

Support for authenticated/private caching

Cache scoped per license / user / server so data privacy is maintained
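Scoped caching works by folding the scope into the cache key, so identical tool calls from different licenses or users never share an entry. A minimal sketch of this idea (the key format and field names here are illustrative, not the actual Alchymos scheme):

```typescript
// Scope fields that partition the cache; names are hypothetical.
interface Scope {
  licenseId: string;
  userId?: string;
  serverId: string;
}

function cacheKey(
  scope: Scope,
  tool: string,
  params: Record<string, string>
): string {
  // Sort params so logically identical calls map to one key.
  const normalized = Object.keys(params)
    .sort()
    .map((k) => `${k}=${params[k]}`)
    .join("&");
  return [scope.licenseId, scope.userId ?? "anon", scope.serverId, tool, normalized].join("|");
}

const a = cacheKey({ licenseId: "lic1", userId: "u1", serverId: "s1" }, "search", { q: "mcp" });
const b = cacheKey({ licenseId: "lic1", userId: "u2", serverId: "s1" }, "search", { q: "mcp" });
console.log(a !== b); // different users get separate cache entries
```

Because the scope is part of the key, privacy falls out of the data model: a cache hit can only ever return data written under the same license, user, and server.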

Fine-grained caching rules

Configure which tools or request types are cacheable: by tool name, by parameter, and more
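One way such rules can be expressed is as an ordered list matched against each request; the first matching rule wins. The rule shape below (`tool` patterns, `ttlSeconds`, an `unlessParam` escape hatch) is a hypothetical sketch, not the Alchymos rule syntax.

```typescript
// Hypothetical per-tool caching rules, evaluated top to bottom.
interface CacheRule {
  tool: string | RegExp;     // exact name or pattern
  cache: boolean;
  ttlSeconds?: number;
  unlessParam?: string;      // bypass cache if this parameter is present
}

const rules: CacheRule[] = [
  { tool: /^get_/, cache: true, ttlSeconds: 300 },
  { tool: "search_docs", cache: true, ttlSeconds: 60, unlessParam: "nocache" },
  { tool: /.*/, cache: false }, // default: bypass
];

function decide(tool: string, params: Record<string, unknown>): CacheRule {
  for (const rule of rules) {
    const match =
      typeof rule.tool === "string" ? rule.tool === tool : rule.tool.test(tool);
    if (!match) continue;
    if (rule.unlessParam && rule.unlessParam in params) {
      return { tool, cache: false };
    }
    return rule;
  }
  return { tool, cache: false };
}

console.log(decide("get_weather", {}).cache);             // true
console.log(decide("search_docs", { nocache: 1 }).cache); // false
```

A catch-all bypass rule at the end is a sensible default: nothing is cached unless a rule explicitly opts it in.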

Automatic invalidation

When data changes (tool output, resource update), purge affected cache entries

Purge API

Manual or programmatic purges of cached keys / tool results
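Automatic invalidation and the purge API share one underlying idea: index cached entries by the resources they depend on, so a change, or a manual purge call, removes exactly the affected keys. A self-contained sketch of that indexing (the structures are illustrative, not the Alchymos API):

```typescript
// Cache plus a reverse index from resource -> dependent cache keys.
const cache = new Map<string, string>();
const byResource = new Map<string, Set<string>>();

function put(key: string, value: string, resources: string[]): void {
  cache.set(key, value);
  for (const r of resources) {
    if (!byResource.has(r)) byResource.set(r, new Set());
    byResource.get(r)!.add(key);
  }
}

// Automatic invalidation: a resource changed, purge everything that used it.
// A manual purge API can call the same function.
function invalidateResource(resource: string): number {
  const keys = byResource.get(resource) ?? new Set<string>();
  for (const k of keys) cache.delete(k);
  byResource.delete(resource);
  return keys.size;
}

put("tool:list_issues", "[...]", ["repo:alpha"]);
put("tool:issue_count", "42", ["repo:alpha"]);
put("tool:list_users", "[...]", ["org:acme"]);

const purged = invalidateResource("repo:alpha");
console.log(purged, cache.has("tool:list_users"));
```

The reverse index is what makes purges surgical: unrelated entries (here, `tool:list_users`) stay warm while everything touching the changed resource is evicted.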

Metrics & analytics

Monitor cache hit rate, miss rate, latency, traffic saved
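Hit rate, the headline number on any cache dashboard, is just hits over total lookups; "traffic saved" follows the same shape with bytes instead of counts. A minimal sketch of the bookkeeping:

```typescript
// Minimal hit/miss counters of the kind a cache dashboard aggregates.
interface Metrics {
  hits: number;
  misses: number;
}

const m: Metrics = { hits: 0, misses: 0 };

function record(hit: boolean): void {
  hit ? m.hits++ : m.misses++;
}

function hitRate(): number {
  const total = m.hits + m.misses;
  return total === 0 ? 0 : m.hits / total;
}

// Three hits and one miss -> 75% hit rate.
[true, true, true, false].forEach(record);
console.log(hitRate());
```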

Cache rule preview / debugging

See which requests are being cached or bypassed, and why

Partial cache / split caching

For large responses or tools with sub-components, cache individual parts of the data independently

Traffic shaping / scope rules

Bypass the cache for certain request types (errors, dynamic content, or "hot" real-time data)

Response time SLAs / performance guarantees

Ensure cache layers bring down p50/p95 latencies to acceptable thresholds
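The p50/p95 figures behind such guarantees are percentiles over a window of latency samples. A sketch using the nearest-rank method (the sample values and thresholds here are illustrative):

```typescript
// Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// A window of response times in milliseconds.
const latenciesMs = [12, 14, 15, 18, 22, 25, 30, 45, 80, 200];

console.log(percentile(latenciesMs, 50)); // median latency
console.log(percentile(latenciesMs, 95)); // tail latency
```

p95 is the more demanding target: a cache can have an excellent median while slow misses dominate the tail, which is why layered caching aims to keep the miss path rare.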

Feel the difference in seconds

With three lines of code, Alchymos cuts your AI agents' costs and latency. Get started today.