Cortex
Workflow-Aware Resource Pooling and Scheduling for Agentic Serving
Cortex is a prototype workflow-aware serving platform designed for agentic workloads. Its core principle is stage isolation: Cortex provisions a dedicated resource pool for each distinct stage of an agentic workflow. This strategy mitigates inter-stage interference in compute and memory, leading to better KV cache utilization, higher throughput, and more predictable performance. By customizing resource allocation and scheduling within each stage, Cortex lays the groundwork for more advanced, agent-native serving paradigms, including malleable resource management, speculative execution of workflow branches, and a shared, multi-tiered cache for “agentic state.”
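To make the idea of stage isolation concrete, here is a minimal sketch of per-stage pools with independent queues. It is an illustration only, not Cortex's implementation: the class names `StagePool` and `StageRouter`, the stage names `plan` and `tool_call`, and the fixed worker counts are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StagePool:
    """Hypothetical pool of workers dedicated to a single workflow stage."""
    stage: str
    num_workers: int
    queue: deque = field(default_factory=deque)
    busy: int = 0

    def submit(self, request_id: str) -> None:
        # Requests wait only behind other requests of the same stage.
        self.queue.append(request_id)

    def dispatch(self) -> list[str]:
        # Start queued requests while this pool still has idle workers.
        started = []
        while self.queue and self.busy < self.num_workers:
            started.append(self.queue.popleft())
            self.busy += 1
        return started

    def complete(self) -> None:
        # Free one worker; never drop below zero.
        self.busy = max(0, self.busy - 1)


class StageRouter:
    """Hypothetical router that keeps one isolated pool per stage."""

    def __init__(self, pool_sizes: dict[str, int]) -> None:
        self.pools = {s: StagePool(s, n) for s, n in pool_sizes.items()}

    def submit(self, stage: str, request_id: str) -> None:
        self.pools[stage].submit(request_id)

    def complete(self, stage: str) -> None:
        self.pools[stage].complete()

    def dispatch_all(self) -> dict[str, list[str]]:
        # Each pool is scheduled independently of the others.
        return {s: p.dispatch() for s, p in self.pools.items()}


if __name__ == "__main__":
    router = StageRouter({"plan": 1, "tool_call": 2})
    for rid in ["p1", "p2", "p3"]:
        router.submit("plan", rid)
    router.submit("tool_call", "t1")
    # The backlog in "plan" does not delay the "tool_call" request.
    print(router.dispatch_all())  # {'plan': ['p1'], 'tool_call': ['t1']}
```

In this toy model a backlog in one stage cannot consume another stage's workers, which is the kind of inter-stage interference the description above aims to avoid. A real system would also have to account for GPU memory and KV cache placement, which this sketch leaves out.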
Contributors
- Nikos Pagonas
- Yeounoh Chung
- Kostis Kaffes
- Arvind Krishnamurthy
Publications
- Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving. 1st Workshop on Systems for Agentic AI (SAA 2025), 2025.