kGym & kAgent

A Platform and Agent to patch Linux kernel bugs

Posted: May 13, 2025

Tags: Linux Kernel, LLM Agents, Automated Crash Repair, Systems

Overview

Debugging the Linux kernel remains one of the most challenging problems in systems software. With over 20 million lines of code, thousands of contributors, and hardware-level concurrency, crashes can be subtle, nondeterministic, and hard to interpret. Traditional debugging techniques often fail to scale to this diversity and complexity.

To address this, we built an agentic framework that uses Large Language Models (LLMs) to reason about, reproduce, and fix kernel-level bugs. The ecosystem has three parts (kBench, kGym, and kAgent) that combine structured datasets, experimental automation, and LLM-driven reasoning to close the loop between bug observation and repair.

Components

Name	Description
kBench	A curated benchmark of Linux kernel bugs, each paired with developer-provided fixes and deterministic reproduction scripts. Enables systematic evaluation of patching strategies.
kGym	A sandboxed, large-scale kernel experimentation platform capable of booting and testing thousands of kernel configurations in parallel. Provides execution traces, crash states, and verification environments for patches.
kAgent	An LLM-based autonomous agent that runs experiments in kGym, interprets crash logs, hypothesizes code changes, and iteratively validates patches until a verified fix is achieved.
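
As a rough illustration of what a benchmark entry pairs together, one bug could be modeled as a small record. The class name and field names below are assumptions made for this sketch, not the actual kBench schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkBug:
    """One hypothetical kBench-style entry; all field names are illustrative."""
    bug_id: str
    kernel_commit: str   # commit at which the crash reproduces
    reproducer: str      # script or program that triggers the crash
    crash_report: str    # raw console log captured at crash time
    developer_fix: str   # diff of the fix written by kernel developers


def is_complete(bug: BenchmarkBug) -> bool:
    """An entry is usable for evaluation only if every part is present."""
    return all([bug.kernel_commit, bug.reproducer,
                bug.crash_report, bug.developer_fix])
```

Keeping the developer fix next to the reproducer is what lets a candidate patch be judged both by execution (does the crash go away?) and by comparison with the known fix.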

Approach

The system adopts a hypothesis-driven debugging workflow:

1. Observe a kernel crash and extract structured traces and logs.
2. Generate hypotheses for potential fault locations using LLM reasoning.
3. Propose and apply plausible patches to the codebase.
4. Validate candidate patches within kGym until the crash is fully resolved.
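
The first step, turning a raw crash report into something structured, can be sketched as below. This is a minimal example that assumes the common console format of a `BUG:` or `WARNING:` headline followed by a `Call Trace:` section with `function+0xoffset/0xsize` frames. The names `CrashSummary` and `summarize_crash` are hypothetical and are not part of kGym or kAgent.

```python
import re
from dataclasses import dataclass

# Optional "[   12.345678] " timestamp that prefixes kernel console lines.
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# A reliable stack frame such as "foo_release+0x4c/0x90".
# Frames marked unreliable with a leading "? " do not match and are skipped.
FRAME_RE = re.compile(r"^(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


@dataclass
class CrashSummary:
    headline: str        # first "BUG:" or "WARNING:" line, if any
    frames: list[str]    # function names from the call trace, innermost first


def summarize_crash(log: str) -> CrashSummary:
    headline = ""
    frames: list[str] = []
    in_trace = False
    for line in log.splitlines():
        text = TIMESTAMP_RE.sub("", line).strip()
        if not headline and text.startswith(("BUG:", "WARNING:")):
            headline = text
        elif text == "Call Trace:":
            in_trace = True
        elif in_trace:
            match = FRAME_RE.match(text)
            if match:
                frames.append(match.group("func"))
    return CrashSummary(headline, frames)
```

The resulting list of function names gives a compact set of candidate fault locations to hand to the model, rather than the full console log.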

Through iterative feedback between execution results and model reasoning, kAgent narrows its search space, reducing spurious edits and improving both precision and convergence speed.
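
The feedback loop described here can be sketched in a few lines. This is a simplified illustration under stated assumptions, not kAgent's implementation: `propose_patch` stands in for the LLM call and `run_reproducer` stands in for a kGym build-and-test job. Both are hypothetical callables supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    patch: str      # candidate patch that was tried
    crashed: bool   # whether the reproducer still crashed the kernel
    log: str        # execution output fed back to the model


def repair_loop(
    crash_log: str,
    propose_patch: Callable[[str, list[Attempt]], str],
    run_reproducer: Callable[[str], tuple[bool, str]],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return the first patch that stops the crash, or None if none is found."""
    history: list[Attempt] = []
    for _ in range(max_rounds):
        # The model sees the original crash plus every failed attempt so far.
        patch = propose_patch(crash_log, history)
        crashed, log = run_reproducer(patch)
        if not crashed:
            return patch
        history.append(Attempt(patch, crashed, log))
    return None
```

Passing the full attempt history back into each proposal is one simple way to let execution results steer the model away from edits that have already failed.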

Impact

This platform demonstrates the first end-to-end LLM-based repair loop for real Linux kernel failures.
It shows that structured experimentation and domain-specific agent design can overcome limitations of general-purpose code models in system-level debugging.

Key outcomes:

Reproducing complex kernel bugs automatically and deterministically.
Validating LLM-proposed patches at scale within kGym.
Reducing incorrect edit rates via structured, state-aware search.
Providing a reproducible benchmark for future research on agentic debugging.

Contributors

Alex Mathai*

Chenxi Huang*

Petros Maniatis

Aleksandr Nogikh

Franjo Ivančić

Junfeng Yang

Baishakhi Ray

Publications

CrashFixer: A crash resolution agent for the Linux kernel

arXiv - 2025

kGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution

NeurIPS - 2024