All Publications

  1. Latent Cache Flow: Model-to-Model Communication Without Text
    Maximillian Rossi, Prajwal Raghunath, Eugene Wu
    AdaptFM Workshop at ICML 2026
  2. Outrunning LLM Cutoffs: A Live Kernel Crash Resolution Benchmark for All
    Chenxi Huang, Alex Mathai, Feiyang Yu, Aleksandr Nogikh, Petros Maniatis, Franjo Ivančić, Eugene Wu, Kostis Kaffes, Junfeng Yang, Baishakhi Ray
    ICML 2026
  3. LAKEQA: An Exploratory QA Benchmark over a Million-Scale Data Lake
    Haonan Wang*, Jiaxiang Liu*, Yurong Liu, Austin Senna Wijaya, Tianle Zhou, Eden Wu, Yijia Chen, Wanting You, Reya Vir, Daniela Pinto Veizaga, Grace Fan, Yusen Zhang, Juliana Freire, Eugene Wu
    ICML 2026
  4. BranchBench: An Extensible Benchmark for Agentic Database Branching
    Elaine Ang, Kostis Kaffes, Eugene Wu
    CAIS Workshop 2026
  5. VISTA: A Versatile Interactive User Simulation Toolkit for Agent Evaluation
    Yunan Lu, Ryan Shea, Yusen Zhang, Zhou Yu
    arXiv 2026
  6. Data Flow Control: Data Safety Policies for AI Agents
    Charlie Summers, Eugene Wu
    arXiv 2026
  7. SANA: What Matters for QA Agents over Massive Data Lakes?
    Austin Senna Wijaya, Jiaxiang Liu, Haonan Wang, Eugene Wu
    arXiv 2026
  8. Orchard: An Open-Source Agentic Modeling Framework
    Baolin Peng, Wenlin Yao, Qianhui Wu, Hao Cheng, Xiao Yu, Rui Yang, Tao Ge, Alessandro Sordoni, Xingdi Yuan, Yelong Shen, Pengcheng He, Tong Zhang, Zhou Yu, Jianfeng Gao
    arXiv 2026
  9. AutoRPA: Efficient GUI Automation through LLM-Driven Code Synthesis from Interactions
    Minghao Chen, Xinyi Hu, Zhou Yu, Yufei Yin
    ICML 2026
  10. Detecting Privilege Escalation in Polyglot Microservices via Agentic Program Analysis
    Penghui Li, Hong Yau Chong, Yinzhi Cao, Junfeng Yang
    IEEE S&P 2026
  11. PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures
    Yunan Lu, Luigi Liu, Omar Yahia, Arpit Sharma, Zhou Yu
    arXiv 2026
  13. SemaTune: Semantic-Aware Online OS Tuning with Large Language Models
    Georgios Liargkovas, Mihir Nitin Joshi, Hubertus Franke, Kostis Kaffes
    arXiv 2026
  14. Conversational Customization of Productivity Systems: A Design Probe of Malleable AI Interfaces
    Karthik Sreedhar, Aryan Kaul, Lydia B. Chilton
    arXiv 2026
  15. ScarfBench: A Benchmark for Cross-Framework Application Migration in Enterprise Java
    Advait Pavuluri, Bridget McGinn, Ashita Saxena, George Safta, Srikanth Tamilselvam, Raju Pavuluri, Michele Merler, Baishakhi Ray, Rahul Krishna
    arXiv 2026
  16. Panprediction: optimal predictions for any downstream task and loss
    Sivaraman Balakrishnan, Nika Haghtalab, Daniel Hsu, Brian Lee, Eric Zhao
    AISTATS 2026
  17. Prior makes it possible: from sublinear graph algorithms to LLM test-time methods
    Avrim Blum, Daniel Hsu, Cyrus Rashtchian, Donya Saless
    AISTATS 2026
  18. Your Compiler is Backdooring Your Model: Understanding and Exploiting Compilation Inconsistency Vulnerabilities in Deep Learning Compilers
    Simin Chen, Jinjun Peng, Yixin He, Junfeng Yang, Baishakhi Ray
    IEEE S&P 2026
  19. Estimating Tail Risks in Language Model Output Distributions
    Rico Angell, Raghav Singhal, Zachary Horvitz, Zhou Yu, Rajesh Ranganath, Kathleen McKeown, He He
    ICML 2026
  20. BranchBench: Aligning Database Branching with Agentic Demands
    Elaine Ang, Sam Weldon, In Keun Kim, Kevin Durand, Kostis Kaffes, Eugene Wu
    arXiv 2026
  21. MIND: Empowering Mental Health Clinicians with Multimodal Data Insights through a Narrative Dashboard
    Ruishi Zou, Shiyu Xu, Margaret E Morris, Jihan Ryu, Timothy D. Becker, Nicholas Allen, Anne Marie Albano, Randy Auerbach, Dan Adler, Varun Mishra, Lace Padilla, Dakuo Wang, Ryan Sultan, Xuhai "Orson" Xu
    CHI 2026
  22. More than Decision Support: Exploring Patients' Longitudinal Usage of Large Language Models in Real-World Healthcare-Seeking Journeys
    Yancheng Cao, Yishu Ji, Chris Yue Fu, Sahiti Dharmavaram, Meghan Turchioe, Natalie C Benda, Lena Mamykina, Yuling Sun, Xuhai "Orson" Xu
    CHI 2026
  23. VineLM: Trie-Based Fine-Grained Control for Agentic Workflows
    Nikos Pagonas, Matthew Lou, Tianyi Peng, Dan Rubenstein, Kostis Kaffes
    arXiv 2026
  24. SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation
    Grace Jiarui Fan, Chengpiao Huang, Tianyi Peng, Kaizheng Wang, Yuhang Wu
    arXiv 2026
  25. AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agents
    Wenyue Hua, Sripad Karne, Qian Xie, Armaan Agrawal, Nikos Pagonas, Kostis Kaffes, Tianyi Peng
    arXiv 2026
  26. Quantifying Trust: Financial Risk Management for Trustworthy AI Agents
    Wenyue Hua, Tianyi Peng, Chi Wang, Jiaxin Pei, Ian Kaufman, Bryan Lim, Chandler Fang
    arXiv 2026
  27. Agentic Data Environments
    Elaine Ang, Chenxi Huang, Georgios Liargkovas, Jerry Liu, Jinhui Liu, Nikos Pagonas, Charlie Summers, Haonan Wang, Jiakai Xu, Tianle Zhou, Yusen Zhang, Zhou Yu, Zhuo Zhang, Tianyi Peng, Kostis Kaffes, Eugene Wu
    IEEE Data Bulletin 2026
  28. Coherence Collapse: Diagnosing Why Code Agents Fail After Reaching the Right Code
    Myeongsoo Kim, Dingmin Wang, Siwei Cui, Farima Farmahinifarahani, Terry Yue Zhuo, Shweta Garg, Baishakhi Ray, Rajdeep Mukherjee, Varun Kumar
    arXiv 2026
  29. Human-Data Interaction, Exploration, and Visualization in the AI Era: Challenges and Opportunities
    Jean-Daniel Fekete, Yifan Hu, Dominik Moritz, Arnab Nandi, Senjuti Basu Roy, Eugene Wu, Nikos Bikakis, George Papastefanatos, Panos K. Chrysanthis, Guoliang Li, Lingyun Yu
    SIGMOD Record 2026
  30. Group-realizable multi-group learning by minimizing empirical risk
    Navid Ardeshir, Samuel Deng, Daniel Hsu, Jingwen Liu
    ALT 2026
  31. Trustworthy AI Software Engineers
    Aldeida Aleti, Baishakhi Ray, Rashina Hoda, Simin Chen
    arXiv 2026
  32. Please Don't Kill My Vibe: Empowering Agents with Data Flow Control
    Charlie Summers, Haneen Mohammed, Eugene Wu
    CIDR 2026
  33. LLM Generated Persona is a Promise with a Catch
    Ang Li, Haozhe Chen, Hongseok Namkoong, Tianyi Peng
    NeurIPS 2025 Position Paper
  34. Agents for Web Testing: A Case Study in the Wild
    Naimeng Ye, Xiao Yu, Ruize Xu, Tianyi Peng, Zhou Yu
    LAw Workshop at NeurIPS 2025
  35. Data Mixture Optimization: A Multi-Fidelity Multi-Scale Bayesian Framework
    Tzu-Ching Yen, Andrew Wei Tung Siah, Haozhe Chen, C. Daniel Guetta, Tianyi Peng, Hongseok Namkoong
    NeurIPS 2025
  36. Tail-Optimized Caching for LLM Inference
    Wenxin Zhang, Yueying Li, Ciamac C. Moallemi, Tianyi Peng
    NeurIPS 2025
  37. Multi-Agent Markov Entanglement
    Shuze Chen, Tianyi Peng
    NeurIPS 2025 (Spotlight)
  38. Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper
    Xinyue Zhu*, Binghao Huang*, Yunzhu Li
    NeurIPS 2025
  39. Q-learning with Posterior Sampling
    Priyank Agrawal, Shipra Agrawal, Azmat Azati
    NeurIPS 2025
  40. RAISE: Reliable Agent Improvement via Simulated Experience
    Sahar Omidi Shayegan, Joshua Meyer, Victor Shih, Sebastian Sosa, Tianyi Peng, Kostis Kaffes, Eugene Wu, Andi Partovi, Mehdi Jamei
    SEA Workshop at NeurIPS 2025
  41. LLM Agents for Always-On Operating System Tuning
    Georgios Liargkovas, Vahab Jabrayilov, Hubertus Franke, Kostis Kaffes
    NeurIPS 2025
  42. A Decade of Systems for Human Data Interaction
    Eugene Wu, Yiru Chen, Haneen Mohammed, Zezhou Huang
    arXiv 2025
  43. SAGE: A Top-Down Bottom-Up Knowledge-Grounded User Simulator for Multi-turn AGent Evaluation
    Ryan Shea, Yunan Lu, Liang Qiu, Zhou Yu
    EACL 2026
  44. Set It and Forget It: Zero-Mod ML Magic for Linux Tuning
    Georgios Liargkovas, Prabhpreet Singh Sodhi, Kostis Kaffes
    PACMI Workshop at SOSP 2025
  45. Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving
    Nikos Pagonas, Yeounoh Chung, Kostis Kaffes, Arvind Krishnamurthy
    SAA Workshop at SOSP 2025
  46. Toward Systems Foundations for Agentic Exploration
    Jiakai Xu, Tianle Zhou, Eugene Wu, Kostis Kaffes
    SAA Workshop at SOSP 2025
  47. Do Spammers Dream of Electric Sheep? Characterizing the Prevalence of LLM-Generated Malicious Emails
    Wei Hao, Van Tran, Vincent Rideout, Zixi Wang, AnMei Dasbach-Prisk, M. H. Afifi, Junfeng Yang, Ethan Katz-Bassett, Grant Ho, Asaf Cidon
    IMC 2025
  48. PickleBall: Secure Deserialization of Pickle-based Machine Learning Models
    Andreas Kellas, Neophytos Christou, Wenxin Jiang, Penghui Li, Laurent Simon, Yaniv David, Vasileios P. Kemerlis, James C. Davis, Junfeng Yang
    CCS 2025
  49. The Anatomy of a Personal Health Agent
    A. Ali Heydari, Ken Gu, Vidya Srinivas, Hong Yu, Zhihan Zhang, Yuwei Zhang, Akshay Paruchuri, Qian He, Hamid Palangi, Nova Hammerquist, Ahmed A. Metwally, Brent Winslow, Yubin Kim, Kumar Ayush, Yuzhe Yang, Girish Narayanswamy, Maxwell A. Xu, Jake Garrison, Amy Armento Lee, Jenny Vafeiadou, Ben Graef, Isaac R. Galatzer-Levy, Erik Schenck, Andrew Barakat, Javier Perez, Jacqueline Shreibati, John Hernandez, Anthony Z. Faranesh, Javier L. Prieto, Connor Heneghan, Yun Liu, Jiening Zhan, Mark Malhotra, Shwetak Patel, Tim Althoff, Xin Liu, Daniel McDuff, Xuhai "Orson" Xu
    arXiv
  50. Suna: Scalable Causal Confounder Discovery over Relational Data
    Jiaxiang Liu, Siyuan Xia, Daniel Alabi, Eugene Wu
    VLDB 2025
  51. EditLord: Learning Code Transformation Rules for Code Editing
    Weichen Li, Albert Jan, Baishakhi Ray, Chengzhi Mao, Junfeng Yang, Kexin Pei
    ICML 2025
  52. Learning to Rewrite: Generalized LLM-Generated Text Detection
    Ran Li, Wei Hao, Weiliang Zhao, Junfeng Yang, Chengzhi Mao
    ACL 2025
  53. Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice
    Akshit Kumar, Tianyi Peng, Yuhang Wu, Assaf Zeevi
    Winter Simulation Conference 2025
  54. Prompt Editor: A Taxonomy-driven System for Guided LLM Prompt Development in Enterprise Settings
    Jeffrey Cao, Lampros Flokas, Yujian Xu, Eugene Wu, Xu Chu, Cong Yu
    SIGMOD Demo 2025
  55. Towards a Framework for Optimizing Hierarchical Text Segmentation using LLMs
    Lampros Flokas, Jeffrey Cao, Yujian Xu, Eugene Wu, Xu Chu, Cong Yu
    DEEM Workshop at SIGMOD 2025
  56. Position Paper: A System-Centric Approach is Necessary for AI Agents
    Nikos Pagonas, Haonan Wang, Jiaxiang Liu, Tianle Zhou, Deepak Dastrala, Raman Jatkar, Anirudh Sivaraman, Zhou Yu, Kostis Kaffes, Eugene Wu
    arXiv 2025
  57. Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions
    Olivier Toubia, George Z. Gui, Tianyi Peng, Daniel J. Merlau, Ang Li, Haozhe Chen
    Marketing Science
  58. Diversity Helps Jailbreak Large Language Models
    Weiliang Zhao, Daniel Ben-Levi, Wei Hao, Junfeng Yang, Chengzhi Mao
    NAACL 2025
  59. CrashFixer: A Crash Resolution Agent for the Linux Kernel
    Alex Mathai, Chenxi Huang, Suwei Ma, Jihwan Kim, Hailie Mitchell, Aleksandr Nogikh, Petros Maniatis, Franjo Ivančić, Junfeng Yang, Baishakhi Ray
    arXiv 2025
  60. FeedQUAC: Quick Unobtrusive Agent-Generated Commentary
    Tao Long, Kendra Wannamaker, Jo Vermeulen, George Fitzmaurice, Justin Matejka
    arXiv 2025
  61. Steering Semantic Data Processing With DocWrangler
    Shreya Shankar, Bhavya Chopra, Mawil Hasan, Stephen Lee, Bjoern Hartmann, Joseph Hellerstein, Aditya Parameswaran, Eugene Wu
    UIST 2025
  62. Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents
    Yueying Li, Jim Dai, Tianyi Peng
    arXiv 2025
  63. AgentDynEx: Nudging the Mechanics and Dynamics of Multi-Agent Simulations
    Jenny Ma, Riya Sahni, Karthik Sreedhar, Lydia B. Chilton
    Under Submission
  64. DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
    Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, Eugene Wu
    VLDB 2025
  65. Program Synthesis Dialog Agents for Interactive Decision-Making
    Matthew Toles, Nikhil Balwani, Rattandeep Singh, Valentina Giulia Sartori Rodriguez, Zhou Yu
    arXiv 2025
  66. How Well do LLMs Compress their Own Chain-of-Thought? A Token Complexity Approach
    Ayeong Lee, Ethan Che, Tianyi Peng
    Efficient Systems for Foundation Models Workshop at ICML 2025
  67. ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning
    Xiao Yu, Baolin Peng, Vineeth Vajipey, Hao Cheng, Michel Galley, Jianfeng Gao, Zhou Yu
    ICLR 2025
  68. AnimationAgents: A Multi-Modal Team of Agents for Generating, Debugging, and Human Editing of Animation Code
    Vivian Liu, Rubaiat Habib Kazi, Li-Yi Wei, Matthew Fisher, Timothy Langlois, Seth Walker, Lydia B. Chilton
    CHI 2025
  69. ACE: A LLM Agent-based Negotiation Coaching System
    Ryan Shea, Aymen Kallala, Xin Lucy Liu, Michael W. Morris, Zhou Yu
    EMNLP 2024
  70. Fast Userspace Networking for the Rest of Us
    Alireza Sanaee, Vahab Jabrayilov, Ilias Marinos, Anuj Kalia, Divyanshu Saxena, Prateesh Goyal, Kostis Kaffes, Gianni Antichi
    arXiv 2025
  71. DynEx: Agentic Assistance to Bridge Design and Code
    Jenny Ma, Karthik Sreedhar, Vivian Liu, Pedro Alejandro Perez, Sitong Wang, Riya Sahni, Lydia B. Chilton
    CHI 2025
  72. DietGlance: dietary monitoring and personalized analysis at a glance with knowledge-empowered AI assistant
    Zhihan Jiang, Running Zhao, Lin Lin, Yue Yu, Handi Chen, Xinchen Zhang, Xuhai "Orson" Xu, Yifang Wang, Xiaojuan Ma, Edith CH Ngai
    ACM HEALTH
  73. Data Cleaning Using Large Language Models
    Shuo Zhang, Zezhou Huang, Eugene Wu
    DAIS Workshop at ICDE 2025
  74. Alexpaca: Learning Factual Clarification Question Generation Without Examples
    Matthew Toles, Yukun Huang, Zhou Yu, Luis Gravano
    GEM² Workshop at ACL 2025
  75. KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution
    Alex Mathai, Chenxi Huang, Petros Maniatis, Aleksandr Nogikh, Franjo Ivančić, Junfeng Yang, Baishakhi Ray
    NeurIPS 2024
  76. Simulating Cooperative Prosocial Behavior with Multi-Agent LLMs
    Karthik Sreedhar, Alice Cai, Jenny Ma, Jeffrey V. Nickerson, Lydia B. Chilton
    IUI 2025
  77. Law-Abiding Agents: Demonstrating Safe Tax Preparation with Data Flow Control
    Charlie Summers, Prajwal Raghunath, Zhibin Shen, Hardik Gupta, Eric Choi, Mayur Kulkarni, Sally Go, Peter Yu, Akriti Agarwal, Zhuo Zhang, Oliver Kennedy, Eugene Wu
    Preprint 2026