All Publications

  1. Outrunning LLM Cutoffs: A Live Kernel Crash Resolution Benchmark for All
    Chenxi Huang, Alex Mathai, Feiyang Yu, Aleksandr Nogikh, Petros Maniatis, Franjo Ivančić, Eugene Wu, Kostis Kaffes, Junfeng Yang, Baishakhi Ray
    ICML 2026
  2. LAKEQA: An Exploratory QA Benchmark over a Million-Scale Data Lake
    Haonan Wang*, Jiaxiang Liu*, Yurong Liu, Austin Senna Wijaya, Tianle Zhou, Eden Wu, Yijia Chen, Wanting You, Reya Vir, Daniela Pinto Veizaga, Grace Fan, Yusen Zhang, Juliana Freire, Eugene Wu
    ICML 2026
  3. BranchBench: An Extensible Benchmark for Agentic Database Branching
    Elaine Ang, Kostis Kaffes, Eugene Wu
    CAIS Workshop 2026
  4. Panprediction: optimal predictions for any downstream task and loss
    Sivaraman Balakrishnan, Nika Haghtalab, Daniel Hsu, Brian Lee, Eric Zhao
    AISTATS 2026
  5. Prior makes it possible: from sublinear graph algorithms to LLM test-time methods
    Avrim Blum, Daniel Hsu, Cyrus Rashtchian, Donya Saless
    AISTATS 2026
  6. Your Compiler is Backdooring Your Model: Understanding and Exploiting Compilation Inconsistency Vulnerabilities in Deep Learning Compilers
    Simin Chen, Jinjun Peng, Yixin He, Junfeng Yang, Baishakhi Ray
    IEEE S&P 2026
  7. MIND: Empowering Mental Health Clinicians with Multimodal Data Insights through a Narrative Dashboard
    Ruishi Zou, Shiyu Xu, Margaret E Morris, Jihan Ryu, Timothy D. Becker, Nicholas Allen, Anne Marie Albano, Randy Auerbach, Dan Adler, Varun Mishra, Lace Padilla, Dakuo Wang, Ryan Sultan, Xuhai "Orson" Xu
    CHI 2026
  8. More than Decision Support: Exploring Patients' Longitudinal Usage of Large Language Models in Real-World Healthcare-Seeking Journeys
    Yancheng Cao, Yishu Ji, Chris Yue Fu, Sahiti Dharmavaram, Meghan Turchioe, Natalie C Benda, Lena Mamykina, Yuling Sun, Xuhai "Orson" Xu
    CHI 2026
  9. Agentic Data Environments
    Elaine Ang, Chenxi Huang, Georgios Liargkovas, Jerry Liu, Jinhui Liu, Nikos Pagonas, Charlie Summers, Haonan Wang, Jiakai Xu, Tianle Zhou, Yusen Zhang, Zhou Yu, Zhuo Zhang, Tianyi Peng, Kostis Kaffes, Eugene Wu
    IEEE Data Bulletin 2026
  10. Human-Data Interaction, Exploration, and Visualization in the AI Era: Challenges and Opportunities
    Jean-Daniel Fekete, Yifan Hu, Dominik Moritz, Arnab Nandi, Senjuti Basu Roy, Eugene Wu, Nikos Bikakis, George Papastefanatos, Panos K. Chrysanthis, Guoliang Li, Lingyun Yu
    SIGMOD Record 2026
  11. Group-realizable multi-group learning by minimizing empirical risk
    Navid Ardeshir, Samuel Deng, Daniel Hsu, Jingwen Liu
    ALT 2026
  12. Please Don't Kill My Vibe: Empowering Agents with Data Flow Control
    Charlie Summers, Haneen Mohammed, Eugene Wu
    CIDR 2026
  13. LLM Generated Persona is a Promise with a Catch
    Ang Li, Haozhe Chen, Hongseok Namkoong, Tianyi Peng
    NeurIPS 2025 Position Paper
  14. Agents for Web Testing: A Case Study in the Wild
    Naimeng Ye, Xiao Yu, Ruize Xu, Tianyi Peng, Zhou Yu
    LAW Workshop at NeurIPS 2025
  15. Data Mixture Optimization: A Multi-Fidelity Multi-Scale Bayesian Framework
    Tzu-Ching Yen, Andrew Wei Tung Siah, Haozhe Chen, C. Daniel Guetta, Tianyi Peng, Hongseok Namkoong
    NeurIPS 2025
  16. Tail-Optimized Caching for LLM Inference
    Wenxin Zhang, Yueying Li, Ciamac C. Moallemi, Tianyi Peng
    NeurIPS 2025
  17. Multi-Agent Markov Entanglement
    Shuze Chen, Tianyi Peng
    NeurIPS 2025 (Spotlight)
  18. Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper
    Xinyue Zhu*, Binghao Huang*, Yunzhu Li
    NeurIPS 2025
  19. Q-learning with Posterior Sampling
    Priyank Agrawal, Shipra Agrawal, Azmat Azati
    NeurIPS 2025
  20. RAISE: Reliable Agent Improvement via Simulated Experience
    Sahar Omidi Shayegan, Joshua Meyer, Victor Shih, Sebastian Sosa, Tianyi Peng, Kostis Kaffes, Eugene Wu, Andi Partovi, Mehdi Jamei
    SEA Workshop at NeurIPS 2025
  21. LLM Agents for Always-On Operating System Tuning
    Georgios Liargkovas, Vahab Jabrayilov, Hubertus Franke, Kostis Kaffes
    NeurIPS 2025
  22. A Decade of Systems for Human Data Interaction
    Eugene Wu, Yiru Chen, Haneen Mohammed, Zezhou Huang
    arXiv 2025
  23. SAGE: A Top-Down Bottom-Up Knowledge-Grounded User Simulator for Multi-turn AGent Evaluation
    Ryan Shea, Yunan Lu, Liang Qiu, Zhou Yu
    EACL 2026
  24. Set It and Forget It: Zero-Mod ML Magic for Linux Tuning
    Georgios Liargkovas, Prabhpreet Singh Sodhi, Kostis Kaffes
    PACMI Workshop at SOSP 2025
  25. Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving
    Nikos Pagonas, Yeounoh Chung, Kostis Kaffes, Arvind Krishnamurthy
    SAA Workshop at SOSP 2025
  26. Toward Systems Foundations for Agentic Exploration
    Jiakai Xu, Tianle Zhou, Eugene Wu, Kostis Kaffes
    SAA Workshop at SOSP 2025
  27. Do Spammers Dream of Electric Sheep? Characterizing the Prevalence of LLM-Generated Malicious Emails
    Wei Hao, Van Tran, Vincent Rideout, Zixi Wang, AnMei Dasbach-Prisk, M. H. Afifi, Junfeng Yang, Ethan Katz-Bassett, Grant Ho, Asaf Cidon
    IMC 2025
  28. PickleBall: Secure Deserialization of Pickle-based Machine Learning Models
    Andreas Kellas, Neophytos Christou, Wenxin Jiang, Penghui Li, Laurent Simon, Yaniv David, Vasileios P. Kemerlis, James C. Davis, Junfeng Yang
    CCS 2025
  29. The Anatomy of a Personal Health Agent
    A. Ali Heydari, Ken Gu, Vidya Srinivas, Hong Yu, Zhihan Zhang, Yuwei Zhang, Akshay Paruchuri, Qian He, Hamid Palangi, Nova Hammerquist, Ahmed A. Metwally, Brent Winslow, Yubin Kim, Kumar Ayush, Yuzhe Yang, Girish Narayanswamy, Maxwell A. Xu, Jake Garrison, Amy Armento Lee, Jenny Vafeiadou, Ben Graef, Isaac R. Galatzer-Levy, Erik Schenck, Andrew Barakat, Javier Perez, Jacqueline Shreibati, John Hernandez, Anthony Z. Faranesh, Javier L. Prieto, Connor Heneghan, Yun Liu, Jiening Zhan, Mark Malhotra, Shwetak Patel, Tim Althoff, Xin Liu, Daniel McDuff, Xuhai "Orson" Xu
    arXiv
  30. Suna: Scalable Causal Confounder Discovery over Relational Data
    Jiaxiang Liu, Siyuan Xia, Daniel Alabi, Eugene Wu
    VLDB 2025
  31. EditLord: Learning Code Transformation Rules for Code Editing
    Weichen Li, Albert Jan, Baishakhi Ray, Chengzhi Mao, Junfeng Yang, Kexin Pei
    ICML 2025
  32. Learning to Rewrite: Generalized LLM-Generated Text Detection
    Ran Li, Wei Hao, Weiliang Zhao, Junfeng Yang, Chengzhi Mao
    ACL 2025
  33. Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice
    Akshit Kumar, Tianyi Peng, Yuhang Wu, Assaf Zeevi
    Winter Simulation Conference 2025
  34. Prompt Editor: A Taxonomy-driven System for Guided LLM Prompt Development in Enterprise Settings
    Jeffery Cao, Lampros Flokas, Yujian Xu, Eugene Wu, Xu Chu, Cong Yu
    SIGMOD Demo 2025
  35. Towards a Framework for Optimizing Hierarchical Text Segmentation using LLMs
    Lampros Flokas, Jeffrey Cao, Yujian Xu, Eugene Wu, Xu Chu, Cong Yu
    DEEM Workshop at SIGMOD 2025
  36. Position Paper: A System-Centric Approach is Necessary for AI Agents
    Nikos Pagonas, Haonan Wang, Jiaxiang Liu, Tianle Zhou, Deepak Dastrala, Raman Jatkar, Anirudh Sivaraman, Zhou Yu, Kostis Kaffes, Eugene Wu
    arXiv 2025
  37. Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions
    Olivier Toubia, George Z. Gui, Tianyi Peng, Daniel J. Merlau, Ang Li, Haozhe Chen
    Marketing Science
  38. Diversity Helps Jailbreak Large Language Models
    Weiliang Zhao, Daniel Ben-Levi, Wei Hao, Junfeng Yang, Chengzhi Mao
    NAACL 2025
  39. CrashFixer: A Crash Resolution Agent for the Linux Kernel
    Alex Mathai, Chenxi Huang, Suwei Ma, Jihwan Kim, Hailie Mitchell, Aleksandr Nogikh, Petros Maniatis, Franjo Ivančić, Junfeng Yang, Baishakhi Ray
    arXiv 2025
  40. FeedQUAC: Quick Unobtrusive Agent-Generated Commentary
    Tao Long, Kendra Wannamaker, Jo Vermeulen, George Fitzmaurice, Justin Matejka
    arXiv 2025
  41. Steering Semantic Data Processing With DocWrangler
    Shreya Shankar, Bhavya Chopra, Mawil Hasan, Stephen Lee, Bjoern Hartmann, Joseph Hellerstein, Aditya Parameswaran, Eugene Wu
    UIST 2025
  42. Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents
    Yueying Li, Jim Dai, Tianyi Peng
    arXiv 2025
  43. AgentDynEx: Nudging the Mechanics and Dynamics of Multi-Agent Simulations
    Jenny Ma, Riya Sahni, Karthik Sreedhar, Lydia B. Chilton
    Under Submission
  44. DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
    Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, Eugene Wu
    VLDB 2025
  45. Program Synthesis Dialog Agents for Interactive Decision-Making
    Matthew Toles, Nikhil Balwani, Rattandeep Singh, Valentina Giulia Sartori Rodriguez, Zhou Yu
    arXiv 2025
  46. How Well do LLMs Compress their Own Chain-of-Thought? A Token Complexity Approach
    Ayeong Lee, Ethan Che, Tianyi Peng
    Efficient Systems for Foundation Models Workshop at ICML 2025
  47. ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning
    Xiao Yu, Baolin Peng, Vineeth Vajipey, Hao Cheng, Michel Galley, Jianfeng Gao, Zhou Yu
    ICLR 2025
  48. AnimationAgents: A Multi-Modal Team of Agents for Generating, Debugging, and Human Editing of Animation Code
    Vivian Liu, Rubaiat Habib Kazi, Li-Yi Wei, Matthew Fisher, Timothy Langlois, Seth Walker, Lydia B. Chilton
    CHI 2025
  49. ACE: A LLM Agent-based Negotiation Coaching System
    Ryan Shea, Aymen Kallala, Xin Lucy Liu, Michael W. Morris, Zhou Yu
    EMNLP 2024
  50. Fast Userspace Networking for the Rest of Us
    Alireza Sanaee, Vahab Jabrayilov, Ilias Marinos, Anuj Kalia, Divyanshu Saxena, Prateesh Goyal, Kostis Kaffes, Gianni Antichi
    arXiv 2025
  51. DynEx: Agentic Assistance to Bridge Design and Code
    Jenny Ma, Karthik Sreedhar, Vivian Liu, Pedro Alejandro Perez, Sitong Wang, Riya Sahni, Lydia B. Chilton
    CHI 2025
  52. DietGlance: dietary monitoring and personalized analysis at a glance with knowledge-empowered AI assistant
    Zhihan Jiang, Running Zhao, Lin Lin, Yue Yu, Handi Chen, Xinchen Zhang, Xuhai "Orson" Xu, Yifang Wang, Xiaojuan Ma, Edith CH Ngai
    ACM HEALTH
  53. Data Cleaning Using Large Language Models
    Shuo Zhang, Zezhou Huang, Eugene Wu
    DAIS Workshop at ICDE 2025
  54. Alexpaca: Learning Factual Clarification Question Generation Without Examples
    Matthew Toles, Yukun Huang, Zhou Yu, Luis Gravano
    GEM^2 Workshop at ACL 2025
  55. KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution
    Alex Mathai, Chenxi Huang, Petros Maniatis, Aleksandr Nogikh, Franjo Ivančić, Junfeng Yang, Baishakhi Ray
    NeurIPS 2024
  56. Simulating Cooperative Prosocial Behavior with Multi-Agent LLMs
    Karthik Sreedhar, Alice Cai, Jenny Ma, Jeffrey V. Nickerson, Lydia B. Chilton
    IUI 2025