MMLI Symposium 2025


Overview


The MMLI Annual Symposium: “AI Scientists? What Would It Take?” is a two-day event exploring the integration of artificial intelligence (AI) in scientific discovery, particularly within chemistry and materials science. The symposium features three thematic sessions, each addressing critical aspects of AI’s role in advancing scientific research.

Through the sessions, the symposium aims to foster discussions on the current state and future potential of AI in scientific research, addressing both the advancements achieved and the challenges that lie ahead.

Agenda


Location:
National Center for Supercomputing Applications (NCSA),
1205 W Clark St, Urbana, IL, 61801

All talks will take place in the Auditorium, and meals will be served in the Atrium.
The poster session will also be held in the Atrium.

All times listed on the agenda are in Central Daylight Time (CDT).

April 15, Tuesday (Day 1)

8.00 am  Registration & Breakfast
9.00 am  Welcome & Logistics
9.15 am  Director’s Address and Introduction to MMLI Symposium
9.30 am  Theme Keynote
“Language Agents for Scientific Discovery: Closer Than Ever, Yet Further Than We Think”
Speaker: Dr. Huan Sun, The Ohio State University (in-person)
10.15 am  Break
10.30 am  Foundational AI: Where are we? Where do we want to go next?
Session Introduction by Prof. Jiawei Han, MMLI (in-person)
10.45 am  “Multivariate Tails for Active Molecular Design”
Speaker: Dr. Ji Won Park, Genentech (Zoom)
11.15 am  “The Virtual Lab of AI Scientists”
Speaker: Dr. James Zou, Stanford University (Zoom)
11.45 am  Foundational AI Panel
Moderator: Prof. Jiawei Han
Panelists: Dr. Huan Sun, Dr. Ji Won Park, Dr. James Zou
12.15 pm  Lunch
1.15 pm  AI Scientists in Small Molecule Discovery and Development – How does it change the game?
Session Introduction by Prof. Martin Burke, MMLI (in-person)
1.30 pm  Small Molecule Keynote
“Predicting Chiral Structures: Data-Driven Approaches to Asymmetric Catalysis”
Speaker: Dr. Jolene Reid, University of British Columbia (in-person)
2.15 pm  “Optimization of Catalytic Transformations using Bayesian Optimization in Sparse Data Regimes”
Speaker: Dr. Richard Walroth, Genentech (Zoom)
2.45 pm  “Teaching Language Models to Speak Chemistry”
Speaker: Dr. Philippe Schwaller, EPFL & NCCR Catalysis (Zoom)
3.15 pm  “Data Science Enabled Synthesis”
Speaker: Dr. Kaid Harper, AbbVie (in-person)
3.45 pm  AI for Small Molecule Innovation Panel
Moderator: Prof. Martin Burke
Panelists: Dr. Jolene Reid, Dr. Richard Walroth, Dr. Philippe Schwaller, Dr. Kaid Harper
4.30 pm  Break
5.00 pm  Poster Session
6.00 pm  Dinner with roundtable discussion
“AI Scientists? What Would It Take?”

April 16, Wednesday (Day 2)

8.00 am  Registration & Breakfast
9.00 am  AI Scientists in Material Discovery and Development: Where are we? Where do we want to go next?
Session Introduction by Prof. Charles Schroeder, MMLI (in-person)
9.15 am  Materials Keynote
“Where is the Prediction Frontier in Materials Chemistry?”
Speaker: Prof. Brett Savoie, University of Notre Dame (in-person)
10.00 am  Break
10.30 am  “Engineering Electronic Polymers using Self-Driving Laboratory”
Speaker: Dr. Jie Xu, Argonne National Laboratory (in-person)
11.00 am  “Automation and Machine Learning for Polymer Biomaterials”
Speaker: Dr. Adam Gormley, Rutgers University (in-person)
11.30 am  AI for Materials Panel
Moderator: Prof. Charles Schroeder
Panelists: Prof. Brett Savoie, Dr. Jie Xu, Dr. Adam Gormley
12.00 pm  Closing Remarks
12.30 pm  Boxed Lunch

Talks & Speakers


Session 1: Foundational AI

Session 1 Talk Details

Foundational AI: Where are we? Where do we want to go next?

Session Chair & Panel Moderator: Prof. Jiawei Han, MMLI

“Language Agents for Scientific Discovery: Closer Than Ever, Yet Further Than We Think”

Dr. Huan Sun, The Ohio State University

Abstract

With each passing day, the vision of language agents capable of driving scientific discovery feels ever closer to reality. LLM-based systems that can follow complex instructions, use external tools, and take actions to complete sophisticated scientific tasks are beginning to resemble AI scientists.

Yet, despite this accelerating progress, today’s LLMs continue to struggle with basic reasoning and generalization failures (e.g., the Reversal Curse). In this talk, I will explore both sides of this paradox and reflect on a central question: What would it take to build truly transformative language agents for scientific discovery? I will advocate for two key lines of effort: rigorous benchmarking and fundamental understanding. First, we will discuss rigorously benchmarking agents in realistic scientific tasks, including chemistry and scientific coding, highlighting both their capabilities and limitations. Then, we will turn inward to examine the architectural foundations of these systems, exploring the limits of the Transformer in multi-step reasoning and the fundamental causes of failures like the Reversal Curse.

By contrasting rapid progress with foundational constraints, this talk aims to surface some of the key ingredients and missing pieces on the path toward truly transformative language agents for scientific discovery.

Speaker’s Biography

Huan Sun is an endowed College of Engineering Innovation Scholar and an associate professor in the Department of Computer Science and Engineering at The Ohio State University. Her research focuses on natural language processing and artificial intelligence, with particular interest in large language models, agents, and their safety risks. Her recent notable work includes Mind2Web, SeeAct, MMMU, ScienceAgentBench, Grokked Transformers, AmpleGCG, and EIA. She has received many awards, including Best Paper Finalist at CVPR 2024, two Honorable Mentions for Best Papers at ACL 2023, ACM SIGMOD Research Highlight Award 2022, Best Paper Award at BIBM 2021, NSF CAREER Award, and the SIGKDD Dissertation Runner-Up Award. Her team got third place in the inaugural Alexa Prize TaskBot Challenge in 2022, as the only award-winning team from the US.

“Multivariate Tails for Active Molecular Design”

Dr. Ji Won Park, Genentech

Abstract

Active design of therapeutic molecules requires the joint optimization of multiple, potentially competing properties. Multi-objective Bayesian optimization (MOBO) offers a sample-efficient framework for identifying Pareto-optimal drug candidates. At the heart of MOBO is the acquisition function, which determines the next candidate to evaluate by navigating the best compromises among the objectives. In this talk, I show a natural connection between the Pareto front and the extreme quantile of the joint cumulative distribution function (CDF). This link motivates the proposed Pareto-compliant CDF indicator and the associated acquisition function, BOtied. BOtied inherits invariance properties of the CDF well suited for the functional landscape of molecules. Moreover, an efficient implementation with copulas allows it to scale to many objectives. Outperforming state-of-the-art MOBO acquisition functions on a variety of synthetic and real-world problems, BOtied promises to drive model-based decisions for drug discovery.

Speaker’s Biography

Ji Won is a Principal Machine Learning Scientist at Prescient Design, Genentech. Her current research probes hierarchical, sparsity-inducing structures in high-dimensional data that can inform inference and adaptive decision-making. She focuses on developing algorithms in Bayesian optimization, calibration, and MCMC sampling inspired by challenges in molecular design. She received her Ph.D. in Physics from Stanford University, where she worked on hierarchical Bayesian methods for cosmology. During her Ph.D., she interned at NASA Ames and the Center for Computational Astrophysics at the Flatiron Institute. She holds a B.S. in Mathematics and a B.S. in Physics from Duke University.

“The Virtual Lab of AI Scientists”

Dr. James Zou, Stanford University

Abstract

This talk will explore how generative AI agents can enable drug discovery and development. I’ll introduce the Virtual Lab—a collaborative team of AI scientist agents conducting in silico research meetings to tackle open-ended R&D projects. The Virtual Lab designed new nanobody binders to recent Covid variants that we experimentally validated. Then I will discuss some interesting opportunities in designing and optimizing multi-agent interactions.

Speaker’s Biography

James Zou is an associate professor of Biomedical Data Science, CS and EE at Stanford University. He works on advancing the foundations of ML and in-depth scientific and clinical applications. Many of his innovations are widely used in tech and biotech industries. He has received a Sloan Fellowship, the Overton Prize, an NSF CAREER Award, two Chan-Zuckerberg Investigator Awards, a Top Ten Clinical Achievement Award, several best paper awards, and faculty awards from Google, Amazon, Adobe and Apple. His research has also been profiled in popular press including the NY Times, WSJ, and WIRED.


Session 2: Small Molecules Discovery & Development

Session 2 Talk Details

AI Scientists in Small Molecule Discovery and Development – How does it change the game?

Session Chair & Panel Moderator: Prof. Martin Burke, MMLI

“Predicting Chiral Structures: Data-Driven Approaches to Asymmetric Catalysis”

Dr. Jolene Reid, University of British Columbia

Abstract

We are developing and applying data-driven methodologies to study, synthesize, and utilize chiral molecular structures, with a particular focus on advancing asymmetric catalysis. In this talk, I will highlight recent efforts to address key challenges in predicting and generalizing the behavior of chiral catalysts across diverse reaction spaces.

I will first present a framework for quantifying catalyst generality by evaluating both the selectivity and the breadth of performance across chemical space. I will then show how this approach can be used to uncover underexplored yet broadly applicable chiral catalysts. By correcting for reporting bias in the workflow, I identify catalysts with high generality that may otherwise be overlooked in traditional discovery efforts.

Finally, I will demonstrate the tradeoffs in data representation for enantioselectivity prediction. Using a local chemical space-aware algorithm, I highlight the importance of rational dataset design in building efficient and accurate predictive models.

Together, these studies illustrate how the integration of statistical modeling, mechanistic insight, and chemical intuition enables a more strategic and scalable approach to chiral catalyst development and application.

Speaker’s Biography

Jolene Reid is an Assistant Professor at the University of British Columbia, where she leads a research group focused on catalysis, cheminformatics, and machine learning for reaction prediction, catalyst screening, and structure optimization.

She earned her Ph.D. from the University of Cambridge under the supervision of Professor Jonathan Goodman, integrating experiments and computations to study organocatalysis. She then joined the University of Utah as a Marie Skłodowska-Curie Fellow with Professor Matthew Sigman, focusing on statistical modeling of organic molecules and chemical reactions.

Dr. Reid has received prestigious awards, including the Amgen Young Investigator Award (2024), and has published over 40 papers in leading journals.

“Optimization of Catalytic Transformations using Bayesian Optimization in Sparse Data Regimes”

Dr. Richard Walroth, Genentech

Abstract

Applying machine learning algorithms to chemical problems remains challenging, especially in sparse data regimes where there is insufficient data to train complex neural networks. For most chemical reaction optimizations, the amount of data is limited and difficult to expand. Bayesian Optimization (BO) has become a workhorse technique for chemical reaction optimization (as it is well suited to this kind of problem). It has been shown to work exceptionally well with numeric parameters such as temperature, concentration, and reaction time. However, incorporating chemical structure of catalysts into a BO algorithm remains a more challenging task. I will present a new workflow which aims to combine chemical parameterization with standard BO algorithms to optimize a hindered Buchwald C-N coupling reaction. Methods for combining multiple classes of catalysts will be presented, as well as methods for multi-objective optimization capable of navigating search spaces in excess of 2 million potential reaction condition/catalyst pairs.

Speaker’s Biography

Dr. Richard Walroth received his undergraduate degree from the University of Florida, where he worked with Prof. Lisa McElwee-White on multimetallic catalysts. He went on to receive his PhD from Cornell University, where he worked in the lab of Prof. Kyle Lancaster on bioinorganic chemistry and spectroscopy. Following his PhD studies, he worked as a postdoc at NASA, where he used machine learning to automate data processing on spacecraft instruments. From there he went on to SLAC National Lab to work on applying machine learning techniques to powder X-ray diffraction interpretation. Finally, he joined Genentech in 2021 as a Senior Scientist working on using machine learning to optimize catalytic transformations.

“Teaching Language Models to Speak Chemistry”

Dr. Philippe Schwaller, EPFL & NCCR Catalysis

Abstract

Artificial Intelligence is transforming how we approach chemical research and synthesis. By teaching language models to understand and generate the language of chemistry, we have developed complementary AI systems that bridge the gap between computational design and experimental reality. Our large language model system, ChemCrow, represents one of the first demonstrations of an AI system directly controlling robotic synthesis platforms, successfully executing the synthesis of compounds including organocatalysts and chromophores. Complementing this, our small language model system, Saturn, currently the most sample-efficient molecular design algorithm, enables precise molecular generation with built-in synthesizability constraints. Saturn’s innovations include direct optimization against retrosynthetic predictions and integration of building block availability, ensuring that generated molecules are practically accessible. Our work demonstrates how different scales of language models can work together to transform chemical research, from initial molecular design through to physical synthesis, potentially revolutionizing drug discovery, catalysis, and materials development.

Speaker’s Biography

Philippe Schwaller joined EPFL as a tenure-track assistant professor in the Institute of Chemical Sciences and Engineering in February 2022. He leads the Laboratory of Artificial Chemical Intelligence, which works on AI-accelerated discovery and synthesis of molecules and materials. Philippe is a core PI of the NCCR Catalysis, a Swiss centre for sustainable chemistry research, education, and innovation, and a co-lead of the foundation models for sciences pillar in the Swiss AI initiative. He belongs to a new generation of scientists with a broad set of skills – in his case, a combination of chemistry, materials science, computer science, and experimental research. Before EPFL, Philippe worked for 5 years at IBM Research and simultaneously completed an MPhil in Physics (University of Cambridge) and a PhD in Chemistry and Molecular Sciences (University of Bern). He also holds a BSc and MSc degree in Materials Science and Engineering (EPFL).

“Data Science Enabled Synthesis”

Dr. Kaid Harper, AbbVie

Abstract

Coming soon…

Speaker’s Biography

Coming soon…


Session 3: Materials Discovery & Development

Session 3 Talk Details

AI Scientists in Material Discovery and Development: Where are we? Where do we want to go next?

Session Chair & Panel Moderator: Prof. Charles Schroeder, MMLI

“Where is the Prediction Frontier in Materials Chemistry?”

Prof. Brett Savoie, University of Notre Dame

Abstract

Coming soon…

Speaker’s Biography

Brett Savoie is the inaugural Coyle Mission Collegiate Professor of Engineering in the Department of Chemical and Biomolecular Engineering at the University of Notre Dame. Brett graduated with degrees in chemistry and physics from Texas A&M University in 2008, obtained his Ph.D. in theoretical chemistry from Northwestern University in 2014, and from 2014-2017 was a postdoc with Thomas Miller at Caltech. In 2017, Brett joined the faculty of the Davidson School of Chemical Engineering at Purdue University, where he established an independent research group to develop physics-based and machine learning methods to characterize and discover new organic materials. In 2022, Brett was promoted to the Charles Davidson Associate Professor of Chemical Engineering at Purdue University. In July 2024, Brett joined the faculty at Notre Dame to advance computational materials research and lead the university’s Scientific AI (SAI) initiative. Brett is the recipient of the ACS PRF, NSF CAREER, Dreyfus Machine Learning in the Chemical Sciences, and ONR YIP awards.

“Engineering Electronic Polymers using Self-Driving Laboratory”

Dr. Jie Xu, Argonne National Laboratory

Abstract

The development of electronic polymers has lagged behind the rapidly growing demand for advanced materials in flexible devices, large-scale printable electronics, and sustainable energy applications. This slow progress stems from the vast design space and complex processing conditions required, making precise design a formidable challenge. Balancing critical properties—like electronic mobility, strength, ionic conductivity, sustainability, and processability—further complicates the development pipeline. To address these challenges, we are pioneering new approaches to accelerate the electronic polymer development pipeline. While AI-driven materials research has seen rapid advances, applying these technologies to electronic polymer design remains challenging, particularly due to the limited data availability stemming from the lengthy design-make-test-analyze cycle in electronics. Our work focuses on accelerating the design of functional polymers by leveraging AI and automated robotic experimentation. This talk will highlight research conducted in our self-driving lab, Polybot, covering topics from the inverse discovery of electrochromic polymer structures, the controlled assembly of conducting polymers through solution processing, and the discovery of design principles for mixed-conducting polymers in electrochemical transistors. We will also discuss ongoing efforts to evolve Polybot into a more adaptive system with enhanced human-machine interfaces and as a community resource by building a specialized functional polymer digital ecosystem.

Speaker’s Biography

Jie Xu is a scientist at Argonne National Lab and a CASE Affiliated Scientist at the University of Chicago’s Pritzker School of Molecular Engineering. Her research focuses on precision engineering of functional polymers through molecular packing structure engineering, chemical design, and self-driving laboratories (https://www.anl.gov/cnm/polybot). Jie earned her PhD in chemistry from Nanjing University, specializing in nanoconfined soft matter, and completed postdoctoral training at Stanford in stretchable electronics. She received the Materials Research Society Postdoctoral Award and has been named to MIT Technology Review’s list of Innovators Under 35 and Newsweek’s list of America’s Greatest Disruptors (as a budding disruptor), and was a 2023 Polymeric Materials: Science and Engineering Early Investigator Honoree of the American Chemical Society.

“Automation and Machine Learning for Polymer Biomaterials”

Dr. Adam Gormley, Rutgers University

Abstract

The seamless integration of synthetic materials with biological systems has long remained a grand challenge, often curtailed by the sheer complexity of the cell-material interface. For decades, biomaterial scientists and engineers have designed around this complexity by rationally designing new materials one experiment at a time. However, recent advances in laboratory automation, high-throughput analytics, and artificial intelligence / machine learning (AI/ML) now provide a unique opportunity to fully automate the design process. In this seminar, we put forth our efforts to develop a biomaterials acceleration platform (BioMAP) (i.e., a self-driving biomaterials lab) that can rapidly iterate through design spaces and identify unique material properties that perfectly synergize with biological complexity.

Speaker’s Biography

Adam Gormley is an Associate Professor of Biomedical Engineering at Rutgers University, Executive Editor of Advanced Drug Delivery Reviews, and co-founder of Plexymer, Inc. Prior to Rutgers, Adam was a Marie Skłodowska-Curie Research Fellow at the Karolinska Institutet (2016) and a Whitaker International Scholar at Imperial College London (2012-2015) in the laboratory of Professor Molly Stevens. He obtained his PhD in Bioengineering from the University of Utah in the laboratory of Professor Hamid Ghandehari (2012), and a BS in Mechanical Engineering from Lehigh University (2006). In January 2017, Adam started the Gormley Lab which seeks to develop bioactive nanobiomaterials using robotics and artificial intelligence. Dr. Gormley is currently the PI of an NIH R35 MIRA Award, an NSF CBET Award, and an NSF Designing Materials to Revolutionize and Engineer our Future (DMREF) Award. He is the recipient of the A. Walter Tyson Assistant Professorship, the Young Innovator Award by Cellular and Molecular Bioengineering, and the Presidential Fellowship for Teaching Excellence.

Poster No. | Name | Organization | Title
1 | Utkarsh Sharma | University of Illinois Urbana-Champaign | EquiCat: An Equivariant Neural Network Architecture for Predicting Enantioselectivity in Asymmetric Catalysis
2 | Carl Edwards | University of Illinois Urbana-Champaign | Knowledge Empowered Joint Language and Molecule Modeling to Accelerate Drug Discovery
3 | Siru Ouyang | University of Illinois Urbana-Champaign | StructChem: Structured Chemistry Reasoning with Large Language Models
4 | Thao Nguyen | University of Illinois Urbana-Champaign | FARM: Functional Group-Aware Representations for Small Molecules
5 | Ayush Shah | Rochester Institute of Technology | ChemScraper: Pipeline for Parsing Raster and Vector Molecule Diagrams from PDFs
6 | Hongxiang Li | University of Illinois Urbana-Champaign | Reaction Miner: An Integrated System for Chemical Reaction Extraction from Textual Data
7 | Mohit Anand | Pennsylvania State University | AI-Enabled Next-Gen Biosynthesis Planning
8 | Tianhao Yu | University of Illinois Urbana-Champaign | Enzyme function prediction using contrastive learning
9 | Xuan Liu | University of Illinois Urbana-Champaign | AI for retrosynthesis and small molecule discovery
10 | Blake Ocampo | University of Illinois Urbana-Champaign | molli: A General-Purpose Python Toolkit for Combinatorial Small Molecule Library Generation, Manipulation, and Feature Extraction
11 | Le Yuan | University of Illinois Urbana-Champaign | Leveraging Large Language Models for Enzyme Catalytic Efficiency Prediction
12 | David Ashley | Chemetrian | Chemetrian: User-friendly machine learning tools for chemists
13 | Jason Wu | University of Illinois Urbana-Champaign | Solid-Phase Synthesis of Sequence-Defined Oligomers: Enabling Closed-Loop Discovery of Functional Materials
14 | David Friday | University of Illinois Urbana-Champaign | MMLI Kaggle Competition

Venue


The MMLI Annual Symposium will take place in the Auditorium at the National Center for Supercomputing Applications (NCSA), on the north end of the campus of the University of Illinois Urbana-Champaign.

All talks will be in the Auditorium, and meals and breaks will be in the Atrium.