Chemists and chemical engineers have modeled molecules for decades, but artificial intelligence and foundation models offer the prospect of training models with predictive abilities in one area of chemistry that can then be fine-tuned for another. Trustworthy chemistry foundation models could help streamline the experimental time and resources needed to discover new medicines or design new batteries. Massachusetts Institute of Technology Ph.D. student Jackson Burns is working on these questions. He describes the promise and challenges of building foundation models in chemistry, his work on chemprop, and his advice to other researchers interested in working on foundation models for chemistry and science in general.
You’ll meet:
- Jackson Burns is a Ph.D. student in William Green’s chemical engineering group at MIT. He’s also a third-year Department of Energy Computational Science Graduate Fellowship (DOE CSGF) recipient. He completed his undergraduate degree in chemical engineering at the University of Delaware.

From the episode:
Jackson talked about chemprop, a molecular property prediction tool developed in the Green group at MIT.
He discussed some of the challenges of building foundation models. He presented a poster about this research at the 2024 DOE CSGF annual program review.
For more reading about foundation models for chemistry and materials, check out these two recent review articles:
- Foundation models for materials discovery – current state and future directions
- A Perspective on Foundation Models in Chemistry
Transcript
SPEAKERS
Jackson Burns, Sarah Webb
Sarah Webb 00:03
In this episode, in our AI for Science series on Science in Parallel, we’ll take a look at chemistry. It’s another area rich with data and important problems to solve, such as finding new energy storage materials and developing new drugs. I’m your host, Sarah Webb, and my guest is Jackson Burns, a Ph.D. candidate in chemical engineering and computation in William Green’s group at the Massachusetts Institute of Technology. Jackson is also in the third year of his Department of Energy Computational Science Graduate Fellowship. This podcast is a media outreach project of that program.
Sarah Webb 00:55
Like our previous guests in this series, Jackson works on foundation models, a flavor of artificial intelligence that makes predictions using a similar strategy to large language models. We talk about the challenges and opportunities that come in building chemistry foundation models, his work on a tool called chemprop, and his advice to other researchers interested in working in the field.
Sarah Webb 01:28
Jackson, it is great to have you on the podcast.
Jackson Burns 01:31
It’s a pleasure to be here.
Sarah Webb 01:32
So let’s kind of start with some of the roots of what you do. Tell me a little bit about how you got interested in chemistry. What’s fascinating about it to you?
Jackson Burns 01:42
Chemical engineering is sort of a marriage of math with chemistry, so we still think about a lot of the same fundamental questions that chemists do: What is this molecule going to do in a reactor, or in the human body, or in a solar cell? But as engineers, we're very happy to come up with a nice, concise mathematical model and then sort of abstract away a lot of the details about how things work at the very lowest level, so that we can, you know, build emergent systems using those technologies. Chemistry, in particular, has always been really impressive to me. I love the granularity of going down to the level of individual molecules and changing around single atoms at a time. Being able to control a system at that level of precision has always just been so impressive to me. So what about computing? As an undergraduate, I was doing research with Don Watson at the University of Delaware. We were designing these ligands, these really complex molecules, to catalyze some relevant pharmaceutical reactions, making bonds between carbon and carbon or carbon and nitrogen. And these ligands that we were designing were enormously complex, very heavy molecules, tons of different functional groups on there, and lots of sort of empirical rules about, you know, what might happen if I were to substitute one functional group for another in combination with, you know, these other factors.
Sarah Webb 03:14
Some quick chemistry background: Molecules are made of individual atoms, but clumps of atoms known as functional groups within those molecules often work together to shape chemical reactions and other properties.
Jackson Burns 03:30
Being able to design those to promote one reaction that you really like while avoiding another reaction that you don't can be done one of two ways. One of them is to do chemistry for decades and get a really good intuitive understanding of what each of those groups does and how you can control things. The other way, the one I ended up sort of being drawn towards, is using computing. At that time, it was not AI, it was not ML. It was just data science, you know, identifying heuristic trends in the data we had about all of the different ligands on hand, and trying to use that to draw conclusions, rather than just pure intuition. So virtual screening was a really common technology that we were using at the time: basic property prediction models to try to avoid having to do experiments.
Sarah Webb 04:24
What does AI bring to the table?
Jackson Burns 04:26
When I started my first-ever data science and chemistry project, it was very myopic. We were focused on exactly one type of reaction and exactly one outcome for our prediction, and that was completely appropriate. But as I went on, I thought, how could we expand this to work on different systems? What AI really brings to the table in chemistry is the ability to learn a relatively small set of modeling approaches but then apply them to a very wide range of chemical problems. So the same software that our group develops for molecular property prediction has found use across the pharma industry. It's found use in materials science, developing solar cells. It's found use in combustion technologies, predicting thermodynamic properties of molecules.
Sarah Webb 05:20
And foundation models, obviously, are this, you know, emerging flavor of AI. What do you think is interesting and exciting about them?
Jackson Burns 05:28
Yeah, I think to understand why foundation models are so interesting to the chemistry world, we have to sort of think about what makes them interesting to the world at large. The example I like to think of is the AI assistant on every website. Even the car dealership down the road from me: if I want to go buy a truck, I can go talk to their AI agent for a little while and maybe try to get it to agree to sell me the truck for, like, $5 or something. The car dealership down the street does not have the capacity to train an AI model to speak English. They don't even have GPUs. They probably don't have programmers on staff. But there are other companies, like OpenAI, for example, that have trained these foundation models in the language space, which are broadly capable of speaking English. And then what you can do with those foundation models is take some significantly smaller portion of data and basically train the model just a tiny little bit more on this more specific application.
Jackson Burns 06:37
And then you can deliver, like, a customer service agent for your car dealership with just the historical records you have of previous customer service interactions. Now, in the chemistry space, you know, obviously we're not selling cars, but we're selling chemicals, I guess. And what we have a lot of are very, very small data sets of chemical information. So we might have run a handful of experiments, maybe just a few dozen, running really expensive individual molecules through a really long experimental pipeline. So it's both a big monetary investment to figure out how a molecule behaves and a huge time investment. So we're really drawn to foundation models because, by analogy to what they do in natural language processing, chemical foundation models are starting from a really rich understanding of chemistry. They've basically got a leg up over a model that we would have to develop from scratch. So then we can take our really small datasets and fine-tune these existing chemistry foundation models to work really well with very few examples to start from.
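(A minimal sketch of the fine-tuning idea described above, assuming a hypothetical pretrained encoder and data loader; this is illustrative PyTorch, not chemprop's actual API. The pretrained encoder is frozen so its learned chemistry is reused, and only a small prediction head is trained on the handful of labeled examples.)

```python
# Illustrative fine-tuning sketch; `pretrained_encoder` and `loader` are
# hypothetical stand-ins for a chemistry foundation model and a small dataset.
import torch
import torch.nn as nn

class FineTunedModel(nn.Module):
    def __init__(self, pretrained_encoder: nn.Module, hidden_dim: int = 300):
        super().__init__()
        self.encoder = pretrained_encoder        # foundation model, kept frozen
        for p in self.encoder.parameters():
            p.requires_grad = False              # reuse its learned chemistry as-is
        self.head = nn.Sequential(               # small task-specific prediction head
            nn.Linear(hidden_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, mol_batch):
        with torch.no_grad():
            z = self.encoder(mol_batch)          # pretrained molecular representation
        return self.head(z)

def fine_tune(model: FineTunedModel, loader, epochs: int = 50):
    # Train only the head on a few dozen labeled molecules.
    opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for mols, targets in loader:
            opt.zero_grad()
            loss = loss_fn(model(mols).squeeze(-1), targets)
            loss.backward()
            opt.step()
```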
Sarah Webb 07:49
So tell me a bit more about what you’re specifically working on in this area.
Jackson Burns 07:55
In the Green group at MIT, we sort of have a dual focus. A lot of our work goes into energy systems, so combustion of common feedstocks, petrochemicals, and then a lot of our work is also in the pharma space. We've developed a machine learning package called chemprop. A good motivating example inside the pharmaceutical space: it's very important to understand how much of a drug molecule will be taken up by the body. That value can be represented by a few different sort of summary numbers that we think about. There are dose-response curves, solubility, partition coefficients, things like that. And very often, because these are expensive molecules that need to go through a pretty extensive experimental pipeline, it can be very challenging to come up with these computational models to predict these values, just because we have so few examples, which is why we're drawn to AI to supplement a place where experiments would be really expensive.
Sarah Webb 08:59
I’d like to hear more about this notion of chemprop. What have you been working on?
Jackson Burns 09:03
I’ve been approaching foundation modeling in two different ways, one of which is derived from the literature methods on graph-based modeling, and the other is a little bit more exploratory and has had some mixed results at the beginning. For the former, we're using as a starting point a software package developed in the Green group at MIT called chemprop. Chemprop follows a procedure where we take a molecular graph, and we represent each of the atoms as a node with some really basic information about it, and we represent each of the bonds in a similar way. And then through chemprop, we can learn a representation of that graph as just a vector of scalar-valued numbers, which then carries some meaning about what the molecule will actually do; foundation modeling is an extension of that. Our hope is that in order to be able to fit on really, really small datasets, you know, 100 or fewer examples, we need to basically give chemprop a head start when it's doing its modeling. So we're on the lookout for data sets that we can train chemprop on that cover a wide swath of interesting chemistry or cover targets that people commonly care about for various industries.
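(A minimal sketch of the molecular-graph input described above, using RDKit; the specific atom and bond features shown here are illustrative choices, not chemprop's actual featurization.)

```python
# Turn a SMILES string into a simple graph: atoms as nodes with basic
# features, bonds as edges with their endpoints and bond order.
from rdkit import Chem

def mol_to_graph(smiles: str):
    mol = Chem.MolFromSmiles(smiles)
    atoms = [
        {
            "atomic_num": atom.GetAtomicNum(),    # how heavy the atom is
            "degree": atom.GetDegree(),           # number of bonded neighbors
            "formal_charge": atom.GetFormalCharge(),
        }
        for atom in mol.GetAtoms()
    ]
    bonds = [
        {
            "begin": bond.GetBeginAtomIdx(),
            "end": bond.GetEndAtomIdx(),
            "order": bond.GetBondTypeAsDouble(),  # single = 1.0, aromatic = 1.5, ...
        }
        for bond in mol.GetBonds()
    ]
    return atoms, bonds

atoms, bonds = mol_to_graph("CCO")  # ethanol: 3 heavy atoms, 2 bonds
```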
Jackson Burns 10:30
And our current mission is training a chemprop model on such a dataset and then sharing the resulting model as a starting point for other people to continue their own modeling work and, again, hopefully achieve very accurate predictions on much smaller amounts of data. There are a number of groups in the literature doing similar work. Some of the most successful foundation models we've seen in the chemistry space are operating on graphs directly. There are some operating on text-based representations as well, but many of the best are operating on graphs, and they use similar mechanisms to chemprop. We're confident that that will work, but we're keeping open another line of inquiry, which actually takes this modern approach of foundation modeling and combines it with the historical approach of using molecular descriptors.
Jackson Burns 11:24
So because people have been doing chemistry modeling for 100 years at this point, and with computers for decades, there are massive numbers of molecular descriptors that, at one time or another, someone has noticed are really useful for property prediction, and you can use software tools to calculate thousands of these descriptors for any given molecule that you care about. And that actually sort of gets you back into the domain where you need a lot of data in order to fit a model using all of those descriptors naively. So what we're trying to do is down-select the number of descriptors that you start from in a really intelligent and general way, just like a foundation model would do, in order to achieve, again, those predictions on really small data sets.
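(A rough sketch of the descriptor-based starting point, assuming RDKit, which ships a couple hundred classical descriptors; other packages compute thousands. The variance and correlation filtering shown is just one naive down-selection rule for illustration, not the group's actual method.)

```python
# Compute classical molecular descriptors for a set of molecules and
# crudely down-select them before fitting any model.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors

def descriptor_matrix(smiles_list):
    names = [name for name, _ in Descriptors.descList]
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        rows.append([fn(mol) for _, fn in Descriptors.descList])
    return names, np.array(rows)

def down_select(X, names, var_tol=1e-8, corr_tol=0.95):
    # Drop descriptors that barely vary across the dataset.
    keep = np.var(X, axis=0) > var_tol
    X, names = X[:, keep], [n for n, k in zip(names, keep) if k]
    # Greedily drop one of any pair of nearly duplicate descriptors.
    corr = np.corrcoef(X, rowvar=False)
    drop = set()
    for i in range(len(names)):
        if i in drop:
            continue
        for j in range(i + 1, len(names)):
            if j not in drop and abs(corr[i, j]) > corr_tol:
                drop.add(j)
    keep_idx = [i for i in range(len(names)) if i not in drop]
    return X[:, keep_idx], [names[i] for i in keep_idx]
```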
Sarah Webb 12:19
So basically, trying to figure out exactly what you need to make a foundation model that can predict what you want it to predict.
Jackson Burns 12:28
Yeah, absolutely. So if you're starting from scratch with a model like chemprop, it has very basic information about chemistry. As I was mentioning, it only knows things like how heavy each atom is and the degree of the bonds between atoms. So during training, when you're trying to make these models accurate, it first has to relearn a lot of the basics of chemistry before it can really start paying attention to the target that you care about. In the historical approach, when you are designing a molecular descriptor specifically to work with the property that you care about, you wouldn't have that problem, because the model at the very beginning is just looking for a relationship that we know exists. It doesn't have to relearn chemistry because we've given it a really intelligent, well-informed starting point. But when we combine these huge numbers of molecular descriptors, we actually end up in sort of an in-between world, where the model has a lot of information to start with but still needs to filter out some of the data that might not be helpful for the problem we specifically want to solve. So that's where the foundation modeling comes in. With molecular descriptors, we really want to get a head start on that descriptor selection process and, in the process of doing so, come up with a smaller, more meaningful representation that hopefully applies to a pretty wide space of chemistry.
Sarah Webb 14:05
Any surprises along the way? I’m sort of curious where you are with this.
Jackson Burns 14:09
As is very common in the chemistry modeling world, we end up with a mixed bag of results. And what we’ve seen so far is it gets a little bit better in some places and then dramatically worse in others.
Sarah Webb 14:21
Wow.
Jackson Burns 14:22
So it's always surprising when you get a result like that, and it can definitely be frustrating at the outset. But the helpful part about having really obvious failures is that you have really obvious starting points for what to try to fix. One good example that we've found so far is that this compressed, descriptor-based representation really, really struggles at predicting solubility. There is probably a very precise scientific reason as to why that is happening. But what we've noticed, you know, sort of deducing meaning from the representation that we ended up with, is that our original set of descriptors only had a few that were highly correlated to solubility. And it happened to be that during the training process of the foundation model, it didn't pay any attention to those, purely because there were so few of them.
Jackson Burns 15:23
So we've had to come up with a lot of different approaches, and this is where the work is ongoing, for encouraging the model not to just come up with the most accurate representation on average, but maybe the most helpful representation that applies to a wider part of chemical space, broadly speaking. When we as chemical engineers mention chemical space, we're talking about the set of molecules that are interesting for the task that we care about. So when we're talking about combustibles, for example, chemical space might mean primarily things that you could pull out of the ground: long alkane chains, things with maybe just a couple oxygens on there, some aromatic compounds. But that section of chemical space would be very different from that of a pharmaceutical sort of application. And the fundamental challenge, then, of foundation modeling is that you want to come up with a really generic representation that applies across a lot of chemical space. But if you're just training a model on a single task from scratch, you can come up with a representation that works really well in that narrow set of chemical space. So there's this interplay between breadth and specificity that is really challenging but really exciting to be at the forefront of.
Sarah Webb 16:50
I want to zoom out just a little bit. AI is moving incredibly rapidly. How do you see this changing interface affecting the world of chemistry and chemical engineering? In the next few years, what are your kind of big-picture predictions about where things are going and how this might help with innovation?
Jackson Burns 17:11
I think the single most promising aspect of AI and chemistry is going to be avoiding dead ends, so something that’s sort of inherent in the discovery process in all chemical fields is the fact that you will design molecules that, on paper, look like they should work, and in reality, don’t work at all. And it takes a significant amount of time and money to go through the process of creating these molecules in the real world, translating them from paper, running them through some experimental procedure only to find out that they don’t work. And negative results are important, and they can be really helpful, but at some point it can be really, really difficult to make forward progress. So what I think these AI models are going to be able to provide to us is a relatively high-quality estimation of whether or not we should invest human time in trying something.
Jackson Burns 18:14
One of my personal favorite use cases of language models broadly is in coding. So if I have some function that I want to implement, I can usually ask a language model and describe what inputs I want to give it, what outputs I want it to produce, and what I expect the broad approach should be, and then it will generally do a pretty good job. There will be times where I have to fix a couple small things from the suggestion, and that's okay. And there are going to be rare times, too, where it's totally wrong. But that still accelerates my time to deliver useful code. And I think of the same thing in chemistry. So even if we have a model that only, like, 50% of the time is able to correctly discern that some sort of a molecule is really not going to work for whatever pipeline we care about, we can still filter out a lot of really unhelpful chemistry and, in doing so, accelerate the discovery process.
Jackson Burns 19:18
A very recent application that looks even more like natural language models is generative AI for chemistry. So once you have models that can accurately predict individual properties that you really care about, we've mentioned things like uptake inside the human body or solubility, you can use generative AI to design new molecules that have the properties you want. And human creativity is completely unlimited, of course, but machine creativity, if you want to call it that, has the advantage of sheer breadth. So a computer model producing new molecules has very little bias in terms of what it thinks may or may not work. It's really going to throw everything at the wall and see what sticks by calling on those models that you've developed for individually very accurate property prediction. The use of generative AI is definitely becoming much more pervasive, at least as a sort of lead generator, and in some cases even as, like, a medicinal chemist, suggesting exact therapeutic molecules to achieve some sort of an end.
Sarah Webb 20:35
That’s really cool. As you look at the broader space of foundation models, what are you excited about there? Are there any trends that you’re watching going: Hey, I think that’s where things might be headed?
Jackson Burns 20:47
Yeah, I've seen just recently that there's been an explosion of interest in what are called mixture-of-experts models. We're familiar with this idea that an individual language model will sometimes just make things up that sound realistic and sound possible but, in reality, you know, are not real. But these mixture-of-experts models, and the reasoning models to some extent, too, have this capacity to self-correct and reason, as the name suggests, through the responses that they're giving you, and reduce the amount of erroneous information they put out. And I am very hopeful that that technology is going to make a transition into the science domain. So we've seen a few foundation models and language models that are specific to reading scientific papers, and at the moment, they do okay. They will still occasionally make up papers that don't exist or cite fictional authors. But, again, in terms of accelerating the pipeline of science, I think that the promise of those models is huge, and if we're able to get even a close approximation of some of the most pie-in-the-sky promises out there about them, we'll really have something useful that I think would make a huge contribution to science.
Sarah Webb 22:14
Are there any other big challenges that you see for this field?
Jackson Burns 22:18
The other outstanding challenge for chemical foundation models is the practice of actually deploying them. Many of the people contributing in the language-model space are using the exact same toolkits. I think chemistry has yet to see a real standardization within that domain. So one of the outstanding challenges is really sort of just a software-engineering challenge: We need a good standardized library that everyone agrees to use. We're definitely at a place right now as a field where using a foundation model requires you to read a scientific paper about the model, find the code online, hopefully, and then understand the code yourself at least a little bit in order to be able to make some sort of a meaningful prediction using that foundation model. Whereas, again, when I look at the natural language space, there are online platforms, for example, that are also open-source, and some of them are even free, where you can access foundation models with no code at all, with no research papers to be read in the way of accessing them. So we're definitely quite far away from that, but standard tooling like that would really contribute to my greater hope for the field, that we can start to deploy these at scale and really accelerate discovery.
Sarah Webb 23:48
What advice do you have for other people who are interested in this intersection of either AI in chemistry or AI in science more broadly?
Jackson Burns 23:58
I would suggest going way down to the fundamentals of how AI actually works, and I'm talking linear algebra and optimization and numerical methods, things like that. Very often when you are troubleshooting AI models, regardless of whether it's just for a specific problem or it's a foundation model in general, you can only get so far before you need to make some very careful decisions about how things are implemented or how you should best run things in order to take advantage of the hardware that you have. And without a very low-level understanding of what's really going on behind the scenes, it can be difficult to make those decisions in an informed way. So I think to really stand out as a researcher in that space, or as someone who wants to take advantage of these technologies, really put in the time and learn those fundamentals about linear algebra and optimization and numerical methods, all of those core components that go into making these emergent tools.
Sarah Webb 25:03
Jackson, thank you so much. It’s been such a pleasure talking with you and learning more about this interface between foundation models and chemistry. Thanks for your time.
Jackson Burns 25:14
Thank you. I’m happy to be here.
Sarah Webb 25:19
To learn more about Jackson Burns, foundation models for chemistry and chemprop, please check out our show notes at scienceinparallel.org. Science in Parallel is produced by the Krell Institute and is a media project of the Department of Energy Computational Science Graduate Fellowship program. Any opinions expressed are those of the speaker and not those of their employers, the Krell Institute or the U.S. Department of Energy. Our music is by Steve O'Reilly. This episode was written and produced by Sarah Webb and edited by Susan Valot.