Wrapping up our discussion of the 2024 Nobel Prizes in Physics and Chemistry, computer scientist Mansi Sakarvadia and computational structural biologist Josh Vermaas talk about the recent prizes and what they mean for science. You’ll hear about how the prizes both break down research barriers and introduce concerns about misinformation and public trust. The research honored with the chemistry prize has already changed how researchers study questions that involve understanding proteins’ structures. For more on the 2024 Nobel Prizes, check out our recent interview with Anil Ananthaswamy.
You’ll meet:
- Mansi Sakarvadia is a Ph.D. student in the computer science department at the University of Chicago and a current Department of Energy Computational Science Graduate Fellow (DOE CSGF). She completed a bachelor’s degree in computer science and mathematics with a minor in environmental science at the University of North Carolina, Chapel Hill.
- Mansi studies ways to interpret how machine learning models work. For example, if a model doesn’t behave as expected, she attempts to diagnose which parts don’t work correctly and find ways to optimize the model without starting over from scratch. Her recent preprint Mitigating Memorization in Language Models examines strategies to curb LLMs from regurgitating their training data, a situation known as “memorization.” She explains more about the paper in this blog post.
- Josh Vermaas is an assistant professor at Michigan State University. His research in computational structural biology focuses on understanding photosynthesis and energy transfer processes in plants as part of the MSU-DOE Plant Research Laboratory. He completed his Ph.D. at the University of Illinois, Urbana-Champaign and was a DOE CSGF recipient from 2011 to 2016.
- In a recent paper from Josh’s group in Proceedings of the National Academy of Sciences, the team showed that microcompartments within blue-green algae help to enhance the capture and use of carbon dioxide within these single-celled organisms. This work could inform future bioengineering strategies for carbon fixation, bioenergy and other applications.
Transcript
Sarah Webb 00:00
I’m your host, Sarah Webb, and this is Science in Parallel, a podcast about people and projects in computational science. Recently, we have been discussing creativity in computing. In our last episode, I spoke with science journalist Anil Ananthaswamy about artificial intelligence and his recent book, Why Machines Learn, and the 2024 Nobel Prizes in Physics and Chemistry. In this shorter episode, you’ll hear from computational scientists whose work is shaping and has been shaped by AI.
Sarah Webb 00:42
These two episodes also mark a transition into season six of the podcast, which will focus on AI. Specifically, we’ll be talking about foundation models, how researchers are applying them, how they could change the process of doing science, and more. That will start next month. I encourage you to subscribe to the podcast, so you’ll get those episodes as soon as they drop.
Sarah Webb 01:07
But back to today’s topic: AI is already disrupting science’s norms with a mix of enthusiasm, hope, hype and concerns. My two guests are current or past recipients of the Department of Energy Computational Science Graduate Fellowship, commonly known as CSGF. That program also sponsors this podcast. One of my guests focuses on machine learning research and the other on computational structural biology. We spoke at the annual Supercomputing conference, SC24, in Atlanta in November.
Sarah Webb 01:52
I started with Mansi Sakarvadia, a computer science Ph.D. student at the University of Chicago. She works on machine learning, the branch of AI that gleans insights from data without explicit programming.
Mansi Sakarvadia 02:06
I’m a Ph.D. student, and my work is mostly focused on research projects about interpreting how machine learning models work. So an example of this might be if you have a machine learning model deployed in some workflow and it’s exhibiting behavior that was unintended or unanticipated. My job would be to come in and try and diagnose what parts of the model are malfunctioning and come up with ways to make that model work better and faster without retraining the model from scratch.
Sarah Webb 02:35
When the Nobel Prizes were announced, Mansi followed the news pretty closely.
Mansi Sakarvadia 02:40
I believe physics came out first, and it was very surprising. I was getting, like, the breaking news alerts on my phone. And I was especially surprised because it was Geoffrey Hinton, who’s a huge name in the field of machine learning. At first, I did not see the connection to physics specifically, and I had to read a little bit about how the Nobel Committee justified awarding a physics prize to a machine learning person. I believe the justification was, one, that neural networks, they claim, are inspired by physical phenomena related to how the brain functions, which I think got a lot of immediate backlash, as that is, I believe, not actually the direct intuition as to why machine learning models work the way they do. But also that these machine learning models have made great strides in enabling physical discoveries, which I think is a well-founded claim. And so I was really excited about that one. And then the one that came out shortly after that was chemistry, and one of the prize winners was at UChicago.
Sarah Webb 03:36
She’s referring to John Jumper, a research scientist at Google DeepMind who completed his Ph.D. there.
Mansi Sakarvadia 03:44
It was a unique experience. It was very exciting. It was also, I think, a bit polarizing for some of the departments. In computer science, many people were very pro the decision the Nobel committee made. But a lot of people, I think, in the physics and the chemistry departments felt that perhaps the prize was being given out of the actual context of the subject. So I think there were a lot of discussions being had, but overall, there’s a sense of excitement on campus.
Sarah Webb 04:10
I asked Mansi what she thinks the prizes mean for the field of machine learning going forward.
Mansi Sakarvadia 04:16
One thing is, I think machine learning in the past five to 10 years has had a really hard time being recognized in some domain science applications as a legitimate tool to help drive the field forward, and it has sometimes been seen as something that’s almost adversarial to a domain scientist’s main job, rather than a tool that may augment it. And I think these prizes somewhat legitimize the role that a well-trained, well-calibrated, measured and evaluated model can have in a field. At the same time, I think there are perhaps some negative consequences with people who don’t take the proper steps to evaluate their models and make claims that are unfounded about what machine learning can do, which may actually set back the domain sciences. So I think there’s a conversation to be had about how machine learning practitioners can appropriately collaborate with the domain sciences, understand what the needs are and address those needs directly in a well-engineered way, rather than saying things like, hey, AI is going to take over the world, or it’s going to replace scientists altogether. Because, you know, that has implications for public trust in the scientific process.
Mansi Sakarvadia 05:24
But at the same time, it helps machine learning practitioners step into those roles that they were having trouble with, I think a few years ago.
Sarah Webb 05:32
What does it mean to you personally?
Mansi Sakarvadia 05:34
I think it’s super exciting, for my field in particular, because I focus on asking the question: can you even understand what a machine learning model does? And I’m of the opinion, maybe a little biased, that you can, to some extent, try and understand how a model works. These models have often been described as black boxes, but I personally think that that is a limited view of how you can use machine learning as a tool. And so I think the prize, again, legitimizes some of the work I do, and it maybe shines a light on the fact that by interpreting the models a little better, you can make them more useful tools for scientists and even people outside of science.
Mansi Sakarvadia 06:13
I think education around misinformation, especially as machine learning-generated content becomes pervasive, is going to be really important, both in the sciences and even at a younger age. I know we have literacy education in schools, but I think enhanced media consumption education in light of AI-generated content is really, really important, because sometimes it’s really hard to tell if a photo is AI-generated, whether it’s real or fake. And the same goes for scientific content, especially as we’re seeing that some papers in the sciences are being authored by language models, which may have unfounded claims or incorrect information in them. So I think machine learning has really good impact when it’s used in appropriate ways, by people who understand how it’s working, in collaboration with those who understand the domain science. But it can be dangerous if you don’t understand the shortcomings.
Sarah Webb 06:50
Josh Vermaas had a different insider perspective. When I mentioned this podcast episode and what I’d hoped to learn, he told me that the innovation recognized by this year’s chemistry prize had shifted his computational research in profound ways. So we met to talk more about that.
Josh Vermaas 07:24
My name is Josh Vermaas, and I work as an assistant professor at Michigan State University in the Plant Research Lab. So as part of the MSU-DOE Plant Research Laboratory the big thing that we’re studying is photosynthesis and energy transfer from the light-harvesting processes in plants. And I am the computational arm and muscle for a big group of plant scientists who are trying to understand the inner workings of photosynthesis and associated processes.
Sarah Webb 07:49
Josh saw the 2024 Nobel Prize in Chemistry coming.
Josh Vermaas 07:54
I was like, yeah, it was going to either happen this year or next year, but it was going to happen at some point, because, particularly for the chemistry Nobels, AlphaFold and RoseTTAFold have been such big influences in changing how science is done so quickly that there’s really no comparison in terms of impact for other techniques, right? I think the biggest previous Nobel Prize for me was the 2013 chemistry Nobel, when it was all about molecular simulation. And that work was 20, 30 years before what they were awarding it for, but the impact has just been so big that, yeah, it was either going to happen right now or maybe a year from now. But it was going to happen.
Sarah Webb 08:31
AI now overcomes a major research bottleneck in biology: the labor-intensive experimental work needed to determine the structure of a protein.
Josh Vermaas 08:43
So back when I was a graduate student as a computational structural biologist, the first question I would always have to ask is, do you have a structure of this protein? And normally the answer was no, and then my response would always be, well, it’s not going to be worth my time to predict a structure, because the structure is probably wrong. Come back to me when you actually have a real structure. Now, with AlphaFold, I just have to ask the question, is there a FASTA file for this?
Sarah Webb 09:07
So FASTA files are text files that include the sequence of amino acids that are linked together to produce a protein. Each amino acid is noted by a single letter.
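To make the format concrete, here is a minimal sketch of reading a FASTA file in Python. The header name and amino acid sequence below are made-up examples for illustration, not a real protein, and the parser covers only the basic format (a “>” header line followed by lines of single-letter codes).

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-format lines.

    Each record starts with a ">" header line; the following lines hold
    the sequence in single-letter amino acid codes, possibly wrapped.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # drop the ">" marker
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Hypothetical example record (not a real protein sequence):
example = """>example_protein hypothetical
MKTAYIAKQR
QISFVKSHFS
""".splitlines()

records = dict(parse_fasta(example))
```

The wrapped sequence lines are joined into one string per record, which is the form a structure-prediction tool would consume.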
Josh Vermaas 09:19
Since I work in plants now, where the number of structures of the exact right protein is not very large relative to what you might see in humans or mice or other well-studied species, this has been a tremendous advantage, because now I don’t have to depend on whether someone happened to crystallize the protein or take a cryo-EM structure in the exact right conditions. It’s: does AlphaFold give me something that looks reasonable? And in my short experience with it, even AlphaFold 2, if you started from that predicted structure, would give you something you could simulate and actually learn about the structure and dynamics of that system. So it’s really opened up a lot of new avenues for the kinds of research that we can do.
Sarah Webb 09:56
So I asked Josh to tell me more about some of these new ideas that he can now pursue.
Josh Vermaas 10:02
About a week ago, as we’re recording this, AlphaFold 3 finally released its source code, and now we can use AlphaFold 3, which can model small molecules that bind to proteins. So right now, we are looking at how small molecules bind to plant proteins, investigating, like, the reverse question: we know that plants make these signaling molecules, so what do they actually bind to? Are there things that people have missed? Another thing that we’re working on right now is that there are a lot of proteins of unknown function. Maybe, if we can figure out which proteins interact with which, we might be able to infer what those functions are based on what else they are interacting with. And those are the kinds of questions that, until you had structure prediction software that was fast enough and easy enough to use that you can get an answer in a couple of hours, you really didn’t have another way of getting at.
Sarah Webb 10:58
The way that AI can predict protein structures is just one example of how it can enhance scientific creativity, and we’re at the messy beginning of figuring out what else AI can make possible, while also making sure that it’s used wisely, safely and ethically.
Sarah Webb 11:17
If you’d like to learn more about Mansi Sakarvadia and Josh Vermaas and their research, please check out our show notes at scienceinparallel.org. As I mentioned at the top of the episode, we’ll focus on foundation models next and the many ways that researchers are thinking about applying them. These models aren’t just tools to explore ideas and solve challenging problems, they could change how we do research and what it means to be a scientist in the future.
Sarah Webb 11:49
Science in Parallel is produced by the Krell Institute and is a media project of the Department of Energy Computational Science Graduate Fellowship program. Any opinions expressed are those of the speaker and not those of their employers, the Krell Institute or the U.S. Department of Energy. Our music is by Steve O’Reilly. This episode was written, produced and edited by me, Sarah Webb.
Transcript prepared using otter.ai and then copyedited.
Episode cover image of Nobel Prize by David Monniaux and licensed via Creative Commons.