MRS Bulletin Materials News Podcast

Episode 4: Researchers pinpoint AI/ML training set to achieve accurate predictions

MRS Bulletin Season 7 Episode 4

In this podcast episode, MRS Bulletin’s Sophia Chen interviews Bowen Deng, a graduate student in Gerbrand Ceder’s group at the University of California, Berkeley, about the group’s work on increasing the accuracy of artificial intelligence/machine learning (AI/ML) materials prediction models. Conventional quantum mechanical simulations, used to predict the interactions between atoms in a given molecule, are increasingly being replaced by faster machine learning models. Researchers describe the atoms’ collective interactions by calculating their energy and forces, where lower energies correspond to more stable compounds. Deng’s research group studied three machine learning models and found that they all tend to predict weaker interactions than what is accurate by about 20 percent. The researchers determined that these underpredictions were caused by biased training data, and they found a way to remedy the situation. This work was published in a recent issue of NPJ Computational Materials.

SOPHIA CHEN: Welcome to MRS Bulletin’s Materials News Podcast, providing breakthrough news & interviews with researchers on hot topics in materials research. My name is Sophia Chen. Materials researchers have long used computer simulations of molecules to aid the discovery process. Conventionally, researchers use a quantum mechanical framework known as density functional theory, but it’s extremely time-consuming and computationally expensive. In the last three years or so, materials researchers have developed AI models that are much faster than the conventional methods. Bowen Deng, a materials research PhD student at the University of California, Berkeley, explains. 

BOWEN DENG: People have been replacing the very expensive quantum chemical calculations with these cheap, artificial intelligence-based calculations.

SOPHIA CHEN: The point of the simulations is to predict the interaction between atoms in a given molecule. These interactions ultimately give rise to the material’s properties, such as its chemical reactions with other molecules and its mechanical responses. Researchers describe the atoms’ collective interactions by calculating their energy and forces, where lower energies correspond to the more stable formation of the compounds. The current challenge is to make these AI-assisted simulations more reliable. Deng’s team studied three machine learning models and found that they all tend to predict weaker interactions than what is accurate by about 20 percent. They describe this underprediction as a softening. For example, if you heat a piece of rubber, that heat actually lessens the forces between atoms, literally causing the material to soften. 

BOWEN DENG: The interaction between the atoms within the system becomes weaker.

SOPHIA CHEN: In this case, the models systematically predict the material to be softer than it actually is. Deng’s team realized that these underpredictions were caused by biased training data. In particular, their training data consisted of the output of conventional chemistry calculations, those time-consuming quantum mechanical calculations mentioned earlier. These calculations tended to be simulations of atoms in stable configurations.

BOWEN DENG: A lot of the previous quantum chemical calculations have been focused on the calculation of these stable materials.

SOPHIA CHEN: Materials that are stable are in their lowest energy configuration. Training mostly on such configurations led the machine learning models to systematically predict lower energies. To remedy the issue, the team gave the models more data. This time, they fed them simulations of materials out of equilibrium. They found they could reduce the error by 15% by giving the model one additional training example.

BOWEN DENG: Since the errors in these prediction models become so systematic, the errors also become easy to solve.

SOPHIA CHEN: This work shows that it’s relatively straightforward to correct systematic errors in these machine learning models. Such conclusions help guide Deng’s team within their larger goal of expanding the Materials Project. This is a large open-source database of materials simulations that anybody can use to aid the materials discovery process. It’s hosted by Lawrence Berkeley National Laboratory.

BOWEN DENG: This project is essentially to expand our database to contain not only these stable materials but also materials calculations that are out of equilibrium, therefore providing an improved training data set for these AI models.

SOPHIA CHEN: This work was published in a recent issue of NPJ Computational Materials. My name is Sophia Chen from the Materials Research Society. For more news, log onto the MRS Bulletin website at mrsbulletin.org and follow us on X, @MRSBulletin. Don’t miss the next episode of MRS Bulletin Materials News – subscribe now. Thank you for listening.