Material Visions: Advancing Crystal Structure Prediction with Powerful Text Generation Models

Authors

Ndidi Anyakora and Cajetan M. Akujuobi, Prairie View A&M University, USA

Abstract

The discovery of new materials has been a protracted, labor-intensive endeavor that relies on iterative trial and error. Recently, materials informatics has been transforming this process by using advanced data science and computational tools to accelerate the discovery of novel materials, for example through the generative design of material formulas, and to predict material properties. However, predicting the three-dimensional structure of a crystal remains challenging, owing both to the fundamental nature of materials and to the limitations of current computational methods. Inspired by the success of artificial intelligence (AI) models, especially deep learning techniques and natural language processing (NLP) algorithms, we represent complex atom descriptions and their relationships as text and explore whether language models can predict atomic coordinates. In this work, we evaluate multiple text generation models and employ the Longformer-Encoder-Decoder (LED) model to construct preliminary crystal structures from detailed atom descriptions. These structures are then refined by a random forest regressor, which generates the final crystal configurations. Our experiments show that this method captures the intricate relationships between atoms and effectively translates them into the specified crystal format. We also optimize the data representation of both atom descriptions and crystal structures and use well-defined metrics to evaluate accuracy and stability. Our results indicate that this method is promising and could improve the prediction of material crystal structures. Our source code is freely available at https://github.com/RMaarefdoust/Crystal-Structure-Prediction.

Keywords

Materials informatics, Crystal structure prediction, Text generation, Longformer-Encoder-Decoder
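The abstract says that atom descriptions and crystal structures are handled as text, but it does not give the text format used. The sketch below is a minimal illustration that assumes a hypothetical one-line format: the six lattice parameters (a, b, c, alpha, beta, gamma) are followed by one entry per atom, each an element symbol with three fractional coordinates. It shows how a structure could be serialized for a text generation model and how generated text could be parsed back into numbers. The names `Site`, `structure_to_text`, and `text_to_structure` are illustrative only and are not taken from the authors' repository. The LED model and the random forest refinement stage are not shown.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical format, assumed for illustration only:
# "a=... b=... c=... alpha=... beta=... gamma=... | El x y z ; El x y z"
LATTICE_KEYS = ("a", "b", "c", "alpha", "beta", "gamma")


@dataclass
class Site:
    element: str
    frac: tuple[float, float, float]  # fractional coordinates in [0, 1)


def structure_to_text(lattice: dict[str, float], sites: list[Site]) -> str:
    """Serialize lattice parameters and atomic sites into a single line of text."""
    lat = " ".join(f"{k}={lattice[k]:.3f}" for k in LATTICE_KEYS)
    atoms = " ; ".join(
        f"{s.element} {s.frac[0]:.4f} {s.frac[1]:.4f} {s.frac[2]:.4f}" for s in sites
    )
    return f"{lat} | {atoms}"


def text_to_structure(text: str) -> tuple[dict[str, float], list[Site]]:
    """Parse (possibly imperfect) generated text back into lattice values and sites."""
    lat_part, _, atom_part = text.partition("|")
    lattice: dict[str, float] = {}
    for token in lat_part.split():
        key, _, value = token.partition("=")
        if key in LATTICE_KEYS:
            try:
                lattice[key] = float(value)
            except ValueError:
                pass  # skip a lattice token whose value is not a number
    sites: list[Site] = []
    for entry in atom_part.split(";"):
        fields = entry.split()
        if len(fields) != 4:
            continue  # drop a site with the wrong number of fields
        try:
            # Wrap each coordinate into [0, 1) so periodic images map into the unit cell.
            x, y, z = (float(v) % 1.0 for v in fields[1:])
        except ValueError:
            continue  # drop a site whose coordinates are not numbers
        sites.append(Site(fields[0], (x, y, z)))
    return lattice, sites


if __name__ == "__main__":
    cubic = {"a": 5.431, "b": 5.431, "c": 5.431, "alpha": 90.0, "beta": 90.0, "gamma": 90.0}
    text = structure_to_text(cubic, [Site("Si", (0.0, 0.0, 0.0)), Site("Si", (0.25, 0.25, 0.25))])
    print(text)
    print(text_to_structure("a=5.4 b=5.4 c=5.4 alpha=90 beta=90 gamma=90 | Si 1.25 -0.25 0.5 ; O 0.1"))
```

The parser is deliberately tolerant. It wraps out-of-range fractional coordinates modulo 1 and skips malformed entries, so one badly generated token does not abort the whole parse. The abstract does not state whether the authors' pipeline handles generated output this way.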