Abstract:
Molecular-dynamics (MD) simulations integrate Newton’s equations of motion to produce atom-by-atom trajectories,
from which properties such as pressure, temperature, and free energy are extracted. When
simulating any atomic system, the energy is the most crucial quantity and also the most challenging one to estimate. Machine Learning Interatomic
Potentials (MLIPs) address this by fitting models to functions that characterize the system, which are
then used to make fast inferences of the system’s energy. The accuracy of these models depends on domain-specific hyperparameter optimization, which is slow due to
the use of complex deep neural networks. In this work, we propose a new interatomic descriptor,
quadratic ACE (qACE), which surpasses the accuracy of neural-network-based potentials. We then
explore and benchmark ways to fit a linear regression model on this computationally demanding descriptor in CPU- and memory-bound environments. To solve the regression
problem, we explore several strategies, including data reduction techniques and parallel processing. By leveraging Dask’s task scheduling infrastructure, we compute the
direct least-squares regression on a 27 GB dataset in under five minutes, demonstrating both
scalability and computational efficiency.
Overall, this work demonstrates that efficient feature engineering, combined with lightweight
parallel regression strategies, can substitute for deep models without sacrificing accuracy or
scalability.
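
As an illustration of the chunked least-squares fit described above, the following is a minimal sketch and not the implementation used in this work. It assumes the descriptor matrix is available as a row-chunked Dask array and solves the problem through the normal equations; the function name `fit_linear_model`, the synthetic data, and the chunk sizes are hypothetical choices made only for this example.

```python
import numpy as np
import dask
import dask.array as da


def fit_linear_model(X, y):
    """Solve min_w ||X w - y||^2 via the normal equations (X^T X) w = X^T y.

    X and y are Dask arrays chunked along the sample axis, so each chunk
    contributes a small (n_features x n_features) partial sum and the full
    design matrix never has to sit in memory at once.
    """
    gram = X.T @ X   # lazy (n_features, n_features) reduction over chunks
    rhs = X.T @ y    # lazy (n_features,) reduction over chunks
    # Evaluate both reductions in a single pass over the data.
    gram_np, rhs_np = dask.compute(gram, rhs)
    return np.linalg.solve(gram_np, rhs_np)


# Synthetic stand-in for a descriptor matrix (assumption: not real qACE data).
rng = np.random.default_rng(0)
n_samples, n_features = 20_000, 8
true_w = rng.normal(size=n_features)
X_np = rng.normal(size=(n_samples, n_features))
y_np = X_np @ true_w + 1e-3 * rng.normal(size=n_samples)

# Chunk along rows only; every chunk holds all feature columns.
X = da.from_array(X_np, chunks=(2_000, n_features))
y = da.from_array(y_np, chunks=2_000)

w = fit_linear_model(X, y)
max_abs_error = float(np.max(np.abs(w - true_w)))
```

The normal equations keep the reduction cheap, but they square the condition number of the design matrix. For ill-conditioned descriptors, a QR-based solver such as `dask.array.linalg.lstsq` is the more robust choice.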