ADA Library Digital Repository

Regression on Interatomic Descriptor Data: Direct Solution Strategies for Linear Regression in CPU and Memory-Constrained Environments


dc.contributor.author Dashdamirov, Dursun
dc.date.accessioned 2025-08-27T08:32:52Z
dc.date.available 2025-08-27T08:32:52Z
dc.date.issued 2024
dc.identifier.uri http://hdl.handle.net/20.500.12181/1439
dc.description.abstract Molecular-dynamics (MD) simulations convert Newton’s laws into atom-by-atom trajectories, from which pressure, temperature, and free energy are extracted for various experiments. In simulating any atomic environment, the most crucial feature of the atomic system is its energy, which is also the most challenging feature to estimate. Machine Learning Interatomic Potentials (MLIPs) address this by fitting models to the system’s definitive functions; the fitted models are then used to make fast inferences about the energy of a system. The accuracy of these models depends on domain-specific hyperparameter optimization, which is quite slow because the models are complex deep neural networks. This work proposes a new interatomic descriptor called quadratic ACE (qACE), which surpasses the neural network’s accuracy. We then explore and benchmark possible ways to fit a linear regression model to this computationally demanding descriptor data in CPU and memory-bound environments. Several strategies for solving the regression problem are explored, including data reduction techniques and parallel processing. By leveraging Dask’s task scheduling infrastructure, we compute the direct least-squares regression on a 27 GB dataset in under five minutes, demonstrating both scalability and computational efficiency. Overall, this work demonstrates that efficient feature engineering, combined with lightweight parallel regression strategies, can substitute for deep models without sacrificing accuracy or scalability. en_US
dc.language.iso en en_US
dc.publisher ADA University en_US
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/ *
dc.subject Molecular dynamics en_US
dc.subject Machine learning -- Applications in science en_US
dc.subject Regression analysis -- Data processing en_US
dc.subject Parallel processing (Electronic computers) en_US
dc.subject Feature engineering (Machine learning) en_US
dc.title Regression on Interatomic Descriptor Data: Direct Solution Strategies for Linear Regression in CPU and Memory-Constrained Environments en_US
dc.type Thesis en_US
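
The abstract describes solving a direct least-squares regression on a dataset too large to hold in memory. One common way to do this, sketched below, is to accumulate the normal-equation terms X^T X and X^T y one chunk at a time and then solve the small resulting system. This is an illustrative NumPy stand-in, not the thesis's Dask implementation: the function name `chunked_least_squares`, the `ridge` parameter, and the synthetic data are all assumptions introduced here.

```python
import numpy as np

def chunked_least_squares(chunks, ridge=0.0):
    """Solve min ||Xw - y||^2 by accumulating X^T X and X^T y per chunk.

    `chunks` is any iterable of (X_chunk, y_chunk) pairs, so only one
    chunk has to be in memory at a time. `ridge` is an optional,
    hypothetical regularization term added to the diagonal.
    """
    gram = None    # running sum of X^T X, shape (n_features, n_features)
    moment = None  # running sum of X^T y, shape (n_features,)
    for X_chunk, y_chunk in chunks:
        X_chunk = np.asarray(X_chunk, dtype=np.float64)
        y_chunk = np.asarray(y_chunk, dtype=np.float64)
        if gram is None:
            n_features = X_chunk.shape[1]
            gram = np.zeros((n_features, n_features))
            moment = np.zeros(n_features)
        gram += X_chunk.T @ X_chunk
        moment += X_chunk.T @ y_chunk
    if gram is None:
        raise ValueError("no chunks supplied")
    gram[np.diag_indices_from(gram)] += ridge
    # Solve the small (n_features x n_features) system directly.
    return np.linalg.solve(gram, moment)

# Synthetic data standing in for descriptor features and energies.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ w_true + 0.01 * rng.normal(size=1000)

# Feed the data in four chunks of 250 rows each.
chunk_iter = ((X[i:i + 250], y[i:i + 250]) for i in range(0, 1000, 250))
w_chunked = chunked_least_squares(chunk_iter)

# Reference solution computed on the full matrix at once.
w_reference = np.linalg.lstsq(X, y, rcond=None)[0]
```

Memory use in this sketch is bounded by one chunk plus a feature-by-feature matrix, and the per-chunk products are independent, so they can be computed in parallel and summed. The trade-off is that forming X^T X squares the condition number of the problem, so a QR-based approach may be more robust for ill-conditioned descriptor matrices.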



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States.
