dc.contributor.author | Dashdamirov, Dursun | |
dc.date.accessioned | 2025-08-27T08:32:52Z | |
dc.date.available | 2025-08-27T08:32:52Z | |
dc.date.issued | 2024 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12181/1439 | |
dc.description.abstract | Molecular-dynamics (MD) simulations convert Newton’s laws into atom-by-atom trajectories, from which pressure, temperature, and free energy are extracted for various experiments. When simulating any atomic environment, the energy is the most crucial feature of the system and also the most challenging to estimate. Machine Learning Interatomic Potentials (MLIPs) address this by fitting models to the system’s definitive functions, which are then used to make fast inferences about the energy of a system. The accuracy of these models depends on domain-specific hyperparameter optimization, which is slow because it relies on complex deep neural networks. In this work, a new interatomic descriptor called quadratic ACE (qACE) is proposed, which surpasses the neural network’s accuracy. We then explore and benchmark ways to fit a linear regression model to this computationally demanding problem in CPU- and memory-bound environments. Several strategies for solving the regression problem are examined, including data reduction techniques and parallel processing. By leveraging Dask’s task scheduling infrastructure, we compute the direct least-squares regression on a 27 GB dataset in under five minutes, demonstrating both scalability and computational efficiency. Overall, this work demonstrates that efficient feature engineering, combined with lightweight parallel regression strategies, can substitute for deep models without sacrificing accuracy or scalability. | en_US |
dc.language.iso | en | en_US |
dc.publisher | ADA University | en_US |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.subject | Molecular dynamics | en_US |
dc.subject | Machine learning -- Applications in science | en_US |
dc.subject | Regression analysis -- Data processing | en_US |
dc.subject | Parallel processing (Electronic computers) | en_US |
dc.subject | Feature engineering (Machine learning) | en_US |
dc.title | Regression on Interatomic Descriptor Data: Direct Solution Strategies for Linear Regression in CPU and Memory-Constrained Environments | en_US |
dc.type | Thesis | en_US |