LASP2 | Carlos R. Salazar

This Python code was developed during my master’s internship in Toulouse in 2022, under the supervision of Dr. Julien Lam and Dr. Akshay Krishna. It uses the Python interface library for LAMMPS and relies on mpi4py to run computations in parallel. The n2p2 and VASP binaries are executed through Python's os.system() function. The code can be viewed here.
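
As a rough illustration of the os.system() pattern mentioned above, the sketch below runs an external command inside a working directory and returns its exit code. The helper name run_binary and the placeholder commands are hypothetical and are not taken from the lasp2 source; the real n2p2 and VASP invocations depend on the local installation. It assumes a Unix-like system and Python 3.9 or newer.

```python
import os


def run_binary(command: str, workdir: str = ".") -> int:
    """Run an external command in `workdir` via os.system and return its exit code.

    Hypothetical helper for illustration only; not part of lasp2.
    """
    previous = os.getcwd()
    os.chdir(workdir)
    try:
        status = os.system(command)
    finally:
        # Always return to the original directory, even if the call fails.
        os.chdir(previous)
    # On Unix, os.system returns a wait status that must be decoded;
    # on Windows it is already the exit code.
    return os.waitstatus_to_exitcode(status) if os.name == "posix" else status


if __name__ == "__main__":
    # Placeholder command; a real workflow would call the training or DFT binary here.
    code = run_binary("echo 'external binary would run here'")
    print("exit code:", code)
```

Checking the returned exit code before continuing is one simple way to stop a workflow when an external step fails.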

For my Science Watch project I kept working on the code and performing tests on gold systems. The report can be seen here.

Abstract:

Machine learning (ML) methods have become increasingly relevant in materials science over the last decade and are now well recognized as effective tools for approximating extremely complex functions. Machine learning interaction potentials (MLIPs) offer computation speeds close to those of empirical potentials, with an accuracy close to that of the method used to generate the training data. In this project, an active learning approach was tested for extending gold databases with BCC and surface structures. The approach used was Query by Committee (QbC), as implemented in the lasp2 interface. The machine learning framework n2p2 was used to create the potentials, and the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) was used to explore the phase space with NVT simulations. For selected structures, VASP was used to compute forces and energies with Density-Functional Theory (DFT), and these were then added to the database. Potentials were trained on the fly with lasp2, which measured the disagreement within the committee and used it to decide when a structure should be added to the database. LAMMPS was then used to calculate surface energies for the (100) and (111) surfaces, as well as the BCC bulk modulus. These physical properties served as control parameters to test whether the potential was improving with on-the-fly training. It was found that the lasp2 method can improve a potential through on-the-fly training, but this is not always the case, as seen with the BCC structures. For the (100) and (111) surfaces, we were able to quickly recover the values predicted with a manually created database. For BCC, no clear improvement could be seen, and the error in the bulk modulus appeared to vary arbitrarily, even when the complete manually created database was used as a starting point.
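
The Query by Committee selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the lasp2 implementation: the disagreement measure (largest per-atom standard deviation of the committee's force predictions), the function names and the threshold value are all assumptions chosen for clarity.

```python
import math


def committee_disagreement(force_predictions):
    """Return the largest per-atom spread of force predictions across a committee.

    force_predictions[m][i] is the (fx, fy, fz) force on atom i predicted by
    committee member m. The metric used here is an assumption, not taken from lasp2.
    """
    n_models = len(force_predictions)
    n_atoms = len(force_predictions[0])
    worst = 0.0
    for i in range(n_atoms):
        # Committee mean of each force component on atom i.
        mean = [
            sum(force_predictions[m][i][k] for m in range(n_models)) / n_models
            for k in range(3)
        ]
        # Mean squared deviation of the force vector from the committee mean.
        variance = sum(
            (force_predictions[m][i][k] - mean[k]) ** 2
            for m in range(n_models)
            for k in range(3)
        ) / n_models
        worst = max(worst, math.sqrt(variance))
    return worst


def should_label(force_predictions, threshold):
    """Flag a structure for a reference (e.g. DFT) calculation when the committee disagrees."""
    return committee_disagreement(force_predictions) > threshold


if __name__ == "__main__":
    # Two committee members, two atoms; the members disagree on the second atom.
    predictions = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    ]
    # The threshold of 0.5 is an arbitrary illustrative value.
    print(should_label(predictions, threshold=0.5))
```

In a loop of this kind, structures flagged by should_label would be sent to the reference calculation and appended to the training database before the committee is retrained.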