GapTrain: a faster and automated way to generate GA potentials

07/27/2022, 7:50 PM - 8:00 PM UTC


Computational predictions have gained ground over the last decade as a way to streamline the study of chemical reactivity across many substrates. Such predictions require a model of the interatomic potential, which can be obtained by fitting to data computed at the quantum-mechanical level. This work proposes GapTrain.jl, a fast, automated and broadly applicable approach to developing Gaussian approximation potentials (GAPs) from hundreds to thousands of reference data points.



<div align=justify> Molecular simulations are a key tool in computational chemistry for reproducing experimental observations. The accuracy of these models depends on a number of factors, such as the inclusion of the solvation medium. Interatomic potentials combined with molecular dynamics (MD) and Monte Carlo (MC) sampling have therefore been widely applied to explore potential energy surfaces. However, most of these potentials are parameterised for isolated entities with fixed connectivity and are thus unable to describe bond breaking/forming processes.

Machine learning approaches have revolutionized force field-based simulations and can be implemented for the entire periodic table. Within small chemical subspaces, models can be achieved using neural networks (NNs), kernel-based methods such as the Gaussian Approximation Potential (GAP) framework or gradient-domain machine learning (GDML), and linear fitting with properly chosen basis functions, each with different data requirements and transferability. GAPs have been used to study a range of elemental, multicomponent inorganic, gas-phase organic molecular, and more recently condensed-phase systems, such as methane and phosphorus. These potentials, while accurate, have required considerable computational effort and human oversight. Indeed, condensed-phase NN and GAP fitting approaches typically require several thousand reference (“ground truth”) evaluations.
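
To make the idea of a kernel-based potential concrete, here is a minimal sketch of Gaussian process regression applied to a one-dimensional toy problem. It is not the GapTrain.jl implementation or its API: the Morse function stands in for quantum-mechanical reference energies, the descriptor is a single interatomic distance rather than the many-body descriptors used in real GAP fits, and all function names and hyperparameters (`length`, `sigma`, `noise`) are assumptions chosen for illustration.

```python
import numpy as np


def morse(r, d_e=1.0, a=1.5, r_e=1.0):
    """Toy 'ground truth' energy: a Morse pair potential (assumed stand-in
    for a quantum-mechanical reference calculation)."""
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2 - d_e


def sq_exp_kernel(x1, x2, length=0.3, sigma=1.0):
    """Squared-exponential kernel between two 1D arrays of distances."""
    diff = x1[:, None] - x2[None, :]
    return sigma**2 * np.exp(-0.5 * (diff / length) ** 2)


def fit(r_train, e_train, noise=1e-3):
    """Solve (K + noise^2 I) alpha = E for the regression weights alpha."""
    k = sq_exp_kernel(r_train, r_train)
    return np.linalg.solve(k + noise**2 * np.eye(len(r_train)), e_train)


def predict(r_new, r_train, alpha):
    """Predicted energy: kernel-weighted sum over the training points."""
    return sq_exp_kernel(r_new, r_train) @ alpha


# A handful of reference evaluations plays the role of the training set.
r_train = np.linspace(0.7, 2.5, 15)
e_train = morse(r_train)
alpha = fit(r_train, e_train)

# Compare the fitted model with the reference on unseen distances.
r_test = np.linspace(0.8, 2.4, 50)
max_err = np.max(np.abs(predict(r_test, r_train, alpha) - morse(r_test)))
print(f"max abs error on test grid: {max_err:.2e}")
```

The cost of such a fit is dominated by the number of reference evaluations in the training set, which is why reducing that number matters for condensed-phase systems.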

In the present work – with a view to developing potentials to simulate solution phase reactions – we consider bulk water as a test case and develop a strategy which requires just hundreds of total ground truth evaluations and no a priori knowledge of the system, apart from the molecular composition. We show how this methodology is directly transferable to different chemical systems in the gas phase as well as in implicit and explicit solvent, focusing on the applicability to a range of scenarios that are relevant in computational chemistry.



