Background and Theory

Overall Framework

PyXtal_FF involves two major components: atomic descriptors and force field training. Four types of descriptors are supported in the code:

  • (Weighted) Behler-Parrinello Symmetry Functions,
  • Embedded Atom Descriptors,
  • SO(4) Bispectrum Components,
  • Smooth SO(3) power spectrum.

For force field training, the code provides two regression models:

  • Artificial neural networks,
  • Generalized linear regression.

Atomic Descriptors

In an atomic structure, Cartesian coordinates are a poor description of the structural environment: while the energy of a crystal structure remains unchanged, the Cartesian coordinates change as translational or rotational operations are applied to the structure [1]. A physically meaningful descriptor must therefore leave the energy unchanged under such alterations of the structural environment. In other words, the descriptor needs to be invariant with respect to translational and rotational operations, as well as to the exchange of any equivalent atoms. To ensure that the descriptor mapping from the atomic positions smoothly approaches zero beyond the cutoff radius \(R_c\), a cutoff function (\(f_c\)) is included in most descriptor mapping schemes; the exception here is the Smooth SO(3) Power Spectrum:

\[\begin{split}f_c(r) = \begin{cases} \frac{1}{2}\cos\left(\pi \frac{r}{R_c}\right) + \frac{1}{2} & r \leq R_c\\ 0 & r > R_c \end{cases}\end{split}\]

In addition to the cosine function, we also support other types of cutoff functions (see cutoff functions).
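For concreteness, the cosine cutoff above can be sketched in a few lines of NumPy. The function name `cosine_cutoff` is illustrative, not the PyXtal_FF API:

```python
import numpy as np

def cosine_cutoff(r, r_cut):
    """Cosine cutoff f_c(r): 1 at r = 0, smoothly decaying to 0 at r = R_c."""
    r = np.asarray(r, dtype=float)
    inside = 0.5 * np.cos(np.pi * r / r_cut) + 0.5
    # Beyond the cutoff radius the descriptor contribution vanishes.
    return np.where(r <= r_cut, inside, 0.0)
```

At the midpoint `r = R_c / 2` the function evaluates to exactly 0.5, and it reaches zero with zero slope at `r = R_c`, which keeps the descriptor (and its derivative, needed for forces) continuous.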

In the following, each type of descriptor is explained in detail.

Atom Centered Symmetry Function (ACSF)

The Behler-Parrinello method—atom-centered descriptors—utilizes a set of symmetry functions [2]. The radial symmetry functions map pairs of atomic Cartesian coordinates to a distribution of interatomic distances, while the angular functions map triplets of atomic Cartesian coordinates to a distribution of bond angles. These mappings are invariant with respect to translation, rotation, and permutation of the atoms of the system. Therefore, the energy of the system remains unchanged under these symmetry-function mappings.

PyXtal_FF supports three types of symmetry functions:

\[G^{(2)}_i = \sum_{j\neq i} e^{-\eta (R_{ij}-\mu)^2} \cdot f_c(R_{ij})\]
\[G^{(4)}_i = 2^{1-\zeta}\sum_{j\neq i} \sum_{k \neq i, j} [(1+\lambda \cos \theta_{ijk})^{\zeta} \cdot e^{-\eta (R_{ij}^2 + R_{ik}^2 + R_{jk}^2)} \cdot f_c(R_{ij}) \cdot f_c(R_{ik}) \cdot f_c(R_{jk})]\]
\[G^{(5)}_i = 2^{1-\zeta}\sum_{j\neq i} \sum_{k \neq i, j} [(1+\lambda \cos \theta_{ijk})^{\zeta} \cdot e^{-\eta (R_{ij}^2 + R_{ik}^2)} \cdot f_c(R_{ij}) \cdot f_c(R_{ik})]\]

where \(\eta\) and \(\mu\) are defined as the width and the shift of the symmetry function. \(G^{(4)}\) and \(G^{(5)}\) are two of many possible ways to capture angular information via three-body interactions (\(\theta_{ijk}\)). \(\zeta\) determines the strength of the angular information. Finally, \(\lambda\) is set to +1 or -1 to invert the shape of the cosine function.
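As a sketch, the radial function \(G^{(2)}\) above can be evaluated with NumPy for a list of neighbor distances. The helper `g2` is illustrative, not the PyXtal_FF API:

```python
import numpy as np

def g2(distances, eta, mu, r_cut):
    """Radial symmetry function G2 for one center atom.

    distances : neighbor distances R_ij.
    eta, mu   : Gaussian width and shift parameters.
    """
    r = np.asarray(distances, dtype=float)
    # Cosine cutoff f_c(R_ij), zero beyond r_cut.
    fc = np.where(r <= r_cut, 0.5 * np.cos(np.pi * r / r_cut) + 0.5, 0.0)
    # Sum of Gaussians centered at mu, damped by the cutoff.
    return np.sum(np.exp(-eta * (r - mu) ** 2) * fc)
```

A full descriptor vector is built by evaluating `g2` for a grid of \((\eta, \mu)\) pairs, so that the set of values fingerprints the radial distribution of neighbors.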

By default, ACSF splits each distinct atomic pair and triplet into a separate descriptor. For instance, for each Si in a SiO2 system, a G2 function has Si-Si and Si-O descriptors, while G4 has Si-Si-Si, Si-Si-O, and O-Si-O. This is not convenient when extending to systems with many components. Therefore, an alternative solution is to weight each G2 and G4 contribution by the atomic numbers involved:

\[G^{(2)}_i = \sum_{j\neq i} Z_j e^{-\eta (R_{ij}-\mu)^2} \cdot f_c(R_{ij})\]
\[G^{(4)}_i = 2^{1-\zeta}\sum_{j\neq i} \sum_{k \neq i, j} Z_j Z_k [(1+\lambda \cos \theta_{ijk})^{\zeta} \cdot e^{-\eta (R_{ij}^2 + R_{ik}^2 + R_{jk}^2)} \cdot f_c(R_{ij}) \cdot f_c(R_{ik}) \cdot f_c(R_{jk})]\]
\[G^{(5)}_i = 2^{1-\zeta}\sum_{j\neq i} \sum_{k \neq i, j} Z_j Z_k [(1+\lambda \cos \theta_{ijk})^{\zeta} \cdot e^{-\eta (R_{ij}^2 + R_{ik}^2)} \cdot f_c(R_{ij}) \cdot f_c(R_{ik})]\]

The above formulation is called weighted ACSF (wACSF) [3].

Embedded Atom Density

The embedded atom density (EAD) descriptor [4] is inspired by the embedded atom method (EAM), which describes atomic bonding by assuming that each atom is embedded in the uniform electron cloud of its neighboring atoms. The EAM generally consists of a functional of a scalar uniform electron density for each “embedded” atom plus a short-range nuclear repulsion potential. Given the uniform electron gas model, the EAM only works for metallic systems, and even then it can severely underperform. Therefore, the density is modified by including the square of the linear combination of the atomic orbital components:

\[\rho_i(R_{ij}) = \sum_{l_x, l_y, l_z}^{l_x+l_y+l_z=L} \frac{L!}{l_x!l_y!l_z!} \bigg(\sum_{j\neq i}^{N} Z_j \Phi(R_{ij})\bigg)^2\]

where \(Z_j\) represents the atomic number of neighbor atom \(j\), \(L\) is the quantized angular momentum, and \(l_{x,y,z}\) are the quantized directional-dependent angular momenta. For example, \(L=2\) corresponds to the \(d\) orbital. Lastly, the explicit form of \(\Phi\) is:

\[\Phi(R_{ij}) = x^{l_x}_{ij} y^{l_y}_{ij} z^{l_z}_{ij} \cdot e^{-\eta (R_{ij}-\mu)^2} \cdot f_c(R_{ij})\]

According to quantum mechanics, \(\rho\) follows a procedure similar to that used in determining the probability density of a state, i.e. the Born rule.

Furthermore, EAD can be regarded as an improved Gaussian symmetry function. EAD makes no distinction between radial and angular terms; the angular (three-body) information is implicitly incorporated when \(L>0\). By construction, computing EAD is cheaper than the angular symmetry functions because it avoids the extra sum over the \(k\) neighbors. In terms of usage, the parameters \(\eta\) and \(\mu\) follow the strategy used in the Gaussian symmetry functions, and the maximum value of \(L\) is 3, i.e. up to the \(f\) orbital.
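A minimal sketch of the EAD density above, enumerating the \((l_x, l_y, l_z)\) combinations for a given \(L\). The helper `ead_density` and its argument layout are assumptions for illustration, not the PyXtal_FF API:

```python
import itertools
import math
import numpy as np

def ead_density(positions, Z, L, eta, mu, r_cut):
    """EAD descriptor rho_i for a center atom placed at the origin.

    positions : (N, 3) neighbor coordinates relative to atom i.
    Z         : atomic numbers of the N neighbors.
    """
    pos = np.asarray(positions, dtype=float)
    Z = np.asarray(Z, dtype=float)
    r = np.linalg.norm(pos, axis=1)
    fc = np.where(r <= r_cut, 0.5 * np.cos(np.pi * r / r_cut) + 0.5, 0.0)
    # Radial part of Phi, weighted by the atomic numbers Z_j.
    radial = Z * np.exp(-eta * (r - mu) ** 2) * fc
    rho = 0.0
    # Enumerate all (lx, ly, lz) with lx + ly + lz = L.
    for lx, ly, lz in itertools.product(range(L + 1), repeat=3):
        if lx + ly + lz != L:
            continue
        coeff = math.factorial(L) // (
            math.factorial(lx) * math.factorial(ly) * math.factorial(lz))
        # Angular part x^lx y^ly z^lz of Phi for every neighbor.
        ang = pos[:, 0] ** lx * pos[:, 1] ** ly * pos[:, 2] ** lz
        # Square of the linear combination, per the Born-rule analogy.
        rho += coeff * np.sum(radial * ang) ** 2
    return rho
```

For \(L=0\) only the \((0,0,0)\) term survives, recovering an EAM-like scalar density; higher \(L\) adds the implicit angular information.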

SO(4) Bispectrum Components

The SO(4) bispectrum components [5], [6] are another type of atom-centered descriptor, based on the triple correlation of the atomic neighbor density function on the 3-sphere. The distribution of atoms in an atomic environment can be represented as a sum of delta functions, known as the atomic neighbor density function:

\[\rho(\boldsymbol{r}) = \delta(\boldsymbol{r}) + \sum_i \delta(\boldsymbol{r}-\boldsymbol{r_i})\]

This function can then be mapped to the 3-sphere by relating the atomic coordinates \((x,y,z)\) to the hyperspherical angles:

\[\theta = \arccos\left(\frac{z}{r}\right)\]
\[\phi = \arctan\left(\frac{y}{x}\right)\]
\[\omega = \pi \frac{r}{r_{cut}}\]
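The coordinate mapping above can be sketched directly (illustrative helper, not the PyXtal_FF API); `arctan2` is used so that the quadrant of \(y/x\) is resolved:

```python
import numpy as np

def map_to_3sphere(x, y, z, r_cut):
    """Map a Cartesian neighbor coordinate to 3-sphere angles (theta, phi, omega)."""
    r = np.sqrt(x * x + y * y + z * z)
    theta = np.arccos(z / r)          # polar angle
    phi = np.arctan2(y, x)            # azimuthal angle, quadrant-aware
    omega = np.pi * r / r_cut         # radial distance mapped to [0, pi]
    return theta, phi, omega
```

Note that the radius is folded into the third angle \(\omega\), which is what lets a purely angular basis (the Wigner-D matrices) encode radial information.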

Using this mapping, the atomic neighbor density function is then expanded on the 3-sphere using the Wigner-D matrix elements, which form the harmonic functions on the 3-sphere. The resulting expansion coefficients are given by:

\[c^j_{m',m} = D^{j}_{m',m}(\boldsymbol{0}) + \sum_i D^{j}_{m',m}(\boldsymbol{r}_i)\]

The triple correlation of the atomic neighbor density function on the 3-sphere is then given, by the Fourier theorem, as a third-order product of the expansion coefficients:

\[B_{j_1,j_2,j} = \sum_{m',m = -j}^{j}c^{j}_{m',m}\sum_{m_1',m_1 = -j_1}^{j_1}c^{j_1}_{m_1',m_1}\times \sum_{m_2',m_2 = -j_2}^{j_2}c^{j_2}_{m_2',m_2}C^{jj_1j_2}_{mm_1m_2}C^{jj_1j_2}_{m'm_1'm_2'},\]

where \(C\) denotes a Clebsch-Gordan coefficient.

Smooth SO(3) Power Spectrum

Instead of considering a hyperdimensional space, we can derive a similar descriptor by taking the autocorrelation of the atomic neighbor density function through an expansion on the 2-sphere together with a radial basis, applied to a smoothed atomic neighbor density function [6].

\[\rho ' = \sum_i e^{- \alpha |\boldsymbol{r}-\boldsymbol{r}_i|^2}\]

This function is then expanded on the 2-sphere using spherical harmonics and a radial basis \(g_n(r)\) orthonormalized on the interval \((0, r_\textrm{cut})\).

\[c_{nlm} = \left<g_n Y_{lm}|\rho '\right> = 4\pi \sum_i e^{- \alpha r_i^2} Y^*_{lm}(\boldsymbol{r}_i)\int_0^{r_{\textrm{cut}}}r^2 g_n(r) I_l(2\alpha r r_i) e^{-\alpha r^2}dr\]

where \(I_l\) is a modified spherical Bessel function of the first kind. The autocorrelation, or power spectrum, is obtained through the following sum:

\[p_{n_1 n_2 l} = \sum_{m=-l}^{+l}c_{n_1lm} c^*_{n_2 l m}\]
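Given precomputed expansion coefficients, the contraction over \(m\) can be sketched as follows. The coefficient array layout `c[n, l, m + l_max]` is an assumption for illustration, not the PyXtal_FF internal storage:

```python
import numpy as np

def power_spectrum(c, n1, n2, l):
    """p_{n1 n2 l} = sum over m of c_{n1 l m} * conj(c_{n2 l m}).

    c : complex coefficients indexed as c[n, l, m + l_max],
        with l_max = (c.shape[2] - 1) // 2.
    """
    l_max = (c.shape[2] - 1) // 2
    m = np.arange(-l, l + 1) + l_max     # valid m-slots for this l
    # The contraction over m makes the result rotationally invariant;
    # the real part is taken since the descriptor is real-valued.
    return np.sum(c[n1, l, m] * np.conj(c[n2, l, m])).real
```

Looping `n1 <= n2` and `l` up to the chosen limits yields the full set of invariant components for one atomic environment.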

Expression of Target Properties

For all of the regression techniques, the force field training involves fitting energy, force, and stress simultaneously, although PyXtal_FF allows the fitting of force or stress to be optional. The total energy can be written as a sum of atomic energies, each of which is a functional (\(\mathscr{F}\)) of the descriptor (\(\boldsymbol{X}_i\)):

\[E_\textrm{total} = \sum_{i=1}^{N} E_i = \sum_{i=1}^{N} \mathscr{F}_i(\boldsymbol{X}_i)\]

Since neural networks and generalized linear regressions have well-defined functional forms, analytic derivatives can be obtained by applying the chain rule, yielding the force at each atomic coordinate \(\boldsymbol{r}_m\):

\[\boldsymbol{F}_m=-\sum_{i=1}^{N}\frac{\partial \mathscr{F}_i(\boldsymbol{X}_{i})}{\partial \boldsymbol{X}_{i}} \cdot \frac{\partial\boldsymbol{X}_{i}}{\partial \boldsymbol{r}_m}\]

Finally, the stress tensor is acquired through the virial stress relation:

\[\boldsymbol{S}=-\sum_{m=1}^N \boldsymbol{r}_m \otimes \sum_{i=1}^{N} \frac{\partial \mathscr{F}_i(\boldsymbol{X}_{i})}{\partial \boldsymbol{X}_{i}} \cdot \frac{\partial \boldsymbol{X}_{i}}{\partial \boldsymbol{r}_m}\]
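Given the per-atom gradients, the force and stress expressions above reduce to tensor contractions. A NumPy sketch, where the array shapes are assumptions for illustration:

```python
import numpy as np

def forces_and_stress(dEdX, dXdr, positions):
    """Forces and virial stress via the chain rule.

    dEdX      : (N, D) gradients dF_i/dX_i of the atomic energies.
    dXdr      : (N, M, 3, D) descriptor gradients dX_i/dr_m.
    positions : (M, 3) Cartesian coordinates r_m.
    """
    # grad[m] = sum_i dF_i/dX_i . dX_i/dr_m
    grad = np.einsum('id,imcd->mc', dEdX, dXdr)
    forces = -grad
    # S = -sum_m r_m (outer product) grad[m]
    stress = -np.einsum('mc,md->cd', positions, grad)
    return forces, stress
```

The same `grad` array feeds both expressions, which is why fitting forces and stress comes at little extra cost once the descriptor derivatives are available.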

Force Field Training

Here, we reveal the functional form (\(\mathscr{F}\)) presented in the equation above. The functional form is essentially the regression model. Each regression model is species-dependent, i.e. as the number of species increases, the number of regression parameters increases; this is needed to describe the presence of other chemical types in a complex system. Hence, for simplicity, the explanation of the regression models will only consider a single species.

Furthermore, it is important to choose a functional form that is differentiable, along with its derivative, because the loss function contains force (\(F\)) and stress (\(S\)) contributions along with the energy (\(E\)):

\[\Delta = \frac{1}{2M}\sum_{i=1}^M\Bigg[\bigg(\frac{E_i - E^{\textrm{Ref}}_i}{N_{\textrm{atom}}^i}\bigg)^2 + \frac{\beta_f} {3N_{\textrm{atom}}^i}\sum_{j=1}^{3N_{\textrm{atom}}^i} (F_{i, j} - F_{i, j}^{\textrm{Ref}})^2 + \frac{\beta_s} {6} \sum_{p=0}^{2} \sum_{q=0}^{p} (S_{pq} - S_{pq}^{\textrm{Ref}})^2 \Bigg]\]

where \(M\) is the total number of structures in the training pool, and \(N_{\textrm{atom}}^i\) is the total number of atoms in the \(i\)-th structure. The superscript \(\textrm{Ref}\) corresponds to the target property. \(\beta_f\) and \(\beta_s\) are the force and stress coefficients, respectively. They scale the relative importance of the energy, force, and stress contributions, since the force and stress information can otherwise overwhelm the energy information due to their sheer number. Additionally, a regularization term can be added to penalize large parameters and prevent overfitting:

\[\Delta_\textrm{p} = \frac{\alpha}{2M} \sum_{i=1}^{m} (\boldsymbol{w}^i)^2\]

where \(\alpha\) is a dimensionless number that controls the degree of regularization.
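The loss above (without the regularization term) can be sketched as follows. The helper is illustrative; stress is assumed to be stored as its six unique components:

```python
import numpy as np

def loss(E, E_ref, F, F_ref, S, S_ref, n_atoms, beta_f, beta_s):
    """Energy/force/stress loss over M structures (regularization omitted).

    E, E_ref : (M,) total energies; n_atoms : (M,) atom counts.
    F, F_ref : per-structure flat arrays of 3*N_i force components.
    S, S_ref : per-structure arrays of the 6 unique stress components.
    """
    M = len(E)
    total = 0.0
    for i in range(M):
        # Energy error is normalized per atom.
        e_term = ((E[i] - E_ref[i]) / n_atoms[i]) ** 2
        # Force error, scaled by beta_f and the number of components.
        f_term = beta_f / (3 * n_atoms[i]) * np.sum((F[i] - F_ref[i]) ** 2)
        # Stress error over the 6 unique entries of the symmetric tensor.
        s_term = beta_s / 6 * np.sum((S[i] - S_ref[i]) ** 2)
        total += e_term + f_term + s_term
    return total / (2 * M)
```

The per-atom and per-component normalizations keep structures of different sizes on an equal footing in the objective.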

Generalized Linear Regression

This regression methodology is a type of polynomial regression. Essentially, the quantum-mechanical energy, forces, and stress can be expanded via Taylor series with atom-centered descriptors as the independent variables:

\[E_{\textrm{total}} = \gamma_0 + \boldsymbol{\gamma} \cdot \sum^{N}_{i=1}\boldsymbol{X}_i + \frac{1}{2}\sum^{N}_{i=1}\boldsymbol{X}_i^T\cdot \boldsymbol{\Gamma} \cdot \boldsymbol{X}_i\]

where \(N\) is the total number of atoms in the structure. \(\gamma_0\) and \(\boldsymbol{\gamma}\) are the weights in scalar and vector form, and \(\boldsymbol{\Gamma}\) is the symmetric weight matrix (i.e. \(\boldsymbol{\Gamma}_{12} = \boldsymbol{\Gamma}_{21}\)) describing the quadratic terms. In this equation, we restrict the expansion to second order due to the enormous increase in the number of weight parameters at higher orders.
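A sketch of the second-order energy model above. The helper `linear_model_energy` is illustrative, not the PyXtal_FF API:

```python
import numpy as np

def linear_model_energy(X, gamma0, gamma, Gamma):
    """E = gamma0 + gamma . sum_i X_i + 1/2 sum_i X_i^T Gamma X_i.

    X     : (N, D) per-atom descriptors.
    gamma : (D,) linear weights; Gamma : symmetric (D, D) quadratic weights.
    """
    linear = gamma @ X.sum(axis=0)
    # Quadratic form evaluated per atom, then summed over atoms.
    quadratic = 0.5 * np.einsum('id,de,ie->', X, Gamma, X)
    return gamma0 + linear + quadratic
```

Because the energy is linear in the weights \(\{\gamma_0, \boldsymbol{\gamma}, \boldsymbol{\Gamma}\}\), the fit reduces to a linear least-squares problem even though the model is quadratic in the descriptors.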

In consequence, the force on atom \(m\) and the stress matrix can be derived, respectively:

\[\boldsymbol{F}_m = -\sum^{N}_{i=1} \bigg(\boldsymbol{\gamma} \cdot \frac{\partial \boldsymbol{X}_i}{\partial \boldsymbol{r}_m} + \frac{1}{2} \bigg[\frac{\partial \boldsymbol{X}_i^T}{\partial \boldsymbol{r}_m} \cdot \boldsymbol{\Gamma} \cdot \boldsymbol{X}_i + \boldsymbol{X}_i^T \cdot \boldsymbol{\Gamma} \cdot \frac{\partial \boldsymbol{X}_i}{\partial \boldsymbol{r}_m} \bigg]\bigg)\]
\[\boldsymbol{S} = -\sum_{m=1}^N \boldsymbol{r}_m \otimes \sum^{N}_{i=1} \bigg(\boldsymbol{\gamma} \cdot \frac{\partial \boldsymbol{X}_i}{\partial \boldsymbol{r}_m} + \frac{1}{2} \bigg[\frac{\partial \boldsymbol{X}_i^T}{\partial \boldsymbol{r}_m} \cdot \boldsymbol{\Gamma} \cdot \boldsymbol{X}_i + \boldsymbol{X}_i^T \cdot \boldsymbol{\Gamma} \cdot \frac{\partial \boldsymbol{X}_i}{\partial \boldsymbol{r}_m} \bigg]\bigg)\]

Notice that the energy, force, and stress share the weight parameters \(\{\gamma_0, \boldsymbol{\gamma}_1, ..., \boldsymbol{\gamma}_N, \boldsymbol{\Gamma}_{11}, \boldsymbol{\Gamma}_{12}, ..., \boldsymbol{\Gamma}_{NN}\}\). Therefore, a reliable MLP must satisfy all three conditions in terms of energy, force, and stress.

Neural Network Regression

Another type of regression model is neural network regression. Due to the set-up of the algorithm, a neural network is suitable for training large data sets, and it gains an upper hand over generalized linear regression in terms of the flexibility of its parameters.

A mathematical form to determine any node value can be written as:

\[X^{l}_{n_i} = a^{l}_{n_i}\bigg( b^{l-1}_{n_i} + \sum^{N}_{n_j=1} W^{l-1, l}_{n_j, n_i} \cdot X^{l-1}_{n_j} \bigg)\]

The value of a neuron (\(X_{n_i}^l\)) at layer \(l\) is determined by the weights (\(W^{l-1, l}_{n_j, n_i}\)), the bias (\(b^{l-1}_{n_i}\)), and all neurons from the previous layer (\(X^{l-1}_{n_j}\)). \(W^{l-1, l}_{n_j, n_i}\) specifies the connectivity of neuron \(n_j\) at layer \(l-1\) to neuron \(n_i\) at layer \(l\), and \(b^{l-1}_{n_i}\) represents the bias of the previous layer belonging to neuron \(n_i\). These connections are summed over the total number of neurons (\(N\)) at layer \(l-1\). Finally, an activation function (\(a_{n_i}^l\)) is applied to the sum to introduce non-linearity into the neuron (\(X_{n_i}^l\)). At the input layer, \(X_{n_i}\) represents an atom-centered descriptor; at the output layer, it is equivalent to an atomic energy. The atomic energy contributions are summed to obtain the total energy of the structure.
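One layer of the network described above can be sketched as a single matrix operation (illustrative helper, not the PyXtal_FF API):

```python
import numpy as np

def layer_forward(X_prev, W, b, activation=np.tanh):
    """One fully connected layer: X^l = a(b^{l-1} + W^T X^{l-1}).

    X_prev : (N,) neuron values at layer l-1.
    W      : (N, N_l) weights, W[n_j, n_i] connecting n_j -> n_i.
    b      : (N_l,) biases for layer l.
    """
    # The weighted sum over all previous-layer neurons, then the
    # non-linear activation applied elementwise.
    return activation(b + W.T @ X_prev)
```

Chaining such layers from a descriptor vector down to a single output neuron gives one atomic energy; running the chain for every atom and summing the outputs yields the total energy.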

[1]Albert P Bartok, Risi Kondor and Gabor Csanyi, “On representing chemical environments,” Phys. Rev. B 87, 184115 (2013)
[2]Jorg Behler and Michele Parrinello, “Generalized neural-network representation of high-dimensional potential-energy surfaces,” Phys. Rev. Lett. 98, 146401 (2007)
[3]M. Gastegger, L. Schwiedrzik, M. Bittermann, F. Berzsenyi and P. Marquetand, J. Chem. Phys. 148, 241709 (2018)
[4]Y. Zhang, C. Hu, B. Jiang, “Embedded atom neural network potentials: Efficient and accurate machine learning with a physically inspired representation,” J. Phys. Chem. Lett. 10 (17), 4962–4967 (2019)
[5]Albert P Bartok, Mike C Payne, Risi Kondor and Gabor Csanyi, “Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons,” Phys. Rev. Lett. 104, 136403 (2010)
[6](1, 2) A.P. Thompson, L.P. Swiler, C.R. Trott, S.M. Foiles and G.J. Tucker, “Spectral neighbor analysis method for automated generation of quantum-accurate interatomic potentials,” J. Comput. Phys. 285, 316–330 (2015)