Efficient implementation of atom-density representations

Félix Musil,1,2,a) Max Veit,1,2,b) Alexander Goscinski,1 Guillaume Fraux,1 Michael J. Willatt,1 Markus Stricker,3,4 Till Junge,3 and Michele Ceriotti1

1) Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
2) National Center for Computational Design and Discovery of Novel Materials (MARVEL)
3) Laboratory for Multiscale Mechanics Modeling, Institute of Mechanical Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
4) Interdisciplinary Centre for Advanced Materials Simulation, Ruhr-University Bochum, Universitätsstraße 150, 44801 Bochum, Germany
Physically-motivated and mathematically robust atom-centred representations of molecular structures are key to the success of modern atomistic machine learning (ML) methods. They lie at the foundation of a wide range of methods to predict the properties of both materials and molecules as well as to explore and visualize the chemical compound and configuration space. Recently, it has become clear that many of the most effective representations share a fundamental formal connection: that they can all be expressed as a discretization of n-body correlation functions of the local atom density, suggesting the opportunity of standardizing and, more importantly, optimizing the calculation of such representations. We present an implementation, named librascal, whose modular design lends itself both to developing refinements to the density-based formalism and to rapid prototyping for new developments of rotationally equivariant atomistic representations. As an example, we discuss SOAP features, perhaps the most widely used member of this family of representations, to show how the expansion of the local density can be optimized for any choice of radial basis set. We discuss the representation in the context of a kernel ridge regression model, commonly used with SOAP features, and analyze how the computational effort scales for each of the individual steps of the calculation. By applying data reduction techniques in feature space, we show how to further reduce the total computational cost by up to a factor of 4 or 5 without affecting the model's symmetry properties and without significantly impacting its accuracy.

arXiv:2101.08814v1 [physics.chem-ph] 21 Jan 2021

a) These authors contributed equally to this work. Corresponding author: felix.musil@epfl.ch
b) These authors contributed equally to this work. Corresponding author: max.veit@epfl.ch

I. INTRODUCTION

Supervised and unsupervised machine learning (ML) methods are gaining increasing importance in the field of atomistic materials modeling. Supervised ML methods – in particular, those used to construct interatomic potentials (MLIPs) [1–8], find structure-property mappings across chemical compound space [9–12], and model the dependence on configuration and composition of experimentally relevant quantities like the dipole moment [13–16], polarizability [17], band structure [18,19], and charge distribution [20–22] – are useful tools in the quest for predictive materials modelling, specifically the use of large, complex, quantum-accurate simulations to access experimental length and time scales [23–31]. Furthermore, unsupervised ML methods are gaining prominence as a way to interpret simulations of ever-increasing complexity [32–38].

All of these methods fundamentally rely on a transformation of the system's atomic coordinates into a form amenable to the construction of efficient and transferable machine-learning models. Usually, this implies that the features that represent an atomic configuration reflect the transformations (invariance or covariance) of the target properties with respect to fundamental symmetry operations, and that the prediction of the extensive properties of a structure is decomposed into that of local contributions, written as a function of a description of the neighborhood of individual atoms [1,2].

We will focus, for the majority of this paper, on the problem of regression of a property expressed as a function of these transformed coordinates (hereafter called just "representation"). By far the most common application of structure-property regression in the context of atomistic simulations is in the fitting of potential energy surfaces, which are used in molecular simulations or to compute thermodynamic averages. The majority of the considerations we make here applies to the prediction of any scalar property of the system, although the calculation of gradients might be less important than in the case of potentials, and we use MLIP and "model" interchangeably in what follows. Figure 1 schematically illustrates the typical procedure of a single timestep in an atomistic machine learning molecular dynamics (ML-MD) simulation. After the atomic coordinates are read in, and the neighbor list is computed to determine the local environments around each atom, the coordinates are transformed into an intermediate representation. It is this representation that is then passed to the machine learning model – be it one based on neural networks (NN) [39], Gaussian process regression (GPR) [40], or one of several other closely-related methods. The accuracy and the transferability of the regression model are usually greatly improved by the use of represen-

[Figure 1: schematic with the panel labels "NL", "Representation" and "Model", annotated with "LAMMPS, …" and "SOAP, ACSF, …".]

Figure 1: A scheme showing the different components involved in the evaluation of energies and forces for an atomic-scale machine-learning model. These steps need to be performed for each new structure in a screening procedure, or each step in a molecular-dynamics simulation.
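
The neighbour-list stage of Figure 1 can be illustrated with a short brute-force sketch. It assumes a cubic periodic cell and the minimum-image convention, and the function name `build_neighbor_list` is hypothetical. This is not the algorithm used by librascal or LAMMPS; production codes typically rely on linked-cell schemes to avoid the quadratic scaling shown here.

```python
import math


def build_neighbor_list(positions, box_length, r_cut):
    """Return {i: [(j, r_ji, distance), ...]} for a cubic periodic box.

    Brute-force O(N^2) sketch using the minimum-image convention, which is
    only valid when r_cut does not exceed half the box length.
    """
    if r_cut > box_length / 2:
        raise ValueError("minimum image requires r_cut <= box_length / 2")
    neighbors = {i: [] for i in range(len(positions))}
    for i, ri in enumerate(positions):
        for j, rj in enumerate(positions):
            if i == j:
                continue
            # r_ji = r_j - r_i, wrapped to the nearest periodic image
            rji = []
            for a, b in zip(ri, rj):
                d = b - a
                d -= box_length * round(d / box_length)
                rji.append(d)
            dist = math.sqrt(sum(d * d for d in rji))
            if dist < r_cut:
                neighbors[i].append((j, tuple(rji), dist))
    return neighbors
```

Each entry stores the displacement r_ji = r_j − r_i and its norm, which are the per-pair quantities that the representation stage consumes.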

tations that fulfill the requirements of symmetry and locality [39,41–43], while at the same time being sensitive to all relevant structural changes [41,44,45], being smooth functions of the atomic coordinates, and – ideally – being free of degeneracies which map completely different structures to the same descriptor [46].

Here we focus on a class of representations that fulfill these requirements, and that can be constructed starting from a description of a structure in terms of an atom density (which is naturally invariant to permutations of the atom labels) that is made translationally and rotationally invariant by first summing over $\mathbb{R}^3$, and then averaging $\nu$-point correlations of the resultant atom-centered density over the O(3) improper rotation group [43]. The smooth overlap of atomic positions (SOAP) power spectrum is perhaps the best-known member of this class of representations [41], but a wealth of other descriptors such as those underlying the spectral neighbor analysis potentials (SNAP) [47], the atomic cluster expansion (ACE) [8,48], moment tensor potentials (MTP) [3], as well as the equivariant extensions λ-SOAP [13] and the N-point contractions of equivariants (NICE) [49] representation can be recovered as appropriate limits or extensions. Atom-centred symmetry functions [1,39] can also be seen as a projection of these atom-density representations on a bespoke set of basis functions.

However, even though these representations are related through a common mathematical formalism [8,43], the cost of evaluating them, and the accuracy of the resulting models, can vary greatly. In some cases, different frameworks have been shown to yield comparable errors [50], while other studies have suggested a trade-off between accuracy and computational cost, with the combination of SOAP features and Gaussian process regression (hereafter termed just SOAP-GAP) emerging as the most accurate, but also the most computationally demanding method [29,51].

In fact, the evaluation of SOAP features and their gradients can take anywhere from 10 % to 90 % (depending on the system and the parameters chosen) of the total computational cost of the energy and force evaluation in a typical molecular dynamics (MD) simulation with the SOAP-GAP method; almost all of the remaining cost is taken up by the evaluation of the kernel (and its gradients) required to compute the GAP energy and forces. We therefore discuss optimization strategies aimed at reducing the computational cost of these two critical components. While we focus, at present, on a serial implementation, the modular structure that we introduce to optimize single-core performance is also very well-suited to parallelization, which becomes indispensable when aiming at extending simulation size and timescale.

We begin in Section II with an overview of density-based representations, and present our benchmarking methodology in Section III. We continue in Section IV, showing how the mathematical formulation of density-based representations reveals several opportunities for optimization, which we implement and systematically benchmark. We then expand these benchmarks to a variety of realistic simulation scenarios, shown in Section V, comparing against an existing simulation code and investigating the effect of convergence parameters. We present further experimental extensions of the code's capabilities in Section VI. Finally, in Section VII, we summarize the improvements and describe the role of our new, modular, efficient code librascal in the modern atomistic ML ecosystem.

II. THEORY

We begin by giving a brief overview of the construction of a symmetrized atom-density representation [43], introducing the notation we use in the rest of this paper to indicate the various components that are needed to evaluate the features associated with a given structure. The construction operates by a sequence of integrals over symmetry operations, applied to a smooth (or Dirac-δ-like) atom density that is taken to describe the structure. After summing over translations, one obtains a description of the atomic environment $A_i$ around the central atom $i$, that de-

pends on the neighbor positions $\mathbf{r}_{ji} = \mathbf{r}_j - \mathbf{r}_i$. Each atomic species $a$ is associated with a separate density built as a superimposition of Gaussian functions $g_\sigma(\mathbf{x}) \equiv \langle \mathbf{x}|g_\sigma\rangle$ with variance $\sigma^2$, restricted to a local spherical cutoff $r_\mathrm{cut}$ by a smooth function $f_\mathrm{cut}$:

$$\langle a\mathbf{x}|A;\rho_i\rangle = \sum_{j\in A_i} \delta_{a a_j}\, \langle \mathbf{x}-\mathbf{r}_{ji}|g_\sigma\rangle\, f_\mathrm{cut}(r_{ij}). \tag{1}$$

We use a notation that mimics the Dirac bra-ket formalism [52], in which the ket indicates the entity being represented (the density field $\rho$ centred on atom $i$ of structure $A$) and the bra the indices that label different features (in this case, the chemical species $a$ and the position at which the field is evaluated, $\mathbf{x}$, that serves as a continuous index). To simplify the notation, when discussing the construction of the features for an arbitrary configuration we omit the reference to the atomic structure such that $\langle a\mathbf{x}|A;\rho_i\rangle \to \langle a\mathbf{x}|\rho_i\rangle$. Following Ref. 43, we introduce the symmetrized $(\nu+1)$-body correlation representation

$$\langle a_1\mathbf{x}_1;\ldots a_\nu\mathbf{x}_\nu|\rho_i^{\otimes\nu}\rangle = \sum_{k=0,1}\int_{SO(3)} \mathrm{d}\hat{R}\; \langle a_1\mathbf{x}_1|\hat{R}\,\hat{\imath}^{k}|\rho_i\rangle \cdots \langle a_\nu\mathbf{x}_\nu|\hat{R}\,\hat{\imath}^{k}|\rho_i\rangle, \tag{2}$$

where $\rho_i^{\otimes\nu}$ is a tensor product of $\nu$ atom-centered fields averaged over all possible improper rotations (with $\hat{\imath}$ the inversion operator). This object can be understood as a fixed $\nu$-point stencil centered on atom $i$ which is applied continuously to the density field, hence accumulating correlations of body order $\nu+1$. To perform the rotational average, it is convenient to expand the atom density on a basis whose expansion coefficients are given by

$$\langle anlm|\rho_i\rangle = \sum_{j\in A_i} \delta_{a a_j} \int \mathrm{d}\mathbf{x}\; \langle nl|x\rangle\, \langle lm|\hat{\mathbf{x}}\rangle\, \langle \mathbf{x}-\mathbf{r}_{ji}|g_\sigma\rangle, \tag{3}$$

where $x = \|\mathbf{x}\|$, $\hat{\mathbf{x}} = \mathbf{x}/x$. $\langle x|nl\rangle \equiv R_{nl}(x)$ are orthogonal radial basis functions, which may or may not depend explicitly on $l$ (see e.g. Ref. 53), and $\langle lm|\hat{\mathbf{x}}\rangle \equiv Y_{lm}(\hat{\mathbf{x}})$ are spherical harmonics. As we discuss in Section IV A, the choice of $\langle x|nl\rangle$ is flexible and can thus be guided by considerations of computational and information efficiency. The angular dependence, on the other hand, is most naturally expanded using spherical harmonics, which results in compact expressions for the density correlation features of Eq. (2) in terms of products and contractions of the expansion coefficients of Eq. (3) (see Ref. 43 for more details). For the case of $\nu = 2$,

$$\langle a_1 n_1; a_2 n_2; l|\rho_i^{\otimes 2}\rangle = \frac{1}{\sqrt{2l+1}} \sum_m (-1)^m\, \langle a_1 n_1 l m|\rho_i\rangle\, \langle a_2 n_2 l(-m)|\rho_i\rangle \tag{4}$$

which corresponds to the SOAP power spectrum of Ref. 41 up to some inconsequential factors. In the following text we discuss and benchmark the efficient implementation of the density expansion and spherical invariant of order 2, i.e. the power spectrum, in librascal.

III. METHODS

In order to provide a concrete assessment of the impact of the optimizations we describe in this paper, and to demonstrate the performance of the optimized code on a variety of realistic systems, we apply a comprehensive benchmarking strategy that compares different classes of systems and breaks down the overall computational cost into contributions associated with different steps of the evaluation of the ML model. We focus on a typical use case of SOAP features, namely, as the input to a kernel ridge regression (KRR) model for a property and its derivatives (e.g. energy and forces). Even though librascal focuses on the evaluation of features, including the model evaluation step is crucial to assess the computational effort of the evaluation of the features in the context of the overall cost of the model. In the same spirit of assessing the computational effort in a way that reflects the most common use case scenario, we focus on the cost of evaluating a previously-trained model, rather than the cost to train the model itself. The training step is almost always limited by memory, not computation time, and needs to be performed only once per potential. When running a simulation, such as an MD trajectory, the representation and model evaluation constitute the real limiting factors in what can be achieved with a given potential in terms of statistics, system size, and complexity of the target properties.

We report and examine the benchmarks separating the logical components of the overall calculation, as summarized in Figures 1 and 2 – namely the construction of the neighbour list, the calculation of the local density expansion (that can be further broken down into the evaluation of radial and angular terms), the combination of the density coefficients to obtain an invariant representation, and the evaluation of the model itself. Most of these steps can also be broken down into the time required to compute just the representation (energy) versus the overhead for computing gradients (forces) in addition.

For the representation stage, it is possible to track the computational cost as a function of a few key parameters, namely the radial and angular expansion limits ($n_\mathrm{max}$ and $l_\mathrm{max}$, respectively). The benchmarks for this stage are reported not per atom, but per pair, consistent with the overall scaling of this component of the calculation. The timings reported in this way are therefore also mostly independent of the system in question, i.e., the variation between systems is usually comparable to the variation between individual timing runs. The model stage has less of a detailed dependence on the spherical expansion parameters, but the system dependence is more subtle. The main influence on the computational cost is the feature space di-

[Figure 2: schematic with the panel labels "Neighbors List", "Local density", "Radial Integration", "Angular integration", "Spherical Harmonics", "Spherical Expansion", "(flatten)", "Spherical Invariants" and "To model".]

Figure 2: Schematic showing the process of expanding the density in a radial and angular basis set, and recombining those to form spherical invariants (or covariants).
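
The recombination step in Figure 2, written for ν = 2 in Eq. (4), amounts to a contraction over m of pairs of expansion coefficients. The following is a minimal sketch, assuming the coefficients are stored in a dictionary keyed by (a, n, l, m); this layout and the function names are hypothetical choices made for readability and are not librascal's data structures. The helper `rotate_about_z` uses the fact that a rotation about the z axis only multiplies each coefficient by an m-dependent phase (the sign of the phase depends on convention), so the phases of the m and −m terms cancel in the contraction.

```python
import cmath
import math


def power_spectrum(coeffs, l_max):
    """Contract expansion coefficients into the invariants of Eq. (4).

    coeffs[(a, n, l, m)] holds the complex coefficient <a n l m|rho_i>.
    The result maps (a1, n1, a2, n2, l) to the corresponding invariant.
    """
    channels = sorted({(a, n) for (a, n, _, _) in coeffs})
    spectrum = {}
    for (a1, n1) in channels:
        for (a2, n2) in channels:
            for l in range(l_max + 1):
                total = 0j
                for m in range(-l, l + 1):
                    # (-1)^m <a1 n1 l m|rho_i> <a2 n2 l (-m)|rho_i>
                    total += ((-1) ** m
                              * coeffs[(a1, n1, l, m)]
                              * coeffs[(a2, n2, l, -m)])
                spectrum[(a1, n1, a2, n2, l)] = total / math.sqrt(2 * l + 1)
    return spectrum


def rotate_about_z(coeffs, alpha):
    """Apply the phase exp(-i m alpha) that a z-rotation gives each coefficient."""
    return {key: value * cmath.exp(-1j * key[3] * alpha)
            for key, value in coeffs.items()}
```

Comparing `power_spectrum(coeffs, l_max)` with `power_spectrum(rotate_about_z(coeffs, alpha), l_max)` for any angle gives a quick numerical check of the invariance for this particular class of rotations. A full test would require Wigner D-matrices for arbitrary rotations.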

mension $n_\mathrm{feat}$ and the number of environments $n_\mathrm{active}$ used to parametrize the model. As will be discussed in Section IV H, both of these parameters can be reduced significantly by the use of dimensionality reduction algorithms, with reduced computational cost generally trading off with the accuracy of predictions (a pattern seen in many other machine learning frameworks [51]). In order to run and organize the large number of individual benchmarks required for this study, we have made extensive use of the signac data management framework [54,55], which can be accessed from an open repository [56].

A. Datasets

The system dependence of the overall computation is influenced by two major factors. The first is the number density, which – together with the cutoff radius $r_\mathrm{cut}$ – determines the total number of pairs that must be iterated over to compute the representations, as well as the number of degrees of freedom needed to fully characterize the local environment, which in turn affects the radial and angular expansion parameters necessary to represent it. The second is the number of chemical elements that are present, which directly affects the dimensionality of the representation. However, several optimizations are possible depending on the model and species composition, as well as the distribution of these species throughout the system, making this a subtle and nontrivial influence on the total model cost.

Therefore, we have decided to benchmark the overall cost (neighbour list, representation, and model together) on a selection of five realistic datasets that represent both typical and challenging applications of machine learning potentials. For a single-species system, we have chosen the bulk silicon dataset [57] from Bartók et al. [58]; despite its simple species composition, it still represents a large array of structural diversity. The fluid methane dataset [59] from Veit et al. [28] has two chemical species, but distributed homogeneously throughout the cell; the dataset additionally contains a range of different cell densities. In order to include more challenging multi-species systems, we have selected three additional datasets from different application areas. The solvation dataset from Rossi et al. [60] consists of structures each containing a single molecule of methanesulfonic acid within a large cell of liquid phenol, where the presence of multiple species and the inhomogeneity of their distribution present a challenge for both representation and fitting algorithms. The molecular crystal dataset "CSD1000r" used in Musil et al. [61] contains up to four species, where not all species are present in each separate structure. Finally, the widely-used QM9 dataset [62] contains isolated molecules of up to 9 heavy (non-hydrogen) atoms each, composed of up to five chemical species – where, again, not every species is represented in every structure.

IV. IMPLEMENTATION OF INVARIANT REPRESENTATIONS

We begin by discussing the librascal implementation of the power spectrum SOAP features, and by showing how a deeper understanding of the structure of the atom-density correlation features can be exploited to substantially reduce the cost of evaluation. Benchmarks on all the datasets discussed above are included in the SI – here we choose a subset of the different test cases, since in most cases the computa-

tional cost can be normalized in a way that minimizes the dependence on the specifics of the system at hand.

A. Density coefficients

The exact expression for the density coefficients depends on the specifics of the atom density field and on the basis used to expand it. To see this, it is advantageous to separate the integral in Eq. (3) into radial and angular coordinates. The angular integral is most naturally expressed using the spherical harmonics basis, as evidenced e.g. by the particularly simple form of Eq. (2). Then, regardless of the choice of the functional form of the atom density or the radial basis, the density coefficients can be written as a sum over functions of neighbor distances and orientations

$$\langle anlm|\rho_i\rangle = \sum_{j\in A_i} \delta_{a a_j}\, f_\mathrm{cut}(r_{ji})\, \langle nlm; \mathbf{r}_{ji}|g_\sigma\rangle \tag{5}$$

where

$$\langle nlm; \mathbf{r}|g_\sigma\rangle = \int \mathrm{d}\mathbf{x}\; \langle nl|x\rangle\, \langle lm|\hat{\mathbf{x}}\rangle\, \langle \mathbf{x}-\mathbf{r}|g_\sigma\rangle. \tag{6}$$

For a Gaussian atom density, the integral can be factorized into

$$\langle nlm; \mathbf{r}|g_\sigma\rangle = \langle \hat{\mathbf{r}}|lm\rangle\, \langle nl; r|g_\sigma\rangle, \tag{7}$$

containing a radial integral

$$\langle nl; r|g_\sigma\rangle = 4\pi\, e^{-c r^2} \int_0^\infty \mathrm{d}x\; x^2\, \langle nl|x\rangle\, e^{-c x^2}\, i_l(2cxr) \tag{8}$$

[Figure 3: panels "GTO" and "DVR"; rows "Analytical integration" and "Spline optimization"; axis label "Timing / µs / neighbors".]

Figure 3: Computational cost for the evaluation of the radial integral and its derivatives with different methods, for structures taken from the QM9 dataset. Note that the dataset has very little influence on this benchmark since the radial integral and its derivative are always evaluated once per neighbor. For the splining an accuracy of $10^{-8}$ was chosen.
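
As a consistency check on the analytical route benchmarked in Figure 3, the closed-form GTO radial integral of Eq. (11) can be compared with a direct numerical quadrature of Eq. (8). The sketch below uses only the Python standard library. It assumes that the GTO radial function has the form N_n r^n exp(−d_n r²), which is the form implied by the normalization factor N_n given in the text. All function names, the series-based special functions, the integration range `x_max`, and the parameter values are illustrative choices and not part of librascal.

```python
import math


def hyp1f1(a, b, z, tol=1e-15, max_terms=1000):
    """Confluent hypergeometric function 1F1(a; b; z), summed as a power series."""
    term = total = 1.0
    for k in range(max_terms):
        term *= (a + k) / (b + k) * z / (k + 1)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def mod_sph_bessel_i(l, z, tol=1e-16, max_terms=1000):
    """Modified spherical Bessel function of the first kind i_l(z), for z >= 0."""
    half = 0.5 * z
    term = total = half ** l / math.gamma(l + 1.5)
    for k in range(max_terms):
        term *= half * half / ((k + 1) * (k + l + 1.5))
        total += term
        if term <= tol * total:
            break
    return 0.5 * math.sqrt(math.pi) * total


def gto_params(n, n_max, r_cut):
    """Return (d_n, N_n) from sigma_n = r_cut * max(sqrt(n), 1) / n_max."""
    sigma_n = r_cut * max(math.sqrt(n), 1.0) / n_max
    d_n = 1.0 / (2.0 * sigma_n ** 2)
    norm = math.sqrt(2.0 / (sigma_n ** (2 * n + 3) * math.gamma(n + 1.5)))
    return d_n, norm


def simpson(f, a, b, steps=4000):
    """Composite Simpson rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def radial_analytic(n, l, r, sigma, n_max, r_cut):
    """Closed form of Eq. (11), with c = 1 / (2 sigma^2)."""
    c = 1.0 / (2.0 * sigma ** 2)
    d_n, norm = gto_params(n, n_max, r_cut)
    a = 0.5 * (n + l + 3)
    return (math.pi ** 1.5 * math.exp(-c * r * r) * norm
            * math.gamma(a) / math.gamma(l + 1.5)
            * (c * r) ** l * (c + d_n) ** (-a)
            * hyp1f1(a, l + 1.5, (c * r) ** 2 / (c + d_n)))


def radial_quadrature(n, l, r, sigma, n_max, r_cut, x_max=10.0):
    """Eq. (8) with the assumed GTO form, integrated numerically on [0, x_max]."""
    c = 1.0 / (2.0 * sigma ** 2)
    d_n, norm = gto_params(n, n_max, r_cut)

    def integrand(x):
        return (x ** (n + 2) * norm * math.exp(-(c + d_n) * x * x)
                * mod_sph_bessel_i(l, 2.0 * c * x * r))

    return 4.0 * math.pi * math.exp(-c * r * r) * simpson(integrand, 0.0, x_max)
```

For example, `radial_analytic(2, 1, 1.3, 0.5, 6, 4.0)` and `radial_quadrature(2, 1, 1.3, 0.5, 6, 4.0)` should agree to within the quadrature error. The fixed upper limit `x_max` is adequate only while the Gaussian factor has decayed to a negligible value there, so it would need adjusting for much broader densities.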
In Eq. (8), $c = 1/(2\sigma^2)$, and the radial and angular degrees of freedom are explicitly coupled by the $l$ dependence of the modified spherical Bessel function $i_l$. Thus, the density coefficients can be computed by evaluating spherical harmonics and radial integral functions for each pair of neighbors, and then summing over their products

$$\langle anlm|\rho_i\rangle = \sum_{j\in A_i} \delta_{a a_j}\, f_\mathrm{cut}(r_{ji})\, \langle \hat{\mathbf{r}}_{ji}|lm\rangle\, \langle nl; r_{ij}|g_\sigma\rangle \tag{9}$$

Alternative atom-centred density formulations such as in ACE [8] or TurboSOAP [53] lead to similar expressions for the radial function. For instance, TurboSOAP chooses a Gaussian atomic density that is symmetric about $\mathbf{r}_i$ instead of $\mathbf{r}_{ji}$, making it possible to factorize the radial term such that $\langle nl; r_{ij}|\tilde{g}\rangle = \langle n|r_{ij}\rangle \langle l|r_{ij}\rangle$. Both terms can be efficiently computed using recurrence relations in $l$ and $n$. In librascal, the density expansion is implemented only for Gaussian atomic densities symmetric about $\mathbf{r}_{ji}$, using two types of radial basis sets: The Gaussian type orbital (GTO) basis

$$R_n^{\mathrm{GTO}}(r) = \mathcal{N}_n\, r^{n} \exp\left(-d_n r^2\right) \tag{10}$$

where $d_n = 1/(2\sigma_n^2)$, $\sigma_n = r_\mathrm{cut}\max(\sqrt{n}, 1)/n_\mathrm{max}$, $\mathcal{N}_n^2 = 2/\left(\sigma_n^{2n+3}\,\Gamma(n+3/2)\right)$ is a normalization factor, and $0 \le n < n_\mathrm{max}$. In contrast to the displaced Gaussian basis in the original formulation of SOAP [41], this choice of radial basis leads to a radial integral that can be evaluated analytically:

$$\langle nl|r_{ij};\mathrm{GTO}\rangle = \pi^{3/2} \exp\left(-c r_{ij}^2\right) \mathcal{N}_n\, \frac{\Gamma\left(\frac{n+l+3}{2}\right)}{\Gamma\left(l+\frac{3}{2}\right)}\, c^{l} r_{ij}^{l}\, (c+d_n)^{-\frac{n+l+3}{2}}\; {}_1F_1\left(\frac{n+l+3}{2},\, l+\frac{3}{2},\, \frac{c^2 r_{ij}^2}{c+d_n}\right), \tag{11}$$

where ${}_1F_1$ is the confluent hypergeometric function of the first kind. Given that the overlap matrix $S$ between GTOs of the form (10) can be computed analytically, it is then easy to obtain an orthogonal basis set
and the discrete variable representation (DVR) basis.                                     X
                                                                         hn|o-GTOi =         [S−1/2 ]nn0 hn0 |GTOi .    (12)
                                                                                               n0
B.     GTO radial basis
                                                                   Thanks to the linear nature of all the operations in-
     The Gaussian type orbital radial basis is defined as          volved in the evaluation of the density expansion co-
                                                                   efficients, the orthogonalization can be applied at any
            hr|n; GTOi = Nn rn exp −dn r2 ,
                                           
                                                     (10)          point of the procedure. In the case of the analytical
evaluation of Eq. (11), it is convenient to first combine the contributions from all the neighbors to the density coefficients Eq. (9), and then orthogonalize just once. In section IV D, when computing the coefficients numerically, it is instead more convenient to orthogonalize the radial integral Eq. (11) directly.

The total time required to compute the radial integral, as well as its derivative with respect to $r_{ij}$ (needed for gradients of the model), is plotted in the top left panel of Fig. 3 as a function of the expansion parameters $n_{\max}$ and $l_{\max}$, and scales roughly linearly with respect to the expansion thresholds (see also the SI for a more detailed figure). Despite the use of an efficient and robust algorithm which is discussed in Appendix A, most of the computational cost in the evaluation of Eq. (11) is associated with the confluent hypergeometric function ${}_1F_1$.

C. DVR radial basis

Another possible choice of basis is inspired by the idea of using a numerical, rather than analytical, evaluation of the radial integral. In fact, the numerical integral can be done exactly and with no discretization overhead if we choose the orthonormal DVR radial basis with Gauss-Legendre quadrature rule.63 This basis has the advantage of vanishing at every quadrature point except for one, i.e. $\langle x | n; \mathrm{DVR} \rangle = \sqrt{\omega_n} \, \delta(x - x_n)$, which simplifies the numerical radial integral into

$$\langle nl | r_{ij}; \mathrm{DVR} \rangle = x_n \sqrt{\omega_n} \, e^{-c x_n^2} \, i_l(2 c x_n r_{ij}) , \qquad (13)$$

where $x_n$ are the quadrature points, distributed across the range $[0, r_{\mathrm{cut}} + 3\sigma]$ over which the integrand differs substantially from zero, and the $\omega_n$ are the corresponding quadrature weights.64 The DVR basis is orthogonal by construction, and only requires evaluating the modified spherical Bessel function rather than the more demanding ${}_1F_1$, leading to a reduction by about a factor of 2 of the cost of evaluating radial integrals (top right panel in Fig. 3). Unfortunately, this comes at the price of a less-efficient encoding of structural information, particularly in the limit of sharp atomic Gaussians, as recently shown in Ref. 65.

The computational cost of evaluating the radial integral in the DVR basis is again shown in the upper right-hand panel of Fig. 3. The computational cost is reduced by more than half compared to the integral in the GTO basis, although the scaling with the $l_{\max}$ and especially $n_{\max}$ parameters remains approximately linear (see plots in the SI).

D. Spline optimization

Rather than devising basis functions that allow for a less demanding analytical evaluation of the radial integrals, one can inexpensively evaluate the full radial integral $\langle nl; r | g_\sigma \rangle$ by pre-computing its value on a grid, and then using a cubic spline interpolator. For each combination of radial $0 \le n < n_{\max}$ and angular $0 \le l \le l_{\max}$ indices, the integral is tabulated and the spline is computed for the range $[0, r_c]$. A grid $\{r_k\}_{k=1}^{M}$ with constant step size $\Delta$ is used, so that the interval $[r_k, r_{k+1}]$ containing a distance $r_{ij}$ can be found in constant time. Following the implementation of Ref. 66, the computation of radial terms simplifies to the evaluation of a polynomial of degree 3 in $r_{ij}$ with precomputed coefficients $c_k$ and $d_k$:

$$\langle nl; r_{ij} | g_\sigma \rangle = \frac{1}{\Delta} \Big[ (r_{ij} - r_{k+1}) c_k + (r_{ij} - r_{k+1})^3 d_k - (r_{ij} - r_{k+1}) d_k + (r_{ij} - r_{k+1}) c_{k+1} + (r_{ij} - r_{k+1})^3 d_{k+1} - (r_{ij} - r_{k+1}) d_{k+1} \Big] . \qquad (14)$$

This expression requires only a small number of multiplications and additions, thus reducing the computational time of the radial integral by avoiding the evaluation of the costly hypergeometric, exponential or Gamma functions present in the analytical GTO and DVR basis sets. Given that the expression is linear in the coefficients, it is straightforward, for example, to evaluate the coefficients for $\langle nl | \text{o-GTO} \rangle$ by simply applying the orthogonalization matrix to the coefficients of the $\langle nl | \mathrm{GTO} \rangle$. Smooth derivatives $\partial \langle nl; r_{ij} | g_\sigma \rangle / \partial r_{ij}$ of this piecewise polynomial function can also be computed with minimal additional effort by taking the derivative of the polynomial. As seen in Fig. 3, splining reduces the computational cost of the radial integrals by almost an order of magnitude, and effectively eliminates the difference between the GTO and DVR basis.

Thus, the choice of $\langle x | nl \rangle$ should not be guided by the cost of evaluation, but by a different metric, for instance the information efficiency. It has already been shown that GTO encodes linearly regressable information more efficiently than DVR,65 implying that the splined GTO basis has a clear advantage overall. There is evidence that the size of the radial basis set $n_{\max}$ has a larger influence than the angular expansion threshold $l_{\max}$ on the accuracy of a SOAP-based potential.67 Furthermore, a reduction in $n_{\max}$ reduces the cost, not only of the spherical expansion coefficients, but also of evaluating invariants. Together, these insights all point towards numerical optimization of the radial basis as a promising future line of investigation.

E. Spherical Harmonics

In contrast to the relatively obscure special functions needed for the radial integrals, the spherical harmonics needed for the angular part of the density coefficients (cf. Eq. (7)) are much more widely used due to their importance in any problem with spherical symmetry. Correspondingly, there has been much research into finding efficient algorithms to evaluate spherical harmonics, leading to many good algorithms becoming publicly available. In librascal, we have chosen to implement the algorithm described in
Limpanuparb and Milthorpe,68 which makes use of efficient recurrence relations optimized for speed and numerical stability and is similar to the algorithm implemented in the GNU Scientific Library.69 Gradients are computed from analytical expressions.

[Figure 4 plot: timing (µs per neighbor, 0 to 4) versus lmax (1 to 15), with one series labelled "No gradients" and one labelled "Gradients".]
Figure 4: Timings for the computation of the spherical harmonics as a function of the angular expansion threshold for the QM9 dataset.

As Figure 4 shows, the cost scales linearly with the angular expansion parameter $l_{\max}$, and including gradients consistently increases the cost by a factor of about 4, consistent with the need to compute 3 additional values per spherical harmonic. The cost to compute the spherical harmonics and gradients is typically comparable to, or larger than, the cost to compute the splined radial integral; this cost is discussed in more detail and in the context of the whole invariants computation in the following section.

F. Spherical Expansion and Invariants

Having discussed how to implement an efficient procedure to evaluate the radial and the angular terms contributing to the density expansion, let us now consider the cost of the remaining steps to obtain the full SOAP feature vector $\langle a_1 n_1; a_2 n_2; l | \rho_i^{\otimes 2} \rangle$. Figure 5 presents an overview of the timings for all evaluation steps for different $(n_{\max}, l_{\max})$, comparing a dataset of bulk Si configurations and a database of molecular crystals. For a few selected parameter sets, the figure also shows the breakdown of the evaluation time into the part associated with the evaluation of radial integrals and spherical harmonics for each neighbor, the combination of the two into the full density expansion coefficients, and the calculation of the SOAP invariants. The spline interpolation makes the cost of radial integrals negligible, and even the evaluation of spherical harmonics usually requires less than 25% of the total timing. For the silicon dataset, which has only one atomic species, the cost is typically dominated by the combination of radial and angular terms. Indeed, the computational cost of this step scales roughly as $n_{\mathrm{neigh}} \, n_{\max} \, (l_{\max} + 1)^2$, which can easily dominate the total cost for realistic parameter sets.

For the molecular materials, on the other hand, the evaluation of the invariants becomes more expensive, becoming comparable to the computation of the density coefficients. The difference can be explained as follows. The coefficients $\langle anlm | \rho_i \rangle$ are combined to obtain spherically equivariant representations of the atomic environment by averaging their tensor products over the symmetry group, as outlined in Eq. (2), so the evaluation of the invariants exhibits a very different scaling. The cost is independent of the number of neighbors and instead depends strongly on the size of the basis used to expand the atom density as well as on the number of chemical species $n_{\mathrm{species}}$. For the special case of spherical invariants of body order $(\nu + 1) = 3$, corresponding to classic SOAP features,41 evaluating Eq. (4) essentially amounts to computing an outer product over the $(a, n)$ dimension of expansion coefficients that is then summed over $m$, which requires a number of multiplications of the order of $n_{\mathrm{species}}^2 \, n_{\max}^2 \, (l_{\max} + 1)^2$. In summary, the cost of the different steps varies substantially depending on the system, the cutoff, and the expansion parameters, and there is no contribution that dominates consistently in all use cases.

G. Cost of gradients

Evaluating the gradients of the invariant features with respect to the atomic coordinates is a necessary step to compute model derivatives, e.g. forces and stresses for MD simulations, but it also entails a substantial overhead, as the bottom panels of Fig. 5 show. This overhead is ultimately a consequence of the direct evaluation of the gradients of the features, which requires a separate contraction for each of the $\langle nn'l |$ components in the SOAP vector,

$$\frac{\partial \langle nn'l | \rho_i^{\otimes 2} \rangle}{\partial r_j} \propto \sum_m \langle n'lm | \rho_i \rangle^{\star} \, \frac{\partial \langle nlm; r_{ji} | g_\sigma \rangle}{\partial r_j} + \ldots \qquad (15)$$

While some speedup could be attained by reordering the summation, the core issue is the need to compute a separate term for each feature and each neighbor of the central atom, which means that the computational effort, for the typical values of $(n_{\max}, l_{\max})$, is overwhelmingly dominated by the construction of the invariants. These issues indicate that the evaluation of gradients would benefit from further optimizations, in particular trading off modularity for speed by optimizing the evaluation of the expansion coefficients together with the model evaluation. This way, it will be possible to avoid the (re)computation of certain intermediate quantities, analogous to the optimization of the order of matrix multiplications involved in the evaluation of the chain rule linking the model target and the input atomic coordinates.
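The contraction over m described in Section F can be made concrete with a short sketch. The Python function below is illustrative only and is not the librascal implementation: it assumes real-valued expansion coefficients for a single atom, stored as nested lists `coeffs[a][n][l][m]`, it omits any normalization prefactor, and the names `power_spectrum` and `coeffs` are hypothetical. It also counts the multiplications, which reproduces the scaling quoted above because the 2l + 1 values of m summed over l = 0, ..., lmax give (lmax + 1)^2 terms.

```python
import random


def power_spectrum(coeffs, n_species, n_max, l_max):
    """Contract expansion coefficients over m (illustrative, real-valued case).

    coeffs[a][n][l][m] holds one atom's coefficients, with m running over the
    2l + 1 components. Returns the invariants keyed by (a, n, a2, n2, l) and
    the number of multiplications performed.
    """
    invariants = {}
    n_mult = 0
    for a in range(n_species):
        for n in range(n_max):
            for a2 in range(n_species):
                for n2 in range(n_max):
                    for l in range(l_max + 1):
                        total = 0.0
                        for m in range(2 * l + 1):
                            total += coeffs[a][n][l][m] * coeffs[a2][n2][l][m]
                            n_mult += 1
                        invariants[(a, n, a2, n2, l)] = total
    return invariants, n_mult


# Hypothetical sizes; random numbers stand in for real expansion coefficients.
random.seed(0)
n_species, n_max, l_max = 2, 3, 4
coeffs = [[[[random.uniform(-1.0, 1.0) for _ in range(2 * l + 1)]
            for l in range(l_max + 1)]
           for _ in range(n_max)]
          for _ in range(n_species)]

invariants, n_mult = power_spectrum(coeffs, n_species, n_max, l_max)
expected_mult = n_species**2 * n_max**2 * (l_max + 1) ** 2
print(n_mult, expected_mult)  # prints: 900 900
```

In this real-valued sketch the result is symmetric under exchange of (a, n) and (a2, n2), so roughly half of the products are redundant; they are kept here so that the count matches the order-of-magnitude estimate in the text. Note that the number of neighbors does not appear anywhere in the loop, in line with the statement that this step is independent of it.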
[Figure 5 plot: four panels showing timing (ms per atom, logarithmic scale) as a function of n max (horizontal axis, 2 to 14) and l max (vertical axis, 1 to 15). Columns: Molecular crystals, Bulk Silicon. Rows: Splined GTO without gradients, Splined GTO with gradients. Pie charts at selected points, annotated with the total time per atom, split the time into radial integral, spherical harmonics, spherical expansion, and invariant calculation.]

    Figure 5: Effect of radial and angular cutoff on overall timing of calculating spherical invariants. (left)
  molecular crystals dataset (with, on average, 27 neighbors per center, and 4 elements) (right) bulk silicon
  dataset (16 neighbors per center and a single element). (top) SOAP power spectrum only (bottom) SOAP
   power spectrum and gradients. All calculations use the GTO radial basis with spline optimization. For
         selected points, we also show, as pie charts, the relative time spent in the different phases.
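All timings in Figure 5 rely on the spline optimization of Section D. A minimal sketch of that idea follows, assuming a natural cubic spline written in the textbook second-derivative form rather than the coefficient layout of Eq. (14) or of Ref. 66. The class name `UniformCubicSpline` and the stand-in function `toy_radial` are hypothetical; in practice the tabulated function would be the radial integral for one (n, l) pair. Because the grid has a constant step, the interval index follows from a single division, so each evaluation costs a fixed, small number of arithmetic operations.

```python
import math


class UniformCubicSpline:
    """Natural cubic spline on a uniform grid over [0, r_max] (illustrative)."""

    def __init__(self, func, r_max, n_points):
        self.n = n_points
        self.step = r_max / (n_points - 1)
        self.y = [func(k * self.step) for k in range(n_points)]
        self.y2 = self._second_derivatives()

    def _second_derivatives(self):
        """Solve the tridiagonal (1, 4, 1) system for the nodal second derivatives."""
        n, h, y = self.n, self.step, self.y
        y2 = [0.0] * n  # natural boundary conditions: y2[0] = y2[n-1] = 0
        m = n - 2       # number of interior nodes
        if m < 1:
            return y2
        rhs = [6.0 * (y[k + 1] - 2.0 * y[k] + y[k - 1]) / (h * h)
               for k in range(1, n - 1)]
        # Thomas algorithm: forward elimination ...
        c_prime = [0.0] * m
        d_prime = [0.0] * m
        c_prime[0] = 0.25
        d_prime[0] = rhs[0] / 4.0
        for i in range(1, m):
            denom = 4.0 - c_prime[i - 1]
            c_prime[i] = 1.0 / denom
            d_prime[i] = (rhs[i] - d_prime[i - 1]) / denom
        # ... and back substitution (interior unknown i lives at node i + 1).
        for i in range(m - 1, -1, -1):
            y2[i + 1] = d_prime[i] - c_prime[i] * y2[i + 2]
        return y2

    def _locate(self, r):
        """Constant-time interval lookup: no search over the grid is needed."""
        k = min(max(int(r / self.step), 0), self.n - 2)
        a = ((k + 1) * self.step - r) / self.step
        return k, a, 1.0 - a

    def value(self, r):
        k, a, b = self._locate(r)
        h2 = self.step * self.step
        return (a * self.y[k] + b * self.y[k + 1]
                + ((a**3 - a) * self.y2[k] + (b**3 - b) * self.y2[k + 1]) * h2 / 6.0)

    def derivative(self, r):
        k, a, b = self._locate(r)
        h = self.step
        return ((self.y[k + 1] - self.y[k]) / h
                - (3.0 * a * a - 1.0) / 6.0 * h * self.y2[k]
                + (3.0 * b * b - 1.0) / 6.0 * h * self.y2[k + 1])


def toy_radial(r):
    """Stand-in for a tabulated radial integral (not the GTO expression)."""
    return math.exp(-0.5 * r * r)


spline = UniformCubicSpline(toy_radial, r_max=5.0, n_points=501)
r_test = 2.345
value_error = abs(spline.value(r_test) - toy_radial(r_test))
slope_error = abs(spline.derivative(r_test) + r_test * toy_radial(r_test))
print(value_error, slope_error)
```

The derivative comes from differentiating the same cubic, so it reuses the interval lookup and the stored coefficients. The natural boundary condition used here is a simplifying assumption: it is less accurate near the ends of the grid than in the interior, and a production implementation would choose the boundary treatment and grid density to meet a target accuracy.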

H. Feature dimensionality reduction

A more straightforward, and potentially more impactful, optimization involves performing a data-driven selection to reduce the number of invariant features to be computed and used as inputs of the model. Even though representations based on systematic orthonormal basis expansions, such as the SOAP power spectrum, provide a complete linear basis to describe 3-body correlations,7,48 and even though they do not provide an injective representation of an atomic environment,46 one often finds that for realistic structural datasets different entries in the SOAP feature vector exhibit a high degree of correlation. This means that they span a much larger space than what is effectively needed for the prediction of typical atomistic properties.

Therefore, a subset of features, usually a small fraction of the full set, can be selected with little impact on the model error.44,70 Both CUR71 and farthest point sampling (FPS)70,72,73 selection strategies are available in librascal, and can be performed as a preliminary step in the optimization of a model, using Python utility functions. Feature selection can reduce the time spent on computing the features, on determining the model parameters, and on making predictions (see Section V B). For kernel models, and reasonably simple forms of the kernel function, the evaluation of both the features and the kernels scales linearly with $n_{\mathrm{feat}}$.

Once a list of selected features has been obtained,

[Figure 6 plot. Vertical axis: timing / µs / neighbors; horizontal axis: number of features. Series: Bulk Silicon, Liquid Methane, Solvated CH3SO3H, QM9, Molecular Crystals.]

Figure 6: Timing for the calculation of the SOAP power spectrum with gradients as a function of the requested number of features. Horizontal lines represent the time taken by the spherical expansion step for each dataset. The grey line is a guide for the eye representing a linear relation between time and number of features. The $(n_\mathrm{max}, l_\mathrm{max})$ parameters used are the following: bulk silicon (10,11), liquid methane (8,7), solvated CH3SO3H (8,7), QM9 (12,9), and molecular crystals (10,9).

their indices $\{q\} \equiv \{(a_q n_q; a'_q n'_q; l_q)\}$ can be passed to the C++ code. The sparse feature computation is simply implemented as a selective computation of the pre-selected invariants $\langle q|\rho_i^{\otimes 2}\rangle$. The effect of this optimization on the overall cost of computing spherical invariants is shown in Figure 6, with realistic $n_\mathrm{max}$ and $l_\mathrm{max}$ parameters, which are comparable to those used in applications (i.e. $(n_\mathrm{max}, l_\mathrm{max})$ equal to (10, 12) for Si [58], (8, 6) for methane [28], and (9, 9) for molecular crystals [61]). The overall trend is that of a constant contribution (from the spherical expansion) plus a linear term (from the spherical invariants). Although most datasets do not reach linear scaling even for the largest number of features, selecting a small $n_\mathrm{feat}$ can reduce the computational cost by up to an order of magnitude. The impact of feature dimensionality on both the computational cost and accuracy of models trained on realistic data is discussed in Section V; briefly, the features can be sparsified fairly aggressively (up to a factor of about 5–10, depending on the dataset) without any significant impact on the prediction error.

V. COMPARATIVE BENCHMARKS

Now that we have analyzed separately the different components of the calculation of the SOAP features, we turn our attention to the end-to-end benchmarking of a full energy and force evaluation, similar to what one would encounter when running an MD simulation. As in the previous Section, we run comprehensive tests on each of the five datasets described in Section III A, and we report here those that are most telling of the scaling of the different terms with the system parameters, most notably the neighbor density and the number of chemical elements. We include a simple but complete implementation of kernel ridge regression, a framework that is often used together with SOAP features and that allows us to comment on the interplay between the calculation of the representation and the model. Thus, we can compare the computational effort associated with the use of librascal with that of QUIP, an existing, well-established code to train and evaluate Gaussian approximation potentials (GAPs) based on SOAP features, and investigate the effect of the various optimizations described above on the overall model efficiency.

A. Existing implementations

Over the past couple of years, several codes have been released that can be used to fit and run ML potentials supporting different representations, especially for neural-network type potentials such as n2p2 [74] (which uses Behler-Parrinello ACSF [39]), ANI-1 [75], PANNA [76], or DeepMD [77]. Here we focus on kernel methods, for which there is a smaller number of actively used codes. The first, and still widely adopted, is the QUIP library, part of the libAtoms framework [78], which has been used for almost all published Gaussian approximation potentials (GAPs) [2,25,27,28,53,79,80] and continues to be actively maintained. Other kernel-learning potential packages of note are GDML, which implements the “gradient-domain machine learning” method of Chmiela et al. [81] (the full-kernel equivalent of the sparse kernel model we implement here), and QML [82], which notably implements the FCHL-type representations [83] and the OQML framework [15]. We finally note for completeness several codes used for linear high-body-order models, such as the SNAP method [47] implemented in LAMMPS [84], aPIPs [85] and ACE [8,48] implemented in JuLIP [86], and the NICE descriptors [49] implemented in a separate code [87] interfaced with librascal (see Section VI). Here we focus only on the QUIP code, which is the most mature implementation available and matches most closely the application domain of librascal.

B. Kernel models

To benchmark the performance of librascal in the context of the GAP framework typically used to build potentials with SOAP, we implemented the same regression scheme used in QUIP to build an MLIP based on the SOAP power spectrum representation. We summarize the key ideas, emphasizing the aspects that are important to achieve optimal performance. In a GAP, as in the vast majority of regression models based on atom-centred features, the energy is defined

as a sum of atomic contributions

$$ E(A) \equiv \langle E|A\rangle = \sum_{i\in A} E(A_i) \equiv \sum_{i\in A} \langle E|A_i\rangle \tag{16} $$

where $A_i$ indicates a local environment centered on atom $i$. An accurate, yet simple and efficient GAP can be built using a “projected process approximation” [88] form of kernel ridge regression, which mitigates the unfavorable scaling with training set size $n_\mathrm{train}$ of the cost of fitting (cubic) and predicting (linear) energies using a “full” ridge regression model. A small, representative subset $M$ of the atomic environments usually found in the training set (the so-called “active”, “pseudo-” or “sparse” points) is used, together with a positive-definite kernel function $k$, as a basis to expand the atomic energy

$$ E(A_i) = \sum_{I\in M} \delta_{a_i a_I} \langle E; a_I|M_I\rangle\, k(M_I, A_i), \tag{17} $$

where $M_I$ indicates the $I$-th sparse point, $\langle E; a_I|M_I\rangle$ indicates the regression weights, and a separate energy model is determined for each atomic species, which also means that the active set is partitioned with respect to the central atom type. The sparse model (17) exhibits a much more favorable scaling with training set size, both during fitting ($\mathcal{O}(n_\mathrm{train} n_\mathrm{active}^2 + n_\mathrm{active}^3)$, for the implementation in librascal) and when predicting a new structure ($\mathcal{O}(n_\mathrm{active})$). Obviously, the accuracy of the approximation relies on a degree of redundancy being present in the training set, and in practice a suitable size of the active set $M$ scales with the “diversity” of the training set. Usually, however, an accuracy close to that of a full model can be reached even with $n_\mathrm{active} \ll n_\mathrm{train}$. The gradient of the energy with respect to the coordinates of an atom $j$ can be obtained as a special case of the general form (B1)

$$ \nabla_j \langle E|A\rangle = \sum_{I\in M} \langle E; a_I|M_I\rangle \sum_{i\in A} \delta_{a_i a_I} \nabla_j k(M_I, A_i), \tag{18} $$

and the virial (the derivative with respect to deformations $\eta$ of the periodic cell) as a special case of (B2)

$$ \frac{\partial}{\partial \eta} \langle E|A\rangle = \sum_{I\in M} \langle E; a_I|M_I\rangle \sum_{i\in A} \delta_{a_i a_I} \sum_{j\in A_i} \mathbf{r}_{ji} \otimes \nabla_j k(M_I, A_i). \tag{19} $$

In both Eqs. (18) and (19), the sum over the neighbors of atom $i$ extends also over periodic replicas of the system. Both equations require the evaluation of kernel gradients, which can in turn be expressed using the chain rule in terms of the derivatives of the kernel function with respect to atomic features, and the atomic gradient of such features:

$$ \nabla_j k(M_I, A_i) = \sum_q \nabla_j \langle q|A_i; \mathrm{rep}\rangle\, \frac{\partial k(M_I, A_i)}{\partial \langle q|A_i\rangle}. \tag{20} $$

When computing the model derivatives it is important to contract the sums in the optimal order, by first summing the derivatives of the kernel over the active set. For instance, for the force,

$$ \nabla_j \langle E|A\rangle = \sum_{i\in A} \sum_q \nabla_j \langle q|A_i; \mathrm{rep}\rangle \times \left[ \sum_{I\in M} \delta_{a_i a_I} \langle E; a_I|M_I\rangle\, \frac{\partial k(M_I, A_i)}{\partial \langle q|A_i\rangle} \right]. \tag{21} $$

This form shows that the cost of evaluating forces scales with $n_\mathrm{feat}\, n_\mathrm{neigh}\, n_\mathrm{active}$, indicating how the reductions in the number of sparse points and in the number of features combine to accelerate the evaluation of energy and forces using a sparse GAP model.

The fitting procedure that is implemented in librascal has been discussed in Ref. [89], and we do not repeat it here. It only requires the evaluation of kernels and kernel derivatives between the active set environments and the environments in the structures that are part of the training set, and is usually limited by memory more than by computational expense. In the benchmarks we present here we adopt the polynomial kernel which has been widely used to introduce non-linearity into SOAP-based GAP models [2,25,27,28,79]:

$$ k_\zeta(M_I, A_i) = \left[ \sum_q \langle M_I; \mathrm{rep}|q\rangle \langle q|A_i; \mathrm{rep}\rangle \right]^{\zeta}, \tag{22} $$

whose derivative can be simply computed as

$$ \frac{\partial k_\zeta(M_I, A_i)}{\partial \langle q|A_i\rangle} = \zeta\, \langle M_I; \mathrm{rep}|q\rangle\, k_{\zeta-1}(M_I, A_i). \tag{23} $$

C. Benchmarks of sparse models

Having summarized the practical implementation of a sparse GPR model based on SOAP features, we can systematically investigate the effect of the sparsification parameters (the number of sparse environments $n_\mathrm{active}$ and the number of sparse features $n_\mathrm{feat}$) on the different components of an energy and force calculation. Figures 7 and 8 show the full cost of evaluating an MLIP for different classes of materials, both with and without the evaluation of forces, for different levels of sparsification in terms of both $n_\mathrm{active}$ and $n_\mathrm{feat}$. The cost is broken down into the contributions from the evaluation of the neighbour list, the representation, and the model evaluation (prediction) steps.

Figure 7 shows that, when using the full feature vector in the model [90], the evaluation of the kernels contributes substantially to the cost of predicting energies. In QUIP this cost, which scales linearly with the number of active points, matches the cost of evaluating the representations (which is independent of $n_\mathrm{active}$, since the same number of representations must always be computed for the target structure) at $n_\mathrm{active} \approx 5000$ for Si, and $n_\mathrm{active} \approx 500$ for the molecular crystals. Due to the optimization of the feature evaluation step, in librascal the kernel evaluation dominates down

[Figure 7 plot. Legend: SOAP representation, gradients overhead, energy prediction, forces prediction. Codes compared: librascal, QUIP.]

Figure 7: Prediction timings for GAP models as a function of the number of sparse points, with (right) and without (left) the evaluation of forces, with minimal feature sparsification, i.e., just enough to eliminate redundant symmetric terms (these are retained in librascal for simpler bookkeeping). We used all unique SOAP features for each system in this figure, meaning 6660 features for the molecular crystals and 715 features for bulk silicon.
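The contraction order of Eq. (21), combined with the polynomial kernel of Eqs. (22) and (23), can be illustrated with a minimal NumPy sketch. This is not the librascal or QUIP implementation: the function name, the dense array layout of the feature gradients, and the restriction to a single chemical species (so that the Kronecker delta is always one) are assumptions made only for illustration.

```python
import numpy as np


def sparse_gap_energy_and_gradient(X, dX, X_sparse, weights, zeta=2):
    """Energy and position gradient of a sparse polynomial-kernel model.

    Hypothetical layout (an assumption, not the librascal data structure):
      X        : (n_env, n_feat)             features <q|A_i; rep>
      dX       : (n_env, n_atoms, 3, n_feat) gradients  grad_j <q|A_i; rep>
      X_sparse : (n_active, n_feat)          features of the sparse points M_I
      weights  : (n_active,)                 regression weights <E|M_I>
    A single atomic species is assumed, so delta_{a_i a_I} = 1.
    """
    dots = X @ X_sparse.T                      # sum_q <M_I|q><q|A_i>
    # Eqs. (16), (17), (22): sum over environments and sparse points.
    energy = float(np.sum((dots ** zeta) @ weights))
    # Bracket of Eq. (21): contract over the active set first, using Eq. (23).
    # bracket[i, q] = sum_I w_I * zeta * <M_I|q> * k_{zeta-1}(M_I, A_i)
    bracket = zeta * ((dots ** (zeta - 1)) * weights) @ X_sparse
    # Remaining sums of Eq. (21): over environments i and features q.
    gradient = np.einsum("ijaq,iq->ja", dX, bracket)
    return energy, gradient
```

The forces are the negative of the returned gradient. Because the bracket is formed once per environment, the sparse-point index never appears together with the neighbor index in the same contraction, which is the point of the ordering discussed in the text.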

to even smaller $n_\mathrm{active}$. Note also the lower cost of the kernel evaluation for the molecular crystals in librascal, which can be explained by the fact that only the features associated with chemical species that are present in each structure are computed, while in QUIP they yield blocks of zeros that are multiplied to compute scalar products. Evaluating also forces (right-hand panels of Fig. 7) introduces a very large overhead to feature calculation (up to one order of magnitude, as discussed in the previous Section) and roughly doubles the cost of model prediction. Since the cost of feature evaluation is independent of $n_\mathrm{active}$, the active set can be expanded up to thousands of environments before the cost of model evaluation becomes comparable to that of feature evaluation.

In order to accelerate calculations further, it is then necessary to reduce not only the time needed to compute the model, but also the time needed to compute the representation itself. In Fig. 6 we showed how restricting the evaluation of SOAP features to a smaller subset of the $\langle a_1 n_1; a_2 n_2; l|$ indices reduces by up to an order of magnitude the cost of evaluating the feature vector and its gradients. Figure 8 demonstrates how this speedup combines with the acceleration of the model evaluation step, whose nominal complexity also scales linearly with $n_\mathrm{feat}$, for an intermediate size of the active set, $n_\mathrm{active} = 2000$. For simple, single-component systems such as bulk silicon the cost saturates to that of evaluating the density expansion coefficients, and so the overall speedup that can be achieved by feature sparsification is limited to about a factor of two or three with respect to the full SOAP power spectrum. For multi-component systems, such as the CSD dataset or the solvated CH3SO3H dataset, a speedup of nearly an order of magnitude is possible.

D. Accuracy-cost tradeoff

While the performance optimization discussed in Section IV can dramatically increase the efficiency of an MLIP based on SOAP features and sparse GPR, one should obviously ensure that models with reduced $n_\mathrm{active}$ and $n_\mathrm{feat}$ still achieve the desired accuracy. The data-driven determination of the most representative and diverse set of features and samples is a very active

[Figure 8 plot. Legend: neighbor list, SOAP representation, SOAP gradients, energy prediction, forces prediction. Codes compared: librascal, QUIP.]

Figure 8: Prediction timings for GAP models as a function of the number of features, with (right) and without (left) the evaluation of forces. All models use 2000 sparse points for the sparse kernel basis. The rightmost column in each plot shows the cost with some redundant features, which are computed by default in librascal for simpler bookkeeping. In practical applications, though, we recommend these be eliminated automatically through feature sparsification.
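Farthest point sampling, which is used in the accuracy-cost analysis to rank both features and environments, can be sketched in a few lines of NumPy. This is a generic greedy implementation written for illustration; the function name and arguments are hypothetical and do not correspond to the librascal Python utilities.

```python
import numpy as np


def farthest_point_sampling(points, n_select, start=0):
    """Greedy FPS: return indices of `points` in decreasing order of importance.

    points   : (n_points, n_dim) array; each row is one candidate
    n_select : number of candidates to pick (at least 1)
    start    : index of the first selected candidate (an arbitrary choice)
    """
    points = np.asarray(points, dtype=float)
    n_select = min(n_select, points.shape[0])
    selected = [start]
    # Squared distance of every candidate to its closest selected candidate.
    min_dist2 = np.sum((points - points[start]) ** 2, axis=1)
    for _ in range(1, n_select):
        # Pick the candidate that is farthest from everything selected so far.
        nxt = int(np.argmax(min_dist2))
        selected.append(nxt)
        dist2 = np.sum((points - points[nxt]) ** 2, axis=1)
        min_dist2 = np.minimum(min_dist2, dist2)
    return np.array(selected)
```

With a feature matrix `X` of shape (number of environments, number of features), calling the function on `X` ranks environments (candidate sparse points), while calling it on `X.T` ranks features, since each column then plays the role of a point.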

area of research, using both unsupervised [44,70–72,91] and, very recently, semi-supervised [92] criteria to select an optimal subset. Here we use the well-established FPS to sort features and environments in decreasing order of importance, starting from the full list of environments for the Si dataset and a pool of 715 features, corresponding to $n_\mathrm{max} = 10$, $l_\mathrm{max} = 12$. We train MLIPs to reproduce energy and forces and report the four-fold cross-validation error as well as the cost for evaluating the energy and its gradients in Figure 9, using only the “best” $n_\mathrm{active}$ active points and $n_\mathrm{feat}$ features. We also report the “∆” measure introduced in Ref. 93 as an indication of the ability of the ML model to reproduce properties that are indirectly related to the accuracy of the PES [58,94]:

$$ \Delta = \sqrt{ \frac{ \int_{0.95 V_0}^{1.06 V_0} \left[ E^\mathrm{GAP}(V) - E^\mathrm{DFT}(V) \right]^2 \mathrm{d}V }{ 0.12\, V_0 } } \tag{24} $$

where $E^\mathrm{GAP}$ and $E^\mathrm{DFT}$ are the GAP and DFT energies relative to the diamond energy minimum, and $V_0$ is the volume of the minimum DFT energy structure for each phase.

The results clearly indicate that it is possible to considerably reduce the cost of the MLIP with little impact on the accuracy of the model. Severe degradation of model performance occurs in the regime in which the computational cost is dominated by the calculation of the density expansion coefficients, suggesting that further optimization of the evaluation of $\langle anlm|\rho_i\rangle$ might not be exceedingly beneficial to most practical use cases.

Figure 9: Evaluation of the GAP model performance for bulk silicon. We present the evaluation cost and corresponding error as a function of the number of sparse training points and features selected. From left to right, top to bottom: time required to evaluate the model, root mean square error for the predicted energies and forces, absolute error in the predicted volume in the diamond phase, and ∆-error (see Eq. (24)) for the energy/volume curve for the diamond and β-Sn phases. For all errors, the reference values are those from DFT calculations [58].

VI. EXPERIMENTAL FEATURES

The spherical expansion coefficients can also be used to compute equivariant features and kernels [13,95], as well as higher-body-order invariants [8,43]. This evaluation is easily and efficiently done with an external library, as it is done in the current implementation [87] of the N-body iterative contraction of equivariants (NICE) framework [49]. Furthermore, librascal contains experimental implementations of other representations based on the SOAP framework, for example the bispectrum [41,47] and the ν = 2 equivariants that underlie the λ-SOAP kernels [13] (which is also available as an independent implementation [96]). As development progresses, these libraries will be further integrated with librascal, harmonizing and streamlining the user-facing APIs, and achieving the best balance between modularity and evaluation efficiency.

VII. CONCLUSIONS

In this paper we have made practical use of recent insights into the relationships between several families of representations that are typically applied to the construction of machine-learning models of the atomic-scale properties of molecules and materials. We have demonstrated how these insights can be translated into algorithms for more efficient computation of these representations, most notably SOAP, but

also the atom-density bispectrum and the λ-SOAP equivariants. We have shown how the radial basis used to expand the density can be chosen at will and computed quickly using a spline approximation. Together with a fast gradient evaluation, this reduces the cost of computing the density expansion to the point where it is no longer the rate-limiting step of the calculation in typical settings. Further optimizations can be obtained by a “lossy” strategy, which trades off some accuracy for efficiency by discarding redundant or highly correlated entries in both the active set of a projected-process regression model and in the invariant features. We have implemented all these optimizations in librascal, a modular, user-friendly and efficient open-source library purpose-built for the computation of atom-density features (especially SOAP).

In order to test these optimizations in practice, we have run benchmarks over different kinds of datasets spanning elemental materials as well as organic molecules in isolation, in crystalline phases, and in bulk liquid phases. Using one of the most widespread codes for the training and evaluation of SOAP-based machine-learning interatomic potentials as a reference, we have found that our implementation of the SOAP representation is much faster, but that the advantage is less dramatic when considering also the calculation of a kernel model, which scales with the number of features, and that of the gradients, which is dominated by a term that scales with the number of neighbors in both codes. Feature selection, however, addresses both these additional overheads, and allows for an acceleration of the end-to-end evaluation time of energy and forces by a factor anywhere between four and ten with minimal increase in the prediction errors. Our tests show that in the current implementation, when using realistic values of the parameters, the different steps of the calculation contribute similarly to the total cost, indicating that there is no single obvious bottleneck. Further improvements, although possible, should consider the model as a whole and especially improve the accuracy/cost balance of lossy model compression techniques.

Appendix A: Efficient implementation of $_1F_1$

The confluent hypergeometric function of the first kind is defined as

$$ {}_1F_1(a, b, z) = \sum_{s=0}^{\infty} \frac{(a)_s}{(b)_s\, s!}\, z^s, \tag{A1} $$

where $(a)_s$ is a Pochhammer’s symbol (Ref. 97, Chap. 5.2(iii)). To efficiently compute Eq. (11), we implement a restricted version of $_1F_1$

$$ G(a, b, r_{ij}) = \frac{\Gamma(a)}{\Gamma(b)} \exp\left(-c\, r_{ij}^2\right)\, {}_1F_1\!\left(a, b, \frac{c^2 r_{ij}^2}{c + d_n}\right), \tag{A2} $$

where $a = \frac{n+l+3}{2}$ and $b = l + \frac{3}{2}$. We take into account that the arguments of $_1F_1$ are real and positive and we avoid its artificial overflow by using the asymptotic expansion (Eq. 13.2.4 and Eq. 13.7.1 in Ref. 97)

$$ \lim_{z\to\infty} {}_1F_1(a, b, z) = e^{z} z^{a-b}\, \frac{\Gamma(b)}{\Gamma(a)} \sum_{s=0}^{\infty} \frac{(b-a)_s (1-a)_s}{s!}\, z^{-s}, \tag{A3} $$

since the exponential in Eq. (11) can be factorized as $\exp\left[\frac{c^2 r_{ij}^2}{c+d_n}\right] \exp\left[-c\, r_{ij}^2\right] = \exp\left[c\, r_{ij}^2 \left(\frac{c}{c+d_n} - 1\right)\right]$ and $\frac{c}{c+d_n} - 1 < 0$. Note that $G$ is implemented as a class so that the switching point between the direct series and the asymptotic expansion evaluations is determined at construction for particular values of $a$ and $b$ using the bisection method.

For each value of $n$, the function $G$ and its derivatives with respect to $r_{ij}$ can be efficiently evaluated using the two-step downward recurrence relation

$$ G(a+1, b+1, r_{ij}) = \frac{c^2 r_{ij}^2}{c + d_n}\, G(a+2, b+3, r_{ij}) + (b+1)\, G(a+1, b+2, r_{ij}), \tag{A4} $$

$$ G(a, b, r_{ij}) = \frac{c^2 r_{ij}^2}{c + d_n}\, \frac{a-b}{a}\, G(a+1, b+2, r_{ij}) + \frac{b}{a}\, G(a+1, b+1, r_{ij}), \tag{A5} $$

with $\partial G(a, b, r_{ij})/\partial r_{ij} = \frac{2 c^2 r_{ij}}{c+d_n}\, G(a+1, b+1, r_{ij}) - 2 c\, r_{ij}\, G(a, b, r_{ij})$. We found empirically that only the downward recurrence relation was numerically stable for our range of parameters. Note that $a + 1$ corresponds effectively to steps of $l + 2$, so computing $G$ and $\mathrm{d}G/\mathrm{d}r_{ij}$ for all $l \in [0, l_\mathrm{max}]$ and all $n \in [0, n_\mathrm{max}[$ requires $4 n_\mathrm{max}$ evaluations when using this recurrence relation.

Appendix B: Derivatives of the energy function

We have defined an atom-centered energy model such that the energy associated with structure $A$ can be written as in Eq. (16), $E(A) = \sum_{i\in A} E(A_i)$. The structure $A$ is determined by the set of atomic coordinates and species $\{\mathbf{r}_i, a_i\}$ and (for periodic structures) unit cell vectors $\{\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3\}$. The atom-centred environment $A_i$ is entirely characterized by the atom-centered vectors $\{\mathbf{r}_{ji} = \mathbf{r}_j - \mathbf{r}_i\}$ with $r_{ji} < r_\mathrm{cut}$. The derivatives of $E$ with respect to the position of atom $k$ (the negative of the force acting on the atom) can be computed using the chain rule

$$ \frac{\partial E(A)}{\partial \mathbf{r}_k} = \sum_{i\in A} \frac{\partial E(A_i)}{\partial \mathbf{r}_k} = \sum_{i\in A} \sum_{j\in A_i} \frac{\partial E(A_i)}{\partial \mathbf{r}_{ji}} \cdot \frac{\partial \mathbf{r}_{ji}}{\partial \mathbf{r}_k}. \tag{B1} $$

Here index $j$ runs over the neighbors of atom $i$, which include periodic images, if the system is periodic. The term $\partial \mathbf{r}_{ji}/\partial \mathbf{r}_k$ is zero unless $k = i$ (in which case it evaluates to $-1$) or if $j = k$ (in which case it evaluates