Two-loop five-point integration-by-parts relations in a usable form

Prepared for submission to JHEP                              PCFT-21-08, USTC-ICTS-21-08

arXiv:2104.06866v1 [hep-ph] 14 Apr 2021

Dominik Bendle (a,f), Janko Boehm (a), Murray Heymann (a,f), Rourou Ma (b,c), Mirko Rahn (f), Lukas Ristau (a), Marcel Wittmann (a), Zihao Wu (c,d), Yang Zhang (c,d,e)

(a) Department of Mathematics, Technische Universität Kaiserslautern, 67663 Kaiserslautern, Germany
(b) Cuiying Honors College, Lanzhou University, Lanzhou, Gansu 730000, China
(c) Interdisciplinary Center for Theoretical Study, University of Science and Technology of China, Hefei, Anhui 230026, China
(d) Peng Huanwu Center for Fundamental Theory, Hefei, Anhui 230026, China
(e) Max-Planck-Institut für Physik, Werner-Heisenberg-Institut, D-80805, München, Germany
(f) Fraunhofer Institute for Industrial Mathematics (ITWM), Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany

E-mail: dominik.bendle@itwm.fraunhofer.de, boehm@mathematik.uni-kl.de, heymann@mathematik.uni-kl.de, marr16@lzu.edu.cn, mwittman@rhrk.uni-kl.de, wuzihao@mail.ustc.edu.cn, yzhphy@ustc.edu.cn

Abstract: In this paper, we show that with the state-of-the-art module intersection IBP reduction method and our improved Leinartas’ algorithm, IBP relations for very complicated Feynman integrals can be solved and the analytic reduction coefficients can be dramatically simplified. We develop a large-scale parallel implementation of our improved Leinartas’ algorithm, based on the Singular/GPI-Space framework. We demonstrate our method by reducing two-loop five-point Feynman integrals with degree-five numerators, using a simple and sparse IBP system. Our improved Leinartas’ algorithm then greatly simplifies the analytic reduction result into a usable form, with a compression ratio of two orders of magnitude. We further observe that the compression ratio increases with the complexity of the Feynman integrals.
Contents

1 Introduction

2 IBP from module intersection and the reduction procedure
  2.1 Syzygy equation and module intersection
  2.2 Linear reduction of the IBP system

3 Partial fraction decomposition
  3.1 Nullstellensatz decomposition
  3.2 Algebraic dependence decomposition
  3.3 Numerator decomposition
  3.4 The resulting algorithm
  3.5 A simple example
  3.6 Parallelization

4 Parallel computing using the Singular/GPI-Space framework
  4.1 GPI-Space and Singular
  4.2 Description of the Petri net

5 An example for partial fraction decomposed IBP coefficients
  5.1 Diagram and target integrals
  5.2 Generating short IBP relations by module intersection
  5.3 RREF and Interpolation of analytic IBP coefficients
  5.4 Reduction coefficients simplification achieved by pfd program

6 Summary and outlook

A Instructions on how to use the massively parallel PFD application

1   Introduction

Higher-order corrections in perturbative quantum field theory are an important theoretical input for high-energy physics. They are usually obtained from precision computations of multi-loop Feynman diagrams. A multi-loop computation typically involves a huge number of Feynman integrals. Before these integrals are evaluated, they must be reduced to a relatively small integral basis, a step known as integral reduction. Experience from multi-loop Feynman integral studies shows that integral reduction is frequently the most time- and memory-consuming step. Much research has been devoted to developing and improving integral reduction algorithms, as well as to simplifying the reduction results.

Integral reduction is usually carried out with the classic integration-by-parts (IBP) method [1], which relies on the fact that integrals of total derivatives vanish in dimensional regularization. The IBP reduction can be carried out either with symbolic Feynman indices and an algebraic reduction [2], or with integer-valued indices and a linear reduction (the Laporta algorithm [3]). Several public codes for IBP reduction are available [4–15]. Novel approaches to integral reduction that do not rely on IBP relations have also appeared recently, such as the intersection theory method [16–19] and the auxiliary mass flow method [20–24].
     This paper focuses on our latest progress in IBP reduction. Two difficulties are well known. First, for multi-loop Feynman integrals the IBP linear system tends to be very large and to involve a huge number of integrals, so the linear reduction can be very difficult. Second, for any method of integral reduction, the analytic reduction results tend to consist of very complicated rational functions of huge size. In this paper, we try to overcome these two difficulties by

   1. applying the module-intersection-based [25, 26] syzygy IBP reduction method [27, 28]. With the latest developments of the module-intersection method, we are able to generate a very small, sparse IBP system for complicated multi-loop integral reduction problems. Such an IBP system can then be solved with state-of-the-art reduction methods [29–33]. This yields the finite-field reduction result, or, via modern interpolation algorithms, the analytic reduction result.

   2. using the multivariate partial fraction method [34–38] to simplify the analytic IBP reduction result. (The multivariate partial fraction method is also useful for simplifying the final analytic scattering amplitude at the level of transcendental functions [39].) In this paper, we present a new massively parallel multivariate partial fraction package for simplifying complicated rational functions, relying on parallelization via the Singular/GPI-Space framework [40]. We observe that the IBP reduction coefficients can be dramatically simplified, in particular when an integral basis with uniform transcendental weight exists. (See also ref. [38] for a different modern partial fraction algorithm.)

     The main development since [37] is the large-scale parallelized implementation of our multivariate partial fraction algorithm, based on the Singular/GPI-Space framework. The parallelization is realized not only across different coefficients, but also within the partial fraction computation of a single coefficient. This feature is very useful for simplifying IBP reduction coefficients, and also the coefficients of transcendental functions in an analytic amplitude, because the sizes of these coefficients are usually quite uneven. The internal parallelization is therefore important for dealing with the most complicated coefficients. This kind of parallelization leads to a complicated workflow, and the GPI-Space framework is a natural choice for handling such a workflow.
     In addition, we apply new ideas for choosing an integral basis such that the denominators of the reduction coefficients factorize. Taking advantage of this, we can determine the denominators first. We then perform a simple polynomial interpolation, rather than a rational function interpolation, in an analytic IBP computation.

    To show the power of our latest method, we choose the two-loop five-point massless Feynman integrals as a cutting-edge example. Owing to the complicated kinematics, the IBP reduction of these integrals is well known to be very difficult, and the analytic IBP reduction coefficients are huge. For example, the analytic IBP reduction of the so-called “double pentagon” integrals with numerator degree 5 was first done by Kira 2.0 [15]. That computation used a new type of block-triangular integral relations from the auxiliary mass flow method [22] and a new strategy for choosing master integrals to simplify the computation and the result. The resulting IBP coefficients are huge, with a disk size of about 25 GB [15]. (The numerator-degree-4 reduction of the same integral family was first done in [41], and the reduction coefficients were simplified by multivariate partial fraction methods in [37, 38].)

     In this paper, we show that, alternatively, the module intersection method can be used to obtain a simple and sparse IBP system for the same target integrals. We solve this IBP system with our semi-numeric row-reduced echelon form (RREF) computation, using large-scale parallelization based on the Singular/GPI-Space framework [40]. The fully analytic IBP reduction coefficients are then obtained by interpolation. We choose a uniformly transcendental (UT) basis [42] to make the denominators of the IBP reduction coefficients simple. We then simplify the IBP reduction coefficients with our new multivariate partial fraction package, again based on large-scale parallelization using the Singular/GPI-Space framework. The size of the final analytic result is only about 186 MB. This is a compression ratio of more than 100, which brings the analytic IBP reduction result into a usable form. Furthermore, we observe a clear tendency for the compression ratio of the partial fraction method to increase with the complexity of the coefficients.

     This paper is organized as follows. In Section 2, we give a short account of the current status of the module-intersection method for generating a simple IBP system, and of the IBP reduction via interpolation. In Section 3, we introduce our improved Leinartas’ algorithm, equipped with polynomial division and syzygy reduction computations, for simplifying the IBP reduction coefficients via multivariate partial fractioning. In Section 4, we discuss our new large-scale parallelized implementation of our partial fraction algorithm based on the Singular/GPI-Space framework. In Section 5, we present an IBP computation for the two-loop five-point non-planar Feynman integrals with degree-5 numerators. There, the very complicated analytic reduction coefficients are compressed by more than two orders of magnitude and put into a usable form. In Section 6, we summarize our paper and provide an outlook. In the Appendix, we provide installation notes and a short manual for our GPI-Space based partial fractioning program.

2     IBP from module intersection and the reduction procedure

2.1    Syzygy equation and module intersection
In this section, we introduce the module intersection method for generating IBP relations. For a given integral to be reduced, the module intersection method generates far fewer IBP relations than the traditional Laporta algorithm, which saves considerable computational resources.
     The module intersection method is based on the Baikov representation [43, 44], in which an integral is written as

$$ I[\alpha_1,\ldots,\alpha_n] = C_E^L\, U^{\frac{E-D+1}{2}} \int_\Omega dz_1\cdots dz_n\, P(z)^{\frac{D-L-E-1}{2}}\, \frac{1}{z_1^{\alpha_1}\cdots z_n^{\alpha_n}}, \tag{2.1} $$

where $C_E^L$ is a constant and $U$ is the Gram determinant

$$ U = \det G\begin{pmatrix} p_1,\ldots,p_E \\ p_1,\ldots,p_E \end{pmatrix}. \tag{2.2} $$

Here $L$ is the loop order and $E$ is the number of independent external vectors. $P(z)$ is the Baikov polynomial, which depends on $z = (z_1,\ldots,z_n)$ through

$$ P = \det G\begin{pmatrix} l_1,\ldots,l_L, p_1,\ldots,p_E \\ l_1,\ldots,l_L, p_1,\ldots,p_E \end{pmatrix} \tag{2.3} $$

and vanishes on the boundary $\partial\Omega$.
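    In the Baikov representation, the IBP relations come from

$$ 0 = \int_\Omega dz_1\cdots dz_n \sum_{i=1}^{n} \frac{\partial}{\partial z_i}\left( a_i(z)\, P(z)^{\frac{D-L-E-1}{2}}\, \frac{1}{z_1^{\alpha_1}\cdots z_n^{\alpha_n}} \right), \tag{2.4} $$

where the $a_i$ are polynomials in $z = (z_1,\ldots,z_n)$. Expanding (2.4) gives an IBP relation among integrals in the same dimension if the chosen $a_i(z)$ satisfy the following condition: there exists a polynomial $b(z)$ with

$$ \sum_{i=1}^{n} a_i(z)\,\frac{\partial P}{\partial z_i} + b(z)\,P = 0. \tag{2.5} $$

Under this condition, differentiating $P(z)^{\frac{D-L-E-1}{2}}$ produces no term with the lowered power $P(z)^{\frac{D-L-E-3}{2}}$, and hence no dimension-shifted integral.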
Equation (2.5) is called a syzygy equation for the generators $\{P, \partial P/\partial z_1, \ldots, \partial P/\partial z_n\}$. The $a_i(z)$ and $b(z)$ form a solution of this syzygy equation, and each solution corresponds to an IBP relation. Traditional methods have no control over the increase of indices when generating IBP relations, and we want to prevent such increases to keep the calculation simple. The syzygy method controls the indices while generating IBP relations and prevents them from increasing by imposing the additional constraint

$$ a_i(z) = z_i\, b_i(z), \qquad i \in S, \tag{2.6} $$

where the $b_i(z)$ are polynomials in the $z_i$ and $S$ is the set of indices whose powers must not increase. With (2.6), the term $a_i(z)\,\partial_{z_i} z_i^{-\alpha_i} = -\alpha_i\, b_i(z)\, z_i^{-\alpha_i}$ does not raise the power of $z_i$ in the denominator.

Solving equation (2.5) is not computationally hard. See ref. [45] for solving this constraint via the canonical form of IBP relations. Here we use a Laplace expansion to solve it [26]. Since the Baikov polynomial is a determinant, $P = \det(P_{ij})$, the Laplace expansion gives

$$ \sum_j P_{kj}\,\frac{\partial P}{\partial P_{ij}} - \delta_{ki}\,P = 0. \tag{2.7} $$

The derivatives with respect to the matrix entries, $\partial P/\partial P_{ij}$, can easily be converted into the derivatives $\partial P/\partial z_i$ via the chain rule. Equation (2.7) then gives solutions of the syzygy equation (2.5). These solutions form a module $M_1$ whose generators are at most linear in the $z_i$.
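Identity (2.7) can be checked numerically with a short Python sketch. This is our own illustration and is not part of ModuleIntersection. The sketch treats every entry of a test matrix as an independent variable, computes the cofactors $C_{ij} = \partial P/\partial P_{ij}$, and checks that $\sum_j P_{kj} C_{ij} - \delta_{ki} P$ vanishes for all $k, i$. The test matrices are arbitrary assumptions. For an actual symmetric Gram matrix the entries are not independent, so the chain-rule conversion mentioned above is still required.

```python
from fractions import Fraction
from itertools import permutations


def det(m):
    """Determinant via the Leibniz formula (adequate for small test matrices)."""
    n = len(m)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
        term = Fraction(-1 if inversions % 2 else 1)
        for row, col in enumerate(perm):
            term *= m[row][col]
        total += term
    return total


def cofactor(m, i, j):
    """C_ij = dP/dP_ij, treating every matrix entry as an independent variable."""
    minor = [[m[r][c] for c in range(len(m)) if c != j] for r in range(len(m)) if r != i]
    return (-1) ** (i + j) * det(minor)


def laplace_residual(m, k, i):
    """Left-hand side of (2.7): sum_j P_kj * dP/dP_ij - delta_ki * P."""
    s = sum(m[k][j] * cofactor(m, i, j) for j in range(len(m)))
    return s - (det(m) if k == i else 0)


# Hypothetical 3x3 test matrix standing in for (P_ij).
P = [[Fraction(v) for v in row] for row in ([2, 1, 0], [1, 3, 4], [0, 4, 5])]
print(det(P))  # -7
print(all(laplace_residual(P, k, i) == 0 for k in range(3) for i in range(3)))  # True
```

The Leibniz formula is exponential in the matrix size. It is used here only to keep the check transparent.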
      The solutions of (2.6) form another module $M_2$, whose generators are simply $a_i = z_i$ (i.e. $b_i = 1$) for $i \in S$. The solutions of both equations form the module $M_1 \cap M_2$. The module intersection can be computed with a module Gröbner basis in a position-over-term ordering [46]. While calculating the module intersection, we can treat the kinematic variables $c_i$ and the Baikov variables $z_i$ in the same way. That is, we calculate a Gröbner basis in the polynomial ring $\mathbb{Q}[c_1,\ldots,c_m,z_1,\ldots,z_n]$. This is faster than treating the $c_i$ as parameters rather than variables, although it results in a redundant generating system. Such a redundant result can be mapped back to $\mathbb{Q}(c_1,\ldots,c_m)[z_1,\ldots,z_n]$, where redundant generators can be removed by checking their leading terms. This can be done with the simplify command in Singular.
      Once we have solutions for (2.5) and (2.6), we can use (2.4) to generate IBP relations in which the powers of the specified indices never increase. Such an IBP system contains far fewer IBP relations and is therefore much smaller in total size. Usually, we generate IBP relations on spanning cuts and then assemble the full IBP system, instead of generating the whole system directly. For more details, see [41, 46]. In this paper, we use our package ModuleIntersection [47] to automate the module intersection and the corresponding IBP generation.

2.2   Linear reduction of the IBP system
Once we have obtained the IBP system, we reduce it using our own algorithm and implementation for linear reduction [41]. First, we find the independent IBP relations by performing the reduction with the variables set to integers, using linear algebra over finite fields. Second, we simplify the linear system by removing the overlap between different cuts. Third, we use the linear system simplified in the previous steps to reduce the target integrals to master integrals, which gives the IBP reduction coefficients as the final result. In this third step we rely on our own RREF code, which applies not only row swaps but also column swaps to find the best pivots. The column operations change the set of master integrals, but we can convert the new master integrals to the original ones a posteriori. We have observed that adding such column operations makes our calculation much faster.
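The first step, identifying independent relations over a finite field, can be illustrated with textbook Gaussian elimination modulo a prime. This is a minimal sketch under our own assumptions: the prime $2^{31}-1$, the toy matrix and the function name are all illustrative. It uses only row operations, so it is not the column-pivoting RREF code described above. Each row holds the integer coefficients of one IBP relation after all variables have been set to integers. `pow(x, -1, p)` requires Python 3.8 or later.

```python
def independent_rows_mod_p(rows, p=2_147_483_647):
    """Indices of a maximal linearly independent subset of rows over GF(p).

    rows: integer coefficient vectors, one IBP relation per row and one
    column per integral, after substituting integers for all variables."""
    basis = []        # accepted rows, each normalized to 1 at its pivot column
    pivots = []       # pivot column of each accepted row
    independent = []  # indices of the original rows that were kept
    for idx, row in enumerate(rows):
        v = [x % p for x in row]
        # Eliminate the pivots of previously accepted rows, in order.
        for b, c in zip(basis, pivots):
            if v[c]:
                factor = v[c]
                v = [(x - factor * y) % p for x, y in zip(v, b)]
        pivot = next((c for c, x in enumerate(v) if x), None)
        if pivot is None:
            continue  # combination of earlier rows: a redundant relation
        inv = pow(v[pivot], -1, p)  # modular inverse of the pivot entry
        basis.append([(x * inv) % p for x in v])
        pivots.append(pivot)
        independent.append(idx)
    return independent


# Toy system: row 1 = 2 * row 0 and row 3 = row 0 + row 2 are redundant.
relations = [[1, 2, 3], [2, 4, 6], [0, 1, 1], [1, 3, 4]]
print(independent_rows_mod_p(relations))  # [0, 2]
```

Working modulo a large prime keeps all numbers small. A rank drop that occurs only modulo $p$ is possible but unlikely for a large prime.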
     For some complicated IBP reduction computations, a fully analytic calculation takes too much time to be feasible. In such cases, we perform the calculations semi-numerically: we set some, but usually not all, variables to constant integers and then perform the reduction. This greatly lowers the computational cost. From the semi-analytic result of such a reduction, we can determine the degree of the variable kept analytic in each IBP coefficient (usually a rational function). Knowing the degree in each variable, we can perform the reduction at multiple semi-numeric points and interpolate the fully analytic IBP coefficients. The rational function interpolation algorithm is described in Appendix A of ref. [41]. However, the observations in ref. [37] allow us to convert the problem from rational function interpolation to polynomial interpolation, which is easier.
    Specifically, as observed in ref. [37], if a UT basis is chosen as the master integrals, the denominator factors of the IBP coefficients contain only symbol letters or simple polynomials in $\epsilon$. (See [42, 48] for an overview of UT integrals and [49–51] for an overview of symbols.) We use this in two reductions. To determine the powers of the factors in $\epsilon$, we carry out a reduction with all kinematic variables set to constants and $\epsilon$ kept analytic. To determine the powers of the symbol letters in the denominators, we set $\epsilon$ to a constant and the kinematic variables to generic linear functions of an auxiliary variable $x$. These functions are chosen such that all symbol letters remain distinct in this parameterization. After this reduction, the powers of the symbol letters in the denominators can be read off by factorizing the denominator expressions in $x$.
     Once the denominators are known analytically, the numerators must be obtained via polynomial interpolation. As stated above, the degree of each variable in the numerators can be obtained by performing the reduction with all but one variable set to constant integers. Once the degrees are known, interpolating the corresponding polynomial becomes a well-defined problem. One needs to accumulate $\prod_i (d_i+1)$ semi-numeric points, where $d_i$ is the degree of the $i$-th variable to be interpolated, and carry out the reduction at these points. The analytic expressions for the numerators can then be interpolated from these semi-numeric results.
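The counting argument above can be made concrete with a small stand-alone sketch. Several assumptions are made for illustration. A Python function plays the role of the semi-numeric reduction, whereas in the real computation each sample is a reduction run. The sample points are the integers $0,\ldots,d_i$ in each variable. The polynomial is recovered one variable at a time by univariate Newton interpolation, using exactly $\prod_i (d_i+1)$ evaluations. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod


def interpolate_1d(xs, ys):
    """Monomial coefficients (constant term first) of the unique polynomial of
    degree < len(xs) through the points (xs[i], ys[i]), via Newton's method."""
    n = len(xs)
    coef = [Fraction(y) for y in ys]
    for j in range(1, n):  # divided differences, computed in place
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    poly = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):  # Horner: poly = poly * (x - xs[i]) + coef[i]
        new = [Fraction(0)] * n
        for k in range(n - 1):
            new[k + 1] += poly[k]
            new[k] -= xs[i] * poly[k]
        new[0] += coef[i]
        poly = new
    return poly


def interpolate(black_box, degrees):
    """Recover {exponent tuple: coefficient} of a polynomial of degree <= degrees[i]
    in variable i from prod(d_i + 1) evaluations on the grid {0..d_1} x ... x {0..d_n}."""
    grids = [range(d + 1) for d in degrees]
    table = {pt: Fraction(black_box(*pt)) for pt in product(*grids)}
    # Convert one axis at a time from "value at x_axis = t" to "coefficient of x_axis**k".
    for axis, d in enumerate(degrees):
        xs = list(range(d + 1))
        new_table = {}
        for rest in {key[:axis] + key[axis + 1:] for key in table}:
            ys = [table[rest[:axis] + (t,) + rest[axis:]] for t in xs]
            for k, c in enumerate(interpolate_1d(xs, ys)):
                new_table[rest[:axis] + (k,) + rest[axis:]] = c
        table = new_table
    return {k: c for k, c in table.items() if c != 0}


calls = []


def fake_reduction(x, y):
    """Hypothetical stand-in for one semi-numeric reduction run."""
    calls.append((x, y))
    return 3 * x**2 * y - x + 5


print(interpolate(fake_reduction, (2, 1)))
print(len(calls) == prod(d + 1 for d in (2, 1)))  # True: 6 evaluations
```

In practice one may prefer random sample points, possibly over a finite field, to avoid accidental cancellations. The small integer grid only keeps the example readable.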
     One further simplification is worth mentioning. IBP coefficients are homogeneous in the kinematic variables, so we can set one of them, e.g. $c_1$, to 1 and carry out the steps above. This gives the analytic IBP coefficients at $c_1 = 1$. The full $c_1$ dependence of the final result can then be restored by dimensional analysis.
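The restoration step can be sketched as follows. The sketch assumes that the coefficient is a homogeneous polynomial whose total degree `weight` in the kinematic variables is known from dimensional analysis, and that all $c_i$ carry the same mass dimension. For a rational coefficient, the numerator and denominator are treated separately. Polynomials are stored as dictionaries mapping exponent tuples to coefficients, and the function names are hypothetical.

```python
from fractions import Fraction


def restore_c1(poly_at_c1_1, weight):
    """Reinsert powers of c1 into a homogeneous polynomial computed with c1 = 1.

    poly_at_c1_1: {(k2, ..., km): coeff}, a polynomial in c2, ..., cm.
    weight: total degree of the homogeneous polynomial in c1, ..., cm.
    Returns {(k1, k2, ..., km): coeff} with k1 = weight - (k2 + ... + km)."""
    restored = {}
    for exps, coeff in poly_at_c1_1.items():
        k1 = weight - sum(exps)
        if k1 < 0:
            raise ValueError("monomial degree exceeds the stated weight")
        restored[(k1,) + exps] = coeff
    return restored


def evaluate(poly, point):
    """Evaluate {exponent tuple: coeff} at a point with rational coordinates."""
    total = Fraction(0)
    for exps, coeff in poly.items():
        term = Fraction(coeff)
        for value, k in zip(point, exps):
            term *= Fraction(value) ** k
        total += term
    return total


# c1**2 - 3*c1*c2 + c2*c3 (degree 2) becomes 1 - 3*c2 + c2*c3 at c1 = 1.
at_c1_1 = {(0, 0): 1, (1, 0): -3, (1, 1): 1}
full = restore_c1(at_c1_1, weight=2)
print(full)  # {(2, 0, 0): 1, (1, 1, 0): -3, (0, 1, 1): 1}
print(evaluate(full, (2, 3, 5)))  # 1
```

For a rational coefficient $N/D$, one would apply `restore_c1` to $N$ and to $D$, each with its own weight.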

3       Partial fraction decomposition

The algorithm we use to reduce the size of rational functions is an improved version of Leinartas’ algorithm for multivariate partial fraction decomposition [34, 35]. We start with a short account of this algorithm; for more details, see Section 3 of our paper [37].
     In the following, let $K[x_1,\ldots,x_d]$, or $K[x]$ for short, be the ring of polynomials in $d$ variables $x = (x_1,\ldots,x_d)$ over some field $K$, and let $\overline{K}$ be the algebraic closure of $K$ (e.g. $\overline{\mathbb{R}} = \mathbb{C}$). Consider a rational function $f/g$ with $f, g \in K[x]$, where the polynomial $g = q_1^{e_1}\cdots q_m^{e_m}$ factors into many small¹ irreducible factors $q_i$. The goal is to write $f/g$ as a sum of functions with “smaller” numerators and denominators. The algorithm consists of 3 main steps:

¹ In rational functions arising from IBP reductions, most of the denominator factors are of degree 1.

3.1   Nullstellensatz decomposition
In the first step of the algorithm, we search for relations of the form

$$ 1 = h_1 q_1^{e_1} + \cdots + h_m q_m^{e_m}, \tag{3.1} $$

where the $h_i \in K[x]$ are polynomials. Multiplying (3.1) by $f/g$, we get the decomposition

$$ \frac{f}{g} = \frac{f\cdot\sum_{k=1}^{m} h_k q_k^{e_k}}{\prod_{i=1}^{m} q_i^{e_i}} = \sum_{k=1}^{m} \frac{f\cdot h_k}{\prod_{i=1,\, i\neq k}^{m} q_i^{e_i}}, \tag{3.2} $$

in which each denominator contains only $m-1$ different irreducible factors. We then repeat this step with each summand in the decomposition (3.2). We stop when every denominator in the sum contains only factors that do not admit a relation as in (3.1). By Hilbert’s weak Nullstellensatz [37, Lemma 3.6], such a relation exists if and only if the polynomials $q_i$ have no common zero in $\overline{K}^d$. The relation can be computed from a Gröbner basis [37, Definition 3.3] of the ideal generated by the polynomials $q_i^{e_i}$ [37, Algorithm 1].
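In one variable, every ideal is principal, and the reduced Gröbner basis of $\langle q_1^{e_1}, q_2^{e_2}\rangle$ is just the monic gcd. A relation (3.1) for two coprime factors can therefore be found with the extended Euclidean algorithm. The Python sketch below illustrates only this univariate special case and is not the multivariate Singular implementation. As an assumed example, it finds $h_1, h_2$ with $1 = h_1 x^2 + h_2 (x+1)$, i.e. $q_1 = x$, $e_1 = 2$, $q_2 = x+1$, $e_2 = 1$. Polynomials are coefficient lists with the constant term first.

```python
from fractions import Fraction

# Univariate polynomials over Q as coefficient lists, constant term first;
# the zero polynomial is [0].


def trim(p):
    """Drop leading (highest-degree) zero coefficients."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])


def scale(p, c):
    return trim([c * a for a in p])


def mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(p, q):
    """Division with remainder p = quot * q + rem with deg(rem) < deg(q); q != 0."""
    rem = trim([Fraction(a) for a in p])
    quot = [Fraction(0)] * max(1, len(rem) - len(q) + 1)
    while len(rem) >= len(q) and rem != [0]:
        shift = len(rem) - len(q)
        c = rem[-1] / q[-1]
        quot[shift] = c
        rem = add(rem, mul([Fraction(0)] * shift + [-c], q))
    return trim(quot), rem


def ext_gcd(a, b):
    """Return (g, s, t) with s*a + t*b = g, where g is the monic gcd of a and b."""
    r0, r1 = trim(a), trim(b)
    s0, s1 = [Fraction(1)], [Fraction(0)]
    t0, t1 = [Fraction(0)], [Fraction(1)]
    while r1 != [0]:
        quot, rem = divmod_poly(r0, r1)
        r0, r1 = r1, rem
        s0, s1 = s1, add(s0, scale(mul(quot, s1), -1))
        t0, t1 = t1, add(t0, scale(mul(quot, t1), -1))
    lc = Fraction(r0[-1])
    return scale(r0, 1 / lc), scale(s0, 1 / lc), scale(t0, 1 / lc)


q1_pow = [Fraction(0), Fraction(0), Fraction(1)]  # q1**e1 = x**2
q2_pow = [Fraction(1), Fraction(1)]               # q2**e2 = x + 1
g, h1, h2 = ext_gcd(q1_pow, q2_pow)
print(add(mul(h1, q1_pow), mul(h2, q2_pow)) == [1])  # True
```

The result is $g = 1$, $h_1 = 1$, $h_2 = 1 - x$. For $f = 1$, (3.2) then gives $\frac{1}{x^2(x+1)} = \frac{1}{x+1} + \frac{1-x}{x^2}$. The numerator decomposition of Section 3.3 would subsequently split the second term further.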

3.2   Algebraic dependence decomposition
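Suppose the polynomials $q_1^{e_1},\ldots,q_m^{e_m}$ are algebraically dependent. That is, there exists a nonzero polynomial $p \in K[y_1,\ldots,y_m]$ in $m$ variables, called an annihilating polynomial for $q_1^{e_1},\ldots,q_m^{e_m}$, such that $p(q_1^{e_1},\ldots,q_m^{e_m}) = 0 \in K[x]$. Then we can use this equation to derive a decomposition similar to (3.2). For this, write

$$ p = c_\alpha y^\alpha + \sum_{\substack{\beta \in \mathbb{N}^m,\ \beta \neq \alpha \\ \deg(p) \geq |\beta| \geq |\alpha|}} c_\beta y^\beta \qquad (c_\alpha, c_\beta \in K,\ c_\alpha \neq 0) \tag{3.3} $$

such that $c_\alpha y^\alpha$ is one of the terms of smallest degree. Here we use multi-indices $\beta \in \mathbb{N}^m$, so $y^\beta = y_1^{\beta_1}\cdots y_m^{\beta_m}$ and $\deg(y^\beta) = |\beta| = \beta_1 + \cdots + \beta_m$. Writing $q$ for the vector $(q_1^{e_1},\ldots,q_m^{e_m})$, we have

$$
\begin{aligned}
0 = p(q) \;&\Leftrightarrow\; c_\alpha q^\alpha = -\sum_\beta c_\beta q^\beta \\
&\Leftrightarrow\; 1 = -\sum_\beta \frac{c_\beta q^\beta}{c_\alpha q^\alpha} = -\sum_\beta \frac{c_\beta}{c_\alpha} \prod_{i=1}^{m} \frac{q_i^{e_i\beta_i}}{q_i^{e_i\alpha_i}} \\
&\Rightarrow\; \frac{f}{g} = -\sum_\beta \frac{c_\beta}{c_\alpha}\, f \prod_{i=1}^{m} \frac{q_i^{e_i\beta_i}}{q_i^{e_i(\alpha_i+1)}}
\end{aligned}
\tag{3.4}
$$

Since $y^\alpha$ has minimal degree and $\beta \neq \alpha$, each $\beta$ in the sum in (3.4) satisfies $\beta_i \geq \alpha_i + 1$ for at least one index $i$. Hence the factor $q_i$ does not appear in the denominator of the corresponding term. The denominators of the rational functions in the decomposition therefore each have at most $m-1$ different irreducible factors.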
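The bookkeeping in (3.4) is easy to mechanize. The Python sketch below uses hypothetical function names and exact arithmetic from `fractions`. It takes an annihilating polynomial $p$, given as a dictionary from exponent tuples $\beta$ to $c_\beta$, and a chosen minimal-degree exponent $\alpha$. It returns the terms of (3.4), each stored as a coefficient and the exponent vector $e_i(\beta_i-\alpha_i-1)$ of the $q_i$, where negative entries are denominator powers. As an assumed example, we take $q_1 = x$, $q_2 = y$, $q_3 = x+y$ with $e_i = 1$, annihilated by $p = y_1 + y_2 - y_3$, and choose $c_\alpha y^\alpha = -y_3$. This yields $\frac{1}{xy(x+y)} = \frac{1}{y(x+y)^2} + \frac{1}{x(x+y)^2}$, which the code checks at a sample point.

```python
from fractions import Fraction


def dependence_terms(p, alpha, es):
    """Terms of (3.4) for f/g with g = prod_i q_i**e_i.

    p:     annihilating polynomial {beta: c_beta} over y_1..y_m,
    alpha: exponent tuple of a minimal-degree term c_alpha * y**alpha,
    es:    exponents (e_1, ..., e_m).
    Each returned (coeff, exps) stands for coeff * f * prod_i q_i**exps[i]."""
    c_alpha = Fraction(p[alpha])
    terms = []
    for beta, c_beta in p.items():
        if beta == alpha:
            continue
        exps = tuple(e * (b - a - 1) for e, a, b in zip(es, alpha, beta))
        terms.append((-Fraction(c_beta) / c_alpha, exps))
    return terms


def evaluate_terms(terms, f_value, q_values):
    """Numerically evaluate the decomposition at given values of f and the q_i."""
    total = Fraction(0)
    for coeff, exps in terms:
        term = coeff * f_value
        for q, k in zip(q_values, exps):
            term *= Fraction(q) ** k
        total += term
    return total


p = {(1, 0, 0): 1, (0, 1, 0): 1, (0, 0, 1): -1}  # y1 + y2 - y3
terms = dependence_terms(p, alpha=(0, 0, 1), es=(1, 1, 1))
x, y = Fraction(3), Fraction(5)
lhs = 1 / (x * y * (x + y))
print(lhs == evaluate_terms(terms, 1, (x, y, x + y)))  # True
# Each term has a non-negative exponent, i.e. at least one factor has cancelled.
print(all(any(k >= 0 for k in exps) for _, exps in terms))  # True
```

The check at a single rational point only illustrates the identity; the decomposition itself is exact by (3.4).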
     As with the Nullstellensatz decomposition, this step is repeated with each summand in (3.4). This leads to a decomposition in which each denominator contains only algebraically independent factors $q_i$. The reason is that the polynomials $q_1,\ldots,q_m$ are algebraically dependent if and only if $q_1^{e_1},\ldots,q_m^{e_m}$ are, for any $e_i \in \mathbb{N}_{\geq 1}$ [37, Corollary 3.8].
     Calculating annihilating polynomials can be reduced to computing a Gröbner basis of a certain ideal [37, Lemmas 3.9, 3.10 and Algorithm 2]. However, there is a simpler way to determine beforehand whether an annihilating polynomial exists. The Jacobian criterion states that, over a field of characteristic zero such as $\mathbb{Q}$, a set of polynomials $\{g_1,\ldots,g_m\}$ is algebraically independent if and only if the $m \times d$ Jacobian matrix of polynomials $\left(\partial g_i/\partial x_j\right)_{i,j}$ has full row rank over the field $K(x)$ of rational functions [37, Lemma 3.7]. An $m \times d$ matrix with $m > d$ cannot have full row rank, so any $d+1$ polynomials are algebraically dependent. It follows that after the algebraic dependence decomposition, each denominator contains at most $d$ different irreducible factors, where $d$ is the number of variables.
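The Jacobian criterion suggests a cheap probabilistic pre-check, sketched below under our own assumptions. The names are hypothetical, and polynomials are dictionaries from exponent tuples to rational coefficients. Suppose the Jacobian evaluated at some point has full row rank. Then some maximal minor is a nonzero polynomial, so the matrix has full row rank over $K(x)$ and the polynomials are algebraically independent. If instead the rank is deficient at several random points, the polynomials are dependent with high probability, by the Schwartz–Zippel lemma. This heuristic does not replace the computation of an annihilating polynomial.

```python
import random
from fractions import Fraction


def diff(poly, j):
    """Partial derivative d/dx_j of {exponent tuple: coefficient}."""
    out = {}
    for exps, c in poly.items():
        if exps[j]:
            new = exps[:j] + (exps[j] - 1,) + exps[j + 1:]
            out[new] = out.get(new, 0) + c * exps[j]
    return out


def evaluate(poly, point):
    total = Fraction(0)
    for exps, c in poly.items():
        term = Fraction(c)
        for v, k in zip(point, exps):
            term *= v ** k
        total += term
    return total


def rank(matrix):
    """Rank of a matrix with Fraction entries by Gauss-Jordan elimination."""
    m = [list(row) for row in matrix]
    r = 0
    for col in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def probably_algebraically_independent(polys, nvars, trials=3, seed=0):
    """True: certainly independent (full rank at a sampled point).
    False: rank deficient at all sampled points, dependent with high probability."""
    rng = random.Random(seed)
    for _ in range(trials):
        point = [Fraction(rng.randint(1, 10**6)) for _ in range(nvars)]
        jac = [[evaluate(diff(g, j), point) for j in range(nvars)] for g in polys]
        if rank(jac) == len(polys):
            return True
    return False


X, Y = {(1, 0): 1}, {(0, 1): 1}
X_PLUS_Y = {(1, 0): 1, (0, 1): 1}
print(probably_algebraically_independent([X, Y], 2))            # True
print(probably_algebraically_independent([X, Y, X_PLUS_Y], 2))  # False
```

With $d = 2$ variables, the three polynomials $x$, $y$, $x+y$ must be dependent, in line with the argument above.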

3.3   Numerator decomposition
In the previous two steps, the denominators become simpler with respect to the number of different factors in their factorisation, but the numerators do not: in (3.2) and (3.4), the original numerator $f$ still appears in each summand. To also shorten the numerators, it makes sense to perform a division with remainder by the factors in the denominator. For a rational function $f/g$ with factorisation $g = q_1^{e_1}\cdots q_m^{e_m}$ as above, we can calculate a division expression

$$ f = r + \sum_{k=1}^{m} a_k q_k, \tag{3.5} $$

where the $a_k \in K[x]$ are polynomials and $r \in K[x]$ is a “small” remainder. More precisely, we can use a Gröbner basis of the ideal $I = \langle q_1,\ldots,q_m\rangle$ generated by the irreducible factors $q_i$. This ensures that no term of the polynomial $r$ is divisible by the lead term of any element of $I$ [37, Definition 3.4 and Algorithm 3]. We then say that $r$ is “reduced” with respect to $I$. Dividing (3.5) by $g$ yields

$$ \frac{f}{g} = \frac{r}{\prod_{i=1}^{m} q_i^{e_i}} + \sum_{k=1}^{m} \frac{a_k}{q_k^{e_k-1}\prod_{i=1,\, i\neq k}^{m} q_i^{e_i}}, \tag{3.6} $$

where the first term has a particularly small numerator, and in each of the other terms one power of the factor $q_k$ cancels from the denominator. Repeatedly applying this decomposition step therefore results in a sum of rational functions in which each numerator is reduced (as defined above) with respect to the ideal generated by the irreducible factors of its denominator.
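The division step (3.5) can be sketched in Python. The sketch assumes that polynomials are stored as dictionaries from exponent tuples to rational coefficients, and it uses the graded reverse lexicographic ordering of our Singular implementation. It is the textbook multivariate division algorithm, not the Singular code itself, and the names are hypothetical. The assumed example is $f = xy + 3$, $q_1 = x-1$, $q_2 = y$. These divisors already form a Gröbner basis, because their lead terms $x$ and $y$ are coprime. The algorithm returns $a_1 = y$, $a_2 = 1$, $r = 3$, so (3.6) reads $\frac{xy+3}{(x-1)y} = \frac{3}{(x-1)y} + 1 + \frac{1}{x-1}$.

```python
from fractions import Fraction


def grevlex_key(exps):
    """Larger key = larger monomial in grevlex with x_1 > x_2 > ... > x_d."""
    return (sum(exps), tuple(-e for e in reversed(exps)))


def lead(poly):
    exps = max(poly, key=grevlex_key)
    return exps, poly[exps]


def sub_multiple(p, c, shift, q):
    """Return p - c * x**shift * q, dropping zero coefficients."""
    out = dict(p)
    for exps, coeff in q.items():
        key = tuple(s + e for s, e in zip(shift, exps))
        out[key] = out.get(key, 0) - c * coeff
        if out[key] == 0:
            del out[key]
    return out


def divide(f, divisors):
    """Multivariate division with remainder: f = r + sum_k a_k * q_k,
    where no term of r is divisible by the lead term of any divisor."""
    p, r = dict(f), {}
    quotients = [{} for _ in divisors]
    while p:
        lt, lc = lead(p)
        for k, q in enumerate(divisors):
            q_lt, q_lc = lead(q)
            if all(a <= b for a, b in zip(q_lt, lt)):
                c = Fraction(lc) / q_lc
                shift = tuple(b - a for a, b in zip(q_lt, lt))
                quotients[k][shift] = quotients[k].get(shift, 0) + c
                p = sub_multiple(p, c, shift, q)
                break
        else:  # no divisor's lead term divides lt: move it to the remainder
            r[lt] = r.get(lt, 0) + lc
            del p[lt]
    return quotients, r


def evaluate(poly, point):
    total = Fraction(0)
    for exps, coeff in poly.items():
        term = Fraction(coeff)
        for v, k in zip(point, exps):
            term *= Fraction(v) ** k
        total += term
    return total


f = {(1, 1): 1, (0, 0): 3}                     # x*y + 3
q1, q2 = {(1, 0): 1, (0, 0): -1}, {(0, 1): 1}  # x - 1, y
(a1, a2), r = divide(f, [q1, q2])
pt = (Fraction(3), Fraction(5))
Q1, Q2 = evaluate(q1, pt), evaluate(q2, pt)
lhs = evaluate(f, pt) / (Q1 * Q2)
rhs = evaluate(r, pt) / (Q1 * Q2) + evaluate(a1, pt) / Q2 + evaluate(a2, pt) / Q1  # (3.6), e_k = 1
print(lhs == rhs)  # True
```

The remainder is canonical only when the divisors form a Gröbner basis. For arbitrary divisors it may depend on their order.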
     Note that the division expression (3.5) depends on the choice of a monomial ordering,
i.e. a total ordering on the set $\{x^\alpha \mid \alpha \in \mathbb{N}^d\}$ of monomials that is compatible with
multiplication [37, Definition 3.2]. This ordering is needed to define the “lead term” of a
multivariate polynomial in the division-with-remainder algorithm. In our Singular implementation
we used the graded reverse lexicographic ordering [37, (3.5)], which sorts first
by the degree of the monomial.
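The division step (3.5) can be illustrated with a minimal pure-Python sketch of multivariate division with remainder under the graded reverse lexicographic ordering. It assumes polynomials are stored as dictionaries from exponent tuples to rational coefficients (a representation chosen for this sketch, not the one used in Singular) and divides by the given list of divisors in order. The sketch does not compute a Gröbner basis, so the remainder is only guaranteed to be reduced in the above sense when the divisors form one; this holds, for instance, for $\{q_3, q_4, q_5\}$ in the example of Section 3.5, whose leading terms $x_2, x_3, x_1$ are pairwise coprime.

```python
from fractions import Fraction

# A polynomial in x1, ..., xd is a dict {exponent tuple: coefficient}; zero terms are not stored.


def grevlex_key(alpha):
    # Graded reverse lexicographic order: total degree first; on a tie, the monomial
    # with the smaller exponent in the last differing variable is the larger one.
    return (sum(alpha), tuple(-a for a in reversed(alpha)))


def lead(p):
    """Leading exponent and coefficient of a nonzero polynomial."""
    alpha = max(p, key=grevlex_key)
    return alpha, p[alpha]


def add_term(p, alpha, c):
    """p += c * x^alpha, dropping the term if it cancels."""
    c = p.get(alpha, 0) + c
    if c == 0:
        p.pop(alpha, None)
    else:
        p[alpha] = c


def divide(f, divisors):
    """Return (quotients, r) with f = r + sum_k quotients[k] * divisors[k], cf. (3.5)."""
    p = {m: Fraction(c) for m, c in f.items() if c != 0}
    qs = [{m: Fraction(c) for m, c in q.items() if c != 0} for q in divisors]
    quotients = [{} for _ in qs]
    r = {}
    while p:
        alpha, c = lead(p)
        for a_k, q in zip(quotients, qs):
            beta, d = lead(q)
            if all(b <= a for a, b in zip(alpha, beta)):   # LT(q) divides LT(p)
                gamma = tuple(a - b for a, b in zip(alpha, beta))
                coeff = c / d
                add_term(a_k, gamma, coeff)
                for mono, qc in q.items():                # p -= coeff * x^gamma * q
                    add_term(p, tuple(g + m for g, m in zip(gamma, mono)), -coeff * qc)
                break
        else:
            add_term(r, alpha, c)                         # no lead term divides LT(p)
            del p[alpha]
    return quotients, r


# Example of Section 3.5, variables ordered x1 > x2 > x3.
one, x1, x2, x3 = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)
q1, q3, q4, q5 = {x1: 1}, {x2: 1, one: 1}, {x3: 1}, {x1: 1, x3: -1}
quotients, r = divide({x1: -1, x2: -1}, [q3, q4, q5])
# quotients == [{one: -1}, {one: -1}, {one: -1}] and r == {one: 1}
```

With the variable order $x_1 > x_2 > x_3$, the call reproduces the division expression $-x_1 - x_2 = (-1) \cdot q_3 + (-1) \cdot q_4 + (-1) \cdot q_5 + 1$ used in Section 3.5.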

3.4   The resulting algorithm
If we do the Nullstellensatz decomposition, the algebraic dependence decomposition and
the numerator decomposition one after the other, we obtain a sum of rational functions
where each summand is of the form
$$ \frac{f_S}{\prod_{i \in S} q_i^{b_i}} \tag{3.7} $$
where $S \subseteq \{1, \dots, m\}$ is some set of indices, $b_i \in \mathbb{N}$, $f_S \in K[x]$, and, by the above,

 (1) the polynomials $\{q_i \mid i \in S\}$ have a common zero in $\overline{K}^d$,

 (2) the polynomials $\{q_i \mid i \in S\}$ are algebraically independent,

 (3) $f_S$ is reduced with respect to the ideal $\langle q_i \mid i \in S \rangle \subseteq K[x]$.
(This is Theorem 3.5 in [37].)
        Since, in practice, the computation of annihilating polynomials can become very slow
if the degrees of the polynomials $q_i^{e_i}$ get too large, we make the following two modifications
to the algorithm. First, before the algebraic dependence decomposition, we insert a short
version of the numerator decomposition step described in Section 3.3, which only decomposes
further if the remainder $r$ in (3.5) is zero. This eliminates some of the denominator factors
before entering the more complicated algebraic dependence decomposition (see also Remark 1.2
and Algorithm 4 in [37]). Second, the algebraic dependence decomposition
step itself can be changed to use an annihilating polynomial for $q_1, \dots, q_m$ rather than
for $q_1^{e_1}, \dots, q_m^{e_m}$. Instead of (3.4), we then get

$$ \frac{f}{g} = -\sum_{\beta \neq \alpha} \frac{c_\beta}{c_\alpha}\, f \prod_{i=1}^{m} \frac{q_i^{\beta_i}}{q_i^{\alpha_i + e_i}} \tag{3.8} $$

Now the number of different irreducible denominator factors does not have to decrease
in every step, since it is possible that $\beta_i < \alpha_i + e_i$ for all $i$. However, if in (3.3) we
always choose $\alpha$ minimal with respect to the graded reverse lexicographic ordering, it can
be shown that the algorithm still terminates (see Remark 1.3 in [37]).
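A single application of (3.8) can be sketched in Python as follows, assuming the annihilating polynomial is already known and given as a dictionary $\{\beta: c_\beta\}$ (computing it, e.g. by elimination, is not shown). Each output pair $(c, \gamma)$ stands for the term $c \cdot f \cdot \prod_i q_i^{\gamma_i}$, so negative entries of $\gamma$ are denominator powers, and $\alpha$ is chosen minimal in the graded reverse lexicographic ordering as in the text. The function name and data layout are our own choices for illustration.

```python
from fractions import Fraction


def grevlex_key(alpha):
    # Graded reverse lexicographic order: total degree first, ties broken by
    # preferring the smaller exponent in the last differing variable.
    return (sum(alpha), tuple(-a for a in reversed(alpha)))


def algebraic_dependence_step(annihilator, e):
    """Apply (3.8) once.

    annihilator: {beta: c_beta} such that sum_beta c_beta * prod_i q_i**beta_i == 0
    e:           denominator exponents (e_1, ..., e_m) of f/g
    Returns [(coefficient, gamma)] with f/g = sum coefficient * f * prod_i q_i**gamma_i.
    """
    alpha = min(annihilator, key=grevlex_key)    # grevlex-minimal exponent
    c_alpha = Fraction(annihilator[alpha])
    terms = []
    for beta, c_beta in annihilator.items():
        if beta == alpha:
            continue                              # the alpha term has been solved for
        gamma = tuple(b - a - ei for b, a, ei in zip(beta, alpha, e))
        terms.append((-Fraction(c_beta) / c_alpha, gamma))
    return terms


# Factors (q1, q4, q5) of the example in Section 3.5, which satisfy q1 - q4 - q5 == 0.
annihilator = {(1, 0, 0): 1, (0, 1, 0): -1, (0, 0, 1): -1}
terms = algebraic_dependence_step(annihilator, e=(1, 1, 1))
# terms == [(Fraction(1), (0, -1, -2)), (Fraction(-1), (-1, 0, -2))]
```

For the second term $1/(q_1 q_4 q_5)$ of (3.11), the grevlex-minimal monomial of $y_1 - y_4 - y_5$ is $y_5$, and the two output terms are $1/(q_4 q_5^2)$ and $-1/(q_1 q_5^2)$, matching the corresponding terms of (3.12).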
    In our implementation of the final algorithm [37, Algorithm 5], we make use of the
computer algebra system Singular, which provides efficient algorithms for Gröbner basis
computations as well as polynomial factorization and division with remainder.

3.5   A simple example
To demonstrate the algorithm described in Section 3.4, consider the rational function
$$ \frac{f}{g} = \frac{x_1 + x_2}{x_1 x_2 (x_2 + 1) x_3 (x_1 - x_3)} \in \mathbb{R}(x_1, x_2, x_3) \tag{3.9} $$
with $m = 5$ denominator factors $q_1 = x_1$, $q_2 = x_2$, $q_3 = x_2 + 1$, $q_4 = x_3$, $q_5 = x_1 - x_3$.
    In the first step (Nullstellensatz decomposition), we observe that $q_2$ and $q_3$ have no
common zeros and find the relation $1 = 1 \cdot q_3 + (-1) \cdot q_2$. Multiplying with $f/g$ yields
$$ \frac{f}{g} = \frac{x_1 + x_2}{q_1 q_2 q_4 q_5} + \frac{-x_1 - x_2}{q_1 q_3 q_4 q_5}. \tag{3.10} $$

Now q1 , q2 , q4 , q5 have the common zero x1 = x2 = x3 = 0 and also q1 , q3 , q4 , q5 have a
common zero, namely x1 = x3 = 0, x2 = −1. So condition (1) is fulfilled.
    In the short numerator decomposition described in Section 3.4, we see that only the first
numerator $x_1 + x_2$ has remainder 0 when divided by the denominator factors $q_1, q_2, q_4, q_5$:
$x_1 + x_2 = 1 \cdot q_1 + 1 \cdot q_2 + 0$. Thus, in the first term, we can cancel $q_1$ and $q_2$, respectively:
$$ \frac{f}{g} = \frac{1}{q_2 q_4 q_5} + \frac{1}{q_1 q_4 q_5} + \frac{-x_1 - x_2}{q_1 q_3 q_4 q_5}. \tag{3.11} $$

    Next, in the algebraic dependence decomposition, the factors in the second and third
denominators of (3.11) are found to be algebraically dependent, since $0 = q_1 - q_4 - q_5$. Thus
$1 = \frac{q_1}{q_5} + \frac{-q_4}{q_5}$, and multiplying the second and third terms by this relation gives the decomposition
$$ \frac{f}{g} = \frac{1}{q_2 q_4 q_5} + \frac{1}{q_4 q_5^2} + \frac{-1}{q_1 q_5^2} + \frac{-x_1 - x_2}{q_3 q_4 q_5^2} + \frac{x_1 + x_2}{q_1 q_3 q_5^2} \tag{3.12} $$

in which all denominators consist of algebraically independent factors, so (2) is fulfilled.
     Finally, in the numerator decomposition, the first three numerators are already reduced
with respect to the denominator factors, since they are just $\pm 1$. For the fourth numerator,
we get the division expression $-x_1 - x_2 = (-1) \cdot q_3 + (-1) \cdot q_4 + (-1) \cdot q_5 + 1$, and for the
fifth numerator $x_1 + x_2 = 1 \cdot q_1 + 1 \cdot q_3 + (-1)$. Substituting these into (3.12) yields
$$
\begin{aligned}
\frac{f}{g} &= \frac{1}{q_2 q_4 q_5} + \frac{1}{q_4 q_5^2} + \frac{-1}{q_1 q_5^2} + \frac{-1}{q_4 q_5^2} + \frac{-1}{q_3 q_5^2} + \frac{-1}{q_3 q_4 q_5} + \frac{1}{q_3 q_4 q_5^2} + \frac{1}{q_3 q_5^2} + \frac{1}{q_1 q_5^2} + \frac{-1}{q_1 q_3 q_5^2} \\
&= \frac{1}{q_2 q_4 q_5} + \frac{-1}{q_3 q_4 q_5} + \frac{1}{q_3 q_4 q_5^2} + \frac{-1}{q_1 q_3 q_5^2}
\end{aligned}
\tag{3.13}
$$

which now also satisfies condition (3). Note that for this simple example the partial fraction
decomposition (3.13) does not seem “shorter” than the original fraction $f/g$. However, as
described in Section 5.4, for the largest entries of the IBP matrix, this algorithm can
reduce the size of rational functions by factors of more than 100.
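Because every step of the algorithm is an exact identity, each intermediate result of this example must agree with $f/g$ wherever it is defined. The following Python sketch (our own check, not part of the PFD library) evaluates (3.9) through (3.13) at random rational points with exact `Fraction` arithmetic and confirms that all five expressions coincide.

```python
import random
from fractions import Fraction


def stages(x1, x2, x3):
    """Evaluate (3.9) to (3.13) of the example at one point."""
    q1, q2, q3, q4, q5 = x1, x2, x2 + 1, x3, x1 - x3
    f = x1 + x2
    return [
        f / (q1 * q2 * q3 * q4 * q5),                                          # (3.9)
        f / (q1 * q2 * q4 * q5) - f / (q1 * q3 * q4 * q5),                    # (3.10)
        1 / (q2 * q4 * q5) + 1 / (q1 * q4 * q5) - f / (q1 * q3 * q4 * q5),    # (3.11)
        1 / (q2 * q4 * q5) + 1 / (q4 * q5**2) - 1 / (q1 * q5**2)
        - f / (q3 * q4 * q5**2) + f / (q1 * q3 * q5**2),                      # (3.12)
        1 / (q2 * q4 * q5) - 1 / (q3 * q4 * q5)
        + 1 / (q3 * q4 * q5**2) - 1 / (q1 * q3 * q5**2),                      # (3.13)
    ]


def check(trials=200, seed=1):
    """Compare all stages exactly at random rational points away from the poles."""
    rng = random.Random(seed)
    checked = 0
    while checked < trials:
        x1, x2, x3 = (Fraction(rng.randint(-30, 30), rng.randint(1, 30)) for _ in range(3))
        if 0 in (x1, x2, x2 + 1, x3, x1 - x3):
            continue                                # f/g is not defined here
        values = stages(x1, x2, x3)
        if any(v != values[0] for v in values):
            return False
        checked += 1
    return True


print(check())  # True
```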

3.6   Parallelization
To reduce the runtime, it is of course possible to run the partial fractioning algorithm for
all entries of the IBP matrix in parallel. However, it is often the case that a small number
of large entries dominates and alone determines the total runtime. Therefore, it makes sense
to also parallelize the PFD algorithm itself, at least for the most difficult, i.e. the largest,
entries.
     In the implementation of the algorithm described above we work with a list D of terms
representing a sum and start with D containing only the input rational function as single
entry. Then each step of the algorithm consists of decomposing all terms in D individually
into a sum/list of terms itself (as in equations (3.2), (3.6) and (3.8)) and then replacing D
by the concatenation of these lists as well as merging terms that have the same denominator
(see lines 4, 7, 10, 13 in [37, Algorithm 5]). The first part, the decomposing of each term,
can be done in parallel over all elements of D.
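The step structure just described (decompose each term of $D$ independently, concatenate the results, merge equal denominators) can be sketched in Python as below. The sketch makes simplifying assumptions of our own: a term is a pair (denominator key, numerator), the decomposition routine is passed in as a function, and a thread pool stands in for the multi-process computation used in Singular (for CPU-bound work in Python, a process pool would be the natural choice). The toy decomposition `split` is hypothetical; it applies the Nullstellensatz relation $1 = (x+1) - x$ to terms $c/(x^a (x+1)^b)$, encoded by the key $(a, b)$.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_step(D, decompose, workers=4):
    """One step on the list D of (denominator, numerator) terms."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pieces = list(pool.map(decompose, D))       # decompose every term independently
    merged = {}
    for piece in pieces:                            # concatenate the lists ...
        for den, num in piece:
            merged[den] = merged.get(den, 0) + num  # ... and merge equal denominators
    return [(den, num) for den, num in merged.items() if num != 0]


def split(term):
    """Hypothetical toy step: c/(x^a (x+1)^b) times 1 = (x+1) - x."""
    (a, b), c = term
    if a > 0 and b > 0:
        return [((a, b - 1), c), ((a - 1, b), -c)]
    return [term]


D = [((2, 2), 1)]                                   # 1 / (x^2 (x+1)^2)
while any(a > 0 and b > 0 for (a, b), _ in D):
    D = parallel_step(D, split)
# dict(D) == {(2, 0): 1, (1, 0): -2, (0, 1): 2, (0, 2): 1}
```

Starting from $1/(x^2 (x+1)^2)$, three steps give $1/x^2 - 2/x + 2/(x+1) + 1/(x+1)^2$; the merging in the second step combines two copies of the denominator $x(x+1)$.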

In our current implementation in the Singular library pfd.lib, this parallel version
of the partial fractioning algorithm is realized relying on multi-process computation in
Singular. In practice, this internal parallelization can greatly reduce the runtime of the
partial fractioning algorithm, especially for large entries of the IBP-matrix, see Section 5.4.

4       Parallel computing using the Singular/GPI-Space framework

The implementation of our RREF algorithm and the partial fraction decomposition algorithm
is part of the Singular/GPI-Space framework project [40] for massively parallel
computations in computer algebra.² This framework combines the open-source computer
algebra system Singular [52] with the workflow management system GPI-Space [53],
developed at the Fraunhofer Institute ITWM, Kaiserslautern. It originates in an effort to
realize massively parallel computations in algebraic and tropical geometry [40, 54–57].
     In the present section, we discuss the application of our approach to the partial fraction
decomposition; our code is available at https://github.com/singular-gpispace/PFD. The
application to the RREF problem has been addressed in [41].

4.1       GPI-Space and Singular
GPI-Space is a workflow management system well suited for running workflows on clusters
and consists of three main components:

        - a distributed runtime system,

        - a virtual memory layer, and

        - a workflow manager.

    The distributed runtime system manages resources (such as assigning work to various
nodes in a cluster), the virtual memory layer allows processes to communicate and share
data, and the workflow manager tracks the global structure and state of a workflow.
    The user specifies a program using Petri nets [58], which we define as follows:

Definition 4.1. A Petri net is a bipartite directed graph N = (P, T, F ), where

    1. P ∩ T = ∅

    2. F ⊆ (P × T ) ∪ (T × P )

P is the set of places, T is the set of transitions and F is the flow relation. If (p, t) ∈ F,
then we say p is an input of t, and if (t, p) ∈ F, we say p is an output of t.

     A Petri net is associated with a map, called a marking, which is used to determine if
a transition is enabled.

Definition 4.2. A marking is a map M : P → N≥0 . A transition t is called enabled if, for
all p such that (p, t) ∈ F , M (p) > 0.
    ² https://www.mathematik.uni-kl.de/~boehm/singulargpispace/

A marking may be thought of as a function that counts the number of tokens on a given
place. If a Petri net equipped with a marking has an enabled transition, then that transition
may be fired. This is a process whereby the given marking $M$ is mapped to a new marking
$M'$, defined pointwise for all $p \in P$ as
$$ M'(p) := M(p) - |\{(p,t)\} \cap F| + |\{(t,p)\} \cap F|. $$

Firing a transition can be thought of as that transition consuming a token from each input
and placing a new token on each output.
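Definitions 4.1 and 4.2 and the firing rule translate directly into code. The Python sketch below (class and method names are our own, and it models plain Petri nets only, without GPI-Space's data-carrying tokens or conditions) checks the bipartite structure, tests whether a transition is enabled, and fires it. In the usage example, Options is both an input and an output of Compute, so firing leaves its token count unchanged; this is one way to model a transition that only reads a place, as Compute does in the net of Section 4.2.

```python
class PetriNet:
    """Plain Petri net N = (P, T, F) as in Definition 4.1."""

    def __init__(self, places, transitions, flow):
        self.P, self.T, self.F = set(places), set(transitions), set(flow)
        if self.P & self.T:                          # condition 1: P and T are disjoint
            raise ValueError("places and transitions must be disjoint")
        for a, b in self.F:                          # condition 2: arcs go between P and T
            if not ((a in self.P and b in self.T) or (a in self.T and b in self.P)):
                raise ValueError(f"invalid arc {(a, b)}")

    def enabled(self, t, M):
        # Definition 4.2: every input place p of t has M(p) > 0.
        return all(M[p] > 0 for p in self.P if (p, t) in self.F)

    def fire(self, t, M):
        # M'(p) = M(p) - |{(p, t)} ∩ F| + |{(t, p)} ∩ F|
        if not self.enabled(t, M):
            raise ValueError(f"transition {t} is not enabled")
        return {p: M[p] - int((p, t) in self.F) + int((t, p) in self.F) for p in self.P}


# Simplified fragment of Figure 1: Compute consumes a task, reads Options, adds a result.
net = PetriNet(
    places={"Tasks", "Options", "Results"},
    transitions={"Compute"},
    flow={("Tasks", "Compute"), ("Options", "Compute"),
          ("Compute", "Options"), ("Compute", "Results")},
)
M = {"Tasks": 3, "Options": 1, "Results": 0}
while net.enabled("Compute", M):
    M = net.fire("Compute", M)
# M == {"Tasks": 0, "Options": 1, "Results": 3}
```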
     In its implementation, GPI-Space expands the concept of a Petri net in the sense
that tokens can be complex data structures. Transitions run code on that data and the
result usually determines the data carried by the tokens placed on the output places of the
respective transition. GPI-Space also allows the user to put conditionals on transitions:
A transition can inspect the contents of an input token without consuming it (yet) and
only fire provided some conditions are met.
     The firing of a transition should be inherently local, and the firing of one transition
should not block the firing of other enabled transitions (where such blocking is desired, it
can be realized using the aforementioned conditionals).
     Our Singular library that implements the partial fraction decomposition is executed
by the C-library version of Singular and is then wrapped into a GPI-Space Petri net
that processes a list of input tokens in parallel, with GPI-Space in turn being configured
and called from the console version of Singular for transparent and convenient user
interaction.

4.2   Description of the Petri net
The partial fraction decomposition function in the PFD Singular library can be applied
to the various rational function entries of a matrix using the Petri net depicted in Figure
1. This is just an example of a Petri net realizing a parallel task using efficient workflow
management and workload balancing via GPI-Space. Note that transitions can handle
several tokens in parallel, provided there are sufficiently many tokens available on the input
places.
     In the following we briefly discuss the Petri net by describing the use of the tokens on
each place and what computation each transition refers to:
     The place I holds the input token provided by the client program, which is used to
initialize the various other places. In particular, a list of identifiers for the various matrix
entries to be computed is inserted here. The place Options holds a token, which provides
information needed by the computation to be performed, such as the name of the Singular
function to be called. The path to the library where it is implemented is stored and kept
here. The tokens on Tasks each represent a matrix entry to be computed. Tokens on the
place Results represent finished computations. Placing a token on the place O signals to
GPI-Space that the Petri net has finished. The information on the token that is placed
here can be collected by Singular to determine whether the Petri net exited without errors.
     The transition Init consumes the input token provided by the client program and
initializes the values on Options. Furthermore, the list of references to entries to be computed
is converted into task tokens that are placed on Tasks, and the total number of tasks to be
computed is stored on Count. The transition Compute consumes a token from Tasks and
reads the values at Options. This information is then used to call the partial fraction
decomposition function for the corresponding matrix entry. Note that for our application we
simply read input and write output directly via files. Once the computation is finished, a
token is placed on Results.

Figure 1. A basic Petri net for parallel computing (places I, Tasks, Options, Count, Results and O;
transitions Init, Compute and Finish, where Finish fires if |Results| = Count).
     Once the number of tokens on Results equals the number stored at Count, the transition
Finish is triggered and places an output token on O, to be collected by the client program.
Our Petri net simply places a message in the form of a boolean to signal that the
Petri net exited without any errors.
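To illustrate the token flow of Figure 1, the following Python sketch simulates the net in a single process. It is not GPI-Space code: the place names follow Figure 1, but `run_workflow`, the dictionary of places and the stand-in `compute` function are hypothetical, and a thread pool plays the role of Compute firing on several task tokens in parallel. The guard of Finish is the condition |Results| = Count.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(entry_ids, compute, workers=4):
    """Single-process simulation of the token flow in Figure 1 (hypothetical sketch)."""
    places = {"I": [list(entry_ids)], "Options": [], "Tasks": [],
              "Count": [], "Results": [], "O": []}

    # Init: consume the input token and initialise Options, Tasks and Count.
    entries = places["I"].pop()
    places["Options"].append({"function": compute})
    places["Tasks"].extend(entries)
    places["Count"].append(len(entries))

    # Compute: fires once per task token; the Options token is read, not consumed.
    options = places["Options"][0]
    tasks, places["Tasks"] = places["Tasks"], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(options["function"], tasks):
            places["Results"].append(result)

    # Finish: guarded by |Results| == Count; places a boolean "no errors" token on O.
    if len(places["Results"]) == places["Count"][0]:
        places["O"].append(True)
    return places


places = run_workflow(range(16), lambda entry: entry * entry)  # stand-in computation
# places["O"] == [True] and len(places["Results"]) == 16
```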

5     An example for partial fraction decomposed IBP coefficients

In this section, we demonstrate our recent developments with an example: the IBP reduction
of numerator-degree-5 nonplanar massless two-loop five-point integrals and the simplification
of the resulting coefficients.

5.1    Diagram and target integrals

The nonplanar two-loop five-point diagram is shown in Figure 2.

Figure 2. Two-loop five-point nonplanar “double pentagon” diagram (external legs labelled 1 to 5).

     All external and internal lines are massless. The kinematic conditions are $2p_1 \cdot p_2 = s_{12}$,
$2p_2 \cdot p_3 = s_{23}$, $2p_3 \cdot p_4 = s_{34}$, $2p_4 \cdot p_5 = s_{45}$ and $2p_1 \cdot p_5 = s_{15}$. The propagators are
$$
\begin{aligned}
&D_1 = l_1^2, \quad D_2 = (l_1 - p_1)^2, \quad D_3 = (l_1 - p_{12})^2, \quad D_4 = l_2^2,\\
&D_5 = (l_2 - p_{123})^2, \quad D_6 = (l_2 - p_{1234})^2, \quad D_7 = (l_1 - l_2)^2,\\
&D_8 = (l_1 - l_2 + p_3)^2, \quad D_9 = (l_1 - p_{1234})^2, \quad D_{10} = (l_2 - p_1)^2,\\
&D_{11} = (l_2 - p_{12})^2,
\end{aligned}
\tag{5.1}
$$
where $p_{i \dots j} = \sum_{k=i}^{j} p_k$. For this diagram, a UT basis and its symbol form were given in
refs. [59, 60], and refs. [60, 61] provided the analytic expressions for the master integrals. In
ref. [41], the analytic IBP reduction coefficients for the integrals with irreducible scalar product
(ISP) degree up to 4 in the sector (1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0) were calculated with respect to both
a Laporta basis and a UT basis. In ref. [37], using the improved Leinartas algorithm introduced
in Section 3, the size of the IBP coefficients with respect to the UT basis was decreased from
700 MB to 19 MB. Recently, integrals of this double-pentagon diagram with ISP degree 5
have been reduced to a well-chosen Laporta basis in ref. [15], using equation systems given
in refs. [20–22]. In the basis choice of ref. [15], the analytic reduction coefficients have a
size of about 25 GB.

    In this section, we reduce the target integrals with ISP degree up to 5 to a UT
basis and show that the coefficients can be simplified considerably further. The target integrals
are

$$
\begin{aligned}
&I_{1,1,1,1,1,1,1,1,0,0,-5},\ I_{1,1,1,1,1,1,1,1,0,-1,-4},\ I_{1,1,1,1,1,1,1,1,0,-2,-3},\ I_{1,1,1,1,1,1,1,1,0,-3,-2},\\
&I_{1,1,1,1,1,1,1,1,0,-4,-1},\ I_{1,1,1,1,1,1,1,1,0,-5,0},\ I_{1,1,1,1,1,1,1,1,-1,0,-4},\ I_{1,1,1,1,1,1,1,1,-1,-1,-3},\\
&I_{1,1,1,1,1,1,1,1,-1,-2,-2},\ I_{1,1,1,1,1,1,1,1,-1,-3,-1},\ I_{1,1,1,1,1,1,1,1,-1,-4,0},\ I_{1,1,1,1,1,1,1,1,-2,0,-3},\\
&I_{1,1,1,1,1,1,1,1,-2,-1,-2},\ I_{1,1,1,1,1,1,1,1,-2,-2,-1},\ I_{1,1,1,1,1,1,1,1,-2,-3,0},\ I_{1,1,1,1,1,1,1,1,-3,0,-2},\\
&I_{1,1,1,1,1,1,1,1,-3,-1,-1},\ I_{1,1,1,1,1,1,1,1,-3,-2,0},\ I_{1,1,1,1,1,1,1,1,-4,0,-1},\ I_{1,1,1,1,1,1,1,1,-4,-1,0},\\
&I_{1,1,1,1,1,1,1,1,-5,0,0},\ I_{1,1,1,1,1,1,1,1,0,0,-4},\ I_{1,1,1,1,1,1,1,1,0,-1,-3},\ I_{1,1,1,1,1,1,1,1,0,-2,-2},\\
&I_{1,1,1,1,1,1,1,1,0,-3,-1},\ I_{1,1,1,1,1,1,1,1,0,-4,0},\ I_{1,1,1,1,1,1,1,1,-1,0,-3},\ I_{1,1,1,1,1,1,1,1,-1,-1,-2},\\
&I_{1,1,1,1,1,1,1,1,-1,-2,-1},\ I_{1,1,1,1,1,1,1,1,-1,-3,0},\ I_{1,1,1,1,1,1,1,1,-2,0,-2},\ I_{1,1,1,1,1,1,1,1,-2,-1,-1},\\
&I_{1,1,1,1,1,1,1,1,-2,-2,0},\ I_{1,1,1,1,1,1,1,1,-3,0,-1},\ I_{1,1,1,1,1,1,1,1,-3,-1,0},\ I_{1,1,1,1,1,1,1,1,-4,0,0},\\
&I_{1,1,1,1,1,1,1,1,0,0,-3},\ I_{1,1,1,1,1,1,1,1,0,-1,-2},\ I_{1,1,1,1,1,1,1,1,0,-2,-1},\ I_{1,1,1,1,1,1,1,1,0,-3,0},\\
&I_{1,1,1,1,1,1,1,1,-1,0,-2},\ I_{1,1,1,1,1,1,1,1,-1,-1,-1},\ I_{1,1,1,1,1,1,1,1,-1,-2,0},\ I_{1,1,1,1,1,1,1,1,-2,0,-1},\\
&I_{1,1,1,1,1,1,1,1,-2,-1,0},\ I_{1,1,1,1,1,1,1,1,-3,0,0},\ I_{1,1,1,1,1,1,1,1,0,0,-2}.
\end{aligned}
\tag{5.2}
$$

The first 21 target integrals have ISP degree 5, and the others have ISP degree
4 or lower.
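The list (5.2) has a simple structure: with the first eight indices equal to 1, the last three indices $(a_9, a_{10}, a_{11})$ run over all non-positive triples with $-(a_9 + a_{10} + a_{11})$ equal to 5, then 4, then 3, followed by the single degree-2 integral $I_{1,1,1,1,1,1,1,1,0,0,-2}$. The short Python sketch below (our own reconstruction of the ordering, checked against (5.2)) regenerates the 47 index vectors, 21 of which have ISP degree 5.

```python
def isp_exponents(degree):
    """All (a9, a10, a11) with a_i <= 0 and a9 + a10 + a11 == -degree, in the order of (5.2)."""
    return [(-i, -j, -(degree - i - j))
            for i in range(degree + 1)
            for j in range(degree - i + 1)]


def target_indices():
    propagators = (1,) * 8                      # D1, ..., D8 with power 1
    isp = isp_exponents(5) + isp_exponents(4) + isp_exponents(3) + [(0, 0, -2)]
    return [propagators + e for e in isp]


targets = target_indices()
print(len(targets), len(isp_exponents(5)))      # 47 21
```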

5.2    Generating short IBP relations by module intersection
Using the syzygy and module intersection method introduced in Section 2, we generate the IBP
relations needed to reduce the target integrals in (5.2). As shown in Table 1, the resulting
IBP system has a total size of only about 17 MB, which is much smaller than the systems
generated by the traditional Laporta algorithm. This IBP generation is done within one
hour on a single core of a laptop. See also ref. [22] for generating a simple integral relation
system for the same integrals with the auxiliary mass flow method.

                               Cut           # relations        # integrals          size
                              {1,5,7}           2723               2749            1.4 MB
                              {1,5,8}           2753               2777            1.6 MB
                              {1,6,8}           2817               2822            2.1 MB
                              {2,4,8}           2918               2921            2.1 MB
                              {2,5,7}           2796               2805            1.5 MB
                              {2,6,7}           2769               2814            1.2 MB
                              {2,6,8}           2801               2821            1.6 MB
                              {3,4,7}           2742               2771            1.4 MB
                              {3,4,8}           2824               2849            1.9 MB
                              {3,6,7}           2662               2674            1.5 MB
                             {1,3,4,5}          1600               1650            0.72 MB

Table 1. The IBP system for the “double-pentagon” integrals with numerator degree up to
5, generated on the spanning cuts with the module intersection method
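As a quick consistency check of the 17 MB quoted in the text, the following Python snippet sums the columns of Table 1 (values transcribed from the table). The integral counts are deliberately not summed, since integrals, in particular the target integrals, appear on several cuts.

```python
# (cut, number of relations, number of integrals, size in MB) transcribed from Table 1
table1 = [
    ((1, 5, 7), 2723, 2749, 1.4), ((1, 5, 8), 2753, 2777, 1.6),
    ((1, 6, 8), 2817, 2822, 2.1), ((2, 4, 8), 2918, 2921, 2.1),
    ((2, 5, 7), 2796, 2805, 1.5), ((2, 6, 7), 2769, 2814, 1.2),
    ((2, 6, 8), 2801, 2821, 1.6), ((3, 4, 7), 2742, 2771, 1.4),
    ((3, 4, 8), 2824, 2849, 1.9), ((3, 6, 7), 2662, 2674, 1.5),
    ((1, 3, 4, 5), 1600, 1650, 0.72),
]
total_relations = sum(rels for _, rels, _, _ in table1)
total_size_mb = sum(size for _, _, _, size in table1)
print(total_relations, round(total_size_mb, 2))   # 29405 17.02
```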

5.3    RREF and Interpolation of analytic IBP coefficients

In this subsection, we derive the analytic IBP coefficients from numerical results. In this
case, the IBP coefficients are rational functions of $s_{12}, s_{23}, s_{34}, s_{45}, s_{15}$, $\epsilon$ and $\epsilon_5$. Interpolating
rational functions is considerably harder than interpolating polynomials; here we can use our prior
knowledge about the denominators and obtain the analytic rational functions by determining the
denominators and interpolating the numerators separately.
      As stated in Section 2, if a UT basis is chosen as the master integrals, the denominator
factors contain only even symbol letters or simple polynomials in $\epsilon$. (The UT basis of this
integral family is given in refs. [59, 60].) For this example, only one of the factors, the Gram
determinant $\epsilon_5^2$ (which appears since $\epsilon_5$ is a symbol letter), is quadratic; all others are
linear in either $\{s_{12}, s_{23}, s_{34}, s_{45}, s_{15}\}$ or $\epsilon$. The complete list of factors is
$$
\begin{aligned}
&\epsilon - 1,\ 2\epsilon - 1,\ 3\epsilon - 1,\ 4\epsilon - 1,\ 4\epsilon + 1,\ s_{12},\ s_{15},\ s_{15} - s_{23},\\
&s_{23},\ s_{12} + s_{23},\ s_{12} - s_{34},\ s_{12} + s_{15} - s_{34},\ s_{15} - s_{23} - s_{34},\ s_{34},\\
&s_{23} + s_{34},\ s_{12} - s_{45},\ s_{23} - s_{45},\ s_{12} + s_{23} - s_{45},\ s_{12} - s_{15} + s_{23} - s_{45},\\
&s_{12} - s_{34} - s_{45},\ s_{12} + s_{15} - s_{34} - s_{45},\ s_{45},\ s_{15} - s_{23} + s_{45},\ s_{34} + s_{45},\\
&s_{15}^2 s_{12}^2 + s_{23}^2 s_{12}^2 - 2 s_{15} s_{23} s_{12}^2 - 2 s_{23}^2 s_{34} s_{12} + 2 s_{15} s_{23} s_{34} s_{12} - 2 s_{15}^2 s_{45} s_{12}\\
&\quad + 2 s_{15} s_{23} s_{45} s_{12} + 2 s_{15} s_{34} s_{45} s_{12} + 2 s_{23} s_{34} s_{45} s_{12} + s_{23}^2 s_{34}^2 + s_{15}^2 s_{45}^2 + s_{34}^2 s_{45}^2\\
&\quad - 2 s_{15} s_{34} s_{45}^2 - 2 s_{23} s_{34}^2 s_{45} + 2 s_{15} s_{23} s_{34} s_{45}.
\end{aligned}
\tag{5.3}
$$

In order to determine the denominator of each IBP reduction coefficient, we only need
to find the power of each factor listed in (5.3). First, we derive the powers of the denominator
factors that depend only on $\epsilon$ by performing the IBP reduction with all $s_{ij}$ set to numerical
values and $\epsilon$ kept analytic. Second, we perform the IBP reduction with $\epsilon$ set to a
numerical value and
$$ s_{ij} = a_{ij} x + b_{ij}, \tag{5.4} $$
where $x$ is an auxiliary variable and $a_{ij}$ and $b_{ij}$ are arbitrary numbers chosen such that
all even symbol letters become distinct and irreducible polynomials in $x$. After this
semi-numerical reduction, we can restore the powers of the denominator factors by factorizing the
denominator expressions in $x$. Thus, we completely determine the denominators of all
reduction coefficients.
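The text restores the powers by factorizing the semi-numerical denominators in $x$. Since the candidate factors (5.3) are known in advance, a lighter alternative, shown here as our own substitute rather than the procedure actually used, is to count how many times each substituted letter divides the denominator exactly. The Python sketch below does this with exact univariate polynomial arithmetic; the numbers in the substitution are hypothetical, and the sketch relies on the substituted letters being distinct irreducible polynomials, as required after (5.4), without checking it.

```python
from fractions import Fraction


def trim(p):
    """Remove zero leading coefficients; lists store the lowest degree first."""
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += Fraction(u) * Fraction(v)
    return trim(out)


def poly_divmod(num, den):
    """Exact division with remainder of univariate polynomials over the rationals."""
    r = trim([Fraction(c) for c in num])
    den = trim([Fraction(c) for c in den])
    q = [Fraction(0)] * max(len(r) - len(den) + 1, 0)
    while len(r) >= len(den):
        shift = len(r) - len(den)
        c = r[-1] / den[-1]
        q[shift] = c
        for i, d in enumerate(den):
            r[shift + i] -= c * d
        trim(r)                                  # the leading term has cancelled
    return trim(q), r


def multiplicity(denominator, factor):
    """Largest k such that factor**k divides denominator (nonconstant factor)."""
    if len(trim([Fraction(c) for c in factor])) < 2:
        raise ValueError("factor must be nonconstant")
    p = trim([Fraction(c) for c in denominator])
    if not p:
        raise ValueError("denominator must be nonzero")
    k = 0
    while True:
        q, r = poly_divmod(p, factor)
        if r:                                    # nonzero remainder: factor does not divide p
            return k
        k, p = k + 1, q


# Hypothetical substitution (5.4): s12 = 2x + 1, s15 = 3x - 1, s23 = x + 5.
s12, s15, s23 = [1, 2], [-1, 3], [5, 1]
den = poly_mul(poly_mul(s12, s12), s15)          # plays the role of a denominator s12^2 * s15
powers = {name: multiplicity(den, f) for name, f in [("s12", s12), ("s15", s15), ("s23", s23)]}
# powers == {"s12": 2, "s15": 1, "s23": 0}
```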
     Once the analytic denominators are known, we can derive the numerators via polynomial
interpolation. By performing RREF with one of the variables kept analytic and the others set
to numerical values, we obtain the highest degree of this variable in the numerators. Once the
highest degrees are fixed for each variable, the analytic numerators can be interpolated from a
finite number of numerical points. In practice, we take $s_{12} = 1$. We then perform the IBP
reduction keeping $s_{45}$ and $\epsilon$ analytic, with the other variables taking 307200 different sets
of numerical values (a number determined by the highest degrees in the remaining variables).
After the interpolation, we obtain the expressions of the IBP coefficients with $s_{12} = 1$. We can
then restore $s_{12}$ in the expressions by dimensional analysis.
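For a single variable, the interpolation step can be sketched with Newton's divided differences over exact rationals: once the RREF run has fixed a degree bound $d$ for a numerator in, say, $s_{45}$, then $d+1$ numerical evaluations determine it. The sketch below is our own; `black_box` is a hypothetical stand-in for evaluating one numerator through a numerical IBP reduction, and the multivariate case (nesting the procedure over the variables, with the number of samples growing accordingly) is not shown.

```python
from fractions import Fraction


def newton_interpolate(xs, ys):
    """Coefficients (lowest degree first) of the polynomial of degree < len(xs)
    through the points (xs[i], ys[i]), via Newton's divided differences."""
    n = len(xs)
    coef = [Fraction(y) for y in ys]
    for j in range(1, n):                       # divided differences, computed in place
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    result = [coef[-1]]                         # expand the Newton form (Horner scheme)
    for k in range(n - 2, -1, -1):
        shifted = [Fraction(0)] + result        # x * result
        for i, c in enumerate(result):
            shifted[i] -= xs[k] * c             # ... - xs[k] * result
        shifted[0] += coef[k]
        result = shifted
    return result


def black_box(s45):
    """Hypothetical stand-in for one numerator evaluated by a numerical IBP reduction."""
    return 3 - 2 * s45 + Fraction(s45**3, 2)


degree_bound = 3                                # would come from the RREF run in s45
xs = list(range(degree_bound + 1))
coeffs = newton_interpolate(xs, [black_box(x) for x in xs])
# coeffs == [3, -2, 0, Fraction(1, 2)]
```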

The interpolation of the numerators takes 3 hours on 210 CPU cores. Restoring $s_{12}$ by
dimensional analysis takes 3 hours on 20 CPU cores. After all steps are done, the analytic
result, a 47 × 108 matrix of IBP coefficients, occupies 19.6 GB of disk space in total.

5.4   Reduction coefficients simplification achieved by pfd program

Our PFD implementation using the Singular/GPI-Space framework can simplify the 19.6
GB of the original reduction coefficients to 186 MB within 72 hours of parallel computation,
utilizing 350 cores and relying only on parallelization over the different IBP coefficients.
While this, of course, provides a significant speedup compared to a sequential computation,
this approach is not as efficient as one would desire: while most of the entries require
less than 3 hours to finish, several IBP coefficients take about 72 hours to decompose
and thus dominate the overall runtime.

      Therefore, we use a second layer of parallelism as described in Section 3.6 to speed
up the decomposition of the individual difficult IBP coefficients. The most complicated
coefficient can then be simplified within 19 hours on 16 cores using multi-process computing
in Singular. A realization of this layer of parallelism via a Petri net can still improve the
efficiency here significantly, since it allows the handling of individual entries to be spread
beyond a single node of the cluster, and thus resources can be utilized far more efficiently
due to better load balancing (the parallel tasks generated by a single entry are, again, not
equally difficult). A corresponding implementation will soon be available.

      Table 2 shows the size of the UT basis coefficients before and after running our
multivariate partial fraction algorithm, together with the resulting compression ratio, for
target integrals of different numerator degrees.

                degree   original size   compressed size    compression ratio
                   2       73.1KB           52.4KB                1.4
                   3       23.6MB           1.69MB                14
                   4       1.19GB           20.7MB                59
                   5       18.4GB           163MB                 115
                 total     19.6GB           186MB                 108

Table 2. Compression ratio of UT basis IBP reduction coefficients of the “double-pentagon”
integral with different numerator degrees
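The compression ratios in Table 2 can be recomputed from the listed sizes. The Python snippet below assumes binary prefixes (1 KB = 1024 bytes, and so on), which is our inference: with these, the ratios agree with the table up to rounding (e.g. 107.9 for the total), whereas decimal prefixes would give only about 105 for the total.

```python
# Sizes from Table 2 in bytes, assuming binary prefixes (an inference, see above).
KB, MB, GB = 1024, 1024**2, 1024**3
table2 = {  # numerator degree: (original size, compressed size)
    2: (73.1 * KB, 52.4 * KB),
    3: (23.6 * MB, 1.69 * MB),
    4: (1.19 * GB, 20.7 * MB),
    5: (18.4 * GB, 163 * MB),
    "total": (19.6 * GB, 186 * MB),
}
ratios = {degree: original / compressed for degree, (original, compressed) in table2.items()}
for degree, ratio in ratios.items():
    print(degree, round(ratio, 1))   # 1.4, 14.0, 58.9, 115.6, 107.9
```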

    The compression ratio trend is shown in Figure 3; clearly, the compression ratio of the
IBP reduction coefficients increases with the complexity of the target
integrals.

Figure 3. Graphic representation of the compression ratio of UT basis IBP reduction coefficients
of the “double-pentagon” integral with different numerator degrees (compression ratio versus
numerator degree, with plotted values 1.4, 14, 58.8 and 115 for degrees 2 to 5).

    The final IBP reduction coefficients, in this example, can be downloaded from

         staff.ustc.edu.cn/~yzhphy/double_pentagon_deg5_IBP.tar.gz

as a 47 × 108 matrix, where each row corresponds to an integral in (5.2) and each column
corresponds to a UT integral in the basis. (The UT basis is also provided at the link
above.)

6   Summary and outlook

In this paper, we presented the most recent developments in generating simple IBP relations
by the module-intersection method, reducing the IBP relations and simplifying the analytic
IBP reduction coefficients dramatically by our multivariate partial fraction algorithm.
     We significantly upgraded the implementation of our multivariate partial fraction
algorithm, powered by the Singular/GPI-Space framework for massively parallel computations
in computer algebra. The parallelization acts both over different target coefficients and
within a single coefficient. With the GPI-Space-powered approach, the parallelization
is efficient and suitable for simplifying very complicated IBP reduction coefficients
(and also rational function coefficients in an analytic amplitude).
     Besides that, we also applied the observation in ref. [37] that, using a UT basis, the
denominators of the IBP reduction coefficients consist of either simple polynomials in $\epsilon$ or
symbol letters. With this observation, we can easily determine the denominators of all IBP
reduction coefficients and thus convert the rational function interpolation into
a simple polynomial interpolation.
     As a frontier example, the numerator-degree-5 massless two-loop five-point “double-
pentagon” integrals are reduced by our method. On a UT basis, the analytic coefficients are
computed with the module-intersection IBP system and polynomial interpolation. These

coefficients have a total size of 19.6 GB. After running our multivariate partial fraction
algorithm, the coefficients are simplified to only 186 MB. The compression ratio is more
than 100, and we see that the compression ratio increases with the size of the coefficients
(and the complexity of the Feynman integral).
     We expect that the Singular/GPI-Space based PFD package will be of great
use, not only for simplifying analytic IBP coefficients, but also for simplifying analytic
multi-loop scattering amplitudes.
     In the future, our main focus will be the automation of the complete process of
module-intersection IBP generation, IBP reduction via interpolation and simplification of
the reduction coefficients by the PFD package. The module intersection computation itself
is automated in the package ModuleIntersection [47]; however, automatic IBP seeding for
syzygy-type methods (including our module-intersection method) has received less attention
in the literature. This is a key point of the automation. We have made progress in this direction,
and a new package with automatic IBP seeding (to generate the minimal number of simple
IBP relations from the module intersection) will be available soon.

Acknowledgments

We acknowledge Taushif Ahmed, Christoph Dlapa, Bo Feng, Alessandro Georgoudis, Matthias
Heller, Johannes Henn, Yanqing Ma, Andreas von Manteuffel and Huaxing Zhu for very
useful discussions. The work of YZ was supported by the NSF of China through Grants
No. 11947301, No. 12047502 and No. 12075234. Funded by the Deutsche Forschungsgemeinschaft
(DFG, German Research Foundation), Project-ID 286237555, TRR 195. The work of JB and MW
was supported by Project B5 of SFB-TRR 195. The work of LR was supported by Project A13
of SFB-TRR 195 and by the Potentialbereich SymbTools (Symbolic Tools in Mathematics and
their Application) of the Forschungsinitiative Rheinland-Pfalz.

A    Instructions on how to use the massively parallel PFD application

To run a massively parallel partial fraction decomposition from the interactive Singular
shell, the user must install Singular, GPI-Space and our PFD application (which comes
with the necessary part of the Singular/GPI-Space framework). The PFD project has
been built successfully on CentOS 7 and 8 and Ubuntu 18.04 LTS and 20.04 LTS. Detailed
installation instructions can be found in the GitHub repository of our application:
                        https://github.com/singular-gpispace/PFD
These instructions are self-contained and describe the installation of all major dependencies.
    Assume that ${example_ROOT} has been set to some directory accessible from all nodes
involved in the computation. Moreover, assume that $PFD_INSTALL_DIR is the directory
where our application was installed and $SINGULAR_INSTALL_DIR the directory where
Singular was installed (as defined in the instructions mentioned above).
    Change to the directory ${example_ROOT}:

cd ${example_ROOT}

    The following should be present in ${example_ROOT}:

   • A nodefile with the machines to use; in the simplest case, the following command
     generates such a file with just the current machine:
      hostname > hostfile

   • Optionally, we start the GPI-Space Monitor (this requires a running X server) to
     display computations in the form of a Gantt diagram. If you do not want to use
     the monitor, do not set the fields options.loghostfile and options.logport of the
     GPI-Space configuration token in Singular (see below). In order to use the
     GPI-Space Monitor, we need a loghostfile with the name of the machine running the monitor.
      hostname > loghostfile

     On this machine, start the monitor, specifying an open TCP port on which the monitor
     will receive information from GPI-Space. The same port has to be specified in
     Singular in the field options.logport.
      ${install_ROOT}/gpispace/bin/gspc-monitor --port 9876

   • The input files for our example, which can be generated by
      mkdir -p ${example_ROOT}/input
      mkdir -p ${example_ROOT}/results
      for r in {1..4}
      do
         for c in {1..4}
         do
            echo "x/(x*(x+1))" > ${example_ROOT}/input/fraction_"$r"_"$c".txt
         done
      done

     These files could, for example, contain the entries of a 4 × 4 matrix of rational
     functions whose partial fraction decompositions are to be computed.

   • Moreover, we need a directory for temporary files used during the calculation, which
     should be accessible from all machines involved in the computation:
      mkdir ${example_ROOT}/temp
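Before starting Singular, it may be worth checking that everything listed above is in place. The following is a minimal sketch, not part of the PFD application: the function name check_pfd_layout is hypothetical, and it assumes a nodefile named nodefile, the subdirectories input, results and temp, and input files named fraction_<r>_<c>.txt inside input for an n × n matrix.

```shell
# Hypothetical helper (not part of PFD): check the directory layout
# expected by the Singular session. Prints the number of missing items
# and returns 0 only if nothing is missing.
# Usage: check_pfd_layout ROOT N   (input files for an N x N matrix)
check_pfd_layout() {
  local root="$1" n="$2" missing=0 d r c f

  # The nodefile must exist and be non-empty (at least one host).
  if [ ! -s "$root/nodefile" ]; then
    echo "missing or empty: $root/nodefile" >&2
    missing=$((missing + 1))
  fi

  # Directories for input, results and temporary files.
  for d in input results temp; do
    if [ ! -d "$root/$d" ]; then
      echo "missing directory: $root/$d" >&2
      missing=$((missing + 1))
    fi
  done

  # One input file per matrix entry (r, c), 1 <= r, c <= n.
  r=1
  while [ "$r" -le "$n" ]; do
    c=1
    while [ "$c" -le "$n" ]; do
      f="$root/input/fraction_${r}_${c}.txt"
      if [ ! -f "$f" ]; then
        echo "missing input file: $f" >&2
        missing=$((missing + 1))
      fi
      c=$((c + 1))
    done
    r=$((r + 1))
  done

  echo "$missing"
  [ "$missing" -eq 0 ]
}
```

For the 4 × 4 example, `check_pfd_layout "${example_ROOT}" 4` should print 0; any missing item is reported on standard error.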

We start Singular, telling it where to find the library and the shared object file for the
PFD application:
SINGULARPATH="$PFD_INSTALL_DIR/LIB" $SINGULAR_INSTALL_DIR/bin/Singular

    In Singular, enter the following code into the interpreter. This will

   • load the library giving access to the PFD application,

   • add information about where to store temporary data (in the field options.tmpdir),

   • specify where to find the nodefile (in the field options.nodefile), and

   • set how many processes per node should be started (in the field options.procspernode;
     usually one process per core, not counting hyper-threads; you may have to adjust
     this to your hardware),

   • specify the address of the GPI-Space Monitor (this step is optional),

   • specify the indices of the array of files generated above which should be processed,

   • specify a base file name for the array of files, along with the path to the input files
     and the path where output files should be written,

and finally

   • start the computation.

LIB "pfd_gspc.lib";

configToken gc = configure_gspc();

gc.options.tmpdir = "temp";
gc.options.nodefile = "nodefile";
gc.options.procspernode = 8;
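The value 8 above is only an example. Since options.procspernode is usually set to the number of physical cores, one way to estimate that number on a Linux node is sketched below. The function name physical_cores is hypothetical; the sketch assumes lscpu from util-linux is available and falls back to the number of online logical CPUs (which includes hyper-threads) otherwise. In virtual machines, lscpu may not report the core topology reliably.

```shell
# Hypothetical helper: estimate the number of physical cores on this node,
# e.g. as a starting value for gc.options.procspernode.
physical_cores() {
  local n=""
  if command -v lscpu >/dev/null 2>&1; then
    # lscpu --parse prints one "core,socket" line per logical CPU (after
    # comment lines starting with '#'); hyper-threads of one core share the
    # same pair, so the number of distinct pairs is the number of cores.
    n=$(lscpu --parse=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u \
        | wc -l | tr -d ' ') || n=""
  fi
  if [ -z "$n" ] || [ "$n" -lt 1 ]; then
    # Fallback: online logical CPUs (counts hyper-threads as well).
    n=$(getconf _NPROCESSORS_ONLN)
  fi
  echo "$n"
}
```

The printed number can then replace the 8 in the assignment to gc.options.procspernode.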

If the GPI-Space Monitor is to be used, also set:
gc.options.loghostfile = "loghostfile";
gc.options.logport = 9876;

Continue by setting up the list of indices:
ring r = 0, x, lp;

list l = list(list(1, 1), list(1, 2), list(1, 3), list(1, 4), list(2, 1),
    list(2, 2), list(2, 3), list(2, 4), list(3, 1), list(3, 2), list(3, 3),
    list(3, 4), list(4, 1), list(4, 2), list(4, 3), list(4, 4));

parallel_pfd("fraction", l, "input", gc, "results");
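Typing the index list by hand becomes tedious for larger matrices. As a convenience, the list can be generated in the shell and pasted into the Singular session. The following sketch uses the hypothetical function name singular_index_list and prints the list l for an n × m matrix in row-major order, in the same format as above.

```shell
# Hypothetical helper: print the Singular list of index pairs (r, c),
# 1 <= r <= n, 1 <= c <= m, in row-major order, e.g. for parallel_pfd.
singular_index_list() {
  local n="$1" m="$2" out="" r=1 c
  while [ "$r" -le "$n" ]; do
    c=1
    while [ "$c" -le "$m" ]; do
      # Separate entries by ", " (no separator before the first one).
      if [ -n "$out" ]; then
        out="$out, "
      fi
      out="${out}list($r, $c)"
      c=$((c + 1))
    done
    r=$((r + 1))
  done
  printf 'list l = list(%s);\n' "$out"
}
```

For example, `singular_index_list 4 4` prints the same list as above on a single line.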

Note that in more complex environments, such as a cluster, absolute paths should be specified
for the nodefile and the temp directory. Specifying the results directory is optional; it is
only necessary if the results should be written to a directory other than the input directory.
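In such environments, the corresponding Singular assignments with absolute paths can be produced in the shell, for instance as in the following sketch. The function name singular_abs_config is hypothetical; it assumes that the nodefile and the temp directory lie directly in the given root (such as ${example_ROOT}), and it only prints Singular commands to be pasted into the interpreter.

```shell
# Hypothetical helper: print Singular assignments for the GPI-Space
# configuration token gc with absolute paths to the temporary directory
# and the nodefile, both assumed to lie directly in the given root.
singular_abs_config() {
  local root
  # Resolve the (possibly relative) root to an absolute path.
  root=$(cd "$1" && pwd) || return 1
  printf 'gc.options.tmpdir = "%s/temp";\n' "$root"
  printf 'gc.options.nodefile = "%s/nodefile";\n' "$root"
}
```

Calling `singular_abs_config .` from within ${example_ROOT} prints the two assignments with the resolved directory.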

References

 [1] K. G. Chetyrkin and F. V. Tkachov, Integration by Parts: The Algorithm to Calculate beta
     Functions in 4 Loops, Nucl. Phys. B192 (1981) 159–204.
 [2] A. Smirnov, An Algorithm to construct Gröbner bases for solving integration by parts
     relations, JHEP 04 (2006) 026, [hep-ph/0602078].

[3] S. Laporta, High precision calculation of multiloop Feynman integrals by difference equations,
     Int. J. Mod. Phys. A 15 (2000) 5087–5159, [hep-ph/0102033].
 [4] C. Anastasiou and A. Lazopoulos, Automatic integral reduction for higher order perturbative
     calculations, JHEP 07 (2004) 046, [hep-ph/0404258].
 [5] A. Smirnov, Algorithm FIRE – Feynman Integral REduction, JHEP 10 (2008) 107,
     [arXiv:0807.3243].
 [6] A. Smirnov and V. Smirnov, FIRE4, LiteRed and accompanying tools to solve integration by
     parts relations, Comput. Phys. Commun. 184 (2013) 2820–2827, [arXiv:1302.5885].
 [7] A. V. Smirnov, FIRE5: a C++ implementation of Feynman Integral REduction, Comput.
     Phys. Commun. 189 (2015) 182–191, [arXiv:1408.2372].
 [8] A. Smirnov and F. Chuharev, FIRE6: Feynman Integral REduction with Modular
     Arithmetic, arXiv:1901.07808.
 [9] P. Maierhoefer, J. Usovitsch, and P. Uwer, Kira—A Feynman integral reduction program,
     Comput. Phys. Commun. 230 (2018) 99–112, [arXiv:1705.05610].
[10] P. Maierhoefer and J. Usovitsch, Kira 1.2 Release Notes, arXiv:1812.01491.
[11] P. Maierhoefer and J. Usovitsch, Recent developments in Kira, CERN Yellow Reports:
     Monographs 3 (2020) 201–204.
[12] C. Studerus, Reduze - Feynman Integral Reduction in C++, Comput. Phys. Commun. 181
     (2010) 1293–1300, [arXiv:0912.2546].
[13] A. von Manteuffel and C. Studerus, Reduze 2 - Distributed Feynman Integral Reduction,
     arXiv:1201.4330.
[14] R. N. Lee, LiteRed 1.4: a powerful tool for reduction of multiloop integrals, J. Phys. Conf.
     Ser. 523 (2014) 012059, [arXiv:1310.1145].
[15] J. Klappert, F. Lange, P. Maierhöfer, and J. Usovitsch, Integral Reduction with Kira 2.0 and
     Finite Field Methods, arXiv:2008.06494.
[16] P. Mastrolia and S. Mizera, Feynman Integrals and Intersection Theory, JHEP 02 (2019)
     139, [arXiv:1810.03818].
[17] H. Frellesvig, F. Gasparotto, M. K. Mandal, P. Mastrolia, L. Mattiazzi, and S. Mizera,
     Vector Space of Feynman Integrals and Multivariate Intersection Numbers, Phys. Rev. Lett.
     123 (2019), no. 20 201602, [arXiv:1907.02000].
[18] H. Frellesvig, F. Gasparotto, S. Laporta, M. K. Mandal, P. Mastrolia, L. Mattiazzi, and
     S. Mizera, Decomposition of Feynman Integrals on the Maximal Cut by Intersection
     Numbers, JHEP 05 (2019) 153, [arXiv:1901.11510].
[19] H. Frellesvig, F. Gasparotto, S. Laporta, M. K. Mandal, P. Mastrolia, L. Mattiazzi, and
     S. Mizera, Decomposition of Feynman Integrals by Multivariate Intersection Numbers,
     arXiv:2008.04823.
[20] X. Liu, Y.-Q. Ma, and C.-Y. Wang, A Systematic and Efficient Method to Compute
     Multi-loop Master Integrals, Phys. Lett. B 779 (2018) 353–357, [arXiv:1711.09572].
[21] X. Liu and Y.-Q. Ma, Determining arbitrary Feynman integrals by vacuum integrals, Phys.
     Rev. D 99 (2019), no. 7 071501, [arXiv:1801.10523].
