Prepared for submission to JINST

Segmentation of EM showers for neutrino experiments with deep graph neural networks

V. Belavin, E. Trofimova,1 A. Ustyuzhanin

arXiv:2104.02040v4 [cs.LG] 16 Apr 2021

Laboratory of methods for Big Data Analysis, National Research University Higher School of Economics, Pokrovsky Boulevard 11, Russia
Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Russia

E-mail: etrofimova@hse.ru

Abstract: We introduce a novel method for shower reconstruction from the data collected with electromagnetic (EM) sampling calorimeters. Such detectors are widely used in High Energy Physics to measure the energy and kinematics of incoming particles. In this work, we consider the case when a large number of particles pass through an Emulsion Cloud Chamber (ECC) brick, generating electromagnetic showers. This situation arises with long exposure times or a large input particle flux. For example, the SHiP experiment is planning to use emulsion detectors for dark matter searches and neutrino physics. The expected full flux of the SHiP experiment is about $10^{20}$ particles over five years. Because of this high number of incoming particles, many overlapping showers will be observed, which makes EM shower reconstruction a challenging segmentation problem. Our reconstruction pipeline consists of a Graph Neural Network that predicts an adjacency matrix for the clustering algorithm. To improve the Graph Neural Network's performance, we propose a new layer type (EmulsionConv) that takes into account the geometrical properties of shower development in the ECC brick. For the clustering of overlapping showers, we use a modified hierarchical density-based clustering algorithm. Our method uses no prior information about the incoming particles and identifies up to 82% of electromagnetic showers in emulsion detectors. The mean energy resolution over 17,715 showers is 27%. The main test bench for the algorithm for reconstructing electromagnetic showers is going to be SND@LHC.

1 Corresponding author.
Contents

1 Introduction

2 Related work

3 Experimental data

4 Reconstruction Algorithm
  4.1 Graph construction
  4.2 Edge Classification
    4.2.1 Convolution block
  4.3 Showers Clusterization

5 Experiments and Results
  5.1 Metrics
  5.2 Architecture evaluation
  5.3 Clusterization

6 Conclusion and perspectives

A Base-track pairs energy and likeliness estimates

1 Introduction

Electromagnetic (EM) showers are produced by interactions of incoming particle decay products with the photographic plates of emulsion cloud chamber (ECC) bricks [1] (figure 1). EM shower reconstruction allows scientists to estimate the decay point, full momentum, and energy of an original particle. This knowledge is the starting point for lifting the veil on physics beyond the Standard Model.
The ECC has been used in the Oscillation Project with Emulsion-Tracking Apparatus (OPERA) experiment. OPERA took data for five years starting in 2008, and discovered muon to tau neutrino oscillations in appearance mode [2]. The high granularity of the OPERA ECC guarantees good EM shower identification [1]. One of the future experiments, SHiP [3], is planning to follow the same principle and a similar design as the OPERA experiment. The expected full flux of particles passing through the SHiP detectors is about $2 \times 10^{20}$ protons over five years [4], which corresponds to about 50-300 showers per brick.
Reconstruction of 3D structures from emulsion detector data is complicated by shower overlaps (figure 3a). These overlaps make it difficult to correctly determine whether a track belongs to a particular shower and, consequently, to restore the initial particles' properties.

Figure 1: Sectional view of an emulsion brick. The brick consists of 56 lead plates and 57 films, each made of two nuclear photographic emulsion layers poured on both sides of a plastic base [5].

Figure 2: Definition of a micro-track and a base-track in an emulsion film.

To recover showers in emulsion detectors, we introduce a method based on graph neural networks (GNNs) [6-8]. We use a GNN to predict an adjacency matrix for the clustering algorithm. One of the key motivations for using GNNs is the highly structured nature of the data associated with EM shower development in the detector.
In this paper, we (1) propose a new type of layer (EmulsionConv) for GNNs that utilizes prior knowledge of the physical problem, (2) develop an adapted version of the HDBSCAN algorithm [9], and (3) validate our pipeline on the problem of semantic segmentation of overlapping showers, i.e. of assigning a shower label to each track (figure 3).
This paper is structured as follows. In section 2 we review the literature on EM shower reconstruction algorithms. In section 3, we introduce the dataset used to perform the experiments. In section 4, we describe an algorithm for shower segmentation. In section 5 we present the metrics and experimental results that demonstrate the practical viability of the proposed method.

2 Related work

Several techniques to reconstruct a single EM shower's energy and position are presented in [10-12].
In [10], algorithms for electron shower reconstruction were developed and applied to study electron/pion separation and shower energy measurement in an emulsion brick. The algorithm iteratively matches each base-track with base-tracks in the downstream films based on specified angular and position requirements. Extrapolation of a base-track candidate is allowed across at most three films.
In [11], the authors solve the problem of shower reconstruction in the presence of a dominant instrumental background due to ambient radioactivity and cosmic rays. The algorithm for single-shower reconstruction is based on prior knowledge about the initial point and direction of the shower, and utilizes Boosted Decision Trees from TMVA [13] to classify all tracks as signal or background. For energy reconstruction, a linear regression on the number of selected tracks is applied. The achieved Energy Resolution (see section 5.1) is 0.23 ± 0.01.

Figure 3: EM showers in an emulsion brick. (a) Unclustered showers: all tracks shown with the same color. (b) Clustered showers: tracks colored according to the ground truth shower label.

Similarly to [11], [12] presents an algorithm for background classification that does not rely on information about the shower origin. It also utilizes Boosted Decision Trees, followed by a Conditional Random Field model for pattern identification. The achieved Energy Resolution is 0.27. This approach is similar to ours in that it uses no prior information about the shower origin. However, the authors do not solve the problem of shower semantic segmentation.
In this work, we aim to recover multiple showers in a brick while achieving the same energy resolution as single-shower reconstruction algorithms, without any prior knowledge of the shower origin or direction.

3 Experimental data

The ECC brick has a modular structure made of a sequence of lead plates interleaved with emulsion films. It combines the high-precision tracking capabilities of nuclear emulsion films with the large mass of the lead plates, which act as the target [10].
To describe the process of particle track reconstruction in a sampling brick, we need to introduce two terms: micro-tracks and base-tracks. Micro-tracks are the segments of a track reconstructed in a single emulsion layer. A base-track is a pair of well-aligned micro-tracks on sequential layers (figure 2) [1]. Hereafter, we use "track" as an alias for base-track. The EM showers data consists of the tracks' X, Y, Z coordinates and the projection angles onto the XZ and YZ planes, $\theta_x$ and $\theta_y$, which fully determine the direction of a track.
Tracks exist only inside the brick emulsion, which leads to the following constraints on the track coordinates: $x \in [-62500\ \mu m,\ 62500\ \mu m]$, $y \in [-49500\ \mu m,\ 49500\ \mu m]$; $z = 1293\ \mu m \cdot k$, $k \in \{0, 1, \ldots, 57\}$.

The dataset consists of 107 emulsion bricks with the number of EM showers per brick uniformly distributed from 50 to 300, corresponding to 17,715 showers in total. The data is generated using the FairShip framework [14]. In this work, we consider EM showers data cleared of background tracks.

4 Reconstruction Algorithm

Our EM shower reconstruction algorithm consists of three steps. First, we create a directed graph based on heuristics: we assign a vertex to each track and connect the vertices with edges. Second, we classify the edges with a Graph Neural Network to predict the probability that an edge connects tracks from the same shower. Third, we use the graph with the predicted probabilities as edge weights in the downstream clusterization.1

1 See https://gitlab.com/lambda-hse/em_showers_segmentation
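A minimal Python sketch of this three-step flow; the stage functions are hypothetical placeholders for the components detailed in sections 4.1-4.3, not the released implementation:

def reconstruct_showers(tracks, build_graph, classify_edges, cluster):
    # Step 1: heuristic directed graph over base-tracks (section 4.1)
    graph = build_graph(tracks)
    # Step 2: GNN probability that each edge links same-shower tracks (section 4.2)
    edge_probs = classify_edges(graph)
    # Step 3: clusterization with the probabilities as edge weights (section 4.3)
    return cluster(graph, edge_probs)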

4.1 Graph construction
During the preprocessing step, we construct a directed graph. The vertices of the graph correspond to the tracks recorded by the detector. To decide whether or not to connect two vertices with an edge, we introduce a distance metric defined on pairs of tracks called "integral distance". The integral distance is equal to the area enclosed between the extrapolations of the two tracks (figure 4).
If we assume that the first track is described by the parameters $x_1, y_1, z_1, \theta_{x_1}, \theta_{y_1}$, and the second by $x_2, y_2, z_2, \theta_{x_2}, \theta_{y_2}$, then this distance is expressed by the following integral, which can be taken analytically:

$$\mathrm{IntDist} = \int_0^1 \Big[\big(t\,\theta_{x_1}(z_2 - z_1) - (x_1 - x_2 + t\,\theta_{x_2}(z_2 - z_1))\big)^2 + \big(t\,\theta_{y_1}(z_2 - z_1) - (y_1 - y_2 + t\,\theta_{y_2}(z_2 - z_1))\big)^2\Big]^{1/2}\, dt \qquad (4.1)$$
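A numeric sketch of eq. (4.1) as reconstructed above, integrating the transverse separation of the two extrapolated tracks; parameter names follow the notation of the text:

import numpy as np
from scipy.integrate import quad

def int_dist(t1, t2):
    # Each track: (x, y, z, theta_x, theta_y)
    x1, y1, z1, tx1, ty1 = t1
    x2, y2, z2, tx2, ty2 = t2
    dz = z2 - z1

    def gap(t):
        # Transverse separation of the two extrapolated straight tracks
        # at fraction t of the way from z1 to z2.
        dx = t * tx1 * dz - (x1 - x2 + t * tx2 * dz)
        dy = t * ty1 * dz - (y1 - y2 + t * ty2 * dz)
        return np.hypot(dx, dy)

    area, _ = quad(gap, 0.0, 1.0)
    return area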

An edge is directed from the track with the smaller z coordinate to the track with the larger z coordinate. For each track, only the 10 outgoing and 10 incoming edges with the smallest values of IntDist are kept, to reduce computational demands during neural network training.
To further reduce computational costs and the required neural network size, we perform feature engineering and combine features describing vertices and edges.
Vertex features are combinations of the base-track coordinate information:

1. initial features: $x$, $y$, $z$, $\theta_x$, $\theta_y$;

2. trigonometric features: five angular combinations of $\theta_x$ and $\theta_y$, including $\arctan\sqrt{\theta_x^2 + \theta_y^2}$ and sums of sines and cosines of the projection angles.

Edge features include:

1. IntDist (eq. 4.1);

2. IP (impact parameter) projections on the X and Y axes for base-track pairs (eq. 4.2),

$$\text{IP projection on the X axis} = \frac{x_1 - x_2 - (z_1 - z_2)\,\theta_{x_2}}{z_1 - z_2}; \qquad (4.2)$$

3. base-track pair energy and likeliness estimates (eq. A.2).

As a result, 5 and 6 features are added to the vertex and edge descriptions, respectively.

Figure 4: Integral distance (slate-blue area).

Figure 5: Edge classification neural network architecture. The convolution block combines EmulsionConv (10, 10) and EdgeConv (Linear(20, 10)) layers; the binary classification block stacks Linear (20, 30), Linear (30, 30) and Linear (30, 10) layers, each followed by Tanh and Dropout (0.3), with a final Linear (10, 1) layer and a Sigmoid producing the edge weight w.

4.2 Edge Classification

Tracks that are closer to the shower origin, i.e. produced in the early stage of shower development by hard $\gamma$s and $e^{\pm}$s, contain more information about a shower than tracks produced in the late stages by soft particles. Thus, we need to make the neural network aware of early-stage tracks when it makes predictions about late-stage tracks. In other words, we need to ensure fast growth of the receptive field of the neural network [15, 16].
The edge classifier consists of a Graph Convolution Network and a dense neural network (figure 5).
To recover the shower, we propagate information from the first base-tracks to the following ones using Graph Convolution Networks (GCNs). A GCN is a special type of neural network that generalizes convolution operations from regular N-dimensional grids to the unstructured grids described by a graph. In particular, we use the EdgeConv GCN layer. EdgeConv was proposed in [17] for the segmentation of 3D point clouds. The authors' key idea is to modify the way messages are computed: they use relative information about vertices in the message step, i.e. the messages are computed as $m_{ij} = f(h_i,\, h_j - h_i,\, e_{ij})$, where $h_i$ and $e_{ij}$ are latent representations of a graph vertex and edge, and $f$ is a differentiable function, for example a neural network.
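As an illustration, a minimal PyTorch sketch of this message step, with $f$ realized as a small MLP; the dimensions are placeholders rather than the exact configuration of figure 5:

import torch
import torch.nn as nn

class EdgeConvMessage(nn.Module):
    def __init__(self, node_dim=10, edge_dim=6, out_dim=10):
        super().__init__()
        # f takes (h_i, h_j - h_i, e_ij) concatenated along the feature axis
        self.f = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, out_dim), nn.Tanh())

    def forward(self, h, e, edge_index):
        src, dst = edge_index                 # edges directed i -> j
        # relative vertex information h_j - h_i enters the message explicitly
        m = torch.cat([h[src], h[dst] - h[src], e], dim=-1)
        return self.f(m)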

4.2.1 Convolution block

The Graph Convolution Network includes three components: an encoder that transforms the input graph into latent representations of each vertex and each edge, a module that performs message passing to update the latent features, and an output module that computes the edge classification scores.
Our GCN is composed of five blocks of the EdgeConv layer and three blocks of the newly proposed EmulsionConv layer. In EdgeConv, message propagation from a vertex $u$ to some vertex $v$ takes as many updates as the length of the shortest path between $u$ and $v$. In other words, it would take 57 message-passing steps (the number of emulsion layers in the detector) for the network to take into account information about early-stage tracks when making predictions for late-stage tracks. More generally, without additional tricks, the receptive field of GCNs grows linearly with the number of layers, which, in combination with the vanishing gradient problem [18, 19], leads to the use of shallow networks that cannot properly propagate information in large graphs. We propose the EmulsionConv layer, in which we modify the algorithm to collect messages and update the hidden representation vectors ($h_v$) of vertices for each emulsion layer separately. EmulsionConv aims to solve the problem of the slow growth of the receptive field, and of the computational burden of the correspondingly inefficient updates in a vanilla GCN, by exploiting the tree-like structure of electromagnetic showers. The EmulsionConv layer is summarized in Algorithm 1.

Algorithm 1 EmulsionConv algorithm
Require: graph $G(V, E)$ with features $\{h_v, h_e\}$; $f$, $g$ – neural networks
Ensure: updated graph $G(V, E)$ with features $\{h_v, h_e\}$
1: Group the edges $E = \{(u, v)\}$ by the unique $z$ coordinate of $v$. There would be 57 groups (one per emulsion layer) $\{E_k\}_{k=1}^{57}$
2: for each group $E_k$ in increasing order of $z$ do
3:   for each $(u, v)$ in $E_k$ do
4:     $m_{uv} = f(h_u, h_v, h_e)$
5:   end for
6:   $m_v = \mathrm{Agg}\{m_{uv}\}_{u \in N(v)}$
7:   $h_v = g(h_v, m_v)$
8: end for
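A condensed PyTorch sketch of Algorithm 1, assuming the message network f outputs vectors of the same width as the hidden states and g maps the concatenated state and aggregate back to that width; names and shapes are illustrative:

import torch

def emulsion_conv(h, e, edge_index, node_layer, f, g, n_layers=57):
    # h: [V, H] vertex states, e: [E, F] edge features,
    # edge_index: [2, E] with edges directed from smaller to larger z,
    # node_layer: [V] emulsion-layer index of each vertex; f, g: small MLPs.
    src, dst = edge_index
    for k in range(n_layers):                      # sweep layers in increasing z
        mask = node_layer[dst] == k                # edges ending in layer k
        if not mask.any():
            continue
        s, d = src[mask], dst[mask]
        m = f(torch.cat([h[s], h[d], e[mask]], dim=-1))   # messages m_uv (width H)
        agg = torch.zeros_like(h).index_add_(0, d, m)     # sum over incoming edges
        upd = g(torch.cat([h, agg], dim=-1))              # candidate update (width H)
        h = torch.where((node_layer == k).unsqueeze(-1), upd, h)  # update layer k only
    return h

Because the sweep reuses the already-updated states of earlier layers, information inserted at the first emulsion layer can reach the last layer within a single EmulsionConv pass.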

The distribution of edge classes is highly imbalanced (approximately 10:1). Thus, we decided to use the focal loss [20] during training:

$$\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log(p_t), \qquad (4.3)$$

where $p_t$ is the probability estimated by the model for the class with label $y = t$, and $\gamma$ is the focusing parameter, $\gamma = 3.0$.
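For concreteness, a minimal implementation of eq. (4.3) for binary edge labels (a standard focal-loss sketch rather than the project's exact code):

import torch

def focal_loss(p, y, gamma=3.0, eps=1e-7):
    # p: predicted probability of the positive class, y: 0/1 edge labels
    p_t = torch.where(y == 1, p, 1.0 - p).clamp(eps, 1.0 - eps)
    # easy examples (p_t near 1) are down-weighted by (1 - p_t) ** gamma
    return (-(1.0 - p_t) ** gamma * torch.log(p_t)).mean()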
The output of the Graph Convolution block is passed to the fully connected layers of the binary classifier (figure 5). The classifier predicts the probability that an edge connects tracks of one shower.

4.3 Showers Clusterization
For the final separation of the showers, we need an algorithm that can operate on large sparse graphs and that avoids breaking showers apart during clustering.
We introduce a modified version of the HDBSCAN [9] clustering algorithm, which we call the Edge weight-based spatial clustering algorithm for graph structures (EWSCAM). EWSCAM iteratively connects the vertices with the smallest edge weight, i.e. it constructs a minimum spanning tree with the Kruskal algorithm [21]. Before being passed to EWSCAM, the edge weights are transformed in the following way:

$$\text{weights} \leftarrow \frac{\operatorname{arctanh}(1 - \text{weights})}{\text{weights}} \qquad (4.4)$$
The main difference between our algorithm and HDBSCAN is that HDBSCAN uses the Prim algorithm for Minimum Spanning Tree (MST) construction, whereas EWSCAM uses the Kruskal algorithm. Both algorithms produce a correct MST but connect vertices in a different order, leading to different linkage and condensed trees. The Kruskal algorithm is also faster on sparse graphs and can be applied to directed graphs without the need to symmetrize them into undirected graphs.
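A minimal sketch of this step, assuming edge probabilities strictly inside (0, 1); it applies the transform of eq. (4.4) and, for illustration, builds the MST with networkx's Kruskal implementation instead of the released EWSCAM code:

import numpy as np
import networkx as nx

def ewscam_mst(edges, probs):
    # probs close to 1 (same shower) map to small weights, as in eq. (4.4)
    w = np.arctanh(1.0 - probs) / probs
    g = nx.Graph()
    g.add_weighted_edges_from((u, v, wi) for (u, v), wi in zip(edges, w))
    # Kruskal connects edges in order of increasing weight
    return nx.minimum_spanning_tree(g, algorithm="kruskal")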
Figure 6: HDBSCAN condensed tree. Figure 7: EWSCAM condensed tree. Both condensed trees plot the $\lambda$ value against the number of points in each branch.

The difference between the two clustering algorithms is illustrated in figures 6 and 7 on 4 intersecting showers. We show the condensed trees, where the encircled nodes correspond to the chosen clusters. To estimate cluster persistence, $\lambda = \frac{1}{\text{weight}}$ is introduced. The $\lambda$ parameter defines the threshold applied to the edge weights at which a child node leaves the cluster. One can note that EWSCAM produces a more robust clustering, whereas HDBSCAN breaks the showers into six distinct clusters.

 5 Experiments and Results

5.1 Metrics
To assess the quality of a solution to the EM shower separation problem, we define recovered, broken, stuck-together and lost showers as follows (a sketch of this categorization follows the list):

· a shower is considered broken if the ratio of the size of its largest cluster to that of its second largest cluster is less than 2;

· a shower is considered lost if all clusters together contain less than 10% of the base-tracks from the original shower;

· a shower is considered recovered if one cluster contains more than 90% of its base-tracks and it is not broken or lost;

· a shower is considered stuck together if it does not fall into any of the above categories.
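The categorization implied by these definitions can be sketched as follows, where cluster_sizes is a hypothetical per-shower tally of how many of its base-tracks fall into each predicted cluster:

def categorize_shower(cluster_sizes, shower_size):
    # cluster_sizes: base-track counts of one true shower in each predicted cluster
    sizes = sorted(cluster_sizes, reverse=True) or [0]
    if sum(sizes) < 0.1 * shower_size:
        return "lost"                      # under 10% of base-tracks kept
    if len(sizes) > 1 and sizes[0] < 2 * sizes[1]:
        return "broken"                    # largest / second largest < 2
    if sizes[0] > 0.9 * shower_size:
        return "recovered"                 # one cluster holds over 90%
    return "stuck together"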

We assess the quality of the energy prediction with the energy resolution:

$$\mathrm{ER} = \sigma\!\left(\frac{E_{\mathrm{pred}} - E_{\mathrm{true}}}{E_{\mathrm{true}}}\right), \qquad (5.1)$$

where $\sigma$ denotes the standard deviation.
We use the mean absolute error to estimate the quality of the initial particle position and direction reconstruction:

$$\mathrm{MAE}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \left|\theta_i^{\mathrm{pred}} - \theta_i^{\mathrm{true}}\right| \qquad (5.2)$$
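In NumPy terms, the two metrics read as follows (a sketch under the definitions above):

import numpy as np

def energy_resolution(e_pred, e_true):
    # eq. (5.1): spread of the relative energy error
    return np.std((e_pred - e_true) / e_true)

def mean_absolute_error(pred, true):
    # eq. (5.2): used for position and direction parameters
    return np.mean(np.abs(pred - true))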

5.2 Architecture evaluation
For the ablation study, we compare our architecture, which consists of 3 layers of EmulsionConv and 5 layers of EdgeConv, with two less complex architectures: (1) 8 layers of EmulsionConv, and (2) 8 layers of EdgeConv.
We use the ROC-AUC (area under the receiver operating characteristic curve) metric [22] (eq. 5.3) as a proxy to validate the different architectures, because ROC-AUC measures the quality of ranking, i.e. it ensures that the probabilities for edges that connect tracks from different showers are lower than the probabilities for edges that connect tracks from the same shower. These probabilities are used as edge weights in the graph in section 5.3; thus, the ROC-AUC metric indirectly measures the quality of the downstream clusterization.

$$\mathrm{ROC\text{-}AUC} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \mathbb{1}[y_j - y_i]\,\mathbb{1}[p_j - p_i]}{\sum_{i=1}^{n}\sum_{j=1}^{n} \mathbb{1}[y_j - y_i]}, \qquad (5.3)$$

where $y_i \in \{0, 1\}$ is the binary edge label, $p_i \in [0, 1]$ is the probability generated by the classifier that the edge connects tracks from the same shower, and $\mathbb{1}[x]$ equals 1 for $x > 0$ and 0 otherwise.
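Ignoring ties, eq. (5.3) is the fraction of (negative, positive) edge pairs that the classifier ranks correctly; a small sketch (sklearn.metrics.roc_auc_score computes the same quantity more efficiently):

import numpy as np

def roc_auc(y, p):
    # y: 0/1 labels, p: predicted probabilities; pairwise form of eq. (5.3)
    pos, neg = p[y == 1], p[y == 0]
    return (pos[:, None] > neg[None, :]).mean()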
In our work, we compare different configurations of the data: 17,715 EM showers divided into 355 bricks with 50 showers in each brick, 17,715 bricks containing single showers, 89 bricks with 200 showers each, and the realistic case of 107 bricks with 50-300 showers per brick. For all experiments, we split the datasets into train (34%), test (33%) and validation (33%) parts. We train the neural networks for 5000 epochs. To prevent overfitting, early stopping with a patience parameter equal to 100 is used. For optimization we use the Adam algorithm [23] with a learning rate of $10^{-3}$.
As can be seen from Table 1, the best results are achieved with Pure EmulsionConv and the mix of EmulsionConv and EdgeConv, whereas networks composed entirely of EdgeConv layers show a statistically lower performance. We choose the mix of the two layer types because of the higher stability of the network during training and the higher value of ROC-AUC.

Figure 8: Trade-off between the fraction of recovered showers and the energy resolution, as a function of the threshold (0.2, 0.3, 0.4, 0.6, 0.8).

We argue that this is due to the ability of EmulsionConv to propagate information through all the emulsion layers of the detector. Our algorithm therefore generalizes better than EdgeConv, which accumulates local neighbourhood information and must be applied recurrently to learn global features.

5.3 Clusterization

The hyperparameters of the EWSCAM algorithm are chosen to maximize the percentage of recovered showers (figure 8). The optimal values are 4 for the minimum cluster size and 0.2 for the edge probability threshold. EWSCAM recovers approximately 82% of the EM showers. The metrics for EWSCAM and HDBSCAN are collected in Table 1. The presented results take into account the variability introduced by different initializations of the GCN and by dataset sampling. In our case, with a high number of overlapping showers per brick, we consider the Recovered Showers metric the most reliable one.
Further analysis is performed on the recovered showers. To separate recovered showers from other types of showers, we use an XGBoost classifier [24]. Figure 9 shows the average receiver operating characteristic (ROC) and precision-recall (Pr-R) curves for three-fold cross-validation. The ROC curve illustrates the diagnostic capability of the binary classifier by plotting the true positive rate (TPR) (eq. 5.4) versus the false positive rate (FPR) (eq. 5.5) at various threshold settings. The precision-recall curve shows the trade-off between precision (eq. 5.6) and recall, i.e. TPR, for different thresholds. The blue area surrounding the curves corresponds to the deviation of the three-fold cross-validation results. The area under the ROC curve (AUC) and the average precision are equal to 84% and 95%, respectively.

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad (5.4)$$

where TP (true positives) corresponds to selected signal and FN (false negatives) to rejected signal.

$$\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad (5.5)$$

where FP (false positives) corresponds to selected background and TN (true negatives) to rejected background.

$$\mathrm{Precision} = \frac{TP}{TP + FP}. \qquad (5.6)$$

Figure 9: Average classification metrics for cluster classification: the ROC curve (TPR, i.e. signal efficiency, vs. FPR, i.e. background efficiency; AUC = 0.836) and the precision-recall curve (purity vs. efficiency; AP = 0.952).

To predict a shower's energy, we use Huber regression [25], which is robust to outliers. We provide the length of the predicted cluster and the estimate of its z coordinate as features. To make the data uniformly distributed in the interval [0, 1], we use Box-Cox [26] and quantile transformations with a window size of 0.21. To assess the statistical significance of the results, we use the bootstrap [27].
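A hedged scikit-learn sketch of this regression step; the pipeline composition and the bootstrap helper are illustrative assumptions (the window-size detail of the quantile transform is not reproduced), not the released code:

import numpy as np
from sklearn.linear_model import HuberRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

# Features: cluster length and estimated z coordinate, as described above.
model = make_pipeline(
    PowerTransformer(method="box-cox"),              # requires strictly positive inputs
    QuantileTransformer(output_distribution="uniform"),
    HuberRegressor(),                                # robust to outlier clusters
)

def bootstrap_er(model, X, y, n_boot=1000, seed=0):
    # Bootstrap estimate of the energy resolution and its spread.
    rng = np.random.default_rng(seed)
    ers = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))        # resample with replacement
        pred = model.predict(X[idx])
        ers.append(np.std((pred - y[idx]) / y[idx]))
    return np.mean(ers), np.std(ers)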
Figure 10 illustrates the dependence of the mean energy reconstruction error on the true energy. As expected, the resolution steadily improves as the ground-truth energy increases. However, above ~4 GeV the ER starts to increase slowly. We argue that this happens because the shower's length grows with its energy, so high-energy showers generated in the first half of the brick are not fully contained inside the detector. The degradation of shower reconstruction that we observe is therefore likely connected to this border effect.

The average energy resolution for the realistic case of 50-300 showers per brick is ~0.27 ± 0.005. The mean relative deviation of the predicted energy from the true energy is close to zero (figure 11).

Table 1: Comparison of clusterization algorithms (average three-fold cross-validation results). The three EWSCAM columns correspond to the network variants of section 5.2; the last column is HDBSCAN.

Metric                    EWSCAM, Pure Edge   EWSCAM, Pure Emulsion   EWSCAM, Mix Edge+Emulsion   HDBSCAN
Recovered Showers, %      76.76 ± 1.19        81.05 ± 1.03            81.77 ± 1.75                71.23 ± 6.49
Stuck Showers, %          16.67 ± 2.21        11.24 ± 1.31            11.89 ± 3.08                17.17 ± 5.56
Broken Showers, %          1.09 ± 0.06         1.11 ± 0.07             1.17 ± 0.21                 6.76 ± 2.64
Lost Showers, %            5.48 ± 1.20         6.61 ± 0.45             5.18 ± 1.61                 4.84 ± 1.77
MAE x                     0.3104 ± 0.0006     0.3092 ± 0.0022         0.3107 ± 0.0010             0.3087 ± 0.0028
MAE y                     0.2436 ± 0.0014     0.2448 ± 0.0016         0.2432 ± 0.0019             0.2431 ± 0.0022
MAE z                     0.2083 ± 0.0021     0.2076 ± 0.0022         0.2070 ± 0.0023             0.2050 ± 0.0027
MAE θx, rad               0.0126 ± 0.0004     0.0121 ± 0.0003         0.0136 ± 0.0013             0.0139 ± 0.0019
MAE θy, rad               0.0127 ± 0.0003     0.0122 ± 0.0005         0.0131 ± 0.0008             0.0131 ± 0.0008

Figure 10: Energy resolution (ER) vs. true energy for different numbers of showers per brick (1, 50, 200, and 50-300 showers, together with the single-shower result of [11]). The line corresponds to the mean ER value; the band around the line corresponds to the standard deviation of the ER.

Figure 11: Relative deviation of the predicted shower energy value, (E_pred − E_true)/E_true, from the true value.

 6 Conclusion and perspectives

We propose a novel approach to the shower semantic segmentation problem. Our key contribution is a new layer type that can be used in end-to-end graph deep learning pipelines. We observe a statistically significant performance boost in comparison with the state-of-the-art EdgeConv layer [17] on the problem of shower semantic segmentation, on a dataset generated with the FairShip software [14]. Experiments have shown that the algorithm can detect up to 82% of showers with an energy resolution of ~0.27 ± 0.005, which is on par with prior works on EM shower reconstruction while outperforming them in two key aspects:

• it is capable of detecting multiple showers and separating them in cases of overlap;

• it does not use any prior information about shower origins, which simplifies the analysis pipeline and reduces the cost of the experimental setup, for example by allowing the Changeable Sheets used in OPERA [11] to estimate the shower origin position to be omitted.

We believe that our approach can be of interest to other physics experiments that use sampling calorimeters or detectors with a similar data representation, i.e. 3D point clouds. One of the principal test benches for the proposed EM shower reconstruction algorithm could be SND@LHC [28]. SND@LHC is a proposed, compact and self-contained experiment for measurements with neutrinos produced at the Large Hadron Collider in the as yet unexplored pseudo-rapidity region 7.2 < η < 8.6.
We speculate that the possible uses are not limited to sampling calorimeters: the method could also be used to analyse track data from a Time Projection Chamber [29] or a Silicon Tracker [30]. In future work, we are going to investigate its usage perspectives for other detector types.

Acknowledgments

We would like to express our appreciation to Giovanni De Lellis, Denis Derkach and Fedor Ratnikov for their invaluable comments and support.
The reported study used the supercomputer resources of the National Research University Higher School of Economics. The research leading to these results has received funding from the Russian Science Foundation under grant agreement no. 19-71-30020.

A Base-track pairs energy and likeliness estimates

The energy and likeliness features are estimated with Molière's theory of multiple scattering [31]. It states that for a pair of tracks with parameters $x_i, y_i, z_i, \theta_{x_i}, \theta_{y_i}$, where $i = 1, 2$, the difference in the spatial angle, $\Delta\theta = \big((\Delta\theta_x)^2 + (\Delta\theta_y)^2\big)^{1/2}$, and the change in the z coordinate, $\Delta z$, are described by the following distribution:

$$P(\Delta\theta, \Delta z) = \frac{2\,\Delta\theta}{\langle\theta^2\rangle}\, \exp\!\left(-\frac{\Delta\theta^2}{\langle\theta^2\rangle}\right), \qquad \langle\theta^2\rangle = \left(\frac{E_s}{E\beta}\right)^{2} \frac{\Delta z}{X_0}, \qquad (A.1)$$

where $E_s = 21$ MeV is the critical energy, $X_0 = 5000\ \mu m$ is the radiation length [32], and $\beta$ is the object's velocity relative to the speed of light.
The energy and likeliness estimate features are found by maximizing the following likeliness function:

$$P(\Delta\theta, \Delta z)\, P(\Delta\theta_x, \Delta z)\, P(\Delta\theta_y, \Delta z)\, P(\Delta x, \Delta z)\, P(\Delta y, \Delta z) \to \max_{E}, \qquad (A.2)$$

where $\Delta\theta_x, \Delta\theta_y$ are the changes in the spatial angle projections and $\Delta x, \Delta y$ are the changes in the spatial deviation, respectively; these quantities follow Gaussian distributions.

References
 [1] R. Acquafredda et al., The OPERA experiment in the CERN to Gran Sasso neutrino beam, JINST 4
 (2009) P04018.
[2] N. Agafonova et al., Final Results of the OPERA Experiment on ν_τ Appearance in the CNGS Neutrino Beam, PRL 120 (2018) 211801.
 [3] W. M. Bonivento, The SHiP experiment at CERN, Journal of Physics: Conference Series 878 (2017)
 1 012059.
 [4] O. Lantwin, Search for new physics with the SHiP experiment at CERN, PoS EPS-HEP2017 304
 (2017) 7 [hep-ex/1710.03277].
 [5] S. Dmitrievsky, Status of the OPERA Neutrino Oscillation Experiment, Acta Physica Polonica B. 41
 (2010)
[6] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang and P. S. Yu, A Comprehensive Survey on Graph Neural Networks, IEEE Transactions on Neural Networks and Learning Systems (2020) 2978386.
[7] K. Xu, W. Hu, J. Leskovec, S. Jegelka, How Powerful are Graph Neural Networks?, CoRR (2018) [cs.LG/1810.00826].
[8] J. Zhou, G. Cui, Zh. Zhang, Ch. Yang, Zh. Liu, L. Wang, Ch. Li, M. Sun, Graph Neural Networks: A Review of Methods and Applications, CoRR (2018) [cs.LG/1812.08434].
 [9] L. McInnes and J. Healy, Accelerated Hierarchical Density Clustering, 2017 IEEE International
 Conference on Data Mining Workshops (ICDMW) (2017) pp. 33-42 [stat.ML/1705.07321].
[10] L. Arrabito, D. Autiero, C. Bozza, S. Buontempo, Y. Caffari, L. Consiglio, M. Cozzi, N. D’Ambrosio,
 G. De Lellis and M. De Serio, Electron/pion separation with an Emulsion Cloud Chamber by using a
 Neural Network, Journal of Instrumentation 2 (2007) P02001.
[11] B. Hosseini, Search for Tau Neutrinos in the τ → e Decay Channel in the OPERA Experiment (2015).
[12] A. Ustyuzhanin, S. Shirobokov, V. Belavin and A. Filatov, Machine-Learning techniques for
 electromagnetic showers identification in OPERA datasets, ACAT 2017 conference proceedings
 (2017).
[13] CERN, The Toolkit for Multivariate Data Analysis with ROOT (TMVA), CERN-OPEN-2007-007 [physics/0703039].
[14] GitHub, Inc. Fairship url: https://github.com/ShipSoft/FairShip, accessed: 2019-11-07.
[15] A. Araujo, W. Norris, J. Sim, Computing Receptive Fields of Convolutional Neural Networks, Distill
 11 (2019).
[16] W. Luo, Y. Li, R. Urtasun, R. Zemel, Understanding the effective receptive field in deep convolutional
 neural networks, Advances in neural information processing systems (2016) [cs.CV/1701.04128 ].

[17] Y. Wang, Y. Sun, Z. Liu, S.E. Sarma, M.M. Bronstein and J.M. Solomon, Dynamic Graph CNN for
 Learning on Point Clouds, ACM Transactions on Graphics (TOG) 38 (2019), [cs.CV/1801.07829].
[18] Z. Liu, Ch. Chen, L. Li et al., Geniepath: Graph neural networks with adaptive receptive paths,
 Proceedings of the AAAI Conference on Artificial Intelligence 33 (2019) [cs.LG/1802.00910].
[19] G. Li, M. Muller, Al. Thabet, B. Ghanem, Deepgcns: Can GCNs go as deep as CNNs?, Proceedings
 of the IEEE International Conference on Computer Vision (2019) [cs.CV/1904.03751].
[20] T.-Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, Focal Loss for Dense Object Detection,
 Facebook AI Research (FAIR) (2018) [cs.CV/1708.02002v2].
[21] J. B. Kruskal, On the shortest spanning subtree of a graph and the traveling salesman problem,
 Proceedings of the American Mathematical Society. 7 (1) (1956) S0002-9939-1956-0078686-7.
[22] T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters 26 (2006)
 j.patrec.2005.10.010.
[23] D. P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization (2014) [cs.LG/1412.6980].
[24] GitHub, Inc. XGBoost url: https://github.com/dmlc/xgboost, accessed: 2019-12-17.
[25] P. J. Huber, Robust Estimation of a Location Parameter, Annals of Mathematical Statistics 35 (1)
 (1964) p. 73–101 1177703732.
[26] G. E. P. Box, D. R. Cox, An analysis of transformations, Journal of the Royal Statistical Society
 (1964) Series B. 26 (2): 211–252. JSTOR 2984418. MR 0192611.
[27] B. Efron, R. Tibshirani, An Introduction to the Bootstrap (1993) Boca Raton, FL: Chapman &
 Hall/CRC. ISBN 0-412-04231-2.
[28] C. Ahdida, R. Albanese, A. Alexandrov et al., SND@LHC - Scattering and Neutrino Detector at the
 LHC, CERN, Geneva, CERN-LHCC-2021-003. LHCC-P-016, (2021) url:
 https://cds.cern.ch/record/2750060, accessed: 2021-03-13.
[29] R. Acciarri et al., Summary of the Second Workshop on Liquid Argon Time Projection Chamber
 Research and Development in the United States, Journal of Instrumentation 10 (7) (2015)
 [physics.ins-det/1504.05608].
[30] M. Tobin, The LHCb Silicon Tracker, Nuclear Instruments and Methods in Physics Research Section
 A: Accelerators, Spectrometers, Detectors and Associated Equipment (2016), p. 174-180
 j.nima.2005.03.113.
[31] H. A. Bethe, Molière's Theory of Multiple Scattering (1953) p. 1256–1266 [hep-ph/1204.3675].
[32] A. De Angelis, M. Pimenta, Introduction to Particle and Astroparticle Physics, Undergraduate Lecture Notes in Physics (2018) 978-3-319-78181-5.
