THE PYTHON SOFTWARE ENVIRONMENT IN KM3NET - JOHANNES SCHUMANN (SPEAKER), TAMAS GAL ON BEHALF OF THE KM3NET COLLABORATION PYHEP CONFERENCE ...

The Python Software
Environment in KM3NeT

Johannes Schumann (Speaker),
Tamas Gal
on behalf of the KM3NeT Collaboration
PyHEP Conference 2021-07-08
KM3NeT
● Water Cherenkov detector infrastructure
  in the Mediterranean Sea (more than 2 km depth)
● ORCA: Oscillation Research with Cosmics in the Abyss
    –   dense instrumentation for few-GeV atmospheric ν
    –   determine neutrino mass hierarchy
    –   effective target volume ~ 6 Mm³

● ARCA: Astroparticle Research with Cosmics in the Abyss
    –   sparse instrumentation for TeV-PeV cosmic ν
    –   discover high-energy astrophysical neutrino sources
[1] KM3NeT LoI

● Challenging task to build, operate and scientifically exploit the
  detector
        –   Large (uneven) datasets need to be processed (on different levels)
        –   Challenge: Minimize obstacles to access data from multiple languages and
            platforms
PyHEP 21 - 2021-07-05 – Johannes Schumann                                                   2
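Detector Setup
● Sources: Atmospheric [2], Active Galactic Nuclei [3], Supernova [4]
● Detection: a ν interaction produces a lepton l and secondaries (π, e-, µ-)
● Artist’s impression of KM3NeT/ORCA ([5] P2O LoI)
● Digital Optical Module ([1] KM3NeT LoI): 17” sphere with 31 x 3” PMTs,
  arranged in groups of 12 PMTs and 19 PMTs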
Computing and Python Environment
● Detector: ν interaction with lepton l and secondaries (π0, µ-, e-, π+)
● Shore Station and Computing Center: Data Writeout and
  C++ / tier-based processing chain
● .root files: km3io
● Online Monitoring: km3mon
● Microservices: km3services
● DB: km3db
● Command line (cca004$> _):
    –   km3pipe
    –   km3cuts
    –   km3flux
    –   km3astro
    –   ...
C++ & PyROOT in KM3NeT
● Main KM3NeT codes (trigger, calibration, reconstruction) are C++
● ROOT6/PyROOT/cppyy allows use of the C++ codebase in Python
● For analysis & development: offline data format ROOT Classes
  designed with Python usage in mind from the start, e.g. printing.
● Offline framework that supports both C++ and Python user code
       –    providing e.g. user-friendly event-file reading
       –    C++ out of the box, with some pythonizations.
       –    Freedom to choose where to use Python and C++
● ‘Low-level’ C++ with ‘high-level’ Python scripting is commonly used,
  e.g. for summary file creation, astronomy searches and cascade
  reconstruction.

PyROOT perspective

km3io
● km3io provides an uproot/awkward front end for KM3NeT data access
  without (Py)ROOT
● Main dev: Tamás Gál
● Source: https://github.com/KM3NeT/km3io
● Standard KM3NeT file formats are ROOT based with custom classes
       –    “Online”: detector DAQ write out format
       –    “Offline”: format for MC simulation and event reconstruction data

● Reading data files was previously only possible via the PyROOT bindings
       –    requires a ROOT installation
       –    PyROOT is slower and needs more memory than uproot

● Optimised iterator behaviour to allow combined lazy-readings of multiple branches
● Individual number of particles and photosensor readings (hits)
  leads to uneven data structure → perfect match with awkward arrays
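
The match between uneven event data and awkward arrays comes from the storage layout: one flat buffer with all entries plus an offsets array that marks where each event starts and stops. The following is a minimal plain-Python sketch of that idea, not the awkward-array implementation; the helper names and the hit times are made up for illustration.

```python
# Jagged layout: one flat content buffer plus per-event offsets.
# Plain-Python illustration only; the helper names are hypothetical.

def to_jagged(events):
    """Flatten a list of variable-length lists into (content, offsets)."""
    content, offsets = [], [0]
    for entries in events:
        content.extend(entries)
        offsets.append(len(content))
    return content, offsets


def event(content, offsets, i):
    """Return the entries that belong to event i."""
    return content[offsets[i]:offsets[i + 1]]


def counts(offsets):
    """Number of entries per event, derived from the offsets alone."""
    return [stop - start for start, stop in zip(offsets, offsets[1:])]


# Made-up hit times (ns) for three events with 3, 0 and 2 hits.
hit_times = [[12.5, 13.1, 40.0], [], [7.2, 7.9]]
content, offsets = to_jagged(hit_times)

print(offsets)                     # [0, 3, 3, 5]
print(counts(offsets))             # [3, 0, 2]
print(event(content, offsets, 2))  # [7.2, 7.9]
```

An event without hits costs only one repeated offset, so no padding is needed for events of different size.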

THE PYTHON SOFTWARE ENVIRONMENT IN KM3NET - JOHANNES SCHUMANN (SPEAKER), TAMAS GAL ON BEHALF OF THE KM3NET COLLABORATION PYHEP CONFERENCE ...
June 22, 2021
[ ]:

   ]: import uproot
 [1]:
 [
      import km3io
 [ ]:
    uproot perspective
[12]: from km3net_testdata import data_path
 [ ]: data_fname = data_path("offline/km3net_offline.root")

[18]:
 [ ]: f = uproot.open(data_fname)

 [ ]: f["E"].show()
[19]:

[ ]: name                 | typename                 | interpretation
     ---------------------+--------------------------+-------------------------------
[ ]: Evt                  | Evt                      | AsGroup(u4')
     Evt/AAObject/usr_… | vector             | AsGroup(i4')
     Evt/det_id           | int32_t                  | AsDtype('>i4')
[ ]: Evt/mc_id            | int32_t                  | AsDtype('>i4')

[19]: f["E"].show()
                                      ...

                                              1
     name                      | typename                 | interpretation
     ---------------------+--------------------------+-------------------------------
     Evt                       | Evt                      | AsGroup(u4')                   8
     Evt/AAObject/usr_… | vector                  | AsGroup(
THE PYTHON SOFTWARE ENVIRONMENT IN KM3NET - JOHANNES SCHUMANN (SPEAKER), TAMAS GAL ON BEHALF OF THE KM3NET COLLABORATION PYHEP CONFERENCE ...
June 22, 2021| AsObjects(AsArray(True, Fal…
      Evt/mc_trks/mc_tr… | std::vector*
      Evt/mc_trks/mc_tr… | int32_t[]                   | AsJagged(AsDtype('>i4'))
 [1]: Evt/mc_trks/mc_tr…
       import uproot     | int32_t[]                   | AsJagged(AsDtype('>i4'))
      Evt/mc_trks/mc_tr…
       import km3io      | std::vector*        | AsObjects(AsArray(True, Fal…
      Evt/mc_trks/mc_tr… | std::vector*       | AsObjects(AsArray(True, Fal…

     uproot perspective
[12]: Evt/mc_trks/mc_tr… | std::vector*
       from km3net_testdata  import data_path          | AsObjects(AsArray(True, Fal…
      Evt/mc_trks/mc_tr… | std::string*                | AsObjects(AsArray(True, Fal…
       data_fname = data_path("offline/km3net_offline.root")
      Evt/comment           | TString                    | AsStrings()
      Evt/index             | int32_t
[18]: f = uproot.open(data_fname)                        | AsDtype('>i4')
      Evt/flags             | int32_t                    | AsDtype('>i4')
 [ ]:
[20]: f["E/Evt/hits"].keys()
 [ ]:
[20]: ['hits.id',
        'hits.dom_id',             ● Hit data stored in a general purpose class
 [ ]:
        'hits.channel_id',
        'hits.tdc',                  for DAQ and simulated hits
 [ ]:
        'hits.tot',
        'hits.trig',                          amplitude, pure amplitude, pure time,
                                               –
 [ ]:
        'hits.pmt_id',                        etc. are MC data values
 [ ]: 'hits.t',
        'hits.a',
 [ ]: 'hits.pos.x',
        'hits.pos.y',
 [ ]: 'hits.pos.z',
        'hits.dir.x',
        'hits.dir.y',
[19]: f["E"].show()
        'hits.dir.z',
      name
        'hits.pure_t',          | typename                 | interpretation
      ---------------------+--------------------------+-------------------------------
        'hits.pure_a',
      Evt
        'hits.type',            | Evt                      | AsGroup(u4')                   9
      Evt/AAObject/usr_… | vector                  | AsGroup(
THE PYTHON SOFTWARE ENVIRONMENT IN KM3NET - JOHANNES SCHUMANN (SPEAKER), TAMAS GAL ON BEHALF OF THE KM3NET COLLABORATION PYHEP CONFERENCE ...
't',                             June 22, 2021
       'tdc',
       'pos_x',
 [1]: import  uproot
       'pos_y',
      import  km3io
       'pos_z',
       'dir_x',
    uproot perspective
[12]: from km3net_testdata import data_path
       'dir_y',
      data_fname = data_path("offline/km3net_offline.root")
       'dir_z',
       'tot',
[18]: f = uproot.open(data_fname)
       'trig']
 [ ]:
[36]: evts = r.events[:3]
      evts.hits.channel_id[0,:5]
 [ ]:
[36]: 
 [ ]:
[ ]:
[ ]:
[ ]:
[ ]:

 [ ]:

 [ ]:

 [ ]:

[19]: f["E"].show()

     name                      | typename                 | interpretation
     ---------------------+--------------------------+-------------------------------
     Evt                       | Evt                      | AsGroup(u4')                   10
     Evt/AAObject/usr_… | vector                  | AsGroup(
km3io perspective
[22]: r = km3io.OfflineReader(data_fname)

[43]: print(r.events.keys())

     {'n_mc_tracks', 'det_id', 'n_hits', 'mc_run_id', 'w2list', 'trigger_mask', 'w',
     'flags', 'mc_id', 't_sec', 'tracks', 'mc_tracks', 'trigger_counter',
     'frame_index', 'mc_hits', 'index', 'trks', 't_ns', 'mc_trks', 'w3list',
     'comment', 'run_id', 'n_trks', 'n_mc_trks', 'id', 'usr_names', 'n_tracks',
     'overlays', 'n_mc_hits', 'hits', 'mc_t'}

[45]: r.events.hits.fields

[45]: ['id',
       'channel_id',
       'dom_id',
       't',
       'tdc',
       'pos_x',
       'pos_y',
       'pos_z',
       'dir_x',
       'dir_y',
       'dir_z',
       'tot',
       'trig']

● Only relevant fields are accessible → compare a, pure_a, pure_t, ...

[36]: evts = r.events[:3]
      evts.hits.channel_id[0,:5]
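
The difference between the two field lists can be pictured as a filter-and-rename step over the branch names. The sketch below is plain Python and does not show how km3io is implemented; `flatten_field_names` and the `EXCLUDED` set are hypothetical, with the excluded names taken from the fields that appear in the uproot listing but not in `r.events.hits.fields`.

```python
# Hypothetical helper, not part of km3io: turn uproot-style branch names
# into flat field names and hide the fields that are not exposed.

# Branch names as listed by f["E/Evt/hits"].keys() in the uproot example.
BRANCHES = [
    "hits.id", "hits.dom_id", "hits.channel_id", "hits.tdc", "hits.tot",
    "hits.trig", "hits.pmt_id", "hits.t", "hits.a",
    "hits.pos.x", "hits.pos.y", "hits.pos.z",
    "hits.dir.x", "hits.dir.y", "hits.dir.z",
    "hits.pure_t", "hits.pure_a", "hits.type",
]

# Names missing from r.events.hits.fields in the km3io example.
EXCLUDED = {"pmt_id", "a", "pure_t", "pure_a", "type"}


def flatten_field_names(branches, prefix="hits.", excluded=EXCLUDED):
    """Strip the prefix, replace '.' by '_' and drop excluded fields."""
    fields = []
    for name in branches:
        short = name[len(prefix):] if name.startswith(prefix) else name
        short = short.replace(".", "_")
        if short not in excluded:
            fields.append(short)
    return fields


fields = flatten_field_names(BRANCHES)
print(fields)
```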

km3pipe
● Multi-purpose framework based on the thepipe project
● Main devs: Tamás Gál, Johannes Schumann
● Source: https://github.com/KM3NeT/km3pipe
● Focus on pipeline workflow
● Interoperability functions to all relevant detector interfaces (also by
  utilising other km3py packages, e.g. km3io)
       –    Detector data (ROOT / ASCII / custom binary formats)
       –    DAQ network interface
       –    Database
● HDF5 output → Conversion between different file formats
● Benchmark tools: timers & performance statistics
● High performance computing → create and submit scripts to TORQUE
● Provenance tracking
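
Provenance tracking boils down to recording what ran, when, and in which environment. The following is a minimal standard-library sketch of such a record, not the km3pipe or thepipe implementation; the `Activity` class is hypothetical and its field names are modelled on the `km3pipe.Provenance()` JSON output shown later in these slides (the memory fields are left out).

```python
import json
import sys
import uuid
from datetime import datetime, timezone


def now_utc():
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


class Activity:
    """Hypothetical context manager that records one provenance entry."""

    def __init__(self, name, parent=None):
        self.record = {
            "uuid": str(uuid.uuid4()),
            "name": name,
            "parent_activity": parent,
            "child_activities": [],
            "start": {"time_utc": now_utc()},
            "stop": {},
            "system": {
                "executable": sys.executable,
                "arguments": list(sys.argv),
            },
        }

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Record the stop time even if the block raised an exception.
        self.record["stop"] = {"time_utc": now_utc()}
        return False


with Activity("pipeline") as activity:
    sum(range(1000))  # stand-in for the actual work

print(json.dumps([activity.record], indent=2))
```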
km3pipe
● Set up a simple pipeline:

[18]: pipe = km3pipe.Pipeline()

 [ ]: pipe.attach(km3pipe.io.online.EventPump, filename=data_fname)
      pipe.attach(km3modules.common.StatusBar, every=25)
      pipe.attach(km3pipe.calib.Calibration, filename=calib_fname)
      pipe.attach(EventHits)
      pipe.attach(EventHitsStatistic)
      pipe.attach(km3pipe.io.hdf5.HDF5Sink, filename="output.h5")

[20]: pipe.drain()

      Pipeline and module initialisation took 0.851s (CPU 0.498s).
      Number of Hits: 96
      Number of Hits: 124
      Number of Hits: 78
      ================================[ . ]================================
      Mean number of hits: 99.33333333333333
      2021-06-21 22:07:53 ++
      km3pipe.io.hdf5.HDF5Sink.HDF5Sink: HDF5 file written to: output.h5
      ============================================================
      3 cycles drained in 0.953132s (CPU 0.602786s). Memory peak: 247.59 MB
        wall mean: 0.029388s medi: 0.019961s min: 0.016951s max: 0.051252s std: 0.015509s
        CPU  mean: 0.030253s medi: 0.020248s min: 0.017112s max: 0.053400s std: …
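
The timing summary printed by `drain()` (mean, median, min, max and std for wall and CPU time) can be reproduced in spirit with the standard library. This is a minimal sketch, not the km3pipe timer code; `benchmark` and `summarise` are hypothetical helpers. The population standard deviation is used because it reproduces the wall-time std in the output above from its mean, median, min and max.

```python
import statistics
import time


def summarise(samples):
    """Summary statistics in the style of the drain() report."""
    return {
        "mean": statistics.mean(samples),
        "medi": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
        "std": statistics.pstdev(samples),  # population standard deviation
    }


def benchmark(func, cycles=5):
    """Call func repeatedly and collect wall and CPU time per cycle."""
    wall, cpu = [], []
    for _ in range(cycles):
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        func()
        wall.append(time.perf_counter() - wall_start)
        cpu.append(time.process_time() - cpu_start)
    return {"wall": summarise(wall), "CPU": summarise(cpu)}


stats = benchmark(lambda: sum(i * i for i in range(10_000)), cycles=5)
for kind, values in stats.items():
    print(kind, " ".join(f"{key}: {value:.6f}s" for key, value in values.items()))
```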
km3pipe
● Custom modules:

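[6]: import numpy as np
     import km3pipe

     def EventHits(blob):
         """Function module: print the number of hits of each event."""
         hits = blob["Hits"]
         print("Number of Hits: {}".format(len(hits)))
         return blob

     class EventHitsStatistic(km3pipe.Module):
         """Class module: collect hit counts and report their mean."""

         def configure(self):
             # Called once when the module is attached.
             self._hit_numbers = []

         def process(self, blob):
             # Called for every event; the blob must be passed on.
             hits = blob["Hits"]
             no_of_hits = len(hits)
             self._hit_numbers.append(no_of_hits)
             return blob

         def finish(self):
             # Called once after the last event.
             mean_no_hits = np.mean(self._hit_numbers)
             print("Mean number of hits: {}".format(mean_no_hits))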

km3pipe
● Run the pipeline:

[4]: pipe = km3pipe.Pipeline()

[5]: pipe.attach(km3pipe.io.online.EventPump, filename=data_fname)
     pipe.attach(km3modules.common.StatusBar, every=25)
     pipe.attach(km3pipe.calib.Calibration, filename=calib_fname)
     pipe.attach(EventHits)
     pipe.attach(EventHitsStatistic)
     pipe.attach(km3pipe.io.hdf5.HDF5Sink, filename="output.h5")

     ++ Detector: Parsing the DETX header
     ++ Detector: Reading PMT information…
     ++ Detector: Done.

[6]: pipe.drain()

     Pipeline and module initialisation took 2.011s (CPU 1.987s).
     Number of Hits: 96
     Number of Hits: 124
     Number of Hits: 78
     ================================[ . ]================================
     Mean number of hits: 99.33333333333333
     2021-06-22 15:39:30 ++
     km3pipe.io.hdf5.HDF5Sink.HDF5Sink: HDF5 file written to: output.h5
     ============================================================
     3 cycles drained in 2.861304s (CPU 2.829904s). Memory peak: 241.05 MB
       wall mean: 0.278396s medi: 0.018987s min: 0.018449s max: 0.797751s std: 0.367240s
       CPU  mean: 0.275658s medi: 0.019116s min: 0.018727s max: 0.789131s std: 0.363080s

[6]: Blob([('EventPump', None),
           ('StatusBar', None),
           ('Calibration', None),
           ('EventHitsStatistic', None),
           ('HDF5Sink', None)])
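
The life cycle used above (configure once, process once per event, finish at the end) can be illustrated with a toy pipeline. This is a plain-Python stand-in written for this purpose, not the km3pipe or thepipe code: all classes are hypothetical, the blob is a plain dict, and `drain` needs an explicit number of cycles.

```python
class Module:
    """Toy base class offering the three life-cycle hooks."""

    def __init__(self, **parameters):
        self.parameters = parameters
        self.configure()

    def configure(self):
        """Called once when the module is attached."""

    def process(self, blob):
        """Called once per cycle; must return the blob."""
        return blob

    def finish(self):
        """Called once after the last cycle."""


class FunctionModule(Module):
    """Wrap a plain function so it can be attached like a module."""

    def __init__(self, func):
        self.func = func
        super().__init__()

    def process(self, blob):
        return self.func(blob)


class Pipeline:
    def __init__(self):
        self.modules = []

    def attach(self, module, **parameters):
        # Classes are instantiated with their parameters, functions wrapped.
        if isinstance(module, type) and issubclass(module, Module):
            self.modules.append(module(**parameters))
        else:
            self.modules.append(FunctionModule(module))

    def drain(self, cycles):
        for _ in range(cycles):
            blob = {}  # a fresh blob travels through all modules
            for module in self.modules:
                blob = module.process(blob)
        for module in self.modules:
            module.finish()


class ToyPump(Module):
    """Source module: puts one list of hits into each blob."""

    def configure(self):
        self._events = iter(self.parameters["events"])

    def process(self, blob):
        blob["Hits"] = next(self._events)
        return blob


def event_hits(blob):
    print("Number of Hits: {}".format(len(blob["Hits"])))
    return blob


class HitCounter(Module):
    def configure(self):
        self.hit_numbers = []
        self.mean = None

    def process(self, blob):
        self.hit_numbers.append(len(blob["Hits"]))
        return blob

    def finish(self):
        self.mean = sum(self.hit_numbers) / len(self.hit_numbers)
        print("Mean number of hits: {}".format(self.mean))


pipe = Pipeline()
pipe.attach(ToyPump, events=[[1, 2], [3], [4, 5, 6]])  # made-up events
pipe.attach(event_hits)
pipe.attach(HitCounter)
pipe.drain(cycles=3)
counter = pipe.modules[-1]
```

Keeping state in `configure` and reporting in `finish` is what lets a class module compute run-level statistics, while a function module only sees one blob at a time.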

km3pipe
● Data provenance information:
[14]: print(km3pipe.Provenance().as_json(indent=2))

     [
         {
           "uuid": "d6b77a12-979b-4ff9-9263-9af00482e5b0",
           "name": "pipeline",
           "parent_activity": "5ef0be78-85d8-4efd-8605-38101689b0ff",
           "child_activities": [],
           "start": {
              "time_utc": "2021-06-22T13:39:27.878522+00:00",
              "peak_memory": 217.5
           },
           "stop": {
              "time_utc": "2021-06-22T13:39:30.752623+00:00",
              "peak_memory": 241.046875
           },
           "system": {
              "thepipe_version": "1.3.5",
              "executable": "/home/johannes/.pyenv/versions/3.9.5/bin/python",
              "arguments": [
                "/home/johannes/.pyenv/versions/3.9.5/lib/python3.9/site-packages/ipykernel_launcher.py",
                "-f",
                "/home/johannes/.local/share/jupyter/runtime/kernel-c70b0c7e-64ce-4718-b
km3buu
● Python-based wrapper for the GiBUU neutrino generator [6]
● Main dev: Johannes Schumann
● GiBUU overview:
       –    Monolithic application in Fortran 90
       –    Factorized νN interaction model:
             ● Primary interaction: Relativistic Fermi Gas with SUSA
               potential
             ● Final State Interactions: Propagation of phase space
               densities using the Boltzmann-Uehling-Uhlenbeck equation
       –    Binary output in ROOT file format → parsed using uproot
● KM3BUU runs GiBUU inside a container distributed via the
  KM3NeT Docker server
● Writing out to the KM3NeT data format is optional → requires PyROOT
km3buu
● Set up the simulation configuration (jobcard) and run it:

km3buu
● KM3NeT data format write out:

km3services
● Microservices API, e.g. for calculating oscillation probabilities
● Can be run on server or locally

 [1]: import numpy as np
      import matplotlib.pyplot as plt

 [2]: from km3services.oscprob import OscProb
      oscprob = OscProb()

[60]: n = 1000
      energies = np.logspace(-2, 1, n)
      cos_zenith = 0
      nue_pdgid = 12
      numu_pdgid = 14
      nutau_pdgid = 16

[61]: prob_ee = oscprob.oscillationprobabilities(nue_pdgid, nue_pdgid, energies, cos_zenith)
      prob_em = oscprob.oscillationprobabilities(nue_pdgid, numu_pdgid, energies, cos_zenith)
      prob_et = oscprob.oscillationprobabilities(nue_pdgid, nutau_pdgid, energies, cos_zenith)

[62]: plt.plot(energies, prob_ee)
      plt.plot(energies, prob_em)
      plt.plot(energies, prob_et)
      plt.grid()
      plt.xscale("log")

[Diagram: a Python program (cca004$> _) makes a km3services call with NumPy
arrays; the request travels as JSON over the REST API to a Docker container,
where NumPy arrays are handed to a library, e.g. OscProb.]
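
For orientation, the kind of quantity returned by such a service can be illustrated with the textbook two-flavour vacuum formula, P = sin²(2θ) · sin²(1.267 Δm² L / E), with Δm² in eV², L in km and E in GeV. This is only a sketch: it ignores matter effects and three-flavour mixing, it is not what OscProb or km3services compute, and the baseline and mixing values below are illustrative round numbers chosen for this example. The energy grid mirrors `np.logspace(-2, 1, n)` and is assumed to be in GeV.

```python
import math


def appearance_probability(energy_gev, baseline_km, sin2_2theta, dm2_ev2):
    """Two-flavour vacuum probability for a flavour change."""
    phase = 1.267 * dm2_ev2 * baseline_km / energy_gev
    return sin2_2theta * math.sin(phase) ** 2


def survival_probability(energy_gev, baseline_km, sin2_2theta, dm2_ev2):
    """Probability that the neutrino keeps its flavour."""
    return 1.0 - appearance_probability(
        energy_gev, baseline_km, sin2_2theta, dm2_ev2
    )


# Illustrative parameters (assumptions, not KM3NeT values).
BASELINE_KM = 1000.0
SIN2_2THETA = 0.9
DM2_EV2 = 2.5e-3

# Logarithmic energy grid from 10^-2 to 10^1, like np.logspace(-2, 1, n).
n = 1000
energies = [10 ** (-2 + 3 * i / (n - 1)) for i in range(n)]

prob_appear = [
    appearance_probability(e, BASELINE_KM, SIN2_2THETA, DM2_EV2)
    for e in energies
]
prob_survive = [
    survival_probability(e, BASELINE_KM, SIN2_2THETA, DM2_EV2)
    for e in energies
]

print(min(prob_survive), max(prob_appear))
```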
Additional km3py packages
● km3db:            Interface for the KM3NeT Oracle database which stores
                    information about detector hardware, calibration,
                    monitoring data, Q&A results etc.
● km3flux:          Parsing of flux tables with interpolation functionality
● km3astro:         Extension for astropy for celestial coordinate
                    transformations
● km3net-testdata:  Collection of all kinds of data formats which
                    can be utilised for unit testing
● rainbowalga:      GUI for animating the events and the light
                    distribution in the detector
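
As an illustration of what interpolating a flux table involves, the sketch below interpolates linearly in log-log space, which is exact for power-law spectra. The scheme is an assumption made for this example, since the slides do not say how km3flux interpolates; `interpolate_flux` and the table values are hypothetical.

```python
import bisect
import math


def interpolate_flux(energy, table_energies, table_fluxes):
    """Log-log linear interpolation in a table sorted by energy."""
    if not table_energies[0] <= energy <= table_energies[-1]:
        raise ValueError("energy outside the tabulated range")
    # Index of the table bin that contains the requested energy.
    i = bisect.bisect_right(table_energies, energy) - 1
    i = min(i, len(table_energies) - 2)
    x0, x1 = math.log(table_energies[i]), math.log(table_energies[i + 1])
    y0, y1 = math.log(table_fluxes[i]), math.log(table_fluxes[i + 1])
    weight = (math.log(energy) - x0) / (x1 - x0)
    return math.exp(y0 + weight * (y1 - y0))


# Made-up table following an E^-3 power law (arbitrary units).
table_energies = [1.0, 10.0, 100.0]
table_fluxes = [1.0, 1e-3, 1e-6]

flux = interpolate_flux(math.sqrt(10.0), table_energies, table_fluxes)
print(flux)
```

Interpolating the logarithms rather than the raw values avoids large errors between widely spaced table points of a steeply falling spectrum.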

Summary
● The km3py collection provides high compatibility with all interfaces of
  the detector and specific functionality
● Focus on data monitoring, pipelines and provenance
● Widely used tools and scripts in KM3NeT are based on the km3py
  environment
● Most of the packages are open source and distributed via PyPI
● Additionally, a Julia environment for KM3NeT is in the making
       –    Existing framework to process KM3NeT data files (ROOT,
            HDF5, binary, etc.) natively in Julia: NeRCA.jl
       –    Real-time event reconstruction for high-level detector monitoring
       –    Some general HEP Julia projects which originate from KM3NeT
            members: UnROOT.jl, Corpuscles.jl and Neurthino.jl

Thank you for your attention!
References
[1] S. Adrián-Martínez et al., ‘Letter of intent for KM3NeT 2.0’, J. Phys. G:
    Nucl. Part. Phys., vol. 43, no. 8, p. 084001, Jun. 2016,
    doi: 10.1088/0954-3899/43/8/084001.
[2] A. López-Oramas, ‘Multi-year Campaign of the Gamma-Ray Binary LS I
    +61° 303 and Search for VHE Emission from Gamma-Ray Binary
    Candidates with the MAGIC Telescopes’, 2015,
    doi: 10.13140/RG.2.1.4140.4969.
[3] U. F. Katz and Ch. Spiering, ‘High-energy neutrino astrophysics: Status
    and perspectives’, Progress in Particle and Nuclear Physics, vol. 67,
    no. 3, pp. 651–704, Jul. 2012, doi: 10.1016/j.ppnp.2011.12.001.
[4] https://what-if.xkcd.com/73/
[5] A. V. Akindinov et al., ‘Letter of interest for a neutrino beam from Protvino
    to KM3NeT/ORCA’, Eur. Phys. J. C, vol. 79, no. 9, p. 758, Sep. 2019,
    doi: 10.1140/epjc/s10052-019-7259-5.
[6] O. Buss et al., ‘Transport-theoretical description of nuclear reactions’,
    Physics Reports, vol. 512, no. 1, pp. 1–124, Mar. 2012,
    doi: 10.1016/j.physrep.2011.12.001.
