Invertible Networks for LHC Theory

Tilman Plehn

Universität Heidelberg

Würzburg, 4/2021
Briefest introduction ever

Neural network just a function
  – think fθ(x) just as f(x)
  – no parametrization, just very many values θ
  – θ-space the cool space   [latent space]

Construction through minimization   [sketch below]
  – define loss function L
  – minimize through task
  – evaluate x → f(x) in test/use case

LHC applications
  – regression: parton momentum from jet constituents,
                matrix element over phase space
  – classification: gluon/quark/bottom/top inside jet
  – generation: sample r → f(r)
    ...
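As a minimal sketch of this "define a loss, minimize, evaluate" workflow, the toy regression below uses PyTorch; the data, architecture, and hyperparameters are illustrative placeholders, not any of the LHC applications listed above.

import torch
import torch.nn as nn

# toy regression standing in for an LHC task (data, network, and hyperparameters
# are illustrative placeholders)
x = torch.randn(1000, 4)
y = x.sum(dim=1, keepdim=True)

# f_theta(x): just a function with very many parameters theta
f = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))

loss_fn = nn.MSELoss()                           # define loss function L
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

for epoch in range(200):                         # minimize through task
    opt.zero_grad()
    loss = loss_fn(f(x), y)
    loss.backward()
    opt.step()

x_test = torch.randn(5, 4)
print(f(x_test))                                 # evaluate x -> f(x) in test/use case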
Challenges towards HL-LHC

Paradigm shift: model searches −→ fundamental understanding of data
  – precision QCD
  – precision simulations
  – precision measurements
 ⇒ Nothing fundamental without simulations   [not even unsupervised...]

[Figure: ATLAS Preliminary, 2020 Computing Model, CPU in 2030, Aggressive R&D:
 projected CPU fractions for Data Proc, MC-Full(Sim), MC-Full(Rec), MC-Fast(Sim),
 MC-Fast(Rec), EvGen, Heavy Ions, Data Deriv, MC Deriv, Analysis]
10-year HL-LHC requirements
  – simulated event numbers ∼ expected events   [factor 25 for HL-LHC]
  – general move to NLO/NNLO   [1%-2% error]
  – higher relevant multiplicities   [jet recoil, extra jets, WBF, etc.]
  – new low-rate high-multiplicity backgrounds
  – cutting-edge predictions not through generators   [N3LO in Pythia?]

Three ways to use ML
  – improve current tools: iSherpa, ML-MadGraph, etc.
  – new tools: ML generator networks
  – conceptual ideas in theory simulations and analyses
Generative networks

GANGogh   [Bonafilia, Jones, Danyluk (2017)]
  – neural network: learned function f(x)   [regression, classification]
  – can networks create new pieces of art?
    map random numbers to image pixels?
  – train on 80,000 pictures   [organized by style and genre]
  – generate flowers, portraits

Edmond de Belamy   [Caselles-Dupre, Fautrel, Vernier (2018)]
  – trained on 15,000 portraits
  – sold for $432,500
 ⇒ ML all marketing and sales

Jet portraits   [de Oliveira, Paganini, Nachman (2017)]
  – calorimeter or jet images,
    sparsity the technical challenge
  (1) reproduce valid jet images from training data
  (2) organize them by QCD vs W-decay jets
  – high-level observables m, τ21 as check
 ⇒ GANs generating QCD jets
GAN algorithm

Generating events   [phase space positions, possibly with weights]
  – training: true events {xT}
    output: generated events {r} → {xG}
  – discriminator constructing D(x) by minimizing   [classifier D(x) = 1, 0 true/generator]

      LD = ⟨ −log D(x) ⟩_xT + ⟨ −log(1 − D(x)) ⟩_xG

  – generator constructing r → xG by minimizing   [D needed]

      LG = ⟨ −log D(x) ⟩_xG

  – equilibrium D = 0.5  ⇒  LD/2 = LG = −log 0.5
 ⇒ statistically independent copy of training events   [training-step sketch below]

Generative network studies
  – Jets   [de Oliveira (2017), Carrazza-Dreyer (2019)]
  – Detector simulations   [Paganini (2017), Musella (2018), Erdmann (2018), Ghosh (2018), Buhmann (2020, 2021)]
  – Events   [Otten (2019), Hashemi, DiSipio, Butter (2019), Martinez (2019), Alanazi (2020), Chen (2020), Kansal (2020)]
  – Unfolding   [Datta (2018), Omnifold (2019), Bellagente (2019), Bellagente (2020), Vandegar (2020), Howard (2020)]
  – Templates for QCD factorization   [Lin (2019)]
  – EFT models   [Erbin (2018)]
  – Event subtraction   [Butter (2019)]
  – Phase space   [Bothmann (2020), Gao (2020), Klimek (2020)]
  – Basics   [GANplification (2020), DCTR (2020)]
  – Unweighting   [Verheyen (2020), Backes (2020)]
  – Superresolution   [DiBello (2020), Baldi (2020)]
  – Parton densities   [Carrazza (2021)]
  – Uncertainties   [Bellagente (2021)]
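A minimal PyTorch sketch of one GAN training step implementing the two losses above; the network sizes, the 18-dimensional phase-space assumption, and the optimizer settings are illustrative choices, not the published architecture.

import torch
import torch.nn as nn

# toy dimensions: latent noise and phase-space vector
dim_r, dim_x = 8, 18

G = nn.Sequential(nn.Linear(dim_r, 128), nn.ReLU(), nn.Linear(128, dim_x))
D = nn.Sequential(nn.Linear(dim_x, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-4)

def training_step(x_true):
    batch = x_true.shape[0]
    r = torch.randn(batch, dim_r)

    # discriminator: L_D = <-log D(x)>_{xT} + <-log(1 - D(x))>_{xG}
    x_gen = G(r).detach()
    loss_D = -(torch.log(D(x_true)).mean() + torch.log(1.0 - D(x_gen)).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # generator: L_G = <-log D(x)>_{xG}
    loss_G = -torch.log(D(G(r))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()

# usage: one step on a batch of "true" events (toy data here);
# at equilibrium D -> 0.5, so L_D/2 = L_G = -log 0.5
x_batch = torch.randn(512, dim_x)
print(training_step(x_batch))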
GANplification

Gain beyond training data   [Butter, Diefenbacher, Kasieczka, Nachman, TP]
  – true function known,
    compare GAN vs sampling vs fit
  – quantiles with χ²-values   [sketch below]
  – fit like 500-1000 sampled points
    GAN like 500 sampled points   [amplification factor 5]
    requiring 10,000 GANned events
  – interpolation and resolution the key   [NNPDF]
 ⇒ GANs beyond training data

[Figure: density p(x) in 10 quantiles for a GAN trained on 100 data points, compared with truth, fit, and sample]
[Figure: quantile MSE vs number of GANed events (50 quantiles, 100 data points) for GAN, sample, and fit]
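The quantile comparison can be set up in simplified form as below: bin generated points into equal-probability quantiles of the known truth density and compare occupancies. This is a sketch of the idea, not the exact procedure of the paper; the quantile_mse helper, the Gaussian truth, and the sample size are assumptions.

import numpy as np
from scipy.stats import norm

# Sketch: compare a sample against a known truth density by binning into
# equal-probability quantiles of the truth (hypothetical helper).
def quantile_mse(samples, truth_ppf, n_quantiles=10):
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    probs[0], probs[-1] = 1e-9, 1.0 - 1e-9       # avoid infinite outer bin edges
    edges = truth_ppf(probs)                     # equal-probability bin edges
    counts, _ = np.histogram(samples, bins=edges)
    observed = counts / len(samples)
    expected = 1.0 / n_quantiles                 # each quantile should hold equal probability
    return np.mean((observed - expected) ** 2)

# toy example: Gaussian truth, 500 sampled points
sample = np.random.normal(size=500)
print(quantile_mse(sample, norm.ppf, n_quantiles=10))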
How to GAN LHC events

Idea: replace ME for hard process   [Butter, TP, Winterhalder]
  – medium-complex final state t t̄ → 6 jets
    t/t̄ and W± on-shell with BW, 6 × 4 = 18 dof
    on-shell external states → 12 dof   [constants hard to learn]
    parton level, because it is harder
  – flat observables flat   [phase space coverage okay]
  – standard observables with tails   [statistical error indicated]
  – improved resolution   [1M training events, 10M/50M generated events]
  – Forward simulation working

[Figure: normalized pT,t distribution, True vs GAN, with GAN/True ratio and the relative statistical error in the 20% tail region]
[Figures: φj1–φj2 correlation for 1M true events compared with 10M and 50M generated events]
Bonus: unweighting & errors without binning

Gaining from unweighting   [Butter, TP, Winterhalder]
  – phase space sampling: weighted events   [PS weight ×|M|²]
    events: constant weights
  – unweighting the weak spot of standard MC   [sketch below]
  – learn phase space patterns   [density estimation]
    generate unweighted events   [through loss]

[Figure: 1/σ dσ/dEµ− vs Eµ− for the weighted training sample, unweighted events, and the uwGAN, with ratio to truth]
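For reference, the standard hit-or-miss unweighting that the slide calls the weak spot of Monte Carlo looks roughly like this; the toy phase-space points and weight function are placeholders.

import numpy as np

# Sketch of hit-or-miss unweighting: a weighted event is kept with probability w / w_max.
rng = np.random.default_rng(0)

events = rng.uniform(size=(100000, 2))          # stand-in phase-space points
weights = np.exp(-5.0 * events[:, 0])           # stand-in for PS weight x |M|^2

keep = rng.uniform(size=len(weights)) < weights / weights.max()
unweighted = events[keep]

# the efficiency <w>/w_max collapses when rare points carry large weights
print("unweighting efficiency:", keep.mean())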
Events with error bars   [Bellagente, Haußmann, Luchmann, TP]

  (1) learn phase space density as usual
  (2) learn error from weight distributions   [Bayesian network]
  – generate events with error bars   [sketch below]

[Figure: normalized Ee+ distribution, BINN vs training data with ±σpred band and BINN/data ratio]
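A rough sketch of the error-bar idea under the assumption that the Bayesian network can be mimicked by repeated weight sampling; here Monte-Carlo dropout stands in for the published BINN, which this is not. The generator, observable, and binning are toy placeholders.

import numpy as np
import torch
import torch.nn as nn

# sample the generator weights repeatedly and take the per-bin spread as the uncertainty
gen = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Dropout(0.1), nn.Linear(64, 1))
gen.train()                                     # keep dropout active while sampling

bins = np.linspace(0.0, 300.0, 31)
histograms = []
for _ in range(50):                             # one effective weight sample per pass
    r = torch.randn(10000, 4)
    e = gen(r).detach().numpy().ravel() * 50.0 + 150.0   # toy "energy" observable
    histograms.append(np.histogram(e, bins=bins, density=True)[0])

central = np.mean(histograms, axis=0)           # prediction per bin
sigma_pred = np.std(histograms, axis=0)         # error bar per bin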
How to GAN away detector effects

Goal: invert Monte Carlo   [Bellagente, Butter, Kasieczka, TP, Winterhalder]
  – parton shower, detector simulation typical examples   [drawing random numbers]
  – inversion possible, in principle   [entangled convolutions, model assumed]
  – GAN task
                DELPHES            GAN
      partons     −→      detector  −→  partons
 ⇒ Full phase space unfolded

Conditional GAN   [sketch below]
  – random numbers −→ parton level
    hadron level as condition
    matched event pairs
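A minimal sketch of the conditional setup: the generator receives random numbers plus the detector-level event as a condition and returns a parton-level candidate, while the discriminator judges matched (parton, detector) pairs. The dimensions and layer sizes are illustrative assumptions, not the published network.

import torch
import torch.nn as nn

dim_r, dim_det, dim_part = 8, 16, 12

G = nn.Sequential(nn.Linear(dim_r + dim_det, 256), nn.ReLU(),
                  nn.Linear(256, dim_part))
D = nn.Sequential(nn.Linear(dim_part + dim_det, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())

def generate(x_det):
    r = torch.randn(x_det.shape[0], dim_r)       # fresh random numbers per event
    return G(torch.cat([r, x_det], dim=1))       # parton-level proposal

def disc_score(x_part, x_det):
    return D(torch.cat([x_part, x_det], dim=1))  # score of a matched event pair

x_det = torch.randn(4, dim_det)
print(generate(x_det).shape)                     # (4, dim_part)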
Detector unfolding

Reference process pp → ZW → (ℓℓ) (jj)
  – broad jj mass peak
    narrow ℓℓ mass peak
    modified 2 → 2 kinematics
    fun phase space boundaries
  – GAN same as event generation   [with MMD]

Model (in)dependence

[Figure: unfolded 1/σ dσ/dpT,j1 and 1/σ dσ/dmjj, FCGAN vs Truth vs Delphes, with FCGAN/Truth ratio panels]

  – detector-level cuts   [14%, 39% events, no interpolation, MMD not conditional]

      pT,j1 = 30 ... 50 GeV,  pT,j2 = 30 ... 40 GeV,  pT,ℓ− = 20 ... 50 GeV   (12)
      pT,j1 > 60 GeV                                                          (13)

[Figure: unfolded pT,j1 and mjj for the samples cut according to Eq.(12) and Eq.(13), compared with Truth and FCGAN]

  – model dependence   [thank you to BenN]
  – train: SM events
    test: 10% events with W′ in s-channel
 ⇒ Working fine, but ill-defined

[Figure: 1/σ dσ/dmℓℓjj for the injected sample, FCGAN unfolding vs Truth (W′) and Truth (SM)]
Proper inverting

Invertible networks   [Bellagente, Butter, Kasieczka, TP, Rousselot, Winterhalder, Ardizzone, Köthe]
  – network as bijective transformation — normalizing flow
    Jacobian tractable   [specifically: coupling layer, sketch below]
    evaluation in both directions — INN   [Ardizzone, Rother, Köthe]
  – standard setup, random-number-padded, working like FCGAN
  – conditional: parton-level events from {r}
  – maximum likelihood loss

      L = − ⟨ log p(θ | xp, xd) ⟩_xp,xd
        = − ⟨ log p(g(xp, xd)) + log | ∂g(xp, xd)/∂xp | ⟩_xp,xd − log p(θ) + const.
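To make the coupling-layer and maximum-likelihood ingredients concrete, here is a minimal, unconditional sketch in PyTorch: toy dimensions, a single affine coupling block, and a unit-Gaussian latent density. The published cINN additionally conditions on the detector-level event and uses a Bayesian network for θ; none of that is reproduced here.

import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * (dim - self.half)))

    def forward(self, x):                        # x -> z, tractable Jacobian
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=1), s.sum(dim=1)   # z, log|det dz/dx|

    def inverse(self, z):                        # z -> x, same network, inverted algebra
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(z1).chunk(2, dim=1)
        return torch.cat([z1, (z2 - t) * torch.exp(-s)], dim=1)

flow = AffineCoupling(dim=4)

def nll(x):                                      # L = -<log p(x)> via change of variables
    z, log_det = flow(x)
    log_pz = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.shape[1] * math.log(2 * math.pi)
    return -(log_pz + log_det).mean()

Minimizing nll over events trains the same weights that afterwards run in the inverse direction to generate or unfold.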
Again pp → ZW → (ℓℓ) (jj)
  – performance on distributions like FCGAN
  – parton-level probability distribution for single detector event
 ⇒ Well-defined statistical inversion

[Figure: pT,q1 distribution from 3200 unfoldings of a single detector event, cINN and eINN vs FCGAN, with the parton truth marked]
[Figure: calibration, fraction of events vs quantile of pT,q1, for cINN, eINN, and FCGAN]
Inverting to hard process

What theorists want: undo ISR
  – detector-level process pp → ZW + jets   [variable number of objects]
  – ME vs PS jets decided by network
  – training jet-inclusively or jet-exclusively,
    parton-level hard process chosen 2 → 2

[Figure: unfolded 1/σ dσ/dpT,q1 for 2-jet inclusive, 2-jet, 3-jet, and 4-jet exclusive samples, Parton cINN vs Parton Truth and Detector Truth, with ratio panels]

Towards systematic inversion
  – detector unfolding known problem
  – QCD parton from jet algorithm standard
  – jet radiation possible
 ⇒ Invertible simulation in reach
Invertible
  Networks     Inverting to QCD
Tilman Plehn
                cINN for inference     [Bieringer, Butter, Heimel, Höche, Köthe, TP, Radev]
Simulations

Events            – condition jets with QCD parameters
Unfolding           train     model parameters −→ Gaussian latent space
Inverting
                    test      Gaussian sampling −→ QCD parameter measurement
Measurements      – going beyond CA vs CF               [Kluth etal]
                                "                                                               #
                                       2z(1 − y)
                     Pqq = CF   Dqq                    + Fqq (1 − z) + Cqq yz(1 − z)
                                      1 − z(1 − y )
                                                                                        !
                                           z(1 − y)               (1 − z)(1 − y )
                                                                                                                            
                     Pgg = 2CA Dgg                         +                                + Fgg z(1 − z) + Cgg yz(1 − z)
                                    1 − z(1 − y)    1 − (1 − z)(1 − y )
                             h                                  i
                                   2          2
                     Pgq = TR Fqq z + (1 − z)     + Cgq yz(1 − z)
Invertible
  Networks     Inverting to QCD
Tilman Plehn
                cINN for inference     [Bieringer, Butter, Heimel, Höche, Köthe, TP, Radev]
Simulations

Events
                  – condition jets with QCD parameters
Unfolding
                    train     model parameters −→ Gaussian latent space
                    test      Gaussian sampling −→ QCD parameter measurement
Inverting

Measurements
                  – going beyond CA vs CF               [Kluth etal]
                                "                                                                       #
                                       2z(1 − y)
                     Pqq = CF   Dqq                    + Fqq (1 − z) + Cqq yz(1 − z)
                                      1 − z(1 − y )
                                                                                            !
                                           z(1 − y)               (1 − z)(1 − y )
                                                                                                                                                    
                     Pgg = 2CA Dgg                         +                                    + Fgg z(1 − z) + Cgg yz(1 − z)
                                    1 − z(1 − y)    1 − (1 − z)(1 − y )
                             h                                 i                                      σ =0.9
                                   2          2
                     Pgq = TR Fqq z + (1 − z)     + Cgq yz(1 − z) 0.4                                                       Posterior
                                                                                  0.3                                       Gaussian fit
                  – idealized shower        [Sherpa]                              0.2                                       Absolute error of 2.5

                                                                                  0.1
                                                                                                            Cqq
                                                                                  -2.5          0           2.5
                                                                                  Cqq                             0.08              σ =5.6
                                                                                  2.5
                                                                                                                  0.06

                                                                                  0                               0.04

                                                                                                                  0.02
                                                                                  -2.5
                                                                                                            Cgg                        Cgg
                                                                                      -10           0       10        -10       0      10
                                                                                  Cqq                             Cgg                        0.08            σ =4.9
                                                                                  2.5
                                                                                                                                             0.06
                                                                                                                  0
                                                                                  0                                                          0.04
                                                                                                                  -10                        0.02
                                                                                  -2.5
                                                                                                            Cgq                        Cgq                      Cgq
                                                                                      -10           0        10       -10       0      10      -10       0       10
Invertible
  Networks     Inverting to QCD
Tilman Plehn
                cINN for inference       [Bieringer, Butter, Heimel, Höche, Köthe, TP, Radev]
Simulations

Events
                  – condition jets with QCD parameters
Unfolding
                    train     model parameters −→ Gaussian latent space
                    test      Gaussian sampling −→ QCD parameter measurement
Inverting

Measurements
    – going beyond CA vs CF       [Kluth et al]

      Pqq = CF [ Dqq 2z(1 − y) / (1 − z(1 − y)) + Fqq (1 − z) + Cqq y z(1 − z) ]

      Pgg = 2CA [ Dgg ( z(1 − y) / (1 − z(1 − y)) + (1 − z)(1 − y) / (1 − (1 − z)(1 − y)) ) + Fgg z(1 − z) + Cgg y z(1 − z) ]

      Pgq = TR [ Fgq (z² + (1 − z)²) + Cgq y z(1 − z) ]

      [a code sketch of this parametrization follows after this slide]

    – idealized shower        [Sherpa]

    – reality hitting...

    – More ML-opportunities...

      [Figure: posterior distributions with Gaussian fits for Dqq (σ = 0.06), Fqq (σ = 0.2), Cqq (σ = 2.3); legend: relative error of 2%, absolute error of 2.5]
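  The parametrization above translates into a few lines of Python. This is only an
  illustrative sketch, not code from the paper: parameter names follow the slide, and the
  check that the default values D = F = 1, C = 0 reproduce the familiar DGLAP kernels in
  the y → 0 limit is added here purely as a sanity test.

    # Sketch of the parametrized splitting kernels shown above (illustrative only).
    # Defaults D = F = 1, C = 0 reduce to the standard massless DGLAP kernels at y -> 0.
    CF, CA, TR = 4.0 / 3.0, 3.0, 0.5

    def P_qq(z, y, Dqq=1.0, Fqq=1.0, Cqq=0.0):
        return CF * (Dqq * 2 * z * (1 - y) / (1 - z * (1 - y))
                     + Fqq * (1 - z) + Cqq * y * z * (1 - z))

    def P_gg(z, y, Dgg=1.0, Fgg=1.0, Cgg=0.0):
        soft = (z * (1 - y) / (1 - z * (1 - y))
                + (1 - z) * (1 - y) / (1 - (1 - z) * (1 - y)))
        return 2 * CA * (Dgg * soft + Fgg * z * (1 - z) + Cgg * y * z * (1 - z))

    def P_gq(z, y, Fgq=1.0, Cgq=0.0):
        return TR * (Fgq * (z**2 + (1 - z)**2) + Cgq * y * z * (1 - z))

    # cross-check against the familiar DGLAP forms in the y -> 0 limit
    z = 0.3
    assert abs(P_qq(z, 0.0) - CF * (1 + z**2) / (1 - z)) < 1e-12
    assert abs(P_gg(z, 0.0) - 2 * CA * (z / (1 - z) + (1 - z) / z + z * (1 - z))) < 1e-12
    assert abs(P_gq(z, 0.0) - TR * (z**2 + (1 - z)**2)) < 1e-12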
Machine learning for LHC theory

  Machine learning for the LHC

    – Classification/regression standard
      learning vs smart pre-processing
      uncertainties?
      experimental realities?

    – GANs the cool kid
      generator trying to produce the best events
      discriminator trying to catch the generator

    – INNs my theory hope
      flow networks for control
      condition for inversion
      Bayesian for errors
      [coupling-block sketch below]

    – Progress needs crazy ideas
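  As a rough illustration of what "flow networks for control, condition for inversion" means
  in practice, here is a minimal sketch of one conditional affine coupling block, the building
  unit of a cINN: invertible in closed form, with the condition (e.g. jet observables) fed only
  into the subnetwork. This is not the network used in the papers; the dimensions, the
  jet-summary condition and the tanh clamping are illustrative choices.

    # Minimal conditional affine coupling block (illustrative sketch, PyTorch).
    import torch
    import torch.nn as nn

    class ConditionalCoupling(nn.Module):
        def __init__(self, dim, cond_dim, hidden=64):
            super().__init__()
            self.d = dim // 2
            # the subnet predicts scale and shift for the second half of the input
            # from the first half plus the condition; it is never inverted itself
            self.subnet = nn.Sequential(
                nn.Linear(self.d + cond_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * (dim - self.d)),
            )

        def forward(self, x, cond):
            x1, x2 = x[:, :self.d], x[:, self.d:]
            s, t = self.subnet(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
            s = torch.tanh(s)                      # keep scales bounded for stability
            z2 = x2 * torch.exp(s) + t
            return torch.cat([x1, z2], dim=1), s.sum(dim=1)   # output and log|det J|

        def inverse(self, z, cond):
            z1, z2 = z[:, :self.d], z[:, self.d:]
            s, t = self.subnet(torch.cat([z1, cond], dim=1)).chunk(2, dim=1)
            s = torch.tanh(s)
            return torch.cat([z1, (z2 - t) * torch.exp(-s)], dim=1)

    # training direction: QCD parameters -> Gaussian latent, conditioned on jets;
    # test direction: sample the Gaussian and invert to get a parameter posterior
    block = ConditionalCoupling(dim=4, cond_dim=8)
    params = torch.randn(16, 4)     # stand-in for (Dqq, Fqq, Cqq, ...)
    jets = torch.randn(16, 8)       # stand-in for jet summary observables
    z, log_det = block(params, jets)
    print(torch.allclose(block.inverse(z, jets), params, atol=1e-5))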
Bonus: subtraction

  Subtract samples without binning       [Butter, TP, Winterhalder]

    – statistical uncertainty

      ∆B−S = √( ∆B² + ∆S² ) > max(∆B , ∆S )

    – GAN setup: differential class label, sample normalization

    – toy example       [binned baseline sketched in code at the end of this section]

      PB (x) = 1/x + 0.1       PS (x) = 1/x       ⇒       PB−S = 0.1

      [Figure: toy example, GAN vs Truth: P(x) for B, S, B − S (left) and (B − S)GAN vs (B − S)Truth ± 1σ (right), for x = 0 ... 200]
    – event-based background subtraction       [weird notation, sorry]

      pp → e+e−   (B)       pp → γ → e+e−   (S)       ⇒       pp → Z → e+e−   (B−S)

      [Figure: GAN vs Truth, dσ/dEe [pb/GeV] for B, S and B − S over Ee = 20 ... 100 GeV]
    – collinear subtraction       [assumed non-local]

      pp → Zg       (B: matrix element, S: collinear approximation)

      [Figure: GAN vs Truth, dσ/dpT,g [pb/GeV] for B, S and B − S over pT,g = 0 ... 100 GeV]
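  To see why the binned route is limited, here is a small numpy sketch of the baseline the
  inequality above describes: histogram the toy B and S samples, subtract bin by bin, and the
  Poisson errors add in quadrature. It is not the GAN subtraction; the range, binning and
  "luminosity" are assumptions made only for this illustration.

    # Binned subtraction of the toy example (illustrative sketch, not the GAN method).
    import numpy as np

    rng = np.random.default_rng(0)
    xmin, xmax = 1.0, 200.0
    lumi = 4000.0                                # arbitrary setting for the statistics

    def sample_one_over_x(n):
        # inverse-CDF sampling from p(x) ~ 1/x on [xmin, xmax]
        return xmin * (xmax / xmin) ** rng.random(n)

    def sample_background(n):
        # p(x) ~ 1/x + 0.1: mixture of a 1/x piece and a flat piece
        w_log, w_flat = np.log(xmax / xmin), 0.1 * (xmax - xmin)
        flat = rng.random(n) < w_flat / (w_log + w_flat)
        return np.where(flat, rng.uniform(xmin, xmax, n), sample_one_over_x(n))

    int_S = np.log(xmax / xmin)                  # integral of 1/x
    int_B = int_S + 0.1 * (xmax - xmin)          # integral of 1/x + 0.1
    S = sample_one_over_x(rng.poisson(lumi * int_S))
    B = sample_background(rng.poisson(lumi * int_B))

    bins = np.linspace(xmin, xmax, 40)
    width = bins[1] - bins[0]
    nB, _ = np.histogram(B, bins)
    nS, _ = np.histogram(S, bins)

    diff = (nB - nS) / (lumi * width)            # estimate of PB - PS, should be ~ 0.1
    err = np.sqrt(nB + nS) / (lumi * width)      # Delta_{B-S}: Poisson errors in quadrature
    print("mean B - S estimate: %.3f (target 0.1)" % diff.mean())
    print("median relative uncertainty: %.0f%%" % (100 * np.median(err / 0.1)))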