Observational Methods - Lecture 9: Data Analysis WS 2020/21 Prof Joe Mohr, LMU
Observational Methods
Lecture 9: Data Analysis
Prof Joe Mohr, LMU
WS 2020/21
Observational Methods - Lecture 9, 12. Feb 2021

Summary
• Poisson and Gaussian Noise
• Fitting Data and Confidence Intervals
• Mock Datasets
Poisson Noise
• A source of constant flux has a fixed probability per unit time of a photon arriving
  o The Poisson distribution describes the probability of observing x photons given that the expectation is µ = ⟨x⟩:

    P(x | µ) = (µ^x / x!) e^(−µ)

  o At low expectation there is significant asymmetry in the Poisson distribution
  o The probability of detecting 0 photons is still significant if the expectation is 3 or 4
• Poisson noise generally
termed “sampling noise”
o Applied broadly to physical
processes
[Figure: Poisson distributions for µ = 3, 6, 10.3]
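A quick Python sketch of the distribution, confirming the bullet about low expectations:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Probability of observing x events when the expectation is mu."""
    return mu**x * exp(-mu) / factorial(x)

# At mu = 3 or 4 the chance of detecting zero photons is still significant.
p0_mu3 = poisson_pmf(0, 3.0)   # ~5%
p0_mu4 = poisson_pmf(0, 4.0)   # ~2%
```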
Normal or Gaussian Distribution
• The Gaussian or normal distribution is described as:

    P(x | µ, σ) = (1 / (√(2π) σ)) e^(−(1/2)((x−µ)/σ)²)

  o where the expectation is µ = ⟨x⟩ and the FWHM is 2.354σ
  o 68% of the probability lies within ±1σ of the mean
  o Note the normalization: the distribution integrates to 1
• The distribution is symmetric about the mean µ
  o 68% of the integral lies within µ ± σ
  o 95% within µ ± 2σ
  o 99.7% within µ ± 3σ
  o 6.3×10^−5 outside µ ± 4σ
  o 5.7×10^−7 outside µ ± 5σ
  o 2.0×10^−9 outside µ ± 6σ
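These tail fractions follow directly from the error function; a minimal sketch:

```python
from math import erf, erfc, sqrt

def frac_within(k):
    """Fraction of Gaussian probability within +/- k sigma of the mean."""
    return erf(k / sqrt(2.0))

def frac_outside(k):
    """Fraction of Gaussian probability outside +/- k sigma (both tails)."""
    return erfc(k / sqrt(2.0))
```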
Poisson Noise in Gaussian Limit
• In the limit that µ > 10 the Poisson distribution can be approximated as a Gaussian or normal distribution with mean µ and σ = √µ
• Out in the tails of the distribution the differences are (much) larger.
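The quality of the approximation can be checked numerically (a sketch at µ = 20):

```python
from math import exp, factorial, pi, sqrt

def poisson_pmf(x, mu):
    return mu**x * exp(-mu) / factorial(x)

def gaussian_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sqrt(2.0 * pi) * sigma)

mu = 20.0
sigma = sqrt(mu)                # Gaussian approximation: sigma = sqrt(mu)
# Near the peak the two agree to better than a percent ...
peak_ratio = poisson_pmf(20, mu) / gaussian_pdf(20.0, mu, sigma)
# ... but ~4.5 sigma out in the tail they differ by a large factor.
tail_ratio = poisson_pmf(40, mu) / gaussian_pdf(40.0, mu, sigma)
```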
Variance and Standard Deviation
• The width of a distribution indicates the range of values obtained within a set of measurements
• The variance σ² is the mean squared deviation from the mean. For an ensemble of n points drawn from a distribution the variance can be written

    σ² ≡ (1/n) Σ_{i=1..n} (x_i − µ)²    (1)

    σ² ≡ (1/(n−1)) Σ_{i=1..n} (x_i − µ_meas)²    (2)

  o Slight difference when the mean is extracted from the dataset (see Eq. 2)
  o For a probability distribution P(x), rather than a sample drawn from the distribution, one calculates the variance as in Eq. 3:

    σ² ≡ ∫_{−∞}^{+∞} dx (x − µ)² P(x)    (3)

    µ ≡ ∫_{−∞}^{+∞} dx x P(x)

• The standard deviation σ is the square root of the variance
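The two sample estimators (Eqs. 1 and 2) can be sketched in a single function:

```python
def variance(data, mean=None):
    """Mean squared deviation from the mean.

    If the true mean is known, pass it in and divide by n (Eq. 1 above).
    If the mean is estimated from the data itself, divide by n - 1
    instead (Bessel's correction, Eq. 2)."""
    if mean is None:
        mean = sum(data) / len(data)   # mean extracted from the dataset
        norm = len(data) - 1
    else:
        norm = len(data)               # true mean known externally
    return sum((x - mean) ** 2 for x in data) / norm
```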
Gaussian Distribution
• The variance of the Gaussian distribution is in fact the σ² that appears in the distribution:

    σ² ≡ ∫_{−∞}^{+∞} dx (x − µ)² P(x)

    σ² = ∫_{−∞}^{+∞} dx (x − µ)² (1/(√(2π) σ_w)) e^(−(1/2)((x−µ)/σ_w)²)

    σ² = σ_w²
• The standard deviation
of the Gaussian sets the
standard for discussing
measurement
significance
Skewness and Kurtosis
• In addition to the first moment (mean) and second moment (variance) of a distribution P(x), the third moment (skewness) and fourth moment (kurtosis) are also often valuable

    Skewness ≡ (1/σ³) ∫_{−∞}^{+∞} dx (x − µ)³ P(x)

    Kurtosis ≡ (1/σ⁴) ∫_{−∞}^{+∞} dx (x − µ)⁴ P(x) − 3

• Skewness measures the asymmetry of the distribution about the mean
• Kurtosis measures the extent of a distribution around the mean with respect to the Gaussian distribution
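The integrals above have sample analogues; a small sketch (standardizing by σ³ and σ⁴, so a Gaussian gives skewness 0 and kurtosis 0):

```python
from math import sqrt

def standardized_moments(data):
    """Sample skewness and excess kurtosis: third and fourth moments
    about the mean, standardized by sigma^3 and sigma^4.  The -3 makes
    the kurtosis of a Gaussian come out to zero."""
    n = len(data)
    mu = sum(data) / n
    sig = sqrt(sum((x - mu) ** 2 for x in data) / n)
    skew = sum(((x - mu) / sig) ** 3 for x in data) / n
    kurt = sum(((x - mu) / sig) ** 4 for x in data) / n - 3.0
    return skew, kurt
```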
Poisson Distribution
• The variance of the Poisson distribution is σ² = µ
• Consider a source of a particular flux f and two observations of length t₁ and 2t₁
  o The photon numbers are N₁ = f·t₁ and N₂ = 2N₁
  o The standard deviations are σ₁ and σ₂:

    σ₁ = √N₁  and  σ₂ = √N₂ = √2 σ₁

  o So the absolute noise is higher in sources that are observed with more photons
  o Importantly, the fractional noise σ/N drops the more counts one obtains:

    σ₁/N₁ = 1/√N₁  and  σ₂/N₂ = 1/√N₂ = (1/√2)(σ₁/N₁)
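In code the scaling is immediate (the count values are hypothetical):

```python
from math import sqrt

def fractional_noise(n_photons):
    """Fractional Poisson noise: sigma/N = sqrt(N)/N = 1/sqrt(N)."""
    return 1.0 / sqrt(n_photons)

n1 = 10_000                       # hypothetical counts from exposure t1
f1 = fractional_noise(n1)         # 1% fractional noise
f2 = fractional_noise(2 * n1)     # doubled exposure: f1 / sqrt(2)
```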
Mapping and Sample Purity
• A common application of Gaussian statistics is source finding in maps whose noise is Gaussian. Gaussian noise is not so uncommon:
  o A good example would be mm-wave maps (like those from SPT)
  o Optical/NIR images where background count levels are high
• Why on earth might one ever be interested in restricting to 5σ sources?
  o The probability is ~6×10^−7 of a Gaussian distribution delivering such an outlier
  o Typically in mapping experiments the solid angle mapped encompasses many PSFs or beams, so even extremely rare events are possible
• Example: the SPT beam is ~1 arcmin², so there are 3600 independent beams per deg². In a survey of 2500 deg² there are then ~5 noise fluctuations above 5σ expected
  o Because SZE-selected galaxy clusters are quite rare (~500 over the same area), a restriction to 5σ is required to keep contamination at the ~5/500 = 1% level
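The beam-counting arithmetic can be sketched as follows; counting only positive excursions gives roughly three spurious peaks, and including negative fluctuations as well roughly doubles that, consistent with the ~5 quoted above:

```python
from math import erfc, sqrt

def expected_false_peaks(area_deg2, beams_per_deg2, threshold_sigma):
    """Expected number of pure-noise peaks above threshold, treating the
    map as independent Gaussian beams (one-sided tail probability)."""
    n_beams = area_deg2 * beams_per_deg2
    p_tail = 0.5 * erfc(threshold_sigma / sqrt(2.0))
    return n_beams * p_tail

# SPT-like numbers from the slide: ~1 arcmin^2 beams over 2500 deg^2.
n_false = expected_false_peaks(2500.0, 3600.0, 5.0)
```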
Fitting to a Model
• Comparison of data to theory can be carried out using a least squares fit:

    χ² ≡ Σ_{i=1..n} ((y_{i,obs} − y_mod(x_i)) / σ_i)²

• One chooses the model parameters such that χ² is minimized.
• Where does this come from?
  o In the limit of Gaussian measurement uncertainties σ_i and n measurements we can construct the likelihood of the measurements given the model as the product of the individual Gaussian probabilities:

    L ≡ Π_{i=1..n} (1/(√(2π) σ_i)) e^(−(1/2)((y_{i,obs} − y_mod(x_i))/σ_i)²)

    −2 ln L = Σ_{i=1..n} ((y_{i,obs} − y_mod(x_i))/σ_i)² + 2 Σ_{i=1..n} ln(√(2π) σ_i)

  o The last term does not depend on the model parameters if the σ_i are known and so can be dropped
Minimizing χ²
• Any model is possible. In general one must have the number of observations n be large compared to the number of model parameters p, or else the parameter values are not well constrained
• There are many tools that have been developed to find the best fit:
  o Matrix inversion, including singular value decomposition for linear systems
  o Simplex minimization (e.g. Amoeba)
  o Methods that explicitly use the functional form of the gradient
  o The Levenberg-Marquardt iterative method
• Least squares fitting has been justified within the context of Gaussian, independent errors. It is nevertheless useful in a much broader context
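For a model that is linear in its parameters, y_mod(x) = a + b·x, the χ² minimum has a closed form (the normal equations); a self-contained sketch:

```python
def weighted_linear_fit(x, y, sigma):
    """Minimize chi^2 for y = a + b*x with known Gaussian uncertainties
    sigma_i, via the closed-form normal equations."""
    w = [1.0 / s**2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta    # intercept
    b = (S * Sxy - Sx * Sy) / delta      # slope
    return a, b

def chi2(x, y, sigma, a, b):
    """The chi^2 of the fitted line against the data."""
    return sum(((yi - (a + b * xi)) / si) ** 2
               for xi, yi, si in zip(x, y, sigma))
```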
Fitting to a Model in the Poisson Limit
• Often in astronomy one is working in the Poisson limit– where the
number of detected photons or objects is subject to Poisson
noise and the expectation value is low enough that one cannot
apply Gaussian statistics
o Consider a study of the galaxy luminosity function of a galaxy cluster– at least on the bright end
• Even in situations where there is plenty of data, one often ends
up working in the Poisson limit
o Examining trends in the behavior of the sample with mass, redshift or some other property often drives
the analysis of smaller and smaller subsets of the data
o To study the distribution of galaxy clusters in observable and mass one typically introduces binning
such that most bins have zero occupation and a few bins have occupation numbers of >=1
• In such a case one turns to the Poisson likelihood directly:

    P(x | µ) = (µ^x / x!) e^(−µ)

    ln L_i = x_i ln µ_i − µ_i − ln(x_i!) ≈ x_i ln µ_i − µ_i − x_i ln x_i + x_i

    ln ℒ = Σ_i ln L_i

  o One must simply evaluate the expectation value of the model µ_i for each subsample of the data (in each bin in luminosity and redshift, for example)
  o Note that the likelihood is sensitive even to empty bins; note also the use of Stirling's approximation for ln(x!)
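A sketch of this binned Poisson log-likelihood (here with `lgamma` for the exact ln(x!) rather than Stirling's approximation):

```python
from math import lgamma, log

def poisson_lnlike(counts, model):
    """ln L = sum_i [ x_i ln mu_i - mu_i - ln(x_i!) ], using lgamma for
    the exact ln(x!).  Empty bins still contribute -mu_i, so the
    likelihood is sensitive to them."""
    lnl = 0.0
    for x, mu in zip(counts, model):
        lnl += (x * log(mu) if x > 0 else 0.0) - mu - lgamma(x + 1)
    return lnl
```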
Background in Poisson Limit
• Typically in astronomy analyses there is a background
  o In the Gaussian limit one can simply subtract the background and adjust the uncertainty:

    if S = T − B, then σ_S² = σ_T² + σ_B² (Gaussian error propagation)

  o In the Poisson limit one is working with integer data and cannot simply subtract the background
• A forward modeling of the full signal (source plus
background) is the typical solution
o The model becomes the sum of the (observed) background and the adopted
model that is being fitted
o The observation becomes the actual number of objects or photons in each
bin, regardless of whether those objects/photons are source or background
• This forward modeling approach works just as well in the
Gaussian limit and so is typically the best way to model
data in astronomy
Robust Statistics
• Real datasets are often not described by simple Gaussian distributions; often there is a small fraction of objects that exhibit much larger deviations than expected
• A χ² is strongly affected by outliers, because it is the sum of the squares of the deviations. In the absence of prior knowledge of the true underlying distribution, a variety of tools exist to deal with this issue:
  o One can use the sum of the absolute values of the deviations
  o One can use the full distribution of deviations, extracting a characteristic value using the median of the distribution
    • MAD: the median absolute deviation, perhaps normalized to act like a Gaussian σ (NMAD)
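A sketch of the NMAD estimator, which recovers a Gaussian σ while ignoring outliers:

```python
from statistics import median

def nmad(data):
    """Normalized median absolute deviation.  The factor 1.4826 is
    1/Phi^-1(3/4), so for Gaussian data NMAD estimates sigma while
    remaining insensitive to outliers."""
    med = median(data)
    return 1.4826 * median([abs(x - med) for x in data])
```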
Goodness of Fit
• In general one can choose a minimization algorithm that will find the minimum χ², corresponding to the “best fit”. But the “best fit” need not be a “good fit.”
• One can use the value of the minimized χ² to evaluate tension between the data and the model
  o The χ² probability function Q(χ² | ν) gives the expected range of χ², assuming that the underlying errors are Gaussian and uncorrelated
  o To evaluate the goodness of fit one needs the number of degrees of freedom ν, defined as ν = n − p, the difference between the number of observations and the number of free parameters
• A reduced χ²_red = χ²/ν should be ~1, corresponding to a typical deviation of 1σ for each measurement
  o Values that are too large indicate inconsistency between the data and the model
  o Values that are too small suggest flaws in the uncertainties (σ too large, or correlated measurements)
Confidence Intervals
• Within a χ² context one can interpret a change in χ², or Δχ², in a probabilistic sense, and this allows one to define confidence intervals on parameters
• One can use this approach to define single-parameter uncertainties or joint parameter uncertainties
• This table shows the Δχ² corresponding to 1, 2 and 3σ parameter intervals in the case where we have p = 1, 2 and 3 parameters of interest in our model:

    p        1      2      3
    68.3%    1.00   2.30   3.53
    95.4%    4.00   6.17   8.02
    99.73%   9.00   11.8   14.2
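These thresholds can be verified numerically; a sketch using the closed-form χ² cumulative distributions for ν = 1, 2, 3, inverted by bisection:

```python
from math import erf, exp, pi, sqrt

def chi2_cdf(x, nu):
    """chi^2 CDF in closed form for nu = 1, 2, 3 parameters of interest."""
    if nu == 1:
        return erf(sqrt(x / 2.0))
    if nu == 2:
        return 1.0 - exp(-x / 2.0)
    if nu == 3:
        return erf(sqrt(x / 2.0)) - sqrt(2.0 * x / pi) * exp(-x / 2.0)
    raise ValueError("only nu = 1, 2, 3 implemented")

def delta_chi2(conf, nu):
    """Invert the CDF by bisection: the delta chi^2 that encloses `conf`
    of the probability when there are `nu` parameters of interest."""
    lo, hi = 0.0, 50.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < conf:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```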
Error Propagation
• Typically parameters are presented with their uncertainties, and these are often called “errors”
• Within the context of Gaussian, uncorrelated errors it is straightforward to propagate these errors
• Consider a function of x and y: f(x, y), where there is Gaussian scatter with σ_x and σ_y for the two variables:

    σ_f² = (∂f/∂x)² σ_x² + 2 (∂f/∂x)(∂f/∂y) σ_xy + (∂f/∂y)² σ_y²

  o In the case of independent errors in x and y the middle term (containing the covariance σ_xy) vanishes
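A quick Monte Carlo cross-check of the propagation formula for the illustrative choice f(x, y) = x·y (all numbers hypothetical):

```python
import random
from math import sqrt

def propagated_sigma(x, y, sx, sy):
    """First-order propagation for f(x, y) = x * y with independent
    errors: sigma_f^2 = y^2 sx^2 + x^2 sy^2 (the covariance term drops)."""
    return sqrt(y * y * sx * sx + x * x * sy * sy)

# Monte Carlo cross-check with hypothetical numbers.
random.seed(42)
x0, y0, sx, sy = 10.0, 5.0, 0.2, 0.1
samples = [random.gauss(x0, sx) * random.gauss(y0, sy)
           for _ in range(200_000)]
mean = sum(samples) / len(samples)
mc_sigma = sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
# mc_sigma agrees with propagated_sigma(x0, y0, sx, sy) to ~1%
```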
Mock Samples
• A valuable technique is to create mock samples of
observations that exhibit the noise properties and
other characteristics expected of your dataset
o One can use these mocks to test fitting routines
o One can use these mocks to establish the significance of deviations of data
from a particular model or to establish confidence intervals on parameters
• The core underlying tool is the ability to produce a
Uniform Random Deviate, which is a random number
from 0 to 1
• Operating systems offer such tools, and scientific
algorithm packages (e.g. Numerical Recipes)
typically offer improved options
Arbitrary Distributions
• Using the uniform random deviate to produce more
generic distributions f(x) is straightforward
o One transforms to the cumulative distribution F(x) of the generic distribution
f(x), selects a URD and then infers the value x at which the cumulative
distribution F(x) equals that value
  o To find the inverse F⁻¹ one typically uses a root-finder algorithm
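For some distributions the inversion can even be done analytically; a sketch for an exponential f(x) (the exponential choice is just an illustration):

```python
import random
from math import log

def sample_exponential(rate):
    """Inverse-transform sampling for f(x) = rate * e^(-rate x): the
    cumulative distribution F(x) = 1 - e^(-rate x) inverts analytically,
    so no root finder is needed: x = -ln(1 - u) / rate."""
    u = random.random()        # uniform random deviate in [0, 1)
    return -log(1.0 - u) / rate

random.seed(1)
draws = [sample_exponential(2.0) for _ in range(100_000)]
mean = sum(draws) / len(draws)   # should approach 1/rate = 0.5
```

When F(x) has no analytic inverse, the same recipe applies with the inversion handled numerically, e.g. by bisection on F(x) − u.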
Cumulative Distributions
• A particularly useful way of probing for
differences in two observed
distributions or an observed distribution
and a model is to examine the
cumulative distributions
• The Kolmogorov-Smirnov (K-S) test
allows one to characterize the
probability of the maximum distance
between the two distributions
o One extracts the probability that two distributions are drawn
from the same parent distribution
• There are many variants of the K-S test.
o 2D K-S test, Anderson-Darling test that improves sensitivity to
changes in the tails of the distribution
• With a model one can draw many
random samples and place the
observed sample in the context of the
large ensemble of randoms to quantify
the probability the observed
distribution is consistent with the
modelled distribution
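The D statistic itself is simple to compute; a self-contained sketch of the two-sample version:

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum distance D
    between the empirical cumulative distributions of samples a and b."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        v = min(a[i], b[j])
        while i < na and a[i] == v:   # step both ECDFs past the value v
            i += 1
        while j < nb and b[j] == v:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d
```

The probability that a given D is consistent with the null hypothesis then comes from the K-S distribution (see e.g. Numerical Recipes); library implementations such as `scipy.stats.ks_2samp` package both steps.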
Power Law Relations
• Physical parameters of astrophysical objects often span many orders of magnitude, and there often exist relations among them
• These “scaling relations” are typically fit by power law relations:

    L_X(M) = α (M / M₀)^β

• Often these relations are fit in log space rather than linear space, but they need not be:

    log L_X(M) = log α + β log(M / M₀)

• Linear space: a goodness-of-fit measure (χ²) will be heavily influenced by the systems with the largest values (e.g. luminosity or mass), whereas the systems with small values will have little impact.
• Log space: one is effectively using the fractional deviation between the model and the data, since Δ(ln L) ≈ ΔL/L, and so every system (high or low value) has similar impact (as reflected by its uncertainty)
Normal and Log-Normal
• When fitting a relation one must be careful to characterize the uncertainties properly
• Often the noise is characterized as log-normal or normal:
  o Poisson noise in the Gaussian limit → normal
  o Intrinsic variation → log-normal

    P(x) = (1/(√(2π) σ)) e^(−(1/2)((x − ⟨x⟩)/σ)²)

    P(log x) = (1/(√(2π) σ)) e^(−(1/2)((log x − ⟨log x⟩)/σ)²)

• The implications are dramatically different in the limit of a dataset with values extending over an order of magnitude
  o And thus fit results are sensitive to this choice
Pivot Points and Parameter Correlations
• When fitting a power law one can inadvertently introduce a false correlation between the amplitude and the slope by poorly choosing the pivot point:

    log Y = A (log X − log X_pivot) + B

• Consider a normal distribution of points in a logX–logY space with mean position (⟨log X⟩, ⟨log Y⟩) and dispersion σ_logX−Y
• Choosing the pivot at ⟨log X⟩ would lead to uncorrelated errors in the parameters A and B
• Choosing the pivot away from ⟨log X⟩ introduces a strong correlation between the parameters A and B
• In general, when fitting power law relations, one wants to select the pivot point in the independent variable to be the mean of the sample
• Similar thinking should guide parametrization in more general situations
Intrinsic Scatter and Parameter Uncertainties
• Often “measurement uncertainties” are not enough to estimate the true uncertainties on the parameters
• Often intrinsic scatter must be included to get realistic estimates of the parameter uncertainties
• If a model of intrinsic scatter exists, then one must adopt the intrinsic scatter as another source of measurement uncertainty (added in quadrature, assuming independence from the measurement noise):

    σ_tot² = σ_meas² + σ_int²

• Example: cluster redshift
  o Measure the redshift of a single galaxy with a measurement uncertainty of 50 km/s
  o Have you measured the cluster redshift with the same accuracy?
  o Consider the case that the velocity dispersion of the cluster is 1000 km/s
• In cases where the intrinsic scatter isn’t understood, one can often estimate the scatter by requiring that the best fit model produce a reduced χ² = 1
• In fact, the intrinsic scatter is often at least as important from a science perspective as any other parameter of the model
Malmquist Bias
• Often one studies a sample of flux-limited objects
• Such a selection introduces biases: the more luminous an object, the better represented it is in the sample
  o Malmquist 1925
  o Consider the volume within which an object would exceed the selection threshold
• When fitting power laws to a sample, the results of the Malmquist bias can be dramatic!
  o To avoid biases one typically has to include the selection effects in the model
  o The intrinsic scatter is critical here

[Figure: log luminosity vs. log mass for a flux-limited sample; Mantz et al 2010]
Eddington Bias
• Symmetric scatter (normal or log-normal) can lead to biases in common astrophysical situations
• Eddington bias occurs when the underlying population is varying rapidly with the observable
• Consider the mass function of objects P(M) ∝ M^n with n < 0
• Following the discussion in Mortonson et al 2011, the net effect is:

    ⟨ln M⟩ = ln M_obs + (1/2) n σ_M²

  o in the presence of log-normal scatter of width σ_M in M_obs
  o This result can be derived from a Bayesian framework which expresses the probability of a true mass M given an observed mass M_obs as the product of the mass function (the prior) and the log-normal scatter (the likelihood)

Scaling Relations and the Eddington Bias
• Similarly to the Malmquist bias, the Eddington bias makes it difficult to extract the true underlying model from an observed (selected) dataset
• As previously noted, the likelihood of a particular set of parameters must include the selection effects to enable extraction of an unbiased answer

[Figure: log luminosity vs. log mass for a selected sample; Mantz et al 2010]
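A Monte Carlo sketch of the count-boost form of this bias, with hypothetical slope and scatter values (γ = −1, σ = 0.6): for a power-law population, symmetric log-normal scatter inflates the counts above a fixed threshold as if the threshold mass were shifted by ½γσ², the same ½nσ² structure as the Mortonson et al 2011 expression:

```python
import random
from math import exp, log

random.seed(7)
gamma, sigma = -1.0, 0.6          # hypothetical slope and scatter

# Draw ln M from a falling power law dN/dlnM ~ exp(gamma lnM) on [0, 10]
# by inverse-transform sampling, then add log-normal observational scatter.
norm = 1.0 - exp(gamma * 10.0)
ln_m_true = [log(1.0 - random.random() * norm) / gamma
             for _ in range(1_000_000)]
ln_m_obs = [m + random.gauss(0.0, sigma) for m in ln_m_true]

# A steep mass function plus symmetric scatter inflates the counts above
# a fixed threshold by ~exp(gamma^2 sigma^2 / 2): an effective shift of
# the threshold by (1/2) gamma sigma^2, the Eddington bias.
threshold = 5.0
n_true = sum(1 for m in ln_m_true if m > threshold)
n_obs = sum(1 for m in ln_m_obs if m > threshold)
boost = n_obs / n_true            # ~exp(0.18) ~ 1.2
```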
References
• Astronomy Methods (Bradt)
• Numerical Recipes in C (Press et al, 2nd Edition)
• Mortonson et al 2011
• Mohr et al 1999
• Mantz et al 2010