Using Gaia DR2 to Constrain Local Dark Matter Density and Thin Dark Disk - arXiv

Prepared for submission to JCAP

                                               Using Gaia DR2 to Constrain Local
                                               Dark Matter Density and Thin
                                               Dark Disk
arXiv:1808.05603v1 [astro-ph.GA] 16 Aug 2018

                                               Jatan Buch,ᵃ Shing Chau (John) Leung,ᵃ and JiJi Fanᵃ
                                               ᵃ Department of Physics, Brown University, Providence, RI 02912, USA

                                               Abstract. We use stellar kinematics from the latest Gaia data release (DR2) to measure the
                                               local dark matter density ρDM in a heliocentric cylinder of radius R = 150 pc and half-height
                                               z = 200 pc. We also explore the prospect of using our analysis to estimate the DM density
                                               in local substructure by setting constraints on the surface density and scale height of a thin
                                               dark disk aligned with the baryonic disk and formed due to dark matter self-interaction.
                                               Performing the statistical analysis within a Bayesian framework for three types of tracers,
                                               we obtain ρDM = 0.023 ± 0.012 M⊙/pc³ for A stars; early G stars give a similar result, while
                                               F stars yield a significantly higher value. For a thin dark disk, A stars set the strongest
                                               constraint: excluding surface densities (5-15) M⊙/pc² for scale heights below 100 pc with 95%
                                               confidence. Comparing our results with those derived using Tycho-Gaia Astrometric Solution
                                               (TGAS) data, we find that the uncertainty in our measurements of the local DM content
                                               is dominated by systematic errors that arise from assumptions of our kinematic analysis in
                                               the low z region. Furthermore, there will only be a marginal reduction in these uncertainties
                                               with more data in the Gaia era. We comment on the robustness of our method and discuss
                                               potential improvements for future work.


1 Introduction

2 Data Selection
  2.1 Selection Function
  2.2 Vertical Number Density Distribution
  2.3 Midplane Velocity Distribution

3 Poisson-Jeans Theory

4 Statistics and Data Analysis
  4.1 Basic setup: Priors, likelihood, and uncertainties
  4.2 Bayesian Analysis

5 Results and Discussion
  5.1 Local DM Content Using Gaia DR2
      5.1.1 Local DM Density
      5.1.2 Constraints on a Thin DD
  5.2 Comparison of Constraints between DR2 and TGAS
  5.3 Possible Interpretation of Our Measurement of ρDM

6 Conclusions and Outlook

A Color-magnitude Modeling

B Uncertainty Analysis

C Variation of Midplane Cut

D Bootstrap Statistics

E Frequentist Analysis

1   Introduction

The second release of data collected by the European Space Agency’s Gaia telescope provides the
positions and proper motions, with unprecedented precision, of more than one billion sources
in the Milky Way (MW) [1–24]. With the release of line-of-sight velocities for about seven
million stars, DR2 also allows, for the first time, a dynamical analysis with a self-consistent
measurement of the 6D phase space for a stellar population.
      DR2 presents an exciting opportunity to use the vertical velocity and number density
distributions of different populations of stars that trace the gravitational potential for
precisely determining the total matter density, including baryons and dark matter (DM), in the
local solar neighborhood. Significant progress has been made in modeling the local baryon
budget (interstellar gas, stars, stellar remnants) and its uncertainties [25–28] since Oort’s
early estimate [29] of the baryon density. Meanwhile, kinematic methods for estimating the
local DM density rely on constraining the total matter content using motions of tracers after
assuming a model for the baryons and attributing any additional density, within uncertainty,
to DM. These methods are based on: a) the Jeans analysis that reduces the collisionless
Boltzmann equation for the phase space distribution function into a set of moment equations
by integrating over all velocities, and b) the Poisson equation which uses the total matter
density in all components to calculate the gravitational potential. In this work, we primarily
focus on the 1D distribution function method developed by Refs. [30–34] and used by
Refs. [35, 36] to constrain the local DM density with data from the Hipparcos satellite [37].
However, the approximations of isothermality and decoupling of radial and vertical motions in
this method are only valid up to scale height z ∼ 1 kpc. Therefore, for using tracer data at high
z, Refs. [38, 39] adopt the more general moment-based method to estimate the DM density. A
non-parametric formulation of the moment-based method, described by Ref. [40] and implemented
in Ref. [41], uses SDSS/SEGUE G stars in a heliocentric cylinder with R ∼ 1 kpc and
0.5 kpc
[Flowchart boxes: Selection function; Color and volume cuts; Negative parallax and midplane
latitude cuts; Radial velocity in DR2 (Yes / Average radial velocity); Effective completeness;
Velocity distribution; Density distribution from data; Predicted density using PJ solver;
MCMC sampling of the posterior.]

Figure 1: Flowchart of our analysis.

their differences with results using TGAS in Section 5.2, and discuss their robustness in the
context of our method in Section 5.3. We conclude and comment on future directions in
Section 6.

2        Data Selection
Gaia DR2 contains ∼1.7 billion stars, among which ∼1.3 billion stars have a five-parameter
astrometric solution: (α, δ, µα*, µδ, ϖ), representing positions and proper motions along the
right ascension and declination, and parallax respectively. We emphasize that for DR2, the
parallaxes and proper motions are based solely on Gaia measurements, unlike DR1 which
depends on the Tycho-2 Catalogue. DR2 also provides photometric data for a majority of
its sources in three passbands, G, GBP, and GRP, with 3 ≲ G ≲ 21. Another new feature in
DR2 is the line-of-sight radial velocities, vR, for ∼7.2 million stars brighter than GRVS = 12.
We refer readers to Refs. [3, 10, 18] for more details of the Gaia DR2 measurements and
astrometric solution.
      However, Gaia DR2 is still volume incomplete for bright stars with G < 12. We use the
Two Micron All-Sky Survey (2MASS) catalog [68] to compute the sky completeness of DR2.
2MASS is a full-sky infrared astrometric and photometric survey and is 99% complete in the
sky volume of our interest. It provides the angular position of the stars in the celestial sphere
allowing for a cross-match with the Gaia catalog. It also provides the J and Ks magnitudes
of each star, which we use to categorize the stars in our data sample.
      We query the Gaia archive¹ for DR2 cross-matched with the 2MASS catalog, requiring
that the apparent magnitude J < 14, which cuts away stars that are either too dim for the
main sequence, or too distant from the Sun.² The resulting cross-matched catalog contains
∼36 million stars. We then use the star counts in the complete 2MASS catalog for J < 14 to
compute the effective volume completeness of DR2, as we discuss in the following section.

2.1        Selection Function

[Skymap colorbars: number of good AL observations, 0 to 50 (left); SD of good AL observations,
0 to 10 (right).]

Figure 2: Skymaps showing the number (left) and variance (right) of good AL observations
in 3.36 deg² (Nside = 2⁵) HEALPix pixels. The white regions are the “bad” parts of the sky,
which do not pass our selection cuts defined in the main text.

      In the absence of an official Gaia selection function, we employ the quality cuts used in
Ref. [69] to identify the “good” part of the sky in the cross-matched data sample:
    (i) the mean number of along-scan (AL) observations ≥ 8.5,
    (ii) the spread in the number of AL observations ≤ 10.

² In our selected volume, the apparent magnitudes of all tracer stars satisfy J < 12.
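
The two sky-quality cuts can be sketched as a boolean mask over HEALPix pixels. This is a minimal illustration, not the pipeline of Ref. [69]: the function and argument names are hypothetical, and the per-pixel statistics are assumed to be already computed. The second helper only checks the pixel area quoted in the caption of Fig. 2 from the standard HEALPix pixel count, 12 Nside².

```python
import numpy as np

def good_sky_mask(mean_n_al, spread_n_al):
    """Return True for pixels passing cuts (i) and (ii).

    mean_n_al, spread_n_al: per-pixel mean and spread of the number of
    along-scan observations (hypothetical input arrays).
    """
    mean_n_al = np.asarray(mean_n_al, dtype=float)
    spread_n_al = np.asarray(spread_n_al, dtype=float)
    return (mean_n_al >= 8.5) & (spread_n_al <= 10.0)

def healpix_pixel_area_deg2(nside):
    """Area of one HEALPix pixel in square degrees (12 * nside**2 equal-area pixels)."""
    full_sky_deg2 = 4.0 * np.pi * (180.0 / np.pi) ** 2
    return full_sky_deg2 / (12 * nside**2)

mask = good_sky_mask([12.0, 7.0, 20.0], [3.0, 2.0, 11.0])
area = healpix_pixel_area_deg2(2**5)
```

Only the first of the three example pixels passes both cuts, and the pixel area for Nside = 2⁵ comes out close to 3.36 deg².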

After these cuts, 95.6% of the sky remains with a mean parallax uncertainty ≲ 0.1 mas.
Although DR2 provides an order of magnitude improvement over TGAS in the uncertainties
of astrometric parameters within our volume of interest, we still need to include its error
budget in our data analysis. As suggested by Ref. [17], we add 0.05 mas to the reported
parallax to account for the global offset. Following Ref. [18], we also add in quadrature a
systematic uncertainty of ±0.1 mas and ±0.1 mas/yr to the reported values of parallax and
proper motions respectively.
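
As a minimal sketch of this error budget, the snippet below shifts the reported parallax by the global offset and adds the systematic terms in quadrature. The function name and argument layout are assumptions made for illustration; only the numerical values (0.05 mas, 0.1 mas, 0.1 mas/yr) come from the text.

```python
import numpy as np

PARALLAX_OFFSET_MAS = 0.05  # global offset added to the reported parallax
PARALLAX_SYS_MAS = 0.1      # systematic uncertainty on the parallax
PM_SYS_MAS_YR = 0.1         # systematic uncertainty on the proper motion

def apply_error_budget(parallax, parallax_err, pm_err):
    """Return the offset-corrected parallax and the inflated uncertainties."""
    parallax = np.asarray(parallax, dtype=float) + PARALLAX_OFFSET_MAS
    # np.hypot(a, b) = sqrt(a**2 + b**2), i.e. addition in quadrature.
    parallax_err = np.hypot(parallax_err, PARALLAX_SYS_MAS)
    pm_err = np.hypot(pm_err, PM_SYS_MAS_YR)
    return parallax, parallax_err, pm_err

plx, plx_err, pm_err = apply_error_budget([5.0], [0.04], [0.06])
```

For a typical reported parallax uncertainty of 0.04 mas, the total is dominated by the 0.1 mas systematic term.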
     We focus on the solar neighborhood and select tracer stars in a heliocentric cylinder
with radius R = 150 pc (we discuss this choice in more detail in Section 2.2) and half-height
z = 200 pc. An important factor in selecting ‘good’ tracers³ of the local galactic potential
is their sensitivity to disequilibria. In particular, as concluded by Ref. [70], disequilibria
could have disparate effects on different tracer subpopulations resulting in incompatible ρDM
measurements. While there are conflicting views in the literature about which stars, old [35,
71] or young [41], are in dynamical equilibrium,⁴ we follow Ref. [41] in choosing younger A
(A0-A9), F (F0-F9), and early G (G0-G3) dwarf stars (simply stars henceforth) in our analysis
which have lower scale heights and consequently shorter equilibration timescales, instead of
older stars.⁵ We use the color cuts introduced in Ref. [69] to define tracer populations based
on different spectral subtypes (indicated in parentheses above). Other important criteria in
the choice of tracers for our analysis are sufficient statistics and a reasonable change in the
number densities within |z| ≤ 200 pc.
     Since Gaia DR2 is volume incomplete, the stellar density profile needs to be normalized
appropriately. The color-magnitude dependent normalization, referred to as the effective
volume completeness, is the ratio of DR2 number counts to those of the volume complete
2MASS catalog in our region of interest. We follow the method in Ref. [69] developed for
the TGAS catalog and use the gaia_tools package⁶ to compute this quantity for DR2. In
deriving the effective volume completeness, we also need to compute the selection function,
which is the fraction of stars at a given (J, J − Ks, α, δ) in the DR2 catalog. In the gaia_tools
package, the selection function is obtained by a spline interpolation of a modified magnitude
in three J − Ks color bins. We vary the modeling of the selection function by increasing the
number of color bins and find that the change in the resulting selection function and effective
completeness is negligible. A more detailed analysis is presented in Appendix A. The effective
completeness as a function of scale height for our DR2 tracer populations is plotted in Fig. 3.
We also include the effective completeness for the TGAS data as a reference, and note that
the DR2 sample is significantly more complete.

2.2    Vertical Number Density Distribution
There are 4544 A, 38431 F, and 44075 early G stars in the solar neighborhood defined by our
heliocentric cylinder. The volume complete vertical number density for each tracer, shown in
Fig. 4, is obtained by dividing the number counts by the effective volume completeness in
each z bin. We choose 20 pc as the bin size based on parallax uncertainties as discussed in
Appendix B. Varying the bin size doesn’t significantly affect the results of our analysis. We
also present a comparison of star counts in the full volume and in the midplane (defined to be
the region with |b| < 5° in the cylinder) between DR2 and TGAS in Table 1. There is roughly
a factor of ∼2 increase in the number of stars in both the full volume and midplane region for
each tracer, leading to a ∼30% reduction in the statistical uncertainty due to Poisson error.

³ Sec. 3.6 of Ref. [45] gives a thorough, even if slightly outdated with the release of DR2, overview of
important characteristics for the choices of tracers.
⁴ It will take a detailed study using N-body simulations to answer this question definitively, which is beyond
the scope of this paper.
⁵ The velocity dispersions of stars increase with age due to scattering with structures in the MW. See
Chapter 8.4 of Ref. [72] for a detailed discussion.

[Plot: effective volume completeness versus z [pc], from −200 to 200 pc, for A stars, F stars,
and early G stars with R = 150 pc.]

Figure 3: Effective volume completeness of each stellar type. The completeness of DR2
(solid) is improved by a factor of ∼3 compared to that of TGAS (dashed) for A, F and early
G stars.

                                  Gaia DR2                 TGAS
     Type       Subtype      Total    Midplane       Total    Midplane
     A          A0-A9         4544         310        1729         182
     F          F0-F9        38431        2213       16789        1308
     Early G    G0-G3        44075        2166       18653        1205

Table 1: Star counts in DR2 and TGAS catalogs for the heliocentric cylinder and the
midplane region (|b| < 5°) inside it.
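
The quoted ∼30% figure follows from the 1/√N scaling of the relative Poisson error: doubling the sample lowers it by 1 − 1/√2 ≈ 29%. The short sketch below applies the same scaling to the counts of Table 1; the helper name is hypothetical and the per-tracer numbers it yields are an illustration, not results quoted in the text.

```python
import math

# (DR2, TGAS) star counts copied from Table 1.
COUNTS = {
    "A total": (4544, 1729),
    "A midplane": (310, 182),
    "F total": (38431, 16789),
    "F midplane": (2213, 1308),
    "Early G total": (44075, 18653),
    "Early G midplane": (2166, 1205),
}

def poisson_reduction(n_new, n_old):
    """Fractional reduction of the relative Poisson error, which scales as 1/sqrt(N)."""
    return 1.0 - math.sqrt(n_old / n_new)

reductions = {name: poisson_reduction(*pair) for name, pair in COUNTS.items()}
factor_two = poisson_reduction(2, 1)
```

The individual tracers scatter around the factor-of-two estimate because their DR2-to-TGAS count ratios differ.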

     The uncertainty in the star number Nk in the k-th z bin is obtained by adding in quadrature
the statistical uncertainty √Nk and a 3% systematic uncertainty due to dust extinction.
We expect the dust extinction to be important in the visible spectrum such as the B and
V colors used in the Hipparcos catalog, or the GBP and GRP used in DR2. However, colors
in the infrared spectrum, i.e. the J and Ks colors used in our cross-matched DR2-2MASS
catalog, are associated with longer wavelengths and therefore less affected by galactic dust.
Ref. [69] finds that the effect of dust reddening on the number density of stars in the solar
neighborhood defined using J and Ks is ≲ 3% and mostly affects the overall normalization.
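
A minimal sketch of the binned density estimate and its uncertainty is given below. It assumes the effective completeness is supplied as one number per z bin and that the counts are converted to a density using the geometric volume of each cylindrical slice; both the function name and that volume normalization are assumptions made for illustration.

```python
import numpy as np

def binned_density(z_pc, completeness, z_max=200.0, bin_width=20.0,
                   radius=150.0, dust_sys=0.03):
    """Completeness-corrected number density per z bin with its uncertainty."""
    n_bins = int(round(2 * z_max / bin_width))
    edges = np.linspace(-z_max, z_max, n_bins + 1)
    counts, _ = np.histogram(z_pc, bins=edges)
    # Poisson error sqrt(N_k) and a 3% dust systematic, added in quadrature.
    sigma_counts = np.sqrt(counts + (dust_sys * counts) ** 2)
    # Assumed normalization: volume of a cylindrical slice of the given radius.
    volume = np.pi * radius**2 * bin_width
    completeness = np.asarray(completeness, dtype=float)
    density = counts / (completeness * volume)
    density_err = sigma_counts / (completeness * volume)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, density, density_err

# Toy input: 100 stars at z = 5 pc and a flat 50% completeness.
centers, density, density_err = binned_density(np.full(100, 5.0), np.full(20, 0.5))
```

For a bin with 100 stars the Poisson term (10) still dominates over the dust term (3); the two become comparable at roughly 1100 stars per bin.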





[Plot: vertical number density versus z [pc], from −200 to 200 pc, for A stars, F stars, and
early G stars.]

Figure 4: Vertical number density profiles in z = 20 pc bins for A (blue), F (green), and
early G (orange) stars.

      We notice that increasing the cylinder radius R from 150 pc to 200 pc results in an overall
broadening of the tracers’ density distribution. This is similar to the broadening reported by
Ref. [67] in the TGAS data. A broader density distribution could potentially lead to a much
stronger constraint on the local DM content since additional matter tends to pinch the density
distribution. Ref. [67] attributed the broadening to the so-called “Eddington” bias: higher
parallax uncertainties of distant stars could lead to a smearing of the density distribution
at large |z|. While this could be true for the TGAS catalog, the parallax uncertainties are
significantly reduced in DR2 and remain small at large |z|: the average and 1σ variation of
parallax uncertainty is below 10 pc (still smaller than the bin size 20 pc) at z = 200 pc in
DR2, even when R is increased to 250 pc as shown in Fig. 13. Thus, it seems unlikely that
the broadening of the density distribution is due to the “Eddington” bias.

2.3    Midplane Velocity Distribution
The last ingredient we need from the data is the vertical velocity distribution in the midplane,
i.e. at z = 0. The vertical velocity of a star is given by,

    w = w_\odot + \frac{\kappa\,\mu_b}{\varpi}\cos b + v_R \sin b,    (2.1)

where w⊙ is the Sun’s vertical velocity that we determine by fitting a Gaussian distribution
to the data, κ = 4.74 km yr s⁻¹ is a unit conversion constant, µb is the proper motion along
the galactic latitude b in mas/yr, ϖ is the parallax in mas, and vR is the radial velocity in
km/s.
     There are two options for defining the ‘midplane region’:⁷ imposing a cut on the height
|z|, or on the galactic latitude |b|. Since, until the release of DR2, radial velocities have been only
available for a subset of tracers, previous analyses chose a region with |b| ≪ 1 (in radians).
With that choice, substituting vR by its mean value,

    \langle v_R \rangle = -u_\odot \cos l \cos b - v_\odot \sin l \cos b - w_\odot \sin b,    (2.2)

where u⊙ = 11.1 ± 0.7 (stat) ± 1.0 (sys) km/s and v⊙ = 12.24 ± 0.47 (stat) ± 2.0 (sys) km/s [73], only has
a subdominant contribution to w since sin b ≪ 1.

⁷ At larger b and consequently larger z, the kinematically hotter stars broaden the distribution [35]. Meanwhile,
simply choosing stars with z = 0 yields poor statistics.
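
The vertical velocity of Eq. (2.1), with the radial velocity optionally replaced by its mean value, can be sketched as follows. The function names are hypothetical, angles are taken in radians, and the mean radial velocity is written as the projection of the solar motion onto the line-of-sight unit vector (cos l cos b, sin l cos b, sin b), which is the standard geometry assumed here.

```python
import numpy as np

KAPPA = 4.74                # km yr / s, converts (mas/yr)/mas to km/s
U_SUN, V_SUN = 11.1, 12.24  # km/s, central values quoted from Ref. [73]

def mean_radial_velocity(l, b, w_sun):
    """Mean radial velocity induced by the solar motion (angles in radians)."""
    return -(U_SUN * np.cos(l) * np.cos(b)
             + V_SUN * np.sin(l) * np.cos(b)
             + w_sun * np.sin(b))

def vertical_velocity(mu_b, parallax, l, b, w_sun, v_r=None):
    """Vertical velocity in km/s; mu_b in mas/yr, parallax in mas."""
    if v_r is None:
        # No measured radial velocity: fall back on the mean value.
        v_r = mean_radial_velocity(l, b, w_sun)
    return w_sun + KAPPA * mu_b / parallax * np.cos(b) + v_r * np.sin(b)

# A star exactly in the plane: the radial-velocity term drops out.
w_example = vertical_velocity(mu_b=1.0, parallax=10.0, l=0.3, b=0.0, w_sun=6.9)
```

At |b| < 5° the factor sin b is below 0.09, which is why the substitution of the mean radial velocity only contributes at a subdominant level.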
       We explore the possibility of using the z-cut [74] in Appendix C by including the newly
measured radial velocities in DR2. Unfortunately, DR2 only contains radial velocities for
approximately 2% of A stars, 53% of F stars, and 62% of early G stars for |z| < 20 pc. We
check that the percentage of tracers with radial velocity doesn’t change significantly for higher
values of z. In that case, only including stars with vR available could potentially introduce a
selection bias, while approximating vR by its mean value might result in large errors at higher
b (even at low z). Thus, defining the midplane region using a z-cut isn’t viable currently, but
that could change with future data releases.
       We follow Ref. [67] in choosing |b| < 5° as our midplane cut. After imposing an additional
cut to remove stars with negative parallaxes, we are left with 310, 2213 and 2166 A, F and early
G stars respectively. The mean of the best-fit Gaussian distributions to the midplane vertical
velocity, weighted by the star counts of each tracer population in the midplane, is w = 6.9 ± 0.2
km/s. We take this to be the Sun’s vertical velocity w⊙ and note that it is consistent within
1σ with the value in Ref. [73]. Subtracting w⊙ from the stars’ vertical velocity, we find
the distributions are roughly symmetric about w = 0. The resultant normalized midplane
vertical velocity distribution f0(w) with a w-bin size of 1.5 km/s (see Appendix B for more
details about this choice) is plotted in the left panel of Fig. 5. We consider the asymmetry
between the star counts in −|w| and +|w| bins to be the systematic uncertainty, which may
be due to non-equilibrium effects. We illustrate the magnitude of this uncertainty in the
right panel of Fig. 5 by adding it in quadrature with the statistical error for every w bin. In
practice, however, we propagate these errors into the uncertainty of the predicted density, as
we elaborate in Sec. 4.1.
       We also check the isothermality of the tracers by fitting the midplane data with Gaussian
distributions. From the fits, we find that the velocity dispersions σz are 6.1, 10.4, 16.6 km/s
for A, F and G stars respectively. The χ² values of the fits are 14.2, 38.6 and 30.0 for 14, 21 and
30 degrees of freedom respectively. The Gaussian (isothermal) distributions give reasonable
fits for A and G stars, but not as good a fit for F stars. In the rest of our analysis, we always
use the distributions from data and never their Gaussian fits.
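
To make the comparison between the three fits concrete, the quoted χ² values can be turned into reduced χ² and tail probabilities. This is an illustrative check using `scipy.stats.chi2.sf`, not a step described in the text; the dictionary layout and function name are assumptions.

```python
from scipy.stats import chi2

# (chi-squared, degrees of freedom) quoted for the Gaussian fits.
FITS = {"A": (14.2, 14), "F": (38.6, 21), "early G": (30.0, 30)}

def goodness_of_fit(chi2_value, dof):
    """Return the reduced chi-squared and the probability of a larger chi-squared."""
    return chi2_value / dof, chi2.sf(chi2_value, dof)

summary = {name: goodness_of_fit(*vals) for name, vals in FITS.items()}
```

A and early G stars have reduced χ² close to one, while the F-star fit sits in the tail of the distribution, in line with the statement that the Gaussian describes F stars less well.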

3   Poisson-Jeans Theory
The phase space distribution function of a self-gravitating stellar population follows the
collisionless Boltzmann equation. Assuming the population is in equilibrium, we integrate the
Boltzmann equation over velocity to obtain a set of moment equations, also called the Jeans
equations [72]. Using cylindrical coordinates (r, φ, z) and focusing on the Jeans equation in
the z direction,

    \frac{1}{r\nu_i}\frac{\partial}{\partial r}\left(r\,\nu_i\,\sigma_{rz;i}\right) + \frac{1}{r\nu_i}\frac{\partial}{\partial\phi}\left(\nu_i\,\sigma_{\phi z;i}\right) + \frac{1}{\nu_i}\frac{d}{dz}\left(\nu_i\,\sigma_{z;i}^2\right) = -\frac{d\Phi}{dz},    (3.1)

where νi is the stellar number density of the i-th species, σrz (σφz) are the off-diagonal entries
in the velocity dispersion tensor that couple radial (axial) and vertical motions, σz is the
vertical velocity dispersion (the diagonal zz component of the velocity dispersion tensor) and
Φ is the gravitational potential. The first term, usually referred to as the “tilt” term, is
negligible for small z: for instance, in case of G stars, σrz < 20 km²/s² for |z| ≲ 200 pc [75].
The second term, the so-called “axial” term, is also negligible since our volume of interest is
assumed to be (approximately) axisymmetric. In our analysis, we only keep the third term on
the left hand side of the Jeans equation, leading to a simple solution for the i-th population,

    \nu_i(z) = \nu_i(0)\, e^{-\Phi(z)/\sigma_{z;i}^2},    (3.2)

where we assume that each population is well thermalized near the galactic plane and thus
take σz;i to be a constant. If all constituents of a population have the same mass, then the
mass density ρi is proportional to the number density νi and satisfies,

    \rho_i(z) = \rho_i(0)\, e^{-\Phi(z)/\sigma_{z;i}^2}.    (3.3)

[Plots: normalized midplane velocity distributions versus w [km/s] for A stars, F stars, and
early G stars; left panel from −40 to 40 km/s, right panel from 0 to 40 km/s.]

Figure 5: Midplane velocity distributions of A, F, and early G stars after subtracting w⊙
(left). The best-fit Gaussian distribution to f0(|w|) with error bars that include contributions
from the statistical uncertainty due to Poisson error and the asymmetry in −|w| and +|w|
bins (right).
     The gravitational potential is determined by the mass density of the local neighborhood
through the Poisson equation,

    \nabla^2\Phi = \frac{\partial^2\Phi}{\partial z^2} + \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\Phi}{\partial r}\right) = 4\pi G\,\rho_{\rm tot}(z).    (3.4)

We treat the effective contribution from the radial term, (1/r) ∂/∂r (r ∂Φ/∂r), as a constant mass
density⁸ with a value (3.4 ± 0.6) × 10⁻³ M⊙/pc³ determined from the TGAS data [76].

⁸ For an axisymmetric system, the radial term can be related to Oort’s constants. Strictly speaking, the
Oort’s constants and consequently the radial term also depend on z. However, since our tracers only explore
a small volume close to the midplane, the variation is smaller than the measurement uncertainty [39].

The total mass density, ρtot , contains contributions from Nb baryon components, DM in
the halo, and other gravitational sources such as thin DD. The mass density for the baryons
is given by the Bahcall model that consists of a set of isothermal components for gas, stars,
and star remnants [77–79]. Each isothermal component is characterized by the midplane
density, ρ(0), and the vertical velocity dispersion, σz, as shown in Table 2. We adapt this
table from Ref. [67], who, in turn, compiled it from the results of Ref. [27] and supplemented
it with velocity dispersions from Refs. [25, 28].⁹
      The baryon mass densities as a function of z can be constructed in a straightforward
manner using Eq. (3.3). We approximate the density of halo DM in the disk, ρDM , to be
constant. As shown by Eq. (28) in Ref. [39], the DM density at or below 200 pc is equal to
that in the midplane up to a 2% correction.

     Baryonic component             ρ(0) [M⊙/pc³]         σz [km/s]
     Molecular gas (H2)             0.0104 ± 0.00312       3.7 ± 0.2
     Cold atomic gas (HI (1))       0.0277 ± 0.00554       7.1 ± 0.5
     Warm atomic gas (HI (2))       0.0073 ± 0.0007       22.1 ± 2.4
     Hot ionized gas (HII)          0.0005 ± 0.00003      39.0 ± 4.0
     Giant stars                    0.0006 ± 0.00006      15.5 ± 1.6
     MV < 3                         0.0018 ± 0.00018       7.5 ± 2.0
     3 < MV < 4                     0.0018 ± 0.00018      12.0 ± 2.4
     4 < MV < 5                     0.0029 ± 0.00029      18.0 ± 1.8
     5 < MV < 8                     0.0072 ± 0.00072      18.5 ± 1.9
     MV > 8 (M dwarfs)              0.0216 ± 0.0028       18.5 ± 4.0
     White dwarfs                   0.0056 ± 0.001        20.0 ± 5.0
     Brown dwarfs                   0.0015 ± 0.0005       20.0 ± 5.0

Table 2: Bahcall model for baryons adapted from Ref. [67].

     In models with a thin DD, we assume that the DD is isothermal, axisymmetric, and
perfectly aligned with the baryonic disk. Following Ref. [80], we choose the parametrization
of the thin DD density to be,

    \rho_{\rm DD}(z) = \frac{\Sigma_{\rm DD}}{4 h_{\rm DD}}\,\mathrm{sech}^2\!\left(\frac{z}{2 h_{\rm DD}}\right),    (3.5)

where ΣDD is the surface density and hDD is the disk scale height. A thin DD aligned with the
baryonic disk contributes an additional source of attractive potential, which pulls baryonic
matter towards the midplane (see Section 2.2 of Ref. [57] for an example with a toy model).
This results in narrower vertical density profiles of tracers, as illustrated in Fig. 6.
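
The normalization of Eq. (3.5) can be checked numerically: integrating the profile over z should return ΣDD. The sketch below does this with a plain trapezoid rule; the function names, integration range, and grid size are illustrative choices rather than anything specified in the text.

```python
import numpy as np

def rho_dark_disk(z, sigma_dd, h_dd):
    """Thin dark disk density of Eq. (3.5); z and h_dd in pc, sigma_dd in Msun/pc^2."""
    return sigma_dd / (4.0 * h_dd) / np.cosh(z / (2.0 * h_dd)) ** 2

def surface_density(sigma_dd, h_dd, n_scale_heights=40.0, n_grid=20001):
    """Integrate the profile over |z| < n_scale_heights * h_dd with the trapezoid rule."""
    z = np.linspace(-n_scale_heights * h_dd, n_scale_heights * h_dd, n_grid)
    rho = rho_dark_disk(z, sigma_dd, h_dd)
    return float(np.sum(0.5 * (rho[1:] + rho[:-1]) * np.diff(z)))

rho_mid = rho_dark_disk(0.0, sigma_dd=20.0, h_dd=10.0)
sigma_check = surface_density(sigma_dd=20.0, h_dd=10.0)
```

For the parameters used in Fig. 6 (ΣDD = 20 M⊙/pc², hDD = 10 pc) the midplane density is 0.5 M⊙/pc³, several times the summed baryonic midplane density of Table 2.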
⁹ We anticipate that future analyses could relax the assumption regarding isothermality of the baryon
components and adopt a self-consistent, data-driven approach for modeling the baryon mass density. For
example, the mass density for all stellar components could be constructed directly from the Gaia data.

     For a given mass model characterized by 2Nb baryonic parameters, ρDM, ΣDD and hDD,
the total mass density, ρtot, can be written as,

    \rho_{\rm tot}(z) = \sum_{i=1}^{N_b} \rho_i(0)\, e^{-\Phi(z)/\sigma_{z;i}^2} + \rho_{\rm DM} + \rho_{\rm DD}(z).    (3.6)

Plugging this expression into Eq. (3.4), we solve the resulting second-order differential
equation numerically with scipy.integrate.odeint to obtain the gravitational potential as a
function of z. We also explicitly check that our results agree with those of the iterative solver
used by Refs. [57, 67].
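
A minimal sketch of such a solver is shown below, using the central values of Table 2. Several choices are assumptions rather than details given in the text: the boundary conditions Φ(0) = 0 and dΦ/dz(0) = 0 (a potential symmetric about the midplane), the treatment of the radial term as a constant density added to ρtot, and all function and argument names.

```python
import numpy as np
from scipy.integrate import odeint

G = 4.30091e-3  # gravitational constant in pc (km/s)^2 / Msun

# Central values of Table 2: midplane densities [Msun/pc^3] and dispersions [km/s].
RHO0 = np.array([0.0104, 0.0277, 0.0073, 0.0005, 0.0006, 0.0018,
                 0.0018, 0.0029, 0.0072, 0.0216, 0.0056, 0.0015])
SIGMA = np.array([3.7, 7.1, 22.1, 39.0, 15.5, 7.5,
                  12.0, 18.0, 18.5, 18.5, 20.0, 20.0])

def rho_total(phi, z, rho_dm, sigma_dd, h_dd):
    """Total mass density: isothermal baryons, constant halo DM, optional thin DD."""
    baryons = np.sum(RHO0 * np.exp(-phi / SIGMA**2))
    disk = 0.0
    if sigma_dd > 0.0:
        disk = sigma_dd / (4.0 * h_dd) / np.cosh(z / (2.0 * h_dd)) ** 2
    return baryons + rho_dm + disk

def solve_potential(z_grid, rho_dm=0.0, sigma_dd=0.0, h_dd=10.0, rho_radial=3.4e-3):
    """Integrate d2(phi)/dz2 = 4 pi G (rho_tot + rho_radial) outward from z = 0.

    z_grid must start at 0 pc. Assumed boundary conditions: phi(0) = 0, phi'(0) = 0.
    Returns phi in (km/s)^2 on z_grid.
    """
    def rhs(y, z):
        phi, dphi = y
        source = rho_total(phi, z, rho_dm, sigma_dd, h_dd) + rho_radial
        return [dphi, 4.0 * np.pi * G * source]
    return odeint(rhs, [0.0, 0.0], z_grid)[:, 0]

z = np.linspace(0.0, 200.0, 401)
phi_no_disk = solve_potential(z, rho_dm=0.02)
phi_with_disk = solve_potential(z, rho_dm=0.02, sigma_dd=20.0, h_dd=10.0)
```

Close to the midplane the solution reduces to Φ ≈ 2πG ρ(0) z², which gives a quick sanity check on units; adding a thin DD steepens the potential and therefore pinches the predicted tracer profile.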



[Plot: ln(ν/ν0) versus z [pc], from −200 to 200 pc. Legend: No dark disk; ΣDD = 20 M⊙/pc²,
hDD = 10 pc.]

Figure 6: The predicted number density of a tracer in a model containing a thin DD with
surface density ΣDD = 20 M⊙/pc² and scale height hDD = 10 pc (dashed). For comparison,
we also plot the prediction of a model with the same matter content but without the thin DD

      After computing the gravitational potential for a given model, we can combine it with
the midplane vertical velocity distribution to predict the number density of tracers. If the
i-th type of tracer is in equilibrium and its vertical distribution is independent of R and φ, its
phase space distribution satisfies the Boltzmann equation in the z direction,
                                                             ∂fi ∂Φ ∂fi
                                                         w      −       = 0.                                     (3.7)
                                                             ∂z   ∂z ∂w
whose solution takes the form fi (z, w) = F w2 /2 + Φ(z) . In addition, if the distribution

function is separable in phase space,
        Z                                                          Z
            dwfi (z, w) = νi (z) → fi (z, w) = νi (z)fi,z (w) with   dvz fi,z (w) = 1, (3.8)

where fi,z (w) is the vertical velocity distribution function at scale height z. Finally, we

                                                                     – 11 –
integrate the distribution function over velocity to obtain the density distribution [33],
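$$
\begin{aligned}
\nu_i(z) &= 2\int_0^\infty dw\, f_i(z, w) = 2\int_0^\infty dw\, f_i\!\left(0, \sqrt{w^2 + 2\Phi(z)}\right) \\
         &= 2\nu_i(0) \int_0^\infty dw\, f_{i,z=0}\!\left(\sqrt{w^2 + 2\Phi(z)}\right) \\
         &= 2\nu_i(0) \int_{\sqrt{2\Phi(z)}}^\infty \frac{f_{i,z=0}(|w|)\, w\, dw}{\sqrt{w^2 - 2\Phi(z)}},
\end{aligned}
\tag{3.9}
$$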
where fi,z=0 (|w|) is the midplane velocity distribution for the ith tracer determined from data
as shown in the right panel of Fig. 5.
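As a rough numerical illustration of Eq. (3.9), the sketch below evaluates ν(z)/ν(0) for a given midplane velocity distribution. It is not the authors' implementation: the function names, the truncation velocity `w_max`, and the trapezoidal rule are all assumptions made for this example. It uses the second line of Eq. (3.9), which avoids the integrable singularity at w = √(2Φ(z)), and checks the result against the isothermal (Gaussian) case, where the integral reduces to exp(−Φ/σ²).

```python
import math


def density_ratio(f_mid, phi, w_max=200.0, n=20001):
    """Approximate nu(z) / nu(0) from Eq. (3.9).

    f_mid: midplane velocity distribution f_{z=0}(|w|), normalized to 1
           over all w (positive and negative); w in km/s.
    phi:   Phi(z) - Phi(0) >= 0 in (km/s)^2.
    """
    # Second line of Eq. (3.9): 2 * int_0^inf f(sqrt(u^2 + 2 phi)) du.
    # The integral is truncated at w_max (an assumption of this sketch).
    h = w_max / (n - 1)
    total = 0.0
    for j in range(n):
        u = j * h
        weight = 0.5 if j in (0, n - 1) else 1.0  # trapezoidal end points
        total += weight * f_mid(math.sqrt(u * u + 2.0 * phi))
    return 2.0 * h * total


def gaussian_velocity_distribution(sigma):
    """Isothermal test case with velocity dispersion sigma (km/s)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma * sigma)
    return lambda w: norm * math.exp(-0.5 * (w / sigma) ** 2)


f_iso = gaussian_velocity_distribution(10.0)
ratio = density_ratio(f_iso, phi=50.0)
# For a Gaussian f the two printed numbers should agree: exp(-phi / sigma^2).
print(ratio, math.exp(-50.0 / 10.0 ** 2))
```

In practice `f_mid` would be an interpolation of the binned midplane velocity data rather than an analytic Gaussian.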

4     Statistics and Data Analysis
We analyze the ingredients described in previous sections within a Bayesian framework. For
each tracer population, i.e. A, F and G stars, we constrain the local DM content by adding
to the baryonic Bahcall model: either a) a constant density contribution from the DM halo,
ρDM ; or b) ρDM and a thin DD, as defined in Eq. (3.5), parametrized by its surface density,
ΣDD , and scale height, hDD . In Section 4.1, we discuss our choices for the prior distribution,
and details of the likelihood function and uncertainty analysis, before presenting an overview
of the MCMC sampling procedure in Section 4.2.

4.1    Basic setup: Priors, likelihood, and uncertainties
Our model M is characterized by θ = {ψ, ξ}, where ψ = {ρDM, ΣDD, hDD} are our
parameters of interest and ξ are the nuisance parameters. The latter include the midplane densities,
ρk(0), and velocity dispersions, σz;k, of each baryonic component in the Bahcall model; an overall
normalization constant for each stellar population, Nν; and the height of the Sun above the midplane,
z⊙. We assume uniform prior distributions for all parameters except the baryonic ones, whose
priors are assumed to follow Gaussian distributions,

$$ p_b(\zeta|\mathcal{M}) = \prod_{k=1} \frac{1}{\sqrt{2\pi\sigma^2_{\rho_k}}} \exp\left[-\frac{(\rho_k - \bar{\rho}_k)^2}{2\sigma^2_{\rho_k}}\right] \frac{1}{\sqrt{2\pi\sigma^2_{\sigma_{z,k}}}} \exp\left[-\frac{(\sigma_{z,k} - \bar{\sigma}_{z,k})^2}{2\sigma^2_{\sigma_{z,k}}}\right], \tag{4.1} $$
where the mean and variance for each component are taken from Table 2. We summarize the
details and ranges of assumed prior distributions for all parameters, θ, used in our analysis
in Table 3.
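The prior described above can be sketched as a log-prior function. This is a minimal illustration, not the authors' code: the dictionary keys and function name are invented for the example, the uniform ranges are copied from Table 3, and the baryonic means and widths in the usage lines are placeholders rather than the Table 2 values. Constant normalization terms of the uniform priors are dropped.

```python
import math

# Uniform prior ranges from Table 3 (pc, Msun/pc^3, Msun/pc^2).
UNIFORM_RANGES = {
    "N_nu": (0.9, 2.0),
    "z_sun": (-30.0, 30.0),
    "h_DD": (0.0, 100.0),
    "rho_DM": (0.0, 0.06),
    "Sigma_DD": (0.0, 30.0),
}


def log_prior(params, baryons, baryon_means, baryon_sigmas):
    """Log of Eq. (4.1) times flat priors; -inf outside the allowed ranges."""
    for name, (low, high) in UNIFORM_RANGES.items():
        if not low <= params[name] <= high:
            return -math.inf
    logp = 0.0
    # One Gaussian factor per baryonic quantity (each rho_k and sigma_{z,k}).
    for x, mean, sigma in zip(baryons, baryon_means, baryon_sigmas):
        logp += -0.5 * ((x - mean) / sigma) ** 2
        logp += -0.5 * math.log(2.0 * math.pi * sigma * sigma)
    return logp


# Hypothetical values, for illustration only (not taken from Table 2).
params = {"N_nu": 1.0, "z_sun": 5.0, "h_DD": 10.0, "rho_DM": 0.02, "Sigma_DD": 5.0}
means, sigmas = [0.021, 4.0], [0.003, 1.0]
lp_inside = log_prior(params, means, means, sigmas)
lp_outside = log_prior(dict(params, rho_DM=0.1), means, means, sigmas)
```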
     The predicted number density is constructed by integrating the midplane velocity dis-
tribution using Eq. (3.9), and applying Gaussian kernel smoothing to approximate the effect
of parallax uncertainties that smear the exact positions of stars. However, since the parallax
uncertainties in DR2 are significantly reduced as compared to TGAS, this procedure only has
a negligible effect on the predicted density.
     For each population, the predicted number density is compared to the distribution from
the data with a likelihood function
$$ p_\nu(d|\mathcal{M}, \theta) = \prod_{i=1}^{N_z} \frac{1}{\sqrt{2\pi\sigma^2_{\ln \nu_i}}} \exp\left[-\frac{\left(\ln(N_\nu\, \nu_i^{\rm mod}(\theta)) - \ln \nu_i^{\rm data}\right)^2}{2\sigma^2_{\ln \nu_i}(\theta)}\right], \tag{4.2} $$
where Nz is the number of z bins, νimod is the prediction of a model with parameters θ, and
νidata is the volume-complete number density constructed from data, as described in Sec. 2.2.
We do not multiply the likelihood functions for different stellar populations in our analysis
since doing so assumes all populations are similar and trace the same galactic potential
independently. This is a rather simplified assumption which ignores the evolution history
of different stellar types. We comment more on this in Section 5.1.1.

                      Parameters     Prior type     Range                   Total
                      ρk(0), σz;k    Gaussian       Eq. (4.1)               24
                      Nν             Uniform        [0.9, 2.0]              3
                      z⊙             Uniform        [−30.0, 30.0] pc        1
                      hDD            Uniform        [0.0, 100.0] pc         1
                      ρDM            Uniform        [0.0, 0.06] M⊙/pc³      1
                      ΣDD            Uniform        [0.0, 30.0] M⊙/pc²      1

                       Table 3: Prior distributions of model parameters.

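      The squared error $\sigma^2_{\ln \nu_i}$ is obtained by adding in quadrature the data and the prediction uncertainties,
$$ \sigma^2_{\ln \nu_i}(\theta) = \left(\sigma^{\rm mod}_{\ln \nu_i}(\theta)\right)^2 + \left(\sigma^{\rm data}_{\ln \nu_i}\right)^2. \tag{4.3} $$
The data uncertainty $\sigma^{\rm data}_{\ln \nu_i}$ is discussed in Sec. 2.2, whereas for a fixed set of θ, the
prediction uncertainty $\sigma^{\rm mod}_{\ln \nu_i}$ originates from the uncertainties of the velocity profile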
fz=0 (|w|). The uncertainty consists of two sources: a) the statistical uncertainty due to the
finite sample size, and b) the systematic uncertainty due to possible non-equilibrium effects,
which we characterize by the difference between fz=0 (w > 0) and fz=0 (w < 0) following the
treatment in Ref. [67].
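For concreteness, the likelihood of Eq. (4.2) with the combined error of Eq. (4.3) can be written as a short log-likelihood function. This is only a sketch under the stated equations; the function name and the numerical values in the usage lines are hypothetical and not taken from the analysis.

```python
import math


def log_likelihood(nu_mod, nu_data, sigma_mod, sigma_data, norm):
    """Log of Eq. (4.2) for one tracer population.

    nu_mod, nu_data: predicted and measured densities in each z bin.
    sigma_mod, sigma_data: uncertainties on ln(nu) in each z bin.
    norm: overall normalization N_nu of the predicted profile.
    """
    logl = 0.0
    for mod, data, s_mod, s_data in zip(nu_mod, nu_data, sigma_mod, sigma_data):
        variance = s_mod ** 2 + s_data ** 2  # Eq. (4.3): errors in quadrature
        residual = math.log(norm * mod) - math.log(data)
        logl += -0.5 * residual ** 2 / variance
        logl += -0.5 * math.log(2.0 * math.pi * variance)
    return logl


# Hypothetical three-bin example.
nu_data = [1.00, 0.80, 0.50]
nu_mod = [1.00, 0.80, 0.50]
s_mod = [0.03, 0.04, 0.05]
s_data = [0.04, 0.03, 0.12]
best = log_likelihood(nu_mod, nu_data, s_mod, s_data, norm=1.0)
shifted = log_likelihood(nu_mod, nu_data, s_mod, s_data, norm=1.2)
```

Working with ln ν rather than ν keeps the residual dimensionless, so the same expression applies regardless of the density units.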
      Direct error propagation from the uncertainties of fz=0 (|w|) to $\sigma^{\rm mod}_{\ln \nu_i}$ via derivatives
proves to be difficult because of the large number of parameters involved and their correlations.
Instead, we estimate the errors by bootstrap resampling. The bootstrap is a technique that
extracts statistical estimators, like mean and standard deviation, by repeated random sam-
pling of a data set with replacement. For each stellar type, the raw midplane star data sets
are bootstrapped many times to generate many different velocity distributions. For every
distribution, we use Eq. (3.9) to derive a predicted density distribution. The statistical un-
certainty is extracted from the shape fluctuation in the collection of the predicted density
distributions. We approximate the systematic uncertainty due to non-equilibrium effect by
computing the difference between predictions based on the distributions of subsets of velocity
data with w > 0 and w < 0. We find that the systematic uncertainties dominate over the
statistical ones in the prediction error. More details of the bootstrap procedure can be found
in Appendix D.
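The bootstrap procedure described above can be illustrated with a toy example. Everything specific here is an assumption for illustration: the potential values `PHI`, the mock velocity sample, and in particular `predict_log_density`, which replaces the full integral of Eq. (3.9) with the isothermal result ln[ν(z)/ν(0)] = −Φ(z)/σ² so that the sketch stays short. The actual procedure is the one given in Appendix D.

```python
import random
import statistics

PHI = [20.0, 60.0, 120.0]  # hypothetical Phi(z) values in (km/s)^2


def predict_log_density(velocities):
    """Toy stand-in for Eq. (3.9): isothermal ln[nu(z)/nu(0)] = -Phi / sigma^2."""
    variance = sum(w * w for w in velocities) / len(velocities)
    return [-phi / variance for phi in PHI]


def bootstrap_std(velocities, n_boot=200, seed=1):
    """Statistical error: spread of predictions over resampled data sets."""
    rng = random.Random(seed)
    n = len(velocities)
    replicas = []
    for _ in range(n_boot):
        # Draw n velocities with replacement from the original sample.
        resample = [velocities[rng.randrange(n)] for _ in range(n)]
        replicas.append(predict_log_density(resample))
    return [statistics.stdev(column) for column in zip(*replicas)]


def asymmetry_error(velocities):
    """Systematic error: |ln nu(+) - ln nu(-)| from the w > 0 and w < 0 subsets."""
    plus = predict_log_density([w for w in velocities if w > 0])
    minus = predict_log_density([w for w in velocities if w < 0])
    return [abs(p - m) for p, m in zip(plus, minus)]


mock_rng = random.Random(0)
w_mid = [mock_rng.gauss(0.0, 10.0) for _ in range(500)]  # mock midplane sample
sigma_stat = bootstrap_std(w_mid)
sigma_sys = asymmetry_error(w_mid)
```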
      Our statistical analysis closely follows that of Ref. [67] with one major difference: the
treatment of velocity uncertainties. In Ref. [67], normalization of each velocity bin is also
treated as a nuisance parameter, which adds an additional 20-30 parameters to the analysis.
In our approach, we propagate the velocity uncertainties, both statistical, estimated using
bootstrap resampling, and systematic, into the prediction uncertainties. We check that these

two methods yield similar results for TGAS and DR2 data. The sources of uncertainties in
our analysis and their corresponding treatment are summarized in Table 4.

             Type      Source                              Treatment
             ν data    Poisson                             Nk in the k-th bin
             ν data    3% dust extinction                  0.03 × ν data
             ν data    Gaia systematic uncertainty         ±0.1 mas in ϖ; ±0.1 mas/yr in µα̃, µδ
             ν mod     statistical errors of fz=0(|w|)     bootstrap resampling
             ν mod     fz=0(w > 0) − fz=0(w < 0)           |ln ν(+)(z) − ln ν(−)(z)|
             ν mod     parallax uncertainty                Gaussian kernel smoothing

                                  Table 4: Uncertainties in our analysis.

4.2        Bayesian Analysis
We adopt the Bayesian approach to estimate values of parameters and determine correlations
between them. The posterior probability density function (simply the posterior henceforth)
of the parameters can be defined using Bayes’ theorem,

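$$ p(\theta|\mathcal{M}, d) = \frac{p(d|\mathcal{M}, \theta)\, p(\theta|\mathcal{M})}{p(d|\mathcal{M})}, \tag{4.4} $$
where the numerator is given by Eqs. (4.1) and (4.2) and the denominator, referred to in the
literature as ‘marginal likelihood’ or ‘evidence’, is defined as
$$ p(d|\mathcal{M}) = \int p(d|\mathcal{M}, \theta)\, p(\theta|\mathcal{M})\, d\theta. \tag{4.5} $$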

      We sample the posterior in Eq. (4.4) with the Markov Chain Monte Carlo (MCMC)
sampler emcee. To draw samples from a d-dimensional parameter space, emcee implements
the affine-invariant ensemble sampling algorithm of Ref. [81] that is based on simultaneously
evolving an ensemble of N walkers. Since each walker in the ensemble independently samples
the posterior, emcee is naturally suited for parallel computing on multicore systems (see
Ref. [82] for more details).
      In our implementation, we use (100-300) walkers for (15000-25000) steps depending on
the stellar type and components (ρDM or ρDM + thin DD) of the local DM content. These
numbers are chosen to achieve an acceptance fraction af ≈ 0.3 [83] for each walker. After
accounting for the ‘warm-up’ time, ∼4000 steps, of the ensemble, we obtain ≳ 2 × 10⁶ samples
on average for each iteration of our analysis.
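To make the sampling step more concrete, here is a minimal pure-Python sketch of an affine-invariant "stretch move" of the kind proposed in Ref. [81]. It is an illustration under simplifying assumptions, not emcee's implementation and not the code used in this analysis: walkers are updated serially, the target is a toy two-dimensional Gaussian, and the function names, walker count and step count are chosen only for the example.

```python
import math
import random


def stretch_move_sampler(log_prob, start, n_steps, a=2.0, seed=0):
    """Serial affine-invariant stretch move over an ensemble of walkers."""
    rng = random.Random(seed)
    walkers = [list(p) for p in start]
    logp = [log_prob(p) for p in walkers]
    n_walkers, ndim = len(walkers), len(walkers[0])
    chain, n_accept = [], 0
    for _ in range(n_steps):
        for k in range(n_walkers):
            # Pick a different walker j to define the stretch direction.
            j = rng.randrange(n_walkers - 1)
            if j >= k:
                j += 1
            # Draw z from g(z) ~ 1/sqrt(z) on [1/a, a].
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            proposal = [xj + z * (xk - xj)
                        for xk, xj in zip(walkers[k], walkers[j])]
            new_logp = log_prob(proposal)
            # Acceptance probability min(1, z^(ndim-1) p(Y) / p(X_k)).
            log_ratio = (ndim - 1) * math.log(z) + new_logp - logp[k]
            if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                walkers[k], logp[k] = proposal, new_logp
                n_accept += 1
        chain.extend(list(w) for w in walkers)
    return chain, n_accept / (n_steps * n_walkers)


def log_gauss(p):
    """Toy target: standard normal in every dimension."""
    return -0.5 * sum(x * x for x in p)


init_rng = random.Random(42)
start = [[init_rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(20)]
chain, acc_frac = stretch_move_sampler(log_gauss, start, n_steps=1500)
samples = chain[300 * 20:]  # discard 300 warm-up steps of 20 walkers each
mean_x = sum(s[0] for s in samples) / len(samples)
var_x = sum((s[0] - mean_x) ** 2 for s in samples) / len(samples)
```

With emcee itself the equivalent steps would go through `emcee.EnsembleSampler(nwalkers, ndim, log_prob_fn)` and its `run_mcmc` method, with the log-posterior built from the prior and likelihood of Sec. 4.1.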

5        Results and Discussion

We discuss the results from the MCMC sampling of the posterior for different local DM com-
ponents using DR2 data in Section 5.1. Since our statistical analysis closely follows Ref. [67],
we cross-validate it by repeating the procedure outlined in Sec. 4.1 with TGAS data in the
same galactic volume, and compare the results with those from DR2 in Section 5.2. Although

we only compare the results of our respective analyses with a thin DD, the conclusions should
also hold for the case with only ordinary DM and no thin DD as well. We comment on
possible interpretations of our result for ρDM , the prospect of discovering a thin DD, and the
robustness of our kinematic analysis in Section 5.3.

5.1     Local DM Content Using Gaia DR2
5.1.1    Local DM Density

[Figure 7 plot. Panels: A stars, F stars, Early G stars. Horizontal axis: ρDM [M⊙/pc³]. Vertical axis: ρb [M⊙/pc³].]

Figure 7: Marginalized posteriors indicating the degeneracy between the local densities of
baryons ρb and halo DM ρDM .

     We summarize the results from the posterior sampling for the analysis with baryons and
a constant halo DM density ρDM in Table 5. The median values of ρDM obtained through
our kinematic analysis of A and early G stars are similar to each other, while using F stars
yields a significantly higher value. We also note that our value of ρDM determined using A
and early G stars is consistent with previous measurements made using SDSS/SEGUE G star
data [84], $\rho_{\rm DM} = 0.012^{+0.001}_{-0.002}$ M⊙/pc³ (within 1σ) and $\rho_{\rm DM} = 0.008^{+0.025}_{-0.025}$ M⊙/pc³ (within
2σ), by Refs. [41] and [26] respectively.

          Stellar type     ρDM [M⊙/pc³]           ρDM [GeV/cm³]          ρb [M⊙/pc³]            z⊙ [pc]
          A stars          0.023 +0.010/−0.010    0.874 +0.380/−0.380    0.089 +0.007/−0.007    4.95 +3.78
          F stars          0.047 +0.006/−0.007    1.786 +0.228/−0.266    0.091 +0.007/−0.006    2.52 +2.58
          G stars          0.021 +0.014/−0.011    0.798 +0.532/−0.418    0.090 +0.007/−0.007    −8.46 +4.61

Table 5: Median posterior values with 1σ errors for the local densities of baryons ρb and
halo DM ρDM, and height of the Sun above the midplane z⊙. The halo DM density ρDM is
expressed in both M⊙/pc³ (astronomical unit) and GeV/cm³ (particle physics unit), where
1 M⊙/pc³ ≈ 38 GeV/cm³.
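The conversion factor quoted in the caption of Table 5 can be checked with a few lines of arithmetic. The constants below are standard reference values entered by hand (nominal solar mass, parsec, speed of light, and the joule equivalent of 1 GeV); the function name is made up for this example.

```python
M_SUN_KG = 1.98847e30            # solar mass in kg
PC_CM = 3.0856775814913673e18    # parsec in cm
C_M_S = 299792458.0              # speed of light in m/s
GEV_J = 1.602176634e-10          # 1 GeV in joules


def msun_pc3_to_gev_cm3(rho):
    """Convert a mass density in Msun/pc^3 to an energy density in GeV/cm^3."""
    msun_in_gev = M_SUN_KG * C_M_S ** 2 / GEV_J  # rest energy of one solar mass
    return rho * msun_in_gev / PC_CM ** 3


factor = msun_pc3_to_gev_cm3(1.0)
rho_a_stars = msun_pc3_to_gev_cm3(0.023)
print(factor, rho_a_stars)
```

The factor comes out close to 38 GeV/cm³ per M⊙/pc³, so the GeV/cm³ column of Table 5 follows from the M⊙/pc³ column using the rounded value 38 (for instance 0.023 × 38 = 0.874).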

      While the 95% credible regions (CR) for measurements of ρDM with A, F, and early G
stars in Fig. 7 overlap and seem consistent with each other at the 2σ level, we emphasize that
each tracer population does not necessarily probe the same galactic environment (for instance,
sensitivity to non-equilibrium features of the MW [70]) due to differences in age and star
formation history. Consequently, without appropriate modeling of all prior information in a
Bayesian framework, results derived from different tracers should be compared with caution.

5.1.2    Constraints on a Thin DD
We perform a full MCMC scan of the posterior after including a thin DD component along
with the local density of halo DM ρDM, and plot the marginalized posteriors for the thin DD
parameters, ρDM, and the total midplane baryon density ρb in Figs. 17–19. We find that after
marginalizing over the uncertainties of the baryon mass model and asymmetries in velocity
distribution, none of the tracers exclude zero surface density ΣDD for the thin DD at the 1σ
level. Given the exploratory nature of our analysis, this may be interpreted, at best, as an
approximate upper bound on the thin DD parameters.

5.2     Comparison of Constraints between DR2 and TGAS
We plot the 95% CR upper limit contours for the thin DD parameters using data from
DR2 (TGAS) in the left (right) panel of Fig. 8. Both sets of exclusion curves are significantly
stronger than previous results based on the Hipparcos catalog [57]. However, there are obvious
differences between our results derived using DR2 and TGAS data.11
      Using TGAS data, early G stars exclude ΣDD ≳ (5−10) M⊙/pc², depending on hDD,
while A stars set the weakest constraint. On the other hand, using DR2 data, A stars
exclude ΣDD ≳ (5−15) M⊙/pc², while the weakest constraint is due to F stars. Naively,
we would expect a (modest) improvement in the constraints from DR2
data compared to those from TGAS due to increased statistics (about a factor of ∼2.5) and
decreased parallax uncertainties (due to our choice of binning, these only affect the high-z
bins). We check numerically that if we take central values from TGAS and uncertainties from
DR2 to generate mock distributions for the tracers, the derived constraints on the thin DD are
indeed similar to those from TGAS data with minor improvements. Given this expectation,
it seems counterintuitive that our DR2 constraints are different from the TGAS ones.

11 Our TGAS results roughly agree with Ref. [67], Fig. S12 in particular, although their plot was made using
the profile likelihood method while the contours in Fig. 8 have been obtained using a fully Bayesian analysis.
We obtained a similar result when we repeated our analysis with the profile likelihood method, which is shown
in Appendix E.

[Figure 8 plot. Panels: Gaia DR2 (left), TGAS (right). Horizontal axis: hDD [pc], 20 to 100. Vertical axis: ΣDD [M⊙/pc²], 0 to 20. Curves labeled by tracer (A, F).]

Figure 8: 95% CR upper limit contours for surface density ΣDD and scale height hDD of a
thin DD for A (blue), F (green), and G (orange) stars using data from DR2 (left panel) and
TGAS (right).
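One simple way to turn posterior samples into a 95% upper-limit curve of the kind shown in Fig. 8 is to bin the samples in hDD and take the 95th percentile of ΣDD in each bin. The text does not spell out how the contours were constructed, so this convention, the function name, and the mock samples below are assumptions made purely for illustration.

```python
import math


def upper_limit_curve(h_samples, sigma_samples, h_edges, cl=0.95):
    """Per-bin upper limit on Sigma_DD at credibility cl, binned in h_DD."""
    limits = []
    for low, high in zip(h_edges[:-1], h_edges[1:]):
        in_bin = sorted(s for h, s in zip(h_samples, sigma_samples)
                        if low <= h < high)
        if not in_bin:
            limits.append(None)  # no posterior samples fell in this bin
            continue
        # Smallest sample value with at least a fraction cl of samples below it.
        index = min(len(in_bin) - 1, int(math.ceil(cl * len(in_bin))) - 1)
        limits.append(in_bin[index])
    return limits


# Mock samples: Sigma_DD = 1..100 in the first h_DD bin, nothing elsewhere.
h_mock = [10.0] * 100
sigma_mock = [float(i) for i in range(1, 101)]
curve = upper_limit_curve(h_mock, sigma_mock, h_edges=[0.0, 20.0, 40.0])
```

An alternative would be a two-dimensional credible-region contour in the (hDD, ΣDD) plane, which can give a different curve where the posterior is strongly correlated.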
      Before discussing possible origins of the differences for each tracer population, we note
that adding more matter pinches the density profile of tracer stars, as with the effect of a thin
DD discussed in Sec. 3. Thus, the narrower the profile from data, or the broader the predicted
density, the more matter can be included and the weaker the constraint on local DM.
      The significant weakening of constraints for F stars stems from small differences in the
midplane velocity distributions, as shown in the right panel of Fig. 9. The DR2 velocity
distribution is slightly broader. We verify that this trend in the velocity distribution is not
an artifact of our choice of the midplane latitude cut or the binning of the velocity data.
Although velocity (and vertical density) profiles from TGAS and DR2 are consistent with
each other within uncertainties, the predicted density distribution with DR2 data is broader
than that with TGAS data with fixed model parameters (one example is shown in the left
panel of Fig. 9). As a result, a higher density in DM components is required to fit the
predicted density of F stars to the DR2 number density profile for given baryon parameters.
      We also present the volume complete number density profiles and midplane velocity
distributions for A and early G stars in Fig. 10 and Fig. 11. From the plots, we note that
all the distributions based on TGAS and DR2 data for both these tracers are also consistent
within uncertainties, yet there are subtle differences. For the number density profiles, a) there
is a narrowing of the DR2 profile at high z due to a reduction in parallax uncertainties for
both A and early G stars; b) the DR2 profile of G stars is consistently narrower below the
midplane. The velocity distributions using DR2 data are smoother compared to the TGAS
ones with smaller systematic uncertainties from asymmetry between negative and positive
velocity data.
      The constraint from early G stars in the DR2 data set gets weaker due to both a slightly
narrower density profile and a slightly broader predicted density. However, in the case of
A stars, the constraint gets considerably stronger at high hDD due to the reduction in the
systematic errors from the asymmetry in the midplane velocity distribution.

[Figure 9 plot. Left panel: horizontal axis z [pc], −200 to 200; legend: TGAS, Gaia DR2. Right panel: horizontal axis w [km/s], 0 to 40.]

Figure 9: F stars: (left) volume complete number density profiles overlaid with the predicted
density derived using the mean TGAS and DR2 velocity distributions assuming fiducial values
for baryons and ρDM = 0.02 M⊙/pc³; (right) midplane velocity distributions with interpolated
fits to the data. Note that the TGAS velocity distribution has a bin size of 2 km/s while the DR2
bin size is 1.5 km/s.

[Figure 10 plot. Panels: A stars (left), Early G stars (right). Horizontal axis: z [pc], −200 to 200. Legend: TGAS, Gaia DR2.]

Figure 10: Comparison of volume complete number density profiles in TGAS and DR2 data
for A (left) and G (right) stars.

     We reiterate that Gaia DR2 should be regarded as a different data catalog from TGAS,
rather than just a statistical improvement over it [2]. DR1 incorporated positions from the
Tycho-2 catalog to generate the five-parameter astrometric solution in the TGAS catalog,
whereas the DR2 catalog is independent of any external catalog, with its own self-consistent
astrometric solution. Any comparison between the constraints on local DM content
from TGAS and DR2 should be made bearing this difference in mind.

[Figure 11 plot. Panels: A stars (left), Early G stars (right). Horizontal axis: w [km/s], 0 to 40. Legend: TGAS, Gaia DR2.]

Figure 11: Comparison of midplane velocity distributions in TGAS and DR2 data for A
(left) and G (right) stars. Note that the TGAS velocity distribution has a bin size of 2 km/s.

5.3     Possible Interpretation of Our Measurement of ρDM
Our main results from the MCMC sampling of the posterior, e.g. for A stars, imply that
the local DM content can accommodate a constant density ρDM = 0.023 ± 0.010 M⊙/pc³, or
$\rho_{\rm DM} = 0.011^{+0.012}_{-0.010}$ M⊙/pc³ and a thin DD with $\Sigma_{\rm DD} = 3.74^{+4.53}_{-2.73}$ M⊙/pc², the precise value
depending on hDD. The 1σ errors are fairly large in both cases, which suggests
poor modeling of the systematics in the predicted density, a latent degeneracy between DM
and baryons at low z, or, more likely, a combination of both effects. We elaborate upon these
ideas in the rest of the section.
      An implicit assumption in our modeling of the tracer density profile is that the local
neighborhood is axisymmetric and the stellar disk is in dynamic equilibrium. However, growing
evidence for disequilibria at |z| ≳ 0.4 kpc (asymmetry in the vertical number counts [85];
vertical waves in the disk at the Sun's position [86–88]; substructure in the velocity distribution
of stars in DR2 data [89–91]) warrants a closer look at sources of disequilibria in the solar
neighborhood using DR2 data. We defer searches of local disequilibria and the corresponding
revision of our traditional kinematic method outlined in Sec. 3 to future work. Presently, we
only approximate the effect of non-equilibrium behavior by propagating the asymmetry in
the midplane velocity distribution to the error in the predicted density.
      The marginalized posterior for each tracer in Fig. 7 indicates a strong degeneracy be-
tween measurements of ρb and ρDM . As proposed by Ref. [77], and recently implemented on
simulated data by Ref. [74], this degeneracy can only be broken if any kinematic analysis
includes the density falloff at larger |z| away from the midplane. Since most of the baryonic
matter is confined to the stellar disk with a scale height O(kpc), any excess matter that
causes the falloff can be attributed (at least to leading order) to DM, allowing a more precise
measurement of ρDM with smaller error bars. On the other hand, this introduces another
layer of complexity as the tilt term in Eq. (3.3) that couples the radial and vertical motions
is no longer negligible at |z| ≳ 0.5 kpc and must be modeled by simultaneously fitting to the
σRz data [40, 75].
      Meanwhile, the highly diagonal posterior in the ρDM –ΣDD plane combined with identically
flat posterior in the ρDM –ρb and ΣDD –ρb planes of Figs. 17–19 implies that introducing
a thin DD in our analysis merely shifts some of the DM density from ρDM while increasing
its relative error. Thus, to set realistic constraints on, or seek evidence for, DM density in
the thin DD (or equivalently some form of extended substructure near the midplane) using
our procedure, we would need more physical insight into breaking the degeneracy between
DM in the halo and a thin DD. In the language of statistics this translates to expanding the
likelihood function with more data, and using hierarchical modeling to define a general class
of models for the local DM content such that our two scenarios: ρDM , and thin DD + ρDM
emerge as special cases with appropriate model-dependent posterior probabilities. Moreover,
we always assume that the thin DD is perfectly aligned with the baryonic disk. Since there are
no numerical simulations for the thin DD model, the validity of the alignment assumption is
unknown. Modifying our analysis to account for a tilted disk could yield different constraints.
      As the above discussion indicates, our results are dominated by systematic errors stem-
ming from an approximate modeling of non-equilibrium behavior and a strong degeneracy
between different matter components near the midplane. We note that these errors, in the
context of our method, may not be reduced significantly in future Gaia data releases.

6   Conclusions and Outlook

We apply the 1D distribution function method to Gaia DR2 and use stellar kinematics in the
solar neighborhood to constrain the local DM density and properties of a thin DD aligned
with the baryonic disk by performing our analysis within a Bayesian framework. We adopt
young A, F, and early G stars as tracers since they have shorter equilibration timescales and
consequently are expected not to be strongly affected by disequilibria. Using A stars gives an
estimate of ρDM = 0.023 ± 0.01 M⊙/pc³ and sets the strongest constraint on the thin DD,
excluding ΣDD ≳ (5-15) M⊙/pc² depending on the scale height with 95% confidence. While
we obtain similar results from early G stars, F stars seem to prefer a much higher value of
the local DM content. Even though the distributions derived from DR2 are consistent with
those from TGAS data within uncertainties, the allowed DM density and parameters of DD
model are quite different for all tracers. In light of these results, we address the origins of the
differences and discuss the robustness of our kinematic analysis.
      Our results also suggest that we need a better understanding of the physical origin of the
systematic uncertainties, which we include in our analysis to account for the asymmetry in the
midplane velocity distributions of tracers. One possibility is that with complete data for radial
velocities, we could define the midplane region using the z-cut instead of the b-cut and obtain
a more precise determination of the velocity distribution. Another possibility is to take a
closer look at local disequilibria and their effects on traditional kinematic methods. Although
we do not find any statistically significant evidence for non-equilibrium in the vertical density
and velocity distributions in our samples, several analyses based on DR2 seem to suggest
various sources of disequilibria at distances larger than the heliocentric cylinder we consider.
In terms of baryon modeling, it could be useful to find a self-consistent, data-driven approach
to determine the baryon distributions instead of assuming the isothermal Bahcall model. One
way to achieve this would be to construct the mass density for stars directly from the data
rather than treating it as an isothermal disk.
      For a more precise determination of the local DM density, the Poisson-Jeans analysis
could be applied to tracers at heights greater than the scale height of the stellar disk to
minimize the latent degeneracy between baryons and DM. However, besides modeling effects

of disequilibria, an analysis at larger scale height has to go beyond the 1D method and
must include terms that couple the motions of tracers in different directions. We also see a
degeneracy between parameters of ordinary DM and thin DD in the marginalized posteriors
obtained through MCMC sampling. To break the degeneracy, we would need to distinguish
between their effects on tracers, develop new observables, and model priors that reflect these


We thank Ian Dell’Antonio, Eric Kramer, Matt Reece, Ben Safdi and Chih-Liang Wu for useful
discussions. JB would like to thank Nicolas Garcia-Trillos and Alexander Fengler for extended
conversations on MCMC sampling methods and Bayesian statistics. This work has made
use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos., processed by the Gaia Data Processing and Analysis Consortium (DPAC, Funding for the DPAC has
been provided by national institutions, in particular the institutions participating in the Gaia
Multilateral Agreement. It also makes use of data products from the Two Micron All Sky
Survey (2MASS), which is a joint project of the University of Massachusetts and the Infrared
Processing and Analysis Center/California Institute of Technology, funded by the National
Aeronautics and Space Administration (NASA) and the National Science Foundation (NSF).
The results in this work were computed using the following open-source packages: astropy
[92], gala [93], gaia_tools [69], and emcee [82]. JF is supported by the DOE grant DE-SC-
0010010 and NASA grant 80NSSC18K1010.

A    Color-magnitude Modeling

[Figure 12: two panels, each titled "Effective completeness", with J − Ks on the horizontal axis and a color scale running from 0.0 to 0.8.]

Figure 12: The effective completeness in color-magnitude space. Left: 3 J − Ks bins. Right:
20 J − Ks bins.

     We use the gaia_tools package [69], developed for TGAS, for constructing the selection
function and computing the effective completeness. In gaia_tools, the infrared color is
divided into three bins in the range −0.05 < J − Ks < 1.05. In each bin, the completeness is
an interpolating function of J_G, a modified magnitude function that removes the strong color
dependence of TGAS completeness at the faint end J ∼ 12.
      Since the faint end of DR2 extends well beyond J = 12, we use the J magnitude instead
of J_G for our computation of effective completeness. As a consistency check, we also vary the
J − Ks color binning from 3 to 20 bins (Fig. 12) and find that the variation of the density
profiles is less than 2%. Thus, we conclude that the effect of color-magnitude modeling is

B    Uncertainty Analysis

In this section, we discuss our choices of bin sizes in the vertical height z and velocity w for
constructing the number density and midplane velocity distribution respectively.

[Figure 13: three panels for radial cuts R = 150 pc, R = 200 pc and R = 250 pc, each showing σ|z| [pc] versus |z| [pc] for A stars, F stars and early G stars.]

Figure 13: 1σ spread in the uncertainty (at leading order) of z as a function of z for different
radial cuts.

             The uncertainty in z is given by,
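\[
\delta z^2\,(\mathrm{kpc}^2) = \left(\frac{\sin b}{\varpi^2}\right)^2 \sigma_\varpi^2 + \left(\frac{\cos b}{\varpi}\right)^2 \sigma_b^2 + \left(\frac{2 \sin b \cos b}{\varpi^3}\right) \sigma_{\varpi b}^2 , \tag{B.1}
\]

which is dominated by the parallax uncertainty due to the extra factor of ϖ in units of
mas ≈ 10^{-9} in the first term. We plot the uncertainty in z (at leading order) as a function of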
z for all tracers in Fig. 13. Although the maximum uncertainty is ≈ 10 pc, we conservatively
adopt 20 pc as the bin size to account for the underestimation of the reported uncertainties
in DR2 [3].
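
As an illustration of the leading-order estimate, the sketch below evaluates the parallax and latitude terms of Eq. (B.1), assuming z = sin b / ϖ with ϖ in mas (so that z is in kpc) and σ_b in radians. The function name and arguments are hypothetical, and the parallax-latitude cross term is deliberately left out; this is not the code used for the analysis.

```python
import numpy as np

def sigma_z_pc(parallax_mas, b_deg, sigma_parallax_mas, sigma_b_rad=0.0):
    """Leading terms of Eq. (B.1) for z = sin(b) / parallax.

    With the parallax in mas, z is in kpc; the result is converted to pc.
    Assumption: the parallax-latitude cross term is omitted in this sketch.
    """
    b = np.radians(b_deg)
    # First term: (sin b / parallax^2)^2 * sigma_parallax^2
    parallax_term = (np.sin(b) / parallax_mas**2) ** 2 * sigma_parallax_mas**2
    # Second term: (cos b / parallax)^2 * sigma_b^2
    latitude_term = (np.cos(b) / parallax_mas) ** 2 * sigma_b_rad**2
    return 1000.0 * np.sqrt(parallax_term + latitude_term)

# Example: a star at 100 pc (parallax 10 mas), b = 30 deg, 0.05 mas parallax error.
example_sigma_z = sigma_z_pc(10.0, 30.0, 0.05)
```

For typical DR2 astrometric errors the latitude term is far smaller than the parallax term, which is why it defaults to zero here.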
      Similarly, the uncertainty in w is
\[
\left(\frac{\sigma_w}{w}\right)^2 = \left(\frac{\sigma_\varpi}{\varpi}\right)^2 + \left(\frac{\sigma_{\mu_b}}{\mu_b}\right)^2 + \text{subleading terms}, \tag{B.2}
\]

where the omitted terms are suppressed by 10^{-2} when |b| < 5°. Around the midplane,
σ_µb/µ_b ≲ 0.2, which translates to σ_w ≈ 1.5 km/s. Therefore, we pick 1.5 km/s as the bin size
for obtaining the f0(w) profile.
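
A minimal sketch of the leading terms of Eq. (B.2) follows: the fractional uncertainties in parallax and in the latitudinal proper motion are added in quadrature, and the subleading terms are dropped. The function and argument names are hypothetical.

```python
import numpy as np

def sigma_w(w, parallax, sigma_parallax, mu_b, sigma_mu_b):
    """Leading-order uncertainty in w from Eq. (B.2).

    Assumption: only the two leading fractional terms are kept.
    The result carries the same units as w.
    """
    # Quadrature sum of the two fractional uncertainties.
    fractional = np.hypot(sigma_parallax / parallax, sigma_mu_b / mu_b)
    return np.abs(w) * fractional
```

The inputs may be scalars or NumPy arrays of equal shape, so the same function could be applied to a whole tracer sample at once.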

C    Variation of Midplane Cut

The midplane velocity profile is required in Eq. (3.9) to predict the tracer density for a given
mass model. Because Gaia measures radial velocities for only a subset of stars, we define the
midplane in two ways: one imposes a cut on the Galactic latitude, |b| < 5°, while the other requires
|z| < (20 − 50) pc [74]. For both samples, we approximate vR by its mean value ⟨vR⟩ in
Eq. (2.2) when the star’s vR is not measured. However, in the z-cut sample, we discard stars
with |b| > 5° that do not have any vR data.
     The midplane velocity distributions of the z- and b-cut samples are presented in Fig. 14
and agree with each other within 1σ uncertainties. We note that the uncertainties in the
midplane velocity data using z-cut are smaller than those using the b-cut. The uncertainties
are dominated by systematics due to differences between f (w > 0) and f (w < 0). It turns
out that the z-cut data is more symmetric about z = 0 and thus has smaller uncertainties.
In our analysis, we still use the b-cut sample, since there could be a selection bias
in the z-cut sample, from which we discard a considerable fraction of stars with five-parameter
astrometric solutions because their radial velocities are not known.

[Figure 14: three panels (A stars, F stars, G stars), each showing the midplane velocity distribution versus w [km/s] for the z-cut and b-cut samples.]

Figure 14: Midplane velocity distribution f0(|w|) for A (left), F (middle) and early G (right)
stars. The distributions obtained using the |b| < 5° cut (green) and the |z| < 20 pc cut (blue)
are consistent within error bars.
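
The construction of a midplane velocity histogram can be sketched as follows. The code bins |w| in 1.5 km/s bins for stars passing a midplane mask and reports the difference between the w > 0 and w < 0 histograms as a rough measure of the asymmetry. The function name, the 40.5 km/s upper edge, and the use of the absolute difference as a systematic estimate are assumptions made for illustration, not the procedure used in the analysis.

```python
import numpy as np

def midplane_f0(w, in_midplane, bin_width=1.5, w_max=40.5):
    """Normalized histogram of |w| for midplane stars, plus an asymmetry estimate."""
    n_bins = int(round(w_max / bin_width))
    edges = bin_width * np.arange(n_bins + 1)
    w_mid = np.asarray(w)[np.asarray(in_midplane)]
    # Histogram the upward- and downward-moving stars separately.
    pos, _ = np.histogram(w_mid[w_mid > 0], bins=edges)
    neg, _ = np.histogram(-w_mid[w_mid < 0], bins=edges)
    norm = (pos.sum() + neg.sum()) * bin_width
    f0 = (pos + neg) / norm
    # Assumed systematic proxy: difference between the two half-distributions.
    asymmetry = np.abs(pos - neg) / norm
    return edges, f0, asymmetry

# Usage with mock data: compare a latitude cut with a height cut.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 8.0, size=5000)        # mock vertical velocities [km/s]
b_deg = rng.uniform(-30.0, 30.0, size=5000)  # mock Galactic latitudes [deg]
z_pc = rng.uniform(-200.0, 200.0, size=5000)  # mock heights [pc]
edges_b, f0_b, asym_b = midplane_f0(w, np.abs(b_deg) < 5.0)
edges_z, f0_z, asym_z = midplane_f0(w, np.abs(z_pc) < 20.0)
```

The mock arrays stand in for the tracer catalog; with real data the two masks would select the b-cut and z-cut samples compared in Fig. 14.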

D    Bootstrap Statistics

Bootstrap resampling is a standard statistical technique to acquire the mean and uncertainty
when there is only one data set available and analytic propagation of uncertainty cannot be
performed easily. The basic idea of the method is described below.
      Suppose we have a set of N stars labelled as S_N = {X_1, X_2, · · · , X_N}. Each star
X_k is associated with six-dimensional phase-space coordinates denoted by θ_k. In bootstrap
resampling, we make random draws with replacement, star by star, from the original set of
stars S_N. This generates a new data set S̃_N of the same size N, with each star labeled as X̃_k.
Since the draws are with replacement, we expect (many) duplicated coordinate values in the
new data set, such as X̃_k = θ_k and X̃_{k+1} = θ_k, for large N. Therefore, S̃_N ≠ S_N in general.
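
The draw-with-replacement step, repeated to build many resampled sets, can be sketched with NumPy as below. The function name, the choice of statistic, and the use of the mean and standard deviation over resamples as the estimate and its uncertainty are illustrative assumptions, not a description of the analysis code.

```python
import numpy as np

def bootstrap(data, statistic, n_resamples=1000, seed=0):
    """Bootstrap mean and uncertainty of `statistic` over the rows of `data`.

    `data` has shape (N, 6): one row of phase-space coordinates per star.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    values = []
    for _ in range(n_resamples):
        # Draw N star indices with replacement; duplicates are expected.
        idx = rng.integers(0, n, size=n)
        values.append(statistic(data[idx]))
    values = np.asarray(values)
    return values.mean(axis=0), values.std(axis=0, ddof=1)

# Usage with mock data: velocity dispersion of the third coordinate.
mock = np.random.default_rng(1).normal(0.0, 1.0, size=(2000, 6))
disp, disp_err = bootstrap(mock, lambda s: s[:, 2].std(), n_resamples=200)
```

Any function of the resampled set, such as a binned density profile, could be passed as `statistic`, provided it returns an array of fixed shape.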
      We resample the original data set S_N B times, labeling the resamples as S̃_N^(1), S̃_N^(2), ..., S̃_N^(B). The
