Making the Most of Statistical Analyses:
Improving Interpretation and Presentation
Gary King, Harvard University
Michael Tomz, Harvard University
Jason Wittenberg, Harvard University

Social scientists rarely take full advantage of the information available in their statistical results. As a consequence, they miss opportunities to present quantities that are of greatest substantive interest for their research and express the appropriate degree of certainty about these quantities. In this article, we offer an approach, built on the technique of statistical simulation, to extract the currently overlooked information from any statistical method and to interpret and present it in a reader-friendly manner. Using this technique requires some expertise, which we try to provide herein, but its application should make the results of quantitative articles more informative and transparent. To illustrate our recommendations, we replicate the results of several published works, showing in each case how the authors' own conclusions can be expressed more sharply and informatively, and, without changing any data or statistical assumptions, how our approach reveals important new information about the research questions at hand. We also offer very easy-to-use software that implements our suggestions.

Gary King is Professor of Government, Harvard University, Littauer Center North Yard, Cambridge, MA 02138 (king@harvard.edu). Michael Tomz is a Ph.D. candidate, Department of Government, Harvard University, Littauer Center North Yard, Cambridge, MA 02138 (tomz@fas.harvard.edu). Jason Wittenberg is a fellow of the Weatherhead Center for International Affairs, Cambridge Street, Cambridge, MA 02138 (jwittenb@latte.harvard.edu).

Our computer program, "CLARIFY: Software for Interpreting and Presenting Statistical Results," designed to implement the methods described in this article, is available at http://GKing.Harvard.Edu, and won the Okidata Best Research Software Award for 1999. We thank Bruce Bueno de Mesquita, Jorge Domínguez, Geoff Garrett, Jay McCann, and Randy Siverson for their data; Jim Alt, Alison Alter, Steve Ansolabehere, Marc Busch, Nick Cox, Jorge Domínguez, Jeff Gill, Michael Meffert, Jonathan Nagler, Ken Scheve, Brian Silver, Richard Tucker, and Steve Voss for their very helpful comments; and the U.S. National Science Foundation (SBR-9729884), the Centers for Disease Control and Prevention (Division of Diabetes Translation), the National Institutes of Aging, the World Health Organization, and the Global Forum for Health Research for research support.

American Journal of Political Science, Vol. 44, No. 2, April 2000, Pp. 341–355
©2000 by the Midwest Political Science Association

We show that social scientists often do not take full advantage of the information available in their statistical results and thus miss opportunities to present quantities that could shed the greatest light on their research questions. In this article we suggest an approach, built on the technique of statistical simulation, to extract the currently overlooked information and present it in a reader-friendly manner. More specifically, we show how to convert the raw results of any statistical procedure into expressions that (1) convey numerically precise estimates of the quantities of greatest substantive interest, (2) include reasonable measures of uncertainty about those estimates, and (3) require little specialized knowledge to understand.

The following simple statement satisfies our criteria: "Other things being equal, an additional year of education would increase your annual income by $1,500 on average, plus or minus about $500." Any smart high school student would understand that sentence, no matter how sophisticated the statistical model and powerful the computers used to produce it. The sentence is substantively informative because it conveys a key quantity of interest in terms the reader wants to know. At the same time, the sentence indicates how uncertain the researcher is about the estimated quantity of interest. Inferences are never certain, so any honest presentation of statistical results must include some qualifier, such as "plus or minus $500" in the present example.
                                                                                                                              
                                                               ,  ,   

In contrast, bad interpretations are substantively ambiguous and filled with methodological jargon: "the coefficient on education was statistically significant at the 0.05 level." Descriptions like this are very common in social science, but students, public officials, and scholars should not need to understand phrases like "coefficient," "statistically significant," and "the 0.05 level" to learn from the research. Moreover, even statistically savvy readers should complain that the sentence does not convey the key quantity of interest: how much higher the starting salary would be if the student attended college for an extra year.

Our suggested approach can help researchers do better in three ways. First, and most importantly, it can extract new quantities of interest from standard statistical models, thereby enriching the substance of social science research. Second, our approach allows scholars to assess the uncertainty surrounding any quantity of interest, so it should improve the candor and realism of statistical discourse about politics. Finally, our method can convert raw statistical results into results that everyone, regardless of statistical training, can comprehend. The examples in this article should make all three benefits apparent.

Most of this article describes our simulation-based approach to interpreting and presenting statistical results. In many situations, what we do via simulation can also be done by direct mathematical analysis or other computationally intensive techniques, and we discuss these approaches as well. To assist researchers in implementing our suggestions, we have developed an easy-to-use, public domain software package called CLARIFY, which we describe in the appendix.

The Problem of Statistical Interpretation

We aim to interpret the raw results from any member of a very general class of statistical models, which we summarize with two equations:

    Yi ~ f(θi, α),   θi = g(Xi, β).   (1)

The first equation describes the stochastic component of the statistical model: the probability density (or mass) function that generates the dependent variable Yi (i = 1, . . . , n) as a random draw from the probability density f(θi, α). Some characteristics of this function vary from one observation to the next, while others remain constant across all the i's. We represent the varying characteristics with the parameter vector θi and relegate nonvarying features to the ancillary parameter matrix α. The second equation gives the systematic component of the model; it indicates how θi changes across observations, depending on values of explanatory variables (typically including a constant) in the 1 × k vector Xi and effect parameters in the k × 1 vector β. The functional form g(⋅,⋅), sometimes called the link function, specifies how the explanatory variables and effect parameters get translated into θi.

One member of this general class is a linear-normal regression model, otherwise known as least-squares regression. To see this, let f(⋅,⋅) be the normal distribution N(⋅,⋅); set the main parameter vector to the scalar mean θi = E(Yi) = µi; and assume that the ancillary parameter matrix is the scalar homoskedastic variance α = V(Yi) = σ². Finally, set the systematic component to the linear form g(Xi, β) = Xiβ = β0 + Xi1β1 + Xi2β2 + ⋅⋅⋅. The result is familiar:

    Yi ~ N(µi, σ²),   µi = Xiβ   (2)

Similarly, one could write a logit model by expressing the stochastic component as a Bernoulli distribution with main parameter πi = Pr(Yi = 1)—no ancillary parameter is necessary—and setting the systematic component to the logistic form:

    Yi ~ Bernoulli(πi),   πi = 1 / (1 + e^(−Xiβ))   (3)

Equation 1 also includes as special cases nearly every other statistical model in the social sciences, including multiple-equation models in which Yi is a vector, as well as specifications for which the probability distribution, functional form, or matrix of explanatory variables is estimated rather than assumed to be known.
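To make the two special cases concrete, here is a minimal sketch in Python of Equation 1's two building blocks—a systematic component g feeding a stochastic component f—for the linear-normal and logit models. The function names and numerical values are illustrative and are not part of the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch: each model is a systematic component theta = g(X, beta)
# feeding a stochastic component f(theta, alpha).

def draw_linear_normal(X_i, beta, sigma2):
    """Equation 2: Yi ~ N(mu_i, sigma2) with mu_i = X_i @ beta."""
    mu_i = X_i @ beta                           # systematic component
    return rng.normal(mu_i, np.sqrt(sigma2))    # stochastic component

def draw_logit(X_i, beta):
    """Equation 3: Yi ~ Bernoulli(pi_i) with pi_i = 1 / (1 + exp(-X_i @ beta))."""
    pi_i = 1.0 / (1.0 + np.exp(-X_i @ beta))    # systematic component
    return rng.binomial(1, pi_i)                # stochastic component

y1 = draw_linear_normal(np.array([1.0, 2.0]), np.array([0.5, 0.3]), sigma2=1.0)
y2 = draw_logit(np.array([1.0, 2.0]), np.array([0.5, 0.3]))
```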
                                                                         
Having estimated the statistical model, many researchers stop after a cursory look at the signs and "statistical significance" of the effect parameters. This approach obviously fails to meet our criteria for meaningful statistical communication since, for many nonlinear models, β̂ and α̂ are difficult to interpret and only indirectly related to the substantive issues that motivated the research (Cain and Watts 1970; Blalock 1967). Instead of publishing the effect coefficients and ancillary parameters, researchers should calculate and present quantities of direct substantive interest.

Some researchers go a step further by computing derivatives, fitted values, and first differences (Long 1997; King 1989, subsection 5.2), which do convey numerically precise estimates of interesting quantities and require little specialized knowledge to understand. Even these approaches are inadequate, however, because they ignore two forms of uncertainty. Estimation uncertainty arises from not knowing β and α perfectly, an unavoidable consequence of having fewer than an infinite number of observations. Researchers often acknowledge this uncertainty by reporting standard errors or t-statistics, but they overlook it when computing quantities of interest. Since β̂ and α̂ are uncertain, any calculations—including derivatives, fitted values, and first differences—based on those parameter estimates must also be uncertain, a fact that almost no scholars take into account. A second form of variability, the fundamental uncertainty represented by the stochastic component (the distribution f) in Equation 1, results from innumerable chance events such as weather or illness that may influence Y but are not included in X. Even if we knew the exact values of the parameters (thereby eliminating estimation uncertainty), fundamental uncertainty would prevent us from predicting Y without error. Our methods for computing quantities of interest must account for both types of uncertainty.

Simulation-Based Approaches to Interpretation

We recommend statistical simulation as an easy method of computing quantities of interest and their uncertainties. Simulation can also help researchers understand the entire statistical model, take full advantage of the parameter estimates, and convey findings in a reader-friendly manner (see Fair 1980; Tanner 1996; Stern 1997).

What Is Statistical Simulation?

Statistical simulation uses the logic of survey sampling to approximate complicated mathematical calculations. In survey research, we learn about a population by taking a random sample from it. We use the sample to estimate a feature of the population, such as its mean or its variance, and our estimates become more precise as we increase the sample size, n. Simulation follows a similar logic but teaches us about probability distributions, rather than populations. We learn about a distribution by simulating (drawing random numbers) from it and using the draws to approximate some feature of the distribution. The approximation becomes more accurate as we increase the number of draws, M. Thus, simulation enables us to approximate any feature of a probability distribution without resorting to advanced mathematics.

For instance, we could compute the mean of a probability distribution P(y) by taking the integral E(Y) = ∫ y P(y) dy over the real line, which is not always the most pleasant of experiences! Alternatively, we could approximate the mean through simulation by drawing many random numbers from P(y) and computing their average. If we were interested in the theoretical variance of Y, we could calculate the sample variance of a large number of random draws, and if we wanted the probability that Y > 0.8, we could count the fraction of draws that exceeded 0.8. Likewise, we could find a 95-percent confidence interval for a function of Y by drawing 1000 values of Y, computing the function for each draw, sorting the transformed draws from lowest to highest and taking the 25th and 976th values. We could even approximate the entire distribution of, say, √Y, by plotting a histogram of the square roots of a large number of simulations of Y.

Approximations can be computed to any desired degree of precision by increasing the number of simulations (M), which is analogous to boosting the number of observations in survey sampling. Assessing the precision of the approximation is simple: run the same procedure, with the same number of simulations, repeatedly. If the answer remains the same to within four decimal points across the repetitions, that is how accurate the approximation is. If more accuracy is needed, raise the number of simulations and try again. Nothing is lost by simulation—except a bit of computer time—and much is gained in ease of use.
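As a minimal illustration of this logic, the following Python sketch approximates the mean, the variance, a tail probability, and a 95-percent confidence interval from draws. The gamma distribution and the seed are arbitrary stand-ins for whatever distribution is of interest.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative sketch: approximate features of a distribution P(y) by drawing
# from it.  The gamma distribution and M = 1000 are arbitrary stand-ins.
M = 1000
y = rng.gamma(shape=2.0, scale=0.5, size=M)

mean_approx = y.mean()            # approximates E(Y)
var_approx = y.var(ddof=1)        # approximates V(Y)
prob_above = (y > 0.8).mean()     # approximates Pr(Y > 0.8)

# 95-percent interval for a function of Y: transform each draw, sort, and
# take the 25th and 976th of the 1000 sorted values.
g = np.sort(np.sqrt(y))
ci_95 = (g[24], g[975])           # zero-based indices for the 25th and 976th draws
```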
Simulating the Parameters

We now explain how researchers can use simulation to compute quantities of interest and account for uncertainty. The first step involves simulating the main and ancillary parameters. Recall that the parameter estimates β̂ and α̂ are never certain because our samples are finite. To capture this estimation uncertainty, we draw many plausible sets of parameters from their posterior or sampling distribution. Some draws will be smaller or larger than β̂ and α̂, reflecting our uncertainty about the exact value of the parameters, but all will be consistent with the data and statistical model.

To simulate the parameters, we need the point estimates and the variance-covariance matrix of the estimates, which most statistical packages will report on request. We denote γ̂ as the vector produced by stacking β̂ on top of α̂. More formally, γ̂ = vec(β̂, α̂), where "vec" stacks the unique elements of β̂ and α̂ in a column vector. Let V̂(γ̂) designate the variance matrix associated with these estimates. The central limit theorem tells us that with a large enough sample and bounded variance, we can randomly draw (simulate) the parameters from a multivariate normal distribution with mean equal to γ̂ and variance equal to V̂(γ̂).¹

Using our notation,

    γ̃ ~ N(γ̂, V̂(γ̂)).   (4)

¹This distributional statement is a shorthand summary of the Bayesian, likelihood, and Neyman-Pearson theories of statistical inference. The interpretive differences among these theories (such as whether θ or θ̂ is the random variable) are important but need not concern us here, as our approach can usually be employed with any of these and most other theories of inference (see Barnett 1982).

Thus, we can obtain one simulation of γ by following these steps:

1. Estimate the model by running the usual software program (which usually maximizes a likelihood function), and record the point estimates γ̂ and variance matrix V̂(γ̂).
2. Draw one value of the vector γ from the multivariate normal distribution in Equation 4. Denote this draw γ̃ = vec(β̃, α̃).

Repeat the second step, say, M = 1000 times to obtain 1000 draws of the main and ancillary parameters.
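A sketch of this two-step procedure in Python follows. The point estimates and variance matrix are placeholders standing in for output from your own fitted model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of the two steps above.  gamma_hat stacks the point estimates (two
# effect coefficients and one ancillary parameter here) and V_hat is their
# full variance-covariance matrix; both are placeholders.
gamma_hat = np.array([0.5, 1.2, -0.3])
V_hat = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.00],
                  [0.00, 0.00, 0.02]])

M = 1000
gamma_tilde = rng.multivariate_normal(gamma_hat, V_hat, size=M)   # Equation 4, M times
beta_tilde = gamma_tilde[:, :2]       # simulated effect coefficients
alpha_tilde = gamma_tilde[:, 2]       # simulated ancillary parameter
```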
If we knew the elements of γ perfectly, the sets of draws would all be identical; the less information we have about γ (due to larger elements in the variance matrix), the more the draws will differ from each other. The specific pattern of variation summarizes all knowledge about the parameters that we can obtain from the statistical procedure. We still need to translate γ into substantively interesting quantities, but now that we have summarized all knowledge about γ we are well positioned to make the translation. In the next three subsections, we describe algorithms for converting the simulated parameters into predicted values, expected values, and first differences.

Predicted Values

Our task is to draw one value of Y conditional on one chosen value of each explanatory variable, which we represent with the vector Xc. Denote the simulated θ as θ̃c and the corresponding Y as Ỹc, a simulated predicted value. Predicted values come in many varieties, depending on the kind of X-values used. For instance, Xc may correspond to the future (in which case Ỹc is a simulated forecast), a real situation described by observed data (such that Ỹc is a simulated predicted value), or a hypothetical situation not necessarily in the future (making Ỹc a simulated counterfactual predicted value). None of these is equivalent to the expected value (Ŷ) in a linear regression, which we discuss in the following subsection.

To simulate one predicted value, follow these steps:

1. Using the algorithm in the previous subsection, draw one value of the vector γ̃ = vec(β̃, α̃).
2. Decide which kind of predicted value you wish to compute, and on that basis choose one value for each explanatory variable. Denote the vector of such values Xc.
3. Taking the simulated effect coefficients from the top portion of γ̃, compute θ̃c = g(Xc, β̃), where g(⋅,⋅) is the systematic component of the statistical model.
4. Simulate the outcome variable Ỹc by taking a random draw from f(θ̃c, α̃), the stochastic component of the statistical model.

Repeat this algorithm, say, M = 1000 times, to produce 1000 predicted values, thereby approximating the entire probability distribution of Yc. From these simulations the researcher can compute not only the average predicted value but also measures of uncertainty around the average. The predicted value will be expressed in the same metric as the dependent variable, so it should require little specialized knowledge to understand.
tively interesting quantities, but now that we have sum-
marized all knowledge about γ we are well positioned to
make the translation. In the next three subsections, we                                      Expected Values
describe algorithms for converting the simulated param-
                                                                      Depending on the issue being studied, the expected or
eters into predicted values, expected values, and first
                                                                      mean value of the dependent variable may be more inter-
differences.
                                                                      esting than a predicted value. The difference is subtle but
                                                                      important. A predicted value contains both fundamental
                      Predicted Values                                and estimation uncertainty, whereas an expected value
                                                                      averages over the fundamental variability arising from
Our task is to draw one value of Y conditional on one                 sheer randomness in the world, leaving only the estima-
chosen value of each explanatory variable, which we                   tion uncertainty caused by not having an infinite num-
represent with the vector Xc. Denote the simulated θ as               ber of observations. Thus, predicted values have a larger
 θ̃c and the corresponding Y as Y˜c , a simulated predicted           variance than expected values, even though the average
value. Predicted values come in many varieties, depend-               should be nearly the same in both cases.2
ing on the kind of X-values used. For instance, Xc may                     When choosing between these two quantities of in-
correspond to the future (in which case Y˜c is a simulated            terest, researchers should reflect on the importance of
forecast), a real situation described by observed data                fundamental uncertainty for the conclusions they are
(such that Y˜c is a simulated predicted value), or a hypo-            drawing. In certain applications, such as forecasting the
thetical situation not necessarily in the future (making              actual result of an election or predicting next month’s
Y˜c a simulated counterfactual predicted value). None of              foreign exchange rate, scholars and politicians—as well
                                                                      as investors—want to know not only the expected out-
                                                                      come, but also how far the outcome could deviate from
1This distributional statement is a shorthand summary of the
                                                                      expectation due to unmodeled random factors. Here, a
Bayesian, likelihood, and Neyman-Pearson theories of statistical
inference. The interpretive differences among these theories (such
as whether θ or θ̂ is the random variable) are important but need
not concern us here, as our approach can usually be employed          2In linear models, the average predicted value is identical to the ex-

with any of these and most other theories of inference (see Barnett   pected value. For nonlinear cases, the two can differ but are often
1982).                                                                close if the nonlinearity is not severe.
                                                                                

For other applications, the researcher may want to highlight the average effect of a particular explanatory variable, so an expected value would be the best choice.

We now offer an algorithm for creating one simulation of an expected value:

1. Following the procedure for simulating the parameters, draw one value of the vector γ̃ = vec(β̃, α̃).
2. Choose one value for each explanatory variable and denote the vector of values as Xc.
3. Taking the simulated effect coefficients from the top portion of γ̃, compute θ̃c = g(Xc, β̃), where g(⋅,⋅) is the systematic component of the statistical model.
4. Draw m values of the outcome variable Ỹc(k) (k = 1, . . . , m) from the stochastic component f(θ̃c, α̃). This step simulates fundamental uncertainty.
5. Average over the fundamental uncertainty by calculating the mean of the m simulations to yield one simulated expected value Ẽ(Yc) = Σ_{k=1}^m Ỹc(k)/m.

When m = 1, this algorithm reduces to the one for predicted values. If m is a larger number, Step 4 accurately portrays the fundamental variability, which Step 5 averages away to produce an expected value. The larger the value of m, the more successful the algorithm will be in purging Ẽ(Yc) of any fundamental uncertainty.

To generate 1000 simulations of the expected value, repeat the entire algorithm M = 1000 times for some fixed value of m. The resulting expected values will differ from each other due to estimation uncertainty, since each expected value will correspond to a different γ̃. These M simulations will approximate the full probability distribution of E(Yc), enabling the researcher to compute averages, standard errors, confidence intervals, and almost anything else desired.

The algorithm works in all cases but involves some approximation error, which we can reduce by setting both m and M sufficiently high. For some statistical models, there is a shortcut that curtails both computation time and approximation error. Whenever E(Yc) = θc, the researcher can skip steps 4–5 of the expected value algorithm, since steps 1–3 suffice to simulate one expected value. This shortcut is appropriate for the linear-normal and logit models in Equations 2 and 3.
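The following sketch runs the full algorithm for a case in which the shortcut does not apply: a model fit on ln(Y) whose quantity of interest is E(Y) itself. The parameter draws below are placeholders, not estimates from any real model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch of the expected-value algorithm when the shortcut does not apply:
# the model was fit on ln(Y), but the quantity of interest is E(Y).
M, m = 1000, 100
beta_tilde = rng.multivariate_normal([1.0, 0.5], [[0.04, 0.0], [0.0, 0.01]], size=M)
sigma_tilde = np.exp(rng.normal(-0.7, 0.1, size=M))   # ancillary parameter, kept positive
X_c = np.array([1.0, 2.0])

mu = beta_tilde @ X_c                            # step 3: systematic component
lnY = rng.normal(mu, sigma_tilde, size=(m, M))   # step 4: m outcome draws per parameter draw
E_tilde = np.exp(lnY).mean(axis=0)               # step 5: average away fundamental uncertainty

point, se = E_tilde.mean(), E_tilde.std(ddof=1)  # point estimate and standard error
ci = np.percentile(E_tilde, [2.5, 97.5])         # 95-percent confidence interval
```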
First Differences

A first difference is the difference between two expected, rather than predicted, values. To simulate a first difference, researchers need only run steps 2–5 of the expected value algorithm twice, using different settings for the explanatory variables.

For instance, to simulate a first difference for the first explanatory variable, set the values for all explanatory variables except the first at their means and fix the first one at its starting point. Denote this vector of starting values for the explanatory variables as Xs and run the expected value algorithm once to generate Ẽ(Ys), the average value of Y conditional on Xs. Next change the value of the first explanatory variable to its ending point, leaving the others at their means as before. Denote the new vector as Xe and rerun the algorithm to get Ẽ(Ye), the mean of Y conditional on Xe. The first difference is simply Ẽ(Ye) − Ẽ(Ys). Repeat the first difference algorithm, say, M = 1000 times to approximate the distribution of first differences. Average the simulated values to obtain a point estimate, compute the standard deviation to obtain a standard error, or sort the values to approximate a confidence interval.
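A sketch of the first-difference calculation for the linear-normal model of Equation 2, where the shortcut E(Yc) = Xcβ lets us skip steps 4–5; all inputs are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)

# Sketch of a first difference under the linear-normal shortcut.
beta_tilde = rng.multivariate_normal([2.0, 0.7, -0.3], np.diag([0.09, 0.01, 0.04]), size=1000)

x2_mean = 1.5                                  # the other explanatory variable held at its mean
X_s = np.array([1.0, 0.0, x2_mean])            # first explanatory variable at its starting point
X_e = np.array([1.0, 1.0, x2_mean])            # ... and at its ending point

fd = beta_tilde @ X_e - beta_tilde @ X_s       # 1000 simulated first differences
point = fd.mean()                              # point estimate
se = fd.std(ddof=1)                            # standard error
ci = np.percentile(fd, [2.5, 97.5])            # 95-percent confidence interval
```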
We previously discussed expected values of Y, and until now this section has considered first differences based on only this type of expected value. Different expectations, such as Pr(Y = 3) in an ordered-probit model, may also be of interest. For these cases, the expected value algorithm would need to be modified slightly. We have made the necessary modifications in CLARIFY, the software package described in the appendix, which allows researchers to calculate a wide variety of expected values and first differences, as well as predicted values and other quantities of interest.

The algorithms in this article do not require new assumptions; rather, they rest on foundations that have become standard in the social sciences. In particular, we assume that the statistical model is identified and correctly specified (with the appropriate explanatory variables and functional form), which allows us to focus on interpreting and presenting the final results. We also assume that the central limit theorem holds sufficiently for the available sample size, such that the sampling distribution of parameters (not the stochastic component) can be described by a normal distribution.³ Although we focus on asymptotic results, as do the vast majority of the applied researchers using nonlinear models, simulation works with finite sample distributions, which are preferable when feasible. In short, our algorithms work whenever the usual assumptions work.

³From a Bayesian perspective, we exclude unusual cases where a flat prior generates an improper posterior.

Alternative Approaches

In this section, we discuss several other techniques for generating quantities of interest and measuring the uncertainty around them. These approaches can be valuable complements to simulation, because they provide important mathematical intuition or, in some cases, enable finite sample approximations. We briefly summarize some of the leading computer-intensive and analytical alternatives to simulation.

Computer-intensive alternatives. Our version of simulation is not the only computer-intensive technique for obtaining quantities of interest and measures of uncertainty. Fully Bayesian methods, using Markov-Chain Monte Carlo techniques, are more powerful than our algorithms because they allow researchers to draw from the exact finite-sample distribution, instead of relying on the central limit theorem to justify an asymptotic normal approximation (Carlin and Louis 1996). Unfortunately these methods remain difficult to use, particularly since statisticians still disagree about appropriate criteria for determining when a Markov chain has converged in distribution to the true posterior (Cowles and Carlin 1996; Kass et al. 1998). Nonetheless, this field has shown remarkable progress over the last decade and is well worth monitoring by political scientists.

Another useful alternative is bootstrapping, a nonparametric approach that relies on the logic of resampling to approximate the distribution of parameters (Mooney and Duval 1993; Mooney 1996). In theory, the sampling distribution of γ̂ can be viewed as a histogram of an infinite number of γ̂'s, each estimated from a different sample of size n from the same population. Bootstrapping mimics this process by drawing many subsamples (with replacement) from the original sample, estimating γ̂ for each subsample, and then constructing a histogram of the various γ̂'s. Bootstrapping has many advantages: it does not require strong parametric distributional assumptions, and Monte Carlo studies have demonstrated that it has superior small sample properties for some problems. Programming a bootstrapped estimator is not difficult, although commercial software packages have not been quick to adopt this method. The weakness of bootstrapping is that it gives biased estimates for certain quantities of interest, such as max(Y).
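A sketch of the resampling idea, using an ordinary least-squares estimator and simulated data purely for illustration; the resulting draws can then be fed into the same interpretation machinery as the draws from Equation 4.

```python
import numpy as np

rng = np.random.default_rng(5)

# Sketch of bootstrapping: re-estimate the parameters on many resamples
# (with replacement) of the original data.  OLS and the data are illustrative.
def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.7, size=n)

B = 1000
gamma_boot = np.empty((B, X.shape[1]))
for b in range(B):
    idx = rng.integers(0, n, size=n)          # resample rows with replacement
    gamma_boot[b] = ols(X[idx], y[idx])       # re-estimate on the resample

# gamma_boot now plays the role of the multivariate-normal draws in Equation 4
# and can be pushed through the same predicted-value machinery.
```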
For both Bayesian methods and bootstrapping, all of the methods of interpretation we discuss in this article can be used directly. The only change is that instead of drawing the parameters from the multivariate normal in Equation 4, we would use MCMC-based simulation or bootstrapping. Even our software, CLARIFY, could be used without additional programming.

Like our method, MCMC and bootstrapping generate simulations of the parameters. In cases where the parameters are not of intrinsic interest, researchers must convert them into quantities such as predicted values, expected values, and first differences. The algorithms above show how to make the conversion and are therefore essential supplements. Indeed, our software, CLARIFY, could easily be modified to interpret the parameters generated by these alternative approaches.

Analytical approaches. The main analytical (mathematical) alternative to simulation is the delta method, which uses the tools of calculus to approximate nonlinear functions of random variables (van der Vaart 1998). Suppose that we are interested in the mean and variance of θ = g(Xc, β), where g is a nonlinear function. If g is approximately linear in the range where β has high probability, a Taylor-series expansion of g about β̂ is often reasonable. To the first order, θ ≈ g(β̂) + g′(β̂)(β − β̂), where g′(a) = ∂g(a)/∂a. As a result, the maximum-likelihood estimate of θ is approximately g(β̂), and its variance is approximately g′(β̂)V(β̂)g′(β̂)′. For example, in the exponential Poisson regression model (King 1989, Chapter 5), where Y is Poisson with mean λ = E(Y | X) = e^(Xiβ), suppose we wish to compute the expected number of events given X = X0. In this case, the maximum likelihood estimate of the expected number of events is g(β̂) = e^(X0β̂) and its variance is (X0 e^(X0β̂)) V̂(β̂) (X0 e^(X0β̂))′. Note that this maximum-likelihood estimate still does not reflect the uncertainty in β̂, as done automatically by simulation and the other computationally intensive methods. To incorporate this additional uncertainty requires another level of mathematical complexity in that we must now approximate the integral ∫ e^(X0β̂) P(β̂) dβ̂ and its variance. A detailed example is given by King and Zeng (1999) in the case of logistic regression.
                                                                            

Despite its utility for increasing computing speed and revealing statistical intuition through mathematical analysis, the delta method suffers from two shortcomings that simulation can help overcome. First, the method is technically demanding, since it requires researchers to calculate derivatives and compute the moments of linearized functions.⁴ Thus, it is not surprising that most scholars do not use the delta method, even when they appreciate the importance of reporting uncertainty. Second, the Taylor series used in the delta method only approximates a nonlinear form. Although researchers can sometimes improve the approximation with additional terms in the Taylor series, this can be difficult, and finding estimates of the additional terms is often impossible. In practice most researchers stop after expanding the series to the first or second order, which can compromise the accuracy of the approximation. With simulation one can achieve an arbitrarily high degree of precision simply by increasing M and letting the computer run longer.

⁴When g is linear there is obviously no need for a linearizing approximation; an exact analytical solution exists for the mean and variance of many quantities of interest that we described earlier. Simulation produces the same answer, however, and requires less mathematical proficiency.
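Returning to the Poisson example above, the following sketch computes the first-order delta-method variance for exp(X0β) and compares it with the simulation-based answer; beta_hat and V_hat are placeholders for estimates from a fitted model.

```python
import numpy as np

rng = np.random.default_rng(6)

# Sketch: first-order delta method versus simulation for theta = exp(X0 @ beta).
beta_hat = np.array([0.3, 0.6])
V_hat = np.array([[0.010, 0.002],
                  [0.002, 0.020]])
X0 = np.array([1.0, 1.5])

theta_hat = np.exp(X0 @ beta_hat)         # g(beta_hat)
grad = X0 * theta_hat                     # gradient of g at beta_hat
var_delta = grad @ V_hat @ grad           # delta-method variance

beta_tilde = rng.multivariate_normal(beta_hat, V_hat, size=10000)
theta_tilde = np.exp(beta_tilde @ X0)     # simulation-based alternative
var_sim = theta_tilde.var(ddof=1)         # close to var_delta when g is nearly linear
```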
Several general arguments weigh in favor of simulation. First, there is a simulation-based alternative to nearly every analytical method of computing quantities of interest and conducting statistical tests, but the reverse is not true (Noreen 1989). Thus, simulation can provide accurate answers even when no analytical solutions exist. Second, simulation enjoys an important pedagogical advantage. Studies have shown that, no matter how well analytical methods are taught, students get the right answer far more often via simulation (Simon, Atkinson, and Shevokas 1976). One scholar has even offered a $5,000 reward for anyone who can demonstrate the superiority of teaching analytical methods, but so far no one has managed to earn the prize (Simon 1992). Of course, there are advantages to the insight the mathematics underlying the delta method can reveal and so, when feasible, we encourage researchers to learn both simulation and analytical methods.

Tricks of the Trade

The algorithms in the previous section apply to all statistical models, but they can be made to work better by following a few tricks and avoiding some common misunderstandings.

Tricks for Simulating Parameters

Statistical programs usually report standard errors for parameter estimates, but accurate simulation requires the full variance matrix V̂(γ̂). The diagonal elements of V̂(γ̂) contain the squared standard errors, while the off-diagonal elements express the covariances between one parameter estimate and another in repeated draws from the same probability distribution. Simulating each parameter independently would be incorrect, because this procedure would miss the covariances among parameters. Nearly all good statistical packages can report the full variance matrix, but most require researchers to request it explicitly by setting an option or a global. The software described in the appendix fetches the variance matrix automatically.

One common mistake is to exclude some parameters when drawing from the multivariate normal distribution. Parameters have different logical statuses, such as the effect parameters β versus the ancillary parameters α, but our algorithms do not need to distinguish between the two: both are uncertain and should be simulated, even if only one proves useful in later calculations. It may be possible to accelerate our algorithms by excluding certain parameters from the simulation stage, but for the vast majority of applications these tricks are unnecessary and could lead to errors. Researchers usually will risk fewer mistakes by following, without deviation, our algorithm for simulating the parameters.

In some statistical models, the elements of γ are orthogonal, so software packages provide separate variance matrices for each set. When implementing the algorithm for simulating the parameters, researchers may want to create a block-diagonal matrix by placing the separately estimated variance matrices on the diagonal and inserting zeros everywhere else. Obviously, if the subsets of γ truly are orthogonal, equivalent draws from the two sets can be made from independent multivariate normal distributions, but it may be easier to work with a single sampling distribution.

Researchers should reparameterize the elements of γ to increase the likelihood that the asymptotic multivariate normal approximation will hold in finite samples. In general, all parameters should be reparameterized unless they are already unbounded and logically symmetric, as a Normal must be. For instance, a variance parameter like σ² must be greater than zero, so it will pay to reparameterize by using an expression like σ² = e^η. This allows researchers to estimate η, which is on the scale from –∞ to ∞, as one element of γ, which is assumed to be multivariate normal. When making reparameterizations, of course, we add an extra step to the algorithm for simulating the parameters: after drawing γ from the multivariate normal, we reparameterize back to the original scale by computing σ̃² = e^η̃.⁵

⁵Reparameterization also makes likelihood-maximization algorithms easier to use by avoiding problems caused by the optimization procedure choosing inadmissible parameter values (which often result in the program terminating abnormally because of attempts to divide by zero or logging negative numbers). Since maximum likelihood estimates are invariant to reparameterization, the reparameterization has no effect except on the finite sample distribution around the point estimate. For example, estimating σ̂² directly gives the same estimate as estimating η̂ and transforming to σ̂² by using σ̂² = e^η̂.

Several other reparameterizations may come in handy. A correlation parameter ρ, ranging from –1 to 1, can be reparameterized to η (reusing the same symbol) on an unbounded scale with the inverse of Fisher's Z transformation: ρ = (e^(2η) − 1)/(e^(2η) + 1). Likewise, a parameter representing a probability π can be made unbounded using the logistic transformation, π = [1 + e^(−η)]⁻¹. These and other tricks should enhance the effectiveness of simulating the parameters.

Tricks for Simulating Quantities of Interest

When converting the simulated parameters into quantities of interest, it is safest to simulate Y and use this as a basis for obtaining other quantities. This rule is equivalent to incorporating all simulated parameters—and thus all information from the statistical model—into the calculations. Of course, some shortcuts do exist. We have already mentioned that, in a logit model, one can obtain Ẽ(Y) by stopping with π̃, since drawing dichotomous Y's and averaging would yield exactly π̃. If one is not sure, though, it is helpful to continue until one has simulations of the outcome variable.

If some function of Y, such as ln(Y), is used as the dependent variable during the estimation stage, the researcher can simulate ln(Y) and then apply the inverse function exp(ln(Y)) to reveal Y. We adopt this procedure in the first example, below, where we estimate a log-log regression model. This sequence of simulation and transformation is crucial, since the usual procedure of calculating E(ln(Y)) = µ̂ without simulation and then exponentiating gives the wrong answer: exp(µ̂) ≠ Ŷ. With simulation, both Y and E(Y) can be computed easily, regardless of the scale that the researcher used during the estimation stage.
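A small numerical sketch of why the order of operations matters: the mean and variance of ln(Y) below are placeholders, and only fundamental uncertainty is simulated here.

```python
import numpy as np

rng = np.random.default_rng(8)

# Sketch: simulate ln(Y), transform each draw, and only then average.
mu_hat, sigma2_hat = 4.0, 0.5

lnY = rng.normal(mu_hat, np.sqrt(sigma2_hat), size=100_000)   # simulate ln(Y)
Y = np.exp(lnY)                                               # transform each draw back to Y

wrong = np.exp(mu_hat)   # exponentiating the estimated mean of ln(Y): about 54.6
right = Y.mean()         # simulated E(Y): near exp(mu + sigma2/2), about 70.1
```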
Researchers should assess the precision of any simulated quantity by repeating the entire algorithm and seeing if anything of substantive importance changes. If something does change, increase the number of simulations (M and, in the case of expected values, m) and try again. In certain instances—particularly when the researcher has misspecified a nonlinear statistical model—the number of simulations required to approximate an expected value accurately may be larger than normal. Numerical estimates should be reported to the correct level of precision, so for instance if repeated runs with the same number of simulations produce an estimate that changes only in the fourth decimal point, then—assuming this is sufficient for substantive purposes—the number reported should be rounded to two or three decimal points.

The simulation procedures given in this article can be used to compute virtually all quantities that might be of interest for nearly all parametric statistical models that scholars might wish to interpret. As such, they can be considered canonical methods of simulation. Numerous other simulation algorithms are available, however, in the context of specific models. When these alternatives could speed the approximation or make it more accurate for a fixed number of simulations, they should be incorporated into computer programs for general use. In some cases, analytical computations are also possible and can get speedier results. But our algorithms provide social scientists with all they need to understand the fundamental concepts: which quantities are being computed and how the computation is, or could be, done. Moreover, as long as M and m are large enough, these and all other correct algorithms will give identical answers.

Empirical Examples

To illustrate how our algorithms work in practice, we include replications or extensions of five empirical works. Instead of choosing the most egregious examples, we chose a large number of the best works, from our most prestigious journals and presses, written by some of our most distinguished authors. Within this group, we eliminated the many publications we were unable to replicate and then picked five to illustrate a diverse range of models and interpretative issues. The procedures for model interpretation in all five were exemplary. If we all followed their examples, reporting practices in the discipline would be greatly improved. For each article, we describe the substantive problem posed and statistical model chosen; we also accept rather than evaluate their statistical procedures, even though in some cases the methods could be improved. We then detail how the authors interpreted their results and demonstrate how our procedures advance this state of the art.

Linear Regression

Following Tufte (1974), we estimated a log-log regression model of the size of government in the U.S. states. Our dependent variable, Yi, was the natural log of the number of people (measured in 1000s) that the state government employed on a full-time basis in 1990. Tufte was interested (for a pedagogical example) in whether Yi would increase with state population; but consider another hypothesis that may be of more interest to political scientists: the number of employees might depend on the proportion of Democrats in the state legislature, since Democrats are reputed to favor bigger government than Republicans, even after adjusting for state population. Thus, our two main explanatory variables were the log of state population Pi in 1000s and the logged proportion of lower-house legislators who identified themselves as Democrats, Di.
                                                                                               

We applied the predicted value algorithm to predict the number of government employees in a state with six million people and an 80 percent Democratic house. First, we used the statistical software described in the appendix to estimate the log-linear model and simulate one set of values for the effect coefficients (β̃) and the ancillary parameter (σ̃). Next, we set the main explanatory variables at Pc = ln(6000) and Dc = ln(0.8), so we could construct Xc and compute θ̃c = Xcβ̃. We then drew one value of Ỹc from the normal distribution N(θ̃c, σ̃²). Finally, we calculated exp(Ỹc) to transform our simulated value into the actual number of government employees, a quantity that seemed more understandable than its natural logarithm. By repeating this process M = 1000 times, we generated 1000 predicted values, which we sorted from lowest to highest. The numbers in the 25th and the 976th positions represented the lower and upper bounds of a 95-percent confidence interval. Thus, we predicted with 95-percent confidence that the state government would employ between 73,000 and 149,000 people. Our best guess was 106,000 full-time employees, the average of the predicted values.
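A sketch of this computation follows. The coefficient estimates and variance matrix are placeholders that must be replaced with values from your own fit of the log-log model; the ancillary parameter is carried on the log scale, following the reparameterization trick discussed earlier.

```python
import numpy as np

rng = np.random.default_rng(9)

# Sketch of the predicted-value computation for the log-log model.  The zeros
# below are placeholders so the code runs; substitute your own estimates.
beta_hat = np.zeros(3)                 # constant, log population, log proportion Democratic
gamma_hat = np.append(beta_hat, 0.0)   # ancillary parameter carried on the log scale
V_hat = np.eye(4) * 0.01               # full variance matrix of the stacked estimates

X_c = np.array([1.0, np.log(6000), np.log(0.8)])   # six million people, 80 percent Democratic house

M = 1000
gamma_tilde = rng.multivariate_normal(gamma_hat, V_hat, size=M)
theta_tilde = gamma_tilde[:, :3] @ X_c             # Xc @ beta_tilde
sigma_tilde = np.exp(gamma_tilde[:, 3])            # back-transform the ancillary parameter
employees = np.exp(rng.normal(theta_tilde, sigma_tilde))   # predicted employees, in 1000s

ci = np.percentile(employees, [2.5, 97.5])         # 95-percent interval
best_guess = employees.mean()
```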
                                                                       In our logit model, the probability of voting in a
      We also calculated some expected values and first
differences and found that increasing Democratic control from half to two-thirds of the lower house tended to raise state government employment by 7,000 people on average. The 95-percent confidence interval around this first difference ranged from 3,000 to 12,000 full-time employees. Our result may be worth following up, since, to the best of our knowledge, researchers have not addressed this relationship in the state-politics literature.

Logit Models

The algorithms in the third section can also help researchers interpret the results of a logit model. Our example draws on the work of Rosenstone and Hansen (1993), who sought to explain why some individuals are more likely than others to vote in U.S. presidential elections. Following Rosenstone and Hansen, we pooled data from every National Election Study that was conducted during a presidential election year. Our dependent variable, Yi, was coded 1 if the respondent reported voting in the presidential election and 0 otherwise.

For expository purposes we focus on a few demographic variables that Rosenstone and Hansen emphasized: Age (Ai) and Education (Ei) in years, Income (Ii) in 10,000s of dollars, and Race (coded Ri = 1 for whites and 0 otherwise). We also include a quadratic term to test the hypothesis that turnout rises with age until the respondent nears retirement, when the tendency reverses itself. Thus, our set of explanatory variables is Xi = {Ai, Ai², Ei, Ii, Ri}. Under this logit model, the probability that respondent i votes in the presidential election is E(Yi) = πi, an intuitive quantity of interest. We estimated this probability, and the uncertainty surrounding it, for two different levels of education and across the entire range of age, while holding other variables at their means. In each case, we repeated the expected value algorithm M = 1000 times to approximate a 99-percent confidence interval around the probability of voting. The results appear in Figure 1, which illustrates the conclusions of Rosenstone and Hansen quite sharply: the probability of voting rises steadily to a plateau between the ages of 45 and 65, and then tapers downward through the retirement years. The figure also reveals that uncertainty associated with the expected value is greatest at the two extremes of age: the vertical bars, which represent 99-percent confidence intervals, are longest when the respondent is very young or old.6

6 The confidence intervals are quite narrow, because the large number of observations (N = 15,837) eliminated most of the estimation uncertainty.
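As a sketch of how the expected value algorithm produces a figure like Figure 1, the following Python fragment simulates M = 1000 coefficient vectors from their estimated sampling distribution and traces the probability of voting across age, holding education, income, and race at fixed values. The coefficient vector, covariance matrix, and variable ordering are hypothetical placeholders, not Rosenstone and Hansen's estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logit estimates: columns are
# [constant, age, age^2, education, income, white]
beta_hat = np.array([-3.0, 0.08, -0.0007, 0.15, 0.05, 0.3])
vcov_hat = np.diag([0.2, 0.001, 1e-8, 0.002, 0.001, 0.01])  # placeholder covariance

M = 1000                                    # number of simulated coefficient vectors
betas = rng.multivariate_normal(beta_hat, vcov_hat, size=M)

ages = np.arange(18, 95)
educ, income, white = 12.0, 3.0, 1.0        # values at which other variables are held

lower, point, upper = [], [], []
for a in ages:
    x = np.array([1.0, a, a**2, educ, income, white])
    probs = 1.0 / (1.0 + np.exp(-betas @ x))    # M simulated expected values E(Y | x)
    lo, hi = np.percentile(probs, [0.5, 99.5])  # 99-percent confidence interval
    lower.append(lo)
    point.append(probs.mean())
    upper.append(hi)
```

Plotting point against ages, with vertical bars running from lower to upper, reproduces the format of Figure 1.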
                                                                       ,  ,   

A Time-Series Cross-Sectional Model

We also used our algorithms to interpret the results of a time-series cross-sectional model. Conventional wisdom holds that the globalization of markets has compelled governments to slash public spending, but a new book by Garrett (1998) offers evidence to the contrary. Where strong leftist parties and encompassing trade unions coincide, Garrett argues, globalization leads to greater government spending as a percentage of GDP, whereas the opposite occurs in countries where the left and labor are weak.

To support his argument, Garrett constructed a panel of economic and political variables, measured annually, for fourteen industrial democracies during the period 1966–1990. He then estimated a linear-normal regression model where the dependent variable, Yi, is government spending as a percentage of GDP for each country-year in the data set. The three key explanatory variables were Capital mobility, Ci (higher values indicate fewer government restrictions on cross-border financial flows), Trade, Ti (larger values mean more foreign trade as a percentage of GDP), and Left-labor power, Li (higher scores denote a stronger combination of leftist parties and labor unions).7

7 Garrett also focused on two interactions among the variables, CiLi and TiLi, and he included a battery of business cycle and demographic controls, as well as the lagged level of government spending and dummy variables for countries and time.

To interpret his results, Garrett computed

     a series of counterfactual estimates of government spending under different constellations of domestic political conditions and integration into global markets. This was done by setting all the other variables in the regression equations equal to their mean levels and multiplying these means by their corresponding coefficients, and then by examining the counterfactual impact of various combinations of left-labor power and globalization. . . . (1998, 82)

In particular, Garrett distinguished between low and high levels of Li, Ti, and Ci. For these variables, the 14th percentile in the dataset represented a low value, whereas the 86th percentile represented a high one.8

8 “So as not to exaggerate the substantive effects” of the relationships he was studying, Garrett “relied on combinations of the 20th and 80th percentile scores” (1998, 82). Unfortunately, due to a minor arithmetic error, the values he reports (1998, 84) correspond only to the 14th and 86th percentiles. To facilitate comparison with Garrett, we use the 14th and 86th percentiles in our simulations.

TABLE 1  Garrett’s Counterfactual Effects on Government Spending (% of GDP)

                                     Trade                Capital Mobility
                                 Low       High             Low       High
  Left-labor power     Low      43.1       41.9            42.8       42.3
                       High     43.5       44.2            43.1       44.5

Each entry is the expected level of government spending for given configurations of left-labor power and trade or capital mobility, holding all other variables constant at their means.

The counterfactual estimates appear in Table 1, which Garrett used to draw three conclusions. First, “government spending was always greater when left-labor power was high than when it was low, irrespective of the level of market integration” (entries in the second row in each table exceeded values in the first row). Second, “the gap between the low and high left-labor power cells was larger in the high trade and capital mobility cases than in the cells with low market integration,” implying that “partisan politics had more impact on government spending where countries were highly integrated into the international economy than in more closed contexts.” Finally, “where left-labor power was low, government spending decreased if one moved from low to high levels of market integration, but the converse was true at high levels of left-labor power” (1998, 83).

Garrett’s counterfactuals go far beyond the customary list of coefficients and t-tests, but our tools can help us extract even more information from his model and data. For instance, simulation can reveal whether the differences in values across the cells might have arisen by chance alone. To make this assessment, we reestimated the parameters in Garrett’s regression equation9 and drew 1000 sets of simulated coefficients from their posterior distribution, using the algorithm for simulating the parameters. Then we fixed Lc and Tc at their 14th percentiles, held other variables at their means, and calculated 1000 (counterfactual) expected values, one for each set of simulated coefficients. Following the same procedure, we produced counterfactuals for the other combinations of Lc, Tc, and Cc represented by the cells of Table 1. Finally, we plotted “density estimates” (which are smooth versions of histograms) of the counterfactuals; these appear in Figure 2. One can think of each density estimate as a pile of simulations distributed over the values of government spending. The taller the pile at any given level of government spending, the more simulations took place near that point.

9 Our coefficients differed from those in Garrett (1998, 80–81) by only 0.3 percent, on average. Standard errors diverged by 6.8 percent, on average, apparently due to discrepancies in the method of calculating panel-corrected standard errors (Franzese 1996). None of the differences made any substantive difference in the conclusions.
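The simulation step described above can be sketched as follows. The coefficient estimates, covariance matrix, and the “low” and “high” values below are hypothetical placeholders rather than our reestimates of Garrett’s equation; the point is only to show how counterfactual expected values and their densities are generated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical estimates for a stylized spending equation with columns
# [constant, left-labor power L, trade T, capital mobility C, L*T, L*C];
# these are placeholders, not reestimates of Garrett's model.
beta_hat = np.array([40.0, 0.5, -0.02, -0.3, 0.03, 0.4])
vcov_hat = np.diag([1.0, 0.05, 1e-4, 0.02, 1e-4, 0.03])

M = 1000
betas = rng.multivariate_normal(beta_hat, vcov_hat, size=M)

def expected_spending(L, T, C):
    """Counterfactual expected values of government spending for one scenario."""
    x = np.array([1.0, L, T, C, L * T, L * C])
    return betas @ x                          # M simulated expected values

# "Low" and "high" stand in for the 14th and 86th percentiles of each variable.
low_L = expected_spending(L=1.0, T=50.0, C=2.0)
high_L = expected_spending(L=4.0, T=50.0, C=2.0)

diff = high_L - low_L                         # first difference, with full uncertainty
print(diff.mean(), np.percentile(diff, [2.5, 97.5]))
```

Smoothed histograms of low_L and high_L correspond to the curves in Figure 2, and the percentiles of diff show whether the gap between the two scenarios could plausibly have arisen by chance.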
FIGURE 2  Simulated Levels of Government Spending

[Four density plots, with government spending (% of GDP) on the horizontal axis and the density estimate on the vertical axis, for the panels “Low Exposure to Trade,” “High Exposure to Trade,” “Low Capital Mobility,” and “High Capital Mobility.”]

These panels contain density estimates (smooth versions of histograms) of expected government spending for countries where left-labor power is high (the solid curve) and low (the dotted curve). The panels, which add uncertainty estimates to the concepts in Table 1, demonstrate that left-labor power has a distinguishable effect only when exposure to trade or capital mobility is high.

Figure 2 shows that when globalization of trade or capital mobility is low, leftist governments spend only slightly more than rightist ones. More importantly, the density estimates overlap so thoroughly that it is difficult to distinguish the two spending patterns with much confidence. (Another way to express one aspect of this point is that the means of the two distributions are not statistically distinguishable at conventional levels of significance.) In the era of globalization, by contrast, domestic politics exerts a powerful effect on fiscal policy: leftist governments outspend rightist ones by more than two percent of GDP on average, a difference we can affirm with great certainty, since the density estimates for the two regime-types are far apart. In summary, our simulations cause us to question Garrett’s claim that left-labor governments always outspend the right, regardless of the level of market integration: although the tendency may be correct, the results could have arisen from chance alone. The simulations do support Garrett’s claim that globalization has intensified the relationship between partisan politics and government spending.

Multinomial Logit Models

How do citizens in a traditional one-party state vote when they get an opportunity to remove that party from office? Domínguez and McCann (1996) addressed this question by analyzing survey data from the 1988 Mexican presidential election. In that election, voters chose among three presidential candidates: Carlos Salinas (from the ruling PRI), Manuel Clouthier (representing the PAN, a right-wing party), and Cuauhtémoc Cárdenas (head of a leftist coalition). The election was historically significant, because for the first time all three presidential candidates appeared to be highly competitive. Domínguez and McCann used a multinomial logit model to explain why some voters favored one candidate over the others. The following equations summarize the model, in which Yi and πi are 3 × 1 vectors:

    Y_i \sim \text{Multinomial}(\pi_i)

    \pi_i = \frac{e^{X_i \beta_j}}{\sum_{k=1}^{3} e^{X_i \beta_k}}, \quad \text{where } j = 1, 2, 3 \text{ candidates.}    (5)

The effect parameters can vary across the candidates, so β1, β2, and β3 are distinct vectors, each with k × 1 elements.10

10 Domínguez and McCann included thirty-one explanatory variables in their model. For a complete listing of the variables and question wording, see Domínguez and McCann (1996, 213–216).
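As a minimal illustration of equation (5), the fragment below computes one respondent’s vector of candidate probabilities from hypothetical covariates and coefficient vectors; the numbers are invented for illustration and are not estimates from Domínguez and McCann’s model.

```python
import numpy as np

# Hypothetical covariates for one respondent and one made-up coefficient
# vector per candidate; in the actual model Xi has thirty-one elements.
x_i = np.array([1.0, 0.6, 2.0])                 # constant plus two covariates
betas = np.array([[0.0, 0.0, 0.0],              # Salinas
                  [-0.5, 0.8, 0.1],             # Clouthier
                  [-0.3, 1.2, -0.2]])           # Cardenas

utilities = betas @ x_i                         # Xi * beta_j for j = 1, 2, 3
pi_i = np.exp(utilities) / np.exp(utilities).sum()   # equation (5)
print(pi_i)                                     # probabilities sum to one
```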
                                                                 ,  ,   

The book focuses on individual voting behavior, as is traditional in survey research, but we used simulation to examine the quantity of interest that motivated Domínguez and McCann in the first place: the electoral outcome itself. In particular, if every voter thought the PRI was weakening, which candidate would have won the presidency? To answer this question, we coded each voter as thinking that the PRI was weakening and let other characteristics of the voter take on their true values. Then we used the predicted value algorithm to simulate the vote for each person in the sample and used the votes to run a mock election. We repeated this exercise 100 times to generate 100 simulated election outcomes. For comparison, we also coded each voter as thinking the PRI was strengthening and simulated 100 election outcomes conditional on those beliefs.
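A compact sketch of this mock-election procedure appears below. The survey matrix, the coefficient estimates, and the crude way uncertainty is added to the coefficients are all hypothetical placeholders, not Domínguez and McCann’s data or our actual simulation code.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical survey data (n respondents, k covariates) and multinomial-logit
# estimates; these are placeholders, not estimates from the published model.
n, k = 500, 4
X = rng.normal(size=(n, k))
X[:, 0] = 1.0                                  # constant
X[:, 1] = 1.0                                  # everyone believes the PRI is weakening
beta_hat = rng.normal(scale=0.5, size=(3, k))  # point estimates, one row per candidate
se = 0.1                                       # placeholder coefficient standard error

outcomes = []
for _ in range(100):                           # 100 simulated elections
    betas = beta_hat + rng.normal(scale=se, size=beta_hat.shape)  # simulate coefficients
    utilities = X @ betas.T
    utilities -= utilities.max(axis=1, keepdims=True)             # numerical stability
    probs = np.exp(utilities) / np.exp(utilities).sum(axis=1, keepdims=True)  # eq. (5)
    votes = np.array([rng.choice(3, p=p) for p in probs])  # predicted values: one vote each
    outcomes.append(np.bincount(votes, minlength=3) / n)   # vote shares in this mock election
outcomes = np.asarray(outcomes)                # 100 x 3 array, one row per simulated election
```

Each row of outcomes is one simulated election and corresponds to one point in the ternary plot of Figure 3; rerunning the loop with X[:, 1] = 0 produces the “PRI is strengthening” scenario.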
FIGURE 3  Simulated Electoral Outcomes

[Ternary plot with vertices labeled Cárdenas (top), Salinas (bottom left), and Clouthier (bottom right); the plotted points are omitted here.]

Coordinates in this ternary diagram are predicted fractions of the vote received by each of the three candidates. Each point is an election outcome drawn randomly from a world in which all voters believe Salinas’ PRI party is strengthening (for the “o”’s in the bottom left) or weakening (for the “·”’s in the middle), with other variables held constant at their means.

Figure 3 displays our results. The figure is called a “ternary plot” (see Miller 1977; Katz and King 1999), and coordinates in the figure represent predicted fractions of the vote received by each candidate under a different simulated election outcome. Roughly speaking, the closer a point appears to one of the vertices, the larger the fraction of the vote going to the candidate whose name appears on the vertex. A point near the middle indicates that the simulated election was a dead heat. We also added “win lines” to the figure that divide the ternary diagram into areas that indicate which candidate receives a plurality and thus wins the simulated election (e.g., points that appear in the top third of the triangle are simulated election outcomes where Cárdenas receives a plurality).

In this figure, the o’s (all near the bottom left) are simulated outcomes in which everyone thought the PRI was strengthening, while the dots (all near the center) correspond to beliefs that the PRI was weakening. The figure shows that when the country believes the PRI is strengthening, Salinas wins hands down; in fact, he wins every one of the simulated elections. If voters believe the PRI is weakening, however, the 1988 election is a toss-up, with each candidate having an equal chance of victory.

This must be a sobering thought for those seeking to end PRI dominance in Mexico. Hope of defeating the PRI, even under these optimistic conditions, probably requires some kind of compromise between the two opposition parties. The figure also supports the argument that, despite much voter fraud, Salinas probably did win the presidency in 1988. He may have won by a lesser margin than reported, but the figure is strong evidence that he did indeed defeat a divided opposition.11

11 See Scheve and Tomz (1999) on simulation of counterfactual predictions and Sterman and Wittenberg (1999) on simulation of predicted values, both in the context of binary logit models.

Censored Weibull Regression Models

How do wars affect the survival of political leaders? Bueno de Mesquita and Siverson (1995) examine this question by estimating a censored Weibull regression (a form of duration model) on a dataset in which the dependent variable, Yi, measures the number of years that leader i remains in office following the onset of war. For fully observed cases (the leader had left office at the time of the study), the model is

    Y_i \sim \text{Weibull}(\mu_i, \sigma)

    \mu_i \equiv E(Y_i \mid X_i) = (e^{X_i \beta})^{-\sigma}\, \Gamma(1 + \sigma)    (6)

where σ is an ancillary shape parameter and Γ is the gamma function, an interpolated factorial that works for continuous values of its argument. The model includes four explanatory variables: the leader’s pre-war tenure in years, an interaction between pre-war tenure and democracy, the number of battle deaths per 10,000 inhabitants, and a dummy variable indicating whether the leader won the war.12 The authors find that leaders who waged foreign wars tended to lose their grip on power at home, but authoritarian leaders with a long pre-war tenure were able to remain in office longer than others.

12 The first three variables are expressed in logs.

Bueno de Mesquita and Siverson discuss the marginal impact of their explanatory variables by computing the “hazard rate” associated with each variable. Hazard rates are the traditional method of interpretation in the literature, but understanding them requires considerable statistical knowledge. Simulation can help us calculate more intuitive quantities, such as the number of months that a leader could expect to remain in office following the onset of war.
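Equation (6) suggests how simulation can produce such quantities directly: draw β and σ from their estimated sampling distributions and evaluate the expected time in office for a leader profile of interest. The estimates, covariance terms, and leader profile below are hypothetical placeholders rather than Bueno de Mesquita and Siverson’s results.

```python
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(3)

# Hypothetical Weibull regression estimates: columns are
# [constant, log pre-war tenure, log tenure x democracy, log battle deaths, won war]
beta_hat = np.array([-1.5, -0.4, 0.2, 0.05, -0.3])
vcov_hat = np.diag([0.04, 0.01, 0.01, 0.001, 0.02])
sigma_hat, sigma_se = 0.8, 0.05                  # placeholder shape parameter estimate

M = 1000
betas = rng.multivariate_normal(beta_hat, vcov_hat, size=M)
sigmas = rng.normal(sigma_hat, sigma_se, size=M)

# One counterfactual leader: long pre-war tenure, autocracy, moderate losses, won the war.
x = np.array([1.0, np.log(15.0), 0.0, np.log(5.0), 1.0])

mu = np.exp(betas @ x) ** (-sigmas) * gamma(1.0 + sigmas)   # equation (6), in years
months = 12.0 * mu
print(months.mean(), np.percentile(months, [2.5, 97.5]))    # expected months in office
```

The mean and percentiles of months give a point estimate and confidence interval for the expected number of months in office, stated in terms any reader can understand.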