Bootstrap method for the estimation of measurement uncertainty in spotted dual-color DNA microarrays

Anal Bioanal Chem (2007) 389:2125–2141
DOI 10.1007/s00216-007-1617-0

 ORIGINAL PAPER

Bootstrap method for the estimation of measurement
uncertainty in spotted dual-color DNA microarrays
Tobias K. Karakach · Robert M. Flight · Peter D. Wentzell

Received: 10 July 2007 / Accepted: 7 September 2007 / Published online: 27 September 2007
© Springer-Verlag 2007

Abstract

DNA microarrays permit the measurement of gene expression across the entire genome of an organism, but the quality of the thousands of measurements is highly variable. For spotted dual-color microarrays the situation is complicated by the use of ratio measurements. Studies have shown that measurement errors can be described by multiplicative and additive terms, with the latter dominating for low-intensity measurements. In this work, a measurement-error model is presented that partitions the variance into general experimental sources and sources associated with the calculation of the ratio from noisy pixel data. The former is described by a proportional (multiplicative) structure, while the latter is estimated using a statistical bootstrap method. The model is validated using simulations and three experimental data sets. Monte-Carlo fits of the model to data from duplicate experiments are excellent, but suggest that the bootstrap estimates, while proportionately correct, may be underestimated. The bootstrap standard error estimates are particularly useful in determining the reliability of individual microarray spots without the need for replicate spotting. This information can be used in screening or weighting the measurements.

Keywords DNA microarrays · Measurement errors · Bootstrap · Gene expression · Transcriptomics · Measurement quality · Uncertainty estimation

T. K. Karakach · R. M. Flight · P. D. Wentzell (*)
Department of Chemistry, Dalhousie University,
Halifax, Nova Scotia B3H 4J3, Canada
e-mail: peter.wentzell@dal.ca

Introduction

The introduction of DNA microarray technology in recent years has revolutionized the study of molecular cell biology, making it possible to assess genome-wide changes in gene expression in a single experiment [1, 2]. In many ways, DNA microarrays approach an ideal analytical sensor array platform, exhibiting good specificity through selective base pairing, high sensitivity through fluorescence measurements, and generally complete coverage of the analyte domain in cases where the genome has been mapped. Despite their widespread use, however, there are still a number of challenges to be overcome if microarrays are to achieve their full potential [3]. These include issues related to experimental design, data quality, normalization, and data analysis, among others. From the beginning, measurement quality has been a particular focus of microarray research, since all conclusions are based on the reliability of the primary data. Typically, the quality of measurements can vary greatly across the spots on a DNA microarray due to variations in the amount of DNA spotted or hybridized, changes in spot morphology, the presence of contaminants such as dust, and other factors. Therefore, an assessment of measurement error variance for individual spots is important for determining the utility of the calculated ratios. This variance estimate can be used, for example, to filter measurements of low quality or to weight the measurements appropriately in subsequent data-analysis steps [4, 5].

Although there are a variety of DNA microarray platforms in common use, one of the most widely employed for gene-expression studies is the spotted dual-color microarray. With these arrays, the mRNA from expressed genes is first extracted from two samples, which will be referred to here as the “test” and “reference” samples. For both samples, the mRNA is reverse transcribed to single-stranded cDNA and labeled with a fluorescent dye. Different dyes, typically the cyanine dyes Cy3 (green) and Cy5 (red), are used to label the test and the reference samples, which are then co-hybridized on the microarray, where each spot contains DNA complementary to a specific cDNA sequence. For each spot, intensities measured for the two different wavelength channels can be converted into a ratio (test/reference) that indicates (after appropriate normalization) when the expression of the corresponding gene is up or down-regulated in the test sample relative to the reference sample. The statistical significance of such findings, however, is critically dependent on the reliability of the measured ratio, so the ability to estimate measurement uncertainty is essential.

The purpose of the present work is to develop a bootstrap method for estimating the uncertainty in individual ratio measurements on a microarray. In particular, the proposed method is used to estimate the component of the overall uncertainty that is derived from the measurement process itself, that is, the part of the uncertainty arising from the calculation of a ratio from pixel intensities corrupted with instrument noise. For a given microarray spot, this component may be either a minor or major contributor to the overall uncertainty, and therefore its assessment is critical. The methods developed here are of greatest utility when there is no replication of spots within a microarray design, but are generally useful in evaluating the reliability of spots in any design.

Background

Measurement error models

Measurement errors in spotted dual-color microarray experiments can arise from a variety of sources and these can be combined or decomposed in a number of different ways. One simple general representation is:

$$\sigma_R^2 = \sigma_{\mathrm{biol}}^2 + \sigma_{\mathrm{slide}}^2 + \sigma_{\mathrm{spot}}^2 + \sigma_{\mathrm{meas}}^2 \tag{1}$$

In this equation, $\sigma_R^2$ represents the overall variance in the ratio measurement for a given spot. The first term on the right-hand side of the equation, $\sigma_{\mathrm{biol}}^2$, represents the biological variation of the system under study and, depending on the experiment, is often the largest component of the overall variance. This component arises from the fact that there is a natural variation in gene expression levels among similar or identical organisms or populations due to differences in genetic makeup and/or environment, or due to simple stochastic effects. This part of the variance can be determined experimentally through a simple nested design that uses biological replicates and then subtracts the combined effects of the other three terms from the overall variance.

The second term on the right, $\sigma_{\mathrm{slide}}^2$, arises from the technical variations from one microarray slide to another and includes sources of variability related to the preparation (extraction, labeling, hybridization) and normalization of the responses from the two channels. This contribution can be evaluated through technical replication, i.e. replicated microarrays for the same biological source material. This removes the contribution of $\sigma_{\mathrm{biol}}^2$, and the remaining two terms can be subtracted from the overall variance to give $\sigma_{\mathrm{slide}}^2$.

The third contribution, $\sigma_{\mathrm{spot}}^2$, is due to spot-to-spot variations within a slide and is most easily assessed by spotting replicate DNA material at several locations on one slide. Although the best estimates are obtained when replicate spots are distributed in a random fashion across the whole slide, this design is not efficient from a microarray production perspective, so replicated spots often occur side-by-side.

The final term on the right-hand side of Eq. (1), $\sigma_{\mathrm{meas}}^2$, is the one of particular interest to this work and relates to the actual determination of the ratio from individual pixel intensities on each of the two channels. Irrespective of the other contributions to the overall variance, which can be regarded as systematic effects at this level, the error in the ratio measurement will depend on factors such as the intensity and noise of the fluorescence signals, the morphology of the spot, the spatial alignment of the wavelength channels, the manner of the ratio calculation, the background levels, and the presence of outlying pixels due to saturation or contamination. Often, especially for low to moderate intensity signals, $\sigma_{\mathrm{meas}}^2$ is the dominant source of error variance, and therefore its assessment is extremely useful. This idea is illustrated in Fig. 1, which shows a map of pixel intensities for the red and green channels for two different spots. Both spots give essentially the same calculated ratio of unity, but given the high level of intensity of signals for the spot on the left, the ratio calculated is expected to be much more precise than for the spot on the right, where the intensities are near background noise. Because $\sigma_{\mathrm{meas}}^2$ is defined according to the characteristics of an individual spot, by definition it cannot, strictly speaking, be determined through replication, since each spot will have its own unique features. In practice, a close approximation to $\sigma_{\mathrm{meas}}^2$ can be achieved from side-by-side replicates, assuming that the spot morphologies and other characteristics are very similar. This assumption is usually valid, since side-by-side replicates would be printed with the same pin and have a high spatial and temporal correlation in the printing and hybridization process. Nevertheless, exceptions can occur. Moreover, for reliable estimation of the variance, several replicate spots should be printed for

each gene, and this redundancy often runs counter to the efficient use of limited space on the microarray.

Fig. 1 Pixel intensity maps of red and green channels for two microarray spots, with combined images inset. Note that both spots give the same ratio measurement, but the one on the right would be expected to be more uncertain

The measurement error characteristics of DNA microarrays have been extensively studied in recent years and some fairly consistent properties have emerged that are reproducible across different laboratories and even different platforms. Although a variety of different models have been proposed [6–12], it is generally observed that the intensities measured on each channel follow a mixed model with a multiplicative and additive term, with the latter dominating for low-intensity signals that become corrupted with background noise. Ideker et al. [7] expressed this model as:

$$x = \mu_x + \mu_x \varepsilon_x + \delta_x \tag{2}$$

where $x$ is the background-corrected signal intensity on a given channel for a particular spot, $\mu_x$ represents the true mean intensity, and $\varepsilon_x$ and $\delta_x$ are normally distributed random variables with zero means and standard deviations of $\sigma_{\varepsilon_x}$ and $\sigma_{\delta_x}$. Rocke and Durbin [8] made somewhat different distributional assumptions and employed the model:

$$x = \mu_x e^{\eta_x} + \delta_x \tag{3}$$

where the definitions are the same and $\eta_x$ is normally distributed with a mean of zero and a standard deviation of $\sigma_{\eta_x}$. With these models, it is assumed that the model parameters ($\varepsilon$, $\delta$, $\eta$) are constant for a given slide and a given channel. For either model, when the additive error term ($\delta_x$) is negligible, the errors in the intensities will be proportional to the signal magnitude, so the relative standard deviation (RSD) of the measured intensities is expected to be constant. It is easily shown through propagation of errors that, under these conditions, the RSD in the measured ratio of the two channels is also constant.

To reconcile these models with that given in Eq. (1), it can be assumed that the additive contributions to the error are associated with $\sigma_{\mathrm{meas}}^2$, since this contribution becomes important for low-intensity signals, while the multiplicative errors associated with the other three terms should disappear at low intensities. Therefore we can combine the first three terms in Eq. (1) into an uncertainty associated with the experiment, $\sigma_{\mathrm{expt}}^2$, as opposed to the measurement step. This term will encompass the multiplicative error contribution, so we can write:

$$\sigma_R^2 = \sigma_{\mathrm{expt}}^2 + \sigma_{\mathrm{meas}}^2 = c^2 R^2 + \sigma_{\mathrm{meas}}^2 \tag{4}$$

Here, $R$ is the ratio and $c$ is the proportionality constant for the multiplicative error, i.e. the RSD for high-intensity measurements. Like the intensity models, the multiplicative component of the error represented by $c$ should be fixed for a given slide, but unlike these models, the second term will not be fixed, since it depends on how the individual measurement errors combine in the ratio calculation. Therefore, a method is needed to estimate $\sigma_{\mathrm{meas}}^2$ for each spot.

Current approaches

The measurement error model described above for ratio measurements with spotted two-color microarrays presents some difficulties from a data-analysis perspective in that it leads to a heteroscedastic error structure, i.e. non-uniform error variance in the measurements. There are two components to this problem. The first difficulty arises from the multiplicative component of the uncertainty in both the intensity and ratio measurement domains, which means larger uncertainties for larger measurements. A common approach to dealing with this problem in microarray data analysis is to carry out a logarithmic transformation of the data. For purely proportional errors, it is easily shown by propagation of errors that a log transform will homogenize the error variance, leading to a homoscedastic error structure that is statistically more tractable.

The second contributor to heteroscedasticity in the ratio measurement arises from the contribution of the $\sigma_{\mathrm{meas}}^2$ term. Often, this term will be negligible compared to the multiplicative error component, but when it is not (typically for low to moderate intensity signals) it can destroy the proportional error structure so that logarithmic transforms are ineffective for homogenizing the variance. The most common way to treat this problem is to eliminate spots where the $\sigma_{\mathrm{meas}}^2$ term becomes a significant or dominant contributor by flagging spots with low intensities or dubious shapes as bad. This process is generally known as data filtering. A variety of strategies can be employed to this end, the most basic being a visual inspection of the spots. This process is labor-intensive, however, and quite subjective, so a number of automated procedures based on various quality measures have been proposed [13–22] for use independently or in conjunction with manual methods. Although more efficient and objective than visual inspection, automated procedures are less flexible and face the challenge of reducing all of the contributors to poor spot quality to a numerical indicator. Perhaps more importantly, data filtering methods result in a binary classification of good or bad, while measurement uncertainties follow a continuum of magnitudes. Setting an arbitrary threshold runs the risk of excluding measurements that may contain important information or corrupting the data with excessive noise. Clearly, a method that could quantify the uncertainty associated with each measured ratio would be useful.

One strategy that has been suggested for estimation of measurement uncertainty in ratio calculations is propagation of errors [13, 14]. In principle, if one knows the uncertainties in the two intensities used to calculate the ratio, the uncertainty in the ratio is easily determined. In practice, however, models employed to do this are overly simplistic and do not account for the complex correlation structures of the signals and noise in the pixelated data. Furthermore, reliable estimates of the measurement uncertainty for the intensities are difficult to obtain. This is especially true for background intensities, which are normally subtracted from the raw intensity values. For these reasons, error propagation generally gives poor estimates of measurement noise and has not been widely employed with microarrays.

Another approach that has been used with microarray data is the application of a variance stabilizing transformation, such as the generalized log transform [10, 23–25]. These methods attempt to homogenize the variance while incorporating both the multiplicative and additive terms. They require some estimation of the transform parameters, however. Moreover, the implementation of any transform runs the risk of altering the structure that was present in the original data, which may be undesirable in certain applications [5].

Ratio calculation methods

An integral element in the statistical behavior of any ratio measurement will be the manner in which the ratio is computed from the raw data. The fundamental problem is one of taking intensity measurements from (typically) a few hundred pixels on two channels and computing a single representative ratio of intensities. Complicating factors include the fact that spots are rarely uniform, the pixels may not be perfectly aligned, outliers may be present, and background subtraction normally needs to be carried out. Five methods are commonly employed for ratio calculation:

1. ratio of means,
2. ratio of medians,
3. mean of ratios,
4. median of ratios, and
5. regression.

These methods can be employed for both the foreground and the background regions, as designated by the gridding procedure.

One of the simplest and most popular methods for ratio calculation is the ratio-of-means method, where the mean of pixel intensities is calculated for each channel and, after background subtraction, these are used to determine the ratio. In essence, this method integrates the signal intensity values across the spot and, in doing so, should have a good signal-to-noise ratio (S/N) and a low sensitivity to morphology or small channel misalignment. The biggest drawback to this method is a high sensitivity to outliers which can adversely affect the calculation of the mean.

Another widely used method is the ratio-of-medians, which is similar to the ratio-of-means except that the calculation is carried out using the median intensity on each channel. This approach is more robust in terms of sensitivity to outliers, but can be sensitive to spot morphology. Specifically, if a spot exhibits significant regions of low intensity, as can be the case for “doughnut” or “crescent” shaped spots, for example, there is a good chance that the median intensities will fall in this region. This can be a problem because low-intensity signals are more likely to exhibit high noise.

For the mean-of-ratios and median-of-ratios methods, intensity ratios are first calculated on a pixel-by-pixel basis following background subtraction for each spot. The mean or median of these pixel ratios is then taken to be the spot ratio, with the latter providing a more robust estimate. One appealing feature of this approach is the potential to evaluate the dispersion of the calculated ratios as an estimate of uncertainty. In practice, however, if there are significant variations in the pixel intensities included in the calculations, the low intensity pixels will result in much noisier ratios and so the uncertainty estimates may be high and the ratio calculation may be unreliable.
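The four pixel-summary methods described above (ratio of means, ratio of medians, mean of ratios, median of ratios) can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used in this work: the function name `spot_ratios`, its arguments, and the use of a single scalar background per channel are assumptions made for the example.

```python
import numpy as np


def spot_ratios(test, ref, bg_test=0.0, bg_ref=0.0):
    """Return four pixel-summary ratios (test/reference) for one spot.

    test, ref       : foreground pixel intensities, in the same pixel order.
    bg_test, bg_ref : scalar background estimates per channel
                      (hypothetical inputs, assumed known here).
    """
    t = np.asarray(test, dtype=float) - bg_test   # background-corrected test
    r = np.asarray(ref, dtype=float) - bg_ref     # background-corrected reference

    # Pixel-by-pixel ratios; pixels whose corrected reference is exactly
    # zero are skipped to avoid division by zero.
    valid = r != 0
    pixel_ratios = t[valid] / r[valid]

    return {
        "ratio_of_means": t.mean() / r.mean(),
        "ratio_of_medians": np.median(t) / np.median(r),
        "mean_of_ratios": pixel_ratios.mean(),
        "median_of_ratios": np.median(pixel_ratios),
    }
```

Calling `spot_ratios` on a spot in which one test-channel pixel is replaced by a very large value shows the behavior described in the text: the two mean-based results shift strongly, while the two median-based results are unaffected.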

   The regression method for ratio determination is not as        classical approaches are not practical. The possibility of
widely used as some of the other methods, but has certain         extending the application of this method to estimate the
advantages and is the method employed in this work. With          measurement uncertainty component for microarray spots,
this method, the intensities for pairs of pixels across a spot    σmeas, as defined above, was therefore explored. In this
are plotted against one another. In principle, this should        work, the bootstrap method was implemented in conjunc-
lead to a straight line with a slope equal to the ratio. In       tion with the regression method of ratio calculation, but it
practice, orthogonal regression should be used instead of         could equally be applied with other ratio-calculation
ordinary least squares (OLS), since errors are observed on        methods also. In fact, Brody et al. [11] employed the
both axes (channels) and this can lead to problems for OLS        bootstrap method with a median-of-ratios calculation, but
when high slopes are obtained. In addition, reliable              did not carry out a rigorous evaluation of its reliability,
estimation requires some low-intensity pixels in order to         providing data for only three genes. In the present work,
define the line. These can come from the edges of the             interpretation of the regression parameters and their
foreground region or within the spot itself. The regression       standard errors, in the context of two-color microarrays, is
method works best for spots which exhibit substantial             that the slope corresponds to the ratio while the intercept
variation in intensity across the foreground region as            relates to differential background for the two channels
opposed to a high degree of uniformity, but, in the authors’      underneath each spot. Emphasis here is placed on estimat-
experience, the former is more common than the latter. The        ing the standard error of the ratio alone.
regression method is perhaps most similar to the ratio-of-           The main idea behind the bootstrap is that many new
means method and, like that method, will be sensitive to          samples (referred to as bootstrap samples) are “created”
outliers, especially for high leverage pixels. However, a         from the original population by re-sampling (with replace-
subtle but important advantage of the regression method is        ment), hence circumventing the need for extensive replica-
that it eliminates the need for background correction since       tion. For example, given a spot with 300 pixels on both red
the intercept of the regression automatically accounts for        and green channels, one draws a sub-population of 300
this. With other methods, background correction can               pixels (referred to as a bootstrap sample) at random (with
present difficulties because it requires that a background        replacement) from the initial or original population and
region of sufficient size be defined around the spot that         carries out a regression using these pixels to obtain a slope
does not impinge on other spots, leading to irregularly           (ratio) and intercept which are then stored. Note that,
shaped regions. The selection of background regions is            although the bootstrap sample also contains 300 pixels,
algorithm-specific and often proprietary in commercial            some of the pixels from the original population will be
software, so background intensities may not be very               represented multiple times and others not at all. A second
reproducible from one package to the next. Moreover, there        round of re-sampling from the original population is then
is always the risk that the calculated background is not          carried out, followed by a regression to obtain a second
representative due to contamination or spatial variations in      slope and intercept which are also stored. Figure 2 provides
the region of the spot. It has also been demonstrated [26]        a conceptual illustration of this approach. This process is
that the background under the spot (spot-localized back-          repeated as many times as necessary to obtain a reasonable
ground) may not be the same as the background around the          estimate of the standard error in the parameter:
spot, leading to errors in the ratio calculation. To define the            vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
                                                                           u               B                     2
background, the regression method requires only a few                      u 1 X
                                                                  ðseÞB ¼ t                        b
                                                                                                   q i  b  q            ð5Þ
low-intensity pixels near or within the spot. In the latter                   B  1 i¼1
case, this method can, in principle, compensate for spot-
localized background effects. For these reasons, the              In this equation, (se)B is the standard error estimate with B
regression method was chosen for the ratio calculation in         bootstrap samples, b qi is the parameter estimate (slope) for
this work, but the methodology developed can also be              the ith bootstrap sample, and b    q is the mean parameter
applied to the other calculation methods.                         estimate for all of the bootstrap samples. The standard error
                                                                  calculated in this way is taken to be an estimate of σmeas.
Bootstrap uncertainty estimates                                      The measurement component of the variance, σ²meas, is
                                                                  estimated by the bootstrap method for each individual spot.
Bootstrap methods are well established in statistics and          This can then be combined with the multiplicative
engineering fields where they have mainly gained                  component of the variance, σ²expt, to give the overall error
recognition and popularity as approaches for variance estimation in   variance in the ratio. The multiplicative component, or
the absence of replicate data [27–29]. In statistics, boot-       more specifically, the value of the proportionality constant c
strapping is widely used for estimating standard errors or        in Eq. (4), should be the same for all spots on a given
confidence intervals for parameters in cases where the            microarray. It should be possible to estimate this value from

Fig. 2 Conceptual illustration of bootstrap procedure. The k bootstrap samples are created from the sampled population of
size N by extracting N measurements at random (with replacement) in each case. The parameter θ̂ is the estimate of
the true population parameter, θ, and θ̂1, ..., θ̂k are the bootstrap estimates of the parameter
[Figure: Sampled Population (θ̂) → pick N points at random with replacement → Bootstrap Samples (θ̂1*, θ̂2*, θ̂3*, ..., θ̂B*)]

replicate experiments, especially if the estimation of σ²meas   ranging from 1% to 50% were examined, and typical
enables elimination of spots where this component of the        images for spots with 5% and 40% noise are shown in
noise dominates.                                                Fig. 3c–f.

                                                                Experimental data sets
Experimental
                                                                Three experimental microarray data sets were employed in
Simulated data                                                  this work to try to validate the bootstrap error estimation
                                                                procedure. The first was part of a time-course study
Since perfect experimental replication of spots with exactly    investigating exit from stationary phase by baker’s yeast,
the same characteristics is not possible, some validation of    Saccharomyces cerevisiae. These data were collected in the
the bootstrap method was carried out using simulated            laboratory of Professor Margaret Werner-Washburne (Uni-
microarray spots. As it is impossible to generate represen-     versity of New Mexico, Albuquerque, NM, USA) and
tations of every possible spot morphology, two fairly           involved approximately 6,300 yeast genes. A complete
common shapes were used as models. The noise-free               discussion of these experiments has appeared elsewhere
intensity maps of these are shown in Fig. 3a and b. The         [30]. The entire time course involved 19 microarrays, but
first model spot had a morphology which is characterized        for the purposes of this work, the interest was in replicate
by somewhat uniform intensity, with sloping edges and a         slides prepared at time points: 0, 1, 10, 20, and 35 min,
slant equal to about 40% of the maximum signal on the           which allowed an assessment of the multiplicative contri-
plateau. The second model spot exhibited a doughnut shape       butions to the uncertainty in the ratio. Duplicate slides were
typical of many microarray spots, with a center region that     available for each of these time points, except time point 0
dropped to about 30% of the maximum signal around the           where triplicate measurements were made. Duplicate slides
outside. In each case, the spots were generated on a 21×21      were hybridized and scanned by two different individuals
pixel grid and had no background present. For the               and on different days, incorporating additional components
simulations, normally distributed random noise was as-          in the experimental variability. In addition to the availabil-
sumed with a standard deviation of 100 intensity units. In a    ity of multiple replicates, this experiment had some other
given simulation, noise was specified as a percentage of the    features that were useful in this study. First, as a time-
maximum signal on the red channel, so the model spot            course study, there was a wide range of ratios (unlike what
profile was scaled accordingly to give the appropriate          would be anticipated for simple comparator experiments),
maximum for the error-free measurements on that channel.        which allowed error models to be tested over a large span
The ratio (red/green) was taken to be 2, so the error-free      of values. Second, due to slide production inexperience, the
spot image for the green channel was taken to be half that      quality of these slides was not optimal and included an
of the red channel. To generate the simulated data, the noise   extensive representation of different spot morphologies and
was added to the error-free spot images and the resulting       qualities.
values were rounded to the nearest integer to simulate the         The second data set employed was another time-course
effects of quantization noise introduced through digitiza-      study involving the intraerythrocytic developmental cycle
tion, although these were expected to be small. Noise levels    of the parasite Plasmodium falciparum, responsible for the

Fig. 3 Intensity maps of simulated microarray spots. Two spot morphologies were employed, the uniform spot with a sloped
top (left) and the doughnut shape (right). The noise-free spots are shown in (a) and (b), while (c) and (d) represent 5%
noise and (e) and (f) represent 40% noise on the red channel
[Figure: six intensity-map panels, a–f]
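The simulated spots of Fig. 3 could be generated along the following lines. This is only a sketch: the exact noise-free profiles are not given in the text, so `doughnut_profile` is a hypothetical shape whose center sits at about 30% of the maximum, and both function names are illustrative. The scaling follows the description in the Simulated data subsection: the noise standard deviation is fixed at 100 intensity units, the red maximum is set so that this equals the requested percentage, the green image is the red image divided by the ratio, and the noisy values are rounded to integers.

```python
import numpy as np

def doughnut_profile(size=21, dip=0.30):
    """Hypothetical noise-free doughnut spot on a size x size grid, maximum 1."""
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    r = np.hypot(xx - c, yy - c) / c            # 0 at the center, 1 at the grid edge
    ring = np.exp(-((r - 0.6) / 0.25) ** 2)     # bright ring at r = 0.6
    # inside the ring the signal only falls to about `dip` of the maximum
    profile = np.where(r < 0.6, dip + (1.0 - dip) * ring, ring)
    return profile / profile.max()

def simulate_spot(profile, noise_pct, ratio=2.0, noise_sd=100.0, rng=None):
    """Return noisy (red, green) images for one simulated spot.

    noise_pct : noise level as a percentage of the maximum red signal
    ratio     : error-free red/green ratio
    """
    rng = np.random.default_rng(rng)
    red_max = noise_sd / (noise_pct / 100.0)    # scale so noise_sd is noise_pct % of the maximum
    red_true = profile * red_max
    green_true = red_true / ratio
    # add normally distributed noise, then round to mimic digitization
    red = np.rint(red_true + rng.normal(0.0, noise_sd, profile.shape))
    green = np.rint(green_true + rng.normal(0.0, noise_sd, profile.shape))
    return red, green
```

For example, `simulate_spot(doughnut_profile(), 5.0)` gives a red image with an error-free maximum of 2,000 intensity units, and `noise_pct=40.0` gives a maximum of 250.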

majority of the cases of human malaria worldwide. This           presented here, the side-by-side replicates were used to
data set has been widely studied as it was a contest data set    obtain an experimental estimate of the measurement
for the CAMDA (Critical Assessment of Microarray Data)           uncertainty.
Conference in 2004, and is publicly available. In this work
it will be referred to as the CAMDA data set. Experimental       Ratio calculation
details are available in the original reference [31]. Briefly,
the transcriptome was measured at hourly intervals over a        As already noted, the regression ratio method was used to
48-h period for a total of 55 slides (including replicates)      calculate the ratio, but additional details are provided here
with about 7,300 features each. A common reference was           that are especially relevant for experimental measurements.
used for all of the measurements. Triplicate measurements        The first step in the ratio calculation for a given spot was
were made for the first time point at 1 h, and duplicate         extraction of the paired pixel intensities. This was done
measurements were made at 7, 11, 14, 18, 20, 27 and 31 h.        using the spot location and size information in the “grid”
It was primarily these replicates that were used in this work.   file. Pairs of pixels for which either channel was saturated
   The third data set utilized microarrays from an               were then removed from this list. Next, the pairs containing
experiment designed to study the development of the Atlantic     the upper fifth percentile of pixel intensities on each channel
halibut (Hippoglossus hippoglossus). This experiment was         were also removed from the data set (five to ten percent of
conducted in the laboratory of Dr Susan Douglas at the           pairs in total, depending on redundancy). This was done to
NRC Institute for Marine Biosciences in Halifax, Nova            reduce the chances of retaining in the data set outliers
Scotia, Canada. The experiment involved triplicate hybrid-       arising from dust spikes. The designation of the ninety-fifth
izations to measure gene expression in juvenile halibut at       percentile was somewhat arbitrary, but seemed sufficient to
each of five developmental stages, for a total of 15 slides.     eliminate most dust spikes without having a significant
Measurements were made against a common reference and            effect on the regression. Depending on the characteristics of
each triplicate set included one dye swap. Each slide            the slide, this number could be adjusted. Orthogonal
consisted of 38,500 spots which included four side-by-side       regression [32] was then carried out on the remaining pairs
replicates for each of the 9,625 unique features. In the work    of pixel intensities, extracting the slope as the spot ratio.
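The ratio calculation steps described above (removal of saturated pairs, removal of the upper fifth percentile on each channel, then orthogonal regression) can be sketched as follows. The function name `spot_ratio`, the saturation level of 65535 (a 16-bit scanner) and the closed-form slope with a fitted intercept are assumptions made for illustration; extraction of the pixel pairs from the "grid" file is not shown.

```python
import numpy as np

def spot_ratio(red, green, saturation=65535, upper_pct=95.0):
    """Red/green ratio of one spot as the orthogonal-regression slope.

    red, green : paired pixel intensities for the spot (any shape)
    saturation : assumed detector saturation level
    upper_pct  : pairs above this percentile on either channel are dropped
    """
    red = np.asarray(red, dtype=float).ravel()
    green = np.asarray(green, dtype=float).ravel()

    # 1. remove pairs in which either channel is saturated
    keep = (red < saturation) & (green < saturation)
    red, green = red[keep], green[keep]

    # 2. remove pairs in the upper tail of either channel (dust spikes)
    keep = (red <= np.percentile(red, upper_pct)) & \
           (green <= np.percentile(green, upper_pct))
    red, green = red[keep], green[keep]

    # 3. orthogonal regression of red on green; the slope is the ratio
    gc, rc = green - green.mean(), red - red.mean()
    sgg, srr, sgr = gc @ gc, rc @ rc, gc @ rc
    return (srr - sgg + np.sqrt((srr - sgg) ** 2 + 4.0 * sgr ** 2)) / (2.0 * sgr)
```

Because high-intensity pixels tend to be high on both channels, the two percentile cuts overlap heavily, which is why the total fraction removed lies between five and ten percent.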

Results and discussion                                            the bootstrap estimates also increases, as would be
                                                                  expected. For the case of 40% noise in Fig. 4c and d, the
In order to gain confidence in the measurement                    extent of variation in the standard deviation estimates is
uncertainties calculated for the specific ratios, it was necessary to   quite large, ranging from 0.2 (10% uncertainty in the
determine how accurately the bootstrap-calculated value of        nominal ratio) to 1 (50% uncertainty). However, it is
σ²meas reflected the true uncertainty in the measurements.        important to recognize that the primary purpose of the
This is difficult to do, since there is no way to generate        bootstrap estimation in this application is to obtain a rough
perfect experimental replicates of a given spot. In order to      estimate of σmeas that can be used for data filtering and
provide some validation for the results obtained, three           weighting, and not for rigorous statistical testing.
approaches were employed. First, simulated data were used             An important consideration in the application of the
in which the spot morphology could be carefully controlled        bootstrap method is the number of bootstrap samples used.
and reproduced. In the second approach, Monte-Carlo               Errors in this procedure can be attributed to fundamental
modeling of data from the yeast and CAMDA microarrays             statistical errors, which cannot be improved by increasing
was employed. Finally, side-by-side replicates of spots in        the number of bootstrap samples, and “Monte-Carlo”
the halibut microarray were used to generate an                   errors, which disappear as the number of samples goes to
experimental estimate of σ²meas that could be directly compared to   infinity. In this application, it is important to minimize
bootstrap estimates.                                              variations in the estimates due to Monte-Carlo errors while
                                                                  at the same time keeping the number of bootstrap samples
Simulated data                                                    low to minimize the computational time needed for thou-
                                                                  sands of microarray spots. Figure 5 shows a plot of the
To evaluate the bootstrap method for the simulated micro-         standard deviation in the standard error estimates and the
array spots shown in Fig. 3, an estimate of the “true”            root mean square of the bias as a function of the number of
measurement uncertainty was first obtained for each spot/         bootstrap samples for the case of the uniform/sloped spot
noise-level combination. This was done by generating              with 20% noise. To generate this plot, 100 runs were
1,000 replicate spots with different noise realizations,          carried out at each level of bootstrap sampling (B) and the
followed by calculation of the red/green ratio for each of        standard deviations of the measurement error estimates
these replicates. The standard deviation of these ratios was      were recorded. This was repeated ten times at each level to
taken as the true value of σmeas. Following this, 100             give the mean and error bars shown in the plot. Similar
additional replicates were generated, each with a different       calculations were carried out for the bias to evaluate its
noise realization. For each of these, bootstrap estimates of      stability, except in this case a root-mean-square value was
σmeas were obtained based on 200 bootstrap samples. These         calculated to account for its dispersion around zero rather
estimates are plotted for two noise levels (5% and 40%) and       than around a mean. Although such a plot will vary some-
both spot morphologies in Fig. 4, along with the “true”           what as conditions are changed, it was generally found that
value of σmeas (horizontal line). Also shown in each sub-         both features leveled off fairly quickly above 100 bootstrap
figure is an estimate of the bias (dashed line) for each of the   samples. For the algorithms used in this work, 200 boot-
100 cases. The bias is the deviation of the estimated ratio       strap samples were used.
from the true value and can be estimated as:                          The validity of the results obtained from these
                                                                  simulations is, of course, predicated on the assumption of
$(\mathrm{bias})_B = \bar{q} - \hat{q}$                    (6)    independent and uniform errors in the pixel intensities. Such an
                                                                  assumption is not likely to be valid, but it is difficult to
Here, $\bar{q}$ is the mean value of the ratio for the bootstrap      develop models for pixel errors which would be accurate
samples and $\hat{q}$ is the ratio estimated from the original        and universal. A proportional or shot-noise error structure is
population. Since the bias relates to the accuracy of the         reasonable, likely in combination with an additive contri-
ratio estimate, it should ideally be considerably smaller than    bution. Correlated noise on adjacent pixels is also likely,
the standard error. Although a bias correction can be made        including effects that may arise from slight channel
in the estimate of the uncertainty, this can also increase the    misalignment. It is not possible to simulate all of these
variance in that estimate and such a correction was not           scenarios, but a simple set of simulations was carried out
performed in this work.                                           that included a proportional error term in addition to the
   The results in Fig. 4 show good general agreement              uniform noise. Results were essentially the same as those
between the bootstrap-estimated uncertainties and the             shown in Fig. 4, although it was noticed that there was a
standard deviation in the ratio measurement as estimated          slight bias in the estimate of the ratio, as might be antic-
from many replications. In all cases, the bias is compara-        ipated. However, it is clear that a full validation of the
tively low. As the level of noise is increased, the variance in   bootstrap approach needs to include testing with experi-

Fig. 4 Bootstrap estimates of standard deviation in the ratio measurement (R/G=2) for simulated microarray spots. The
horizontal line is the estimate of the “true” standard deviation based on 1,000 replicates of the spot, the solid blue line is the
bootstrap estimate of the standard deviation for each of 100 replicates, and the dashed red line is the corresponding estimate
of the bias. The panels on the left are for the uniform spot with the sloped top and those on the right are for the
doughnut-shaped spot. Plots (a) and (b) correspond to 5% noise in the red channel, while (c) and (d) correspond to 40% noise
[Figure: four panels, a–d; x-axis: Run Number (0–100); y-axis: Standard Deviation]
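The comparison summarized in Fig. 4 can be reproduced in outline with the sketch below. It estimates a "true" σmeas from the spread of the ratio over many independent noise realizations, then computes the bootstrap standard error (Eq. 5) and bias (Eq. 6) for one additional realization. The function names and the flattened intensity profile passed in as `red_true` are hypothetical stand-ins, and the total-least-squares slope is an assumed implementation of the orthogonal regression.

```python
import numpy as np

def tls_slope(g, r):
    """Orthogonal-regression slope of red (r) against green (g)."""
    gc, rc = g - g.mean(), r - r.mean()
    sgg, srr, sgr = gc @ gc, rc @ rc, gc @ rc
    return (srr - sgg + np.sqrt((srr - sgg) ** 2 + 4.0 * sgr ** 2)) / (2.0 * sgr)

def noisy_pair(red_true, ratio, noise_sd, rng):
    """One noise realization of the (green, red) pixel intensities."""
    red = red_true + rng.normal(0.0, noise_sd, red_true.size)
    green = red_true / ratio + rng.normal(0.0, noise_sd, red_true.size)
    return green, red

def validate(red_true, ratio=2.0, noise_sd=100.0, n_true=1000, n_boot=200, seed=0):
    """Return (true_sd, bootstrap_se, bootstrap_bias) for one simulated spot."""
    rng = np.random.default_rng(seed)
    # "true" sigma_meas: spread of the ratio over many noise realizations
    true_sd = np.std([tls_slope(*noisy_pair(red_true, ratio, noise_sd, rng))
                      for _ in range(n_true)], ddof=1)
    # bootstrap estimate from a single additional realization
    g, r = noisy_pair(red_true, ratio, noise_sd, rng)
    q_hat = tls_slope(g, r)
    n = g.size
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample pixel pairs with replacement
        boot[b] = tls_slope(g[idx], r[idx])
    return true_sd, boot.std(ddof=1), boot.mean() - q_hat
```

Calling `validate` repeatedly with different seeds would give one point per run, analogous to the 100 runs plotted in each panel of Fig. 4.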

mental data as well as simulated data. The next two sections                                  served, allowing a variety of measurement conditions to be
address this issue.                                                                           examined. Likewise, these microarrays exhibited spots of
                                                                                              varying intensity and quality, again permitting the robust-
Yeast microarray data                                                                         ness of the model to be explored. Finally, replicate mea-
                                                                                              surements were conducted for a number of time points (0,
The microarray data from the yeast exit from stationary                                       1, 10, 20, and 35 min), allowing the experimental variance
phase time-course study was chosen for experimental                                           to be assessed.
validation of the bootstrap method for a number of reasons.                                      In the first part of this study, the use of the bootstrap-
First, since this was a time-course study involving large                                     estimated σmeas for screening unreliable microarray spots
changes in gene expression relative to a common reference                                     was investigated. To do this, ratios from duplicate micro-
(log-phase yeast cells) a wide range of ratios were ob-                                       array experiments can be compared to one another in the
Fig. 5 Effect of the number of bootstrap samples used (B) on the                              form of a log-log plot. Logarithmic plots are normally
precision of the bootstrap error estimate and the bias estimate                               preferred for such comparisons because the proportional
[Plot: SD/RMS vs. Number of Bootstrap Replicates (B), 0–1000; series:                         error structure commonly observed for microarrays reduces
standard deviation of bootstrap uncertainty estimates, root mean                              to a uniform (homoscedastic) error structure upon logarithmic
square of bootstrap bias estimates]                                                           transformation. Ideally, if duplicate experiments were
                                                                                              in perfect agreement, the log-ratio plot should be a straight
                                                                                              line with a slope of unity and an intercept of zero. However,
                                                                                              a non-zero intercept is often observed in these plots due to
                                                                                              a required normalization of the two experiments arising
                                                                                              from differences in laser intensity, detector sensitivity, dye
                                                                                              labeling efficiency, the amount of RNA extracted, and so
                                                                                              on. Moreover, experimental noise related to σexpt and σmeas
                                                                                              will cause deviations from the line, as will measurements
                                                                                              considered “bad” due to anomalous shape, background
                                                                                              problems, optical interferences, or other factors. By
                                                                                              eliminating spots with excessively high measurement variance,
                                                                                              σmeas, the reliability and reproducibility of the spots that
                                                                                              remain should be improved.

    To illustrate this, Fig. 6 shows log-ratio plots for dupli-                 similar in magnitude to the RSD expected for σexpt. This
cate microarray slides at time zero (other duplicate sets are                   censoring resulted in the retention of 5,463 spots. Of the
similar). Fig. 6a shows a plot where all of the points have                     766 spots rejected on the basis of σmeas, 624 had been
been retained except for those with a negative ratio on                         flagged, representing 42.3% of the flagged spots. Figure 6c
either slide, which are obviously erroneous. This results in                    again shows significant improvement over Fig. 6a and has
the retention of 6,229 spots out of the original 6,307. The                     characteristics similar to Fig. 6b, although fewer spots have
line plotted through the points represents the best fit by                      been removed. By reducing the cutoff below 30%, the
orthogonal least squares and the slope of this line is given                    quality can be further improved, but with a commensurate
in the figure. It can be seen that using this very unrestrictive                increase in the number of censored points.
filtering criterion results in a substantial spread in the ratios                  When both flags and measurement uncertainty are used
from the duplicate samples and an improved screening                            as censoring criteria in generating the log-ratio plot, as
method is desirable.                                                            shown in Fig. 6d, the number of spots is further reduced to
    In Fig. 6b, spots that have been flagged by an operator as                  4,612. In this case, all but a few of the extreme outlying
“bad” (on either slide) have been removed, reducing the                         points have been eliminated, resulting in greater reliability
number of spots to 4,754. This flagging is a manual and                         in the data.
subjective procedure that generally happens when spot                              It is clear from these observations that neither flagging
grids are set up for the microarray image. This can result in                   nor censoring on the basis of a 30% cutoff in σmeas results
censoring of spots because of unusual morphology, unreli-                       in the elimination of all of the unreliable microarray spots.
able background, smearing, interferences, or other reasons                      This is not unexpected, since flagging is subjective and
at the discretion of the operator. It is clear that this cen-                   prone to human error, while filtering on the basis of the
soring resulted in improved data quality and better cor-                        bootstrap-estimated σmeas does not necessarily capture all of
relation between the two experiments, although there still                      the undesirable spot characteristics, such as anomalous
appear to be some outlying points in the plot.                                  background characteristics. In addition, it appears that flag-
    In Fig. 6c, the operator flags were ignored and censoring                   ging may unnecessarily eliminate a substantial number of
was based solely on the value of the measurement un-                            spots with useful information. The best censoring strategy
certainty, σmeas, removing any spots where the relative                         would appear to be one with a combination of the two
standard deviation from this source (RSDmeas=σmeas/R) was                       methods, with a more relaxed flagging criterion to min-
greater than 30%. This cutoff was somewhat arbitrary, but                       imize the rejection of spots which may be valid. More

Fig. 6 Log-ratio plots for duplicate slides at time zero in the yeast data set with various criteria used for screening
measurements: (a) only spots with ratios less than zero are removed, (b) spots with a ratio less than zero and
operator-flagged spots are removed, (c) spots with a ratio less than zero or a bootstrap-estimated RSD greater than 30% are
removed, and (d) spots meeting any of the three criteria (ratio <0, operator-flagged, or RSD >30%) are removed
[Figure: four panels of log2(R2) vs. log2(R1); (a) N = 6229, Slope = 0.900; (b) N = 4754, Slope = 0.898; (c) N = 5463, Slope = 0.904; (d) N = 4612, Slope = 0.921]
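The censoring criteria compared in Fig. 6 amount to a Boolean mask over the spots, which might be computed as in the sketch below. The function name `censor_spots` and its arguments are hypothetical. Applying the RSD cutoff to both slides of a duplicate pair, and treating a ratio of exactly zero like a negative ratio, are assumptions; the text states the "either slide" rule explicitly only for negative ratios and operator flags.

```python
import numpy as np

def censor_spots(ratio1, sigma1, ratio2, sigma2, flagged=None, rsd_cutoff=0.30):
    """Return a Boolean mask of spots retained for the log-ratio plot.

    ratio1, ratio2 : spot ratios on the two duplicate slides
    sigma1, sigma2 : bootstrap estimates of sigma_meas for the same spots
    flagged        : optional Boolean array, True where either slide was flagged
    rsd_cutoff     : maximum allowed RSD_meas = sigma_meas / R
    """
    ratio1, sigma1, ratio2, sigma2 = (
        np.asarray(a, dtype=float) for a in (ratio1, sigma1, ratio2, sigma2))

    # non-positive ratios cannot be log-transformed and are always removed
    keep = (ratio1 > 0) & (ratio2 > 0)

    # relative standard deviation from the measurement component
    with np.errstate(divide="ignore", invalid="ignore"):
        rsd1 = sigma1 / ratio1
        rsd2 = sigma2 / ratio2
    keep &= (rsd1 <= rsd_cutoff) & (rsd2 <= rsd_cutoff)

    # optionally combine with operator flags, as in Fig. 6d
    if flagged is not None:
        keep &= ~np.asarray(flagged, dtype=bool)
    return keep
```

Passing `rsd_cutoff=np.inf` together with `flagged` corresponds to the flag-only screening of Fig. 6b, while omitting `flagged` corresponds to Fig. 6c.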

specifically, the flagging strategy should not focus on spots with low intensities, which are likely to be detected through σmeas, but rather on spots with anomalous characteristics that may not be censored on the basis of measurement uncertainty alone.
    If censoring on the basis of σmeas is carried further, it can be argued that if only spots with a small RSDmeas are retained, then the dominant source of error in those remaining spots should be σexpt, which should exhibit a proportional error structure. Figure 7a shows a log-ratio plot, again with time zero duplicates, where censoring is based on flags and RSDmeas > 5% (in this instance, the second criterion captures 98.4% of the flagged spots). This reduces the number of spots to 1,294, but the figure clearly shows the high correlation and a slope that is closer to the ideal of unity at 0.969. Assuming σexpt follows a proportional error structure (i.e. σexpt = cR as given in Eq. 4), then it can be shown by propagation of error that the logarithmic transformation of the ratio should lead to a uniform variance when σmeas can be ignored:

$$\sigma^2(\log_2 R) = \left(\frac{d(\log_2 R)}{dR}\right)^2 \sigma_{\mathrm{expt}}^2 = \frac{1}{R^2(\ln 2)^2}\,c^2R^2 = \frac{c^2}{(\ln 2)^2} \qquad (7)$$

Based on this, and assuming that both microarray slides have the same proportional error structure (i.e. the same value of c), it can be shown that the orthogonal residuals of the fit in Fig. 7a should be normally distributed with a standard deviation given by:

$$\sigma_{\mathrm{resid}} = \frac{\sqrt{2}\,c}{\ln 2} \qquad (8)$$

In reality, there will likely be differences in the proportional error factor, c, from one slide to another, but this does not invalidate the normality of the observed residuals in the log-ratio plot and the calculated value of c is still useful as a composite estimate. Figure 7b shows a histogram of the absolute orthogonal residuals from Fig. 7a, which appear to exhibit a high degree of normality. The half-Gaussian curve overlaid on the histogram was obtained by minimizing the χ2 value for count statistics. The value obtained for χ2 was 29.2 with 29 degrees of freedom (30 bins), a value consistent with a normal distribution given the critical value of 42.6 (α=0.05). The estimated standard deviation of the residuals was 0.168, corresponding to a proportional error contribution of 8.2% (c=0.082). This proportional error structure is clearly visible in a ratio (instead of log-ratio) plot of the censored measurements in Fig. 7c. This analysis was carried out for all seven sets of duplicates and, although there was some variation in the number of outliers detected and the level of proportional noise observed, the general behavior was very similar to the case shown.
    If censoring to remove spots with a large σmeas reveals the proportional error structure, then, conversely, including those spots may degrade the normality of the residuals in the log-ratio plots. This is the case as illustrated in Fig. 8a, which shows a histogram analogous to that in Fig. 7b, but with the cutoff set to RSDmeas > 50% (the Gaussian fit is shown in red). Although the visual quality of the fit does not appear to be much different here, the spread of the points is substantially larger (σresid = 0.25) and the χ2 of 150 indicates a lack of fit to a normal distribution. This was the typical trend for all of the duplicate pairs as indicated in Fig. 8b, which shows that the χ2 values generally increase as the RSDmeas cutoff value increases. This suggests that increasing the proportion of ratios with a significant contribution from σmeas corrupts the proportional error structure associated with σexpt, which is consistent with the error model.
    Although this approach provides some support for the model and indicates that the bootstrap estimates are associated with the pure measurement uncertainty, it does not

[Figure 7. Panel (a): log2(R2) versus log2(R1); N = 1294, slope = 0.969. Panel (b): number of spots versus orthogonal residual; χ2 = 29.2 (ν = 29), σresid = 0.168. Panel (c): ratio on slide 2 versus ratio on slide 1.]
Fig. 7 (a) Log-ratio plot for the yeast data in Fig. 6 with a bootstrap-estimated RSD cutoff of 5%. (b) Histogram of the orthogonal residuals from
the fit in (a) along with a fit to a Gaussian distribution. (c) Ratio plot of the data in (a) showing the proportional error structure of the ratios
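
The fit in Fig. 7b and the conversion of its width into a proportional error through Eq. (8) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the grid search, the bin range of 0 to 0.5 and the simulated residuals are all assumptions made for illustration, and the χ2 used is the ordinary Pearson sum of (observed − expected)²/expected over bins.

```python
import math
import random

def half_gaussian_expected(edges, sigma, n_total):
    """Expected bin counts for |x| when x ~ N(0, sigma^2) (half-Gaussian)."""
    def cdf(x):
        # P(|X| <= x) for a zero-mean normal variable
        return math.erf(x / (sigma * math.sqrt(2.0)))
    return [n_total * (cdf(hi) - cdf(lo)) for lo, hi in zip(edges[:-1], edges[1:])]

def histogram(values, edges):
    """Counts per equal-width bin; values outside the bin range are ignored."""
    counts = [0] * (len(edges) - 1)
    width = edges[1] - edges[0]
    for v in values:
        if v < edges[0]:
            continue
        k = int((v - edges[0]) / width)
        if k < len(counts):
            counts[k] += 1
    return counts

def chi_square(observed, expected):
    """Pearson chi-square for count data (bins with zero expectation skipped)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

def fit_sigma(values, edges, grid):
    """Grid search (an assumption) for the sigma that minimizes chi-square."""
    counts = histogram(values, edges)
    n = len(values)
    best = min(grid, key=lambda s: chi_square(counts, half_gaussian_expected(edges, s, n)))
    return best, chi_square(counts, half_gaussian_expected(edges, best, n))

def c_from_sigma_resid(sigma_resid):
    """Invert Eq. (8): c = sigma_resid * ln 2 / sqrt(2)."""
    return sigma_resid * math.log(2.0) / math.sqrt(2.0)

# Hypothetical data: 1,294 simulated absolute residuals, not the yeast data.
rng = random.Random(1)
residuals = [abs(rng.gauss(0.0, 0.168)) for _ in range(1294)]
edges = [i * 0.5 / 30 for i in range(31)]        # 30 bins, one fitted parameter
grid = [0.05 + 0.001 * i for i in range(351)]    # candidate sigma values
sigma_fit, chi2_min = fit_sigma(residuals, edges, grid)
print("fitted sigma_resid:", round(sigma_fit, 3))
print("implied c:", round(c_from_sigma_resid(sigma_fit), 3))
print("c for the reported sigma_resid of 0.168:", round(c_from_sigma_resid(0.168), 3))
```

With 30 bins and one fitted parameter the fit has 29 degrees of freedom, which matches the count quoted in the text. Applying `c_from_sigma_resid` to the reported value of 0.168 gives the 8.2% quoted for Fig. 7b.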

[Figure 8. Panel (a): number of spots versus orthogonal residual; curves for observed residuals, Gaussian fit and Monte-Carlo fit. Panel (b): χ2 value versus %RSDmeas cutoff for slide pairs 1,2; 1,3; 2,3; 4,5; 7,8; 10,11; 14,15.]
Fig. 8 (a) Histogram of orthogonal residuals for the log-ratio plot of yeast time zero duplicates with a bootstrap-estimated RSD cutoff of 50%. The red line (plus symbols) shows a fit of the distribution to a Gaussian curve, while the blue line (crosses) is a Monte-Carlo fit of the distribution to Eq. (10). (b) The χ2 values for fits of the distribution of orthogonal residuals of log-ratio plots of duplicate yeast data sets to a Gaussian curve for various levels of RSD cutoff. The numbers in the legend refer to duplicate slide pairs. Note that the quality of the fit degrades as the cutoff is increased
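
The Monte-Carlo fit named in the caption of Fig. 8 rests on simulating duplicate slides with the error model of Eq. (10), in which the per-spot standard deviation is the square root of c²R² + b²σboot² (Eq. (9) is the special case b = 1). The sketch below shows one way such a simulation could be set up. It is an illustration under several assumptions, not the authors' procedure: all names are hypothetical, the same σboot is used for both slides, the absolute difference of log2 ratios between the two simulated slides stands in for the orthogonal residual of a refitted line, and the optimization over b and c is left out.

```python
import math
import random

def sigma_ratio(c, b, ratio, sigma_boot):
    """Per-spot error SD from Eq. (10); b = 1 gives Eq. (9)."""
    return math.sqrt(c ** 2 * ratio ** 2 + b ** 2 * sigma_boot ** 2)

def simulate_log_differences(true_ratios, sigma_boot, c, b, rng):
    """Simulate one duplicate pair and return |log2(R2) - log2(R1)| per spot."""
    diffs = []
    for r, s in zip(true_ratios, sigma_boot):
        sd = sigma_ratio(c, b, r, s)
        r1 = rng.gauss(r, sd)   # noisy ratio on simulated slide 1
        r2 = rng.gauss(r, sd)   # noisy ratio on simulated slide 2
        if r1 > 0 and r2 > 0:   # assumption: drop spots with non-positive ratios
            diffs.append(abs(math.log2(r2) - math.log2(r1)))
    return diffs

def averaged_histogram(true_ratios, sigma_boot, c, b, edges, n_sets=50, seed=0):
    """Average the binned distribution over n_sets simulated data sets."""
    rng = random.Random(seed)
    totals = [0.0] * (len(edges) - 1)
    width = edges[1] - edges[0]   # equal-width bins assumed
    for _ in range(n_sets):
        for d in simulate_log_differences(true_ratios, sigma_boot, c, b, rng):
            k = int((d - edges[0]) / width)
            if d >= edges[0] and k < len(totals):
                totals[k] += 1.0
    return [t / n_sets for t in totals]

# Hypothetical inputs: 2,000 "true" ratios and no bootstrap error at all.
setup_rng = random.Random(42)
true_ratios = [2.0 ** setup_rng.uniform(-3.0, 3.0) for _ in range(2000)]
sigma_boot = [0.0] * len(true_ratios)

diffs = simulate_log_differences(true_ratios, sigma_boot, 0.082, 1.0, random.Random(7))
rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
print("RMS log2 difference:", round(rms, 3))

edges = [i * 1.0 / 40 for i in range(41)]
expected = averaged_histogram(true_ratios, sigma_boot, 0.082, 1.0, edges)
```

With σboot set to zero, the spread of the simulated log2 differences comes out close to √2·c/ln 2, the value given by Eq. (8). In a full fit, the averaged histogram would be compared with the observed one through a χ2 statistic while b and c are adjusted.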

allow a direct quantitative and independent assessment of σmeas. An indirect assessment is possible, however, through the use of Monte-Carlo simulations. Given a duplicate set of slides with specified ratios and their associated errors, it is possible to generate a set of simulated data with the same distributional characteristics. To do this, projected ratios calculated from the linear fit of the log-ratio plot for a pair of slides were taken to be the “true” values. Simulated noisy measurements were obtained by adding normally distributed random values to each set of “true” values. The standard deviation associated with the error in each spot ratio was calculated from:

$$\sigma_R(i,j) = \sqrt{c^2 R(i,j)^2 + \sigma_{\mathrm{boot}}^2(i,j)} \qquad (9)$$

where σR(i,j) is the error standard deviation for spot j on slide i, c is the proportional error component (RSD), R(i,j) is the estimated “true” ratio, and σboot is the bootstrap-estimated measurement standard deviation. The simulated data generated in this way were carried through the same calculations for the log-ratio plot as the experimental data, using the same cutoff of 50% RSD for the bootstrap error estimates. If the error model and bootstrap error estimates are correct, this should lead to a distribution of residuals that is similar to that for the experimental data. The two distributions were compared using a χ2 statistic, which was minimized by adjusting the value of the parameter c in Eq. (9); that is, the simulated (expected) distribution was fit to the experimental (observed) distribution by adjusting the level of proportional error. To ensure reliability in the expected distribution, it was generated by calculating the average distribution of 50 sets of simulated measurements, resulting in a relatively smooth curve. Unfortunately, this approach resulted in only a marginally improved fit with only a slightly lower χ2 value.
    It was postulated that the poor fit of the Monte-Carlo-simulated data might be the result of consistent under- or over-estimation of the bootstrap errors. Based on this, Eq. (9) was modified to include a scale factor adjustment for the bootstrap error, designated as b in Eq. (10):

$$\sigma_R(i,j) = \sqrt{c^2 R(i,j)^2 + b^2 \sigma_{\mathrm{boot}}^2(i,j)} \qquad (10)$$

The optimization was then carried out to minimize the χ2 value by adjustment of both b and c. This resulted in very good fits of the observed distribution of orthogonal residuals to the distributions obtained from the Monte-Carlo simulations, as shown by the blue curve in Fig. 8a. These calculations were repeated for each of the duplicate slide sets in the data set and the results are summarized in Table 1, which includes the estimates of the proportional error term (c), the scale factor for the bootstrap error (b), the optimized χ2 value, and the uncertainties in each of these. The parameters given are the mean values from five Monte-Carlo simulation runs with different random number seeds and the uncertainties quoted are the corresponding standard deviations.
    Table 1 reveals several interesting characteristics of the models. First of all, all of the χ2 values are quite reasonable, well below critical values in most cases, and much improved over earlier models. This supports the validity of the model used. The proportional errors extracted show a significant range over the set of duplicate slides, with a mean value around 14%. As already noted, these estimates assume that the proportional error contributions are the same from each slide, although in reality this is not likely to be the case, so the number is likely a root-mean-square

composite of the two contributions. The largest proportional error contribution is 37% observed for the duplicates at 1 min, which is not surprising since this is where the most rapid changes in gene expression were observed to occur, likely leading to the poorest experimental reproducibility. What is quite surprising, however, is the very low proportional error contribution for the measurements at 10 min. Although this time point coincides with a relatively flat region for changes in gene expression, so a lower proportional error contribution might be anticipated, the virtual absence of any proportional error was quite unexpected. Nevertheless, this result was very consistent and the fits obtained were still quite satisfactory.

Table 1 Results of Monte-Carlo fitting of orthogonal residuals for duplicate slide pairs in the yeast data set

Time (min)   Slide pair   % Proportional error (100c)   Scale factor (b)   χ2
0            1,2          18.9±0.5                      4.24±0.11          23.7±0.6
0            1,3          21.7±0.8                      4.33±0.14          30.5±1.7
0            2,3          9.5±0.5                       1.93±0.07          19.4±0.8
1            4,5          37.2±1.6                      3.61±0.40          44.0±2.5
10           7,8          0.14±0.11                     4.20±0.04          57.2±1.6
20           10,11        5.1±0.6                       3.40±0.06          44.6±2.8
35           14,15        7.6±0.6                       2.24±0.12          21.8±1.1

    Another surprising feature of the models is the magnitude of the scaling factors on the bootstrap estimates needed to obtain a good fit. Although only minor adjustments with values close to unity were anticipated, the values here range between about 2 and 4 with a mean of 3.4. To ensure that these estimates were not an algorithmic artifact, the fitting procedures were checked using simulated distributions and no significant bias was discovered. In addition, the distribution of the bootstrap estimates was examined for several representative spots to check for skewness, but the distributions appeared symmetric with Gaussian character. The need for the scaling factor suggests that, although the bootstrap estimates were found to be accurate for the simulated spots described in the previous section, they are underestimated by a factor of 2 to 4 for the experimental measurements, indicating that there are some elements of the error structure that were unaccounted for in the simulations. Nevertheless, it was encouraging that a simple linear transformation was sufficient to provide a good fit to the observed distributions. This is especially noteworthy for the duplicate slides at 10 min. Here the proportional error contribution is essentially zero, which means that the fit of the distribution is based almost solely on the bootstrap error estimates. Although the χ2 value in this case is the highest in the group, the fit is still remarkably good in the circumstances. This means that the utility of the bootstrap error estimates for distinguishing reliable from unreliable measurements is not substantially diminished, but some adjustment may be needed if a more quantitatively accurate estimate of the measurement uncertainty is required.

CAMDA microarray data

Given the somewhat unexpected results for the yeast microarray data, a second data set was investigated using the same procedures. The Plasmodium falciparum time course study available through the CAMDA project was a suitable candidate because of its similarities to the yeast microarray study. As for the yeast study, a wide range of ratios was observed as a consequence of the large changes in gene expression over time and replicate slides were available at several time points, including triplicate measurements at the first time point. Aside from these design similarities, however, the two experiments were conducted in completely separate laboratories using slides prepared on different microarrayers for different organisms.
    Despite the experimental differences between the two studies, the results obtained from the CAMDA data set were remarkably similar to those for the yeast and are only briefly summarized here. In terms of data filtering, the use of flags or a 30% RSDmeas cutoff produced similar improvements in the log-ratio plots for duplicate slides, with fewer rejected measurements in the latter case. The best results were obtained with the use of both criteria. These observations are consistent with those made for the yeast data set. When a cutoff of 3% RSDmeas was used, the proportional error structure became apparent, with a χ2 value of 53.8 for a Gaussian fit to the orthogonal residuals of the log-ratio plot for duplicates at the first time point (1,706 points retained). The cutoff for this set was lower than that used for the yeast data in Fig. 7a (5%) and the fit was not as good as in Fig. 7b because of the generally lower proportional error for these data (see below). At a 2% cutoff, the χ2 value was 34.2 (428 points retained) and at 5% it was 96.7 (3,990 points).
    Figure 9 is the CAMDA equivalent to Fig. 8 for the yeast data and employs duplicate measurements from the first time point. The histogram of the orthogonal residuals when the cutoff was 50% RSDmeas shows a very poor fit (red curve) to a Gaussian distribution, as expected, with a χ2 value of 726. As before, Fig. 9b shows that the quality of the Gaussian fit decreases as the cutoff is increased for all replicate slide pairs. Also as before, the Monte-Carlo fit to the observed distribution in Fig. 9a (blue curve) is much improved over the Gaussian fit, with a χ2 value of 14.9.
    Table 2, which is equivalent to Table 1 for the yeast data, shows the proportional errors and bootstrap error scaling factors resulting from the Monte-Carlo fit for each of the duplicate pairs in the CAMDA data set. As before, a range of values was observed for both the proportional error