Theory and Practice in Quantitative Genetics

Daniëlle Posthuma¹, A. Leo Beem¹, Eco J. C. de Geus¹, G. Caroline M. van Baal¹, Jacob B. von Hjelmborg², Ivan Iachine³, and Dorret I. Boomsma¹

¹ Department of Biological Psychology, Vrije Universiteit Amsterdam, The Netherlands
² Institute of Public Health, Epidemiology, University of Southern Denmark, Denmark
³ Department of Statistics, University of Southern Denmark, Denmark

Address for correspondence: Daniëlle Posthuma, Vrije Universiteit, Department of Biological Psychology, van der Boechorststraat 1, 1081 BT, Amsterdam, The Netherlands. Email: danielle@psy.vu.nl

Twin Research Volume 6 Number 5 pp. 361–376
https://doi.org/10.1375/twin.6.5.361

With the rapid advances in molecular biology, the near completion of the human genome, the development of appropriate statistical genetic methods and the availability of the necessary computing power, the identification of quantitative trait loci has now become a realistic prospect for quantitative geneticists. We briefly describe the theoretical biometrical foundations underlying quantitative genetics. These theoretical underpinnings are translated into mathematical equations that allow the assessment of the contribution of observed (using DNA samples) and unobserved (using known genetic relationships) genetic variation to population variance in quantitative traits. Several statistical models for quantitative genetic analyses are described, such as models for the classical twin design, multivariate and longitudinal genetic analyses, extended twin analyses, and linkage and association analyses. For each, we show how the theoretical biometrical model can be translated into algebraic equations that may be used to generate scripts for statistical genetic software packages, such as Mx, Lisrel, SOLAR, or MERLIN. For Mx, a web library of freely available scripts (available from http://www.psy.vu.nl/mxbib) has been developed that can be used to conduct all genetic analyses described in this paper.

“Genetic factors explain x% of the population variance in trait Y” is an oft-heard outcome of quantitative genetic studies. Usually this statement derives from (twin) family research that exploits known genetic relationships to estimate the contribution of unknown genes to the observed variance in the trait. It does not imply that any specific genes that influence the trait have been identified. Given the rapid advances made in molecular biology (Nature Genome Issue, February 15, 2001; Science Genome Issue, February 16, 2001), the near completion of the human genome and the development of sophisticated statistical genetic methods (e.g., Dolan et al., 1999a, 1999b; Fulker et al., 1999; Goring, 2000; Terwilliger & Zhao, 2000), the identification of specific genes, even for complex traits, has now become a realistic prospect for quantitative geneticists. To identify genes, family studies, specifically twin family studies, again appear to have great value, for they allow simultaneous modelling of observed and unobserved genetic variation. As a “proof of principle”, genomEUtwin will perform genome-wide genotyping in twins to target genes for the complex traits of stature, body mass index (BMI), coronary artery disease and migraine. To increase power, epidemiological and phenotypic data from eight participating twin registries will be simultaneously analysed.

In this paper the main theoretical foundations underlying quantitative genetic analyses that are used within the genomEUtwin project will be described. In addition, an algebraic translation from theoretical foundation to advanced structural equation models will be made that can be used in generating scripts for statistical genetic software packages.

Observed, Genetic, and Environmental Variation
The starting point for gene finding is the observation of population variation in a certain trait. This “observed”, or phenotypic, variation may be attributed to genetic and environmental causes. Genetic and environmental effects interact when the same variant of a gene differentially affects the phenotype in different environments.

About 1% of the total genome sequence is estimated to code for protein and an additional but still unknown percentage of the genome is involved in regulation of gene expression. Human individuals differ from one another by about one base pair per thousand. If these differences occur within coding or regulatory regions, phenotypic variation in a trait may result. The different effects of variants (“alleles”) of the same gene are the basis of the model that underlies quantitative genetic analysis.

Quantifying Genetic and Environmental Influences: The Quick and Dirty Approach
In human quantitative genetic studies, genetic and environmental sources of variance are separated using a design that includes subjects of different degrees of genetic and environmental relationship (Fisher, 1918; Mather & Jinks, 1982). A widely used design compares phenotypic resemblance of monozygotic (MZ) and dizygotic (DZ) twins. Since MZ twins reared together share part of their environment and 100% of their genes (but see Martin et al., 1997), any resemblance between them is attributed to these
two sources of resemblance. The extent to which MZ twins do not resemble each other is ascribed to unique, non-shared environmental factors, which also include measurement error. Resemblance between DZ twins reared together is also ascribed to the sharing of the environment, and to the sharing of genes. DZ twins share on average 50% of their segregating genes, so any resemblance between them due to genetic influences will be lower than for MZ pairs. The extent to which DZ twins do not resemble each other is due to non-shared environmental factors and to non-shared genetic influences.

Genetic effects at a single locus can be partitioned into additive (i.e., the effect of one allele is added to the effect of another allele) or dominant (the deviation from purely additive effects) effects, or a combination. The total amount of genetic influence on a trait is the sum of the additive and dominance effects of alleles at multiple loci, plus variance due to the interaction of alleles at different loci (epistasis; Bateson, 1909). The expectation for the phenotypic resemblance between DZ twins due to genetic influences depends on the underlying (and usually unknown) mode of gene action. If all contributing alleles act additively and there is no interaction between them within or between loci, the correlation of genetic effects in DZ twins will be on average 0.50. However, if some alleles act in a dominant way the correlation of genetic dominance effects will be 0.25. The presence of dominant gene action thus reduces the expected phenotypic resemblance in DZ twins relative to MZ twins. Epistasis reduces this similarity even further, the extent depending on the number of loci involved and their relative effect on the phenotype (Mather & Jinks, 1982). Depending on the types of familial relationships within a dataset, additive genetic, dominant genetic, and shared and non-shared environmental influences on a trait can be estimated. For example, employing a design including MZ and DZ twins reared together allows decomposition of the phenotypic variance into components of additive genetic variance, non-shared environmental variance, and either dominant genetic variance or shared environmental variance. Additive and dominant genetic and shared environmental influences are confounded in the classical twin design and cannot be estimated simultaneously. Disentangling the contributions of shared environment and genetic dominance effects requires additional data from, for example, twins reared apart, half-sibs, or non-biological relatives reared together.

Similarity between two (biologically or otherwise related) individuals is usually quantified by covariances or correlations. Twice the difference between the MZ and DZ correlations provides a quick estimate of the proportional contribution of additive genetic influences ($a^2$) to the phenotypic variation in a trait ($a^2 = 2[r_{MZ} - r_{DZ}]$). The proportional contribution of the dominant genetic influences ($d^2$) is obtained by subtracting four times the DZ correlation from twice the MZ correlation ($d^2 = 2r_{MZ} - 4r_{DZ}$). An estimate of the proportional contribution of the shared environmental influences ($c^2$) to the phenotypic variation is given by subtracting the MZ correlation from twice the DZ correlation ($c^2 = 2r_{DZ} - r_{MZ}$). The proportional contribution of the non-shared environmental influences ($e^2$) can be obtained by subtracting the MZ correlation from unit correlation ($e^2 = 1 - r_{MZ}$).

These intuitively simple rules are described in textbooks on quantitative genetics and can be understood without knowledge of the relative effects and location of the actual genes that influence a trait, or the genotypic effects on phenotypic means. These point estimates, however, depend on the accuracy of the MZ and DZ correlation estimates and the true causes of variation of a trait in the population. For small sample sizes (i.e., most of the time) they may be grossly misleading. Knowledge of the underlying biometrical model becomes crucial when one wants to move beyond these twin-based heritability estimates, for instance, to add information from multiple additional family members or to estimate the magnitude of genetic variance in the population simultaneously from a number of different relationships.

The Classical Biometrical Model: From Single Locus Effects on a Trait Mean to the Decomposition of Observed Variation in a Complex Trait
Although within a population many different alleles may exist for a gene (e.g., Lackner et al., 1991), for simplicity we describe the biometrical model assuming one gene with two possible alleles, allele A1 and allele A2. By convention, allele A1 has a frequency p, while allele A2 has frequency q, and p + q = 1. With two alleles there are three possible genotypes: A1A1, A1A2, and A2A2 with genotypic frequencies $p^2$, $2pq$, and $q^2$, respectively, under random mating. The genotypic effect on the phenotypic trait (i.e., the genotypic value) of genotype A1A1 is called “a” and the effect of genotype A2A2 “–a”. The midpoint of the phenotypes of the homozygotes A1A1 and A2A2 is by convention 0, so a is called the increaser effect, and –a the decreaser effect. The effect of genotype A1A2 is called “d”. If the mean genotypic value of the heterozygote equals the midpoint of the phenotypes of the two homozygotes (d = 0), there is no dominance. If allele A1 is completely dominant over allele A2, effect d equals effect a. If d ≠ 0 and the two alleles produce three discernible phenotypes of the trait, d is unequal to a. This is also known as the classical biometrical model (Falconer & Mackay, 1996; Mather & Jinks, 1982) (see Figure 1).

Statistical derivations for the contributions of single and multiple genetic loci to the population mean of a trait are given in several of the standard statistical genetic textbooks (e.g., Falconer & Mackay, 1996; Lynch & Walsh, 1998; Mather & Jinks, 1982), and some of these statistics are summarized in Table 1.

The genotypic contribution of a locus to the population mean of a trait is the sum of the products of the frequencies and the genotypic values of the different genotypes. Complex traits, such as height or weight, are assumed to be influenced by the effects of multiple genes. Assuming only additive and independent effects of all of these loci, the expectation for the population mean (µ) is the sum of the contributions of the separate loci, and is formally expressed as $\mu = \sum a(p - q) + 2 \sum dpq$.
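As a minimal sketch of the expression for µ, the following Python functions sum the single-locus contributions a(p – q) + 2dpq over loci. The function names and the (p, a, d) tuple layout are illustrative assumptions and are not taken from any of the software packages mentioned in this paper; independent, additively combining loci are assumed, as in the text.

```python
from typing import Iterable, Tuple

# Hypothetical layout for one diallelic locus: (p, a, d), where p is the
# frequency of allele A1 and a, d are the genotypic values defined in the text.
Locus = Tuple[float, float, float]


def locus_mean(p: float, a: float, d: float) -> float:
    """Contribution of one locus to the population mean: a(p - q) + 2dpq."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must lie in [0, 1]")
    q = 1.0 - p
    return a * (p - q) + 2.0 * d * p * q


def population_mean(loci: Iterable[Locus]) -> float:
    """Sum of the single-locus contributions (assumes independent loci)."""
    return sum(locus_mean(p, a, d) for p, a, d in loci)


if __name__ == "__main__":
    # Two made-up loci, purely for illustration.
    print(population_mean([(0.3, 2.0, 0.5), (0.5, 1.0, 0.0)]))
```

With p = q and d = 0 a locus contributes nothing to the mean, which is why the second example locus leaves the total unchanged.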
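The correlation-based rules given under “The Quick and Dirty Approach” can be sketched in a few lines of Python. The function and field names below are hypothetical; the formulas are the four point estimates as stated in the text. Because dominance and shared environment are confounded in the classical twin design, at most one of c2 and d2 should be interpreted for a given trait, and a negative value is usually read as a sign that the corresponding component is not supported by the observed correlations.

```python
from typing import NamedTuple


class QuickEstimates(NamedTuple):
    """Point estimates from twin correlations (hypothetical container)."""
    a2: float  # additive genetic: 2(rMZ - rDZ)
    d2: float  # dominant genetic: 2rMZ - 4rDZ
    c2: float  # shared environment: 2rDZ - rMZ
    e2: float  # non-shared environment: 1 - rMZ


def quick_twin_estimates(r_mz: float, r_dz: float) -> QuickEstimates:
    """Apply the four correlation-based rules quoted in the text."""
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    return QuickEstimates(
        a2=2.0 * (r_mz - r_dz),
        d2=2.0 * r_mz - 4.0 * r_dz,
        c2=2.0 * r_dz - r_mz,
        e2=1.0 - r_mz,
    )


if __name__ == "__main__":
    # Made-up correlations, for illustration only.
    print(quick_twin_estimates(r_mz=0.8, r_dz=0.5))
```

These are point estimates only; as the text notes, they carry no standard errors and can be misleading in small samples.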

Figure 1
Graphical illustration of the genotypic values for a diallelic locus (axis labels, left to right: A2A2, O, A1A2, A1A1).

Table 1
Summary of Genotypic Values, Frequencies, and Dominance Deviation for Three Genotypes A1A1, A1A2, and A2A2

Genotype                              A1A1              A1A2                          A2A2
Genotypic value                       $a$               $d$                           $-a$
Frequency                             $p^2$             $2pq$                         $q^2$
Frequency × value                     $ap^2$            $2dpq$                        $-aq^2$
Deviation from the population mean    $2q(a - dp)$      $a(q - p) + d(1 - 2pq)$       $-2p(a + dq)$
Dominance deviation                   $-2q^2 d$         $2pqd$                        $-2p^2 d$
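
The closed-form entries of Table 1 can be checked numerically. The sketch below uses hypothetical function names and arbitrary example values for p, a, and d. It computes the population mean as the sum of the “Frequency × value” row, derives each deviation directly as genotypic value minus mean, and compares the result with the tabulated expressions. It also checks that the frequency-weighted deviations and dominance deviations each sum to zero.

```python
from math import isclose

GENOTYPES = ("A1A1", "A1A2", "A2A2")


def table1(p: float, a: float, d: float) -> dict:
    """Rows of Table 1 for A1 frequency p and genotypic values a, d."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must lie in [0, 1]")
    q = 1.0 - p
    value = dict(zip(GENOTYPES, (a, d, -a)))
    freq = dict(zip(GENOTYPES, (p * p, 2 * p * q, q * q)))
    # Population mean: sum of the "Frequency x value" row.
    mean = sum(freq[g] * value[g] for g in GENOTYPES)
    # Deviations computed directly, without the closed forms.
    direct = {g: value[g] - mean for g in GENOTYPES}
    # Closed forms as printed in Table 1.
    tabulated = dict(zip(GENOTYPES, (
        2 * q * (a - d * p),
        a * (q - p) + d * (1 - 2 * p * q),
        -2 * p * (a + d * q),
    )))
    dominance = dict(zip(GENOTYPES, (-2 * q * q * d, 2 * p * q * d, -2 * p * p * d)))
    return {"freq": freq, "value": value, "mean": mean,
            "direct": direct, "tabulated": tabulated, "dominance": dominance}


def check_table1(p: float, a: float, d: float) -> bool:
    """True if the tabulated deviations match the direct computation."""
    t = table1(p, a, d)
    same = all(isclose(t["direct"][g], t["tabulated"][g], abs_tol=1e-12)
               for g in GENOTYPES)
    dev_sum = sum(t["freq"][g] * t["direct"][g] for g in GENOTYPES)
    dom_sum = sum(t["freq"][g] * t["dominance"][g] for g in GENOTYPES)
    return (same and isclose(dev_sum, 0.0, abs_tol=1e-12)
            and isclose(dom_sum, 0.0, abs_tol=1e-12))


if __name__ == "__main__":
    print(check_table1(p=0.3, a=2.0, d=0.5))
```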

Decomposition of Phenotypic Variance
Although Figure 1 and Table 1 lack environmental effects, quantitative geneticists assume that the individual phenotype (P) is a function of both genetic (G) and environmental effects (E): P = G + E, where E refers to the environmental deviations, which have an expected average value of zero. This equation does not include the term G×E, and thereby assumes no interaction between the genetic effects and the environmental effects.

The variance of the phenotype, which itself is defined by G + E, is given by $V_P = V_G + V_E + 2\,\mathrm{cov}_{GE}$, where $V_P$ represents the variance of the phenotypic values, $V_G$ represents the variance of the genotypic values, $V_E$ represents the variance of the environmental deviations, and $\mathrm{cov}_{GE}$ represents the covariance between G and E. GE-covariance or GE-correlation can be modelled in a twin design that includes the parents of twins (e.g., Boomsma & Molenaar, 1987a; Fulker, 1988) or in a design that includes actual measurements of the relevant genetic and environmental factors. For simplicity we assume that $V_P = V_G + V_E$. Statistically the total genetic variance ($V_G$) can be obtained by the standard formula for the variance: $\sigma^2 = \sum f_i (x_i - \mu)^2$, where $f_i$ denotes the frequency of genotype i, $x_i$ denotes the corresponding mean of that genotype (as given in Table 1) and $\mu$ denotes the population mean. Thus, $V_G = p^2[2q(a - dp)]^2 + 2pq[a(q - p) + d(1 - 2pq)]^2 + q^2[-2p(a + dq)]^2$. This can be simplified to $V_G = 2pq[a + d(q - p)]^2 + (2pqd)^2 = V_A + V_D$ (see e.g., Falconer & Mackay, 1996).

If the phenotypic value of the heterozygous genotype lies midway between A1A1 and A2A2 (i.e., the effect of d in Figure 1 equals zero), the total genetic variance simplifies to $2pqa^2$. If d is not equal to zero, the “additive” genetic variance component contains the effect of d. Even if a = 0, $V_A$ is usually greater than zero (except when p = q). Thus, although $V_A$ represents the variance due to the additive influences, it is not only a function of p, q, and a, but also of d. Consequently, even when a = 0, $V_A$ is greater than zero except in the rare situation where all contributing loci are diallelic with p = q. Models that decompose the phenotypic variance into components of $V_D$ and $V_E$ only are therefore biologically implausible. When more than one locus is involved and it is assumed that the effects of these loci are uncorrelated and there is no interaction (i.e., no epistasis), the $V_G$s of the individual loci may be summed to obtain the total genetic variance of all loci that influence a trait (Fisher, 1918; Mather, 1949). In most human quantitative genetic models the observed $V_P$ of a trait is not modelled directly as a function of p, q, a, d and environmental deviations (as all of these are usually unknown), but instead is modelled by comparing the observed resemblance between pairs of different, known genetic relatedness, such as MZ and DZ pairs. Ultimately p, q, a, d and environmental deviations are the parameters quantitative geneticists hope to “quantify”.

Comparing the observed resemblance in MZ twins and DZ twins allows decomposition of observed variance in a trait into components of $V_A$, $V_D$ (or $V_C$, for shared environmental variation, which is not considered at this point), and $V_E$. As MZ twins share 100% of their genome, the expectation for their covariance is $\mathrm{COV}_{MZ} = V_A + V_D$.

The expectation for DZ twins is less straightforward: as DZ twins share on average 50 per cent of their genome, as stated earlier, they share half of the genetic variance that is transmitted from the parents (i.e., $\tfrac{1}{2}V_A$). As $V_D$ is not
             transmitted from parents to offspring it is less obvious                           matrices X, Y, and Z of dimensions 1 × 1, containing the
             where the coefficient of sharing for the dominance devia-                          path coefficients x, y, and z, respectively. The matrix algebra
             tions (i.e., the 0.25 mentioned earlier) derives from. If two                      notation for VP is XX T + YY T + ZZ T, where T denotes the
             members of a DZ twin pair share both of their alleles at a                         transpose of the matrix (and corresponds to tracing for-
             single locus they will have the same coefficient for d. If they                    wards through a path, see Neale & Cardon, 1992). The
             share no alleles or just one parental allele they will have no                     expectation for the MZ covariance is XXT + YYT and the
             similarity for the effect of d. The probability that two                           expectation for the DZ covariance is 0.5XXT + 0.25YYT.
             members of a DZ pair have received two identical alleles                           Including additional siblings in this design is straightfor-
             from both parents is the coefficient of similarity for d                           ward, and in this diagram the expectation for sib pair
             between them. The probability is 1/2 that two siblings (or                         covariance is also 0.5XXT + 0.25YYT (but one should note
             DZ twins) receive the same allele from their father, and the                       that it is not necessary to assume the same model for sib-sib
             probability is 1/2 that they have received the same allele
                                                                                                covariance as for DZ covariance).
             from their mother. Thus, the probability that they have
                                                                                                    The inclusion of non-twin siblings (if available) will not
             received the same two ancestral alleles is 1/2 × 1/2 = 1/4 , and
                                                                                                only enhance statistical power (Dolan et al., 1999b;
             the expectation for the covariance in DZ twins is COVDZ =
             1                                                                                  Posthuma & Boomsma, 2000), but also provides the
              /2 VA + 1/4 VD .
                                                                                                opportunity to test several assumptions, such as whether
             Path Analysis and                                                                  the covariance between DZ twins equals the covariance
             Structural Equation Modelling                                                      between non-twin siblings (which is often assumed, but
                                                                                                can now be tested), whether the means and variances in
             The expectations for variances and covariances of MZ
                                                                                                twins are similar to the means and variances observed in
             twins and DZ twins or sib pairs reared together may also be
                                                                                                siblings, or whether twin-sib covariance is different from
             inferred from a path diagram (Wright, 1921, 1934), which
             is an often convenient non-algebraic representation of                             sib-sib covariance, across males and females.
             models such as are discussed here (see e.g., Neale &                                   As in practice some families may for example consist
             Cardon, 1992, for a brief introduction into path analysis)                         of a twin pair and six additional siblings while other fami-
             and which can be translated directly into structural equa-                         lies consist of twins, the correlational method cannot
             tions. The parameters of these equations can be estimated                          be applied to estimate genetic and environmental contribu-
             by widely available statistical software (e.g., Mx and Lisrel).                    tions to the variance. Fortunately, these so-called non-
             Structural Equation Modelling (SEM) has several advan-                             rectangular (or unbalanced nested) data structures can be
             tages over merely comparing the MZ and DZ correlations                             handled with ease using a SEM approach.
             (Eaves, 1969; Jinks & Fulker, 1970). If the model assump-
                                                                                                Threshold Models for Categorical Twin Data
             tions are valid, SEM produces parameter estimates with
             known statistical properties, while the correlational method                       So far we have considered quantitative traits. Several
             merely allows parameter calculation. SEM thus also allows                          observed traits, however, are measured on a non-continu-
             determination of confidence intervals and of standard                              ous scale, such as dichotomous traits (e.g., disease vs. no
             errors of parameter estimates and quantifies how well the                          disease; smoking vs. non-smoking) or ordinal phenotypes
             specified model describes the data. One can either directly                        (e.g., underweight/normal weight/overweight/obesity/
             derive structural equations from a theoretical model, or use                       severe obesity), yielding summary counts in contingency
             path analysis as a non-algebraic intermediary to derive the                        tables instead of means and variances/covariances.
             structural equations.                                                              Reducing continuous scores like BMI to a categorical score
                                                                                                like obese/non-obese should be avoided, as the statistical
             Extended Twin Design
                                                                                                power to detect significant effects is much lower in
             A convenient feature of SEM is the flexible handling of                            categorical analyses (Neale et al., 1994). Contingency tables
             unbalanced data structures. This enables the relatively easy                       typically contain the number of (twin) pairs (for each
             incorporation of data from a variable number of family
                                                                                                zygosity group) for each combination (e.g., concordant
             members. In Figure 2 a path diagram is drawn for a uni-
                                                                                                non-smokers, concordant smokers, discordant on smoking).
             variate trait measured in families consisting of a twin pair
                                                                                                Because of the inherent polygenic background of complex
             and one additional sibling. As additive, dominant and
                                                                                                traits, these data are often treated by assuming that an
             shared environmental effects are confounded in the twin
             design, the path diagram includes additive genetic influ-                          underlying quantitative liability exists with one or more
             ences (A), dominant genetic influences (D) and non-shared                          thresholds, depicting the categorization of subjects.
             environmental influences (E), but not shared environmental                         Although the liability itself cannot be measured, a
             influences (C). Note that a path diagram for shared                                standard normal distribution is assumed for the liability. The
             environmental influences can be obtained by replacing 0.25                         thresholds (z-values in the standard normal distribution)
             (the DZ correlation for dominant genetic influences) with 1.00                     are chosen in such a way that the area under the standard
             (the DZ correlation for shared environmental influences)                           normal curve between two thresholds (or from minus
             and replacing D with C for a latent shared environmental                           infinity to the first threshold, and from the last threshold to
             factor.                                                                            infinity) reflects the prevalence of that category.
                  To rewrite the model depicted in Figure 2 into struc-                             For one variable measured on single subjects, the preva-
             tural equations using matrix algebra we introduce three                            lence of category i is given by:

    364                                                                             Twin Research October 2003
Downloaded from https://www.cambridge.org/core. UQ Library, on 10 Feb 2020 at 19:38:16, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms.
https://doi.org/10.1375/twin.6.5.361
Theory and Practice in Quantitative Genetics

                  Figure 2
                  Path diagram representing the resemblance between MZ or DZ twins and an additional sibling, for additive genetic influences (A), dominant
                  genetic influences (D), and non-shared environmental influences (E), for a univariate trait.
                  The latent factors A, D, and E have unit variance, and x, y, and z represent the respective path coefficients from A, D, and E to the phenotype P.
                  The path coefficients can be regarded as standardized regression coefficients. The phenotypic variance ($V_P$) for the trait (the same for both
                  members of a twin pair) equals $x^2 + y^2 + z^2$, which equals $V_A + V_D + V_E$. By applying the tracing rules of path analysis, the covariance between
                  DZ twins (and sib pairs) is traced as $0.50x^2 + 0.25y^2$, which equals $\tfrac{1}{2} V_A + \tfrac{1}{4} V_D$. The covariance between MZ twins is traced as $x^2 + y^2$, which equals
                  $V_A + V_D$.
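
The tracing rules summarised in the caption of Figure 2 can be illustrated with a short numerical sketch. The function name `expected_covariance` and the path coefficient values are hypothetical and chosen only for illustration; the sketch assumes an ADE model in which the additional sibling relates to each twin as a DZ co-twin would.

```python
def expected_covariance(x, y, z, zygosity):
    """Expected covariance matrix for (twin 1, twin 2, sibling) under an ADE model.

    x, y and z are the path coefficients from A, D and E to the phenotype.
    """
    if zygosity not in ("MZ", "DZ"):
        raise ValueError("zygosity must be 'MZ' or 'DZ'")
    va, vd, ve = x ** 2, y ** 2, z ** 2
    vp = va + vd + ve                # phenotypic variance V_P
    cov_mz = va + vd                 # MZ twins: V_A + V_D
    cov_sib = 0.5 * va + 0.25 * vd   # DZ twins and sib pairs: 1/2 V_A + 1/4 V_D
    cov_twins = cov_mz if zygosity == "MZ" else cov_sib
    return [
        [vp, cov_twins, cov_sib],
        [cov_twins, vp, cov_sib],
        [cov_sib, cov_sib, vp],
    ]


# Hypothetical standardized path coefficients (x^2 + y^2 + z^2 = 1).
x, y, z = 0.6, 0.4, 0.48 ** 0.5
sigma_mz = expected_covariance(x, y, z, "MZ")
sigma_dz = expected_covariance(x, y, z, "DZ")
```

Only the twin-twin element differs between the MZ and DZ matrices; the twin-sibling covariances are the same in both zygosity groups, which is what makes the additional sibling informative.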

                  $$\int_{t_{i-1}}^{t_i} \phi(v)\,dv$$
                                                                                                    $$\int_{t_{i-1}}^{t_i} \int_{s_{i-1}}^{s_i} \phi(v)\,dv$$

                  where $t_{i-1} = -\infty$ for $i = 0$ and $t_i = \infty$ for $i = p$ for $p$      with $t_{i-1}, s_{i-1} = -\infty$ for $i = 0$ and $t_i, s_i = \infty$ for $i = p$. In this
                  categories, and $\phi(v)$ is the normal probability density function              case,

                  $$\phi(v) = \frac{e^{-0.5 v^2}}{\sqrt{2\pi}},$$
                                                                                                    $$\phi(v) = |2\pi\Sigma|^{-1/2} \times e^{-\frac{1}{2} v_i^T \Sigma^{-1} v_i},$$

                  where $\pi \approx 3.14$.                                                         where $\Sigma$ is the predicted correlation matrix.
                                                                                                        Although the underlying bivariate distribution cannot
                                                                                                    be observed, its shape depends on the correlation between
                                                                                                    the two liability distributions. A high correlation results in
                  This can easily be extended to twin data, where one vari-                         a relatively low proportion of discordant twin pairs,
                  able is available for both members of a twin pair (e.g.,                          whereas a low correlation results in a much higher propor-
                  obesity yes/no in twin 1 and twin 2). In this case, the                           tion. Based on the contingency tables one may calculate
                  underlying bivariate normal probability density function                          tetrachoric (for dichotomous traits) or polychoric (for
                  will be characterized by two liabilities (that can be con-                        ordinal traits) twin correlations between the trait measured
                  strained to be the same) and by a correlation between them:                       in twin 1 and the trait measured in twin 2 for each zygosity


             group. Subsequently, genetic analyses can be conducted                             data when the same subject is assessed repeatedly in time.
             that use these correlations in the same way as is done with                        Figure 3 is a path diagram for a bivariate design (two mea-
             continuous data. The practical extension of this method to                         surements per subject; four measures for a pair of twins or
             the analysis of more than two categorical variables may not                        siblings). The corresponding matrix algebra expressions for
             be straightforward, especially if many variables are involved.                     the expected MZ or DZ variances and covariances are the
             The method then needs an asymptotic weight matrix (see                             same as for the univariate situation, except that the dimen-
             Neale & Cardon, 1992) and the tetrachoric or polychoric                            sions of matrices X, Y, and Z are no longer 1×1. An often
             correlations should be estimated simultaneously, not pair-                         convenient form for those matrices is lower triangular of
             wise, as the latter method often produces improper                                 dimensions n × n (where n is the number of variables
             correlation matrices. The computations may then become                             assessed on a single subject; in Figure 3, n = 2). The sub-
             very time consuming. The preferred method for categorical                          scripts of the path coefficients correspond to matrix
             multivariate designs is to conduct analyses directly on all
                                                                                                elements (i.e., xij denotes the matrix element in the i-th row
             available raw data, fitting each twin or sib pair individually
                                                                                                and j-th column of matrix X). The path coefficients sub-
             using the multivariate normal probability density functions
                                                                                                scripted by 21 reflect the variation that both measured
             given above. Again, the underlying correlation matrix of
                                                                                                phenotypes have in common. For example, if the path
             family data can be decomposed into genetic and environ-
             mental influences in the same way as can be done for                               denoted by x21 is not equal to zero, this suggests that there
             continuous data.                                                                   are some genes that influence both phenotypes.
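
As a rough illustration of the threshold model described above, the sketch below computes category prevalences from thresholds on a standard normal liability, and the proportion of discordant pairs implied by a given liability correlation. The function names, the threshold values and the correlations are hypothetical; the bivariate probability is obtained by simple Simpson integration rather than by the routines used in dedicated SEM software.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal liability distribution


def category_prevalences(thresholds):
    """Area under the standard normal curve between consecutive thresholds."""
    cdf = [0.0] + [STD_NORMAL.cdf(t) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


def both_below(t, s, rho, steps=2000):
    """P(liability 1 < t, liability 2 < s) for correlation rho (|rho| < 1).

    Integrates phi(x) * Phi((s - rho * x) / sqrt(1 - rho^2)) over x in [-8, t]
    with Simpson's rule; steps must be even.
    """
    lower = -8.0
    h = (t - lower) / steps
    scale = sqrt(1.0 - rho ** 2)

    def integrand(x):
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((s - rho * x) / scale)

    total = integrand(lower) + integrand(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    return total * h / 3.0


def discordant_proportion(threshold, rho):
    """Proportion of pairs in which exactly one member exceeds the threshold."""
    p_unaffected = STD_NORMAL.cdf(threshold)
    return 2.0 * (p_unaffected - both_below(threshold, threshold, rho))


prevalences = category_prevalences([-0.5, 1.0])     # three ordered categories
discordant_high = discordant_proportion(1.0, 0.8)   # e.g., a high twin correlation
discordant_low = discordant_proportion(1.0, 0.3)    # e.g., a low twin correlation
```

With the same threshold, the higher liability correlation yields the smaller proportion of discordant pairs, which is the information that tetrachoric and polychoric correlations recover from contingency tables.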
                                                                                                     Thus, multivariate genetic designs allow the decompo-
             Multivariate Analysis of Twin Data                                                 sition of an observed correlation between two variables into
             The univariate model can easily be extended to a multivari-                        a genetic and an environmental part. This can be quantified
             ate model when more than one measurement per subject is                            by calculating the genetic and environmental correlations
             available (Boomsma & Molenaar, 1986, 1987b; Eaves &                                and the genetic and environmental contributions to the
             Gale, 1974; Martin & Eaves, 1977) or for longitudinal                              observed correlation.

             Figure 3
             Path diagram representing the resemblance between MZ or DZ twins, for additive genetic influences (A), dominant genetic influences (D), and
             non-shared environmental influences (E), in a bivariate design.
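
The decomposition of an observed correlation into a genetic and an environmental part can be sketched numerically for the bivariate case. The lower triangular path coefficient matrices `X` and `Z` below hold hypothetical values, and dominance is omitted (an AE model) purely to keep the illustration short. The genetic (environmental) correlation is taken as the genetic (environmental) covariance divided by the square root of the product of the two corresponding variances.

```python
from math import sqrt


def times_transpose(lower):
    """Return L L^T for a square matrix L given as nested lists."""
    n = len(lower)
    return [[sum(lower[i][k] * lower[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def correlation(cov, i, j):
    """Correlation implied by a covariance matrix for variables i and j."""
    return cov[i][j] / sqrt(cov[i][i] * cov[j][j])


# Hypothetical lower triangular path coefficients for two phenotypes.
X = [[0.7, 0.0],
     [0.4, 0.5]]   # additive genetic paths; X[1][0] corresponds to x21
Z = [[0.6, 0.0],
     [0.2, 0.7]]   # non-shared environmental paths

A = times_transpose(X)   # additive genetic (co)variances
E = times_transpose(Z)   # environmental (co)variances

rg = correlation(A, 0, 1)
re = correlation(E, 0, 1)

# Standardized genetic and environmental variances of each phenotype.
h2 = [A[i][i] / (A[i][i] + E[i][i]) for i in range(2)]
e2 = [E[i][i] / (A[i][i] + E[i][i]) for i in range(2)]

genetic_part = rg * sqrt(h2[0]) * sqrt(h2[1])
environmental_part = re * sqrt(e2[0]) * sqrt(e2[1])
r_phenotypic = genetic_part + environmental_part
```

Because `A` and `E` are built as a matrix times its transpose, they are nonnegative definite by construction, and the two contributions add up to the correlation implied by the total covariance matrix `A + E`.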


                      The additive, dominance and environmental (co)vari-                            trait that in turn influences the second trait. Or, there may
                  ances can be represented as elements of the symmetric                              be genes that act in a pleiotropic way (i.e., they influence
                  matrices $A = XX^T$, $D = YY^T$, and $E = ZZ^T$. They contain                     both traits but neither trait influences the other). Genetic
                  the additive genetic, dominance, and non-shared environ-                           correlations do not distinguish between these situations,
                  mental variances respectively on the diagonals for variables                       but merely provide information on the nature of the causes
                  1 to n. X, Y and Z are known as the Cholesky decomposi-                            of covariation between two traits.
                  tion of the matrices A, D and E, which assures that these
                  matrices are nonnegative definite. The latter is required for                      Longitudinal Analysis of Twin Data
                  variance-covariance matrices.                                                      The aim of longitudinal analysis of twin data is to consider
                      The genetic correlation between variables i and j ($r_{g_{ij}}$) is           the genetic and environmental contributions to the
                  derived as the genetic covariance between variables i and j                       dynamics of twin pair responses through time. In this case the
                  (denoted by element ij of matrix A; $a_{ij}$) divided by the                      phenotype is measured at several distinct time points for
                  square root of the product of the genetic variances of                            each twin in a pair. To analyse such data one must take the
                  variables i ($a_{ii}$) and j ($a_{jj}$):                                          serial correlation between the consecutive measurements

                  $$r_{g_{ij}} = \frac{a_{ij}}{\sqrt{a_{ii} \times a_{jj}}}.$$

                                                                                                    of the phenotype into consideration. The classical genetic
                                                                                                    analysis methods described in previous sections are aimed
                                                                                                    at the analysis of a phenotype measured at a single point
                  Analogously, the environmental correlation ($r_{e_{ij}}$) between                 in time and provide a way of estimating the time-specific
                  variables i and j is derived as the environmental covariance                      heritability and variability of environmental effects.
                  between variables i and j divided by the square root of the                       However, these methods are not able to handle serially correlated
                  product of the environmental variances of variables i and j:                      longitudinal data efficiently.

                  $$r_{e_{ij}} = \frac{e_{ij}}{\sqrt{e_{ii} \times e_{jj}}}.$$

                                                                                                        To deal with these issues the classic genetic analysis
                                                                                                    methods have been extended to investigate the effects of
                                                                                                    genes and environment on the development of traits over
                  The phenotypic correlation r is the sum of the product of                         time (Boomsma & Molenaar, 1987b; McArdle, 1986).
                  the genetic correlation and the square roots of the                               Methods based on the Cholesky factorization of the
                  standardized genetic variances (i.e., the heritabilities) of the two              covariance matrix of the responses treat the multiple phenotype
                  phenotypes and the product of the environmental                                   measurements in a multivariate genetic analysis framework
                  correlation and the square roots of the standardized environmental                (as discussed under “Multivariate Analysis of Twin Data”).
                  variances of the two phenotypes:                                                  “Markov chain” (or “Simplex”) model methods (Dolan,

                  $$r = r_{g_{ij}} \times \sqrt{\frac{a_{ii}}{a_{ii} + e_{ii}}} \times \sqrt{\frac{a_{jj}}{a_{jj} + e_{jj}}}
                      + r_{e_{ij}} \times \sqrt{\frac{e_{ii}}{a_{ii} + e_{ii}}} \times \sqrt{\frac{e_{jj}}{a_{jj} + e_{jj}}}$$

                                                                                                    1992; Dolan et al., 1991) provide an alternative account of
                                                                                                    change in covariance and mean structure of the trait over
                                                                                                    time. In this case the Markov model structure implies that
                                                                                                    future values of the phenotype depend on the current trait
                                                                                                    values alone, not on the entire past history. Methods of
                                                                                                     function-valued quantitative genetics (Pletcher & Geyer,
                                                                                                     1999) or the genetics of infinite-dimensional characters
                                 (i.e., observed correlation is the sum of the genetic
                                                                                                     (Kirkpatrick & Heckman, 1989) have been developed for
                                 contribution and the environmental contribution).
                                                                                                     situations where it is necessary to consider the time variable
                                                                                                     on a continuous scale. The aim of these approaches is to
                  The genetic contribution to the observed correlation                               investigate to what extent the variation of the phenotype at
                  between two traits is a function of the two sets of genes that                     different times may be explained by the same genetic and
                  influence the traits and the correlation between these two                         environmental factors acting at the different time points
                  sets. However, a large genetic correlation does not imply a                        and to establish how much of the genetic and environmen-
                  large phenotypic correlation, as the latter is also a function                     tal variation is time-specific.
                  of the heritabilities. If these are low, the genetic contribu-                          An alternative approach for the analysis of longitudinal
                  tion to the observed correlation will also be low.                                 twin data is based on random growth curve models (Neale
                       If the genetic correlation is 1, the two sets of genetic                      & McArdle, 2000). The growth curve approach to genetic
                  influences overlap completely. If the genetic correlation is                       analysis was introduced by Vandenberg & Falkner (1965)
                  less than 1, at least some genes belong to only one of                            who first fitted polynomial growth curves for each subject
                  the sets of genes. A large genetic correlation, however, does                      and then estimated heritabilities of the components. These
                  not imply that the overlapping genes have effects of similar                       methods focus on the rate of change of the phenotype (i.e.,
                  magnitude on each trait. The overlapping genes may even                            its slope or partial derivative) as a way to predict the level at
                  act additively for one trait and show dominance for the                            a series of points in time. It is assumed that the individual
                  second trait. A genetic correlation less than 1 therefore                          phenotype trajectory in time may be described by a para-
                  cannot exclude that all of the genes are overlapping                               metric growth curve (e.g., linear, exponential, logistic etc.)
                  between the two traits (Carey, 1988). Similar reasoning                            up to some additive measurement error. The parameters of
                  applies to the environmental correlation.                                          the growth curve (e.g., intercept and slope, also called
                        Genetic correlations do not provide information on the                       latent variables) are assumed to be random and individual-
                  direction of causation. In fact, genes may influence one                           specific. However, the random intercepts and slopes may be


             dependent within a pair of twins because of genetic and                            if L denotes the vector of the random growth curve
             shared environmental influences on the random                                      parameters, the matrix form of the linear growth curve model
             coefficients. The basic idea of the method is that the mean and                    is $Y = DL + E$, where
             covariance structure of the latent variables determines the
             expected mean and covariance structure of the longitudinal
             phenotype measurements and one may therefore estimate
             the characteristics of the latent variable distribution based
             on the longitudinal data.
                                                                                                $$L = \begin{pmatrix} \alpha_1 \\ \beta_1 \\ \alpha_2 \\ \beta_2 \end{pmatrix}, \quad
                                                                                                D = \begin{pmatrix} F & 0 \\ 0 & F \end{pmatrix}, \quad
                                                                                                F = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ \vdots & \vdots \\ 1 & n \end{pmatrix}, \quad
                                                                                                E = \begin{pmatrix} \varepsilon_{11} \\ \vdots \\ \varepsilon_{1n} \\ \varepsilon_{21} \\ \vdots \\ \varepsilon_{2n} \end{pmatrix}.$$
                 The random growth curve approach shifts the focus of the
             genetic analysis towards the new phenotypes, namely the                            Note that the linear growth curve model actually represents
             parameters of the growth curve model. This framework permits                       a Structural Equation Model with latent variables αi
             investigation of new questions concerning the nature of                            and βi (and εit’s) with loadings of the latent variables on the
             genetic influence on the dynamic characteristics of the                            observed responses Yit given by either 1 or t. This implies
             phenotype, such as the rate of change. If the random                               that this model may be analysed using the general SEM
             parameters of the growth curve were observed, they                                 techniques. In particular, parameter estimation may be
             could be analysed directly using the classical                                     carried out via the Maximum Likelihood method under
             methods of multivariate genetic analysis. However, their                           multivariate normality assumptions using the fact that the
             latent nature requires a more elaborate statistical approach.                      moment structure (mY, ΣY) of Y can be expressed in terms
             Since the growth curve model may be formulated in terms                            of m, Σ and Var(εit), where m = E(L), Σ = Cov(L, L): mY =
             of the mean and covariance structure of the random                                 Dm, ΣY = D Σ Dᵀ + Σε, where Σε = Cov(E, E).
             parameters, one might simply take the specification of the mean                        As described previously in the section on multivariate
             and covariance structure of a multivariate phenotype as                            genetic analysis, the two-dimensional phenotype (αi, βi)
             predicted by the classical methods of multivariate genetic                         may be analysed by modelling the covariance matrix Σ for
             analysis and plug it into the growth curve model. The                              MZ and DZ twins using the Cholesky factorisation
             resulting two-level latent variable model would then allow                         approach. The two-level model construction leads to a
             for multivariate genetic analysis of the random coefficients.
                 In the following sections we consider the bivariate                            parameterisation of the joint likelihood for the trait in
             linear growth curve model applied to longitudinal twin                             terms of the variance components, the respective mean
             data using age as timescale. The approach may be extended                          vectors and residual variances. This yields estimates of the
             to other parametric growth curves (e.g., exponential, logis-                       two heritability values of αi and βi (and respective variabili-
             tic etc.) using first-order Taylor expansions and the                              ties of the environmental effects) and also estimates of
             resulting mean and covariance structure approximations                             correlations between the genetic and environmental com-
             (Neale & McArdle, 2000). We also describe a method for                             ponents of α i and β i , as described earlier under
             obtaining the predicted individual random growth curve                             “Multivariate Analysis of Twin Data”.
             parameters using the empirical Bayes estimator. These pre-
                                                                                                Predicting the Random Intercepts and Slopes
             dictions may be useful for selection of most informative
             pairs for subsequent linkage analysis of the random inter-                         In the following, we briefly describe a method for obtaining
             cepts and slopes.                                                                  the predicted individual random growth curve parameters.
                                                                                                These predictions may be useful for selection of most infor-
             Linear Growth Curve Model                                                          mative pairs for subsequent linkage analysis of the random
             A simple implementation of the random effects approach                             intercepts and slopes.
             may be carried out using linear growth curve models. In                                The prediction of the individual random growth curve
             this case each individual is characterized by a random inter-                      parameters may be given by the empirical Bayes estimates,
             cept and a random slope, which are considered to be the                            that is, the conditional expectation of the random growth
             new phenotypes. In a linear growth curve model the con-                            curve parameters given the measurement Y = y. As noted
             tinuous age-dependent trait (Y1t , Y2t) for sib 1 and sib 2 are                    above, Y and L are assumed multivariate normal, i.e., Y ~
             assumed to follow a linear age-trajectory given the random                         MVN(mY; ΣY) and L ~ MVN(m; Σ). The joint distribution
             slopes and intercepts with some additive measurement                               of Y and L is given by
             error: Yit = αi + βi t + εit , for sibling i, where i = 1, 2, t
             denotes the timepoint (t = 1, 2, …, n), αi and βi are the
             individual (random) intercept and slope of sib i, respec-
             tively, and εit is a zero-mean individual error residual, which
             is assumed to be independent from αi and βi. The aim of
                                                                                                             []L ∼MVN
                                                                                                               Y            {[   m
                                                                                                                                 mY ][
                                                                                                                                    ,
                                                                                                                                      Σ ΣDT
                                                                                                                                      DΣ ΣY           ]}   .

             the study is then the genetic analysis of the individual inter-                    By multivariate Gaussian theory, the predicted random
             cepts ( α i ) and slopes ( β i ). The model may easily be                          growth curve parameters of a pair with measurement Y = y
                                                                                                                      ^
             extended to include covariates.                                                    may then be given by L(y) = m + ΣDT Σ–1Y (y – mY) for esti-
                 Assume the trait (Yit ) is measured for the two sibs at t =                    mated parameters. The estimator is the best linear
             1, 2, …, n. The measurements on both twins at all time                             unbiased predictor of the individual random growth curve
             points may be written in vector form as Y = (Y11,…., Y1n,                          parameters (see Harville, 1976). Furthermore, an assess-
             Y21, …, Y2n)T (where T denotes transposition). Furthermore,                        ment of the error in estimation is provided by the variance


of the difference $\hat{L} - L$ given by: $\mathrm{Var}(\hat{L} - L \mid Y) = \Sigma - \Sigma D^T \Sigma_Y^{-1} D \Sigma$.

Additional Components of Variance
In the aforementioned models the absence of effects of genes × environment interaction, of a genes–environment correlation and of assortative mating was assumed. Using the appropriate design these effects can relatively easily be incorporated in structural equation models.

G×E interaction occurs when the effects of the environment are conditional on an individual’s genotype, such as when some genotypes are more sensitive to the environment than other genotypes. Genetic studies on crops and animal breeding experiments have shown that G×E interaction is extremely common (see summary in Lynch & Walsh, 1998, pp. 657–686). However, in general G×E interaction accounts for less than 20% of the variance of a trait in the population (Eaves, 1984; Eaves et al., 1977).

G×E interaction can be modelled according to two main methods. In the first method the presence of G×E interaction is explored by studying the same trait in two environments (or at two time points). A genetic correlation between the two measurements that is less than one indicates the presence of G×E interaction (Boomsma & Martin, 2002; Falconer, 1952). However, a genetic correlation equal to one does not necessarily imply the absence of G×E interaction (see Lynch & Walsh, 1998). In human quantitative genetic analyses it is often not possible to control the environmental or genetic influences, unless the specific genotype and specific environmental factors are explicitly measured (see Boomsma et al., 1999; Dick et al., 2001; Kendler & Eaves, 1986; Rose et al., 2001; but see Molenaar et al., 1990, 1999).

In a second method the presence of G×E interaction is explored by correlating the MZ intra-pair differences and MZ pair sums (Jinks & Fulker, 1970). Assuming that MZ twin similarity is purely genetic, a relation between MZ means and standard deviations suggests the presence of G×E interaction.

G×E interaction is often not included in quantitative genetic models. However, if the true world does include G×E interaction, assuming its absence may lead to biased estimates of G and E (Eaves et al., 1977). For example, if G×E interaction is truly gene by non-shared environment interaction, a model without G×E interaction will result in overestimation of the effects of the non-shared environment. If, however, G×E interaction is interaction between genes and shared environmental influences, assuming its absence will result in overestimation of the effect of genes on the phenotype, as well as in overestimation of the influence of the shared environment on the phenotype. The separate detection of these two biased effects in the presence of genes by shared environment interaction necessitates the inclusion of twin pairs reared apart (Eaves et al., 1977; Jinks & Fulker, 1970).

GE-correlation occurs when the genotypic and environmental values are correlated. Three different forms of GE-correlation have been described (Plomin et al., 1977; Scarr & McCartney, 1983). Passive GE-correlation occurs for example when parents transmit both genes and environment (cultural transmission) relevant for a certain trait (Eaves et al., 1977). Effects of cultural transmission can be measured using a twin design that also includes the parents of twins (Boomsma & Molenaar, 1987a; Fulker, 1988). Active GE-correlation is the situation where subjects of a certain genotype actively select environments that are correlated with that genotype. Reactive GE-correlation refers to the effects of reactions from the environment evoked by an individual’s genotype. The presence of a positive GE-correlation leads to an increase in the phenotypic variance. It is difficult to measure GE-correlation, however, as active and reactive GE-correlation necessitate the direct measurement of these influences (Falconer & Mackay, 1996). Falconer and Mackay (1996) state that GE-correlation is best regarded as part of the genetic variance because “… the non-random aspects of the environment are a consequence of the genotypic value …”

Assortative mating refers to a correlation between the phenotypic values of spouses (e.g., Willemsen et al., 2002). Assortative mating can be based on social homogamy (i.e., preferential mating within one’s own social class) or may occur when mate selection is based on a certain phenotype (P; which in turn is a function of G and E). Phenotypic assortative mating tends to increase additive genetic variation (and therefore the overall phenotypic variation; Lynch & Walsh, 1998) and consequently increases the resemblance between parents and offspring as well as the resemblance among siblings and DZ twins. Statistically, it may conceal the presence of non-additive genetic effects and lead to overestimation of the influence of additive genetic factors (Eaves et al., 1989; Cardon & Bell, 2000; Carey, 2002; Heath et al., 1984; Posthuma et al., in press). Assortative mating is known to exist for traits such as intelligence, exercise behavior and body height and weight (e.g., Aarnio et al., 1997; Boomsma, 1998; Vandenberg, 1972), but is to a large extent an unexplored topic in most human populations.

Gene Finding
In addition to the estimation of the effects of unmeasured genes and environmental influences on traits, SEM can also be used to test the effects of measured genetic and environmental factors. In the following sections we will concentrate on the detection of the actual genes that influence a trait. Two methods are currently employed: quantitative trait loci (QTL) linkage analysis and association analysis.

QTL Linkage Analysis
Quantitative trait loci (QTL) linkage analysis establishes relationships between dissimilarity or similarity in a quantitative trait in genetically related individuals and their dissimilarity or similarity in regions of the genome. If such a relationship can be established with sufficient statistical confidence, then one or more genes in those regions are possibly involved in trait (dis)similarity among individuals.

Linkage analysis depends on the co-segregation (i.e., a violation of Mendel’s law of independent assortment which applies only to inheritance of different chromosomes) of alleles at a marker and a trait locus (Ott, 1999). If a pair of offspring has received the same haplotype from a parent in


a certain region of the genome, the pair is said to share the parent’s alleles in that region identical by descent (IBD). Since offspring receive their haplotypes from two parents, the pair can share 0, 1 or 2 alleles IBD at a certain locus in a region. The IBD status of a pair is usually estimated for a number of markers with (approximately) known location along the genome and is then used as the measure of genetic similarity at the marker. The IBD status at a marker is informative for the IBD status at any other locus along the chromosome as long as the population recombination fraction between the marker and the locus is less than 0.5. In that case the IBD status at the marker and the locus are correlated in the population and hence similarity at the marker is informative for similarity at the locus. The locus may be a gene or may be located near a gene. If variation in the gene and variation in the trait are related (i.e., the gene is a QTL), then variation in the IBD status at the locus and thus also at the marker will be related to variation in trait similarity. Identity by descent is distinct from identity by state (IBS), which denotes the number of physically identical alleles that a pair of offspring has received, and that have not necessarily been inherited from the same parent. Table 2 gives all possible sib pairs from an A1A2 × A1A2 mating and their IBD / IBS status.

A commonly made distinction is between parametric and nonparametric models for linkage analysis. Parametric models, which are discussed at length by Ott (1999), require a fairly detailed specification of population characteristics of the gene, such as allele frequencies and penetrances. Nonparametric models are not nonparametric in the usual statistical meaning of the term; rather, they are parametric models that require fewer assumptions than parametric linkage models. Nonparametric models use a fairly simple relationship between the contribution of a QTL to the covariance of the trait values of pairs of individuals, the pairs’ IBD status at a certain location on the genome and the recombination fraction between the locus and the QTL. This relationship is usually expressed as a function of the proportion $\pi_i$ of alleles shared identical by descent, $\pi_i = i/2$ for $i = 0, 1, 2$. For sibling pairs, for instance, the contribution to the covariance given the proportion $\pi_i$ is $f(\theta)\,\pi_i\,\sigma^2$, where $\sigma^2$ is the QTL’s contribution to the population trait variance, $\theta$ is the recombination fraction between the locus and the QTL, and the monotonic function $f(\theta)$ equals 0 and 1 for recombination fractions 0.5 and 0, respectively. Since IBD status is not always unambiguously known, it is usually estimated probabilistically from the specific allele pattern across chromosomes of two or more siblings (Abecasis et al., 2002; Kruglyak et al., 1996). The estimate of $\pi$ is referred to as $\hat{\pi}$, and can be calculated as (Sham, 1998): $\hat{\pi} = \tfrac{1}{2} p_{IBD1} + p_{IBD2}$.

We call the correlation between the dominance values within a population of sib pairs $\delta$, which in the population can be estimated as: $\hat{\delta} = p_{IBD2}$, where $p_{IBD2}$ is the proportion of sib pairs that share two alleles IBD in this population. This can be incorporated in a path diagram (Figure 4).

The path coefficients $v$ and $w$ (Figure 4) and the relative contribution of the factors Am and Dm to the phenotypic variation are a function of the recombination fraction between the marker and the trait locus and the magnitude of the genetic effects of the trait locus. Relatively small effects of the factors Am and Dm can thus either reflect a situation with small effects at the trait locus and a small recombination fraction (close to zero) or may reflect large effects at the trait locus in combination with a large (close to 0.5) recombination fraction between the marker and the trait locus.

The expectation for the variance is in algebraic terms $x^2 + y^2 + v^2 + w^2 + z^2$, the expectation for the covariance among MZ twins is $x^2 + y^2 + v^2 + w^2$, and for the covariance among DZ twins is $\tfrac{1}{2} x^2 + \tfrac{1}{4} y^2 + \hat{\pi} v^2 + \hat{\delta} w^2$. Translating this into matrix algebraic terms we introduce matrix $Q$ (i.e., the product of matrices $V$ and $V^T$, representing the additive genetic influences on the phenotype at the marker site) and matrix $R$ (i.e., the product of matrices $W$ and $W^T$, representing the dominant genetic influences on the phenotype at the marker site). Written in matrix algebra the expectation for the variance equals $XX^T + YY^T + ZZ^T + VV^T + WW^T$ (or $A + D + E + Q + R$), the expectation for the covariance of MZ twins equals $XX^T + YY^T + VV^T + WW^T$ (or $A + D + Q + R$), and the expectation for the covariance of DZ twins equals $0.5XX^T + 0.25YY^T + \hat{\pi} VV^T + \hat{\delta} WW^T$ (or $0.5A + 0.25D + \hat{\pi} Q + \hat{\delta} R$).

Testing whether the elements of matrices $V$ and $W$ are statistically different from zero provides a test for linkage at a particular marker position. In a genome screen this test is conducted for each marker along the genome. Those marker positions for which the $\chi^2$ difference exceeds a certain critical value are believed to be linked to a QTL. Apart from calculating $\hat{\pi}$'s and $\hat{\delta}$'s to model the linkage component, one may also apply a mixture model. In this model for each sib pair three models (for IBD = 0, IBD =

Table 2
IBD / IBS Status from all Possible Sib Pairings from Parental Mating Type A1A2 (father) × A1A2 (mother)

                        Sib 1
Sib 2       A1A1    A1A2    A2A1    A2A2
A1A1        2/2     1/1     1/1     0/0
A1A2        1/1     2/2     0/2     1/1
A2A1        1/1     0/2     2/2     1/1
A2A2        0/0     1/1     1/1     2/2

Note: A1A2 (Father in bold) × A1A2 (Mother in normal text). Each cell gives IBD / IBS.
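As a cross-check on Table 2, the following sketch enumerates the sixteen equally likely sib pairs from an A1A2 × A1A2 mating and counts the alleles they share IBD and IBS. It is an illustration only: the function names (`label`, `ibd`, `ibs`, `ibd_ibs_table`, `pi_hat`) are hypothetical, and the code assumes that the first allele in each genotype label is the paternal one and the second the maternal one, so that A1A2 and A2A1 differ only in parent of origin.

```python
from collections import Counter
from itertools import product

# Assumption: both parents are A1A2, and a child's label lists the paternal
# allele first and the maternal allele second (e.g. "A2A1" = A2 from father).
PARENT = ("A1", "A2")


def label(child):
    """Genotype label of a child given (paternal index, maternal index)."""
    return PARENT[child[0]] + PARENT[child[1]]


def ibd(sib1, sib2):
    """Alleles shared identical by descent: same parental copy passed to both sibs."""
    return int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])


def ibs(sib1, sib2):
    """Alleles shared identical by state: overlap of allele labels, ignoring origin."""
    g1 = Counter(PARENT[i] for i in sib1)
    g2 = Counter(PARENT[i] for i in sib2)
    return sum((g1 & g2).values())


def ibd_ibs_table():
    """Map (sib 2 label, sib 1 label) to (IBD, IBS) for all 16 equally likely pairs."""
    children = list(product(range(2), repeat=2))
    return {(label(s2), label(s1)): (ibd(s1, s2), ibs(s1, s2))
            for s1 in children for s2 in children}


def pi_hat(p_ibd1, p_ibd2):
    """Estimated proportion of alleles shared IBD: 1/2 * p_IBD1 + p_IBD2."""
    return 0.5 * p_ibd1 + p_ibd2


if __name__ == "__main__":
    table = ibd_ibs_table()
    counts = Counter(ibd_value for ibd_value, _ in table.values())
    print(table[("A1A2", "A2A1")])                 # sib 2 = A1A2, sib 1 = A2A1
    print(pi_hat(counts[1] / 16, counts[2] / 16))  # average proportion shared IBD
```

Across the sixteen pairs, four share 0, eight share 1 and four share 2 alleles IBD, so the average proportion shared IBD is 1/2 × 1/2 + 1/4 = 1/2, the familiar expectation for full sibs.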


                  Figure 4
                  Path diagram representing the resemblance between MZ or DZ twins, for background additive genetic influences (A), background dominant
                  genetic influences (D), additive genetic influences due to the marker site (Am), dominant genetic influences due to the marker site (Dm), and non-
                  shared environmental influences (E), in a univariate design.
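The expectations implied by this path diagram can be written out numerically. The sketch below is a minimal illustration for a univariate trait, assuming path coefficients x, y, z, v and w for A, D, E, Am and Dm respectively (matching the expressions x² + y² + v² + w² + z² for the variance and ½x² + ¼y² + π̂v² + δ̂w² for the DZ covariance). The function names and the example values are hypothetical, not taken from any fitted model.

```python
def expected_moments(x, y, z, v, w, pi_hat, delta_hat):
    """Expected variance, MZ covariance and DZ covariance for a univariate trait.

    x, y, z: path coefficients of background A, D and E
    v, w:    path coefficients of the marker-linked factors Am and Dm
    pi_hat, delta_hat: estimated IBD proportion and dominance correlation of a DZ pair
    """
    var = x**2 + y**2 + v**2 + w**2 + z**2
    cov_mz = x**2 + y**2 + v**2 + w**2
    cov_dz = 0.5 * x**2 + 0.25 * y**2 + pi_hat * v**2 + delta_hat * w**2
    return var, cov_mz, cov_dz


def pair_covariance_matrix(var, cov):
    """2 x 2 expected covariance matrix for the two members of a twin pair."""
    return [[var, cov], [cov, var]]


if __name__ == "__main__":
    # Hypothetical coefficients; a DZ pair sharing both alleles IBD at the marker.
    var, cov_mz, cov_dz = expected_moments(0.6, 0.3, 0.5, 0.4, 0.2,
                                           pi_hat=1.0, delta_hat=1.0)
    print(pair_covariance_matrix(var, cov_mz))
    print(pair_covariance_matrix(var, cov_dz))
```

Because π̂ and δ̂ differ between DZ pairs, the expected DZ covariance matrix is pair-specific, which is why such models are usually fitted to raw data rather than to a single summary covariance matrix.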

1, IBD = 2) are fitted to the data that are weighted by their relative probabilities.

Apart from these variance components methods for linkage analyses, other statistical methods for conducting a QTL linkage analysis have been proposed. Haseman and Elston (1972) developed a now classical model for QTL linkage analysis. In the HE model the squared difference of the trait values $y_1$ and $y_2$ of siblings 1 and 2 is regressed on the value of $\pi_i$ at a particular location on the genome. The average of the squared difference in a given population equals $\mathrm{var}(y_1) + \mathrm{var}(y_2) - 2\,\mathrm{cov}(y_1, y_2)$. As the assignment of individuals’ trait values to $y_1$ or $y_2$ is usually arbitrary this becomes $2\,\mathrm{var}(y) - 2\,\mathrm{cov}(y_1, y_2)$. With a few additional assumptions only $\mathrm{cov}(y_1, y_2)$ varies as a function of the IBD status at the locus, although it does also depend on the variances of other QTLs. Then the regression equation for the squared difference is $\mu - 2 f(\theta)\,\sigma^2 \pi_i$, where $\pi_i$ is the regressor and the constant $\mu$ contains variances associated with the total environmental and genetic effects, including the QTL. A statistically significant negative estimated regression weight is suggestive of a QTL at or near the locus.

By a simple rewriting of the likelihood of the joint distribution under normality, Wright (1997) demonstrated that the difference and sum together carry all the information in the variance and covariance of the joint distribution. He suggested regression approaches using both the difference and the sum. Subsequently several authors (Drigalenko, 1998; Forrest, 2001; Sham & Purcell, 2001; Sham et al., 2002; Visscher & Hopper, 2001; Xu et al., 2000) proposed regression methods that use the information in both the squared sum and the squared difference for inference about a QTL effect. These methods were shown to be nearly as powerful as the likelihood-based variance component methods. The methods generally have the advantage that the computations are easy and fast, which is of some importance if the models are fitted for many locations along the genome. In contrast, the variance components methods can be quite time consuming and may, moreover, yield estimates that do not maximize the likelihood, which is required for the validity of the distribution theory on which inferences about the QTL are based.
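The Haseman–Elston regression of squared sib-pair differences on the IBD proportion can be illustrated with a small simulation. The sketch below is not the authors' implementation; it rests on several simplifying assumptions: the marker sits exactly at the QTL (θ = 0, so f(θ) = 1), IBD status is known without error, the QTL acts additively with normally distributed allele effects, and there is no other familial resemblance. All function names and parameter values are hypothetical. Under these assumptions the expected intercept is 2(σ²_QTL + σ²_residual) and the expected slope is −2σ²_QTL.

```python
import random


def simulate_sib_pairs(n_pairs, qtl_var, resid_var, seed=1):
    """Return (pi, squared trait difference) for simulated sib pairs."""
    rng = random.Random(seed)
    allele_sd = (qtl_var / 2.0) ** 0.5   # each of two alleles carries half the QTL variance
    resid_sd = resid_var ** 0.5
    data = []
    for _ in range(n_pairs):
        # Sib 1 receives one paternal and one maternal allele effect.
        pat1, mat1 = rng.gauss(0, allele_sd), rng.gauss(0, allele_sd)
        # Sib 2 shares each parental allele IBD with probability 1/2.
        share_pat = rng.random() < 0.5
        share_mat = rng.random() < 0.5
        pat2 = pat1 if share_pat else rng.gauss(0, allele_sd)
        mat2 = mat1 if share_mat else rng.gauss(0, allele_sd)
        y1 = pat1 + mat1 + rng.gauss(0, resid_sd)
        y2 = pat2 + mat2 + rng.gauss(0, resid_sd)
        pi = (share_pat + share_mat) / 2.0   # proportion of alleles shared IBD
        data.append((pi, (y1 - y2) ** 2))
    return data


def ols(points):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    pairs = simulate_sib_pairs(20000, qtl_var=1.0, resid_var=1.0)
    intercept, slope = ols(pairs)
    print(intercept, slope)   # slope should be negative when a QTL is present
```

Setting `qtl_var` to zero removes the dependence of the squared difference on π, so the estimated slope then fluctuates around zero; this is the null hypothesis that the HE test evaluates.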
