Genome evolution and the evolution of exon-shuZing - a review

Genome evolution and the evolution of exon-shuZing - a review
Gene 238 (1999) 103–114

 Genome evolution and the evolution of exon-shuffling — a review k
                                                              László Patthy *
                      Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest, Hungary
                               Received 4 February 1999; received in revised form 5 May 1999; accepted 1 June 1999


    Recent studies on the genomes of protists, plants, fungi and animals confirm that the increase in genome size and gene number
in different eukaryotic lineages is paralleled by a general decrease in genome compactness and an increase in the number and size
of introns. It may thus be predicted that exon-shuffling has become increasingly significant with the evolution of larger, less
compact genomes. To test the validity of this prediction, we have analyzed the evolutionary distribution of modular proteins that
have clearly evolved by intronic recombination. The results of this analysis indicate that modular multidomain proteins produced
by exon-shuffling are restricted in their evolutionary distribution. Although such proteins are present in all major groups of
metazoa from sponges to chordates, there is practically no evidence for the presence of related modular proteins in other groups
of eukaryotes. The biological significance of this difference in the composition of the proteomes of animals, fungi, plants and
protists is best appreciated when these modular proteins are classified with respect to their biological function. The majority of
these proteins can be assigned to functional categories that are inextricably linked to multicellularity of animals, and are of
absolute importance in permitting animals to function in an integrated fashion: constituents of the extracellular matrix, proteases
involved in tissue remodelling processes, various proteins of body fluids, membrane-associated proteins mediating cell–cell and
cell–matrix interactions, membrane associated receptor proteins regulating cell–cell communications, etc. Although some basic
types of modular proteins seem to be shared by all major groups of metazoa, there are also groups of modular proteins that
appear to be restricted to certain evolutionary lineages.
    In summary, the results suggest that exon-shuffling acquired major significance at the time of metazoan radiation. It is
interesting to note that the rise of exon-shuffling coincides with a spectacular burst of evolutionary creativity: the Big Bang of
metazoan radiation. It seems probable that modular protein evolution by exon-shuffling has contributed significantly to this
accelerated evolution of metazoa, since it facilitated the rapid construction of multidomain extracellular and cell surface proteins
that are indispensable for multicellularity. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Genome compactness; Introns; Modular proteins; Modules; Metazoan evolution

1. Introduction                                                               explained by two types of hypotheses. The ‘introns early’
                                                                              hypotheses assumed that introns and RNA splicing are
   Shortly after the discovery of split genes, it was                         the relics of the RNA world and the ‘genes in pieces’
realized that the existence of introns may have dramatic                      organization of the eukaryotic genome is the original,
consequences on protein evolution (Gilbert, 1978). It                         ancestral form (Darnell, 1978; Doolittle, 1978; Darnell
was pointed out that recombination within introns could                       and Doolittle, 1986; Gilbert, 1986). According to this
assort exons independently, and middle repetitious                            view, eukaryotes retained introns and the genetic plastic-
sequences in introns may create hotspots for recombina-                       ity of the primitive ancestors of all cells. On the other
tion to shuffle the exonic sequences.                                         hand, bacteria gained increased efficiency by eliminating
   The presence of introns in most eukaryotic protein-                        their introns. Supporters of the introns-early hypotheses
coding genes and their absence from prokaryotes was                           assume that the introns of all protein-coding genes
                                                                              reflect the assembly of these genes from pieces; that
  k                                                                           exons do indeed correspond to building blocks
    Presented at the Symposium on Evolutionary Genomics,
Puntarenas, Costa Rica, 11–15 January 1999.                                   (a-helices, b-sheets, etc.) from which all the genes were
  * Tel.: +36-1-4665-633. fax: +36-1-4665-465.                                assembled by intronic recombination (Gilbert and
  E-mail address: (L. Patthy)                                 Glynias, 1993).

0378-1119/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0 3 7 8 -1 1 1 9 ( 9 9 ) 0 0 22 8 - 0
Genome evolution and the evolution of exon-shuZing - a review
104                                             L. Patthy / Gene 238 (1999) 103–114

   In contrast with this, the ‘introns late’ theories suggest       exon-shuffling became significant at the time of the
that the prokaryotic genes resemble the ancestral ones              appearance of the first multicellular animals, and that
and that the introns were inserted later in genes of                the rise of exon-shuffling could in fact contribute to the
eukaryotes (Crick, 1979; Orgel and Crick,1980; Cavalier-            explosive nature of metazoan radiation.
Smith, 1985; Cech, 1985; Sharp, 1985). In fact, it is now
obvious that the exon–intron structure of eukaryotic
protein-coding genes is not static: introns are continually         2. Results and discussion
inserted into (as well as removed from) genes. The actual
mechanisms of insertion, propagation of some self-                  2.1. Introns and evolution of genome compactness
splicing introns have been analyzed in detail and the
mechanisms responsible for the insertion of spliceosomal                In the past few years, the complete sequences of the
introns are also becoming clear (Dujon, 1989;                       genomes of several Eubacteria (Escherichia coli, Bacillus
Lambowitz and Belfort, 1989; Perlman and Butow,                     subtilis, Haemophilus influenzae, Borrelia burgdorferi,
1989; Morl and Schmelzer, 1990; Belfort, 1991, 1993;                Mycoplasma pneumoniae, Mycoplasma genitalium, etc.),
Lambowitz, 1993; Mueller et al., 1993; Grivell, 1994;               Archaea (Methanococcus jannaschii, Archaeoglobus ful-
Patthy, 1995).                                                      gidus, etc.), a unicellular eukaryote (Saccharomyces cere-
   Since introns themselves are subject to evolution, it            visiae), and a multicellular animal (Caenorhabditis
is clear that exon-shuffling has been evolving parallel             elegans) have been determined, and significant progress
with the evolution of introns. We have argued previously            has also been made on the genome of a protozoan
that the introns suitable for exon-shuffling appeared at            parasite (Plasmodium falciparum), the plant, Arabidopsis
a relatively late stage of evolution; therefore, exon-              thaliana, the fruitfly, Drosophila melanogaster and vari-
shuffling could not play a major role in the construction           ous vertebrates (e.g., the fish Fugu rubripes, mouse,
of ancient proteins (Patthy, 1987, 1991a,b). The self-              human). As a result of these studies it has become clear
splicing introns of the RNA world that could be present             that there are clear correlations between genome size,
at the time the first proteins were formed are practically          genome compactness and the proportion of the genome
unsuitable for exon-shuffling by intronic recombination:            that is occupied by introns and repetitive elements.
such self-splicing introns encode an essential function,                At one extreme we find the small, compact genomes
therefore their sequence is not tolerant to intronic recom-         of Eubacteria and Archaea (with limitingly high coding
bination (Patthy, 1987, 1991a,b, 1994). Exon-shuffling              density) which are practically devoid of introns and
could become significant only with the appearance of                contain very little repetitive sequences. Vertebrate
spliceosomal introns: these introns play a negligible role          genomes represent the other extreme: they have large
in their own excision, therefore intronic recombination             genomes with a drastically reduced coding-density,
is less likely to produce recombinant introns that are              their genomes are rich in repetitive sequences and their
deficient in splicing. Furthermore, the nonessential parts          genes are characterized by a high intron to exon ratio.
of spliceosomal introns could accommodate large seg-                    Studies on the complete genomes of H. influenzae
ments of middle repetitious sequences, further increasing           ( Fleischmann et al., 1995), M. genitalium ( Fraser et al.,
the chances of intronic recombination. Since spliceoso-             1995), M. pneumoniae (Himmelreich et al., 1996, 1997),
mal introns evolved relatively recently from group II               B. subtilis( Kunst et al., 1997), E. coli(Blattner et al.,
self-splicing introns (Cech, 1986; Jacquier, 1990;                  1997), Borrelia burgdorferi (Fraser et al. 1997) have
Cavalier-Smith, 1991; Copertino and Hallick, 1993;                  revealed that the genomes of these eubacteria are very
Saldanha et al., 1993; Sharp, 1994) and are restricted in           compact. The major part of their genomes (about 90%)
their evolutionary distribution (Cavalier-Smith, 1991;              is dedicated to protein-coding genes (830–976 genes per
Palmer and Logsdon, 1991; Logsdon, 1991) exon-                      Mbp) and they do not contain large quantities of
shuffling could play a major role only in the construction          intergenic DNA, introns or repetitive elements, reflecting
of ‘younger’ proteins (Patthy, 1987; 1991a,b, 1994,                 the fact that there is strong selective pressure to eliminate
1995, 1996).                                                        nongenic DNA.
   In the present review I wish to emphasize that the                   A limitingly high coding density also seems to hold
significance of exon-shuffling increased parallel with the          for Archaea, genes covering about 90% of their compact
evolution of less compact genomes. The basis of this                genomes. For example, in the case of the genome of M.
correlation is that the number and size of introns and              jannaschii (Bult et al., 1996) coding density is similar to
the proportion of repetitive sequences in introns                   that of eubacteria: 1022 genes per Mbp. Similarly, in
increases parallel with the decrease of genome compact-             the genome of another Archaeon, A. fulgidus ( Klenk
ness, therefore the chances of exon-shuffling by intronic           et al., 1997) there are 1107 genes per Mbp, genes
recombination also increase. Analysis of the evolution-             covering 92.2% of the genome.
ary distribution of proteins that were clearly assembled                The genome of the unicellular eukaryote, S. cerevisiae
from modules by intronic recombination suggests that                is significantly less compact than those of eubacterial or
Genome evolution and the evolution of exon-shuZing - a review
L. Patthy / Gene 238 (1999) 103–114                                         105

archaeal prokaryotes: open reading frames occupy ‘only’           human genome, which is estimated to have only about
68% of the yeast genome (Bassett et al., 1996; Clayton            20 genes per Mbp. The compactness of the C. elegans
et al., 1997; Dujon, 1997; Oliver, 1997). Thus, compared          genome (relative to vertebrate genomes) is due to the
with eubacterial or Archaeal genomes, where ORFs                  fact that introns and intergenic distances are significantly
occupy about 90% of their genomes, yeast seems to be              shorter. More than half of the C. elegans introns are
under weaker selective pressure to reduce genome size.            shorter than 60 bp, the most common length being only
( Whereas in Archaea and Eubacteria there are about               48 bp. This tendency may be best illustrated by the fact
830–1100 genes per Mbp, in the case of yeast this                 that the size of C. elegans genes is usually much smaller
number is only 446.) Nevertheless, introns of yeast               than that of homologous vertebrate genes and they
protein-coding genes are still few (233), and short: less         usually contain fewer introns. For example, the nema-
than 4% of the genes contain introns, and introns                 tode a1(IV ) and a2(IV ) collagen genes are about 9 kb
account for less than 1% of the entire genome. Compared           long with only 11 and 19 introns, whereas the homolo-
with higher eukaryotes, the yeast genome is compact               gous human type a2(IV ), a5( IV ) and a6( IV ) collagen
due to the short size of intergenic regions, that introns         genes are 100–200 kb long with 46, 50 and 44 introns
are few and small, and that repetitive sequences are              (Sibley et al., 1993; Oohashi et al., 1995), the C. elegans
infrequent.                                                       osteonectin gene spans 3.6 kb and has five introns,
    The genome of the protozoan parasite, Plasmodium              whereas the mammalian homolog spans 26 kb and has
falciparum is 30 Mb, about twice as large as that of              nine introns (McVey et al., 1988; Schwarzbauer and
yeast. Analysis of chromosome 2 of P. falciparum                  Spencer, 1993). As a reflection of the compactness of
revealed that it contains 0.95 Mbp and encodes 209                the C.elegans genome, tandem repeats account for only
protein-coding genes (Gardner et al., 1998). The esti-            2.7% of the genome, and inverted repeats account
mated gene density is thus less than half of that found           for 3.6%.
in the case of yeast: 221 genes per Mbp. As compared                 The genome of D. melanogaster is about 170 Mbp
with the yeast genome, the decrease in genome compact-            and contains about 12 000 protein-coding genes (Rubin,
ness is also reflected in a significantly increased propor-       1998). Although the estimated gene density (71 genes
tion of introns: 43% of the protein-coding genes contain          per Mbp) is significantly lower than in yeast, the genome
at least one intron.                                              is more compact than most vertebrate genomes. The D.
    The 100–120 Mbp genome of A. thaliana, a small                melanogaster genes are rather compact, 50% of the
crucifer weed, is significantly less compact than the yeast       introns are
Genome evolution and the evolution of exon-shuZing - a review
106                                           L. Patthy / Gene 238 (1999) 103–114

mammalian homologs, the majority of introns being                 greater compactness thus have fewer and shorter introns
between 60 and 150 bp. An illustrative example is the             than genes located in the least compact genomic regions.
Huntington’s disease gene, which is ‘only’ 22 kb in Fugu             These observations thus suggest that the evolutionary
as compared with the 180 kb human gene (i.e. an                   forces controlling genome compactness have a direct
eightfold difference in size). Importantly, the exon–             influence on the frequency and size of introns, as well
intron organization of this gene is identical to that of          as the relative abundance of repetitive elements in
the human homolog: reduction occurred at the expense              genomes. It may thus be predicted that the evolution of
of noncoding regions and introns. The gene structure of           less compact genomes was paralleled by an increase in
complement component C9 of the pufferfish also                    the potential for intronic recombination and exon-
illustrates these features of the Fugu genome ( Yeo et al.,       shuffling. To test this prediction we have analyzed the
1997). The 11 exons of the Fugu C9 gene span 2.9 kb               evolutionary distribution of protein-coding genes con-
of genomic DNA, whereas the 11 exons of human C9                  structed by exon-shuffling via intronic recombination.
span 90 kb, representing a 30-fold difference in size at
the expense of intron size. The compactness of the Fugu           2.2. Identification of cases of exons-shuffling
genome (relative to the human genome) is due to the
fact that it contains significantly less nongenic DNA.               In order to establish a case of modular protein
There are no abundant classes of dispersed repeats in             evolution by exon-shuffling, one has to show (1) that
Fugu and all forms of repetitive DNA (including                   two unquestionably homologous modules (i.e., modules
telomeric repeats, ribosomal RNAs) constitute less than           derived from a common ancestor) are present in other-
10% of the Fugu genome. The average gene density in               wise nonhomologous protein environments and (2) that
F. rubripes is quite similar to that in C. elegans: #150          the transposition of the module was mediated by intronic
genes per Mbp (in human this value is 20).                        recombination. In this review I will discuss only cases
    In summary, the evidence from genome projects                 where both these criteria are satisfied.
suggests that prokaryotes are characterized by small                 It must be emphasized that frequently it is not a
compact genomes with little space for intergenic DNA,             trivial task to prove homology of modules or to prove
introns or repetitive sequences. In eukaryotes multiple           a role of introns in module-shuffling. Introns-early ver-
replication origins have permitted the evolution of               sions of the exon-shuffling theory that assume that all
larger, less compact genomes which can accommodate                primordial proteins were assembled by intronic recombi-
increasing amounts of intergenic regions, introns and             nation from short exons that encoded various secondary
repetitive sequences. The inverse relationship between            structural elements (Dorit et al., 1990) face the formida-
genome compactness and intron number and intron size              ble task of proving that in two otherwise unrelated
of protein-coding genes is valid not only for entire              proteins two structurally similar a-helices, membrane
genomes, but seems to hold even for different isochores           spanning domains or b-sheets, are similar due to
of a given genome. In vertebrate genomes, genes are not           common ancestry rather than due to convergence.
evenly distributed in different isochores: for example,           Obviously, such basic structural units were invented
the most GC-rich H3 isochore of the human genome                  independently several times during evolution (Patthy,
contains about 28% of the genes, although H3 accounts             1991a).
for only 3–5% of the genome. Conversely, only 34% of                 Even if criterion (1) is satisfied, we cannot take it for
the human genes are found in the GC-poor L1+L2,                   granted that exon-shuffling was responsible for module-
although they represent 62% of the genome                         shuffling. Although exon-shuffling by intronic recombi-
(Mouchiroud et al., 1991; Bernardi, 1993). In other               nation is by far the most powerful mechanism of modu-
words, GC-poor isochores appear to be gene deserts as             lar protein evolution, this does not mean that it is the
compared to the most GC-rich isochores: the gene                  only way to exchange domains among protein-coding
density is four to eight times higher in H3 than in               genes (Patthy, 1996). Some recent examples show how
H1+H2 and about 10–20 times higher than in L1+L2                  modular proteins of bacteria may be constructed without
(Mouchiroud et al., 1991; Bernardi, 1993). Significantly          the assistance of introns. For example, a modular protein
for our present discussion, protein-coding genes of               of Peptostreptococcus magnus has been shown to be the
GC-poor isochores contain more introns per kb of                  product of a recent intergenic recombination of two
coding sequence than those in GC-rich isochores, and              different types of streptococcal surface proteins (de
the intervening sequences are, on average, three times            Chateau and Bjorck, 1994). The recent transfer of a
longer for genes located in L1+L2 than for those found            fragment of a prokaryotic gene to another shows that
in H3 isochores (Duret et al., 1995). The ratio of                introns are not absolutely essential for exchange of gene
intervening sequence to coding sequences is thus strik-           fragments. These studies have also shown that gene
ingly different for L1+L2 and H3: genes in H3 are on              rearrangements by exonic recombination may be facili-
average 2.5 times more compact than those in L1+L2                tated by the presence of special recombinogenic DNA
(Duret et al., 1995). The genes of genomic regions of             sequences in intermodule linker regions, and that antibi-
Genome evolution and the evolution of exon-shuZing - a review
L. Patthy / Gene 238 (1999) 103–114                                                     107

                                                                           Fig. 1. Schematic representation of the structures of the genes of some
                                                                           modular proteins assembled from the class 1-1 C-type lectin-module
                                                                           (LN ), epidermal growth factor-module (G), complement B-type
                                                                           module (B), immunoglobulin-module (Ig), link protein-module (LK ),
                                                                           follistatin-module (FS ) and Kunitz-type inhibitor module (INH ). The
                                                                           numbers indicate the position and phase class of the introns. Phase 1
                                                                           introns found at the boundaries of modules are highlighted in red. The
                                                                           black boxes indicate signal peptide domains, vertical black bars repre-
                                                                           sent transmembrane domains. The boxes designating the different
                                                                           domains are drawn to scale.

                                                                           otics may provide the selective pressure for the creation
                                                                           of advantageous chimaeras (de Chateau and Bjorck,
                                                                              To provide unambiguous evidence for exon-shuffling,
                                                                           one has to show that shuffling of the module was
                                                                           mediated by flanking introns that belonged to the same
                                                                           phase, i.e., the module was ‘symmetrical’ in accordance
                                                                           with the phase-compatibility-rules of exon-shuffling
                                                                           (Patthy, 1987). Many protein-coding genes produced
                                                                           by exon-shuffling could be recognized by a striking
                                                                           correlation between the exon-structure of the genes and
                                                                           the domain-organization of proteins (Patthy, 1985, 1987,
                                                                           1991b). As a consequence of the phase-compatibility
                                                                           rule, the introns of these modular proteins also show a
                                                                           marked intron-phase bias: e.g., in the genes of modular
                                                                           proteins produced by exon-shuffling of class 1-1 mod-

Fig. 2. Comparison of the exon–intron structures of genes of some modular vertebrate proteins with those of invertebrate homologs. (A) Comparison
of the structure of the Drosophila and human tolloid genes. (B) Comparison of the structure of the genes of the Caenorhabditis and human netrin
receptors. The numbers indicate the position and phase class of the introns. Phase 1 introns found at the boundaries of modules are highlighted
in red. The black boxes indicate signal peptide domains, vertical black bars represent transmembrane domains. The boxes designating the different
domains are drawn to scale.
Genome evolution and the evolution of exon-shuZing - a review
108                                            L. Patthy / Gene 238 (1999) 103–114

ules, the introns that participated in the assembly process        to each other and inserted into new locations by recom-
are all phase 1 (Patthy, 1987, 1991a). The genes of                bination in phase 1 introns, then we may rightfully
selectins (e.g., granule membrane protein 140 of plate-            assume that they also arose by exon shuffling, even
lets, endothelial-leukocyte adhesion molecule 1, lympho-           though the ‘original’ introns are already missing from
cyte homing receptor), interleukin-2 receptor, factor              their genes.
XIIIb subunit, cartilage link protein, follistatin, lipopro-          In this respect, the tolloid genes provide illustrative
tein-associated coagulation inhibitor provide typical              examples. The tolloid gene of D. melanogaster encodes
examples of such a correlation: their C-type lectin-,              a modular astacin-type metalloprotease containing five
growth factor-, complement B-type modules,                         class 1-1 complement C1r type (CUB) modules and two
immunoglobulin-, link-, follistatin modules, Kunitz-type           class 1-1 EGF-like (G) modules ( Fig. 2). The 3.4 kb
inhibitor modules are encoded by discrete class 1-1                gene of Drosophila is rather compact, contains only six
exons, i.e., all the intermodule introns are phase 1               short (49–142 bp) introns (Childs and O’Connor, 1994);
(Fig. 1.). The fact that the structure of the gene of a            only two of the module-boundaries are marked by phase
modular protein conforms to these rules may actually               1 introns, and four of the ‘expected’ intermodule phase
be used as evidence that it has evolved by exon-shuffling          1 introns are missing ( Fig. 2.). The gene (6 kb) of a
(Patthy, 1988).                                                    related protease (BP10) from sea urchin blastula with
    There are many cases where the correlation between             a different set of six introns also failed to show a clear
modular structure of a multidomain protein (consisting             correlation between modular structure of the protein
of class 1-1 modules) and exon–intron structure of its             and exon–intron structure of the gene, therefore it was
gene is less perfect, since some of the ‘expected’ introns         suggested that ‘‘it is unlikely that exon-shuffling had a
are missing from the module boundaries (Patthy, 1994;              role in the evolution of the astacin/EGF/CUB subfamily
1995; 1996). The most plausible explanation for the                of proteases’’ (Lhomond et al.,1996). Studies on the
weaker correlation between exon–intron structure of the            genomic organization of the mammalian tolloid gene
gene and domain structure of the protein is that the               suggest that this conclusion is unwarranted. The gene
original genomic organization has been obscured by                 of human Bone Morphogenetic Protein 1 (BMP1)
removal and insertion of introns (Patthy, 1994). As a              encodes a protein with a domain structure identical to
result of continual intron insertion and intron removal,
                                                                   that of the Drosophila Tolloid protein, but its gene
the correlation between modular protein structure and
                                                                   structure is strikingly different from that of the
gene structure may get weaker and weaker, and may
                                                                   Drosophila homolog (Takahara et al., 1995). Consistent
eventually lead to an exon–intron organization that has
                                                                   with the differences in genome compactness of fly and
little or nothing to do with the one that existed at the
                                                                   human, the human tolloid gene is more than tenfold
time of the formation of the gene. Since erosion of the
                                                                   larger (46 kb) since it contains a greater number and
original genomic structure progresses with time, it is not
                                                                   significantly longer introns than the fly homolog.
surprising that old genes usually show less perfect corre-
                                                                   Importantly, all boundaries separating the class 1-1
lation with protein structure than recently assembled
ones. This point may be illustrated by the exon–intron             CUB and G modules are marked by the ‘expected’ phase
structures of laminin genes. Laminins are among the                1 introns ( Fig. 2.). It thus appears that, in this respect,
oldest modular proteins unique to metazoa, inasmuch                the human tolloid gene is more similar to the original
as they are already present in Hydrozoa (Sarras et al.,            gene structure than the fly homolog. The difference in
1994), and their domain organization was practically               gene structure may be explained by assuming that in
unchanged during subsequent evolution. Consistent with             the Drosophila lineage (characterized by a more compact
the great age of these genes, many of the original phase           genome) there was greater selective pressure to eliminate
1 introns are already missing from the boundaries of               introns than in the chordate lineage (characterized by a
the class 1-1 laminin B-type modules of Drosophila and             less compact genome).
vertebrate laminin B1 and B2 chain genes ( Vuolteenaho                There are other cases which suggest that the gene-
et al., 1990; Chi et al., 1991; Kallunki et al., 1991; Gow         structures of chordate homologs are more likely to have
et al., 1993).                                                     preserved the original introns. As another example we
    It must be pointed out that the absence of the                 may mention the case of human and C. elegans netrin
expected introns from some module boundaries of pro-               receptor genes. The gene of the human netrin receptor
teins is frequently interpreted as evidence that these             DCC (Deleted in Colorectal Cancer) encodes a trans-
proteins did not evolve by exon shuffling. Such a conclu-          membrane protein with an extracellular part containing
sion is unjustified. If a modular protein is composed of           four class 1-1 immunoglobulin (Ig) modules and six
various class 1-1 modules ( EGF-like modules, comple-              class 1-1 fibronectin type III ( FN3) modules (Fig. 2.).
ment B-type modules, C-type lectin modules, laminin                The human DCC gene contains 28 introns and spans
B-type modules, CUB modules, etc.) that have clearly               approx. 1400 kbp: it is the largest tumor suppressor
been shown in most other cases to be duplicated, joined            gene identified to date (Cho et al., 1994). Significantly,
L. Patthy / Gene 238 (1999) 103–114                                         109

with one exception, the phase 1 introns have been                  that proteins composed of class 1-1 modules familiar
preserved at the boundaries separating the class 1-1 Ig            from vertebrates have been found in sponges, hydrozoa,
and FN3 modules ( Fig. 2). In contrast with this, in the           nematodes, molluscs, arthropods, and echinoderms, etc.,
case of the C.elegans homolog, 6 of the 11 expected                indicates that this machinery of exon-shuffling was avail-
phase 1 introns are missing from the module boundaries             able before the divergence of these metazoan phyla.
(Fig. 2.). This difference in genomic organization may             There can be no doubt that the mechanism of the
be also explained by assuming that in the Caenorhabditis           construction of these modular proteins was the same as
lineage (characterized by a more compact genome) there             that of vertebrate genes, since there are several cases of
was greater selective pressure to eliminate introns than           invertebrate genes where the class 1-1 modules are
in the chordate lineage (characterized by a less compact           flanked by the original phase 1 introns (Patthy, 1994,
genome). Importantly, the gene of the C. elegans homo-             1995). It is noteworthy that modular proteins assembled
log of DCC/netrin receptor is dramatically more com-               from class 1-1 modules have already been found in
pact (4.7 kb) than the human homolog (1400 kb), and                sponges and hydrozoa, although only a tiny fraction of
it has fewer and shorter introns (Chan et al., 1996).              their genes have been sequenced so far. In contrast with
    In summary, since the genomic organization of a                this, there is practically no evidence for related modular
gene does not necessarily reflect the structure that existed       proteins in plants or fungi. In addition to metazoa, a
at the time of its formation, in many cases the original           few modular proteins homologous with class 1-1 mod-
structure can be reconstructed only through a complex              ules were found in some animal viruses and parasitic
analysis of the evolutionary history of the constituent            protozoa (Patthy, 1994, 1995). However, it seems likely
domains. So far, our analyses have identified more than            that these viruses and protists acquired the modular
five dozen class 1-1 module-types that have been used              proteins from their multicellular hosts by horizontal
to build clan 1 modular proteins by exon-shuffling. As             gene transfer; these proteins assist them in the invasion
has been discussed previously, there are much fewer                process. Thus the surface protein of P. falciparum con-
class 0-0 or class 2-2 modules (Patthy, 1994, 1995).               taining four EGF-domains facilitates infection by bind-
According to the modularization hypothesis, the expla-             ing to receptors on mosquito epithelial cells, the
nation for this is that the formation of class 1-1 modules         thrombospondin-homolog proteins of malarial parasites
is strongly favoured (Patthy, 1994, 1995, 1999).
                                                                   assist in their entry into hepatocytes by mediating their
                                                                   adherence to these cells, the vaccinia virus protein
                                                                   containing four complement B-type modules could
2.3. Evolutionary distribution of proteins produced by
                                                                   counteract host immune defences by interfering with the
                                                                   complement cascade. The evolutionary distribution of
                                                                   modular proteins that have clearly evolved by exon-
   Analysis of the evolutionary distribution of modular
                                                                   shuffling is thus consistent with the suggestion that exon-
proteins produced by exon-shuffling may permit a clear
                                                                   shuffling became significant at the time of metazoan
definition of the time of their formation, and thus may
provide an insight into changes in the significance of             radiation, parallel with the spread of large spliceoso-
exon-shuffling. Thanks to various genome projects,                 mal introns.
sufficient amount of sequence information has already                 Our analysis has also shown that the vast majority
accumulated on plants, protists, fungi and diverse                 of metazoan modular proteins produced by exon-
groups of animals to perform a meaningful analysis.                shuffling is extracellular or they are extracellular parts
   Using the collection of modules that were clearly               of membrane-associated proteins (Patthy, 1995). A
spread by exon-shuffling, we have searched databases to            major group of these modular proteins consists of
define the evolutionary distribution of modular proteins           various constituents of the extracellular matrix ( lami-
that were constructed from these modules. In the present           nins, modular collagens, fibronectin, etc.), modular
work, modular proteins produced by exon-shuffling were             metalloproteinases involved in the remodelling of the
defined as those which contain at least one of the                 extracellular matrix (e.g., type IV collagenases, bone
modules spread by exon-shuffling and this module is                morphogenetic proteins). Another large group of modu-
joined to at least another protein-domain. Note that               lar proteins contains a variety of membrane associated
according to this definition, a protein consisting of a            or transmembrane proteins with extracellular parts con-
single module and a membrane anchoring segment                     structed from modules: receptor tyrosine kinases, recep-
and/or signal peptide is not counted as a modular                  tor tyrosine phosphatases, proteins involved in cell–cell
protein, but is considered to represent the protomodule            or cell–matrix interactions, complement receptors, LDL-
stage of the modularization pathway (Patthy, 1994,                 receptor, cytokine receptors, etc. A third major group
1995, 1999).                                                       of modular extracellular proteins comprises various
   The search for modular proteins has brought many                plasma proteins: the modular proteases of the blood
examples from all major groups of metazoa. The fact                coagulation, fibrinolytic and complement cascades, the
110                                          L. Patthy / Gene 238 (1999) 103–114

non-protease factors regulating blood coagulation and            the C. elegans proteins found in current databases were
complement activation, immunoglobulins, etc.                     predicted with a rather high error rate.)
   In addition to these extracellular or membrane-associ-            In principle, differences in the recurrence of different
ated modular proteins, there are some intracellular              modules-types may reflect differences in the time of the
modular proteins that have also evolved by exon-                 appearance of the module (those protomodules that
shuffling in the metazoan lineage (Patthy, 1995). For            were formed earlier had more time to spread by exon-
example, there is clear evidence that myomesin/skelemin          shuffling), or may reflect differences in the chances of
of the contractile apparatus has been constructed by             the survival of the newly formed modular protein-coding
exon-shuffling from class 1-1 immunoglobulin and                 gene. Survival of the newly formed gene requires that
fibronectin type III modules. In the gene of this protein,       the protein it encodes should be able to fold efficiently
the start and end of each immunoglobulin and fibronec-           into a stable multidomain protein, therefore folding-
tin type III domain are still defined by phase 1 introns         efficiency of the protein-module is a critical aspect of its
(Steiner et al., 1999).                                          evolutionary success. The significance of this require-
   There seem to be striking differences in the distribu-        ment may be illustrated by the fact that the majority of
tion of modular proteins in different metazoan lineages.         modules used in the construction of multidomain pro-
Although modular receptor tyrosine kinases have                  teins display remarkable folding autonomy: the isolated
already been found in most metazoan lineages, including          domains can fold very efficiently (Patthy, 1993).
sponges (Schäcke et al., 1994) and there is molecular           Obviously, folding autonomy of domains in multido-
evidence for the presence of laminins from most meta-            main proteins is of utmost importance since this mini-
zoan phyla including Hydrozoa (Sarras et al., 1994),             mizes the influence of neighboring domains and ensures
there are modular proteins that appear to be restricted          stability in the extracellular environment. Folding auton-
to certain evolutionary lineages. For example, the rich          omy can ensure that folding of the domain is not
variety of modular proteases of blood coagulation,               deranged when inserted into a novel protein environ-
fibrinolysis, complement activation and other extracellu-        ment. It thus seems likely that the most widely used
lar proteolytic cascades (Fig. 3.) have probably been            modules have been selected for folding autonomy and
formed in the chordate lineage, since proteases with a           fold stability.
similar domain organization are missing from non-                    It is interesting to note in this respect that the modules
chordate genomes, including the completely sequenced             used most frequently in the construction of modular
genome of C. elegans. Conversely, most basic constitu-           proteins are not random representatives of the protein
ents of the extracellular matrix, the majority of proteins       universe. Analysis of protein structures of the
involved in cell–cell or cell–matrix interactions are pre-       Brookhaven Protein Databank (Martin et al., 1998)
sent in vertebrates as well as in arthropods and nema-           shows that about half of the non-homologous protein
todes, suggesting that they were formed prior to the             families belong to the ab class, 25% belong to the mainly
divergence of these lineages. It is worth pointing out           a class, about another 25% is mainly b (Fig. 5A).
that the genes of modular proteins that appear to be             Structural classification of the class 1-1 modules shows
restricted to the chordate lineage provide the most              a significantly different distribution: more than 60%
convincing examples of a correlation between domain              belong to the mainly b class and only about 10% are
organization of a modular protein and exon–intron                mainly a ( Fig. 5B). If the differences in recurrence
structure of their gene. It seems likely that this is due        ( Fig. 4) of modules is also taken into account, there is
to the fact that these genes are relatively young and that       a further significant shift in favour of modules belonging
in the chordate lineage there was no significant pressure        to the mainly b class (Fig. 5C ), suggesting that structural
to eliminate their introns.                                      features (folding autonomy and stability) play a signifi-
   It seems possible that many of the modular proteins           cant role in controlling the spread of module-types. The
found so far only in the C. elegans genome have been             fact that the structural distribution of disulphide-bonded
formed in the nematode lineage. Even though we cannot            extracellular domains are in general shifted towards the
exclude the possibility that similar modular proteins will       mainly b proteins (Martin et al., 1998) indicates that
be found in other genomes, it seems to be clear that             such proteins are more stable in the extracellular envi-
there are significant differences in the population of           ronment than other fold classes. It is noteworthy in this
modular proteins of vertebrates vs. C. elegans. This             respect that as a reflection of their adaptation to the
point may be illustrated by the fact that different class        oxidative milieu of the extracytoplasmic space, many of
1-1 modules are not used with equal probability in these         the class 1-1 modules are rich in disulfide bonds.
two groups (Fig. 4.). Although in both groups the EGF-               In summary, the clearcut examples of proteins assem-
like module is found in the highest percentage of modu-          bled by exon-shuffling from modules are restricted to
lar proteins, there are also striking differences in the         metazoa, with practically no counterparts in prokary-
recurrence of class 1-1 modules (Fig. 4). ( The results of       otes, plants, protists or fungi. This observation is consis-
this analysis must be treated with some caution, since           tent with our earlier suggestion that the exon-shuffling
L. Patthy / Gene 238 (1999) 103–114

Fig. 3. Architecture of some modular proteases of the blood coagulation, fibrinolysis, complement activation cascades and other modular members of the trypsin-family. The SERPRO box
represents the trypsin-homolog serine protease domain, the colored boxes represent the various class 1-1 modules from which these proteases were assembled: kringle-module ( K ), epidermal
growth factor-module (G ), finger-module (F ), fibronectin type II module (FN2), preactivation peptide module (PAP), scavenger receptor module (SC ), vitamin K-dependent calcium-binding
module (C ), contact factor module (CF ), complement B-type module (B), complement C1r/C1s module (CUB), von Willebrand type A module (vWA), LDL-receptor module (LDL), meprin-
module (MAM ), frizzled receptor module (FRZ ). The black boxes indicate signal peptide domains, vertical black bars represent transmembrane domains. The boxes designating the different
domains are drawn to scale.
112                                                     L. Patthy / Gene 238 (1999) 103–114

                                                                            2.4. Significance of exon-shuffling in metazoan evolution

                                                                                It must be emphasized that most modular proteins
                                                                            produced by exon-shuffling are associated with, and are
                                                                            absolutely essential for, multicellularity of metazoa. For
                                                                            example, the appearance of the constituents of extracel-
                                                                            lular matrix is inextricably linked to the appearance of
                                                                            the first multicellular animals. The constituents of the
                                                                            extracellular matrix, membrane-associated proteins
                                                                            mediating cell–cell and cell–matrix interactions, recep-
                                                                            tor-proteins regulating cell–cell communications (such
                                                                            as receptor tyrosine kinases, receptor tyrosine phospha-
                                                                            tases) are of absolute importance in permitting cells to
                                                                            function in an integrated fashion.
                                                                                The fact that the overwhelming majority of the
                                                                            constituents of the extracellular matrix, cell adhesion
                                                                            proteins, receptor proteins were constructed from mod-
                                                                            ules underlines the extreme importance of exon-shuffling
                                                                            in metazoan evolution. As outlined above, this powerful
                                                                            evolutionary mechanism has become significant rela-
Fig. 4. Recurrence of the most frequently used class 1-1 modules in a
                                                                            tively late during evolution at the time of metazoan
database of vertebrate and C. elegans modular proteins. In this data-       radiation, parallel with a decrease in genome compact-
base, modular proteins with the same domain organization were               ness and parallel with the evolution of spliceosomal
counted as a single entry irrespective of the number of representatives     introns. It is interesting in this respect that the rise of
from different species or from the same species. Recurrence is defined      exon-shuffling seems to coincide with a spectacular burst
as the percentage of modular proteins in which the given domain is
present, irrespective of the number of repeats of that module.
                                                                            of evolutionary creativity at the time of metazoan radia-
Abbreviation of modules: G, epidermal growth-factor module; Ig,             tion. It seems probable that the rise of exon-shuffling
immunoglobulin module; FN3, fibronect type III module; B, comple-           has contributed significantly to this accelerated evolution
ment B-type module; LN, C-type lectin-module; TSP, thrombospondin           of metazoa.
module; LDL, LDL-receptor module; vWC, von Willebrand type C                    In summary, it seems that the evolution of less
module; K, kringle-module; vWA, von Willebrand type A module; SC,
scavenger receptor module; LMG, globular module of laminin A;
                                                                            compact genomes, the evolution and spread of spliceoso-
LMB, EGF-like module of laminin B; CUB, complement C1r/C1s                  mal introns, the creation of mobile modules have
module; INH, Kunitz-type inhibitor module; CAD, cadherin module.            reached a critical point sometime before the Cambrian
                                                                            period, and this led to a dramatic increase in the
machinery ( large spliceosomal pre-mRNA introns, pro-                       efficiency of modular protein evolution by exon-
tomodules) appeared relatively late during evolution                        shuffling. Increased efficiency of exon-shuffling was criti-
(Patthy, 1994) parallel with the appearance of less                         cal for the rapid creation of diverse multidomain pro-
compact genomes. The importance of the accumulation                         teins that are essential for multicellularity of metazoa.
of a critical mass of module-types is illustrated by the
fact that most of the modular proteins were assembled
from some five-dozen class 1-1 modules.                                     References

                                                                            Bassett Jr., D.E., Basrai, M.A., Connelly, C., Hyland, K.M., Kita-
                                                                               gawa, K., et al., 1996. Exploiting the complete yeast genome
                                                                               sequence. Curr. Opin. Genet. Dev. 6, 763–766.
                                                                            Belfort, M., 1991. Self-splicing introns in prokaryotes: migrant fossils?
                                                                               Cell 11, 9–11.
                                                                            Belfort, M., 1993. An expanding universe of introns. Science 262,
                                                                            Bernardi, G., 1993. The isochore organization of the human genome
                                                                               and its evolutionary history — a review. Gene 135, 57–66.
                                                                            Blattner, F.R., Plunkett III, G., Bloch, C.A., Perna, N.T., Burland,
Fig. 5. (A) Structural classification of non-homologous protein fami-          V., et al., 1997. The complete genome sequence of Escherichia coli
lies in the PDB (Martin et al., 1998). (B) Structural classification of        K-12. Science 277, 1453–1462.
non-homologous class 1-1 modules. (C ) Recurrence of class 1-1 mod-         Blaxter, M., 1998. Caenorhabditis elegans is a nematode. Science 282,
ules belonging to the three different structural classes in modular pro-       2041–2046.
teins of vertebrates. The colors define the class: red, mainly a; green,    Blumenthal, T., Spieth, J., 1996. Gene structure and organization in
mainly b; yellow, mixed ab.                                                    Caenorhabditis elegans. Curr. Opin. Genet. Dev. 6, 692–698.
L. Patthy / Gene 238 (1999) 103–114                                                      113

Bult, C.J., White, O., Olsen, G.J., Zhou, L., Fleischmann, R.D., et al.,     Fraser, C.M., Casjens, S., Huang, W.M., Sutton, G.G., Clayton, R.,
   1996. Complete genome sequence of the methanogenic Archaeon,                 et al., 1997. Genomic sequence of a Lyme disease spirochaete, Bor-
   Methanococcus jannaschii. Science 273, 1058–1073.                            relia burgdorferi. Nature 390, 580–586.
Cavalier-Smith, T., 1985. Selfish DNA and the origin of introns.             Gardner, M.J., Tettelin, H., Carucci, D.J., Cummings, L.M., Aravind,
   Nature 315, 283–284.                                                         L., et al., 1998. Chromosome 2 sequence of the human malaria
Cavalier-Smith, T., 1991. Intron-phylogeny: a new hypothesis. Trends            parasite Plasmodium falciparum. Science 282, 1126–1132.
   Genet. 7, 145–148.                                                        Gilbert, W., 1978. Why genes in pieces? Nature 271, 501.
Cech, T.R., 1985. Self-splicing RNA: implications for evolution. Int.        Gilbert, W., 1986. The RNA world. Nature 319, 618.
   Rev. Cytol. 93, 3–22.                                                     Gilbert, W., Glynias, M., 1993. On the ancient nature of introns. Gene
Cech, T.R., 1986. The generality of self-splicing RNA: relationship to          135, 137–144.
   nuclear mRNA splicing. Cell 44, 207–210.                                  Gow, C.H., Chang, H.Y., Lih, C.J., Chang, T.W., Hui, C.F., 1993.
Chalfie, M., 1998. The worm revealed. Nature 396, 620–621.                      Analysis of the Drosophila gene for the laminin B1 chain. DNA
Chan, S.S.Y., Zheng, H., Su, M.W., Wilk, R., Killeen, M.T.,                     Cell Biol. 12, 573–587.
   Hedgecock, E.M., Culotti, J.G., 1996. UNC-40, a C.elegans homo-           Grivell, L.A., 1994. Invasive introns. Curr. Biol. 4, 161–164.
   log of DCC (deleted in colorectal cancer), is required in motile cells    Hebsgaard, S.M., Korning, P.G., Tolstrup, N., Engelbrecht, J., Rouzé,
   responding to UNC-6 netrin cues. Cell 87, 187–195.                           P., et al., 1996. Splice site prediction in Arabidopsis thaliana pre-
de Chateau, M., Bjorck, L., 1994. Protein PAB, a mosaic albumin-                mRNA by combining local and global sequence information.
   binding bacterial protein representing the first contemporary exam-          Nucleic Acids Res. 24, 3439–3452.
   ple of module shuffling. J. Biol. Chem. 269, 12147–12151.                 Himmelreich, R., Hilbert, H., Plagens, H., Pirkl, E., Li, B.-C., et al.,
de Chateau, M., Bjorck, L., 1996. Identification of interdomain                 1996. Complete sequence analysis of.the genome of the bacterium
   sequences promoting the intronless evolution of a bacterial protein          Mycoplasma pneumoniae. Nucleic Acis Res. 24, 4420–4449.
   family. Proc. Natl. Acad. Sci., USA 93, 8490–8495.                        Himmelreich, R., Plagens, H., Hilbert, H., Reiner, B., Hermann, R.,
Chervitz, S.A., Aravind, L., Sherlock, G., Ball, C.A., Koonin, E.V.,            1997. Comparative analysis of the genomes of the bacteria Myco-
   Dwight, S.S., Harris, M.A., Dolinski, K., Mohr, S., Smith, T.,               plasma pneumoniae and Mycoplasma genitalium. Nucleic Acids Res.
   Weng, S., Cherry, J.M., Botstein, D., 1998. Comparison of the                25, 701–712.
   complete protein sets of worm and yeast: orthology and divergence.        Jacquier, A., 1990. Self-splicing group II and nuclear pre-mRNA
   Science 282, 2022–2028.
                                                                                introns: how similar are they? Trends Biochem. Sci. 15, 351–354.
Chi, H.C., Juminaga, D., Wang, S.Y., Wang, S.Y., Hui, C.F., 1991.
                                                                             Kallunki, T., Ikonen, J., Chow, L.T., Kallunki, P., Tryggvason, K.,
   Structure of the Drosophila gene for the Laminin B2 chain. DNA
                                                                                1991. Structure of the human laminin B2 chain gene reveals extens-
   Cell. Biol. 10, 451–466.
                                                                                ive divergence from the laminin B1 chain gene. J. Biol. Chem.
Childs, S.R., O’Connor, M.B., 1994. Two domains of the tolloid pro-
                                                                                266, 221–228.
   tein contribute to its unusual genetic interaction with decapen-
                                                                             Klenk, H.P., Clayton, R.A., Tomb, J.F., White, O., Nelson, K.E.,
   taplegic. Dev. Biol. 162, 209–220.
                                                                                et al., 1997. The complete genome sequence of the hyperthermophi-
Cho, K.R., Oliner, J.D., Simons, J.W., Hedrick, L., Fearon, L.,
                                                                                lic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature
   Fearon, E.R., Presinger, A.C., Hedge, P., Silverman, G.A.,
                                                                                390, 364–370.
   Vogelstein, B., 1994. The DCC gene: structural analysis and mut-
                                                                             Kunst, F., Ogasawara, N., Moszer, I., Albertini, A.M., Alloni, G.,
   ations in colorectal carcinomas. Genomics 19, 525–531.
                                                                                et al., 1997. The complete genome sequence of the Gram-positive
Clayton, R.A., White, O., Ketchum, K.A., Venter, J.C., 1997. The first
                                                                                bacterium Bacillus subtilis. Nature 390, 248–255.
   genome from the third domain of life. Nature 387, 459–462.
                                                                             Kusche-Gullberg, M., Garrison, K., MacKrell, A.J., Fessler, L.I., Fes-
Copertino, D.W., Hallick, R.B., 1993. Group II and group III introns
   of twintrons: potential relationships with nuclear pre-mRNA                  sler, J.H., 1992. Laminin A chain: expression during Drosophila
   introns. Trends Biochem. Sci. 18, 467–471.                                   development and genomic sequence. EMBO J. 11, 4519–4527.
Crick, F., 1979. Split genes and RNA splicing. Science 204, 264–271.         Lambowitz, AM., Belfort, M., 1989. Infectious introns. Cell 56,
Darnell, J.E., 1978. Implications of RNA.RNA splicing in evolution              323–326.
   of eukaryotic cells. Science 202, 1257–1260.                              Lambowitz, AM., 1993. Introns as mobile genetic elements. Annu.
Darnell, J.E., Doolittle, W.F., 1986. Speculations on the early course          Rev. Biochem. 62, 587–622.
   of evolution. Proc. Natl. Acad. Sci. USA 83, 1271–1275.                   Lhomond, G., Chiglione, C., Lepage, T., Gache, C., 1996. Structure
Doolittle, W.F., 1978. Genes in pieces: were they ever together? Nature         of the gene encoding the sea urchin blastula protease 10 (BP10),
   272, 581–582.                                                                a member of the astacin family of Zn2+ metalloproteases. Eur.
Dorit, R.L., Schoenbacher, L., Gilbert, W., 1990. How big is the uni-           J. Biochem. 238, 744–751.
   verse of exons? Science 250, 1342–1382.                                   Logsdon Jr., J.M., 1991. The recent origins of spliceosomal introns
Dujon, B., 1989. Group I introns as mobile genetic elements: facts and          revisited. Curr. Opin. Genet. Dev 8, 637–648.
   mechanistic speculations. Gene 82, 91–114.                                Martin, A.C.R., Orengo, C.A., Hutchinson, E.G., Jones, S., Karmint-
Dujon, B., 1997. The yeast genome project: what did we learn? Trend             zou, M., Laskowski, R.A., Mitchell, J.B.O., Taroni, C., Thornton,
   Genet. 12, 263–269.                                                          J.M., 1998. Protein folds and functions. Structure 6, 875–884.
Duret, L., Mouchiroud, D., Gautier, C., 1995. Statistical analysis of        McVey, J.H., Nomura, S., Kelly, P., Mason, I.J., Hogan, B.L.M., 1988.
   vertebrate sequences reveals that long genes are scarce in GC-rich           Characterization of the mouse SPARC/Osteonectin gene. J. Biol.
   isochores. J. Mol. Evol. 40, 308–317.                                        Chem. 263, 11111–11116.
Elgar, G., Sandford, R., Aparicio, S., Macrae, A., Venkatesh, B., et al.,    Morl, M., Schmelzer, C., 1990. Integration of group II intron bl1 into
   1996. Small is beautiful: comparative genomics with the pufferfish           a foreign RNA by reversal of the self-splicing reaction in vitro. Cell
   (Fugu rubripes). Trends Genet. 12, 145–150.                                  60, 629–636.
Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A., Kirkness,          Mouchiroud, D., D’Onofrio, G., Assani, B., Macaya, G., Gautier, C.,
   E.F., 1995. Whole-genome random sequencing and assembly of                   Bernardi, G., 1991. The distribution of genes in the human genome.
   Haemophilus influenzae Rd. Science 269, 496–512.                             Gene 100, 181–187.
Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D., Clayton, R.A.,          Mueller, M.W., Allmaier, M., Eskes, R., Schweyen, R.J., 1993. Trans-
   et al., 1995. The minimal gene complement of Mycoplasma geni-                position of group II intron aI1 in yeast and invasion of mithochon-
   talium. Science 270, 397–403.                                                drial genes at new locations. Nature 366, 174–176.
114                                                    L. Patthy / Gene 238 (1999) 103–114

Oliver, S.G., 1997. From DNA sequence to biological function. Nature       Sarras Jr., M.P., Yan, L., Grens, A., Zhang, X., Agbas, A., Huff, J.K.,
   379, 597–600.                                                               St. John, P.L., Abrahamson, D.R., 1994. Cloning and biological
Oohashi, T., Ueki, Y., Sugimoto, M., Ninomiya, Y., 1995. Isolation             function of laminin in Hydra vulgaris. Dev. Biol. 164, 312–324.
   and structure of the COL4A6 gene encoding the human a6(IV )             Schäcke, H., Rinkevich, B., Gamulin, V., Müller, I.M., Müller,
   collagen chain and comparison with other type IV collagen genes.            W.E.G., 1994. Immunoglobulin-like domain is present in the extra-
   J. Biol. Chem. 270, 26863–26867.                                            cellular part of the receptor tyrosine kinase from the marine sponge.
Orgel, L.E., Crick, F.H.C., 1980. Selfish DNA: the ultimate parasite.          J. Mol. Recogn. 7, 272–276.
   Nature 284, 604–607.                                                    Schwarzbauer, J.E., Spencer, C.S., 1993. The Caenorhabditis elegans
Palmer, J.D., Logsdon Jr., J.M., 1991. The recent origins of introns.          homologue of the extracellular calcium binding protein SPARC/
   Curr. Opin. Genet. Dev. 1, 470–477.                                         Osteonectin affects nematode body morphology and mobility. Mol.
Patthy, L., 1985. Evolution of the proteases of blood coagulation and          Biol. Cell. 4, 941–952.
   fibrinolysis by assembly from modules. Cell 41, 657–663.                Sharp, P.A., 1985. On the origin of RNA splicing and introns. Cell
Patthy, L., 1987. Intron-dependent evolution: preferred types of exons         42, 397–400.
   and introns. FEBS Lett. 214, 1–7.                                       Sharp, P.A., 1994. Split genes and RNA splicing. Cell 77, 805–815.
Patthy, L., 1988. Detecting distant homologies of mosaic proteins.         Sibley, M.H., Johnson, J.J., Mello, C.C., Kramer, J.M., 1993. Genetic
   Analysis of the sequences of thrombomodulin, thrombospondin,                identification, sequence and alternative splicing of the Caenorhab-
   complement components C9, C8a and C8b, vitronectin and plasma               ditis elegans alpha 2(IV ) collagen gene. J. Cell Biol. 123, 255–264.
   cell membrane glycoprotein PC-1. J. Mol. Biol. 202, 689–696.            Simmen, M.W., Leitgeb, S., Clark, V.H., Jones, S.J.M., Bird, A., 1998.
Patthy, L., 1991a. Exons — original building blocks of proteins? BioEs-        Gene number in an invertebrate chordate, Ciona intestinalis. Proc.
   say 13, 187–192.                                                            Natl. Acad. Sci. USA 95, 4437–4440.
Patthy, L., 1991b. Modular exchange principles in proteins. Curr.          Steiner, F., Weber, K., Furst, D.O., 1999. M band proteins myomesin
   Opin. Struct. Biol. 1, 351–361.                                             and skelemin are encoded by the same gene: analysis of its organiza-
Patthy, L., 1993. Modular design of proteases of coagulation, fibrinol-        tion and expression. Genomics 56, 78–89.
   ysis and complement activation: implications for protein engineer-      Takahara, K., Lee, S., Wood, S., Greenspan, D.S., 1995. Structural
   ing and structure function studies. Methods Enzymol. 222, 10–21.            organization and genetic localization of the human bone morpho-
Patthy, L., 1994. Exons and introns. Curr. Opin. Struct. Biol. 4,              genetic protein 1/mamalian tolloid gene. Genomics 29, 9–15.
   383–392.                                                                The C. elegans Sequencing Consortium, 1998. Genome sequence of
Patthy, L., 1995. Protein Evolution by Exon-Shuffling. Molecular Biol-         the nematode C. elegans: a platform for investigating biology. Sci-
   ogy Intelligence Unit, R.G. Landes Company/Springer, New York.              ence 282, 2012–2018.
Patthy, L., 1996. Exon shuffling and other ways of module exchange.        The EU Arabidopsis Genome Project, 1998. Analysis of 1.9 Mb of
   Matrix Biol. 15, 301–310.                                                   contiguous sequence from chromosome 4 of Arabidopsis thaliana.
Patthy, L., 1999. Protein Evolution. Blackwell Science, Oxford.                Nature 391, 485–488.
Perlman, P.S., Butow, R.A., 1989. Mobile introns and intron-encoded        Vuolteenaho, R., Chow, L.T., Tryggvason, K., 1990. Structure of the
   proteins. Science 246, 1106–1109.                                           human laminin B1 chain gene. J. Biol. Chem. 265, 15611–15616.
Rubin, G.M., 1998. The Drosophila Genome Project: a progress report.       Yeo, G.S.H., Elgar, G., Sandford, R., Brenner, S., 1997. Cloning and
   Trends Genet. 14, 340–343.                                                  sequencing of complement component C9 and its linkage to DOC-2
Ruvkun, G., Hobert, O., 1998. The taxonomy of developmental con-               in the pufferfish Fugu rubripes. Gene 200, 203–211.
   trol in Caenorhabditis elegans. Science 282, 2033–2041.                 Zhang, X., Vuolteenanho, R., Tryggvason, K., 1996. Structure of the
Saldanha, R., Mohr, G., Belfort, M., Lambowitz, A.M., 1993. Group              human laminin a2-chain gene (LAMA2) which is affected in con-
   I and group II introns. FASEB J. 7, 15–24.                                  genital muscular dystrophy. J. Biol. Chem. 271, 27664–27669.
You can also read
Next slide ... Cancel