Alphy'2020: selected abstracts - UMR CNRS 5558 Laboratoire ...

Alphy’2020: selected abstracts
###################
Abstract 1 Rémi Allio
Contact : remi.allio@umontpellier.fr ISEM - CNRS
Title : Roadkill genomics: high quality mammalian genomes from hybrid assembly of short Illumina reads
and MinION long reads
Authors : Rémi Allio, Marie-Ka Tilak, Céline Scornavacca, Nico L. Avenant, Erwan Corre, Benoit Nabholz, and
Frédéric Delsuc

Obtaining genomic resources from mammalian wildlife can be difficult in practice but is becoming essential
for conservation. The thousands of fatalities due to car collisions with wildlife could potentially provide a
useful source material for genomics. To illustrate the potential of this underexploited resource, we sequenced
and assembled two reference genomes for two of the most frequently encountered mammalian roadkill in
South Africa: the bat-eared fox (Otocyon megalotis) and the aardwolf (Proteles cristatus). We developed a
protocol to extract DNA from roadkill samples suitable for Nanopore long-read sequencing using the MinION
device. We show that hybrid assembly of Illumina short reads and MinION long reads results in genomes with
high contiguity and gene completeness. For both species, subspecies have been defined based on disjunct
distributions in Southern and Eastern Africa and morphological differences. To assess genetic diversity and
perform genome-scale species delimitation, additional individuals from disjunct South African and Tanzanian
populations of both species were resequenced using Illumina short reads. The genetic diversity of each
subspecies pair was estimated and compared with genetic diversity of well-established Carnivora species. Our
results support that the two subspecies of Proteles cristatus warrant species status whereas the two
subspecies of Otocyon megalotis do not. Obtaining high quality reference mammalian genomes opens the
way for large-scale population genomic studies of mammalian wildlife using (re)sequencing of samples
collected from roadkill.

###################
Abstract 3 Florian Benitiere
Contact : florian.benitiere@gmail.com LBBE
Title : The rate of alternative splicing in metazoans is driven by the mutation-selection-drift balance
Authors : Florian Benitiere, Anamaria Necsulea, Laurent Duret

Alternative splicing (AS) is a widespread process in eukaryotes, which contributes to expanding the functional
repertoire of their genomes. However, AS is not always functional: a fraction of splice variants result from
errors of the splicing machinery, which are deleterious for the fitness of organisms. The question of the
relative proportion of splicing errors vs. functional splice variants remains highly debated. In humans, the
analysis of proteomic datasets revealed that a vast majority of splice variants correspond to erroneous
transcripts [Abascal et al. 2015, Tress et al. 2017] and it has been shown that the AS rate correlates negatively
with the strength of selection against splice-site mutations [Saudemont et al. 2017]. This suggests that
variation in AS rate among genes essentially reflects the balance between selection (favoring efficient splice
signals) and drift (allowing the fixation of sub-optimal mutations that increase the rate of splicing errors)
[Saudemont et al. 2017]. If this interpretation is correct, then it is expected that the rate of splicing errors
should be lower in species with larger effective population size (Ne), where selection is more efficient.
To test this prediction, we analyzed the relationship between the AS rate and Ne across metazoans. For this,
we selected 53 metazoan species (38 insects and 15 vertebrates) for which a complete genome assembly and
high-depth RNAseq datasets were available. We quantified alternative splicing in a set of 978 orthologous
genes, present in single copy in most metazoan genomes (BUSCO genes). AS rate was measured relative to
the canonical transcript, defined, in each species, as the one corresponding to the most abundant isoform. In
agreement with the mutation-selection-drift balance hypothesis, we observed a negative correlation
between AS rate and various proxies of Ne (longevity, adult body length, dN/dS ratio). Furthermore, some
preliminary observations suggest that the proportion of splice variants that are functional is higher in species
with large Ne. These results indicate that the relative proportion of splicing errors vs. functional splice variants is
driven by the mutation-selection-drift balance.

Abascal et al. (2015). PLOS Computational Biology 11, e1004325.
Saudemont et al. (2017). Genome Biology 18, 208.
Tress et al. (2017). Trends in Biochemical Sciences 42, 98–110.

###################
Abstract 4 Marina Brasò-Vives
Contact : marinabraso@gmail.com LBBE
Title : Copy number variation and fixed duplications among 198 Indian-origin rhesus macaques (Macaca
mulatta)
Authors : Marina Brasò-Vives, David Juan, Inna S. Povolotskaya, Belen Lorente-Galdos, Xavier Farré, Marcos
Fernandez-Callejo, Muthuswamy Raveendran, R. Alan Harris, Douglas L. Rosene, Diego A. Hartasànchez,
Arcadi Navarro, Jeffrey Rogers, Tomas Marques-Bonet

The rhesus macaque is an abundant species of Old World monkeys and a valuable model organism for
biomedical research due to its close phylogenetic relationship to humans. Copy number variation is one of
the main sources of genomic diversity within and between species and a widely recognized cause of inter-
individual differences in disease risk. However, copy number differences among rhesus macaques and
between the human and macaque genomes, as well as the relevance of this diversity to research involving
this nonhuman primate, remain understudied. Here we present a high-resolution map of sequence copy
number for the rhesus macaque genome constructed from a sample of 198 individuals. About one-eighth of
the rhesus macaque reference genome is composed of recently duplicated regions, either copy number
variable regions or fixed duplications. Comparison with human genomic copy number maps based on
previously published data shows that, despite overall similarities in the genome-wide distribution of these
regions, there exist specific differences at the chromosome level. Interestingly, we identify differences in copy
number behavior between human disease genes and their rhesus macaque orthologs. Our results highlight
the importance of addressing the number of copies of target genes in the design of experiments and caution
against human-centered assumptions in research conducted with model organisms. Overall, we present a
genome-wide copy number map from a large sample of rhesus macaque individuals representing an
important novel contribution concerning the evolution of copy number in primate genomes and its impact
on biomedical research conducted using key model organisms.

###################
Abstract 5 Romain BULTEAU
Contact : romain.bulteau@ens-lyon.fr LBMC (ENS de Lyon)
Title : RAPToR (Real Age Prediction from Transcriptome staging on Reference)
Authors : Romain BULTEAU, Mirko FRANCESCONI

Gene expression profiling has become standard practice and serves as the basis of many analyses in biology.
However, gene expression is highly variable during development. Unknown and unintended developmental
variation among samples can obscure and confound the effect of variables of interest in a study. This is
especially a problem for fast-developing organisms such as Caenorhabditis elegans, where drastic
developmental changes in gene expression occur within hours. The developmental speed of such organisms
is usually heavily influenced by numerous factors and therefore obtaining developmentally synchronized
samples is not trivial. Genome-wide gene expression is a very rich source of data, which includes information
about the developmental stage of an organism. However, methods that extract this information are still
lacking. We present RAPToR, a new tool to accurately predict individual samples' developmental age from
their gene expression profiles. We achieved this by building high-temporal-resolution time series across the
entire development from multiple available datasets, which we use as a reference to stage samples. Inferred
age can be used as a covariate in analyses and increase their power to detect differential expression by
including time-dependent effects. We also show multiple examples of how development impacts gene
expression studies in multiple organisms.

###################
Abstract 6 Martijn Callens
Contact : martijn.callens@cefe.cnrs.fr CNRS - CEFE
Title : Codon usage evolution of horizontally transferred genes in Pseudomonas aeruginosa.
Authors : Martijn Callens, Céline Scornavacca, Stéphanie Bedhomme

Prokaryote genome evolution is characterized by the frequent gain of genes through horizontal gene transfer
(HGT). When a gene is transferred from one species to another, it arrives in a genomic context different from
its original one. Consequently, efficient expression and integration of this new gene can be limited due to a
mismatch between the transferred gene and the expression machinery of the receiving organism. It is
commonly observed that different species use their own preferred subset of synonymous codons to encode
the same amino acid, giving rise to species-specific codon usage preferences (CUP). When the CUP of a transferred
gene strongly diverges from the CUP of the receiving organism, the speed and accuracy of protein translation
can be strongly affected. One mechanism to compensate for this is codon usage amelioration, whereby the
transferred gene evolves towards a CUP similar to that of the receiving organism. We investigated codon
usage amelioration using comparative genomics in the opportunistic pathogenic bacterium Pseudomonas
aeruginosa. To obtain estimates of the timing of HGT events, we used a reconciliation approach with presence-
absence patterns of accessory genes on the phylogeny of P. aeruginosa strains. This allowed us to investigate
the relation between the residence time of a gene within the species and its CUP, and estimate the magnitude
of amelioration processes after HGT. We observed that recently transferred genes exhibit a wide range of
CUP, including genes that are highly divergent from P. aeruginosa. Transferred genes that have been retained
for a longer time have an overall more similar CUP to P. aeruginosa, indicative of selection against a strongly
divergent CUP. However, some genes with a long residence time still have a CUP that is differentiated from
P. aeruginosa, showing that even after considerable evolutionary time full codon usage amelioration does
not necessarily occur.
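
As an illustration of what comparing the CUP of a transferred gene with that of the receiving genome can look like, here is a minimal Python sketch. It is not the measure used in this work: the function names are hypothetical, and the distance (half the L1 distance between raw codon frequency vectors) is a deliberately simple stand-in that does not separate synonymous codon choice from amino-acid composition.

```python
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Return the relative frequency of each of the 64 codons in a coding sequence.

    Trailing bases that do not form a full codon, and codons containing
    characters other than A, C, G, T, are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        raise ValueError("sequence contains no complete unambiguous codon")
    return {c: counts[c] / total for c in CODONS}


def codon_usage_distance(gene_cds, reference_cds):
    """Half the L1 distance between two codon frequency vectors.

    0.0 means identical codon frequencies; 1.0 means no codon in common.
    This is an illustrative choice, not the metric of the abstract.
    """
    f = codon_frequencies(gene_cds)
    g = codon_frequencies(reference_cds)
    return 0.5 * sum(abs(f[c] - g[c]) for c in CODONS)
```

Under this sketch, amelioration would show up as a distance to the host reference that decreases with the residence time of the gene.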

###################
Abstract 7 Léopold Carron
Contact : leopold.carron@laposte.net LCQB
Title : Repeat element from the point of view of 3D chromosome folding
Authors : Leopold Carron, Julien Mozziconacci

Genetic information is encoded in DNA, a very long nucleotide polymer. To understand DNA folding
mechanisms, an experimental technique is now available that quantifies distal genomic contacts. This high-
throughput chromosome conformation capture technique, called Hi-C, reveals 3D chromosome folding in the
nucleus. In recent years, Cournac et al. showed that the position of repeat elements correlates with the 3D folding of
eukaryotic genomes. In this presentation, I will focus on how well the 3D folding can be predicted
from the genomic position of repeat elements in human, mouse and Arabidopsis thaliana.

###################
Abstract 8 Maxime Courcelle
Contact : maxime.courcelle@umontpellier.fr Institut des Sciences de l'Evolution de Montpellier
Title : Evolution of rodent Olfactory Receptors is governed by Phylogeny, Ecology and Functional
constraints
Authors : Courcelle Maxime, Douzery Emmanuel and Fabre Pierre-Henri

Olfactory receptor (OR) genes represent the largest multigenic family in mammalian genomes. They encode
proteins that bind environmental odorant molecules, and they have been categorized into a dozen
conserved phylogenetic clusters. Surprisingly, the OR repertoire is extremely variable among species and is
subject to many gene duplications and losses. The expansion and contraction of the OR gene clusters have
been linked to niche adaptation in mammals. However, they have mainly been studied on a large
(i.e. placental) taxonomic scale, and a finer sampling is needed to better capture the mechanisms that drove the
evolution of the OR repertoire. Among placental mammals, rodents are well-suited for this task, as they
exhibit diverse life history traits and genomic data are available for each major rodent lineage. In this study,
published genomes of 53 rodent species were mined, and we retrieved more than 85,000 functional and
pseudogene OR sequences, and classified them into phylogenetic groups. We found a level of OR copy
number variation within rodents similar to the one previously described at the level of placental mammals. This
sampling allowed us to demonstrate significant levels of phylogenetic inertia during the OR repertoire
evolution, such as shared gene family depletions or expansions within rodent subclades. Different groups
of OR subfamilies also displayed patterns of positive or negative covariation. Unexpectedly, we did not
observe significant convergent patterns in the repertoire of semi-aquatic rodents such as the beaver and
coypu. However, a 10-fold expansion of the OR subfamily 14 was detected among the phylogenetically
divergent fossorial lineages of naked mole-rats (Bathyergidae) and mole rats (Spalacidae). This study
highlights how the diversity of the OR repertoire has evolved among rodents, shaped by phylogenetic
constraints, functional trade-offs, and species life history traits such as diet and lifestyle.

###################
Abstract 9 Corentin Dechaud
Contact : corentin.dechaud@ens-lyon.fr ENS de Lyon - IGFL
Title : Sex and the TEs: role of transposable elements in the control of sexual genes in teleost fish
Authors : Corentin Dechaud, Sho Miyake, Manfred Schartl, Jean-Nicolas Volff, Magali Naville

Teleost fishes show a high level of diversity affecting almost all facets of their biology. Their genomes also
contain many more families of transposable elements (TEs) than other vertebrates do. In this project, we
investigate the impact of TEs in fish diversification by focusing on sexual development, which appears
hypervariable in this clade. An example of such involvement of TEs has been described in the rice fish Oryzias
latipes, where sexual differentiation is under the control of the master gene Dmy that appeared by duplication
less than 20 My ago. The transcription of Dmy is indeed controlled by Izanagi, a TE inserted in its promoter
region. Preliminary results were obtained from the transcriptome analysis of the male and female gonads of
four Oryzias species. The integrative analysis of gonad transcriptome data along with TE annotations allows
the systematic detection of candidate cases of TE-regulated genes. In particular, we identified a non-autonomous
transposon (~3000 copies in the O. latipes genome) that is found in the 5’ untranslated regions of 26 male-
biased genes and that carries a binding site for the transcription factor RFX2, which is involved in sexual
development in vertebrates. The systematic identification of such TE-derived regulatory sequences will allow
a better assessment of the role of TEs in the lability of the sexual development pathway in fish.

###################
Abstract 10 Clotilde Garrido
Contact : clotilde.garrido@ibpc.fr IBPC UMR7141
Title : Evidence Supporting an Antimicrobial Origin of Targeting Peptides, Computational Approaches
Authors : Clotilde Garrido, Oliver D. Caspari, Yves Choquet, Francis-André Wollman, Ingrid Lafontaine

Targeting peptides (TPs) are cleavable N-terminal extensions involved in the subcellular targeting of most
nuclear-encoded proteins localized in either the chloroplast or the mitochondria. Different TPs are very
divergent in primary sequence, but share a propensity to form amphiphilic alpha-helix stretches when in
contact with a membrane. In this respect they are very similar to certain antimicrobial peptides (AMPs),
which are used by most eukaryotes as well as some bacteria to kill microbial antagonists. In a neat analogy to
post-import degradation of TPs, certain bacterial defense mechanisms against AMPs involve uptake and
intracellular destruction of the attacking peptide. Might TPs be the result of an arms race involving AMPs
between the ancestors of mitochondria or chloroplasts and their hosts during the early phases of
endosymbiosis (Wollman 2016)? In support of this hypothesis, we show that (i) AMPs and TPs share key
physico-chemical properties that set them apart from other classes of peptides, and that (ii) AMP peptide
sequences target Venus fluorescent protein to the chloroplast or the mitochondria when expressed in
Chlamydomonas reinhardtii. Reference: Wollman, F.-A. An antimicrobial origin of transit peptides accounts
for early endosymbiotic events. Traffic 17, 1322-1328 (2016).

###################
Abstract 11 Matthieu Haudiquet
Contact : matthieu.haudiquet@pasteur.fr Institut Pasteur
Title : The interplay between capsules and horizontal gene transfer shapes bacterial evolution
Authors : Matthieu Haudiquet, Olaya Rendueles, Eduardo PC Rocha

Capsules are the outermost layer of bacteria and are present across all major phyla. They are major virulence
factors and increase tolerance to antibiotics. Furthermore, capsules allow bacteria to colonize novel
environments and to withstand harsh conditions. Species encoding a capsular locus are associated with higher
rates of horizontal gene transfer (HGT) and homologous recombination (HR)[1], contradicting the current
paradigm that capsules act as a physical barrier to DNA exchanges. Moreover, the capsular locus is itself a
recombination hotspot, even if little is known about why and how it changes through time. We wish to
understand how the capsule affects and is affected by genetic exchanges. For this, we study Klebsiella
pneumoniae (Kpn), a capsulated enterobacterium that is an increasing health threat due to the ongoing gain of
antibiotic resistance genes and virulence factors through HGT. We identified mobile genetic elements (MGE),
like prophages or plasmids, in 4000 genomes and inferred HGT and HR events along the species tree. In
parallel, we predicted the polysaccharide structure of the capsule of these strains. We show that strains of
different serotypes, varying in sugar composition, are associated with different levels of HGT, reflected by
variations in terms of MGE content and HR rates. Hence, serotypes create clusters of strains that are not
necessarily monophyletic but exhibit more frequent within-cluster than between-cluster genetic exchanges.
Our experimental work suggests that phage predation drives the loss of the capsule. Its re-acquisition by HGT
may shuffle the position of the strain in terms of genetic exchanges, by re-wiring it preferentially with the
cluster associated with the novel serotype. Our results suggest that such shuffling is not random and depends
on the capsule composition. We thus propose that the capsule and its evolution play a complex role in
mediating DNA exchanges across a species. These findings have implications for understanding the genetic
structure of bacteria, more than half of which encode capsules, and the role of the interplay between capsule
and genetic exchanges in the evolution of virulence and resistance.
[1] Rendueles et al. PLoS Genetics, 14(12).

###################
Abstract 12 Jos Käfer
Contact : jos.kafer@univ-lyon1.fr LBBE - UMR 5558, CNRS & Univ. Claude Bernard Lyon 1
Title : Detecting sex-linkage in DNA/RNA samples from species from natural populations
Authors : Jos Käfer

Despite progress in sequencing and genome assembly, sex chromosomes are still difficult to study in non-
model species. In plants, for example, among the ~15000 species having separate sexes, reliable sex-linked
sequences are available for only a handful of species. In order to detect sex-linkage in almost any species, whether
or not a reference genome is available, I've developed SD-pop, a statistical framework that is applicable to
samples collected in natural populations. I will illustrate the method using some of its current applications.

###################
Abstract 13 Ingrid Lafontaine
Contact : ingrid.lafontaine@ibpc.fr Institut de Biologie Physico-Chimique, Paris
Title : Additional layer of regulation via convergent gene orientation in yeasts
Authors : Jules Gilet, Romain Conte, Claire Torchet, Lionel Benard and Ingrid Lafontaine

Convergent gene pairs can produce transcripts with complementary sequences. We previously showed that mRNA
duplexes form in vivo in Saccharomyces cerevisiae via interactions between overlapping mRNA 3’-ends and can lead
to post-transcriptional regulatory events. Here we show that mRNA duplex formation is restricted to
convergent genes separated by a short intergenic distance, independently of their 3’-UTR length. We report
an enrichment in genes involved in biological processes related to stress among these convergent genes. They
are markedly conserved in convergent orientation in budding yeasts, meaning that this mode of post-
transcriptional regulation could be shared in these organisms, conferring an additional level for modulating
stress response. We thus investigated the mechanistic advantages potentially conferred by 3’-UTR mRNA
interactions. Analysis of genome-wide transcriptome data revealed that the Pat1 and Lsm1 factors, which have a 3’-UTR
binding preference and participate in the remodeling of messenger ribonucleoprotein particles, bind
messenger-interacting mRNAs (mimRNAs), which form duplexes, differently from mRNAs that
do not interact (solo mRNAs). Functionally, mimRNAs show limited translational repression upon stress. We
thus propose that mRNA duplex formation modulates the regulation of mRNA expression by limiting their
access to translational repressors. Our results thus show that post-transcriptional regulation is an additional
factor that determines the order of coding genes.

###################
Abstract 14 Nicolas Lartillot
Contact : nicolas.lartillot@univ-lyon1.fr LBBE UMR 5558
Title : Mechanistic codon models for detecting adaptation
Authors : Nicolas Lartillot, Thibault Latrille, Nicolas Rodrigue

Codon substitution models have traditionally attempted to uncover signatures of adaptation within protein-
coding genes by contrasting the rates of synonymous and non-synonymous substitutions, with a dN/dS>1
considered as evidence of positive selection. However, protein coding sequences are in fact under a mixture
of positive and negative selection: even at the level of a single site, not all amino-acid mutations are expected
to be adaptive. Another modeling approach, known as the mutation-selection framework, attempts to
explicitly account for purifying selection at the amino acid level. By identifying the concept of nearly-neutral
evolution with that of mutation-selection balance under a fixed fitness landscape, this approach can be used
as a null model for the detection of adaptation, now characterized as an upward deviation of the dN/dS from
the prediction under the nearly-neutral model. Here, the OrthoMam database was used to conduct an
exome-wide characterization of adaptation across placental mammals, using both classical and mutation-
selection codon models. A large fraction of the genes are detected to be under adaptation by one or the other
approach. Gene enrichment analyses show immunity as a major function associated with ongoing adaptation
over large evolutionary scales. Interestingly, genes specifically detected by the mutation-selection approach,
but not by classical site-models, are enriched in genes encoded in the nucleus but targeted to mitochondria,
compatible with an already suggested run-away process of compensatory evolution occurring between the
two genomic compartments.
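
To make the mutation-selection idea concrete, the sketch below computes the standard relative fixation rate used in this family of models: the substitution rate of a mutation with scaled selection coefficient S, relative to a neutral mutation, is S / (1 - exp(-S)). This is a textbook population-genetics result and only an illustration. It is not the implementation used in this study, and the function name is hypothetical.

```python
import math


def relative_fixation_rate(scaled_s):
    """Substitution rate relative to neutrality for a scaled selection coefficient S.

    Implements S / (1 - exp(-S)), written in a numerically stable form:
    deleterious mutations (S < 0) give a value below 1, advantageous
    mutations (S > 0) a value above 1, and S = 0 gives exactly 1.
    """
    if abs(scaled_s) < 1e-8:
        # Limit of S / (1 - exp(-S)) as S tends to 0.
        return 1.0
    if scaled_s > 0:
        return scaled_s / (-math.expm1(-scaled_s))
    # For S < 0 use the equivalent form S * exp(S) / (exp(S) - 1),
    # which avoids overflow of exp(-S) for strongly deleterious mutations.
    return scaled_s * math.exp(scaled_s) / math.expm1(scaled_s)
```

Under a fixed fitness landscape at mutation-selection balance, most non-synonymous changes are neutral or deleterious, so the predicted dN/dS stays at or below 1. An observed dN/dS above that prediction is what the abstract interprets as a signature of adaptation.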

###################
Abstract 15 Hugo Menet
Contact : hugo.menet@univ-lyon1.fr LBBE
Title : Multi-scale phylogenetic approaches for the evolution of the holobiont
Authors : Hugo Menet, Vincent Daubin, Eric Tannier

As we begin to comprehend the importance of microbiota for animal functioning, understanding the
evolution and coevolution of the entities forming the holobiont becomes an essential issue. To be specific,
we investigate how the macroscopic host, microscopic symbionts, and genes from both parts can coevolve or
not with each other. We rely on reconciliation, a general method to embed one phylogenetic tree into another,
developed in both gene/species and host/symbiont cophylogeny studies. Reconciliation can be used to infer
ancestral correspondences and evolutionary events, such as duplication, loss or horizontal transfer, evaluate
coevolution or the quality of a phylogeny. We present an integrative model for the coevolution of host,
symbionts and genes from both the host and the symbiont genomes. Our model is based on two intertwined
reconciliations, the lower gene/species one being aware of the upper symbiont/host one. We take into account
events of duplication, host switch and loss for the host/symbiont level, and duplication, horizontal gene
transfer and loss for the gene/species level. Following work on domain/gene and gene/species
joint reconciliations, which showed that finding a most parsimonious coupled reconciliation is NP-hard with
certain constraints on transfers, we discriminate two kinds of horizontal gene transfers depending on whether
the donor and receiver species are part of the same holobiont, and propose heuristics, based on a
probabilistic reconciliation framework, to compute the likelihood of our model. We evaluate our new
approach on simulated data and apply it to aphids of the genus Cinara to test previously proposed horizontal gene
transfer scenarios between co-occurring enterobacterial endosymbionts.

###################
Abstract 16 Guy Perrière
Contact : guy.perriere@univ-lyon1.fr LBBE
Title : Horizontal transfers and the evolution of magnetotaxis
Authors : C.L. Monteil, G. Perrière, C. Lefèvre

Magnetotaxis is a prokaryotic function allowing organisms to geolocate and navigate in aquatic sediments
thanks to the concomitant sensing of chemical gradients and their passive orientation along geomagnetic
field lines. For freshwater magnetotactic spirilla of the Alphaproteobacteria (MTBMag), magnetotaxis guides
them more efficiently towards the oxic-anoxic transition zone. All members described to date in this group
are phenotypically homogeneous and share the same narrow ecological niche: they all biomineralize the
same magnetic particle chains and have the same physiology and ultrastructural features. Initially affiliated
to a single genus (i.e. Magnetospirillum), MTBMag seem to be genetically structured into two
evolutionarily independent and divergent clusters. Because their high degree of phenotypic similarity suggests the
parallel evolution of magnetotaxis and metabolism, we tested this hypothesis at the whole-genome level. By
reconciling species and gene trees including newly sequenced genomes of cultured Magnetospirillum-related
bacteria, we looked for the processes that could contribute to shaping such an evolutionary pattern. We
showed that repeated horizontal gene transfers and homologous recombination of entire operons
contributed to the evolution of magnetotaxis in the twelve MTBMag studied. Our environmental data are
thus in line with previous observations in experimental evolution suggesting that recombination could
alleviate clonal interference and speed up adaptation under some circumstances. Such processes could
represent a more parsimonious and rapid solution for adaptation compared to independent and repeated de
novo mutations, especially in the case of traits as complex as magnetotaxis, which involves tens of interacting proteins.

###################
Abstract 17 Lea Picard
Contact : lea.picard@ens-lyon.fr CIRI, LBBE
Title : DGINN, an automated and highly-flexible pipeline for the Detection of Genetic INNovations on
protein-coding genes
Authors : Lea Picard, Quentin Ganivet, Omran Allatif, Andrea Cimarelli, Laurent Guéguen, Lucie Etienne

Adaptive evolution has shaped major biological processes. Finding the protein-coding genes and the sites that
have been subjected to adaptation during evolutionary time is a major endeavor. However, very few methods
fully automate the identification of positively selected genes, and widespread sources of genetic innovation
such as gene duplication and recombination are absent from most pipelines. Here, we developed DGINN, a highly
flexible and public pipeline to Detect Genetic INNovations and adaptive evolution in protein-coding genes.
DGINN automates, from a gene’s sequence, all steps of the evolutionary analyses necessary to detect the
aforementioned innovations, including the search for homologues in databases, assignment of orthologous
groups, identification of duplication and recombination events, as well as detection of positive selection using
five different methods to increase precision and ranking of genes when a large panel is analyzed. DGINN was
validated on nineteen genes with previously-characterized evolutionary histories in primates, including some
engaged in host-pathogen arms-races. The results obtained with DGINN confirm or expand on highly hand-
curated results from the literature, establishing DGINN as an efficient tool to automatically detect genetic
innovations and adaptive evolution in diverse datasets, from the user’s gene of interest to a large gene list in
any species range.

###################
Abstract 18 Maxime Policarpo
Contact : policarpo@egce.cnrs-gif.fr EGCE - CNRS
Title : Machine learning-based estimation of relaxed selection and cave brotula genomes shed new light on
genes decay in subterranean vertebrates
Authors : Maxime Policarpo, Didier Casane

Most subterranean vertebrates show convergent regressive traits such as small eyes and reduced
pigmentation. In some cases, eyes and pigmentation are completely lost. In cavefishes, convergent losses of
the circadian rhythm were also observed. However, most genetic analyses of the evolution of these traits
have been performed on cave populations belonging to Astyanax mexicanus, which are very recently evolved
cavefish. Accordingly, very few gene losses were found in these cavefish. We sought to better understand the
extent, tempo, modalities and limits of gene losses in relation to regressive evolution by examining the
genomes of more ancient cavefishes (i.e. three Sinocyclocheilus spp. and two Lucifuga spp.). Cave settlement
and relaxed selection were estimated using several methods based on the dN/dS ratio and a new approach
relying on machine learning-based estimations of the deleterious effect of non-synonymous mutations
implemented in MutPred2. Our analyses revealed the combined effects of the level of eye regression, time
and genome ploidy on the number of eye pseudogenes in cavefishes. Nevertheless, no eye genes with many
LoF mutations were found, in sharp contrast to highly degenerated eye genes identified in subterranean
mammals, indicating that eye degeneration in cavefishes is much more recent. Dating results suggest that
blindness in cavefishes for which genomes are available is not very ancient, ranging from early to late
Pleistocene. In sharp contrast to eye genes, most circadian clock and pigmentation genes appeared to be
under strong selection in all these cavefishes.

###################
Abstract 19 Léa PRADIER
Contact : leaemiliepradier@gmail.com Centre d'Ecologie fonctionnelle et Evolutive
Title : PlasForest: a homology-based random forest classifier for plasmid detection
Authors : Léa Pradier, Stéphanie Bedhomme

Plasmids are mobile genetic elements that often carry accessory genes, and allow the horizontal gene transfer
of such genes between bacterial genomes. The detection of plasmids in large sets of genomes is therefore
crucial to analyze the spread of adaptation across bacteria, especially in the study of antibiotic resistance. To
that end, several bioinformatics methods have been developed to detect plasmids in draft genomes and
metagenomes. However, they suffer either from low sensitivity (i.e., most plasmids remain undetected) or
low precision (i.e., these methods identify chromosomes as plasmids). Here, we present PlasForest, a
homology-based random forest classifier for identification of bacterial plasmid sequences in unassembled
sets of contigs. This tool is based on a search for homologies against a database of plasmid sequences,
which is refined with a random forest classifier to discriminate between plasmid and nonplasmid sequences.
Without any prior knowledge of the taxonomical origin of samples, PlasForest can identify contigs as plasmids
or nonplasmids with an accuracy of 98%. Notably, it is able to detect up to 67% of plasmid contigs under 1 kb
with fewer than 19% false positives, and up to 88% of plasmid contigs over 100 kb with fewer than 0.5% false
positives. Compared to other currently available tools, PlasForest demonstrated significantly better
performance on test datasets. We implemented this tool in a user-friendly pipeline able to identify plasmids
in large datasets in reasonable time.
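
The figures above combine several measures (accuracy, the fraction of plasmid contigs detected, and the share of false positives). As a minimal sketch of how such measures are commonly derived from a confusion matrix, the Python below uses hypothetical counts that are not taken from the PlasForest evaluation. The abstract does not say whether "false positives" refers to the false discovery rate or to the false positive rate, so both are computed.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return common binary-classification measures from confusion counts.

    tp: plasmid contigs called plasmid      fn: plasmid contigs missed
    fp: chromosomal contigs called plasmid  tn: chromosomal contigs called chromosomal
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),           # share of plasmid contigs detected
        "precision": tp / (tp + fp),             # share of plasmid calls that are correct
        "false_discovery_rate": fp / (tp + fp),  # equals 1 - precision
        "false_positive_rate": fp / (fp + tn),   # share of chromosomal contigs miscalled
    }


# Hypothetical counts, for illustration only.
m = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Low sensitivity and low precision are distinct failure modes: the first means plasmid contigs are missed, the second means chromosomal contigs are reported as plasmids.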

###################
Abstract 20 Nathanaëlle Saclier
Contact : nathanaelle.saclier@gmail.com ISEM - LEHNA
Title : Naturally radioactive environments influence the rate and spectrum of mutation
Authors : Nathanaëlle Saclier, Patrick Chardon, Florian Malard, Lara Konecny-Dupré, David Eme, Arnaud
Bellec, Vincent Breton, Laurent Duret, Tristan Lefébure, Christophe J. Douady

All organisms on Earth are exposed to low doses of natural radioactivity but some habitats are more
radioactive than others. Yet, we know almost nothing about the influence of natural radioactivity on the
evolution of biodiversity. Here, we addressed whether organisms living in naturally more radioactive habitats
accumulate more mutations across generations using 14 species of waterlice living in subterranean habitats
with contrasting levels of radioactivity. We found a 35.74% increase in the nuclear and a 60.48% increase in
the mitochondrial substitution rates for species living in naturally highly radioactive environments. We also
found a positive correlation between the level of radioactivity and the probability of G to T (and
complementary C to A) mutations, a hallmark of oxidative stress. These results suggest that low doses of
radiation affect the rates of evolution through oxidative damage, particularly in the mitochondrial genome.
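
Because a G to T change on one strand is a C to A change on the other, substitution counts are usually pooled into strand-symmetric classes before their frequencies are compared. The Python below is a minimal sketch of that pooling; the function names and the toy list of substitutions are hypothetical and do not come from the study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def strand_symmetric_class(ref, alt):
    """Label a substitution together with its reverse-strand equivalent."""
    direct = f"{ref}>{alt}"
    mirrored = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    # Sorting makes G>T and C>A map to the same label.
    return "/".join(sorted((direct, mirrored)))


def pool_substitutions(substitutions):
    """Count (ref, alt) substitutions per strand-symmetric class."""
    return Counter(strand_symmetric_class(r, a) for r, a in substitutions)


# Toy data, for illustration only.
observed = [("G", "T"), ("C", "A"), ("A", "G"), ("T", "C"), ("G", "T")]
spectrum = pool_substitutions(observed)
print(spectrum)
```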

###################
Abstract 21 Guillaume Scholz
Contact : guillaume.scholz@lirmm.fr LIRMM, Montpellier.
Title : Phylo-kmers for the detection of recombinants in viruses
Authors : Guillaume Scholz, Benjamin Linard, Eric Rivals, Fabio Pardi

Viruses show high genetic variability, coupled with a strong propensity to recombine. To reflect this
variability, distinct genomes of the same virus are often classified into groups known as ‘genotypes’ or ‘types’
or, at a finer resolution, ‘subtypes’. Many bioinformatic tools have been proposed for the problem of
determining the group to which a given genome or genomic fragment belongs. However, the results produced
by such methods are not always relevant or precise enough when the genome analyzed is a recombinant
whose parental strains originate from different groups. To tackle this problem, methods such as jpHMM [1]
(for HBV, HCV and HIV) or SCUEAL [2] (for HIV) have been designed. These methods rely on complex statistical
models and heavy computations, which are not always practical for a quick scan of a large number of
genomes. Our contribution to the problem of recognizing and partitioning inter-group recombinants takes
the form of a new algorithm called SHERPAS (Spotting Historical Events of Recombination in a Phylogeny via
Ancestral Sequences). It relies on the concept of a phylogenetically-informative k-mer (phylo k-mer for short),
recently introduced in the context of phylogenetic placement of metagenomic sequences. These phylo k-mers
are inferred from a reference tree and a reference alignment, and contain information about how likely
a k-mer (short nucleotide sequence of length k) is to be found in a sequence belonging to a given group (see
[3] for more on phylo k-mers). SHERPAS is able to process tens of thousands of genomic sequences in a matter
of minutes, while pre-existing methods would take several hours to days. Experiments on synthetic HIV and
HBV recombinants show that SHERPAS is able to detect inter-group recombinants, identify their parental
origins, and partition their sequences with limited loss of accuracy, thus making it an ideal tool for the initial
screening of large datasets.
References:
[1] Schultz, A.K., Zhang, M., Bulla, I., Leitner, T., Korber, B., Morgenstern, B., Stanke, M., 2009. jpHMM:
improving the reliability of recombination prediction in HIV-1. Nucleic Acids Research 37, W647-W651.
[2] Kosakovsky Pond, S.L., Posada, D., Stawiski, E., Chappey, C., Poon, A.F., Hughes, G., Fearnhill, E., Gravenor,
M.B., Leigh Brown, A.J., Frost, S.D., 2009. An evolutionary model-based algorithm for accurate phylogenetic
breakpoint mapping and subtype prediction in HIV-1. PLoS Computational Biology 5, e1000581.
[3] Linard, B., Swenson, K., Pardi, F., 2019. Rapid alignment-free phylogenetic identification of metagenomic
sequences. Bioinformatics 35 (18), 3303-3312.
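
The general idea of scanning a query with group-informative k-mers can be sketched as follows. This is not the SHERPAS algorithm: the score table, the floor score for unseen k-mers, the window size, and the rule of taking the best-scoring group per window are all simplifying assumptions made for illustration. Real phylo k-mers are inferred from a reference tree and alignment as described in [3].

```python
def assign_windows(query, kmer_scores, groups, k, window, floor=-5.0):
    """Assign each window of consecutive k-mers to its best-scoring group.

    kmer_scores maps a k-mer to {group: log-score}; a missing (k-mer, group)
    pair receives the floor score. A window whose two best groups tie is
    left unassigned (None).
    """
    kmers = [query[i:i + k] for i in range(len(query) - k + 1)]
    labels = []
    for start in range(len(kmers) - window + 1):
        totals = {g: 0.0 for g in groups}
        for kmer in kmers[start:start + window]:
            entry = kmer_scores.get(kmer, {})
            for g in groups:
                totals[g] += entry.get(g, floor)
        ranked = sorted(groups, key=lambda g: totals[g], reverse=True)
        tie = len(ranked) > 1 and totals[ranked[0]] == totals[ranked[1]]
        labels.append(None if tie else ranked[0])
    return labels


def breakpoints(labels):
    """Return window indices at which the assigned label changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Toy table with hypothetical log-scores for two groups, "A" and "B".
toy_scores = {
    "AAA": {"A": -0.1, "B": -3.0},
    "CCC": {"A": -3.0, "B": -0.1},
}
labels = assign_windows("AAAAACCCCC", toy_scores, ["A", "B"], k=3, window=2)
print(labels, breakpoints(labels))
```

A change of label along the query marks a candidate recombination breakpoint. The appeal of a lookup-based scan of this kind is that it needs no alignment of the query.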

###################
Abstract 22 Xavier VEKEMANS
Contact : xavier.vekemans@univ-lille.fr University of Lille
Title : Genotyping and de novo discovery of allelic variants at the Brassicaceae self-incompatibility locus
from short read sequencing data
Authors : Mathieu Genete, Vincent Castric & Xavier Vekemans

Loci with extremely high levels of molecular polymorphism such as the self-incompatibility locus (S-locus) of
Brassicaceae have remained recalcitrant to typing with NGS technologies based on short reads, as they are
typically challenging to assemble de novo as well as to align to a given reference. Up to now, studies of the
allelic diversity at the S-locus in natural populations have relied on labor-intensive molecular cloning or BAC
library approaches. Due to the severe reduction of the cost of shotgun sequencing, obtaining raw reads from
individual genomes is now becoming possible on a large scale and our previous work has shown that such
data can be used to reliably genotype individual accessions from the 1001 Genomes Project of Arabidopsis
thaliana at the S-locus (Tsuchimatsu et al. 2017, https://doi.org/10.1093/molbev/msx122). We have
developed a methodological approach based on a computationally optimized comparison of short Illumina
sequencing reads from genomic DNA to a database of known nucleotide sequences of the extracellular
domain of SRK (eSRK). By examining mapping patterns along the reference sequences, we obtain highly
reliable predictions of S-genotypes from individuals collected in natural populations of Arabidopsis halleri.
Furthermore, using a de novo assembly approach of the filtered short reads, we obtain full length sequences
of eSRK even when the initial sequence in the database was only partial, and we discover putative new SRK
alleles that were not initially present in the database. When including those new alleles in the reference
database, we were able to resolve the complete diploid SI genotypes of all individuals. Beyond the specific
case of Brassicaceae S-alleles, our approach can be readily applied to other polymorphic loci, provided that
reference allelic sequences are available.
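
One way to picture "examining mapping patterns along the reference sequences" is to compute, for each reference allele, the fraction of its positions covered by mapped reads and to keep the alleles covered over nearly their full length. The sketch below assumes that read alignments are already available as (allele, start, end) intervals. The 0.9 breadth threshold and all names are hypothetical and are not taken from the authors' pipeline.

```python
def breadth_of_coverage(length, intervals):
    """Fraction of a reference of the given length covered by half-open intervals."""
    covered = [False] * length
    for start, end in intervals:
        for pos in range(max(start, 0), min(end, length)):
            covered[pos] = True
    return sum(covered) / length


def call_alleles(ref_lengths, alignments, min_breadth=0.9):
    """Return reference alleles whose breadth of coverage reaches min_breadth."""
    by_allele = {name: [] for name in ref_lengths}
    for name, start, end in alignments:
        by_allele[name].append((start, end))
    return sorted(
        name
        for name, length in ref_lengths.items()
        if breadth_of_coverage(length, by_allele[name]) >= min_breadth
    )


# Toy reference alleles and read alignments, for illustration only.
ref_lengths = {"S01": 100, "S02": 100, "S03": 100}
alignments = [("S01", 0, 60), ("S01", 50, 100), ("S02", 0, 30), ("S03", 5, 100)]
genotype = call_alleles(ref_lengths, alignments)
print(genotype)
```

In this toy case, the allele covered over only a short stretch is rejected, as would be expected when reads map to a region shared with another allele.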

###################
Abstract 23 Anthony Venon
Contact : anthony.venon@inra.fr INRA, Le Moulon, Gif-sur-Yvette
Title : Contribution of transposable elements to the structural evolution of the apple genome in the
context of domestication
Authors : A.Venon, X.Chen, K.Alix, A.Cornille

Transposable elements (TEs) are repeated DNA sequences that can move autonomously and multiply within
the genome, and that have strong mutagenic potential. Present in all eukaryotic genomes, TEs are
particularly abundant in plant genomes, but their role there is still poorly understood. TEs are thought to
be involved in defence mechanisms in response to various environmental stresses and, more generally, to
play a role in regulating the expression of genes (or gene networks). In cultivated plants, studies further
suggest that TEs play a key role in the expression of agronomically important traits relevant to crop
improvement. In the cultivated apple, the insertion of a TE is thought to explain variation in fruit size and
colour. Nevertheless, the involvement of transposable elements in the domestication of fruit trees remains
poorly understood. Studies based on genetic markers have shown that the cultivated apple (Malus domestica)
was domesticated in Central Asia from the Asian wild apple M. sieversii. The cultivated apple was then
introduced into Europe by the Greeks about 1,500 years ago via the Silk Roads. During this introduction,
recurrent introgressions involving the Caucasian and European wild apples, Malus orientalis and Malus
sylvestris, respectively, are thought to have allowed the cultivated apple to adapt to its new local
environments. Recent molecular biology studies have shown the key role of certain TEs in the expression of
genes involved in fruit and flower development in apples. However, the genomic TE landscape of the
cultivated apple (M. domestica) and its wild relatives (M. sieversii, M. orientalis, M. sylvestris) remains
little studied. The aim of our study is to characterise and quantify the TE composition of the wild species
M. sieversii, M. orientalis and M. sylvestris, in comparison with that of the cultivated species M. domestica.
We developed a bioinformatic pipeline for TE annotation from unassembled Illumina sequencing data from
these four species, in order to build a TE database for these species. Other, genetically more distant apple
species were also included in our work to study TE composition at the scale of the genus Malus, which
comprises all apple species. The questions we will address are the following: Is the transposable element
composition similar among the different apple species? Are the proportions of transposable elements the
same in wild and cultivated apples? What role(s) do transposable elements play in the adaptation and
domestication of apples? These results will contribute to a better understanding of the genomic bases of
apple domestication and adaptation, which is necessary for better management of genetic resources in crop
improvement.

###################
Abstract 24 Yishu Wang
Contact : yishu.wang@univ-lyon1.fr Université Claude Bernard Lyon 1; CNRS UMR5558
Title : Quokka: eQUivalence Klasses enumerator fOr Kophylogeny Associations
Authors : Yishu Wang, Arnaud Mary, Marie-France Sagot, and Blerina Sinaimeri

Phylogenetic tree reconciliation is the method of choice in analysing host-symbiont systems. Despite the
many reconciliation tools that have been proposed in the literature, two main issues remain unresolved: (i)
listing suboptimal solutions (i.e. whose score is ‘close’ to the optimal ones) and (ii) listing only solutions that
are biologically different ‘enough’. The first issue arises because the optimal solutions are not always the
most biologically meaningful ones; providing many suboptimal solutions as alternatives to the optimal ones is
thus very useful. The second is related to the difficulty of analysing an often huge number of optimal solutions.
We propose Quokka, which addresses both of these problems efficiently. Furthermore, it includes a
solution visualisation tool that significantly helps the user analyse the results. The source
code, documentation, and binaries for all platforms are freely available at
https://gitlab.inria.fr/yiswang/quokka.

###################
Abstract 25 Arthur Weyna
Contact : arthur.weyna@umontpellier.fr Institut des Sciences de l'Evolution de Montpellier
Title : A model to detect and classify hybrids from single Illumina genomes
Authors : Arthur Weyna, Jonathan Romiguier & Nicolas Galtier.

Hybridization has a major role in the processes of speciation, diversification and adaptation, and has been
studied extensively. It has notably been proposed to be at the origin of some new species, thus promoting
diversification. Conversely, recurrent hybridization may be associated with homogenizing gene flow, which can
lower the speed of species divergence. The role of hybridization-associated introgression in maintaining
neutral, deleterious and adaptive genetic variance within populations is also a wide research subject in itself.
With the increasing availability of molecular data, tools to detect, characterize, and study hybrid individuals
at the genetic level have emerged. However, these tools (phylogenetic approaches, admixture analyses,
multivariate analyses, F-statistics, hybrid index, Newhybrids, Introgress ...) always require some previous
knowledge about the studied hybrid system and/or the availability of genetic data from several individuals
within one hybrid system. This has logically constrained the study of hybridization to well characterized
and/or affordable model species. Here, we present a new tool designed to partly overcome these problems,
allowing the detection and partial classification (into genealogical classes) of hybrids from any single Illumina
genome, in the absence of any other information. The tool relies on a maximum-likelihood fit of the
distribution of heterozygosity within each genome, whose shape depends on the hybrid status of the
sequenced individual. We also present a jointly developed pipeline designed to make the estimation tool
easily and accurately applicable to virtually any NGS data (raw fastq files). We finally discuss how
the tool could soon be applied (work in progress) on a large scale to conduct wide comparative studies of
the prevalence of hybridization across the tree of life.