FROM DATA TO KNOWLEDGE IN THE LANGUAGE SCIENCES - IRG 2020

Page created by Bernard Santos
 
FROM DATA TO KNOWLEDGE IN
THE LANGUAGE SCIENCES

                Institut de plurilinguisme/Institut für Mehrsprachigkeit
                                        Fribourg/Freiburg (Switzerland)
                                                     6 to 8 February 2020

             BOOK OF ABSTRACTS

Symposium für junge Forschende | Symposium per giovani ricercatori
 Young researchers symposium | Symposium de jeunes chercheurs

                                                     www.irg2020.ch
                                                                       Photo by Paul Murphy
IRG Symposium 2020: From Data to Knowledge in the Language Sciences
6-8 February 2020, Institute of Multilingualism, Fribourg, Switzerland

BOOK OF ABSTRACTS

Overview

Schedule
Keynote presentations
Workshops “Meet the Keynotes”
Poster presentations
Paper presentations
   Section 1: Types of data and data selection
      Session 1A
      Session 1B
   Section 2: Access to data and data collection
      Session 2A
      Session 2B
      Session 2C
   Section 3: Data management
      Session 3A
      Session 3B
   Section 4: Data interpretation
      Session 4A
      Session 4B
   Section 5: Challenging data: critique and validation
   Section 6: Data reporting

Important remark concerning the languages of the symposium: The main languages of
the symposium are English, French, German and Italian. All presentations will be held in the
language of the abstract printed in this document, unless otherwise stated. All presentations
not in English should feature English slides to help with understanding.
For discussions, groups should find a language concept comfortable to all, e.g. speak a
common language or mix spoken and heard languages as preferred.

info@irg2020.ch                                          v1.0 (24/01/20)                                                                2
Schedule
For changes please refer to our website or the printouts on site.

 Thursday, February 6
   from 10 a.m.    Registration & Poster installation
   11.30-12        Welcome (K0.02)
   12-14           Lunch [HEP I]
   14-15           Keynote: Tineke Brunfaut (K0.02)
   15.15-17        Session 4A (L1.06) | Session 1A (L1.08)
   17-17.30        Coffee break
   17.30-19.15     Session 1B (L1.06) | Session 2A (L1.08)
   19.15-20.15     Welcome Apéro

 Friday, February 7
   8.45-9.45       Keynote: Ingrid de Saint-Georges (K0.02)
   9.45-10.15      Coffee break
   10.15-12.00     Session 3A (L1.06) | Session 4B (L1.08)
   12.00-14.00     Lunch & Poster session (K1.03)
   14.00-15.45     Session 5 (L1.08)
   15.45-16.15     Coffee break
   16.15-18.00     Session 2 (L1.06) | Session 3B (L1.08)
   19-24           Conference dinner

 Saturday, February 8
   8.45-9.45       Keynote: Sarah Schimke (K0.02)
   9.45-10.15      Coffee break
   10.15-11.30     Session 2 (L1.06) | Session 6 (L1.08)
   11.30-13        Lunch [HEP I]
   13-15.30        Meet the Keynotes (K1.03, L1.06, L1.08)
   14.30           Coffee served
   15.30-16        Closing session (K0.02)

Keynote presentations
Garbage in, garbage out – assessing data collection instruments
Tineke Brunfaut, Lancaster University
                                                                            Thursday, 14h, K0.02
As researchers in the language sciences, we select, design or adapt instruments to collect
data on our topic of interest. To ensure meaningful and useful interpretations of the data we
gather, we need to establish that the research instruments themselves are valid for our
research purposes and population. If the research instruments are flawed, the conclusions we
draw – whether with respect to language theory, learning, teaching or assessment – will be
misleading, lack sufficient grounding, and will not be credible.
It follows that an assessment or evaluation of the data collection instruments used should be
a key step in any research project. This process, termed validation, involves obtaining evidence
for the quality of the instruments, and thus for the claims being made about them and about
the data and findings resulting from them. It requires collecting evidence to justify the
interpretations of participants’ scores or answers on the research instruments. While the
importance of validating research instruments is increasingly recognised among researchers
in the language sciences, in practice, it is still not systematically implemented and validation
efforts do not always meet the accepted standards for developing, using and evaluating
research instruments.
In this talk, I will draw on the field of language testing and assessment to explain current
conceptualisations of validity and validation, and to describe validation frameworks that can be
used in language sciences research. I will also show examples of how such frameworks can
be used in practice to validate research instruments.

Conceptualiser la notion de « donnée qualitative » : contexte, tensions et possibilités
Conceptualising the notion of “qualitative data”: context, tensions and possibilities
Ingrid de Saint-Georges, Université de Luxembourg
                                                                              Friday, 8h45, K0.02
In the public sphere as in academic research, data are today at the heart of many debates.
These debates are complex and often contradictory. In this presentation, we offer three
readings of them. We first show, through a historical reading, that the way data are conceived
of and approached in science depends in part on developments in the social, technological and
cultural field. A brief overview of these developments will be given in order to move beyond
certain views, still sometimes simplistic, of the relationship between qualitative and
quantitative sciences. Second, a critical reading of the contemporary conditions under which
science is conducted will be proposed. In particular, the current context in which science is
produced will be examined: the aim is to consider the possible repercussions of a culture of
audit, quantification and evaluation on so-called qualitative research practices. Third, a
pragmatic reading will reflect on the roles that sociolinguistics and ethnographic approaches
can play in developing new agendas and new thinking about data and how they are conceived.
Ultimately, then, it is the notion of data as ideology and as practice that will be addressed.
Arguments will also be set out to defend, against the imposition of normative epistemologies,
a plural way of approaching the analysis and use of data.

Die Interpretation von Online- und Offlinedaten in der Sprachwissenschaft
The interpretation of online and offline data in linguistics
Sarah Schimke, Technische Universität Dortmund
                                                                        Saturday, 8h45, K0.02
Many linguistic studies collect both online and offline data in order to capture what language
users know about individual linguistic phenomena. Online data here means data that give
insight into language processing while that processing is taking place, for example reading
times and eye movements during reading or during the processing of auditory stimuli. Offline
data means data that reflect the outcome of language processing, for example grammaticality
judgements. Very often, online and offline data can be collected for one and the same process,
for example when reading times are measured while participants read a sentence and the
participants then also give a grammaticality judgement on that sentence.
This talk sets out how the resulting data can relate to each other, and how the different
relationships can each be explained and interpreted.
In principle, online and offline data can show a very similar picture of linguistic knowledge,
or they can diverge. In the latter case, there are on the one hand findings in which knowledge
becomes visible online that does not show up offline, or not as clearly. This pattern arises
because many offline data presuppose conscious access to linguistic knowledge, which cannot
be taken for granted (see e.g. Höhle et al., 2016; Osterhout et al., 2006). On the other hand,
there are also studies in which offline data demonstrate that linguistic knowledge is present,
while the corresponding online data show that the application of this knowledge during
processing differs in speed and reliability across groups (see e.g. Pan et al., 2015; Roberts
et al., 2008).
The talk discusses possible interpretations of such patterns. These take into account
properties of the language users under study, in particular their age at the time of data
collection and their age at onset of acquisition of the language under study, as well as
properties of the specific experimental task and of the linguistic phenomenon. In sum, the
complexity of the findings underlines the value of applying multiple methods.

Workshops “Meet the Keynotes”
                                                              Saturday, 13h, K1.03, L1.06, L1.08

Fitting the puzzle pieces together: the benefits and challenges of mixed-methods
research
Tineke Brunfaut, Lancaster University
                                                                             Saturday, 13h, K1.03
The use of mixed-methods approaches has considerably increased in language sciences
research in recent years. Mixed-methods research involves the collection and analysis of both
quantitative and qualitative data within the same study. It is justified by the idea that it combines
the strengths of both qualitative and quantitative methods, that it allows a topic to be explored from
different perspectives, and that it helps uncover relationships between various aspects and
layers of the topic. An important characteristic of mixed-methods approaches is the purposeful
integration or linking of the various data strands as part of the data interpretation process. As
with any methodological approach, however, the suitability of mixed methods needs to be
carefully considered against the research questions. The use of this methodology also
presents its own challenges.
In this workshop, we will first look at examples of existing studies that have used mixed
methods. We will consider what types of research methods were combined in these studies,
and how methodological innovations in the language sciences might have enhanced
mixed-methods research. We will explore what role the different datasets generated through the
different methods played within each study, as well as their accompanying data analyses. We
will also look into how the different pieces of information were tied together in each study and
how they contributed to answering the study’s research questions. Second, we will explore the
potential and suitability of mixed-methods approaches to workshop participants’ own research.
We will discuss challenges you may have experienced in conducting mixed-methods research,
questions you may have concerning mixed-methods methodologies, and ideas or suggestions
you may have around mixed-methods research.

Theorizing and generalizing in fieldwork-driven language research
Ingrid de Saint-Georges, University of Luxembourg
                                                                              Saturday, 13h, L1.06
This workshop will be open to any questions participants have about their research. To launch
the discussion, however, we will focus on two related aspects of the research process that are
not often discussed in doctoral training in the language sciences: generalizing and theorizing.
       •   Generalizing: The idea that 'one cannot generalize' from case studies is regularly
           accepted as a fact in linguistic research. The first purpose of this workshop will be
           to question this assumption. Is it always true? And if one can never generalize from
           case studies, what might be the societal or intellectual impact of our qualitative
           research? We will examine different ways of understanding 'generalizing'. We will
           also discuss when and why we might want to adopt or avoid the discourse of
           generalizing altogether. The aim is not only to prepare oneself to answer a kind of
           criticism often addressed to qualitative research, but also to clarify the purpose of
           one's work.
       •   Theorizing: A major challenge for field researchers is to move from richly textured
           experiences which are diverse, subjective and piecemeal to the construction of a
           coherent and meaningful image of the field that matters to its actors, to decision-
           makers or to other researchers. In this part of the workshop, we will ask ourselves:
           When and how should we theorize? Should we wait until the observations are
           completed, or could it be interesting to theorize even prior to entering the field?
           Different strategies for theorizing will also be discussed, as will the reasons why it
           might be important to pay closer attention to our own theorizing processes.
The workshop will be interactive, focusing on the discussion of practical problems. Time
permitting, we will also consider the role of writing in theorizing and generalizing. At the end of
the workshop, a bibliography for further exploration of these questions will be made available.

How to research the same question in different types of language users – some
methodological considerations
Sarah Schimke, TU Dortmund University
                                                                            Saturday, 13h, L1.08
Linguistic research is often concerned with characterizing what language users know about a
specific language. There is no single method, however, that would allow privileged access
to this knowledge, as each method comes with limitations. While this in itself constitutes a
methodological problem, this problem is amplified by the fact that the same method may play
out differently in different types of language users.
For instance, in experimental research, adults may not be challenged by stimuli suitable for
children, and this may influence the way they use their linguistic knowledge. On the other hand,
children’s cognitive resources may make it impossible for them to process materials that were
designed for adults. Next to age, other variables, such as educational background or
motivation, may also strongly influence how language users respond to experimental
situations. Similar difficulties arise for non-experimental methods, such as corpus research or
interviews.
Given all this, researchers who want to measure the same construct in different populations
are faced with the challenge of developing measures that are appropriate for each population
and still yield results that can be compared to each other in a meaningful way.
In this workshop, we will look at this problem from different perspectives and discuss possible
strategies for different types of research questions, data and participating language users. We
will discuss existing examples of work comparing different populations (e.g. Järvikivi et al.,
2015; Schimke & Dimroth, 2018; Verhagen & Schimke, 2009). In addition, workshop
participants are encouraged to bring their research questions, research methods, or existing
data.

Poster presentations
                                    Friday, 12h-14h (and anytime during the conference), K1.03

Collecting data in a comparative study on conversion and class-changing affixation in
Present-Day English and French
Chloé Marie Debouzie, Université Lumière Lyon 2
Quantitative corpus-based studies require access to a wealth of data. My research focuses on
analysing the competition between two morphological word-formation processes: lexical class-
changing affixation and conversion in Present-Day English and French. I have collected a
dataset of affixed and converted words (for example cheat_N – cheater_N, nanny_V – nannify_V,
googler_V – googliser_V) to analyse the presence or absence of competing pairs.
First, I identified the data required for my study, using lists of prefixes and suffixes (there exists
no “set list” of affixes, therefore I compiled my own lists). Identifying conversion is more
problematic, as by definition, the input and the output are formally identical. This constitutes
one of the main challenges in my data collection.
To collect a manageable quantity of data, I restricted my scope to Present-Day English and
French, and decided to study words coined since 1950. I collected data using the online
versions of the Oxford English Dictionary and Le Grand Robert de la langue française. Using
dictionaries raises several methodological issues: the reliability of etymological data, the
vagueness of dating in French (some words are dated only as “20th century” or “middle of
the 20th century”), and the arbitrariness of which words dictionaries list.
I then searched corpora (the Corpus of Contemporary American English and the Corpus de
Référence du Français Contemporain) to collect further data. Affixed words can be retrieved
with wildcards (selecting words beginning or ending with a specific affix), but finding
converted words is much more problematic: words in these corpora are tagged for their part of
speech, but tagging errors exist.
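As a rough illustration of the two kinds of search described here, the following Python sketch runs a wildcard query for an affix over a part-of-speech-tagged word list and flags conversion candidates, i.e. identical forms attested with more than one tag. The token list, the tag labels and the function names are invented for the example; real corpus interfaces and tagsets differ, and tagging errors would surface here as false candidates.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical (word, POS tag) pairs standing in for a tagged corpus.
TOKENS = [
    ("cheat", "N"), ("cheat", "V"), ("cheater", "N"),
    ("nanny", "N"), ("nanny", "V"), ("nannify", "V"),
    ("google", "V"), ("table", "N"),
]

def wildcard_search(tokens, pattern):
    """Return the distinct word forms matching a wildcard such as '*ify'."""
    return sorted({word for word, _ in tokens if fnmatchcase(word, pattern)})

def conversion_candidates(tokens):
    """Return forms attested with more than one POS tag (possible conversion)."""
    tags_by_form = defaultdict(set)
    for word, tag in tokens:
        tags_by_form[word].add(tag)
    return {form: sorted(tags) for form, tags in tags_by_form.items() if len(tags) > 1}

suffixed = wildcard_search(TOKENS, "*ify")
candidates = conversion_candidates(TOKENS)
```

Each candidate would still need manual checking, since a form carrying two tags may reflect a tagging error rather than genuine conversion.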
This poster will provide an opportunity to discuss the advantages and drawbacks of building a
dataset of constructed words using dictionaries and existing corpora.

Students’ language choice in Swedish compulsory school - expectations, learning and
assessment
Ingela Finndahl, University of Gothenburg
The aim of this poster is to receive feedback on the interpretation of data. The study, a PhD
project, is concerned with young learners’ choice of a second foreign language in the Swedish
school context.
The main data collection will be carried out in school year 2019/2020 as an ethnographic case
study. The project aims to investigate young language learners’ study of a second foreign
language (SFL) in a Swedish elementary school, focusing on their expectations, perceived
learning and achievements. A multi-methods design has been chosen, aiming to capture
learners’ beliefs through questionnaires and interviews, and learning practices and
assessment by observations and interviews.
Crucial questions concern the analysis and interpretation of the data, given the ethnographic
approach chosen. The statistical analysis of the questionnaires, the coding of the observations
and the transcriptions of the interviews will be work in progress by the time of the conference.

These are all aspects of my data that I wish to discuss from the point of view of analysis and
possible interpretation, and I look forward to receiving feedback from peers and experienced
researchers.
The contextual background to my study is that Swedish pupils choose an SFL after English in
year 5 and begin these studies in year 6, at the age of 12. A language choice is obligatory.
About 80 % of all pupils normally choose French, German or Spanish, but they can also decide
on additional English or Swedish, mother tongue (if other than Swedish) or sign language. The
focus of the study will be on the choice of a second foreign language, French, German or
Spanish, but other options will also be taken into account.

Measuring and enhancing the migrants’ comprehension of Italian administrative texts
Giulia Lombardi, University of Genoa
The aim of the research was to investigate the readability of Italian administrative documents
for foreigners, and to offer clearer alternatives or set up clarity guidelines that might
improve access to these documents. Too often, Italian administrative language is unnecessarily
difficult for those who are still beginners in Italian as an L2 yet, in order to become
resident in Italy, need to deal with various red tape and formalities. We decided to carry out
a quantitative analysis. The starting point of the research, in 2017, was a computational-
linguistic analysis of a synchronic mono-thematic corpus comprising the most important forms
foreigners have to submit in Italy, in order to find which lexical, semantic and syntactic
structures were too difficult for foreigners. Then, in 2018, 101 students of Italian L2 were
tested on their comprehension of authentic Italian institutional texts. Many difficulties were
singled out, and the correlation between personal factors (such as age, schooling, mother
tongue and motivation) and reading comprehension was analyzed by multiple regression. All the
data were collected in order to design specific language policies and practices. At the
beginning of 2019, 61 students were tested on simplified texts; among them, 32 attended a
specific language course. The data were analyzed with a t-test and an ANOVA: the improvement
in comprehension of the simplified texts is statistically significant (df = 59, p-value =
1.066e-09), while manual reformulation seems to have a greater impact than automated lexical
simplification. The language course is also a predictor of better comprehension (F value =
4.56, p-value = 0.037 *). The final results show what could be effective in enhancing
migrants’ comprehension of the content of such important texts.
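The abstract does not say which t-test design was used. Purely as an illustration, the Python sketch below computes a pooled two-sample (Student) t statistic. With 61 participants split into two groups, such a test would have 61 - 2 = 59 degrees of freedom, which matches the reported df, though that reading is an assumption. The scores and the function name pooled_t are invented and are not the study's data.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (sample variances, n - 1 denominator).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, df

# Invented comprehension scores for two hypothetical groups.
course = [7, 8, 6, 9, 8]
no_course = [5, 6, 7, 4, 6]
t, df = pooled_t(course, no_course)
```

A p-value would then be read from the t distribution with df degrees of freedom, for instance with a statistics package.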

Building a specialised corpus – a case study of generics in Norwegian
Anna Kurek-Przybilski, Adam Mickiewicz University in Poznań
Existing research on genericity focuses mainly on sentence analysis. Sentences created for
the sake of a given study do not contain a broader generic context, which makes sentence
analyses somewhat incomplete. One solution is to conduct corpus research on a tailor-made
corpus of generic texts, as the phenomenon is not tagged in any of the existing corpora.
What is more, not every text genre contains generic expressions, so creating a database of
many different text types may not give the desired results.
In order to perform a study on genericity in Norwegian, 170 generic texts were retrieved from
the online encyclopaedia ‘Store norske leksikon’, yielding a data set of over 180,000 words.
Each of the texts consisted of at least one paragraph and belonged to one of 5 categories: 1)
people, 2) animals, 3) plants, 4) tools, 5) other. The texts were tagged using R.
Choosing an encyclopaedia as a source makes the data homogeneous. This has both
advantages and disadvantages. On the one hand, it limits the data analysis. On the
other hand, a homogeneous data set is easy to manage in terms of manual corpus tagging and
text sorting since all the samples contain the studied phenomenon.
The type of data chosen for the study on generics in Norwegian proved crucial for successful
analyses. Narrowing genres to encyclopaedic texts not only provides the data on generics in
context but also guarantees that each of the samples will include generic nouns and noun
phrases. This approach to studying genericity in Norwegian is innovative and can lay the
foundation for further research on the phenomenon.

A diachronic look at the English passive: Distributional semantics of be vs get
Axel Bohmann, Mirka Honkanen, Julia Müller & Miriam Neuhausen
Albert Ludwig University of Freiburg
In this poster, we discuss the method and first findings of a distributional semantic analysis of
the passive construction in a large diachronic corpus of American English. We compare the
distribution of the canonical be-passive and the more recent get-passive (Schwarz 2018) to
see whether the alleged connotations of the latter (adversativity, agentivity/responsibility, etc.)
(Huddleston & Pullum 2002) are empirically verifiable and historically stable.
Distributional semantics (Erk 2012; Perek 2018) is a corpus-based method that allows
investigating the types of lexical verbs that commonly occur with each of the passive auxiliaries
and visualizing their semantic similarity or distance. It is based on the assumption that words
that occur in similar contexts—i.e. have many of the same collocates—have similar meanings
as well. This approach enables us to follow the individual development of each passive
construction over time as well as compare them at different points in time.
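The distributional assumption can be made concrete with a small Python sketch: each verb is represented by counts of the words occurring near it, and two verbs are compared by the cosine of their count vectors. The toy sentences, the window size and the function names are invented for illustration; the abstract does not specify how the collocate vectors were actually built or weighted.

```python
from collections import Counter
from math import sqrt

# Invented toy sentences; a real study would use a large corpus.
SENTENCES = [
    "he was arrested by the police",
    "she got arrested by the police",
    "he got fired by the boss",
    "the book was written by the author",
]

def collocate_vector(target, sentences, window=2):
    """Count words occurring within `window` tokens of each `target` token."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            if word == target:
                start, end = max(0, i - window), i + window + 1
                counts.update(words[start:i] + words[i + 1:end])
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

arrested = collocate_vector("arrested", SENTENCES)
fired = collocate_vector("fired", SENTENCES)
written = collocate_vector("written", SENTENCES)
```

In this toy data, cosine(arrested, fired) is higher than cosine(arrested, written), but only because of the sentences chosen; meaningful similarities require large amounts of text.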
We apply this method to the Corpus of Historical American English (Davies 2010–), which
consists of 400 million words of written American English from the 1810s–2000s. First, all
instances of the passive voice in the corpus were extracted with a Python script. The thousand
most frequent verbs that occur with both auxiliaries were included in the analysis. We represent
these verbs as vectors in semantic ‘space’ on the basis of their collocate frequencies. In this
visualization, verbs that occur in similar contexts cluster together.
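The extraction script itself is not described in the abstract. The following Python sketch shows one simple way such a step could look, under strong assumptions: tokens are already tagged, the tag "VVN" marks a past participle, and only participles immediately following a be- or get-form are counted. All names, tags and data are hypothetical.

```python
from collections import Counter

BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
GET_FORMS = {"get", "gets", "got", "gotten", "getting"}

# Hypothetical tagged tokens; "VVN" is assumed to mark a past participle.
TAGGED = [
    ("he", "PRON"), ("was", "VB"), ("arrested", "VVN"),
    ("she", "PRON"), ("got", "VV"), ("arrested", "VVN"),
    ("they", "PRON"), ("got", "VV"), ("fired", "VVN"),
    ("it", "PRON"), ("was", "VB"), ("written", "VVN"),
]

def passive_counts(tagged):
    """Count participles that directly follow a be- or get-form."""
    counts = {"be": Counter(), "get": Counter()}
    for (word, _), (following, following_tag) in zip(tagged, tagged[1:]):
        if following_tag != "VVN":
            continue
        if word.lower() in BE_FORMS:
            counts["be"][following.lower()] += 1
        elif word.lower() in GET_FORMS:
            counts["get"][following.lower()] += 1
    return counts

counts = passive_counts(TAGGED)
# Verbs attested with both auxiliaries, as required for the comparison.
shared = sorted(set(counts["be"]) & set(counts["get"]))
```

A pattern this simple misses passives with intervening adverbs and also picks up adjectival uses, so real extraction would need further filtering.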
Our analysis demonstrates an innovative method that relies on the availability of very large
amounts of corpus data. Such data offer a new way of looking at the interface of semantics
and structural change, based on a quantitative approach to semantics.

Sprachbiografien junger Erwachsener aus Romanisch- und Italienischbünden
Language biographies of young adults from the Romansh and Italian areas of Grisons
Flurina Kaufmann-Henkel, Sabrina Sala, Pädagogische Hochschule Graubünden
This project investigates the language biographies of young adults from the Romansh-speaking
and Italian-speaking areas of Grisons. Both Romansh and Italian are considered minority
languages in the canton of Grisons. The study is interested in how the participants, who grew
up in mixed-language families and have gone through at least one change of language area,
experience, reflect on and comment on their multilingualism. These young adults, who speak a
further language in the family alongside Romansh or Italian respectively, see themselves as a
minority within the minority, facing particular linguistic challenges.
19 young adults from the Italian-speaking area and 21 from the Romansh-speaking area were
interviewed. The data collection method consisted of a biographical-narrative interview
followed by a guided interview. In each case, the stimulus for the spontaneous narrative was
the creation of a language portrait.
At the time of the symposium, some transcripts are available, excerpts of which are shown on
this poster. In addition, individual language portraits can be viewed together with the
corresponding transcript excerpts. The research team presents a structuring content analysis
with deductive category application on the data, but would also like to put further
content-analytic procedures up for discussion.

Chroniques de terrains - L’ethnographie, une question de terrain
Tales from the Field – Formulating questions during fieldwork
Kevin Petit Cahill, ICAR, Université Lumière Lyon 2
In ethnographic sociolinguistics, research questions do not pre-exist the field but are
constructed and reformulated through fieldwork, via a constant back-and-forth between theories
and participant observation. The aim of this poster is to illustrate this through my
experience of doctoral research.
When I began my research, I was interested in the Irish language revitalisation movement, and
more specifically in a practice that has been popular for more than a hundred years: spending
the summer in holiday camps (or summer colleges) to learn the language. I first positioned
myself within a research tradition on endangered languages that aimed to “reverse language
shift” (Fishman 1991), mainly via the production of speakers. I therefore focused on the
“technical” effects of the colleges on the pupils, and more specifically on the factors
influencing their motivation and hence their language proficiency. The distinctive feature of
these colleges is that they offer immersion teaching in officially Irish-speaking regions, the
Gaeltacht. But once in the field, I realised that the summer college experience does not
consist in immersing oneself in a naturally present monolingual Irish environment. Since the
Gaeltacht is in fact bilingual, the experience consists rather in producing this space
imagined as monolingual. Moreover, as the technical effects remain relatively limited, my
questions then shifted to the “properly magical efficacy of initiation and consecration” of
pedagogic action (Bourdieu 1981). I am therefore now interested in the role of the summer
college experience in the naturalisation (or contestation) of social categories such as the
Gaeltacht.
It is by adopting an emic and interpretivist perspective characteristic of ethnography (i.e.
attending to how actors create meaning through their social practices) that I was able to
reformulate my research questions as the fieldwork progressed.

Evaluation de la prononciation en français L1/L2, entre données qualitatives et
quantitatives
French L1/L2 pronunciation evaluation: between qualitative and quantitative data
Marion Didelot, Université de Genève
Our doctoral research concerns the reception and evaluation of native and non-native
accented speech in French by different groups of listeners. We draw on work in folk
linguistics (Niedzielski & Preston 2003), which examines what non-specialist speakers think
and say about language and compares this with what they actually do. Our research thus has
two components: on the one hand, a perception experiment in which listeners rate, on a
Likert scale and “blind” (that is, on the basis of an audio excerpt alone), various excerpts
produced by native and non-native speakers of French, answering sociolinguistic and
linguistic questions; and, on the other hand, semi-structured interviews that allow us to
explore in greater depth certain themes raised in the perception experiment.
We thus obtain both quantitative and qualitative data on accented speech. For our blind
perception experiment, we chose two speakers per variety of French under evaluation, and we
seek, as far as possible, to form relatively homogeneous groups of listeners.
Our presentation will focus on the links between quantitative and qualitative data and on
the limitations of our results. While the value of the chosen approach lies above all in the
complementarity offered by studying representations/attitudes that are both conscious and
less conscious, it also seems advantageous for the interpretation of the data collected.
Analysis of the interviews could thus lead us to (re)interpret our quantitative data and,
perhaps, to explain certain surprising results where they arise. Finally, the question of the
generalisability of our results also arises, notably because of the choice of speakers
selected to represent a variety of French and the representativeness of the listeners in our
study.

Paper presentations
Section 1: Types of data and data selection
Session 1A

                           Session chair: Katja Fiechter (University of Fribourg/Switzerland)
                                                                   Thursday, 15h15-17h, L1.08

Benefits and limitations of a combinatorial approach to agentivity
Célia Hoffstetter, Grenoble Alpes University
Agentivity has often been conceptualized as a semantic feature of a category of verbs called
“agentive verbs”, whose grammatical subject can only be animate and “thought of as the willful
source or agent of the activity described in the sentence” (Gruber 1965). However, this
approach does not satisfactorily account for a great number of cases where inanimate entities
“do” something. For instance, the inanimate subject in “The stone flew across the window” can
hardly be "willful" as is the human-animate subject in “Charles Lindbergh flew across the
Atlantic”, although it retains "some notion of agency" (Quirk et al. 1985) which needs to be
further specified. Drawing on constructional approaches in cognitive linguistics (Goldberg
1995, Fillmore & Kay 1999), I argue that agentivity is not merely a semantic property of verbs
but emerges from constructions, i.e. combinations of words. In this paper, I will explain
the benefits of examining subject-verb combinations in a corpus, as well as the limitations that
are necessarily involved in the wording of corpus searches. To that end, I will introduce
Lexicoscope, a corpus analysis tool developed by Kraif and Diwersy (2016) for the
study of combinatorial profiles of lexical entries, and show how it can be used on a large corpus –
more than 30 million words – containing a wide range of inanimate referents that may be
considered active in different respects. More specifically, I will compare the results produced
by two types of searches, one of which focuses on the noun phrase (e.g. “the stone”), and the
other on the verb phrase (e.g. “flew”) to show the differential impact of such a methodological
choice on data collection.

Combining conversation analysis and experimental methods in the study of
comprehension of interaction by L2 learners
Simone Morehed, University of Fribourg/Switzerland
Comprehension is crucial in L2 interaction: without it, the learner cannot interact
appropriately. However, comprehension in interaction is largely absent from research
on interactional and pragmatic competences.
Production is studied through conversation analyses of interactions between L2 speakers,
where research shows that although L2 learners develop their interactional proficiency
(Skogmyr et al. 2017), they often do not express themselves in the same way as the L1
speakers, and might encounter disruptions even at advanced levels.
Comprehension is included in experimental studies of specific pragmatic markers. Even
though their object of study is authentic oral interaction, these studies mostly use written
or non-authentic oral material (Culpeper et al. 2018).
Previous studies conclude that L2 learners often have different comprehension issues in
interaction, but we do not yet know which aspects of interaction are the most crucial for the L2
learner’s comprehension. There is a clear need for research focusing on comprehension in
interaction.
In this presentation we will discuss the methodological potential and challenges of studying
comprehension by combining conversation analysis with an experimental approach, using
authentic corpora as material (Kendrick 2017). We will discuss the use of authentic material in
an experimental study, more precisely the variation and representativeness of the material, the
control of variables, the level of the conversation analyses (micro/macro), and the length of the
interactions (role of the sequential context).

Die Gratwanderung zwischen freien und vorgegebenen Antworten in einer Online-
Umfrage zu schweizerdeutschen Dialekten
The dilemma of analyzing open-ended and multiple choice questions in an online survey on
Swiss-German dialects
Melanie Bösiger, Universität Freiburg/Schweiz
Swiss German is a diverse language, and speakers therefore sometimes have several options for
expressing a given state of affairs. Certain formulations are preferred, while others are
used rather rarely. Possessive constructions are a case in point: alongside dative
constructions with von (‚de Teddy vo de Anna‘) or with a possessive pronoun (‚de Anna ihre
Teddy‘), the genitive also occurs in Swiss German: ‚s Annas Teddy‘. The genitive is widely
regarded as archaic and is used only rarely, but it is certainly found in formations with
proper names and appellatives. It is of interest to dialect research in that its rarity
gives rise to a certain uncertainty in its formation. Article use in particular has
undergone, and is still undergoing, change. But how does one collect data on rarely
occurring phenomena in dialectology?
In two successive online surveys conducted as part of two dissertation projects, an attempt
was made to track down the genitive. The difficulty lay precisely in the fact that the
genitive is one of several options that speakers can use to form possessive constructions.
With pure translation questions (“How do you say 'Annas Teddybär' in your dialect?”), a great
deal of data is therefore lost, because respondents choose a different construction even
though they could also form the genitive in their dialect. The question must therefore
contain certain constraints that elicit the genitive without suggesting it. The talk will
report on this balancing act: the results of the two online surveys are compared, so that the
advantages and disadvantages of the different approaches can be shown.

Session 1B

                                                                            Session chair: tba
                                                                Thursday, 17h30-19h15, L1.06

Fieldwork, corpora, and tailored methods in dialect syntax
Cameron Morin, University of Paris, Jack Grieve, University of Birmingham
This paper focuses on some empirical problems and solutions in the study of rare language
variation, through the case study of dialect syntax in English, and drawing on substantial
fieldwork by the author.
Multiple modals are peripheral but noticeable constructions in several British and American
basilects. The following examples come from Borders Scots:
    (a) He’ll can help us tomorrow.
    (b) They might could be working.
Investigating these features is an empirical and methodological challenge. Firstly, classic
corpus-based enquiries (AMC Edinburgh) prove insufficient. This is
presumably due to the marginality of the structures, even in the varieties where they have been
suggested to occur.
Alternative sources of data collection may prove more useful, such as fieldwork
experimentation directly interacting with the speech communities concerned. In January 2018,
I conducted a field experiment in the town of Hawick (Borders), distributing a questionnaire to
approximately 60 respondents from various age groups and occupations. The questionnaire
was semi-structured, and revolved around tasks of judgment elicitation and syntactic
manipulations to get a quantitative and qualitative picture of double modals in this
representative locus of Borders Scots which could never have been provided through a corpus.
However, do intuition-based judgments unfailingly deserve our trust? There are serious
empirical issues with these alternative methods, and close scrutiny must be brought to the
ways of avoiding their biggest pitfalls.
These new problems may well be offset, however, by a new combinatorial and
multidimensional approach to rare dialect syntax, which reappraises both corpus compilation
and fieldwork and cross-examines specific quantitative and qualitative aspects of their
individual components to establish the coherence of the resulting picture. This view is on the
rise in studies of language variation and change, and it is one I am currently developing for my
doctoral investigation of multiple modals in English.

Dall’idea ai fatti: i compromessi nella ricerca
From the idea to the facts: compromising in research
Dalila Dipino, Università di Zurigo
Sociolinguistic research in recent decades has initiated a fruitful reflection on the work of
collecting and constructing linguistic data (cf. D’Agostino, 2006; Calamai, 2004), showing
that developing research methods suited to one’s aims is an extremely complex undertaking.
In our case too, the phase of designing and composing the corpus proved very arduous and
delicate.
The research project in question aims to study the realisation of a suprasegmental phonetic
feature, vowel length, in some northern Italo-Romance dialects belonging to three different
subgroups of Ligurian (Forner, 1988). The original objectives were very ambitious: the plan
was to collect spoken-language data from more than 25 informants for each of the three
dialect groups, evenly distributed by sex and age, so as to obtain a robust, balanced and
representative corpus. Equally ambitious was the idea of running tests to collect
heterogeneous materials in a variety of prosodic and pragmatic contexts: from spontaneous
speech to the Discourse Completion Task, and from translation tasks to the Map Task (for an
overview see Calamai, 2015).
The initial objectives were progressively scaled down in the face of the difficulties of
designing, on the researcher’s part, and of completing, on the subjects’ part, so complex a
questionnaire. First of all, we experienced the difficulty of designing tests suited to very
different ages. Even the presentation of the stimuli proved a delicate step, since the
mediation of Italian constitutes a dangerous source of pressure on dialect productions. Last
but not least was the difficulty of recruiting informants, given the scarcity of dialect
speakers, especially young ones, and the numerous refusals.
In our contribution we will illustrate the attempts to resolve the problems set out above
(from reworking the research methods, to creating innovative tests and playful activities, to
the support of local authorities) and the questions that remain unresolved.

Using corpus data for pragmatic analysis: Researching response tokens with the
International Corpus of English (ICE)
Annika Blum, University of Bayreuth
Until recently, corpus linguistic studies only rarely considered pragmatic phenomena. The
relationship between these two fields could have been summarized as “parallel but often
mutually exclusive and excluding” (Romero-Trillo 2008: 2): While corpus linguists prefer to
work quantitatively and read texts vertically, pragmaticists work mostly qualitatively and tend
to read texts horizontally including contextual information on the variable under investigation.
The relatively new field of corpus pragmatics, however, combines corpus linguistics with
pragmatics and promises that new insights will be gained through this approach (Rühlemann
& Aijmer 2015).
The proposed paper is linked to a PhD project on response tokens (RTs) in the field of
variational pragmatics. Studies of RTs long focused exclusively on their use in the
Inner Circle varieties of American, British and Irish English (McCarthy 2002, 2003; Murphy
2012; O’Keeffe & Adolphs 2008; Wong & Kruger 2018). This PhD project seeks to fill a
research gap by adding variational pragmatic and corpus pragmatic research on RT use
addressing different oral text types in Outer Circle Englishes, i.e. Nigerian and Philippine
English.
It will be shown how corpus pragmatics can contribute to the study of RTs in different Outer
Circle varieties of English and text types. Given the variable under investigation, only dialogic
exchanges, such as face-to-face conversations, phone calls or broadcast interviews and
discussions, will be considered. Consequently, written text types and spoken monologic text types
will be excluded from the study. Using these spoken, dialogic sub-corpora of the International
Corpus of English (ICE), the paper will elaborate on the choice of data sets and the choice of tools
and methods for data analysis and interpretation. Additionally, it aims to reflect on the limits of
working with secondary data by highlighting the challenges that researchers in pragmatics
have to deal with when working with corpora that do not contain pragmatic annotation.

Section 2: Access to data and data collection
Session 2A

                        Session chair: Kevin Petit Cahill (ICAR, Université Lumière Lyon 2)
                                                                Thursday, 17h30-19h15, L1.08

Données quantitatives en territoire insulaire : quel(s) modèle(s) interprétatif(s) ?
Quantitative data in an insular territory: which interpretative model(s) can be used?
Cleudir Filipe da Luz Mota, Laboratoire DyLis, Université de Rouen Normandie
The Republic of Cape Verde is a small island country with an estimated population of around
538,000 (Instituto Nacional de Estatísticas, 2018). Its sociolinguistic situation is marked
by the coexistence of the Cape Verdean language (a Portuguese-lexified creole formed during
colonisation; today the national language, spoken by almost the entire population in
informal communicative situations) and Portuguese (the official language, used in formal
contexts).
Since independence in 1975, the archipelago’s successive governments have attempted to adopt
a language policy that would lead to Cape Verdean being made an official language. This has
given rise to a major social and political debate about the consequences of these
“interventions” (Calvet, 2017).
As part of our field study carried out on four islands of Cape Verde (Santo Antão, São
Vicente, Santiago and Fogo), we gathered Cape Verdeans’ opinions on the language policy
measures adopted. Having adopted an approach that is both quantitative (Berthier, 2000) and
qualitative, and having chosen to conduct structured surveys (Blanchet and Chardenet, 2011),
we administered a total of 289 questionnaires (in Cape Verdean) in public spaces.
Since respondents on each island use their own variety of Cape Verdean, their opinions vary
widely. A new challenge then arose: which interpretative models should be adopted to analyse
and synthesise data collected in an archipelagic field site? Indeed, there are many
variables to take into account (among others, level of schooling, languages spoken and island
of origin) in a study carried out at the “macro” level.
Our survey thus enabled us to take into consideration numerous methodological issues (linked
to the tools and to the research site) that should be accounted for when conducting
sociolinguistic surveys in an island setting where identity issues are strongly present.

Radicalité djihadiste et médias sociaux : enjeux, méthodes et défis liés à la sélection et
à la récolte de données sensibles
Jihadist radicality and social media: Issues, methods and challenges related to the selection
and collection of sensitive data
Laurène Renaut, Université de Cergy-Pontoise
This paper, situated at the intersection of several strands of the language sciences (applied
linguistics and discourse analysis), sets out to examine, in the context of online jihadist
radicalisation, methods for selecting and collecting sensitive data from social media.
Our research is based on the public data of 100 radicalised Facebook profiles (60 men and 40
women, classified according to the terrorist organisation they claim allegiance to and the
level of activity of their accounts), and hence on a digital corpus anonymised for reasons of
security and confidentiality. The constitution of our corpus required a phase of observation
of the “jihadosphere”, that is, a preparatory investigation based on surveying radicalised
accounts in order to gain a footing in the field under study. This work then led to a phase
of characterising, or delimiting, the territory, aimed at establishing in advance a grid of
criteria, taking care to delineate the category “jihadist” in comparison with other
categories such as Salafist accounts.
While our thesis examines the evolution of the semio-discursive strategies deployed to
present oneself as “jihadist” on social networks between 2015 and 2019, this paper will focus
on the challenges encountered rather than on the results of the analyses carried out. From
this perspective, we will discuss the difficulties of accessing this corpus and the problems
relating to the necessary anonymisation of the researcher conducting this study, who uses
Facebook avatars for security reasons. We will also address the obstacles overcome both in
the choice of data and in their collection (censored accounts, changes to the GDPR and the
issue of web scraping).

Session 2B

                                                           Session chair: Philippe Humbert
                                       (University of Teacher Education Fribourg/Switzerland)
                                                                      Friday, 16h15-18h, L1.06

Observer et interpréter « chez soi » : entre chercheuse et actrice sociale
Observing and interpreting as an “insider”: between researcher and social actor
Salomé Molina Torres, Université Sorbonne Nouvelle - Paris 3
As part of my doctoral research, I am conducting a multi-sited ethnography in which I examine
the sociolinguistic issues at stake in the process of producing a Colombian imagined
community (Anderson, 1983) in Paris. The methodology of my study is one of observant
participation (Rötterink, 2008) within Colombian networks and spaces in Paris. Being a
Colombian migrant myself, I arrived in the field before it even became a space for
sociolinguistic reflection. This makes it easier to establish contact with certain networks,
but it also poses a difficulty with regard to the way I look at the socio-linguistic
phenomena that interest me. The subjective character of my research has, from the outset,
been both a motivation and a major source of questioning for my ethnographic approach. While
my belonging to the Colombian migration contributes to my academic reflection, my personal
involvement represents a possible bias of which I am aware.
Given that my interpretation is influenced by the relationships I have built as a Colombian
and as a researcher, I constantly question my role within this migration with regard to the
categories attributed to me there (Colombian, student, researcher, friend, young person…).
Is it necessary, and possible, to draw a boundary between a migrant/Colombian self and a
researcher self? While ethnographers are no longer expected to efface themselves from the
field in order to maintain a stance of neutrality (Volvey, 2014), debates are still centred
on their status as researchers. Yet the ethnographer is not only a researcher in the field;
they invest themselves in it personally. How does their engagement influence their
interpretation? To go beyond a dichotomous understanding (closeness-distance;
involvement-detachment) of an “anthropology at home” (Ouattara, 2004),