What's up, Switzerland? Final workshop - Elisabeth Stark, University of Zurich - What's up, Switzerland
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
What‘s up, Switzerland?
Final workshop
Elisabeth Stark, University of Zurich
estark@rom.uzh.ch
University of Zurich
03/11/2018The project • Funded by the Swiss National Science Fundation • Project sum: CHF 1'597'904 • Project leader: Elisabeth Stark (estark@rom.uzh.ch) • Involved Universities: Zurich, Bern, Neuchâtel, Leipzig • Project duration: 36 months (1/1/2016 – 31/12/2018) plus extension 03/11/2018 2
Two Overall Research Questions
• What do Swiss WhatsApp messages look like? What has changed overall
between Swiss SMS and Swiss WhatsApp messages, and why (as regards
linguistic structures, use of images in a broad sense, spelling, register-
specific style, individualization vs. accommodation)?
• What is said / done by the individual users and the media in/on WhatsApp
messages and chats, in relation to the findings for question 1?
03/11/2018 3Aims of this workshop 1. Present and discuss results of doctoral students and post-docs • …with the respective external experts, but also • the whole audience! 2. Get valuable guidance for the last part of their work: writing up! 03/11/2018 4
Other large-scale WhatsApp copora
• Siebenhaar et al.: What‘s up, Deutschland?
• Beißwenger et al.: MoCoDa (Mobile Communication Database): Aufbau
einer Datenbank zur digitalen Kurznachrichtenkommunikation (WhatsApp,
SMS & Co.) als Ressource für Forschung, Lehre und Unterricht. (ongoing)
Non-public WhatsApp corpora
• Hilte et al. (Antwerpen): Flemish Online Teenage Talk
• Lieke Verheijen (Groningen): Dutch WhatsApp collection
Lists of CMC corpora:
• www.cmc-corpora.ch
• www.clarin.eu/resource-families/cmc-corpora
03/11/2018 5Subproject A: Language(s) of WhatsApp: Verbal Periphrases
(VP) and Argument Drop (AD)
• Research question:
VP and AD as register-specific features (in the sense of Biber 1995) and/or
mainly technologically provoked?
• Lead: Elisabeth Stark (Zurich), Silvia Natale (Bern)
• Doctoral students: Franziska Stuntebeck (Fr 14.45 – 15.30), Rossella Maraffino
(Sa 9.45 – 10.30)
• Collaboration partners:
– Florence Lefeuvre (Université Sorbonne Nouvelle, Paris 3)
– Liliane Haegeman (Ghent University), Fr 14.00 – 14.45: Aspects of subject
omission in the diary register
– Christiane von Stutterheim (University of Heidelberg), Sa 11.00 – 11.45:
Event unit formation under a cross linguistic perspective
– Monique Flecken (Max Planck Institute for Psycholinguistics, Nijmegen), Sa
09.00 – 09.45: Influences of aspect on event processing 6Subproject B: Language Design in WhatsApp: Icono/Graphy
• Research question:
nature and function of graphic strategies, especially new sets of
iconographic signs (emojis).
• Lead: Christa Dürscheid (Zurich), Federica Diémoz (Neuchâtel)
• Postdocs: Christina Siever (Thu 13.30 – 14.15), Etienne Morel (Fr 11.45 –
12.30) /Silvia Natale (Fr 11.00 – 11.45).
• Collaboration partners:
– Jürgen Spitzmüller (University of Vienna), Thu 14.15 – 15.00:
Mediatised Lifeworlds - Young people's narrative constructions,
connections and appropriations: Introducing a Research Platform
– Marie-José Béguelin
03/11/2018 7Subproject C: Individuals in WhatsApp
• Research question:
Individual vs. group variation, patterns of accommodation in interaction.
• Lead: Beat Siebenhaar (Leipzig)
• Doctoral student: Samuel Felder (Fr 09.45 – 10.30)
• Collaboration partners - external expert:
Michael Beißwenger and Steffen Pappert (University of Duisburg/
Essen), Thu 15.30 – 16.15: Use of emojis in a didactic peer-feedback
setting: A pragmatic analysis and description framework
– Jannis Androutsopoulos (Hamburg)
– Peter Schlobinski (Hannover)
03/11/2018 8Subproject D: The Cultural Discourses and Social Meanings of
Mobile Communication
• Research question:
What does the public discourse on graphic mobile communication via
WhatsApp (and SMS) look like?
• Lead: Crispin Thurlow (Bern)
• Doctoral student: Vanessa Jaroski (Thu 10.30 – 11.15)
• Collaboration partners:
– Lauren Squires (Ohio State University).
– Ana Deumert (University of Cape Town) Thu 09.15 – 10.00:
Sociolinguistics and Mobile Communication - Looking back and looking
ahead, a view from the global south
– Rodney Jones ( Reading University, England) Thu 11.15 – 12.00:
GDPR, digital surveillance, and the semiotic coercion of consent
9Where we stand
• All doctoral students and the postdocs have finished their empirical analyses
and are now starting to write up.
• Students wrote 10 papers based on the corpus.
• 111 presentations were given by the team.
• 32 publications with a link to the project / the corpus were published. Many
more are in preparation.
• The corpus was used by 23 people outside the project for their research.
• 123 articles about the project appeared in the printed press, radio or
television until now.
• The project was also presented to a general public, e.g. at the Scientifica
2017 (UZH, A. Göhring), at the Kinderuniversität 2017 (UZH, Ch. Siever) or at
the Wissenschaftsfestival 2018 (UZH, Ch. Dürscheid).
03/11/2018 10Some selected presentations:
• 09/06/2017: Samuel Felder/Beat Siebenhaar: Individual, accommodation,
synchronisation. The use of emojis in WhatsApp communication. International
Conference on Language Variation in Europe, Malaga.
• 19/07/2017: Samuel Felder: Stylistic Variation as a Means for Identity
Construction in WhatsApp Interactions. 15th International Pragmatics
Conference, Belfast.
• 21/07/2017: Etienne Morel and Cécile Petitjean: "Hahaha": How and why to
produce laughter in WhatsApp conversations. 15th International Pragmatics
Conference, Belfast.
• 27/07/2017: Christa Dürscheid/Christina Siever: On the Relation of Writing and
Images in Digital Communication. AILA, Rio de Janeiro.
• 27/06/2018: Adam Jaworski, Crispin Thurlow: Deconstructing the ideologies of
universal visual language. 22nd Sociolinguistics Symposium, University of
Auckland.
• 13/09/2018: Liliane Haegeman: Subject ellipsis and the anaphorizing
deficiency of impersonal pronouns. Linguistics Association of Great Britain,
11
Annual Meeting 2018, University of Sheffield.Some selected publications:
• Dürscheid, Christa/Siever, Christina M. (2017). Jenseits des Alphabets. Kommunikation mit
Emojis. Zeitschrift für Germanistische Linguistik 45/2, 256–285.
• Jucker, Andreas H., Heiko Hausendorf, Christa Dürscheid, Karina Frick, Christoph Hottiger,
Wolfgang Kesselheim, Angelika Linke, Nathalie Meyer, and Antonia Steger (2018). Doing
space in face-to-face interaction and on interactive multimodal platforms. Journal of
Pragmatics 134, 85-101.
• Lusetti, M., Ruzsics, T., Göhring, A., Samardžić, T., and Stark, E. (2018). Encoder-Decoder
Methods for Text Normalization. In Proceedings of the Fifth Workshop on NLP for Similar
Languages, Varieties and Dialects (VarDial 2018), 18–28. Santa Fe, New Mexico, USA.
• Petitjean, Cécile/Morel, Etienne (2017). "Hahaha": Laughter as a Resource to Manage
WhatsApp Conversations. Journal of Pragmatics 110, 1-19.
• Siebenhaar, Beat (2018.: Funktionen von Emojis und Altersabhängigkeit ihres Gebrauchs in
der WhatsApp-Kommunikation. In: a. Ziegler (ed.): Jugendsprachen. Aktuelle Perspektiven
internationaler Forschung. Berlin: De Gruyter.
• Thurlow, Crispin (in press): Mediatizing sex: Sexting and/as digital discourse. In K. Hall & R.
Barrett (eds:. The Oxford Handbook of Language and Sexuality. New York: Oxford University
Press.
• Ueberwasser, Simone/Stark, Elisabeth (2017). What’s up, Switzerland? A corpus-based
12
research project in a multilingual country. Linguistik online 84/5, 105-126.Press 03/11/2018 13
The corpus
• Planned publication (open access): end of 2019
• 617 Chats
• 763,650 messages with text
• 5,543,692 tokens
• Texts from 945 participants, 426 with demographics
• Multilingual (German, French, Italian, Romansh)
Language Chats Messages Tokens
Non-dialectal German 93 81,456 625,419
Swiss German 275 506,984 3,611,033
French 141 197,255 1,397,375
Italian 87 42,559 293,567
Romansh 77 29,094 283,909
03/11/2018 14The corpus: What has been done up to now • Masking of non-consented messages • Anonymization • Mapping of emoji codes to Unicode and emojiQdescription • Tokenization • Removal of duplicate chats • Language identification for chats and messages • PoS annotations and normalization for the French corpus • Manual normalization of parts of the Swiss German corpus 03/11/2018 15
The corpus: Planned processing steps • PoS and noralization for the Italian data • Documentation 03/11/2018 16
Extended NLP (Natural Language Processing) on Swiss German
chats
• Team (collaboration with URPP “Language and Space”): Anne Göhring,
Tanja Samardžić, Massimo Lusetti, Tatyana Ruzsics
• Aim: Automated normalization of Swiss German messages
• Relevance beyond our project/added value:
• Some of the problems: Dialect and CMC, clitic forms, spelling variants:
03/11/2018 17Extended NLP on Swiss German chats
• Character-level statistical machine translation (CSMT) as a baseline:
– Ziit→Zeit (‘time’)
– wiiter→weiter (‘further’)
– Priis→Preis (‘price')
• Phase I (concluded): Neural encoder-decoder (ED) models (Lusetti, M., Ruzsics, T.,
Göhring, A., Samardžić, T., and Stark, E. (2018). Encoder-Decoder Methods for Text Normalization. In
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), 18–28.
Santa Fe, New Mexico, USA. )
• Phase II (ongoing): include Gold standard for PoS and look at the word's
environment (Planned publication for Cambridge Journal of Natural Language Engineering (NLE).)
• Accuracy scores: ~ 88%
03/11/2018 18You can also read