On June 1st, 2021, the GdR LIFT is organizing an online seminar day on the interactions between formal and computational linguistics.
The seminar will focus in particular on the place of symbolic methods in current natural language processing systems and on the contribution of computational methods to theoretical linguistics.
The aim of this day is to bring together members of different scientific communities from around the world and to foster cross-fertilization between approaches.
The seminar is entirely free of charge and will take place online via the Zoom and Gather.Town platforms.
We invite you to register as early as possible using the following form, which will allow us to send you the connection details: [here]
Times are given in Central European Summer Time (UTC+2).
- 10:30-11:30 Juan Luis Gastaldi, ETH Zürich: [tba]
- 11:30-12:30 Koji Mineshima, Keio University: [tba]
- 12:30-14:00 Lunch break and Gather.Town meetup
- 14:00-15:00 Maud Pironneau, Druide informatique: "Once Upon a Time, Linguists, Computer Scientists and Disruptive Technologies" [abstract]
- 15:00-16:00 Marie-Catherine de Marneffe, Ohio State University: [tba]
- 16:00-16:30 Gather.Town meetup
- 16:30-17:30 Jacob Andreas, MIT: "Language models as world models" [abstract]
- 17:30-18:30 Olga Zamaraeva, University of Washington: "Assembling Syntax: Modeling wh-Questions in a Grammar Engineering Framework" [abstract]
- 18:30-19:30 Gather.Town meetup
- Jacob Andreas (MIT, MA, USA)
Title: Language models as world models
Abstract: Neural language models, which place probability distributions over sequences of words, produce vector representations of words and sentences that are useful for language processing tasks as diverse as machine translation, question answering, and image captioning. These models’ usefulness is partially explained by the fact that their representations robustly encode lexical and syntactic information. But the extent to which language model training also induces representations of *meaning* remains a topic of ongoing debate. I will describe recent work showing that language models—trained on text alone, without any kind of grounded supervision—build structured meaning representations that are used to simulate entities and situations as they evolve over the course of a discourse. These representations can be linearly decoded into logical representations of world state (e.g. discourse representation structures). They can also be directly manipulated to produce predictable changes in generated output. Together, these results suggest that (some) highly structured aspects of meaning can be recovered by relatively unstructured models trained on corpus data.
- Juan Luis Gastaldi (ETH Zürich, Switzerland)
Title: [TBA]
Abstract: [TBA]
- Marie-Catherine de Marneffe (Ohio State University, OH, USA)
Title: [TBA]
Abstract: [TBA]
- Koji Mineshima (Keio University, Tokyo, Japan)
Title: [TBA]
Abstract: [TBA]
- Maud Pironneau (Druide informatique, Quebec, Canada)
Title: Once Upon a Time, Linguists, Computer Scientists and Disruptive Technologies
Abstract: At Druide informatique, we have been devising writing assistance software for over 25 years. We create text correctors, dictionaries, and language guides for everyone and every type of written document, available first in French and, more recently, in English. As of 2021, more than 1 million people use Antidote, our flagship product. Consequently, we possess extensive experience in language technologies, and we know how to make linguists and computer scientists work together. This knowledge can be seen as both historical and paradigm-shifting: historical in that Antidote for French was created back in 1993, at that time using symbolic rules; paradigm-shifting through the use of disruptive technologies and applications for different languages. Add to this complexity constant societal evolution, a dash of language politics, rational or not, and an inherent linguistic conservatism: now you have a portrait of the important themes in our work. This presentation will present our successes as well as our failures across this field of possibilities.
- Olga Zamaraeva (University of Washington, WA, USA)
Title: Assembling Syntax: Modeling wh-Questions in a Grammar Engineering Framework
Abstract: Studying syntactic structure is one of the ways to learn about the range of variation in human languages. But without computational aid, assembling the complex and fragmented hypotheses about different syntactic phenomena quickly becomes intractable. Fully explicit formalisms like HPSG allow us to encode our hypotheses about syntax and the associated compositional semantics on the computer. We can then test these hypotheses rigorously, showing a clear area of their applicability, which can grow over time. In this talk, I will present my recent work on modeling the syntactic structure of constituent (wh-)questions for an HPSG-based grammar engineering framework called the Grammar Matrix. The Matrix includes implemented syntactic analyses which are automatically tested as a system on test suites from diverse languages. The framework helps speed up grammar development and is intended to make implemented grammar artifacts available for many languages of the world, particularly for endangered languages. In computational linguistics, formalized syntactic representations produced by such grammars play a crucial role in creating annotations which are then used for evaluating NLP system performance. Such grammars have also been shown to be useful in applications such as grammar coaching, and advancing this line of research can contribute to educational and revitalization efforts.