The GdR LIFT organizes a monthly online seminar on the interactions between formal and computational linguistics.
One of the seminar's goals is to bring together members of different scientific communities from around the world and to foster cross-fertilization between approaches.
The seminar is entirely free and takes place online via Zoom.
To attend the seminar and receive information about upcoming sessions, please subscribe to the mailing list: [here]
Upcoming sessions:
(The times indicated are in Central European Time (UTC+1) or Central European Summer Time (UTC+2), depending on the time of year.)
- 2022/12/14 16h00-17h00 UTC+1 : Guy Emerson (University of Cambridge ; 15h00-16h00 UTC+0)
Title: Learning meaning in a logically structured model: An introduction to Functional Distributional Semantics
Abstract: The aim of distributional semantics is to design computational techniques that can automatically learn the meanings of words based on the contexts in which they are observed. The mainstream approach is to represent meanings as vectors (such as Word2Vec embeddings, or contextualised BERT embeddings). However, vectors do not provide a natural way to talk about basic concepts in logic and formal semantics, such as truth and reference. While there have been many attempts to extend vector space models to support such concepts, there does not seem to be a clear solution. In this talk, I will instead go back to fundamentals, questioning whether we should represent meaning as a vector.
I will present the framework of Functional Distributional Semantics, which makes a clear distinction between words and the entities they refer to. The meaning of a word is represented as a binary classifier over entities, identifying whether the word could refer to the entity – in formal semantic terms, whether the word is true of the entity. The structure of the model provides a natural way to model logical inference, semantic composition, and context-dependent meanings, where Bayesian inference plays a crucial role. The same kind of model can also be applied to different kinds of data, including both grounded data such as labelled images (where entities are observed) and also text data (where entities are latent). I will discuss results on semantic evaluation datasets, indicating that the model can learn information not captured by vector space models like Word2Vec and BERT. I will conclude with an outlook for future work, including challenges and opportunities of joint learning from different data sources.
- 2023/01/18 17h00-18h00 UTC+1 : Carolyn Anderson (Wellesley College ; 11h00-12h00 UTC-5)
Title: [TBA]
- 2023/02/15 : Steven T. Piantadosi (UC Berkeley)
Title: [TBA]
Abstract: [TBA]
Past sessions:
- 2022/11/16 17h00-18h00 UTC+1 : Allyson Ettinger (University of Chicago ; 10h00-11h00 UTC-6)
Title: “Understanding” and prediction: Disentangling meaning extraction and predictive processes in humans and AI
Abstract: The interaction between “understanding” and prediction is a central theme both in psycholinguistics and in the AI domain of natural language processing (NLP). Evidence indicates that the human brain engages in predictive processing while extracting the meaning of language in real time, while NLP models use training based on prediction in context to learn strategies of language “understanding”. In this talk I will discuss work that tackles key problems both in linguistics and in NLP by exploring and teasing apart effects of compositional meaning extraction and effects of statistical-associative processes associated with prediction. I will begin with work that diagnoses the linguistic capabilities of NLP models, investigating the extent to which these models exhibit robust compositional meaning processing resembling that of humans, versus shallower heuristic sensitivities associated with predictive processes. I will show that with properly controlled tests, we identify important limitations in the capacities of current NLP models to handle compositional meaning as humans do. However, the models’ behaviors do show signs of aligning with statistical sensitivities associated with predictive mechanisms in human real-time processing. Leveraging this knowledge, I will then turn to work that directly models the mechanisms underlying human real-time language comprehension, with a focus on understanding how the robust compositional meaning extraction processes exhibited by humans interact with probabilistic predictive mechanisms. I will show that by combining psycholinguistic theory with targeted use of measures from NLP models, we can strengthen the explanatory power of psycholinguistic models and achieve nuanced accounts of interacting factors underlying a wide range of observed effects in human language processing.
- 2022/10/12 17h00-18h00 UTC+2 : Dan Lassiter (University of Edinburgh ; 16h00-17h00 UTC+1)
Title: Modelling suppositional meaning in discourse
Abstract: English and many other languages show a variety of “suppositional devices” that are used to create temporary discourse contexts where a certain proposition is taken for granted. Most work on this topic has dealt with a single item, “if”, and assumes that the phenomenon is basically one of sentence-level semantics. In recent work I’ve argued that such theories miss a number of important generalizations that are better captured by treating the discourse effect of suppositions as primary and their sentence-level effects as parasitic on the pragmatics of assertion and the dependency of certain operators on local context. After reviewing these arguments, I’ll turn to a rather straightforward account that this approach suggests of so-called “modal subordination”, in which a temporary assumption survives over multiple utterances. A simple, context-free version of this theory is sufficient in many cases, but certain examples show crossing dependencies that require a non-context-free treatment. This is interesting, among other things, because Kogkalidis and Wijnholds (2022) have recently shown that BERT and other large language models have difficulty learning crossing grammatical dependencies in Dutch. Similar dependencies at the discourse level may be even more difficult to acquire, since the cues that humans use to resolve them are typically not explicitly represented in written text. I suggest that learning crossing discourse dependencies will be a major practical challenge for those who seek to engineer robust natural language understanding systems using written texts as the primary data source.
- 2022/09/14 17h00-18h00 UTC+2 : Ellie Pavlick (Brown University & Google ; 11h00-12h00 UTC-4)
Title: Implementing Symbols and Rules with Neural Networks
Abstract: Many aspects of human language and reasoning are well explained in terms of symbols and rules. However, state-of-the-art computational models are based on large neural networks which lack explicit symbolic representations of the type frequently used in cognitive theories. One response has been the development of neuro-symbolic models which introduce explicit representations of symbols into neural network architectures or loss functions. In terms of Marr’s levels of analysis, such approaches achieve symbolic reasoning at the computational level (“what the system does and why”) by introducing symbols and rules at the implementation and algorithmic levels. In this talk, I will consider an alternative: can neural networks (without any explicit symbolic components) nonetheless implement symbolic reasoning at the computational level? I will describe several diagnostic tests of “symbolic” and “rule-governed” behavior and use these tests to analyze neural models of visual and language processing. Our results show that on many counts, neural models appear to encode symbol-like concepts (e.g., conceptual representations that are abstract, systematic, and modular), but not perfectly so. Analysis of the failure cases reveals that future work is needed on methodological tools for analyzing neural networks, as well as refinement of models of hybrid neuro-symbolic reasoning in humans, in order to determine whether neural networks’ deviations from the symbolic paradigm are a feature or a bug.
- 2022/06/14 17h00-18h00 UTC+2 : Gene Louis Kim (University of South Florida ; 11h00-12h00 UTC-4)
Title: Corpus Annotation, Parsing, and Inference for Episodic Logic Type Structure
Abstract: A growing interest in moving beyond lesser goals in the NLP community and moving to language understanding has led to the search for a semantic representation which fulfills its nuanced modeling and inferential needs. In this talk, I discuss the design and use of Unscoped Logical Forms (ULFs) of Episodic Logic for the goal of building a system that can understand human language. ULF is designed to balance the needs of semantic expressivity, ease of annotation for training corpus creation, derivability from English, and support of inference. I show that by leveraging the systematic syntactic and semantic underpinnings of ULFs we can outperform existing semantic parsers and overcome the limitations of modern data-hungry techniques on a more modestly-sized dataset. I then describe our experiments showing how ULFs enable us to generate certain important classes of discourse inferences and “natural logic” inferences. I conclude by sketching the current wider use of ULFs in dialogue management and schema learning. Time permitting, I will discuss promising early results of augmenting the manually-annotated ULF dataset with formulas sampled from the underlying ULF type system for improving the trained ULF parser.
- 2022/05/17 17h00-18h00 UTC+2 : Roger Levy (Massachusetts Institute of Technology ; 11h00-12h00 UTC-4)
Title: The acquisition and processing of grammatical structure: insights from deep learning
Abstract: Psycholinguistics and computational linguistics are the two fields most dedicated to accounting for the computational operations required to understand natural language. Today, both fields find themselves responsible for understanding the behaviors and inductive biases of “black-box” systems: the human mind and artificial neural-network language models (NLMs), respectively. Contemporary NLMs can be trained on a human lifetime’s worth of text or more, and generate text of apparently remarkable grammaticality and fluency. Here, we use NLMs to address questions of learnability and processing of natural language syntax. By testing NLMs trained on naturalistic corpora as if they were subjects in a psycholinguistics experiment, we show that they exhibit a range of subtle behaviors, including embedding-depth tracking and garden-pathing over long stretches of text, suggesting representations homologous to incremental syntactic state in human language processing. Strikingly, these NLMs also learn many generalizations about the long-distance filler-gap dependencies that are a hallmark of natural language syntax, perhaps most surprisingly many “island” constraints. I conclude with comments on the long-standing idea of whether the departures of NLMs from the predictions of the “competence” grammars developed in generative linguistics might provide a “performance” account of human language processing: by and large, they don’t.
- 2022/04/12 15h00-16h00 UTC+2 : Noortje Venhuizen (Saarland University)
Title: Distributional Formal Semantics
Abstract: Formal Semantics and Distributional Semantics offer complementary strengths in capturing the meaning of natural language. As such, a considerable amount of research has sought to unify them, either by augmenting formal semantic systems with a distributional component, or by defining a formal system on top of distributed representations. Arriving at such a unified formalism has, however, proven extremely challenging. One reason for this is that formal and distributional semantics operate on a fundamentally different ‘representational currency’: formal semantics defines meaning in terms of models of the world, whereas distributional semantics defines meaning in terms of linguistic context. An alternative approach from cognitive science, however, proposes a vector space model that defines meaning in a distributed manner relative to the state of the world. This talk presents a re-conceptualisation of this approach based on well-known principles from formal semantics, thereby demonstrating its full logical capacity. The resulting Distributional Formal Semantics is shown to offer the best of both worlds: contextualised distributed representations that are also inherently compositional and probabilistic. The application of the representations is illustrated using a neural network model that captures various semantic phenomena, including probabilistic inference and entailment, negation, quantification, reference resolution and presupposition.
- 2022/03/15 17h00-18h00 UTC+1 : Mark Steedman (University of Edinburgh)
Title: Projecting Dependency: CCG and Minimalism
Abstract: Since the publication of “Bare Phrase Structure” it has been clear that Chomskyan Minimalism can be thought of as a form of Categorial Grammar, distinguished by the addition of movement rules to handle “displacement” or non-local dependency in surface forms. More specifically, the Minimalist Principle of Inclusiveness can be interpreted as requiring that all language-specific details of combinatory potential, such as category, subcategorization, agreement, and the like, must be specified at the level of the lexicon, and must be either “checked” or “projected” unchanged by language-independent universal rules onto the constituents of the syntactic derivation, which can add no information such as “indices, traces, syntactic categories or bar-levels and so on” that has not already been specified in the lexicon.
The place of rules of movement in such a system is somewhat unclear. While sometimes referred to as an “internal” form of MERGE, defined in terms of “copies” that are sometimes thought of as identical, it still seems to involve “action at a distance” over a structure. Yet Inclusiveness seems to require that copies are already specified as such in the lexicon.
Combinatory Categorial Grammar (CCG) insists under a Principle of Adjacency that all rules of syntactic combination are local, applying to contiguous syntactically-typed constituents, where the type-system in question crucially includes second-order functions, whose arguments are themselves functions. The consequence is that iterated contiguous combinatory reductions can in syntactic and semantic lock-step project the lexical local binding by a verb of a complement such as an object NP from the lexicon onto an unbounded dependency, which can be satisfied by reduction with a relative pronoun or right-node raising, as well as by an in situ NP. A number of surface-discontinuous constructions, including raising, “there”-insertion, scrambling, non-constituent coordination, and “wh”-extraction can thereby be handled without any involvement of non-locality in syntactic rules, such as movement or deletion, in a theory that is “pure derivational”. Once you have Inclusiveness, Contiguity is all you need.
- 2022/02/15 17h00-18h00 UTC+1 : Najoung Kim (New York University ; 11h00-12h00 UTC-5)
Title: Compositional Linguistic Generalization in Artificial Neural Networks
Abstract: Compositionality is considered a central property of human language. One key benefit of compositionality is the generalization it enables: the production and comprehension of novel expressions analyzed as new compositions of familiar parts. I construct a test for compositional generalization for artificial neural networks based on human generalization patterns discussed in existing linguistic and developmental studies, and test several instantiations of Transformer (Vaswani et al. 2017) and Long Short-Term Memory (Hochreiter & Schmidhuber 1997) models. The models evaluated exhibit only limited degrees of compositional generalization, implying that their learning biases for induction to fill gaps in the training data differ from those of human learners. An error analysis reveals that all models tested lack bias towards faithfulness (à la Prince & Smolensky 1993/2002). Adding a glossing task (word-by-word translation), a task that requires maximally faithful input-output mappings, as an auxiliary training objective to the Transformer model substantially improves generalization, showing that the auxiliary training successfully modified the model’s inductive bias. However, the improvement is limited to generalization to novel compositions of known lexical items and known structures; all models still struggled with generalization to novel structures, regardless of auxiliary training. The challenge of structural generalization leaves open exciting avenues for future research for both human and machine learners.
- 2022/01/18 17h00-18h00 UTC+1 : Johan Bos (University of Groningen)
Title: Variable-free Meaning Representations
Abstract: Most formal meaning representations use variables to represent entities and relations between them. But variables can be bothersome for people annotating texts with meanings, and for algorithms that work with meaning representations, in particular the recent machine learning methods based on neural network technology.
Hence the question that I am interested in is: can we replace the currently popular meaning representations with representations that do not use variables, without giving up any expressive power? My starting point is the representations of Discourse Representation Theory. I will show that these can be replaced by a simple language based on indices instead of variables, assuming a neo-Davidsonian event semantics.
The resulting formalism has several interesting consequences. Apart from being beneficial to human annotators and machine learning algorithms, it also offers straightforward visualisation possibilities and potential for modelling information packaging.
- 2021/12/14 17h00-18h00 UTC+1 : Lisa Bylinina (Bookarang, Netherlands)
Title: Polarity in multilingual language models
Abstract: The space of natural languages is constrained by various interactions between linguistic phenomena. In this talk, I will focus on one particular type of such interaction, in which logical properties of a context constrain the distribution of negative polarity items (NPIs), like English ‘any’. Correlational (and possibly causal) interaction between logical monotonicity and NPI distribution has been observed for some NPIs in some languages for some contexts, with the help of theoretical, psycholinguistic and computational tools. How general is this relation across languages? How inferable is it from just textual data? What kind of generalization, if any, about NPI distribution would a massively multilingual speaker form, and what kind of causal structure would guide such a speaker’s intuition? Humans speaking 100+ languages natively are hard to find, but we do have multilingual language models. I will report experiments in which we study NPIs in four languages (English, French, Russian and Turkish) in two pre-trained models: multilingual BERT and XLM-RoBERTa. We evaluate the models’ recognition of polarity-sensitivity and its cross-lingual generality. Further, using the artificial language learning paradigm, we look for the connection between semantic profiles of tokens and their ability to license NPIs. We find partial evidence for such a connection.
Joint work with Alexey Tikhonov (Yandex).
- 2021/11/16 17h00-18h00 UTC+1 : Alex Lascarides (University of Edinburgh ; 16h00-17h00 UTC+0)
Title: Situated Communication
Abstract: This talk focuses on how to represent and reason about the content of conversation when it takes place in an embodied, dynamic environment. I will argue that speakers can, and do, appropriate non-linguistic events into their communicative intents, even when those events weren’t produced with the intention of being a part of a discourse. Indeed, non-linguistic events can contribute (an instance of) a proposition to the content of the speaker’s message, even when her verbal signal contains no demonstratives or anaphora of any kind.
I will argue that representing and reasoning about discourse coherence is essential to capturing these features of situated conversation. I will make two claims: first, non-linguistic events affect rhetorical structure in non-trivial ways; and secondly, rhetorical structure guides the conceptualisation of non-linguistic events. I will support the first claim via empirical observations from the STAC corpus (www.irit.fr/STAC/corpus.html)—a corpus of dialogues that take place between players during the board game Settlers of Catan. I will support the second claim via experiments in Interactive Task Learning: a software agent jointly learns how to conceptualise the domain, ground previously unknown words in the embodied environment, and solve its planning problem, by using the evidence of an expert’s corrective (verbal) feedback on its physical actions.
- 2021/10/12 17h00-18h00 UTC+2 : Christopher Potts (Stanford University ; 8h00-9h00 UTC-7)
Title: Causal Abstractions of Neural Natural Language Inference Models
Abstract: Neural networks have a reputation for being “black boxes”: complex, opaque systems that can be studied using only purely behavioral evaluations. However, much recent work on structural analysis methods (e.g., probing and feature attribution) is allowing us to peer inside these models and deeply understand their internal dynamics. In this talk, I’ll describe a new structural analysis method we’ve developed that is grounded in a formal theory of causal abstraction. In this method, neural representations are aligned with variables in interpretable causal models, and then *interchange interventions* are used to experimentally verify that the neural representations have the causal properties of their aligned variables. I’ll use these methods to explore problems in Natural Language Inference, focusing in particular on compositional interactions between lexical entailment and negation. Recent Transformer-based models can solve hard generalization tasks involving these phenomena, and our causal analysis method helps explain why: the models have learned modular representations that closely approximate the high-level compositional theory. Finally, I will show how to bring interchange interventions into the training process, which allows us to push our models to acquire desired modular internal structures like this.
Joint work with Atticus Geiger, Hanson Lu, Noah Goodman, and Thomas Icard.
- 2021/06/01 10h30-18h30 UTC+2 : one-day event with 6 speakers: Juan Luis Gastaldi (ETH Zürich), Koji Mineshima (Keio University), Maud Pironneau (Druide informatique), Marie-Catherine de Marneffe (Ohio State University), Jacob Andreas (MIT), and Olga Zamaraeva (University of Washington).
Contact: Timothée BERNARD (firstname.lastname@example.org) and Grégoire WINTERSTEIN (email@example.com)