GdR LIFT is organising a monthly online seminar on the interactions between formal and computational linguistics.
The seminar aims to bring together members of diverse scientific communities from around the world to share their perspectives.
The seminar is free to attend and is held on Zoom.
To attend the seminar and receive updates, please register for our mailing list: [here]
(Depending on the time of year, times are given in Central European Time (UTC+1) or Central European Summer Time (UTC+2).)
- 2023/10/18 17:00-18:00 UTC+2: Alexander Koller (Saarland University)
- 2023/11/15 17:00-18:00 UTC+1: Raquel Fernández (University of Amsterdam)
- 2023/12/13 17:00-18:00 UTC+1: Richard Futrell (UC Irvine; 8:00-9:00 UTC-8)
- 2023/09/13 17:00-18:00 UTC+2: Anna Ivanova (Massachusetts Institute of Technology; 11:00-12:00 UTC-4)
Title: Dissociating formal and functional linguistic competence in large language models
Abstract: Today’s large language models (LLMs) routinely generate coherent, grammatical and seemingly meaningful paragraphs of text. This achievement has led to speculation that LLMs have become “thinking machines”, capable of performing tasks that require reasoning and/or world knowledge. In this talk, I will introduce a distinction between formal competence—knowledge of linguistic rules and patterns—and functional competence—understanding and using language in the world. This distinction is grounded in human neuroscience, which shows that formal and functional competence recruit different cognitive mechanisms. I will show that the word-in-context prediction objective has allowed LLMs to essentially master formal linguistic competence; however, LLMs still lag behind in many aspects of functional linguistic competence, and improvements in this domain often depend on specialized fine-tuning or coupling with an external module. In the last part of the talk, I will present a case study highlighting the difficulties of disentangling formal and functional competence when it comes to evaluating world knowledge, and show that similar difficulties are present in neuroscience research. I will conclude by discussing the value of the formal/functional competence framework for evaluating and building flexible, humanlike models of language use.
- 2023/06/14 17:00-18:00 UTC+2: Alex Warstadt (ETH Zürich)
Title: Language Models and Human Language Acquisition
Abstract: Children’s remarkable ability to learn language has been an object of fascination in science for millennia. In just the last few years, neural language models (LMs) have also proven to be incredibly adept at learning human language. In this talk, I discuss scientific progress that uses recent developments in natural language processing to advance linguistics—and vice versa. My research explores this intersection from three angles: evaluation, experimentation, and engineering. Using linguistically motivated benchmarks, I provide evidence that LMs share many aspects of human grammatical knowledge and probe how this knowledge varies across training regimes. I further argue that—under the right circumstances—we can use LMs to test key hypotheses about language acquisition that have been difficult or impossible to evaluate with human subjects. As a proof of concept, I use LMs to experimentally test the long-controversial claim that direct disambiguating evidence is necessary to acquire the structure-dependent rule of subject-auxiliary inversion in English. Finally, I describe ongoing work to engineer learning environments and objectives for LM pretraining inspired by human development, with the goal of making LMs more data-efficient and more plausible models of human learning.
- 2023/05/17 17:00-18:00 UTC+2: Laura Kallmeyer (Heinrich-Heine-Universität Düsseldorf)
Title: Probing large language models for syntactic structure: Creating nonce treebanks in order to separate syntax from semantics
Abstract: The question we address in this work is to what extent large language models (LMs) learn syntactic structure. One difficulty when probing LMs for syntax is determining whether the findings are purely due to syntax or are the result of semantic knowledge learned by the model. To avoid this potential problem, we start from an existing dependency treebank and create syntactically well-formed but semantically nonce treebanks by replacing a certain ratio of words with words that can appear in the same syntactic contexts. In Arps et al. (2022), we use the Penn Treebank and semantically perturbed versions of it, and train a linear probe on our treebanks that constructs entire constituency parse trees based on a sequence labeling scheme. We find that even on semantically perturbed data, the probe can reconstruct the constituency tree with an F1 score of 72.8 in the lowest nonce setting. In more recent work, we extend this line of research to other languages, also creating nonce treebanks for Arabic, Chinese, French, German and Russian. We then apply a structural probing method for labeled dependency trees (Müller-Eberstein et al., 2022) to the nonce treebanks. The results show that in this setting, probes trained on the original treebanks are able to predict the syntactic structure of nonce test sets with good performance, albeit with an increased error rate. Probes trained on nonce treebanks perform on par with standard probes when evaluated on the original treebanks in monolingual and crosslingual settings. These results and those from Arps et al. (2022) indicate that the syntactic structure of nonce sentences is encoded in the language model.
- David Arps, Younes Samih, Laura Kallmeyer, and Hassan Sajjad. 2022. Probing for constituency structure in neural language models. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 6738–6757, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Max Müller-Eberstein, Rob van der Goot, and Barbara Plank. 2022. Probing for labeled dependency trees. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7711–7726, Dublin, Ireland. Association for Computational Linguistics.
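The substitution step behind a nonce treebank can be illustrated with a small sketch. This is not the speakers’ procedure: it assumes, for illustration only, that “words that can appear in the same syntactic contexts” are approximated by words sharing a part-of-speech tag, and the helpers `build_lexicon` and `make_nonce`, the toy corpus and the replacement ratio are all hypothetical. Under that assumption, only the words change while the tag sequence stays fixed, so the original annotation still describes the nonce sentence.

```python
import random
from collections import defaultdict

def build_lexicon(tagged_sentences):
    """Map each tag to the sorted list of words observed with that tag."""
    lexicon = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            lexicon[tag].add(word)
    return {tag: sorted(words) for tag, words in lexicon.items()}

def make_nonce(sentence, lexicon, ratio, seed=0):
    """Replace about `ratio` of the tokens with a different word of the same tag.

    The tag sequence (and hence the tree built over it) is left unchanged.
    """
    rng = random.Random(seed)
    n_replace = round(ratio * len(sentence))
    positions = set(rng.sample(range(len(sentence)), n_replace))
    nonce = []
    for i, (word, tag) in enumerate(sentence):
        candidates = [w for w in lexicon.get(tag, []) if w != word]
        if i in positions and candidates:
            word = rng.choice(candidates)
        nonce.append((word, tag))
    return nonce

# Toy tagged corpus (hypothetical data).
corpus = [
    [("the", "DT"), ("dog", "NN"), ("chased", "VBD"), ("a", "DT"), ("ball", "NN")],
    [("a", "DT"), ("cat", "NN"), ("ate", "VBD"), ("the", "DT"), ("fish", "NN")],
]
lexicon = build_lexicon(corpus)
print(make_nonce(corpus[0], lexicon, ratio=0.6))
```

Raising `ratio` toward 1.0 yields increasingly nonce sentences, which is one simple way to produce graded nonce settings of the kind the abstract compares.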
- 2023/04/12 17:00-18:00 UTC+2: Timothy J. O’Donnell (McGill University; 11:00-12:00 UTC-4)
Title: Linguistic Productivity, Compositionality, and Incremental Processing
Abstract: In this talk, I will present several recent projects focusing on three key properties of natural language. First, I will discuss several projects focused on productivity: the ability to produce and comprehend new expressions. I will review work using program synthesis techniques to understand learning in “small data” domains as well as some recent results examining the productivity of language models. Finally, I will turn to the problem of incremental processing, presenting modeling and empirical work on the nature of the algorithms that underlie the human sentence processor.
- 2023/03/22 17:00-18:00 UTC+1: Katrin Erk (University of Texas at Austin; 11:00-12:00 UTC-5)
Title: The habitual listener: a usage-based view of lexical meaning, and a matching computational model of utterance understanding
Abstract: We can think of word embeddings as a condensed record of utterances of many speakers. If we do that, then by inspecting word embeddings we can get a sense of a usage-based notion of word meaning: What patterns, what influences on word meaning in context get picked up in this condensed record? I will argue that what we see in word embeddings is a mixture of many things, including what is traditionally called word senses, but also cultural and emotional influences, basically a record of the stories that people habitually tell with words. The theory that comes closest to what we see in word embeddings is frame semantics, which assumes that words evoke rich, complex chunks of background knowledge.
If word meaning contains traces of world knowledge, then this best matches a Contextualist view of utterance understanding, where understanding what is said already involves *some* pragmatics. I will sketch a model of utterance understanding that we have been developing, Situation Description Systems. In Situation Description Systems, word meaning in context is influenced, among other things, by the overall “story” or scenario that the sentence alludes to. This model has been implemented at scale using corpus-derived representations of word meaning.
The picture that emerges about utterance understanding is then not a divide between literal meaning and pragmatic inference, but a divide between habitual, conventionalized lexical-pragmatic knowledge on the one hand and explicit reasoning about what is implicated on the other.
- 2023/02/15 17:00-18:00 UTC+1: Steven T. Piantadosi (UC Berkeley; 08:00-09:00 UTC-8)
Title: One model for the learning of language
Abstract: A major target of linguistics and cognitive science is to understand what class of learning systems can acquire the key structures of natural language. Until recently, the computational requirements of language have been used to argue that learning is impossible without a highly constrained hypothesis space. Here, we describe a learning system that is maximally unconstrained, operating over the space of all computations, and is able to acquire several of the key structures present in natural language from positive evidence alone. We demonstrate this by providing the same learning model with data from 70 distinct formal languages which have been argued to capture key features of language, have been studied in experimental work, or come from an interesting complexity class. The model is able to successfully induce the latent system generating the observed strings from positive evidence in almost all cases, including regular, context-free, and context-sensitive formal languages, as well as languages studied in artificial language learning experiments. These results show that relatively small amounts of positive evidence can support learning of rich classes of generative computations over structures.
- 2023/01/18 17:00-18:00 UTC+1: Carolyn Jane Anderson (Wellesley College; 11:00-12:00 UTC-5)
Title: A Bayesian Approach to Modeling Grammatical Perspective-Taking
Abstract: Many components of language involve perspective: deictic expressions, predicates of personal taste, and epithets, among others. Perspective is situation-dependent, dynamic, and grounded in the physical world, making it a particularly challenging domain to model. For instance, the perspectival motion verb ‘come’ requires a perspective-holder to be located at the destination of motion. When a listener hears a sentence like ‘Thelma is coming to the zoo’, how do they decide whose perspective the speaker is using?
In this talk, I will propose a computational model of the reasoning process that conversation participants use to select and interpret perspectival expressions. I model the listener’s interpretative process as a Bayesian inference process: listeners reason jointly about the speaker’s intended meaning and their adopted perspective using a mental model of how the speaker selected the utterance. I generate predictions from simulations run in the WebPPL probabilistic programming language and provide empirical evidence from crowdsourced production and comprehension experiments in support of a key prediction of the model: that listeners simultaneously consider multiple perspectives.
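The joint inference over meaning and perspective can be illustrated with a toy model in the style of the Rational Speech Act framework. This is a hypothetical Python sketch, not the speaker’s WebPPL model: the states, the two utterances, the truth conditions for “come” and “go”, the uniform priors and the rationality parameter `alpha` are all assumptions chosen for illustration.

```python
from itertools import product

STATES = ["speaker_at_zoo", "listener_at_zoo", "neither"]
PERSPECTIVES = ["speaker", "listener"]
UTTERANCES = ["come", "go"]

def is_true(utterance, state, perspective):
    """Toy semantics: "come" needs the perspective-holder at the destination
    (the zoo); "go" needs them elsewhere."""
    holder_at_zoo = state == f"{perspective}_at_zoo"
    return holder_at_zoo if utterance == "come" else not holder_at_zoo

def literal_listener(utterance, perspective):
    """L0(s | u, p): uniform over the states where u is true under p."""
    true_states = [s for s in STATES if is_true(utterance, s, perspective)]
    return {s: (1 / len(true_states) if s in true_states else 0.0) for s in STATES}

def speaker(state, perspective, alpha=1.0):
    """S1(u | s, p), proportional to L0(s | u, p) ** alpha."""
    scores = {u: literal_listener(u, perspective)[state] ** alpha for u in UTTERANCES}
    total = sum(scores.values())
    return {u: score / total for u, score in scores.items()}

def pragmatic_listener(utterance, alpha=1.0):
    """L1(s, p | u), proportional to P(s) * P(p) * S1(u | s, p).

    With uniform priors over states and perspectives, the priors cancel out.
    """
    joint = {(s, p): speaker(s, p, alpha)[utterance]
             for s, p in product(STATES, PERSPECTIVES)}
    total = sum(joint.values())
    return {key: value / total for key, value in joint.items()}

def perspective_posterior(utterance, alpha=1.0):
    """Marginalize L1 over states to get P(perspective | utterance)."""
    posterior = pragmatic_listener(utterance, alpha)
    return {p: sum(v for (_, q), v in posterior.items() if q == p) for p in PERSPECTIVES}

print(perspective_posterior("come"))  # {'speaker': 0.5, 'listener': 0.5}
```

In this toy setting the utterance alone does not settle whose perspective is in play, so the listener keeps both perspectives in the posterior; this mirrors, in miniature, the prediction that listeners consider multiple perspectives at once.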
- 2022/12/14 16:00-17:00 UTC+1: Guy Emerson (University of Cambridge; 15:00-16:00 UTC+0)
Title: Learning meaning in a logically structured model: An introduction to Functional Distributional Semantics
Abstract: The aim of distributional semantics is to design computational techniques that can automatically learn the meanings of words based on the contexts in which they are observed. The mainstream approach is to represent meanings as vectors (such as Word2Vec embeddings, or contextualised BERT embeddings). However, vectors do not provide a natural way to talk about basic concepts in logic and formal semantics, such as truth and reference. While there have been many attempts to extend vector space models to support such concepts, there does not seem to be a clear solution. In this talk, I will instead go back to fundamentals, questioning whether we should represent meaning as a vector.
I will present the framework of Functional Distributional Semantics, which makes a clear distinction between words and the entities they refer to. The meaning of a word is represented as a binary classifier over entities, identifying whether the word could refer to the entity – in formal semantic terms, whether the word is true of the entity. The structure of the model provides a natural way to model logical inference, semantic composition, and context-dependent meanings, where Bayesian inference plays a crucial role. The same kind of model can also be applied to different kinds of data, including both grounded data such as labelled images (where entities are observed) and also text data (where entities are latent). I will discuss results on semantic evaluation datasets, indicating that the model can learn information not captured by vector space models like Word2Vec and BERT. I will conclude with an outlook for future work, including challenges and opportunities of joint learning from different data sources.
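The idea of a word meaning as a binary classifier over entities can be sketched as a logistic “semantic function”. This is a minimal, hypothetical illustration of the grounded setting only, where entities are observed as feature vectors: the features, training data, epochs and learning rate are invented, and it omits the probabilistic model over latent entities and the Bayesian inference that the framework relies on.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

class SemanticFunction:
    """A word meaning as a classifier: P(word is true of entity)."""

    def __init__(self, n_features):
        self.weights = [0.0] * n_features
        self.bias = 0.0

    def truth(self, entity):
        score = sum(w * x for w, x in zip(self.weights, entity)) + self.bias
        return sigmoid(score)

    def fit(self, examples, epochs=500, lr=0.5):
        """Gradient ascent on the log-likelihood of (entity, is_true) pairs."""
        for _ in range(epochs):
            for entity, label in examples:
                error = label - self.truth(entity)
                self.weights = [w + lr * error * x for w, x in zip(self.weights, entity)]
                self.bias += lr * error
        return self

# Hypothetical entity features: [is_animal, barks, has_wheels]
rex, tom, car = [1, 1, 0], [1, 0, 0], [0, 0, 1]
dog = SemanticFunction(3).fit([(rex, 1), (tom, 0), (car, 0)])
animal = SemanticFunction(3).fit([(rex, 1), (tom, 1), (car, 0)])
print(round(dog.truth(rex), 3), round(dog.truth(tom), 3), round(animal.truth(tom), 3))
```

Because every word is a classifier over the same space of entities, one can ask whether “dog” and “animal” are true of the same entity, which is the kind of truth-and-reference question that a single word vector does not directly answer.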
- 2022/11/16 17:00-18:00 UTC+1: Allyson Ettinger (University of Chicago; 10:00-11:00 UTC-6)
Title: “Understanding” and prediction: Disentangling meaning extraction and predictive processes in humans and AI
Abstract: The interaction between “understanding” and prediction is a central theme both in psycholinguistics and in the AI domain of natural language processing (NLP). Evidence indicates that the human brain engages in predictive processing while extracting the meaning of language in real time, whereas NLP models are trained on prediction in context to learn strategies of language “understanding”. In this talk I will discuss work that tackles key problems both in linguistics and in NLP by exploring and teasing apart effects of compositional meaning extraction and effects of statistical-associative processes associated with prediction. I will begin with work that diagnoses the linguistic capabilities of NLP models, investigating the extent to which these models exhibit robust compositional meaning processing resembling that of humans, versus shallower heuristic sensitivities associated with predictive processes. I will show that with properly controlled tests, we identify important limitations in the capacities of current NLP models to handle compositional meaning as humans do. However, the models’ behaviors do show signs of aligning with statistical sensitivities associated with predictive mechanisms in human real-time processing. Leveraging this knowledge, I will then turn to work that directly models the mechanisms underlying human real-time language comprehension, with a focus on understanding how the robust compositional meaning extraction processes exhibited by humans interact with probabilistic predictive mechanisms. I will show that by combining psycholinguistic theory with targeted use of measures from NLP models, we can strengthen the explanatory power of psycholinguistic models and achieve nuanced accounts of interacting factors underlying a wide range of observed effects in human language processing.
- 2022/10/12 17:00-18:00 UTC+2: Dan Lassiter (University of Edinburgh; 16:00-17:00 UTC+1)
Title: Modelling suppositional meaning in discourse
Abstract: English and many other languages show a variety of “suppositional devices” that are used to create temporary discourse contexts where a certain proposition is taken for granted. Most work on this topic has dealt with a single item, “if”, and has assumed that the phenomenon is basically one of sentence-level semantics. In recent work I’ve argued that such theories miss a number of important generalizations that are better captured by treating the discourse effect of suppositions as primary and their sentence-level effects as parasitic on the pragmatics of assertion and the dependency of certain operators on local context. After reviewing these arguments, I’ll turn to the rather straightforward account that this approach suggests for so-called “modal subordination”, in which a temporary assumption survives over multiple utterances. A simple, context-free version of this theory is sufficient in many cases, but certain examples show crossing dependencies that require a non-context-free treatment. This is interesting, among other things, because Kogkalidis and Wijnholds (2022) have recently shown that BERT and other large language models have difficulty learning crossing grammatical dependencies in Dutch. Similar dependencies at the discourse level may be even more difficult to acquire, since the cues that humans use to resolve them are typically not explicitly represented in written text. I suggest that learning crossing discourse dependencies will be a major practical challenge for those who seek to engineer robust natural language understanding systems using written texts as the primary data source.
- 2022/09/14 17:00-18:00 UTC+2: Ellie Pavlick (Brown University & Google; 11:00-12:00 UTC-4)
Title: Implementing Symbols and Rules with Neural Networks
Abstract: Many aspects of human language and reasoning are well explained in terms of symbols and rules. However, state-of-the-art computational models are based on large neural networks which lack explicit symbolic representations of the type frequently used in cognitive theories. One response has been the development of neuro-symbolic models which introduce explicit representations of symbols into neural network architectures or loss functions. In terms of Marr’s levels of analysis, such approaches achieve symbolic reasoning at the computational level (“what the system does and why”) by introducing symbols and rules at the implementation and algorithmic levels. In this talk, I will consider an alternative: can neural networks (without any explicit symbolic components) nonetheless implement symbolic reasoning at the computational level? I will describe several diagnostic tests of “symbolic” and “rule-governed” behavior and use these tests to analyze neural models of visual and language processing. Our results show that on many counts, neural models appear to encode symbol-like concepts (e.g., conceptual representations that are abstract, systematic, and modular), but not perfectly so. Analysis of the failure cases reveals that future work is needed on methodological tools for analyzing neural networks, as well as refinement of models of hybrid neuro-symbolic reasoning in humans, in order to determine whether neural networks’ deviations from the symbolic paradigm are a feature or a bug.
- 2022/06/14 17:00-18:00 UTC+2: Gene Louis Kim (University of South Florida; 11:00-12:00 UTC-4)
Title: Corpus Annotation, Parsing, and Inference for Episodic Logic Type Structure
Abstract: Growing interest in the NLP community in moving beyond lesser goals toward language understanding has led to the search for a semantic representation that fulfills the nuanced modeling and inferential needs of that goal. In this talk, I discuss the design and use of Unscoped Logical Forms (ULFs) of Episodic Logic for the goal of building a system that can understand human language. ULF is designed to balance the needs of semantic expressivity, ease of annotation for training corpus creation, derivability from English, and support of inference. I show that by leveraging the systematic syntactic and semantic underpinnings of ULFs we can outperform existing semantic parsers and overcome the limitations of modern data-hungry techniques on a more modestly sized dataset. I then describe our experiments showing how ULFs enable us to generate certain important classes of discourse inferences and “natural logic” inferences. I conclude by sketching the current wider use of ULFs in dialogue management and schema learning. Time permitting, I will discuss promising early results of augmenting the manually-annotated ULF dataset with formulas sampled from the underlying ULF type system for improving the trained ULF parser.
- 2022/05/17 17:00-18:00 UTC+2: Roger Levy (Massachusetts Institute of Technology; 11:00-12:00 UTC-4)
Title: The acquisition and processing of grammatical structure: insights from deep learning
Abstract: Psycholinguistics and computational linguistics are the two fields most dedicated to accounting for the computational operations required to understand natural language. Today, both fields find themselves responsible for understanding the behaviors and inductive biases of “black-box” systems: the human mind and artificial neural-network language models (NLMs), respectively. Contemporary NLMs can be trained on a human lifetime’s worth of text or more, and generate text of apparently remarkable grammaticality and fluency. Here, we use NLMs to address questions of learnability and processing of natural language syntax. By testing NLMs trained on naturalistic corpora as if they were subjects in a psycholinguistics experiment, we show that they exhibit a range of subtle behaviors, including embedding-depth tracking and garden-pathing over long stretches of text, suggesting representations homologous to incremental syntactic state in human language processing. Strikingly, these NLMs also learn many generalizations about the long-distance filler-gap dependencies that are a hallmark of natural language syntax, perhaps most surprisingly many “island” constraints. I conclude with comments on the long-standing question of whether the departures of NLMs from the predictions of the “competence” grammars developed in generative linguistics might provide a “performance” account of human language processing: by and large, they don’t.
- 2022/04/12 15:00-16:00 UTC+2: Noortje Venhuizen (Saarland University)
Title: Distributional Formal Semantics
Abstract: Formal Semantics and Distributional Semantics offer complementary strengths in capturing the meaning of natural language. As such, a considerable amount of research has sought to unify them, either by augmenting formal semantic systems with a distributional component, or by defining a formal system on top of distributed representations. Arriving at such a unified formalism has, however, proven extremely challenging. One reason for this is that formal and distributional semantics operate on a fundamentally different ‘representational currency’: formal semantics defines meaning in terms of models of the world, whereas distributional semantics defines meaning in terms of linguistic context. An alternative approach from cognitive science, however, proposes a vector space model that defines meaning in a distributed manner relative to the state of the world. This talk presents a re-conceptualisation of this approach based on well-known principles from formal semantics, thereby demonstrating its full logical capacity. The resulting Distributional Formal Semantics is shown to offer the best of both worlds: contextualised distributed representations that are also inherently compositional and probabilistic. The application of the representations is illustrated using a neural network model that captures various semantic phenomena, including probabilistic inference and entailment, negation, quantification, reference resolution and presupposition.
- 2022/03/15 17:00-18:00 UTC+1: Mark Steedman (University of Edinburgh; 16:00-17:00 UTC+0)
Title: Projecting Dependency: CCG and Minimalism
Abstract: Since the publication of “Bare Phrase Structure” it has been clear that Chomskyan Minimalism can be thought of as a form of Categorial Grammar, distinguished by the addition of movement rules to handle “displacement” or non-local dependency in surface forms. More specifically, the Minimalist Principle of Inclusiveness can be interpreted as requiring that all language-specific details of combinatory potential, such as category, subcategorization, agreement, and the like, must be specified at the level of the lexicon, and must be either “checked” or “projected” unchanged by language-independent universal rules onto the constituents of the syntactic derivation, which can add no information such as “indices, traces, syntactic categories or bar-levels and so on” that has not already been specified in the lexicon.
The place of rules of movement in such a system is somewhat unclear. While sometimes referred to as an “internal” form of MERGE, defined in terms of “copies” that are sometimes thought of as identical, it still seems to involve “action at a distance” over a structure. Yet Inclusiveness seems to require that copies are already specified as such in the lexicon.
Combinatory Categorial Grammar (CCG) insists under a Principle of Adjacency that all rules of syntactic combination are local, applying to contiguous syntactically-typed constituents, where the type-system in question crucially includes second-order functions, whose arguments are themselves functions. The consequence is that iterated contiguous combinatory reductions can in syntactic and semantic lock-step project the lexical local binding by a verb of a complement such as an object NP from the lexicon onto an unbounded dependency, which can be satisfied by reduction with a relative pronoun or right-node raising, as well as by an in situ NP. A number of surface-discontinuous constructions, including raising, “there”-insertion, scrambling, non-constituent coordination, and “wh”-extraction can thereby be handled without any involvement of non-locality in syntactic rules, such as movement or deletion, in a theory that is “pure derivational”. Once you have Inclusiveness, Contiguity is all you need.
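How purely contiguous combination can pass an object dependency up to a relative pronoun can be illustrated with a few standard CCG rules: forward and backward application, forward composition and type-raising. This is a hypothetical Python sketch with a textbook-style toy lexicon for the phrase “book that John likes”; it covers syntactic categories only (no semantics) and omits the other combinators and restrictions of a full CCG.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Slash:
    """A functor category: `/` seeks its argument to the right, `\\` to the left."""
    result: Cat
    direction: str
    arg: Cat

    def __str__(self):
        def wrap(c):
            return f"({c})" if isinstance(c, Slash) else c
        return f"{wrap(self.result)}{self.direction}{wrap(self.arg)}"

Cat = Union[str, Slash]  # atomic categories are plain strings such as "NP"

def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """X/Y  Y  =>  X"""
    if isinstance(left, Slash) and left.direction == "/" and left.arg == right:
        return left.result
    return None

def backward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """Y  X\\Y  =>  X"""
    if isinstance(right, Slash) and right.direction == "\\" and right.arg == left:
        return right.result
    return None

def forward_compose(left: Cat, right: Cat) -> Optional[Cat]:
    """X/Y  Y/Z  =>  X/Z"""
    if (isinstance(left, Slash) and left.direction == "/"
            and isinstance(right, Slash) and right.direction == "/"
            and left.arg == right.result):
        return Slash(left.result, "/", right.arg)
    return None

def type_raise(x: Cat, t: Cat) -> Cat:
    """X  =>  T/(T\\X)"""
    return Slash(t, "/", Slash(t, "\\", x))

# Hypothetical toy lexicon
likes = Slash(Slash("S", "\\", "NP"), "/", "NP")                 # (S\NP)/NP
that = Slash(Slash("N", "\\", "N"), "/", Slash("S", "/", "NP"))  # (N\N)/(S/NP)

john = type_raise("NP", "S")                # S/(S\NP)
john_likes = forward_compose(john, likes)   # S/NP: the object slot stays open
relative = forward_apply(that, john_likes)  # N\N
phrase = backward_apply("N", relative)      # N
print(john_likes, relative, phrase)         # S/NP N\N N
```

Every step combines two adjacent constituents; the verb’s object requirement survives as the `/NP` in `S/NP` until the relative pronoun consumes it, with no movement or trace involved.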
- 2022/02/15 17:00-18:00 UTC+1: Najoung Kim (New York University; 11:00-12:00 UTC-5)
Title: Compositional Linguistic Generalization in Artificial Neural Networks
Abstract: Compositionality is considered a central property of human language. One key benefit of compositionality is the generalization it enables—the production and comprehension of novel expressions analyzed as new compositions of familiar parts. I construct a test for compositional generalization for artificial neural networks based on human generalization patterns discussed in existing linguistic and developmental studies, and test several instantiations of Transformer (Vaswani et al. 2017) and Long Short-Term Memory (Hochreiter & Schmidhuber 1997) models. The models evaluated exhibit only limited degrees of compositional generalization, implying that their learning biases for induction to fill gaps in the training data differ from those of human learners. An error analysis reveals that all models tested lack bias towards faithfulness (à la Prince & Smolensky 1993/2002). Adding a glossing task (word-by-word translation), a task that requires maximally faithful input-output mappings, as an auxiliary training objective to the Transformer model substantially improves generalization, showing that the auxiliary training successfully modified the model’s inductive bias. However, the improvement is limited to generalization to novel compositions of known lexical items and known structures; all models still struggle with generalization to novel structures, regardless of auxiliary training. The challenge of structural generalization leaves open exciting avenues for future research for both human and machine learners.
- 2022/01/18 17:00-18:00 UTC+1: Johan Bos (University of Groningen)
Title: Variable-free Meaning Representations
Abstract: Most formal meaning representations use variables to represent entities and relations between them. But variables can be bothersome for people annotating texts with meanings, and for algorithms that work with meaning representations, in particular the recent machine learning methods based on neural network technology.
Hence the question that I am interested in is: can we replace the currently popular meaning representations with representations that do not use variables, without giving up any expressive power? My starting point is the set of representations used in Discourse Representation Theory. I will show that these can be replaced by a simple language based on indices instead of variables, assuming a neo-Davidsonian event semantics.
The resulting formalism has several interesting consequences. Apart from being beneficial to human annotators and machine learning algorithms, it also offers straightforward visualisation possibilities and potential for modelling information packaging.
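The replacement of variables by indices can be illustrated with a small, hypothetical sketch in the spirit of a neo-Davidsonian encoding: each concept occupies a position in a sequence, and each thematic role points to its argument by a relative offset rather than by a variable name. The input format, the WordNet-style concept names, the role names and the output notation are assumptions for illustration, not the exact formalism presented in the talk; the round trip shows that, for this fragment, no information is lost.

```python
def to_indexed(concepts, roles):
    """Replace variables by relative positions.

    concepts: list of (variable, concept) pairs in order of introduction
    roles: list of (role, head_variable, argument_variable) triples
    """
    position = {var: i for i, (var, _) in enumerate(concepts)}
    lines = []
    for i, (var, concept) in enumerate(concepts):
        parts = [concept]
        for role, head, arg in roles:
            if head == var:
                parts.append(f"{role} {position[arg] - i:+d}")
        lines.append(" ".join(parts))
    return lines

def from_indexed(lines):
    """Recover variables (named v0, v1, ...) and role triples from the indexed form."""
    concepts, roles = [], []
    for i, line in enumerate(lines):
        concept, *rest = line.split()
        concepts.append((f"v{i}", concept))
        for role, offset in zip(rest[::2], rest[1::2]):
            roles.append((role, f"v{i}", f"v{i + int(offset)}"))
    return concepts, roles

# "A dog chases a cat", with variables (hypothetical example)
concepts = [("x1", "dog.n.01"), ("e1", "chase.v.01"), ("x2", "cat.n.01")]
roles = [("Agent", "e1", "x1"), ("Patient", "e1", "x2")]
lines = to_indexed(concepts, roles)
print(lines)  # ['dog.n.01', 'chase.v.01 Agent -1 Patient +1', 'cat.n.01']
```

Variable names only serve to link roles to referents, so relative positions can carry the same links; the recovered variables differ in name but not in structure.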
- 2021/12/14 17:00-18:00 UTC+1: Lisa Bylinina (Bookarang, Netherlands)
Title: Polarity in multilingual language models
Abstract: The space of natural languages is constrained by various interactions between linguistic phenomena. In this talk, I will focus on one particular type of such interaction, in which logical properties of a context constrain the distribution of negative polarity items (NPIs), like English ‘any’. Correlational — and possibly, causal — interaction between logical monotonicity and NPI distribution has been observed for some NPIs in some languages for some contexts, with the help of theoretical, psycholinguistic and computational tools. How general is this relation across languages? How inferable is it from just textual data? What kind of generalization — if any — about NPI distribution would a massively multilingual speaker form, and what kind of causal structure would guide such a speaker’s intuition? Humans speaking 100+ languages natively are hard to find — but we do have multilingual language models. I will report experiments in which we study NPIs in four languages (English, French, Russian and Turkish) in two pre-trained models — multilingual BERT and XLM-RoBERTa. We evaluate the models’ recognition of polarity-sensitivity and its cross-lingual generality. Further, using the artificial language learning paradigm, we look for the connection between semantic profiles of tokens and their ability to license NPIs. We find partial evidence for such connection.
Joint work with Alexey Tikhonov (Yandex).
- 2021/11/16 17:00-18:00 UTC+1: Alex Lascarides (University of Edinburgh; 16:00-17:00 UTC+0)
Title: Situated Communication
Abstract: This talk focuses on how to represent and reason about the content of conversation when it takes place in an embodied, dynamic environment. I will argue that speakers can, and do, appropriate non-linguistic events into their communicative intents, even when those events weren’t produced with the intention of being a part of a discourse. Indeed, non-linguistic events can contribute (an instance of) a proposition to the content of the speaker’s message, even when her verbal signal contains no demonstratives or anaphora of any kind.
I will argue that representing and reasoning about discourse coherence is essential to capturing these features of situated conversation. I will make two claims: first, non-linguistic events affect rhetorical structure in non-trivial ways; and secondly, rhetorical structure guides the conceptualisation of non-linguistic events. I will support the first claim via empirical observations from the STAC corpus (www.irit.fr/STAC/corpus.html)—a corpus of dialogues that take place between players during the board game Settlers of Catan. I will support the second claim via experiments in Interactive Task Learning: a software agent jointly learns how to conceptualise the domain, ground previously unknown words in the embodied environment, and solve its planning problem, by using the evidence of an expert’s corrective (verbal) feedback on its physical actions.
- 2021/10/12 17:00-18:00 UTC+2: Christopher Potts (Stanford University; 8:00-9:00 UTC-7)
Title: Causal Abstractions of Neural Natural Language Inference Models
Abstract: Neural networks have a reputation for being “black boxes”: complex, opaque systems that can be studied only through behavioral evaluations. However, much recent work on structural analysis methods (e.g., probing and feature attribution) is allowing us to peer inside these models and deeply understand their internal dynamics. In this talk, I’ll describe a new structural analysis method we’ve developed that is grounded in a formal theory of causal abstraction. In this method, neural representations are aligned with variables in interpretable causal models, and then *interchange interventions* are used to experimentally verify that the neural representations have the causal properties of their aligned variables. I’ll use these methods to explore problems in Natural Language Inference, focusing in particular on compositional interactions between lexical entailment and negation. Recent Transformer-based models can solve hard generalization tasks involving these phenomena, and our causal analysis method helps explain why: the models have learned modular representations that closely approximate the high-level compositional theory. Finally, I will show how to bring interchange interventions into the training process, which allows us to push our models to acquire desired modular internal structures like this.
Joint work with Atticus Geiger, Hanson Lu, Noah Goodman, and Thomas Icard.
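The logic of an interchange intervention can be shown on a deliberately tiny example. In this hypothetical sketch, the “neural” model is a hand-set two-unit hidden layer computing (x + y) * z, and the high-level causal model has an intermediate variable S = x + y; these models, the inputs and the function names are assumptions for illustration, not the NLI models or code from the talk. We take the hidden state for a base input, overwrite one unit with its value on a source input, and check whether the output matches the causal model with S set to the source’s value.

```python
from itertools import product

def causal_model(x, y, z, s=None):
    """High-level model: S = x + y, output = S * z (S can be overridden)."""
    if s is None:
        s = x + y
    return s * z

def hidden(x, y, z):
    """Hypothetical network layer with hand-set weights W; returns W @ (x, y, z)."""
    W = [[1, 1, 0], [0, 0, 1]]
    return [sum(w * v for w, v in zip(row, (x, y, z))) for row in W]

def readout(h):
    return h[0] * h[1]

def interchange_accuracy(unit, inputs):
    """Fraction of (base, source) pairs where swapping `unit` behaves like setting S."""
    hits = 0
    pairs = list(product(inputs, repeat=2))
    for base, source in pairs:
        h = hidden(*base)
        h[unit] = hidden(*source)[unit]  # the interchange intervention
        expected = causal_model(*base, s=source[0] + source[1])
        hits += readout(h) == expected
    return hits / len(pairs)

inputs = [(1, 2, 3), (0, 4, 2), (5, 1, 1), (2, 2, 5)]
print(interchange_accuracy(0, inputs))  # 1.0
print(interchange_accuracy(1, inputs))
```

Aligning S with unit 0 passes every intervention, while aligning it with unit 1 does not; in real analyses the hidden states come from a trained network, and a high interchange accuracy is evidence that a representation plays the causal role of its aligned variable.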
- 2021/06/01 10:30-18:30 UTC+2: one-day event with 6 speakers, namely Juan Luis Gastaldi (ETH Zürich), Koji Mineshima (Keio University), Maud Pironneau (Druide informatique), Marie-Catherine de Marneffe (Ohio State University), Jacob Andreas (MIT) and Olga Zamaraeva (University of Washington).
Contact: Timothée BERNARD (firstname.lastname@example.org) and Grégoire WINTERSTEIN (email@example.com)