Bridges and Gaps between Formal and Computational Linguistics (an ESSLLI 2022 workshop)

Workshop description

While computational linguistics is historically rooted in formal linguistics, it might seem that the distance between the two fields has only kept growing.
The goal of this workshop is to consider whether this impression is correct in the light of both recent developments and long-standing approaches.
Indeed, while formal linguistics is currently showing a growing interest in both explaining the remarkable successes of neural language models and uncovering their limitations, one should not forget the contributions that, for example, computational implementations of grammatical formalisms have made to theoretical linguistics. And while neural methods have recently received the lion’s share of public attention, interpretable models based on symbolic methods remain relevant and widely used in the natural language processing industry.
This workshop aims to bring members of these scientific communities together to share their perspectives on these topics and related areas.

Relevant topics

The following is a non-exhaustive list of topics relevant to the workshop:

  • Symbolic methods for linguistics;
  • Implementation of formal grammars;
  • Use of computational methods for linguistic inquiry;
  • Compositionality and vector semantics;
  • Investigation of the linguistic properties of machine learning models;
  • Trends in the history of computational linguistics.

Location

The workshop will take place during the 33rd edition of the European Summer School in Logic, Language and Information (ESSLLI) in Galway, Ireland.

Guest speakers

  • Aurelie Herbelot, Centre for Mind/Brain Sciences (CIMeC), University of Trento
  • James Pustejovsky, Volen Center for Complex Systems, Brandeis University

Programme

  • Monday, Aug. 8th
    • 09h00: Introduction to the workshop
    • 09h15: Aurelie Herbelot (guest speaker)
      Title: Can truth-theoretic knowledge be learned?
      Abstract: Large Neural Language Models have been criticised from various perspectives. One aspect of this critique concerns the inability of current Deep Learning paradigms to encode core semantic competences such as quantification or negation. On the other hand, logical approaches to meaning — which readily account for a wide range of semantic phenomena — lack a theory of acquisition, prompting questions about their suitability for machine learning. In this talk, I will argue it is both possible and desirable to build computational models that take meaning seriously and encode its truth-theoretical aspects. I will first show how a mapping can be automatically learned between corpus data and some underspecified set-theoretic representation of the world. Having identified the limits of this approach, I will then propose a more complete formalisation of set theory in terms of a vector space, amenable to computational treatment. Finally, I will show that such a formalisation can be automatically learned from small data, providing high levels of performance on core semantic tasks.
  • Tuesday, Aug. 9th
    • 09h00: Larry Moss, “Monotonicity Reasoning as a Bridge Between Linguistic Theory and NLI” [abstract]
    • 09h30: Martin Kopf, Maryam Rajestari and Remus Gergel, “Crowdsourcing for generating diachronic semantic annotations in presupposition triggers: a pilot on again” [abstract]
    • 10h00: Luuk Suurmeijer, Hana Filip and Noortje J. Venhuizen, “Probabilities in language and the world: Modeling syllogistic reasoning” [abstract]
  • Wednesday, Aug. 10th
    • 09h00: Masaya Taniguchi, Satoshi Tojo and Koji Mineshima, “Interactive CCG Parsing with Incremental Trees” [abstract]
    • 09h30: Kata Balogh, “Pragmatic Structuring and Negation in Formal Grammar” [abstract]
    • 10h00: Jean-Philippe Bernardy and Shalom Lappin, “Unitary Matrices As Compositional Word Embeddings” [abstract]
  • Thursday, Aug. 11th
    • 09h00: Michael Goodale and Salvador Mascarenhas, “Do contextual word embeddings represent richly subsective adjectives more diversely than intersective adjectives?” [abstract]
    • 09h30: Lucía Ormaechea Grijalba, Benjamin Lecouteux, Pierrette Bouillon and Didier Schwab, “A Tool for Easily Integrating Grammars as Language Models into the Kaldi Speech Recognition Toolkit” [abstract]
    • 10h00: Deniz Ekin Yavas, Marta Ricchiardi, Elisabetta Ježek, Laura Kallmeyer and Rainer Osswald, “Automatic Detection of Copredication using Contextualized Word Embeddings” [abstract]
  • Friday, Aug. 12th
    • 09h00: James Pustejovsky (guest speaker)
      Title: Dense Paraphrasing for Textual Enrichment: Question Answering and Inference
      Abstract: Much of the current computational work on inference in NLP can be associated with one of two techniques: the first focuses on a specific notion of text-based question answering (QA), using large pre-trained language models (LLMs). To examine specific linguistic properties present in the model, “probing tasks” (diagnostic classifiers) have been developed to test capabilities that the LLM demonstrates on interpretable semantic inferencing tasks, such as age and object comparisons, hypernym conjunction, antonym negation, and others. The second is Knowledge Graph-based inference and QA, where triples are mined from Wikipedia, ConceptNet, WikiData, and other non-corpus resources, and then used for answering questions involving multiple components of the KG (multi-hop QA). While quite impressive with benchmarked metrics in QA, both techniques are completely confused by (a) syntactically missing semantic content, and (b) the semantics accompanying the consequences of events and actions in narratives.
      In this talk, I discuss a model we have developed to enrich the surface form of texts, using type-based semantic operations to “textually expose” the deeper meaning of the corpus that was used to make the original embeddings in the language model. This model, Dense Paraphrasing, is a linguistically motivated textual enrichment strategy that textualizes the compositional operations inherent in a semantic model, such as Generative Lexicon Theory or CCG. This broadly involves three kinds of interpretive processes: (i) recognizing the diverse variability in linguistic forms that can be associated with the same underlying semantic representation (paraphrases); (ii) identifying semantic factors or variables that accompany or are presupposed by the lexical semantics of the words present in the text, through dropped, hidden or shadow arguments; and (iii) interpreting or computing the dynamic consequences of actions and events in the text. After performing these textual enrichment algorithms, we fine-tune the LLM, which allows for more robust performance on inference and QA tasks.
    • 10h15: Outro

Submission details

We invite authors to submit abstracts for 30-minute presentations (including questions) related to the topics above.

Abstracts must be anonymous and at most two pages long (A4 or US letter) in 12pt font with 1-inch margins, plus an additional page for supplementary material (figures, glossed examples for languages other than English) and unlimited references.

Submissions must be in PDF format and made via EasyChair, using the following link: https://easychair.org/conferences/?conf=brigap2022

Important dates

  • Submission deadline (extended): Friday, April 8th 2022
  • Notification of acceptance: Monday, May 9th 2022
  • Workshop: August 8th – 12th 2022

Workshop chairs

  • Timothée Bernard (Université Paris Cité)
  • Grégoire Winterstein (UQAM)

Programme committee

  • Lasha Abzianidze (Utrecht University)
  • Pascal Amsili (Université Sorbonne Nouvelle)
  • Gemma Boleda (Universitat Pompeu Fabra)
  • Johan Bos (University of Groningen)
  • Chloé Braud (CNRS)
  • Lisa Bylinina (Bookarang)
  • Benoît Crabbé (Université Paris Cité)
  • Jonathan Ginzburg (Université Paris Cité)
  • Aurelie Herbelot (University of Trento)
  • Denis Paperno (Utrecht University)
  • James Pustejovsky (Brandeis University)
  • Laura Kallmeyer (Heinrich-Heine-Universität Düsseldorf)
  • Najoung Kim (New York University)
  • Dan Lassiter (Stanford University)
  • Christian Retoré (Université de Montpellier)
  • Guillaume Wisniewski (Université Paris Cité)

Related events

The workshop is part of the GdR LIFT activities on formal and computational linguistics.

Thanks

The organisation of this workshop is made possible thanks to funding from GdR LIFT, Laboratoire de linguistique formelle and Labex EFL.