Antal van den Bosch

Implicit Linguistics with Memory-based Language Processing

Abstract

Memory-based language processing (MBLP) is an approach to language processing based on exemplar storage during learning and analogical reasoning during processing. From a cognitive perspective, the approach is attractive as a model of human language processing because it makes no assumptions about how abstractions are shaped, nor any a priori distinction between regular and exceptional exemplars. In MBLP, schema-like behavior and the emergence of categories can be explained as implicit by-products of analogical reasoning over exemplars in memory.
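As a rough illustration of learning as storage and processing as analogy, the Python sketch below implements a bare k-nearest-neighbour classifier with an overlap distance (the number of mismatching feature values). It is a minimal sketch under stated assumptions, not a reproduction of any particular MBLP system: it uses no feature weighting, and the toy English plural task, the `suffix_features` helper, and the class and method names are hypothetical choices made for this example.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Exemplar = Tuple[Tuple[str, ...], str]


def overlap_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def suffix_features(word: str, n: int = 3) -> Tuple[str, ...]:
    """Hypothetical feature extractor: the last n letters, left-padded with '_'."""
    padded = "_" * n + word
    return tuple(padded[-n:])


class MemoryBasedClassifier:
    """Minimal k-nearest-neighbour classifier: learning is storage, no abstraction."""

    def __init__(self, k: int = 1) -> None:
        self.k = k
        self.memory: List[Exemplar] = []

    def learn(self, features: Sequence[str], label: str) -> None:
        # Every exemplar is stored verbatim; nothing marks it as regular or exceptional.
        self.memory.append((tuple(features), label))

    def classify(self, features: Sequence[str]) -> str:
        # Rank all stored exemplars by distance; sorted() is stable, so ties
        # keep their storage order.
        ranked = sorted(self.memory, key=lambda ex: overlap_distance(ex[0], features))
        # Majority vote over the labels of the k nearest exemplars.
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical toy task: choosing the English plural suffix from a word's ending.
clf = MemoryBasedClassifier(k=1)
for word, suffix in [("cat", "s"), ("dog", "s"), ("cup", "s"),
                     ("bus", "es"), ("box", "es"), ("dish", "es")]:
    clf.learn(suffix_features(word), suffix)

print(clf.classify(suffix_features("fox")))   # es (nearest exemplar: box)
print(clf.classify(suffix_features("wish")))  # es (nearest exemplar: dish)
print(clf.classify(suffix_features("hat")))   # s  (nearest exemplar: cat)
```

No rule for "-es" is stored anywhere: the classifier produces it for "fox" only because "box" is its closest stored neighbour, which is the sense in which schema-like behavior is a by-product of analogy over exemplars.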

Natural language processing models and systems, developed in the fields of computational linguistics and artificial intelligence, typically employ abstract linguistic representations (syntactic, semantic, or pragmatic) as intermediate working units. The memory-based approach challenges this presupposition of abstract representations. Good testing grounds for this challenge are classes of natural language processing tasks in which form is mapped to form, i.e., in which neither the input nor the output contains abstract elements to begin with. We take (machine) translation as our focus domain.

The advent of statistical machine translation in the 1990s raised the field to a new level. Google Translate, for all its defects, has gained notoriety and worldwide usage. In these models, text is translated in maximally large chunks: each chunk is translated by analogy to stored translations of chunks, and the final translation is a likely arrangement of the translated chunks according to a statistical language model. In this complex model, a considerable amount of “implicit linguistics” occurs. Not only are the chunks used collocationally strong n-grams, or multi-word units; the translation model also ties them to strong n-grams in the other language with conditional probabilities, arguably grounding their meaning. The course explores how these models can be seen as implementations of complex lexical items and of the mental lexicon, or constructicon.
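The conditional probabilities that tie chunks across languages can be illustrated with relative-frequency estimation over aligned phrase pairs, p(t | s) = count(s, t) / count(s). The Python sketch below assumes a hypothetical list of already extracted Dutch-English phrase pairs and a source sentence that is already segmented into chunks; the function names are hypothetical. It deliberately leaves out reordering and the language model, so it shows only the chunk-lookup half of the translation model, not a complete statistical MT system.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

PhraseTable = Dict[str, Dict[str, float]]


def estimate_phrase_table(pairs: Iterable[Tuple[str, str]]) -> PhraseTable:
    """Estimate p(target | source) by relative frequency over phrase pairs."""
    pair_counts = Counter(pairs)
    source_counts: Counter = Counter()
    for (source, _target), count in pair_counts.items():
        source_counts[source] += count
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (source, target), count in pair_counts.items():
        table[source][target] = count / source_counts[source]
    return dict(table)


def translate_chunks(chunks: List[str], table: PhraseTable) -> str:
    """Monotone, LM-free translation: pick the most probable target per chunk."""
    output = []
    for chunk in chunks:
        candidates = table.get(chunk)
        if not candidates:
            output.append(chunk)  # unknown chunk: copy it through unchanged
            continue
        output.append(max(candidates, key=candidates.get))
    return " ".join(output)


# Hypothetical toy data: phrase pairs as they might come out of word alignment.
pairs = [
    ("het huis", "the house"),
    ("het huis", "the house"),
    ("het huis", "the home"),
    ("van mijn vader", "of my father"),
    ("van mijn vader", "of my father"),
    ("van mijn vader", "my father's"),
]
table = estimate_phrase_table(pairs)
# p(the house | het huis) = 2/3, p(the home | het huis) = 1/3
print(translate_chunks(["het huis", "van mijn vader"], table))  # the house of my father
```

In a full system, several candidate translations per chunk would be kept and their arrangements scored jointly with a target-language model, rather than committing to the single most probable target per chunk as this sketch does.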

Reading list

  • Daelemans, W., and Van den Bosch, A. (2005). Memory-based language processing. Cambridge, UK: Cambridge University Press.
  • Van den Bosch, A., and Daelemans, W. (2013). Implicit schemata and categories in memory-based language processing. Language and Speech, 56(3), pp. 308-326. doi:10.1177/0023830913484902
  • Van den Bosch, A., and Bresnan, J. (submitted). Modeling Dative Alternations of Individual Children. Draft for comments, http://www.stanford.edu/~bresnan/bosch-bresnan-submitted.pdf