
Ann Arbor Spark

CLD: Software for Language Documentation

Tuesday, August 20, 2019, 6:00pm - 8:00pm

Dr. Steven Abney, Professor of Linguistics at the University of Michigan, will present software he created for language documentation.

“I’d like to describe an application for language documentation, specifically, entry of text and audio, transcription, and translation. It provides tight integration of audio, text, and lexicon; in fact, the lexicon is automatically constructed from the texts.

In some ways the real question is not what CLD is, but why I wrote it. I describe a linguistic approach that I call ‘inductive general grammar,’ or, more casually, ‘linguistics with a computational attitude,’ in which the big question is how one can automatically learn a complete language. The first order of business is the collection of a large training set—standard operating procedure in computational linguistics but a novel idea for linguistics. In this case, though, the items in the training set are entire languages.

The current rate of language loss gives the matter urgency. The Universal Dependencies treebanks are a terrific resource, but they only touch 1% of the world’s languages. How can we accelerate the collection of data? The idea behind CLD is to do so via a mutually beneficial collaboration with speaker communities. Transmission to the next generation is a major issue, and CLD aims to provide a self-study complement to immersion learning.

This will not be the typical A2D-NLP talk: it will be mostly about linguistics and a user interface. But the next steps pose some fascinating challenges for computational linguistics, such as automatic phonetic transcription and automatic translation-lexicon construction.”

Ann Arbor SPARK Central Innovation Center
330 East Liberty Street
Ann Arbor, MI, USA