Abstract
String transduction problems are ubiquitous in natural language
processing: they include transliteration, grapheme-to-phoneme
conversion, text normalization and translation. String transduction
can be reduced to the simpler problem of sequence labeling by
expressing the target string as a sequence of edit operations applied
to the source string. This reduction makes any sequence labeling
model applicable in typical transduction settings. Sequence
models range from simple linear models such as the sequence perceptron,
which require external feature extractors, to recurrent neural
networks with long short-term memory (LSTM) units, which can do feature
extraction internally. Some variants of recurrent neural networks are also
capable of solving string transduction natively, without reformulating
it in terms of edit operations. In this talk I analyze the effect of
these variations in model architecture and input representation on
performance and engineering effort for string transduction, focusing
especially on the text normalization task.
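
The reduction to sequence labeling described above can be illustrated with a minimal sketch. It is not the encoding used in the talk: the label scheme (a pair of inserted text and a KEEP/DEL action per source character, plus one END slot for trailing insertions), the function names `edit_labels` and `apply_labels`, and the use of Python's `difflib` to align the strings are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def edit_labels(source, target):
    """Encode target as one edit label per source character.

    Each label is (text_to_insert_before, action), where action is
    "KEEP" or "DEL". A final ("...", "END") label carries any text
    inserted after the last source character. Hypothetical scheme.
    """
    labels = []
    pending = ""  # inserted text waiting for the next source position
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            pending += target[j1:j2]
        action = "KEEP" if tag == "equal" else "DEL"
        # For "insert" opcodes i1 == i2, so this loop does not run.
        for _ in range(i1, i2):
            labels.append((pending, action))
            pending = ""
    labels.append((pending, "END"))
    return labels


def apply_labels(source, labels):
    """Decode: rebuild the target string from source and its labels."""
    out = []
    for ch, (inserted, action) in zip(source, labels):
        out.append(inserted)
        if action == "KEEP":
            out.append(ch)
    out.append(labels[-1][0])  # text carried by the END slot
    return "".join(out)


# Text normalization example: the labels are what a tagger would predict.
labels = edit_labels("2morrow", "tomorrow")
restored = apply_labels("2morrow", labels)
```

Under this scheme a sequence labeler predicts exactly one label per input character plus the END slot, so any tagger, from a sequence perceptron to an LSTM, can in principle be trained on pairs encoded this way. Applying the predicted labels to the source then yields the output string.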
Original language | English |
---|---|
Publication status | Published - 2015 |
Event | The 25th Meeting of Computational Linguistics in the Netherlands (CLIN25), Belgium. Duration: 5 Feb 2015 → 6 Feb 2015 |
Conference
Conference | The 25th Meeting of Computational Linguistics in the Netherlands (CLIN25) |
---|---|
Country/Territory | Belgium |
Period | 5/02/15 → 6/02/15 |