In this paper, we describe the collection of a Dutch parallel corpus and its use in a sentence compression tool intended to automatically generate subtitles for the deaf from transcripts of a television program. First, we describe how the corpus was collected, together with the manipulations and transformations performed on it. Second, we describe a hybrid sentence compression tool and its evaluation.
Title of host publication: Proceedings of the 4th International Language Resources and Evaluation Conference (LREC 2004)
Place of publication: Lisbon
Number of pages: 4
Publication status: Published, 2004