Abstract
We present TISC, a multilingual, language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from raw text corpora, without supervision, and contains word unigrams and word bigrams. The system employs input context and lexicon evidence to automatically propose a limited number of ranked correction candidates. We describe the implemented trilingual (Dutch, English, French) prototype and evaluate it on English and Dutch text, monolingual and mixed, containing real-world errors in context.
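The general idea in the abstract (a lexicon of word unigrams and bigrams induced from raw text, then used with the surrounding context to rank a small number of correction candidates) can be illustrated with a minimal sketch. This is not the TISC implementation: the function names, the ASCII-only alphabet and tokenizer, the edit-distance-1 candidate generation and the scoring rule are all assumptions chosen for brevity, and accented characters as found in French or Dutch would need a wider alphabet.

```python
import re
from collections import Counter

# Assumption: ASCII letters only; a real multilingual system needs a wider alphabet.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_lexicon(corpus):
    """Induce unigram and bigram counts from raw text, without supervision."""
    tokens = tokenize(corpus)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def suggest(word, left, right, unigrams, bigrams, limit=3):
    """Return up to `limit` ranked candidates for a non-word in context.

    Hypothetical scoring: bigram evidence with the left and right neighbours
    first, then unigram frequency, then alphabetical order for stable ties.
    """
    if word in unigrams:  # known word: nothing to correct in this sketch
        return [word]
    candidates = [c for c in edits1(word) if c in unigrams]

    def sort_key(c):
        context = bigrams[(left, c)] + bigrams[(c, right)]
        return (-context, -unigrams[c], c)

    return sorted(candidates, key=sort_key)[:limit]


corpus = "the cat sat on the mat . the cat ate the rat . a hat on the mat"
unigrams, bigrams = build_lexicon(corpus)
print(suggest("xat", "the", "sat", unigrams, bigrams))  # ['cat', 'mat', 'rat']
```

Nothing in the sketch is tied to one language: a lexicon built from mixed Dutch, English and French text would be queried in the same way, which is the sense in which a text-induced lexicon is language-independent.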
Original language | English |
---|---|
Title of host publication | Proceedings of the COLING 2004 Workshop on Multilingual Linguistic Resources |
Place of Publication | Geneva |
Publisher | Unknown Publisher |
Pages | 117-124 |
Number of pages | 8 |
Publication status | Published - 2004 |
Projects related to 'Multilingual text induced spelling correction'
- 1 Finished
- Automatic text analysis and machine learning for prosody
  Marsi, E. C. (Researcher) & Reynaert, M. (Researcher)
  1/01/01 → 1/01/05
  Project: Research project