Abstract
We present TISC, a language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from a very large corpus of raw text, without supervision, and contains word unigrams and word bigrams. It is stored in a novel representation based on a purpose-built hashing function, which provides a fast and computationally tractable way of checking whether a particular word form likely constitutes a spelling error and of retrieving correction candidates. The system employs input context and lexicon evidence to automatically propose a limited number of ranked correction candidates when insufficient information for an unambiguous decision on a single correction is available. We describe the implemented prototype and evaluate it on English and Dutch text, containing real-world errors in more or less limited contexts. The results are compared with those of the isolated word spelling checking programs Ispell and the Microsoft Proofing Tools MPT.
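The abstract does not spell out the hashing function or the ranking scheme, so the following is only a minimal sketch of the general idea: a lexicon of word unigrams and bigrams induced from raw text, a lookup that flags unknown word forms, and context-based ranking of a limited number of correction candidates. Python's built-in hash tables (`Counter`) stand in for the purpose-built hash, and single-edit variant generation stands in for the system's candidate retrieval. All names (`build_lexicon`, `edits1`, `suggest`) and the toy corpus are hypothetical illustrations, not part of TISC.

```python
from collections import Counter
import re

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase ASCII only


def tokenize(text):
    """Lowercase the text and keep alphabetic runs as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_lexicon(corpus):
    """Induce unigram and bigram counts from raw text, without supervision."""
    tokens = tokenize(corpus)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def suggest(word, prev_word, unigrams, bigrams, limit=3):
    """Return the word itself if known, else up to `limit` ranked candidates.

    Candidates are ranked by bigram count with the preceding word first,
    then by unigram count, then alphabetically to keep the order stable.
    """
    if word in unigrams:
        return [word]
    candidates = [c for c in edits1(word) if c in unigrams]
    candidates.sort(key=lambda c: (-bigrams[(prev_word, c)], -unigrams[c], c))
    return candidates[:limit]


# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat . the cat ate the rat . a car passed the cat"
unigrams, bigrams = build_lexicon(corpus)

print(suggest("cay", "a", unigrams, bigrams))
print(suggest("cay", "the", unigrams, bigrams))
```

In this sketch the same non-word receives a different top candidate depending on the preceding word, which is the sense in which such a checker is context-sensitive. When the context does not single out one correction, several ranked candidates are returned.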
Original language | English |
---|---|
Title of host publication | Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004) |
Place of Publication | Geneva |
Publisher | Unknown Publisher |
Pages | 834-840 |
Number of pages | 7 |
ISBN (Print) | 1932432485 |
Publication status | Published - 2004 |
Title | Text Induced Spelling Correction |

Projects
- Automatic text analysis and machine learning for prosody
  Marsi, E. C. (Researcher) & Reynaert, M. (Researcher)
  1/01/01 → 1/01/05
  Project: Research project (Finished)