Multilingual text induced spelling correction

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

    Abstract

    We present TISC, a multilingual, language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from raw text corpora, without supervision, and contains word unigrams and word bigrams. The system employs input context and lexicon evidence to automatically propose a limited number of ranked correction candidates. We describe the implemented trilingual (Dutch, English, French) prototype and evaluate it on English and Dutch text, monolingual and mixed, containing real-world errors in context.
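To make the idea in the abstract concrete, here is a minimal sketch of a corpus-derived lexicon of word unigrams and bigrams used to rank a limited number of correction candidates by their left context. It is not TISC's actual algorithm, which this record does not describe: the candidate generation by single-character edits, the ranking key, the toy mixed English and Dutch corpus, and all function names (`build_lexicon`, `edits1`, `suggest`) are assumptions made purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased runs of letters; accented letters are kept, digits and punctuation dropped.
    return re.findall(r"[^\W\d_]+", text.lower())


def build_lexicon(corpus):
    # Unsupervised lexicon: raw counts of word unigrams and adjacent word bigrams.
    tokens = tokenize(corpus)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def edits1(word, alphabet):
    # All strings one deletion, transposition, substitution, or insertion away (assumed strategy).
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + swaps + replaces + inserts)


def suggest(prev_word, word, unigrams, bigrams, limit=3):
    # Words already in the lexicon are treated as correct (non-word errors only).
    if word in unigrams:
        return [word]
    alphabet = sorted({ch for w in unigrams for ch in w})
    candidates = [c for c in edits1(word, alphabet) if c in unigrams]
    # Rank by bigram evidence with the preceding word, then unigram frequency, then alphabetically.
    ranked = sorted(
        candidates,
        key=lambda c: (-bigrams[(prev_word, c)], -unigrams[c], c),
    )
    return ranked[:limit]


# Toy mixed English/Dutch corpus, invented for this example.
corpus = "the cat sat on the mat . de kat zat op de mat . the hat was on the cat"
unigrams, bigrams = build_lexicon(corpus)

print(suggest("the", "xat", unigrams, bigrams))  # ['cat', 'mat', 'hat']
print(suggest("de", "xat", unigrams, bigrams))   # ['mat', 'kat', 'cat']
```

The same non-word, `xat`, receives different ranked candidates after an English and after a Dutch context word, which is the kind of context sensitivity over mixed-language text that the abstract refers to.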
    Original language: English
    Title of host publication: Proceedings of the COLING 2004 Workshop on Multilingual Linguistic Resources
    Place of publication: Geneva
    Publisher: Unknown Publisher
    Pages: 117-124
    Number of pages: 8
    Publication status: Published - 2004

    Cite this

    Reynaert, M. W. C. (2004). Multilingual text induced spelling correction. In Proceedings of the COLING 2004 Workshop on Multilingual Linguistic Resources (pp. 117-124). Geneva: Unknown Publisher.
    @inproceedings{4766fd6cdf24429a89662ed0f498a83c,
    title = "Multilingual text induced spelling correction",
    abstract = "We present TISC, a multilingual, language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from raw text corpora, without supervision, and contains word unigrams and word bigrams. The system employs input context and lexicon evidence to automatically propose a limited number of ranked correction candidates. We describe the implemented trilingual (Dutch, English, French) prototype and evaluate it on English and Dutch text, monolingual and mixed, containing real-world errors in context.",
    author = "M.W.C. Reynaert",
    note = "Pagination: 8",
    year = "2004",
    language = "English",
    pages = "117--124",
    booktitle = "Proceedings of the COLING 2004 Workshop on Multilingual Linguistic Resources",
    publisher = "Unknown Publisher",

    }



    TY - GEN

    T1 - Multilingual text induced spelling correction

    AU - Reynaert, M.W.C.

    N1 - Pagination: 8

    PY - 2004

    Y1 - 2004

    N2 - We present TISC, a multilingual, language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from raw text corpora, without supervision, and contains word unigrams and word bigrams. The system employs input context and lexicon evidence to automatically propose a limited number of ranked correction candidates. We describe the implemented trilingual (Dutch, English, French) prototype and evaluate it on English and Dutch text, monolingual and mixed, containing real-world errors in context.

    AB - We present TISC, a multilingual, language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from raw text corpora, without supervision, and contains word unigrams and word bigrams. The system employs input context and lexicon evidence to automatically propose a limited number of ranked correction candidates. We describe the implemented trilingual (Dutch, English, French) prototype and evaluate it on English and Dutch text, monolingual and mixed, containing real-world errors in context.

    M3 - Conference contribution

    SP - 117

    EP - 124

    BT - Proceedings of the COLING 2004 Workshop on Multilingual Linguistic Resources

    PB - Unknown Publisher

    CY - Geneva

    ER -
