OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Scientific; peer-review

    Abstract

    We present further work on the evaluation of the fully automatic post-correction of Early Dutch Books Online, a collection of 10,333 18th-century books. In prior work we evaluated the new implementation of Text-Induced Corpus Clean-up (TICCL) on the basis of a single-book Gold Standard derived from this collection. In the current paper we revisit the same collection on the basis of a sizeable 1020-item random sample of OCR post-corrected strings from the full collection. Both evaluations have their own stories to tell and lessons to teach.
    Original language: English
    Title of host publication: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
    Editors: Calzolari
    Publisher: ELRA
    Pages: 967-974
    Number of pages: 8
    Publication status: Published - 2016
    Event: International Conference on Language Resources and Evaluation 2016: 10th edition - Grand Hotel Bernardin Conference Center, Portoroz, Slovenia
    Duration: 23 May 2016 to 28 May 2016
    Conference number: 10
    http://lrec2016.lrec-conf.org/en/

    Conference

    Conference: International Conference on Language Resources and Evaluation 2016
    Abbreviated title: LREC 2016
    Country: Slovenia
    City: Portoroz
    Period: 23/05/16 to 28/05/16

    Fingerprint

    evaluation
    gold standard
    random sample

    Keywords

    • TICCL
    • OCR post-correction
    • evaluation
    • EDBO
    • Nederlab
    • CLARIAH

    Cite this

    Reynaert, M. (2016). OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited. In Calzolari (Ed.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) (pp. 967-974). ELRA.
    @inproceedings{29cbd25aa16348cbb1c78745cf08ddec,
    title = "OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited",
    abstract = "We present further work on evaluation of the fully automatic post-correction of Early Dutch Books Online, a collection of 10,333 18th century books. In prior work we evaluated the new implementation of Text-Induced Corpus Clean-up (TICCL) on the basis of a single book Gold Standard derived from this collection. In the current paper we revisit the same collection on the basis of a sizeable 1020 item random sample of OCR post-corrected strings from the full collection. Both evaluations have their own stories to tell and lessons to teach.",
    keywords = "TICCL, OCR post-correction, evaluation, EDBO, Nederlab, CLARIAH",
    author = "Martin Reynaert",
    year = "2016",
    language = "English",
    pages = "967--974",
    editor = "Calzolari",
    booktitle = "Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)",
    publisher = "ELRA",
    }

    TY - GEN
    T1 - OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited
    AU - Reynaert, Martin
    PY - 2016
    Y1 - 2016
    N2 - We present further work on evaluation of the fully automatic post-correction of Early Dutch Books Online, a collection of 10,333 18th century books. In prior work we evaluated the new implementation of Text-Induced Corpus Clean-up (TICCL) on the basis of a single book Gold Standard derived from this collection. In the current paper we revisit the same collection on the basis of a sizeable 1020 item random sample of OCR post-corrected strings from the full collection. Both evaluations have their own stories to tell and lessons to teach.
    KW - TICCL
    KW - OCR post-correction
    KW - evaluation
    KW - EDBO
    KW - Nederlab
    KW - CLARIAH
    UR - http://www.lrec-conf.org/proceedings/lrec2016/summaries/596.html
    M3 - Conference contribution
    SP - 967
    EP - 974
    BT - Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
    A2 - Calzolari
    PB - ELRA
    ER -
