@PhilosTEI: Building Corpora for Philosophers

Arianna Betti, Martin Reynaert, Hein van den Berg

    Research output: Chapter in Book/Report/Conference proceeding | Chapter | Scientific | peer-review

    Abstract

    For philosophers to take a computational turn in their field, especially if that field relies heavily on historical material, it is crucial to be able to build high-quality, easily and freely accessible corpora in a sustainable format, composed of multi-language, multi-script books from different historical periods. At the moment, corpora matching these needs are virtually non-existent. Within the CLARIN-NL project @PhilosTEI, we have addressed the problem of building corpora of this kind by developing an open-source, web-based, user-friendly workflow from textual images to TEI, based on the state-of-the-art open-source OCR software Tesseract and a multi-language version of TICCL, a powerful OCR post-correction tool. We have demonstrated the utility of the @PhilosTEI tool by applying it to a multilingual, multi-script corpus of important 18th- to 20th-century European philosophical texts.
    Original language: English
    Title of host publication: CLARIN-NL in the Low Countries
    Editors: Jan Odijk, Arjan van Hessen
    Place of Publication: London
    Publisher: Ubiquity Press, London
    Chapter: 32
    Pages: 379-392
    Number of pages: 13
    ISBN (Electronic): 9781911529255
    ISBN (Print): 9781911529248
    DOI: https://doi.org/10.5334/bbi.32
    Publication status: Published - 28 Dec 2017

    Keywords

    • History and philosophy of science and technology
    • History of ideas and intellectual history
    • Software for humanities
    • Textual and linguistic corpora
    • TICCL
    • OCR post-correction
    • @PhilosTEI

    Cite this

    Betti, A., Reynaert, M., & van den Berg, H. (2017). @PhilosTEI: Building Corpora for Philosophers. In J. Odijk, & A. van Hessen (Eds.), CLARIN-NL in the Low Countries (pp. 379-392). Ubiquity Press, London. https://doi.org/10.5334/bbi.32