PICCL: Philosophical Integrator of Computational and Corpus Libraries

Martin Reynaert, Maarten van Gompel, Ko van der Sloot, Antal van den Bosch

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    Abstract

    CLARIN activities in the Netherlands in 2015 are in transition between the first national project CLARIN-NL and its successor CLARIAH. In this paper we give an overview of important infrastructure developments which have taken place throughout the first and which are taken to a further level in the second. We show how relatively small accomplishments in particular projects enable larger steps in further ones and how the synergy of these projects helps the national infrastructure to outgrow mere demonstrators and to move towards mature production systems. The paper centers around a new corpus building tool called PICCL. This integrated pipeline offers a comprehensive range of conversion facilities for legacy electronic text formats, Optical Character Recognition for text images, automatic text correction and normalization, linguistic annotation, and preparation for corpus exploration and exploitation environments. We give a concise overview of PICCL’s components, integrated now or to be incorporated in the foreseeable future.
    Original language: English
    Title of host publication: Proceedings of CLARIN Annual Conference 2015
    Subtitle of host publication: Book of Abstracts
    Editors: Koenraad De Smedt
    Place of Publication: Wrocław, Poland
    Publisher: CLARIN ERIC
    Pages: 75-79
    Number of pages: 5
    Publication status: Published - 15 Oct 2015
    Event: CLARIN Annual Conference 2015 - Hotel Sofitel Wroclaw Old Town, Wrocław, Poland
    Duration: 15 Oct 2015 to 17 Oct 2015

    Conference

    Conference: CLARIN Annual Conference 2015
    Country: Poland
    City: Wrocław
    Period: 15/10/15 to 17/10/15

    Fingerprint

    Optical character recognition
    Linguistics
    Pipelines

    Keywords

    • Corpus Building Workflow
    • PICCL
    • TICCL
    • Text Conversion
    • FoLiA XML
    • CLAM

    Cite this

    Reynaert, M., van Gompel, M., van der Sloot, K., & van den Bosch, A. (2015). PICCL: Philosophical Integrator of Computational and Corpus Libraries. In K. De Smedt (Ed.), Proceedings of CLARIN Annual Conference 2015: Book of Abstracts (pp. 75-79). Wrocław, Poland: CLARIN ERIC.
    Reynaert, Martin ; van Gompel, Maarten ; van der Sloot, Ko ; van den Bosch, Antal. / PICCL: Philosophical Integrator of Computational and Corpus Libraries. Proceedings of CLARIN Annual Conference 2015: Book of Abstracts. editor / Koenraad De Smedt. Wrocław, Poland : CLARIN ERIC, 2015. pp. 75-79
    @inproceedings{5fd7babe31494851b7e752f816d62a96,
    title = "PICCL: Philosophical Integrator of Computational and Corpus Libraries",
    abstract = "CLARIN activities in the Netherlands in 2015 are in transition between the first national project CLARIN-NL and its successor CLARIAH. In this paper we give an overview of important infrastructure developments which have taken place throughout the first and which are taken to a further level in the second. We show how relatively small accomplishments in particular projects enable larger steps in further ones and how the synergy of these projects helps the national infrastructure to outgrow mere demonstrators and to move towards mature production systems. The paper centers around a new corpus building tool called PICCL. This integrated pipeline offers a comprehensive range of conversion facilities for legacy electronic text formats, Optical Character Recognition for text images, automatic text correction and normalization, linguistic annotation, and preparation for corpus exploration and exploitation environments. We give a concise overview of PICCL’s components, integrated now or to be incorporated in the foreseeable future.",
    keywords = "Corpus Building Workflow, PICCL, TICCL, Text Conversion, FoLiA XML, CLAM",
    author = "Reynaert, Martin and {van Gompel}, Maarten and {van der Sloot}, Ko and {van den Bosch}, Antal",
    year = "2015",
    month = "10",
    day = "15",
    language = "English",
    pages = "75--79",
    editor = "{De Smedt}, Koenraad",
    booktitle = "Proceedings of CLARIN Annual Conference 2015",
    publisher = "CLARIN ERIC",

    }

    Reynaert, M, van Gompel, M, van der Sloot, K & van den Bosch, A 2015, PICCL: Philosophical Integrator of Computational and Corpus Libraries. in K De Smedt (ed.), Proceedings of CLARIN Annual Conference 2015: Book of Abstracts. CLARIN ERIC, Wrocław, Poland, pp. 75-79, CLARIN Annual Conference 2015, Wrocław, Poland, 15/10/15.

    TY - GEN
    T1 - PICCL: Philosophical Integrator of Computational and Corpus Libraries
    AU - Reynaert, Martin
    AU - van Gompel, Maarten
    AU - van der Sloot, Ko
    AU - van den Bosch, Antal
    PY - 2015/10/15
    Y1 - 2015/10/15

    N2 - CLARIN activities in the Netherlands in 2015 are in transition between the first national project CLARIN-NL and its successor CLARIAH. In this paper we give an overview of important infrastructure developments which have taken place throughout the first and which are taken to a further level in the second. We show how relatively small accomplishments in particular projects enable larger steps in further ones and how the synergy of these projects helps the national infrastructure to outgrow mere demonstrators and to move towards mature production systems. The paper centers around a new corpus building tool called PICCL. This integrated pipeline offers a comprehensive range of conversion facilities for legacy electronic text formats, Optical Character Recognition for text images, automatic text correction and normalization, linguistic annotation, and preparation for corpus exploration and exploitation environments. We give a concise overview of PICCL’s components, integrated now or to be incorporated in the foreseeable future.

    KW - Corpus Building Workflow
    KW - PICCL
    KW - TICCL
    KW - Text Conversion
    KW - FoLiA XML
    KW - CLAM
    UR - http://www.clarin.eu/sites/default/files/book%20of%20abstracts%202015.pdf
    M3 - Conference contribution
    SP - 75
    EP - 79
    BT - Proceedings of CLARIN Annual Conference 2015
    A2 - De Smedt, Koenraad
    PB - CLARIN ERIC
    CY - Wrocław, Poland
    ER -

    Reynaert M, van Gompel M, van der Sloot K, van den Bosch A. PICCL: Philosophical Integrator of Computational and Corpus Libraries. In De Smedt K, editor, Proceedings of CLARIN Annual Conference 2015: Book of Abstracts. Wrocław, Poland: CLARIN ERIC. 2015. p. 75-79