Measuring the readability of sustainability reports: A corpus-based analysis through standard formulae and NLP

N. Smeuninx, B. De Clerck, Walter Aerts

    Research output: Contribution to journal › Article › Scientific › peer-review

    Abstract

    This study characterises and problematises the language of corporate reporting along region, industry, genre, and content lines by applying readability formulae and more advanced natural language processing (NLP)–based analysis to a manually assembled 2.75-million-word corpus. Readability formulae reveal that, despite its wider readership, sustainability reporting remains a very difficult-to-read genre, sometimes more difficult than financial reporting. Although we find little industry impact on readability, region does prove an important variable, with NLP-based variables more strongly affected than formulae. These results highlight not only the impact of legislative contexts but also language variety itself as an underexplored variable. Finally, the study reveals some of the weaknesses of default readability formulae, which are largely unable to register syntactic variation between the varieties of English in the reports, and demonstrates the merits of NLP in report readability analysis as well as the need for more accessible sustainability reporting.
    Original language: English
    Journal: International Journal of Business Communication
    DOI: 10.1177/2329488416675456
    Publication status: E-pub ahead of print - 21 Nov 2016

    Keywords

    • corpus linguistics
    • readability
    • sustainability reporting
    • language variety
    • natural language processing

    Cite this

    @article{6fbdb7ca64cb4886800973aa6dba0fcc,
    title = "Measuring the readability of sustainability reports: A corpus-based analysis through standard formulae and NLP",
    abstract = "This study characterises and problematises the language of corporate reporting along region, industry, genre, and content lines by applying readability formulae and more advanced natural language processing (NLP)–based analysis to a manually assembled 2.75-million-word corpus. Readability formulae reveal that, despite its wider readership, sustainability reporting remains a very difficult to read genre, sometimes more difficult than financial reporting. Although we find little industry impact on readability, region does prove an important variable, with NLP-based variables more strongly affected than formulae. These results not only highlight the impact of legislative contexts but also language variety itself as an underexplored variable. Finally, the study reveals some of the weaknesses of default readability formulae, which are largely unable to register syntactic variation between the varieties of English in the reports and demonstrates the merits of NLP in report readability analysis as well as the need for more accessible sustainability reporting.",
    keywords = "corpus linguistics, readability, sustainability reporting, language variety, natural language processing",
    author = "Smeuninx, N. and {De Clerck}, B. and Aerts, Walter",
    year = "2016",
    month = "11",
    day = "21",
    doi = "10.1177/2329488416675456",
    language = "English",
    journal = "International Journal of Business Communication",
    issn = "2329-4884",
    publisher = "Sage Publications Ltd",
    }

    Measuring the readability of sustainability reports: A corpus-based analysis through standard formulae and NLP. / Smeuninx, N.; De Clerck, B.; Aerts, Walter.

    In: International Journal of Business Communication, 21.11.2016.

    TY - JOUR

    T1 - Measuring the readability of sustainability reports:

    T2 - A corpus-based analysis through standard formulae and NLP

    AU - Smeuninx, N.

    AU - De Clerck, B.

    AU - Aerts, Walter

    PY - 2016/11/21

    Y1 - 2016/11/21

    N2 - This study characterises and problematises the language of corporate reporting along region, industry, genre, and content lines by applying readability formulae and more advanced natural language processing (NLP)–based analysis to a manually assembled 2.75-million-word corpus. Readability formulae reveal that, despite its wider readership, sustainability reporting remains a very difficult to read genre, sometimes more difficult than financial reporting. Although we find little industry impact on readability, region does prove an important variable, with NLP-based variables more strongly affected than formulae. These results not only highlight the impact of legislative contexts but also language variety itself as an underexplored variable. Finally, the study reveals some of the weaknesses of default readability formulae, which are largely unable to register syntactic variation between the varieties of English in the reports and demonstrates the merits of NLP in report readability analysis as well as the need for more accessible sustainability reporting.

    AB - This study characterises and problematises the language of corporate reporting along region, industry, genre, and content lines by applying readability formulae and more advanced natural language processing (NLP)–based analysis to a manually assembled 2.75-million-word corpus. Readability formulae reveal that, despite its wider readership, sustainability reporting remains a very difficult to read genre, sometimes more difficult than financial reporting. Although we find little industry impact on readability, region does prove an important variable, with NLP-based variables more strongly affected than formulae. These results not only highlight the impact of legislative contexts but also language variety itself as an underexplored variable. Finally, the study reveals some of the weaknesses of default readability formulae, which are largely unable to register syntactic variation between the varieties of English in the reports and demonstrates the merits of NLP in report readability analysis as well as the need for more accessible sustainability reporting.

    KW - corpus linguistics

    KW - readability

    KW - sustainability reporting

    KW - language variety

    KW - natural language processing

    U2 - 10.1177/2329488416675456

    DO - 10.1177/2329488416675456

    M3 - Article

    JO - International Journal of Business Communication

    JF - International Journal of Business Communication

    SN - 2329-4884

    ER -