Measuring the readability of sustainability reports: A corpus-based analysis through standard formulae and NLP

N. Smeuninx, B. De Clerck, Walter Aerts

    Research output: Contribution to journal › Article › Scientific › peer-review

    Abstract

    This study characterises and problematises the language of corporate reporting along region, industry, genre, and content lines by applying readability formulae and more advanced natural language processing (NLP)–based analysis to a manually assembled 2.75-million-word corpus. Readability formulae reveal that, despite its wider readership, sustainability reporting remains a genre that is very difficult to read, at times more difficult than financial reporting. Although we find little industry impact on readability, region does prove an important variable, with NLP-based variables more strongly affected than formula scores. These results highlight not only the impact of legislative contexts but also that of language variety itself, an underexplored variable. Finally, the study reveals some of the weaknesses of default readability formulae, which are largely unable to register syntactic variation between the varieties of English in the reports; it also demonstrates the merits of NLP in report readability analysis and the need for more accessible sustainability reporting.
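To make concrete what a "standard readability formula" measures, here is a minimal Python sketch of the Flesch Reading Ease score, one widely used formula: 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The abstract does not say which formulae the study applied, so this is an illustration only, not the authors' method. The helper names (`count_syllables`, `flesch_reading_ease`), the naive sentence splitting, and the vowel-group syllable heuristic are all assumptions made for this sketch.

```python
import re


def count_syllables(word: str) -> int:
    """Hypothetical helper: crude vowel-group heuristic, not a dictionary lookup."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Assumption: treat a final "e" as silent ("make"), except in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores mean easier text."""
    # Naive sentence split on terminal punctuation (an assumption, not a real tokenizer).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat on the mat."))
    print(flesch_reading_ease("Sustainability reporting remains considerably difficult."))
```

Because the score depends only on sentence length and word length, two sentences with very different syntactic structure but similar counts receive similar scores, which is consistent with the abstract's point that such formulae largely fail to register syntactic variation.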
    Original language: English
    Pages (from-to): 52-85
    Journal: International Journal of Business Communication
    Volume: 57
    Issue number: 1
    Publication status: Published (Jan 2020)

    Keywords

    • corpus linguistics
    • readability
    • sustainability reporting
    • language variety
    • natural language processing
