Measuring the readability of sustainability reports: A corpus-based analysis through standard formulae and NLP

N. Smeuninx, B. De Clerck, Walter Aerts

    Research output: Contribution to journal › Article › Scientific › peer-review


    This study characterises and problematises the language of corporate reporting along region, industry, genre, and content lines by applying readability formulae and more advanced natural language processing (NLP)–based analysis to a manually assembled 2.75-million-word corpus. Readability formulae reveal that, despite its wider readership, sustainability reporting remains a very difficult-to-read genre, at times more difficult than financial reporting. Although we find little industry impact on readability, region does prove an important variable, with NLP-based variables more strongly affected than the formulae. These results highlight not only the impact of legislative contexts but also language variety itself as an underexplored variable. Finally, the study reveals some of the weaknesses of default readability formulae, which are largely unable to register syntactic variation between the varieties of English in the reports, and demonstrates the merits of NLP in report readability analysis as well as the need for more accessible sustainability reporting.
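    The abstract does not name the specific readability formulae applied. Purely as an illustration of what a "standard" formula measures, here is a minimal sketch of the Flesch Reading Ease score, one widely used formula: 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). Higher scores mean easier text. The regex tokenisation and the vowel-group syllable counter are simplifying assumptions for this sketch, not the authors' method. Note that the score depends only on sentence length and word length, which is one reason such formulae cannot register syntactic variation.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): count runs of vowels, treating 'y' as a vowel."""
    lower = word.lower()
    n = len(re.findall(r"[aeiouy]+", lower))
    # Discount a likely silent final 'e' ("make"), but not "-le" endings ("table").
    if lower.endswith("e") and not lower.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease with naive sentence and word splitting."""
    # Sentences: split on terminal punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: alphabetic tokens, allowing an internal apostrophe ("firm's").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat on the mat."))
    print(flesch_reading_ease(
        "Sustainability reporting necessitates comprehensive organisational disclosure."
    ))
```

    For production analysis, an established package with a better syllable dictionary would be preferable. The crude counter here is only meant to make the arithmetic of the formula visible.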
    Original language: English
    Pages (from-to): 52-85
    Journal: International Journal of Business Communication
    Issue number: 1
    Publication status: Published, Jan 2020



    Keywords

    • corpus linguistics
    • readability
    • sustainability reporting
    • language variety
    • natural language processing
