This study characterises and problematises the language of corporate reporting along region, industry, genre, and content lines by applying readability formulae and more advanced natural language processing (NLP)–based analysis to a manually assembled 2.75-million-word corpus. Readability formulae reveal that, despite its wider readership, sustainability reporting remains a very difficult-to-read genre, sometimes more difficult than financial reporting. Although we find little industry impact on readability, region does prove an important variable, with NLP-based variables more strongly affected than formulae. These results highlight not only the impact of legislative contexts but also language variety itself as an underexplored variable. Finally, the study reveals some of the weaknesses of default readability formulae, which are largely unable to register syntactic variation between the varieties of English in the reports, and demonstrates the merits of NLP in report readability analysis as well as the need for more accessible sustainability reporting.
- corpus linguistics
- sustainability reporting
- language variety
- natural language processing
Smeuninx, N., De Clerck, B., & Aerts, W. (2020). Measuring the readability of sustainability reports: A corpus-based analysis through standard formulae and NLP. International Journal of Business Communication, 57(1), 52-85. https://doi.org/10.1177/2329488416675456