Abstract
Research Summary
Human coding of unstructured text can enable scholars to measure complex latent constructs for use in empirical analysis, but also requires substantial time and resources that limit the number and sample sizes of studies using this approach. We demonstrate how supervised machine learning (ML) can overcome these constraints by allowing scholars to scale human-coded data. Using board leadership as an illustrative context, we apply this method to create a large-scale dataset (N = 22,388) from smaller-scale human codings of CEO duality and board chair orientations from company proxy statements. We further demonstrate the potential value of this approach by using the resulting dataset to examine the relationships among board leadership, firm performance, and CEO dismissal. The ML code and dataset are available at 10.5281/zenodo.7304697.
Managerial Summary
Manually converting unstructured text into usable data requires considerable time and resources. This paper outlines a replicable process for applying supervised machine learning (ML) to overcome these constraints by scaling manually coded data. While ML is often used to identify patterns or predict relationships within a given dataset, we show how scholars and practitioners can build valuable custom algorithms at an earlier stage in the process—when first building a dataset. We illustrate this approach by training ML algorithms to replicate human codings of CEO duality and board chair control and collaboration orientations from over 22,000 company filings. We then show how this approach can support new knowledge development by using these data to explore the relationships among board leadership, company performance, and CEO dismissal.
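As a rough illustration of the general idea described above (train a classifier on a small human-coded sample, then apply it to uncoded text), the following is a minimal sketch only. It is not the authors' algorithm, which is available at the DOI given in the Research Summary. The sketch swaps in a simple multinomial Naive Bayes classifier written with the Python standard library; the sentences, the labels `dual` and `separate`, and all function names are hypothetical and chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase the text and keep only alphabetic runs (deliberately simple).
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


def train_naive_bayes(texts, labels):
    """Fit a multinomial Naive Bayes model on human-coded (text, label) pairs."""
    class_counts = Counter(labels)          # documents per label
    word_counts = defaultdict(Counter)      # token counts per label
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the most probable label for one uncoded document."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokenize(text):
            if token in model["vocab"]:        # ignore unseen tokens
                # Laplace (add-one) smoothed log likelihood
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical human-coded sample (invented sentences, not from any filing).
coded_texts = [
    "Our Chief Executive Officer also serves as Chairman of the Board.",
    "The roles of Chairman and Chief Executive Officer are combined in one person.",
    "The Board is led by an independent Chairman who is not an executive.",
    "The positions of Chairman and Chief Executive Officer are held by separate individuals.",
]
coded_labels = ["dual", "dual", "separate", "separate"]

# Hypothetical uncoded documents to be labeled by the trained model.
uncoded_texts = [
    "Ms. Lee serves as both Chairman and Chief Executive Officer, so the roles are combined.",
    "An independent director serves as Chairman, separate from the executive team.",
]

model = train_naive_bayes(coded_texts, coded_labels)
predictions = [predict(model, text) for text in uncoded_texts]
print(predictions)
```

In practice, a scaling exercise of this kind would also hold out part of the human-coded sample to check how well the model's labels agree with the human codings before applying it to the full corpus; that validation step is omitted here for brevity.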
| Original language | English |
|---|---|
| Pages (from-to) | 1780-1802 |
| Journal | Strategic Management Journal |
| Volume | 44 |
| Issue number | 7 |
| Early online date | Dec 2022 |
| DOIs | |
| Publication status | Published - Jul 2023 |
Keywords
- CEO duality
- board leadership
- corporate governance
- machine learning
- measurement
Fingerprint
Dive into the research topics of 'Using supervised machine learning to scale human-coded data: A method and dataset in the board leadership context'. Together they form a unique fingerprint.
Research output
- 13 Citations
- 1 Database
Board Leadership Database (U.S. Public Firms) + ML Script for Scaling Human Coded Data
Harrison, J. S., Josefy, M. A., Kalm, M. & Krause, R., 2022
Research output: Online publication or Non-textual form › Database
Open Access