Abstract
This dissertation addresses statistical questions in machine learning and insurance fraud detection. Random forests are a popular prediction method used in many fields. They are averages of decision trees with two layers of randomization: data sampling and split randomization. The first two chapters study the effectiveness of each layer separately. The first chapter establishes pointwise consistency of trees and illustrates that growing large trees in ensembles, often the default choice in practice, is not always a good idea. The second chapter examines how the effectiveness of split randomization depends on data characteristics. While prior literature has focused on the amount of noise in the data, this chapter offers a novel perspective on forest performance by showing that randomization is effective in the presence of irrelevant and correlated covariates, which may help explain why random forests work well in many applications. The third chapter, separate from the first two, studies how insurance companies can choose which claims to investigate for fraud. Originating from a collaboration with Achmea, this chapter formalizes selection mechanisms, illustrates that selecting claims based on prior beliefs can lead to inconsistent learning of fraud characteristics, and proposes a randomized strategy conjectured to be consistent.
| Original language | English |
|---|---|
| Qualification | Doctor of Philosophy |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 12 Sept 2025 |
| Place of Publication | Tilburg |
| Publisher | |
| Print ISBNs | 978 90 5668 780 9 |
| DOIs | |
| Publication status | Published - 2025 |
| Title | Essays on Consistency and Randomization in Machine Learning and Fraud Detection |