Healthcare expenditure prediction with neighbourhood variables: A random forest model

S. M. Mohnen, A. H. Rotteveel*, G. Doornbos, J. J. Polder

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review



We investigated the additional predictive value of an individual's neighbourhood (quality and location), and of changes therein, for his/her healthcare costs. To this end, we combined several Dutch nationwide data sources from 2003 to 2014 and selected inhabitants who moved in 2010. We used random forest models to predict the area under the curve of the regular healthcare costs of individuals in the years 2011–2014. In our analyses, the quality of the neighbourhood before the move appeared to be quite important in predicting healthcare costs (i.e. importance rank 11 out of 126 socio-demographic and neighbourhood variables; rank 73 out of 261 in the full model with prior expenditure and medication). The predictive performance of the models was evaluated in terms of R² (the proportion of explained variance) and MAE (the mean absolute prediction error). The model containing only socio-demographic information improved marginally when neighbourhood was added (R² +0.8%, MAE −€5). The performance of the full model was unchanged when neighbourhood was added, both for the study population (R² = 48.8%, MAE of €1556) and for subpopulations. These results indicate that neighbourhood might be an interesting source of information for improving predictive performance only in prediction models in which prior expenditure and utilization cannot or ought not to be used.
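The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions R² = 1 − SS_res / SS_tot and MAE = mean of |actual − predicted|; the abstract does not spell out the formulas the authors used. The function names and the cost figures below are hypothetical toy values, not data or results from the study.

```python
from statistics import mean


def r_squared(actual, predicted):
    """Proportion of explained variance, assuming R² = 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted costs."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical toy costs in euros (made up for illustration only).
actual = [100.0, 200.0, 300.0, 400.0]
baseline_pred = [150.0, 150.0, 350.0, 350.0]  # e.g. a model without extra variables
extended_pred = [120.0, 180.0, 320.0, 380.0]  # e.g. a model with extra variables

for name, pred in [("baseline", baseline_pred), ("extended", extended_pred)]:
    print(name, round(r_squared(actual, pred), 3), mean_absolute_error(actual, pred))
```

Comparing the two measures for a model with and without a group of variables, as in this toy comparison, mirrors the kind of contrast the abstract reports (a change in R² and in MAE when neighbourhood is added).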
Original language: English
Number of pages: 28
Journal: Statistics, Politics and Policy
Issue number: 2
Publication status: Published - 2020

