TY - JOUR
T1 - Healthcare expenditure prediction with neighbourhood variables
T2 - A random forest model
AU - Mohnen, S. M.
AU - Rotteveel, A. H.
AU - Doornbos, G.
AU - Polder, J. J.
PY - 2020
Y1 - 2020
N2 - We investigated the additional predictive value of an individual’s neighbourhood (quality and location), and of changes therein, for his/her healthcare costs. To this end, we combined several Dutch nationwide data sources from 2003 to 2014 and selected inhabitants who moved in 2010. We used random forest models to predict the area under the curve of the regular healthcare costs of individuals in the years 2011–2014. In our analyses, the quality of the neighbourhood before the move appeared to be quite important in predicting healthcare costs (i.e. importance rank 11 out of 126 socio-demographic and neighbourhood variables; rank 73 out of 261 in the full model with prior expenditure and medication). The predictive performance of the models was evaluated in terms of R² (proportion of explained variance) and MAE (mean absolute prediction error). The model containing only socio-demographic information improved marginally when neighbourhood was added (R² +0.8%, MAE −€5). The performance of the full model remained the same for the study population (R² = 48.8%, MAE of €1556) and for subpopulations. These results indicate that neighbourhood might be an interesting source of information for improving predictive performance only in prediction models in which prior expenditure and utilization cannot or ought not to be used.
U2 - 10.1515/spp-2019-0010
DO - 10.1515/spp-2019-0010
M3 - Article
SN - 2151-7509
VL - 11
JO - Statistics, Politics and Policy
JF - Statistics, Politics and Policy
IS - 2
ER -