Estimating random-intercept models on data streams

G.J.E. Ippel, M.C. Kaptein, J.K. Vermunt

Research output: Contribution to journal, Article, Scientific, peer-review

5 Citations (Scopus)

Abstract

Multilevel models are often used for the analysis of grouped data. Grouped data occur, for instance, when estimating the performance of pupils nested within schools or analyzing multiple observations nested within individuals. Currently, multilevel models are mostly fit to static datasets. However, recent technological advances in the measurement of social phenomena have led to data arriving in a continuous fashion (i.e., data streams). In these situations the data collection is never “finished”. Traditional methods of fitting multilevel models are ill-suited for the analysis of data streams because of their computational complexity. A novel algorithm for estimating random-intercept models is introduced. The Streaming EM Approximation (SEMA) algorithm is a fully online (row-by-row) method enabling computationally efficient estimation of random-intercept models. SEMA is tested in two simulation studies and applied to longitudinal data on individuals’ happiness collected continuously using smartphones. SEMA shows statistical performance competitive with existing static approaches, but with large computational benefits. The introduction of this method allows researchers to broaden the scope of their research by using data streams.
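
To make the phrase "row-by-row" concrete, the following is a minimal sketch of what streaming estimation can look like for an intercept-only random-intercept model. It is not the SEMA algorithm, whose details are not given in this abstract. The class name `StreamingGroupMeans` and its methods are hypothetical, the variance components `tau2` (between groups) and `sigma2` (within groups) are assumed known rather than estimated, and the overall intercept is approximated by the plain running mean of all observations.

```python
from collections import defaultdict


class StreamingGroupMeans:
    """Row-by-row summaries for an intercept-only random-intercept model.

    Illustration only: this is NOT the SEMA algorithm. The variance
    components tau2 (between-group) and sigma2 (within-group) are assumed
    known; a full estimator would also have to learn them from the stream.
    """

    def __init__(self, tau2, sigma2):
        self.tau2 = tau2
        self.sigma2 = sigma2
        self.n_total = 0              # rows seen so far
        self.grand_mean = 0.0         # running mean over all rows
        self.n = defaultdict(int)     # rows seen per group
        self.mean = defaultdict(float)  # running mean per group

    def update(self, group, y):
        """Absorb one (group, y) row in constant time; the row is not stored."""
        self.n_total += 1
        self.grand_mean += (y - self.grand_mean) / self.n_total
        self.n[group] += 1
        self.mean[group] += (y - self.mean[group]) / self.n[group]

    def predict(self, group):
        """Shrunken group-level prediction: grand mean plus predicted intercept."""
        n_j = self.n.get(group, 0)
        if n_j == 0:
            # Unseen group: fall back to the overall mean.
            return self.grand_mean
        # Shrinkage weight grows toward 1 as the group accumulates rows.
        weight = n_j * self.tau2 / (n_j * self.tau2 + self.sigma2)
        return self.grand_mean + weight * (self.mean[group] - self.grand_mean)


model = StreamingGroupMeans(tau2=1.0, sigma2=1.0)
for group, y in [("a", 1.0), ("a", 3.0), ("b", 10.0)]:
    model.update(group, y)
print(model.predict("a"), model.predict("b"))
```

Each row updates a handful of running summaries and is then discarded, so memory grows with the number of groups rather than the number of observations. Groups with few rows are pulled more strongly toward the overall mean, which is the usual behaviour of random-intercept predictions.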
Original language: English
Pages (from-to): 169–182
Journal: Computational Statistics & Data Analysis
Volume: 104
Publication status: Published - 2016

Keywords

  • Data streams
  • Expectation-Maximization algorithm
  • Multilevel models
  • Online learning
  • Random-Intercept model
