Estimating random-intercept models on data streams

G.J.E. Ippel, M.C. Kaptein, J.K. Vermunt

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Multilevel models are often used for the analysis of grouped data. Grouped data occur for instance when estimating the performance of pupils nested within schools or analyzing multiple observations nested within individuals. Currently, multilevel models are mostly fit to static datasets. However, recent technological advances in the measurement of social phenomena have led to data arriving in a continuous fashion (i.e., data streams). In these situations the data collection is never “finished”. Traditional methods of fitting multilevel models are ill-suited for the analysis of data streams because of their computational complexity. A novel algorithm for estimating random-intercept models is introduced. The Streaming EM Approximation (SEMA) algorithm is a fully-online (row-by-row) method enabling computationally-efficient estimation of random-intercept models. SEMA is tested in two simulation studies, and applied to longitudinal data regarding individuals’ happiness collected continuously using smart phones. SEMA shows competitive statistical performance to existing static approaches, but with large computational benefits. The introduction of this method allows researchers to broaden the scope of their research, by using data streams.
Original language: English
Pages (from-to): 169–182
Journal: Computational Statistics & Data Analysis
Volume: 104
DOI: 10.1016/j.csda.2016.06.008
Publication status: Published - 2016

Fingerprint

Multilevel Models
Intercept
Data Streams
Streaming
Grouped Data
Efficient Estimation
Longitudinal Data
EM Algorithm
Approximation
Approximation Algorithms
Computational Complexity
Simulation Study
Model

Keywords

  • Data streams
  • Expectation-Maximization algorithm
  • Multilevel models
  • Online learning
  • Random-Intercept model

Cite this

@article{9ed5fdb193be423d9912bc60dc638c40,
title = "Estimating random-intercept models on data streams",
abstract = "Multilevel models are often used for the analysis of grouped data. Grouped data occur for instance when estimating the performance of pupils nested within schools or analyzing multiple observations nested within individuals. Currently, multilevel models are mostly fit to static datasets. However, recent technological advances in the measurement of social phenomena have led to data arriving in a continuous fashion (i.e., data streams). In these situations the data collection is never “finished”. Traditional methods of fitting multilevel models are ill-suited for the analysis of data streams because of their computational complexity. A novel algorithm for estimating random-intercept models is introduced. The Streaming EM Approximation (SEMA) algorithm is a fully-online (row-by-row) method enabling computationally-efficient estimation of random-intercept models. SEMA is tested in two simulation studies, and applied to longitudinal data regarding individuals’ happiness collected continuously using smart phones. SEMA shows competitive statistical performance to existing static approaches, but with large computational benefits. The introduction of this method allows researchers to broaden the scope of their research, by using data streams.",
keywords = "Data streams, Expectation-Maximization algorithm, Multilevel models, Online learning, Random-Intercept model",
author = "G.J.E. Ippel and M.C. Kaptein and J.K. Vermunt",
year = "2016",
doi = "10.1016/j.csda.2016.06.008",
language = "English",
volume = "104",
pages = "169--182",
journal = "Computational Statistics \& Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",
}

Estimating random-intercept models on data streams. / Ippel, G.J.E.; Kaptein, M.C.; Vermunt, J.K.

In: Computational Statistics & Data Analysis, Vol. 104, 2016, p. 169–182.


TY - JOUR
T1 - Estimating random-intercept models on data streams
AU - Ippel, G.J.E.
AU - Kaptein, M.C.
AU - Vermunt, J.K.
PY - 2016
Y1 - 2016
AB - Multilevel models are often used for the analysis of grouped data. Grouped data occur for instance when estimating the performance of pupils nested within schools or analyzing multiple observations nested within individuals. Currently, multilevel models are mostly fit to static datasets. However, recent technological advances in the measurement of social phenomena have led to data arriving in a continuous fashion (i.e., data streams). In these situations the data collection is never “finished”. Traditional methods of fitting multilevel models are ill-suited for the analysis of data streams because of their computational complexity. A novel algorithm for estimating random-intercept models is introduced. The Streaming EM Approximation (SEMA) algorithm is a fully-online (row-by-row) method enabling computationally-efficient estimation of random-intercept models. SEMA is tested in two simulation studies, and applied to longitudinal data regarding individuals’ happiness collected continuously using smart phones. SEMA shows competitive statistical performance to existing static approaches, but with large computational benefits. The introduction of this method allows researchers to broaden the scope of their research, by using data streams.

KW - Data streams
KW - Expectation-Maximization algorithm
KW - Multilevel models
KW - Online learning
KW - Random-Intercept model
U2 - 10.1016/j.csda.2016.06.008
DO - 10.1016/j.csda.2016.06.008
M3 - Article
VL - 104
SP - 169
EP - 182
JO - Computational Statistics & Data Analysis
JF - Computational Statistics & Data Analysis
SN - 0167-9473
ER -