Estimating multilevel models on data streams

L. Ippel*, M. C. Kaptein, J. K. Vermunt

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely large or when new data continuously augment the data set, estimating multilevel models can be challenging: the current algorithms used to fit multilevel models repeatedly revisit all data points and end up consuming considerable time and computer memory. This is especially troublesome when predictions are needed in real time and observations keep streaming in. We address this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online (or row-by-row). In an extensive simulation study, we demonstrate the performance of SEMA compared to traditional methods of fitting multilevel models. Next, SEMA is used to analyze an empirical data stream. The accuracy of SEMA is competitive with current state-of-the-art methods while being orders of magnitude faster.
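The abstract's key contrast is between batch algorithms that revisit every stored data point and online estimation that absorbs each row once. The Python sketch below illustrates only that online principle, under stated assumptions: an intercept-only random-intercept model y_ij = mu + b_j + e_ij, with b_j ~ N(0, tau2) and e_ij ~ N(0, sigma2). It is not the SEMA algorithm: in place of Expectation Maximization it swaps in the classical one-way ANOVA (method-of-moments) estimator of the variance components, because that estimator depends on the data only through running sums. The class StreamingRandomIntercept and its methods are hypothetical names introduced for this illustration.

```python
from collections import defaultdict


class StreamingRandomIntercept:
    """Row-by-row fit of the assumed model y_ij = mu + b_j + e_ij.

    Illustrative sketch, not SEMA: it uses the one-way ANOVA
    (method-of-moments) estimator, which can be computed exactly from
    running sums, so each row is absorbed once and never revisited.
    """

    def __init__(self):
        self.n = defaultdict(int)    # n_j: rows seen so far in group j
        self.s = defaultdict(float)  # s_j: sum of y in group j
        self.N = 0                   # total number of rows
        self.S = 0.0                 # sum of all y
        self.Q = 0.0                 # sum of all y**2
        self.T = 0.0                 # sum over groups of s_j**2 / n_j
        self.U = 0                   # sum over groups of n_j**2

    def update(self, group, y):
        """Absorb one (group, y) row in O(1) time; the row is not stored."""
        n_old, s_old = self.n[group], self.s[group]
        if n_old > 0:
            # Remove the group's previous contribution to the pooled sums.
            self.T -= s_old * s_old / n_old
            self.U -= n_old * n_old
        n_new, s_new = n_old + 1, s_old + y
        self.n[group], self.s[group] = n_new, s_new
        # Add the group's refreshed contribution back in.
        self.T += s_new * s_new / n_new
        self.U += n_new * n_new
        self.N += 1
        self.S += y
        self.Q += y * y

    def estimates(self):
        """Return (mu, tau2, sigma2); requires J >= 2 groups and N > J."""
        J, N = len(self.n), self.N
        if J < 2 or N <= J:
            raise ValueError("need at least two groups and more rows than groups")
        mu = self.S / N                                 # grand mean
        msw = (self.Q - self.T) / (N - J)               # within-group mean square
        msb = (self.T - self.S * self.S / N) / (J - 1)  # between-group mean square
        n0 = (N - self.U / N) / (J - 1)                 # effective group size
        tau2 = max(0.0, (msb - msw) / n0)               # truncated at zero
        return mu, tau2, msw

    def predict(self, group):
        """Shrunken group mean mu + w_j * (ybar_j - mu); grand mean if unseen."""
        mu, tau2, sigma2 = self.estimates()
        n_j = self.n.get(group, 0)  # .get does not create a new entry
        if n_j == 0:
            return mu
        denom = tau2 + sigma2 / n_j
        w = tau2 / denom if denom > 0 else 0.0
        return mu + w * (self.s[group] / n_j - mu)
```

Each call to update costs O(1) time, and memory grows with the number of groups rather than the number of rows, which is the property the abstract emphasizes for data streams. Accumulating raw sums of squares can lose precision over very long streams; a Welford-style update would be more robust. SEMA itself, per the abstract, is an approximation to Expectation Maximization rather than a moment estimator.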

Original language: English
Pages (from-to): 41-64
Journal: Psychometrika
Volume: 84
Issue number: 1
DOI: 10.1007/s11336-018-09656-z
Publication status: Published - 2019

Keywords

  • Algorithms
  • Body Weight
  • Computer Simulation
  • Data Interpretation, Statistical
  • Female
  • Humans
  • Longitudinal Studies
  • Male
  • Multilevel Analysis

Cite this

@article{864d6f7a0a3440ffa28e90d121646e3d,
title = "Estimating multilevel models on data streams",
abstract = "Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely large or when new data continuously augment the data set, estimating multilevel models can be challenging: the current algorithms used to fit multilevel models repeatedly revisit all data points and end up consuming much time and computer memory. This is especially troublesome when predictions are needed in real time and observations keep streaming in. We address this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online (or row-by-row). In an extensive simulation study, we demonstrate the performance of SEMA compared to traditional methods of fitting multilevel models. Next, SEMA is used to analyze an empirical data stream. The accuracy of SEMA is competitive to current state-of-the-art methods while being orders of magnitude faster.",
keywords = "Algorithms, Body Weight, Computer Simulation, Data Interpretation, Statistical, Female, Humans, Longitudinal Studies, Male, Multilevel Analysis",
author = "Ippel, {L.} and Kaptein, {M. C.} and Vermunt, {J. K.}",
year = "2019",
doi = "10.1007/s11336-018-09656-z",
language = "English",
volume = "84",
pages = "41--64",
journal = "Psychometrika",
issn = "0033-3123",
publisher = "Springer",
number = "1",

}

Estimating multilevel models on data streams. / Ippel, L.; Kaptein, M. C.; Vermunt, J. K.

In: Psychometrika, Vol. 84, No. 1, 2019, p. 41-64.

TY - JOUR

T1 - Estimating multilevel models on data streams

AU - Ippel, L.

AU - Kaptein, M. C.

AU - Vermunt, J. K.

PY - 2019

Y1 - 2019

AB - Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely large or when new data continuously augment the data set, estimating multilevel models can be challenging: the current algorithms used to fit multilevel models repeatedly revisit all data points and end up consuming much time and computer memory. This is especially troublesome when predictions are needed in real time and observations keep streaming in. We address this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online (or row-by-row). In an extensive simulation study, we demonstrate the performance of SEMA compared to traditional methods of fitting multilevel models. Next, SEMA is used to analyze an empirical data stream. The accuracy of SEMA is competitive to current state-of-the-art methods while being orders of magnitude faster.

KW - Algorithms

KW - Body Weight

KW - Computer Simulation

KW - Data Interpretation, Statistical

KW - Female

KW - Humans

KW - Longitudinal Studies

KW - Male

KW - Multilevel Analysis

U2 - 10.1007/s11336-018-09656-z

DO - 10.1007/s11336-018-09656-z

M3 - Article

C2 - 30671789

VL - 84

SP - 41

EP - 64

JO - Psychometrika

JF - Psychometrika

SN - 0033-3123

IS - 1

ER -