Beyond the number of classes: Separating substantive from non-substantive dependence in latent class analysis

Daniel Oberski

Research output: Contribution to journal › Article › Scientific › peer-review

33 Citations (Scopus)
216 Downloads (Pure)

Abstract

Latent class analysis (LCA) for categorical data is a model-based clustering and classification technique applied in a wide range of fields, including the social sciences, machine learning, psychiatry, public health, and epidemiology. Its central assumption is conditional independence of the indicators given the latent class, i.e., "local independence"; violations can appear as model misfit, often leading LCA practitioners to increase the number of classes. However, when not all of the local dependence is of substantive scientific interest, this leaves two options, both problematic: modeling uninterpretable classes, or retaining a smaller number of substantive classes but incurring bias in the final results and classifications of interest due to the remaining assumption violations. This paper suggests an alternative procedure, applicable when the number of substantive classes is known in advance or when substantive interest is otherwise well defined. In such cases, I suggest modeling substantive local dependencies as additional discrete latent variables, while absorbing nuisance dependencies in additional parameters. An example application, estimating misclassification and turnover rates in the decision to vote in elections among 9510 Dutch residents, demonstrates the advantages of this procedure relative to increasing the number of classes.
Original language: English
Pages (from-to): 171-182
Journal: Advances in Data Analysis and Classification
Volume: 10
Issue number: 2
Publication status: Published - 2015
