Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream

Maurits Kaptein, Paul Ketelaar

Research output: Working paper › Other research output

Abstract

In marketing we are often confronted with a continuous stream of responses to marketing messages. Such streaming data provide invaluable information regarding message effectiveness and segmentation. However, streaming data are hard to analyze using conventional methods: their high volume and the fact that they are continuously augmented mean that it takes considerable time to analyze them. We propose a method for estimating a finite mixture of logistic regression models which can be used to cluster customers based on a continuous stream of responses. This method, which we call oFMLR, allows segments to be identified in data streams or extremely large static datasets. Contrary to black-box algorithms, oFMLR provides model estimates that are directly interpretable. We first introduce oFMLR, explaining in passing general topics such as online estimation and the EM algorithm, making this paper a high-level overview of possible methods of dealing with large data streams in marketing practice. Next, we discuss model convergence, identifiability, and relations to alternative, Bayesian, methods; we also identify more general issues that arise from dealing with continuously augmented datasets. Finally, we introduce the oFMLR [R] package and evaluate the method by numerical simulation and by analyzing a large customer clickstream dataset.
Original language: English
Publisher: arXiv.org
Number of pages: 32
Publication status: Published - 28 Feb 2018

Publication series

Name: arXiv

Keywords

  • cs.LG
  • stat.CO
  • stat.ML

Cite this

@techreport{37beebfc12b24ad7ba6e63ca1e690173,
title = "Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream",
abstract = "In marketing we are often confronted with a continuous stream of responses to marketing messages. Such streaming data provide invaluable information regarding message effectiveness and segmentation. However, streaming data are hard to analyze using conventional methods: their high volume and the fact that they are continuously augmented mean that it takes considerable time to analyze them. We propose a method for estimating a finite mixture of logistic regression models which can be used to cluster customers based on a continuous stream of responses. This method, which we call oFMLR, allows segments to be identified in data streams or extremely large static datasets. Contrary to black-box algorithms, oFMLR provides model estimates that are directly interpretable. We first introduce oFMLR, explaining in passing general topics such as online estimation and the EM algorithm, making this paper a high-level overview of possible methods of dealing with large data streams in marketing practice. Next, we discuss model convergence, identifiability, and relations to alternative, Bayesian, methods; we also identify more general issues that arise from dealing with continuously augmented datasets. Finally, we introduce the oFMLR [R] package and evaluate the method by numerical simulation and by analyzing a large customer clickstream dataset.",
keywords = "cs.LG, stat.CO, stat.ML",
author = "Maurits Kaptein and Paul Ketelaar",
note = "1 figure. Working paper including [R] package",
year = "2018",
month = "2",
day = "28",
language = "English",
series = "arXiv",
publisher = "arXiv.org",
type = "WorkingPaper",
institution = "arXiv.org",

}

Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream. / Kaptein, Maurits; Ketelaar, Paul.

arXiv.org, 2018. (arXiv).

TY - UNPB

T1 - Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream

AU - Kaptein, Maurits

AU - Ketelaar, Paul

N1 - 1 figure. Working paper including [R] package

PY - 2018/2/28

Y1 - 2018/2/28

N2 - In marketing we are often confronted with a continuous stream of responses to marketing messages. Such streaming data provide invaluable information regarding message effectiveness and segmentation. However, streaming data are hard to analyze using conventional methods: their high volume and the fact that they are continuously augmented mean that it takes considerable time to analyze them. We propose a method for estimating a finite mixture of logistic regression models which can be used to cluster customers based on a continuous stream of responses. This method, which we call oFMLR, allows segments to be identified in data streams or extremely large static datasets. Contrary to black-box algorithms, oFMLR provides model estimates that are directly interpretable. We first introduce oFMLR, explaining in passing general topics such as online estimation and the EM algorithm, making this paper a high-level overview of possible methods of dealing with large data streams in marketing practice. Next, we discuss model convergence, identifiability, and relations to alternative, Bayesian, methods; we also identify more general issues that arise from dealing with continuously augmented datasets. Finally, we introduce the oFMLR [R] package and evaluate the method by numerical simulation and by analyzing a large customer clickstream dataset.

KW - cs.LG

KW - stat.CO

KW - stat.ML

M3 - Working paper

T3 - arXiv

BT - Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream

PB - arXiv.org

ER -