Prediction for Big Data through Kriging: Small Sequential and One-Shot Designs

J.P.C. Kleijnen, W.C.M. van Beers

Research output: Working paper › Discussion paper › Other research output

Abstract

Kriging or Gaussian process (GP) modeling is an interpolation method that assumes the outputs (responses) are more correlated the closer the inputs (explanatory or independent variables) are. A GP has unknown (hyper)parameters that must be estimated; the standard estimation method uses the "maximum likelihood" criterion. However, big data make it hard to compute the estimates of these GP parameters, as well as the resulting Kriging predictor and its variance. To solve this problem, some authors select a relatively small subset from the big set of previously observed "old" data; their method is sequential and depends on the variance of the Kriging predictor. The resulting designs turn out to be "local"; i.e., most design points are concentrated around the point to be predicted. We develop three alternative one-shot methods that do not depend on GP parameters: (i) select a small subset that still covers the original input space, albeit more coarsely; (ii) select a subset with relatively many (but not all) combinations close to the new combination that is to be predicted; and (iii) select a subset with the nearest neighbors (NNs) of this new combination. To evaluate these designs, we compare their squared prediction errors in several numerical (Monte Carlo) experiments. These experiments show that our NN design is a viable alternative to the more sophisticated sequential designs.
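To make the one-shot idea concrete, here is a minimal, hypothetical sketch of design (iii): pick the k old points nearest to the new combination x0 (using input distances only, so no GP parameters are needed), then fit ordinary Kriging on those k points instead of all n, which reduces the cost of factorizing the correlation matrix from O(n^3) to O(k^3). Assumptions not taken from the paper: a 2-D input space in [0, 1]^2, a made-up test function `f`, a Gaussian correlation function whose single parameter `theta` is fixed rather than estimated by maximum likelihood, a small nugget for numerical stability, and greedy farthest-point (maximin) selection as an illustrative stand-in for the space-covering design (i). All function names are hypothetical, and the paper's own selection rules and test problems may differ.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two input combinations."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_neighbor_design(X, x0, k):
    """One-shot NN design: indices of the k old points closest to x0.

    Needs no GP parameters, only distances in the input space.
    """
    return sorted(range(len(X)), key=lambda i: sq_dist(X[i], x0))[:k]


def maximin_design(X, k, start=0):
    """Space-covering subset by greedy farthest-point (maximin) selection.

    Illustrative stand-in for design (i); not necessarily the paper's rule.
    """
    chosen = [start]
    # d[i]: squared distance from old point i to its nearest chosen point.
    d = [sq_dist(x, X[start]) for x in X]
    while len(chosen) < k:
        nxt = max(range(len(X)), key=lambda i: d[i])
        chosen.append(nxt)
        d = [min(di, sq_dist(x, X[nxt])) for di, x in zip(d, X)]
    return chosen


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) z = b by forward, then back substitution."""
    n = len(L)
    w = [0.0] * n
    for i in range(n):
        w[i] = (b[i] - sum(L[i][m] * w[m] for m in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (w[i] - sum(L[m][i] * z[m] for m in range(i + 1, n))) / L[i][i]
    return z


def ordinary_kriging(Xd, yd, x0, theta, nugget=1e-6):
    """Ordinary Kriging prediction at x0 with a Gaussian correlation function.

    theta is fixed here (an assumption); normally it is estimated, e.g. by ML.
    The small nugget on the diagonal keeps R numerically positive definite.
    """
    R = [[math.exp(-theta * sq_dist(a, b)) + (nugget if i == j else 0.0)
          for j, b in enumerate(Xd)] for i, a in enumerate(Xd)]
    L = cholesky(R)                                      # O(k^3) for k points
    a = chol_solve(L, [1.0] * len(Xd))                   # R^{-1} 1
    mu = sum(ai * yi for ai, yi in zip(a, yd)) / sum(a)  # GLS mean estimate
    w = chol_solve(L, [yi - mu for yi in yd])            # R^{-1} (y - mu 1)
    r = [math.exp(-theta * sq_dist(x, x0)) for x in Xd]  # corr(x0, design)
    return mu + sum(ri * wi for ri, wi in zip(r, w))


def f(x):
    """Hypothetical smooth test function standing in for the simulator."""
    return math.sin(3.0 * x[0]) + math.cos(2.0 * x[1])


if __name__ == "__main__":
    rng = random.Random(1)
    X = [(rng.random(), rng.random()) for _ in range(5000)]  # "old" big data
    y = [f(x) for x in X]
    x0, k, theta = (0.5, 0.5), 20, 50.0  # new point, subset size, fixed theta
    for name, idx in [("NN", nearest_neighbor_design(X, x0, k)),
                      ("maximin", maximin_design(X, k))]:
        pred = ordinary_kriging([X[i] for i in idx], [y[i] for i in idx],
                                x0, theta)
        print(f"{name:8s} squared prediction error: {(pred - f(x0)) ** 2:.3e}")
```

Run as a script, the sketch prints the squared prediction error at x0 for the NN subset and for the maximin subset. Because theta is fixed and only one new point is predicted, these numbers illustrate the mechanics only; they do not reproduce the paper's Monte Carlo comparison.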
Language: English
Place of publication: Tilburg
Publisher: CentER, Center for Economic Research
Number of pages: 43
Volume: 2018-022
State: Published, 9 Jul 2018

Publication series

Name: CentER Discussion Paper
Volume: 2018-022

Keywords

  • kriging
  • Gaussian process
  • big data
  • experimental design
  • nearest neighbor

Cite this

Kleijnen, J. P. C., & van Beers, W. C. M. (2018). Prediction for Big Data through Kriging: Small Sequential and One-Shot Designs. (CentER Discussion Paper; Vol. 2018-022). Tilburg: CentER, Center for Economic Research.