### Abstract

Kriging or Gaussian process (GP) modeling is an interpolation method that assumes the outputs (responses) are more correlated, the closer the inputs (explanatory or independent variables) are. A GP has unknown (hyper)parameters that must be estimated; the standard estimation method uses the "maximum likelihood" criterion. However, big data make it hard to compute the estimates of these GP parameters, and the resulting Kriging predictor and the variance of this predictor. To solve this problem, some authors select a relatively small subset from the big set of previously observed "old" data; their method is sequential and depends on the variance of the Kriging predictor. The resulting designs turn out to be "local"; i.e., most design points are concentrated around the point to be predicted. We develop three alternative one-shot methods that do not depend on GP parameters: (i) select a small subset such that this subset still covers the original input space, albeit coarser; (ii) select a subset with relatively many (but not all) combinations close to the new combination that is to be predicted; and (iii) select a subset with the nearest neighbors (NNs) of this new combination. To evaluate these designs, we compare their squared prediction errors in several numerical (Monte Carlo) experiments. These experiments show that our NN design is a viable alternative for the more sophisticated sequential designs.
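As a rough illustration of method (iii), the sketch below picks the nearest neighbors of a new input combination from a large set of old data. It is not taken from the paper: the function name `nearest_neighbor_subset`, the use of Euclidean distance on unscaled inputs, the toy response, and the subset size `k=50` are all assumptions made for the example.

```python
import numpy as np


def nearest_neighbor_subset(X_old, y_old, x_new, k):
    """Return the k old observations whose inputs are closest to x_new.

    Hypothetical helper: Euclidean distance on the raw inputs is an
    assumption of this sketch, not a choice confirmed by the paper.
    """
    X_old = np.asarray(X_old, dtype=float)
    y_old = np.asarray(y_old, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    if not 1 <= k <= len(X_old):
        raise ValueError("k must be between 1 and the number of old points")
    # Squared distances from every old input combination to the new one.
    d2 = np.sum((X_old - x_new) ** 2, axis=1)
    # argpartition finds the k smallest distances without a full sort.
    idx = np.argpartition(d2, k - 1)[:k]
    # Order the selected points from nearest to farthest.
    idx = idx[np.argsort(d2[idx])]
    return X_old[idx], y_old[idx], idx


# Toy data (assumed for illustration only).
rng = np.random.default_rng(0)
X_old = rng.uniform(0.0, 1.0, size=(10_000, 2))        # big set of old inputs
y_old = np.sin(2 * np.pi * X_old[:, 0]) + X_old[:, 1]  # toy response
x_new = np.array([0.5, 0.5])                           # point to be predicted

X_sub, y_sub, idx = nearest_neighbor_subset(X_old, y_old, x_new, k=50)
```

The selection uses only the inputs, so it needs no GP parameter estimates; a Kriging model would then be fitted to `X_sub` and `y_sub` alone rather than to all old observations.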

Original language | English |
---|---|
Place of Publication | Tilburg |
Publisher | CentER, Center for Economic Research |
Number of pages | 43 |
Volume | 2018-022 |
Publication status | Published - 9 Jul 2018 |

### Publication series

Name | CentER Discussion Paper |
---|---|
Volume | 2018-022 |

### Keywords

- kriging
- Gaussian process
- big data
- experimental design
- nearest neighbor

### Cite this

Kleijnen, J. P. C., & van Beers, W. C. M. (2018). *Prediction for Big Data through Kriging: Small Sequential and One-Shot Designs*. (CentER Discussion Paper; Vol. 2018-022). Tilburg: CentER, Center for Economic Research.