Subset Selection from Large Datasets for Kriging Modeling

G. Rennen

Research output: Working paper, Discussion paper, Other research output


Abstract

When building a Kriging model, the general intuition is that using more data will always result in a better model. However, we show that when a large dataset is non-uniform, using a uniform subset of it can have several advantages: it reduces the time needed to fit the model, avoids numerical inaccuracies, and improves robustness with respect to errors in the output data. We also describe several new and existing methods for selecting a uniform subset. These methods are tested and compared on several artificial datasets and one real-life dataset. The comparison shows how the selected subsets affect different aspects of the resulting Kriging model. Because no subset selection method performs best on all criteria, the best choice depends on how the user values these different aspects, and the comparison in this paper is meant to help the user make that choice.
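The abstract does not spell out the selection methods. As a rough illustration of what "selecting a uniform subset" can mean, here is a minimal NumPy sketch of greedy farthest-point (maximin) selection, a standard heuristic for the max-min dispersion problem named in the keywords. It is not necessarily one of the methods compared in the paper. The function names, the rule of starting from the point nearest the centroid, the synthetic clustered dataset, and the Gaussian correlation parameter `theta` are all assumptions made for illustration.

```python
import numpy as np


def greedy_maximin_subset(X, k, start=None):
    """Greedily pick k row indices of X; each new point maximizes its
    distance to the nearest already-chosen point (farthest-point heuristic)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")
    if start is None:
        # Assumption: start from the point nearest the data centroid.
        start = int(np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    chosen = [start]
    # min_dist[i]: distance from point i to its nearest chosen point.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # point farthest from the current subset
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)


def min_pairwise_distance(P):
    """Smallest Euclidean distance between any two rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return D[np.triu_indices(len(P), k=1)].min()


def gaussian_corr_cond(P, theta):
    """Condition number of the Gaussian correlation matrix exp(-theta * ||x_i - x_j||^2)."""
    diff = P[:, None, :] - P[None, :, :]
    R = np.exp(-theta * (diff ** 2).sum(axis=-1))
    return np.linalg.cond(R)


# Hypothetical non-uniform dataset: a dense cluster plus sparse uniform points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.2, scale=0.01, size=(900, 2)),  # 900 clustered points
    rng.uniform(size=(100, 2)),                       # 100 spread-out points
])
k = 20
idx_greedy = greedy_maximin_subset(X, k)
idx_random = rng.choice(len(X), size=k, replace=False)

for name, idx in [("greedy maximin", idx_greedy), ("random", idx_random)]:
    P = X[idx]
    print(f"{name:15s} min distance = {min_pairwise_distance(P):.4f}, "
          f"cond(R) = {gaussian_corr_cond(P, theta=50.0):.3e}")
```

Because every pick maximizes the distance to the points already chosen, the minimum pairwise distance of the final subset is at least half of the best value achievable with k points. On clustered data a random subset tends to contain near-duplicate points, which pushes the correlation matrix toward singularity; comparing the printed condition numbers is one simple way to check this on a given dataset.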
Original language: English
Place of publication: Tilburg
Publisher: Operations research
Number of pages: 24
Volume: 2008-26
Publication status: Published, 2008

Publication series

Name: CentER Discussion Paper
Volume: 2008-26

Keywords

  • Design of computer experiments
  • dispersion problem
  • Kriging model
  • large non-uniform datasets
  • radial basis functions
  • robustness
  • space filling
  • subset selection
  • uniformity

Cite this

Rennen, G. (2008). Subset Selection from Large Datasets for Kriging Modeling. (CentER Discussion Paper; Vol. 2008-26). Tilburg: Operations research.