Bridging centrality and extremity: Refining empirical data depth using extreme value statistics

John Einmahl, Jun Li, R.Y. Liu

Research output: Contribution to journal / Article / Scientific / peer-review

Abstract

Data depth measures the centrality of a point with respect to a given distribution or data cloud. It provides a natural center-outward ordering of multivariate data points and yields a systematic nonparametric multivariate analysis scheme. In particular, the halfspace depth has been shown to have many desirable properties and broad applicability. However, the empirical halfspace depth is zero outside the convex hull of the data. This property renders the empirical halfspace depth useless outside the data cloud and limits its utility in applications where the extreme outlying probability mass is the focal point, such as classification problems and control charts with very small false alarm rates. To address this issue, we apply extreme value statistics to refine the empirical halfspace depth in “the tail”. This provides an important link between data depth, which is useful for inference on centrality, and extreme value statistics, which is useful for inference on extremity. The refined empirical halfspace depth can thus be used beyond the data cloud, which greatly broadens its applicability. The refined estimator is shown to substantially improve upon the empirical estimator, both in theory and in simulations. The benefit of this improvement is also demonstrated through applications to classification and statistical process control.
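The abstract's central observation, that the empirical halfspace depth vanishes outside the convex hull of the data, can be illustrated with a small sketch. The Python below is an illustration under stated assumptions, not the authors' code. `halfspace_depth_2d` computes the ordinary empirical (Tukey) halfspace depth of a point with respect to a bivariate sample. `hill_tail_index` and `tail_extrapolated_depth` are hypothetical helpers showing one simple way extreme value statistics can carry a depth estimate beyond the data cloud: they assume a distribution centred at the origin with multivariate regularly varying tails, estimate the tail index with the Hill estimator, and rescale the empirical depth of a point at a moderate radius. This is not the refined estimator defined in the paper.

```python
import math
import random


def halfspace_depth_2d(x, data):
    """Empirical (Tukey) halfspace depth of point x w.r.t. a 2-D sample.

    Depth = smallest fraction of sample points in a closed halfplane
    {y : u . (y - x) >= 0} whose boundary line passes through x.
    """
    n = len(data)
    if n == 0:
        raise ValueError("need at least one data point")
    diffs = [(px - x[0], py - x[1]) for px, py in data]
    # The count can only change at directions u orthogonal to some y - x.
    crit = []
    for dx, dy in diffs:
        if dx == 0 and dy == 0:
            continue  # a sample point equal to x lies in every closed halfplane
        phi = math.atan2(dy, dx)
        crit.append((phi + math.pi / 2) % (2 * math.pi))
        crit.append((phi - math.pi / 2) % (2 * math.pi))
    if not crit:
        return 1.0  # every sample point coincides with x
    crit.sort()
    best = n
    for i, a in enumerate(crit):
        # One direction per arc between consecutive critical angles suffices:
        # a closed halfplane at a critical angle never holds fewer points.
        b = crit[i + 1] if i + 1 < len(crit) else crit[0] + 2 * math.pi
        t = (a + b) / 2
        ux, uy = math.cos(t), math.sin(t)
        count = sum(1 for dx, dy in diffs if ux * dx + uy * dy >= 0)
        best = min(best, count)
    return best / n


def hill_tail_index(norms, k):
    """Hill estimate of the tail index alpha from the k largest norms.

    Returns (alpha_hat, r0), where r0 is the (k+1)-th largest norm.
    """
    r = sorted(norms, reverse=True)
    r0 = r[k]
    gamma = sum(math.log(r[i] / r0) for i in range(k)) / k
    return 1.0 / gamma, r0


def tail_extrapolated_depth(x, data, k):
    """Hypothetical tail refinement: D(x) ~ (|x| / r0) ** (-alpha) * D_n(x0).

    Assumes the distribution is centred at the origin and multivariate
    regularly varying with index alpha, so D(t * x0) ~ t ** (-alpha) * D(x0)
    for large |x0| and t >= 1. Not the estimator defined in the paper.
    """
    alpha, r0 = hill_tail_index([math.hypot(px, py) for px, py in data], k)
    rx = math.hypot(x[0], x[1])
    if rx <= r0:
        return halfspace_depth_2d(x, data)  # bulk of the data: keep empirical depth
    x0 = (x[0] * r0 / rx, x[1] * r0 / rx)  # same direction as x, radius r0
    return (rx / r0) ** (-alpha) * halfspace_depth_2d(x0, data)


# Usage: a spherically symmetric sample whose norm is Pareto with alpha = 2.
rng = random.Random(0)
sample = []
for _ in range(800):
    r, th = rng.paretovariate(2.0), rng.uniform(0.0, 2 * math.pi)
    sample.append((r * math.cos(th), r * math.sin(th)))
far = (2 * max(math.hypot(px, py) for px, py in sample), 0.0)
print(halfspace_depth_2d(far, sample))  # 0.0: far lies outside the convex hull
print(tail_extrapolated_depth(far, sample, k=80))  # a small positive, data-dependent value
```

Taking one direction per arc between critical angles makes the depth computation exact but quadratic in the sample size. As with any Hill-type estimator, the choice of `k` trades bias against variance.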
Original language: English
Pages (from-to): 2738-2765
Journal: The Annals of Statistics
Volume: 43
Issue number: 6
DOI: 10.1214/15-AOS1359
Publication status: Published - 2015

Keywords

  • Depth
  • extremes
  • nonparametric classification
  • nonparametric multivariate SPC
  • tail

Cite this

@article{17b2279234774fc4b91f0d2f0ef2f15d,
title = "Bridging centrality and extremity: Refining empirical data depth using extreme value statistics",
abstract = "Data depth measures the centrality of a point with respect to a given distribution or data cloud. It provides a natural center-outward ordering of multivariate data points and yields a systematic nonparametric multivariate analysis scheme. In particular, the halfspace depth is shown to have many desirable properties and broad applicability. However, the empirical halfspace depth is zero outside the convex hull of the data. This property has rendered the empirical halfspace depth useless outside the data cloud, and limited its utility in applications where the extreme outlying probability mass is the focal point, such as in classification problems and control charts with very small false alarm rates. To address this issue, we apply extreme value statistics to refine the empirical halfspace depth in “the tail”. This provides an important linkage between data depth, which is useful for inference on centrality, and extreme value statistics, which is useful for inference on extremity. The refined empirical halfspace depth can thus extend all its utilities beyond the data cloud, and hence broaden greatly its applicability. The refined estimator is shown to have substantially improved upon the empirical estimator in theory and simulations. The benefit of this improvement is also demonstrated through the applications in classification and statistical process control.",
keywords = "Depth, extremes, nonparametric classification, nonparametric multivariate SPC, tail",
author = "John Einmahl and Jun Li and R.Y. Liu",
year = "2015",
doi = "10.1214/15-AOS1359",
language = "English",
volume = "43",
pages = "2738--2765",
journal = "The Annals of Statistics",
issn = "0090-5364",
publisher = "Institute of Mathematical Statistics",
number = "6",

}

Bridging centrality and extremity: Refining empirical data depth using extreme value statistics. / Einmahl, John; Li, Jun; Liu, R.Y.

In: The Annals of Statistics, Vol. 43, No. 6, 2015, p. 2738-2765.

TY - JOUR

T1 - Bridging centrality and extremity

T2 - Refining empirical data depth using extreme value statistics

AU - Einmahl, John

AU - Li, Jun

AU - Liu, R.Y.

PY - 2015

Y1 - 2015

AB - Data depth measures the centrality of a point with respect to a given distribution or data cloud. It provides a natural center-outward ordering of multivariate data points and yields a systematic nonparametric multivariate analysis scheme. In particular, the halfspace depth is shown to have many desirable properties and broad applicability. However, the empirical halfspace depth is zero outside the convex hull of the data. This property has rendered the empirical halfspace depth useless outside the data cloud, and limited its utility in applications where the extreme outlying probability mass is the focal point, such as in classification problems and control charts with very small false alarm rates. To address this issue, we apply extreme value statistics to refine the empirical halfspace depth in “the tail”. This provides an important linkage between data depth, which is useful for inference on centrality, and extreme value statistics, which is useful for inference on extremity. The refined empirical halfspace depth can thus extend all its utilities beyond the data cloud, and hence broaden greatly its applicability. The refined estimator is shown to have substantially improved upon the empirical estimator in theory and simulations. The benefit of this improvement is also demonstrated through the applications in classification and statistical process control.

KW - Depth

KW - extremes

KW - nonparametric classification

KW - nonparametric multivariate SPC

KW - tail

U2 - 10.1214/15-AOS1359

DO - 10.1214/15-AOS1359

M3 - Article

VL - 43

SP - 2738

EP - 2765

JO - The Annals of Statistics

JF - The Annals of Statistics

SN - 0090-5364

IS - 6

ER -