Sains Malaysiana 43(12)(2014): 1973–1977
Eigenstructure-Based Angle for Detecting Outliers in Multivariate Data
(Sudut Berasaskan Struktur Eigen untuk Mengesan Titik Terpencil dalam Data Multivariat)
NAZRINA AZIZ*
UUM College of Arts and Sciences, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia
Submitted: 20 February 2013/Accepted: 2 May 2014
ABSTRACT
There are two main reasons for detecting outliers. The first is the researcher's intention, that is, an interest in the outlying observations themselves; see, for example, the case of Mr Hadlum in Barnett and Lewis (1994). The second is the effect that outliers have on analyses. This article does not distinguish between these justifications for outlier detection. Its aim is to alert the analyst to observations that are isolated from the other observations in the data set. In this article, we introduce an eigenstructure-based angle for outlier detection. The method is simple and effective in dealing with the masking and swamping problems. The proposed method is illustrated on several data sets and compared with the Mahalanobis distance.
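The abstract names two ingredients, an eigenstructure-based angle and the Mahalanobis distance used for comparison, but does not give their formulas at this point. The sketch below is a hedged illustration only and should not be read as the author's method. It computes classical squared Mahalanobis distances with the usual chi-square cutoff. It also computes one *assumed* reading of an "eigenstructure-based angle", in the spirit of the case-deletion work of Wang and Nyquist (1991): the angle between the leading eigenvector of the sample covariance matrix with and without each observation. The function names `mahalanobis_distances`, `leading_eigenvector` and `deletion_angles`, the simulated data and the planted outlier at (6, 0) are all hypothetical.

```python
import numpy as np


def mahalanobis_distances(X: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each row of X from the sample mean,
    using the classical (non-robust) mean and sample covariance matrix."""
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)       # p x p sample covariance, divisor n - 1
    inv_cov = np.linalg.inv(cov)
    # d_i^2 = (x_i - xbar)^T S^{-1} (x_i - xbar), evaluated for every row at once
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


def leading_eigenvector(X: np.ndarray) -> np.ndarray:
    """Unit eigenvector belonging to the largest eigenvalue of the sample covariance."""
    cov = np.cov(X, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]


def deletion_angles(X: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL angle statistic: for each observation i, the angle (degrees)
    between the leading eigenvector of the full sample and the leading
    eigenvector after observation i is deleted. Not the article's definition."""
    v_full = leading_eigenvector(X)
    angles = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        v_del = leading_eigenvector(np.delete(X, i, axis=0))
        # Eigenvectors are only defined up to sign, so compare |cos| and clip
        # to 1.0 to guard arccos against rounding error.
        cos_theta = min(1.0, abs(float(v_full @ v_del)))
        angles[i] = np.degrees(np.arccos(cos_theta))
    return angles


# Hypothetical example: 30 strongly correlated bivariate normal points plus one
# planted outlier at (6, 0), which lies off the main axis of the cloud.
rng = np.random.default_rng(0)
base = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=30)
X = np.vstack([base, [[6.0, 0.0]]])

d2 = mahalanobis_distances(X)
angles = deletion_angles(X)

# For p = 2 the chi-square quantile has the closed form -2 ln(1 - q); using it
# as a cutoff is the usual approximation, not exact when S is estimated.
cutoff = -2.0 * np.log(1.0 - 0.975)
print("Flagged by Mahalanobis distance:", np.flatnonzero(d2 > cutoff))
print("Largest deletion angle at index:", int(np.argmax(angles)))
```

Both statistics here rely on the classical, non-robust mean and covariance. A single planted outlier is easy to find this way. With several outliers, however, these estimates are themselves pulled towards the outliers, which is the masking problem the abstract refers to.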
Keywords: Angle; eigenstructure; masking; outliers; swamping
ABSTRACT (MALAY VERSION)
There are two main reasons that motivate people to detect outliers. The first is the researcher's intention; see the case of Mr Hadlum in Barnett and Lewis (1994). The second is the effect of outliers on analyses. This paper does not differentiate between the various justifications for detecting outliers. Its aim is to inform the analyst about observations that are isolated from the other observations in the data set. In this paper, we introduce an eigenstructure-based angle for detecting outliers. The method is simple and effective in dealing with the masking and swamping problems. The proposed method is illustrated and compared with the Mahalanobis distance using several data sets.
Keywords: Swamping; masking; eigenstructure; angle; outliers
REFERENCES
Atkinson, A.C. 1994. Fast very robust methods for the detection of multiple outliers. Journal of the American Statistical Association 89(428): 1329-1339.
Barnett, V. & Lewis, T. 1994. Outliers in Statistical Data. New York: John Wiley & Sons.
Caroni, C. & Billor, N. 2007. Robust detection of multiple outliers in grouped multivariate data. Journal of Applied Statistics 34(10): 1241-1250.
Chatterjee, S. & Hadi, A.S. 1988. Sensitivity Analysis in Linear Regression. United States: John Wiley.
Cook, R.D. & Weisberg, S. 1982. Residuals and Influence in Regression. New York: Chapman and Hall.
Franklin, S., Thomas, S. & Brodeur, M. 2000. Robust multivariate outlier detection using Mahalanobis distance and modified Stahel-Donoho estimators. Proceedings of the International Conference on Establishment Surveys, New York. pp. 697-706.
Gao, S., Li, G. & Wang, D.Q. 2005. A new approach for detecting multivariate outliers. Communications in Statistics - Theory and Methods 34: 1857-1865.
Hadi, A.S. 1992. Identifying multiple outliers in multivariate data. Journal of the Royal Statistical Society, Series B 54(3): 761-777.
Hampel, F.R. 1971. A general qualitative definition of robustness. Annals of Mathematical Statistics 42(6): 1887-1896.
Hawkins, D.M. 1980. Identification of Outliers. London: Chapman and Hall.
Hawkins, D.M., Bradu, D. & Kass, G.V. 1984. Location of several outliers in multiple regression data using elemental sets. Technometrics 26(3): 197-208.
Hodge, V.J. & Austin, J. 2004. A survey of outlier detection methodologies. Artificial Intelligence Review 22(2): 85-126.
Mertens, B.J.A. 1998. Exact principal component influence measure applied to the analysis of spectroscopic data on rice. Applied Statistics 47(4): 527-542.
Peña, D. & Prieto, F.J. 2001. Multivariate outlier detection and robust covariance matrix estimation. Technometrics 43(3): 286-299.
Quinn, G.P. & Keough, M.J. 2002. Experimental Design and Data Analysis for Biologists. Cambridge: Cambridge University Press.
Rocke, D.M. & Woodruff, D.L. 1996. Identification of outliers in multivariate data. Journal of the American Statistical Association 91(435): 1047-1061.
Rousseeuw, P.J. & Van Driessen, K. 1999. A fast algorithm for the minimum covariance determinant estimator. Technometrics 41(3): 212-223.
Rousseeuw, P.J. & Leroy, A.M. 1987. Robust Regression and Outlier Detection. New York: John Wiley.
Rousseeuw, P.J. & van Zomeren, B.C. 1990. Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association 85(411): 633-639.
Shapiro, S.S. & Wilk, M.B. 1965. An analysis of variance test for normality (complete samples). Biometrika 52: 591-611.
Siotani, M. 1959. The extreme value of the generalized distance of the individual points in the multivariate normal sample. Annals of the Institute of Statistical Mathematics 10: 183-208.
Wang, S.G. & Liski, E.P. 1993. Effects of observations on the eigensystem of a sample covariance matrix. Journal of Statistical Planning and Inference 36: 215-226.
Wang, S.G. & Nyquist, H. 1991. Effects on the eigenstructure of a data matrix when deleting an observation. Computational Statistics and Data Analysis 11(2): 179-188.
Wilks, S.S. 1963. Multivariate statistical outliers. Sankhya Series A 25: 407-426.
Wulder, M. 2002. A Practical Guide to the Use of Selected Multivariate Statistics. Victoria: Canadian Forest Service.
*Corresponding author; email: nazrina@uum.edu.my