Sains Malaysiana 50(7)(2021): 2085-2094
http://doi.org/10.17576/jsm-2021-5007-22
Fast Improvised Influential Distance for the Identification of Influential Observations in Multiple Linear Regression
(Fast Improvement of the Influential Distance for the Identification of Influential Observations in Multiple Linear Regression)
HABSHAH MIDI¹*, MUHAMMAD SANI², SHELAN SAIED ISMAEEL³ & JAYANTHI ARASAN¹
¹Department of Mathematics, Faculty of Science and Institute for Mathematical Research, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor Darul Ehsan, Malaysia
²Department of Mathematical Sciences, Federal University Dutsin-Ma, Katsina State, Nigeria
³Department of Mathematics, Faculty of Science, University of Zakho, Zakho, Iraq
Received: 5 February 2020/Accepted: 19 November 2020
ABSTRACT
Influential observations (IO) are those observations that are responsible for misleading conclusions about the fitting of a multiple linear regression model. Existing IO identification methods, such as the influential distance (ID), are not very successful in detecting IO. It is suspected that the ID employs an inefficient method, with a long computational running time, to identify the suspected IO at the initial step. Moreover, this method declares good leverage observations as IO, resulting in misleading conclusions. In this paper, we propose a fast improvised influential distance (FIID) that can successfully identify IO, good leverage observations, and regular observations with a shorter computational running time. A Monte Carlo simulation study and real data examples show that the FIID correctly identifies genuine IO in a multiple linear regression model with no masking and a negligible swamping rate.
Keywords: Bad leverage point; good leverage point; influential distance; influential observations
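The abstract separates observations into genuine IO, good leverage observations and regular observations. Purely to illustrate these categories, the sketch below uses classical OLS diagnostics: hat values h_ii and internally studentized residuals r_i, with assumed rule-of-thumb cutoffs h_ii > 2p/n (high leverage) and |r_i| > 2.5 (outlying). It is not the ID or FIID procedure, whose steps are not given here; the function name `classify_observations` and both cutoffs are hypothetical choices. Classical diagnostics of this kind are themselves prone to masking and swamping, which is what motivates robust alternatives.

```python
import numpy as np

def classify_observations(X, y, lev_mult=2.0, resid_cut=2.5):
    """Label each observation as regular, vertical outlier, good leverage
    or bad leverage (IO) using classical OLS diagnostics.

    Illustrative only, NOT the FIID procedure. Assumed cutoffs:
    high leverage if h_ii > lev_mult * p / n,
    outlying if |internally studentized residual| > resid_cut.
    """
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])           # prepend intercept column
    p = Xd.shape[1]
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    h = np.einsum("ij,jk,ik->i", Xd, XtX_inv, Xd)   # hat values h_ii
    beta = XtX_inv @ Xd.T @ y                       # OLS coefficients
    e = y - Xd @ beta                               # OLS residuals
    s2 = e @ e / (n - p)                            # residual variance estimate
    r = e / np.sqrt(s2 * (1.0 - h))                 # internally studentized residuals
    high_lev = h > lev_mult * p / n
    outlying = np.abs(r) > resid_cut
    labels = np.where(high_lev & outlying, "bad leverage (IO)",
             np.where(high_lev, "good leverage",
             np.where(outlying, "vertical outlier", "regular")))
    return labels, h, r

# Ten points near y = 1 + 2x (noise of +/-0.5) plus one remote point at x = 50.
x = np.append(np.arange(1.0, 11.0), 50.0)
noise = np.append(np.tile([0.5, -0.5], 5), 0.0)
y_good = 1 + 2 * x + noise          # remote point lies on the true line
y_bad = y_good.copy()
y_bad[-1] -= 60.0                   # remote point pulled far off the line

labels_good, h_good, _ = classify_observations(x, y_good)
labels_bad, _, _ = classify_observations(x, y_bad)
print(labels_good[-1], "|", labels_bad[-1])
```

In this toy fit the remote point is labelled good leverage while it lies on the line and bad leverage (IO) once it is shifted down. Note that an internally studentized residual can never exceed sqrt(n - p) in absolute value (here sqrt(9) = 3), so a fixed cutoff such as 2.5 leaves little margin in small samples; this is one face of the masking problem.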
ABSTRACT (MALAY)
Influential observations (IO) are observations that are responsible for misleading conclusions about the fitting of a multiple linear regression model. Existing IO identification methods, such as the influential distance (ID), are not very successful in detecting IO. We suspect that the ID uses a less efficient method, with a long computation time, at the initial step of identifying IO. Moreover, this method flags good leverage observations as IO, which misleads the results of a study. In this paper, we propose a fast improved influential distance (FIID) that can successfully identify IO, good leverage observations and regular observations with a short computation time. A Monte Carlo simulation study and real data examples show that the FIID correctly identifies IO in a multiple linear regression model without masking and with a very small swamping rate.
Keywords: Influential observations; influential distance; bad leverage point; good high leverage point
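The Monte Carlo claim of no masking and a negligible swamping rate can be made concrete with per-observation rates. The sketch below assumes masking is the proportion of genuine IO that a detector fails to flag and swamping is the proportion of clean observations it wrongly flags, averaged over replications; definitions in the literature vary, and the paper's simulation design is not reproduced here. The callables `detector` and `simulate`, and the toy versions used in the check, are hypothetical.

```python
import numpy as np

def masking_swamping(flagged, true_io):
    """Return (masking rate, swamping rate) for one data set.

    Assumed definitions: masking = share of genuine IO NOT flagged;
    swamping = share of clean observations that ARE flagged.
    """
    flagged = np.asarray(flagged, dtype=bool)
    true_io = np.asarray(true_io, dtype=bool)
    masking = np.mean(~flagged[true_io]) if true_io.any() else 0.0
    swamping = np.mean(flagged[~true_io]) if (~true_io).any() else 0.0
    return float(masking), float(swamping)

def monte_carlo_rates(detector, simulate, n_rep=1000, seed=0):
    """Average masking and swamping rates over n_rep simulated data sets.

    Hypothetical interface: simulate(rng) -> (X, y, true_io) and
    detector(X, y) -> boolean flag per observation.
    """
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_rep):
        X, y, true_io = simulate(rng)
        rates.append(masking_swamping(detector(X, y), true_io))
    return np.mean(rates, axis=0)

# One replication: one of two IO missed, one of three clean points flagged.
m, s = masking_swamping([True, False, True, False, False],
                        [True, True, False, False, False])

# Toy check: the first observation of each data set is shifted by 50 and
# the toy "detector" simply flags |y| > 10.
def toy_simulate(rng):
    y = rng.normal(size=20)
    y[0] += 50.0
    true_io = np.zeros(20, dtype=bool)
    true_io[0] = True
    return None, y, true_io

def toy_detector(X, y):
    return np.abs(y) > 10.0

avg_masking, avg_swamping = monte_carlo_rates(toy_detector, toy_simulate, n_rep=200)
```

Averaging per-replication rates, rather than pooling all observations, keeps every simulated data set equally weighted; in a real study `detector` would wrap the diagnostic being evaluated.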
*Corresponding author; email: habshahmidi@gmail.com