Sains Malaysiana 50(7)(2021): 2085-2094

http://doi.org/10.17576/jsm-2021-5007-22

 

Fast Improvised Influential Distance for the Identification of Influential Observations in Multiple Linear Regression

(Penambahbaikan Pantas Jarak Pengaruh bagi Pengecaman Cerapan Berpengaruh dalam Regresi Linear Berganda)

 

HABSHAH MIDI¹*, MUHAMMAD SANI², SHELAN SAIED ISMAEEL³ & JAYANTHI ARASAN¹

¹Department of Mathematics, Faculty of Science and Institute for Mathematical Research, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor Darul Ehsan, Malaysia

²Department of Mathematical Sciences, Federal University Dutsin-Ma, Katsina State, Nigeria

³Department of Mathematics, Faculty of Science, University of Zakho, Zakho, Iraq

 

Received: 5 February 2020/Accepted: 19 November 2020

 

ABSTRACT

Influential observations (IO) are observations that can lead to misleading conclusions about the fit of a multiple linear regression model. Existing IO identification methods, such as the influential distance (ID), are not very successful in detecting IO. We suspect that ID employs an inefficient method, with a long computational running time, to identify the suspected IO at the initial step. Moreover, ID declares good leverage observations to be IO, which results in misleading conclusions. In this paper, we propose a fast improvised influential distance (FIID) that can successfully identify IO, good leverage observations, and regular observations with a shorter computational running time. A Monte Carlo simulation study and real data examples show that the FIID correctly identifies genuine IO in multiple linear regression models with no masking and a negligible swamping rate.
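
To make the distinction between good leverage observations and IO concrete, the following is a minimal sketch using classical single-case diagnostics (leverage values and externally studentized residuals) on a one-predictor model. It is not the ID or FIID procedure, whose details are not given in this abstract. The function name `classify_observations`, the toy data, and the cutoffs (leverage above 2p/n, absolute studentized residual above 3) are assumptions chosen for illustration. Single-case measures of this kind are known to suffer from masking and swamping when several unusual observations are present, which is the problem that robust methods aim to address.

```python
import math


def classify_observations(x, y, resid_cut=3.0):
    """Label each point of a simple linear regression (one predictor).

    Illustrative classical diagnostics only, NOT the FIID method:
    leverage cutoff 2p/n and studentized-residual cutoff 3 are assumed.
    """
    n, p = len(x), 2  # p counts the intercept and the slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    lev_cut = 2 * p / n

    labels = []
    for xi, e in zip(x, resid):
        # Leverage (diagonal of the hat matrix) for one predictor.
        h = 1 / n + (xi - x_bar) ** 2 / sxx
        # Residual variance estimated with this observation deleted.
        s2_del = (rss - e * e / (1 - h)) / (n - p - 1)
        # Externally studentized residual.
        t = e / math.sqrt(s2_del * (1 - h))
        high_leverage = h > lev_cut
        outlying = abs(t) > resid_cut
        if high_leverage and outlying:
            labels.append("influential (bad leverage)")
        elif high_leverage:
            labels.append("good leverage")
        elif outlying:
            labels.append("vertical outlier")
        else:
            labels.append("regular")
    return labels


# Toy data (assumed): ten points near y = 1 + 2x plus one point at x = 30.
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2, 0.1, -0.1]
x = list(range(1, 11)) + [30]
y_base = [1 + 2 * xi + ei for xi, ei in zip(range(1, 11), noise)]

labels_good = classify_observations(x, y_base + [61.0])  # on the line
labels_bad = classify_observations(x, y_base + [20.0])   # far off the line

print(labels_good[-1])
print(labels_bad[-1])
```

In this toy example the point at x = 30 has high leverage in both data sets, but it distorts the fit only when its response departs from the trend of the remaining points. A diagnostic that flagged it in both cases would be declaring a good leverage observation to be an IO.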

 

Keywords: Bad leverage point; good leverage point; influential distance; influential observations

 

ABSTRAK

Cerapan berpengaruh (IO) adalah cerapan yang bertanggungjawab ke atas kesimpulan yang mengelirukan bagi penyesuaian model regresi linear berganda. Kaedah pengecaman IO sedia ada seperti jarak berpengaruh (ID) tidak begitu berjaya untuk mengesan IO. Kami mengesyaki bahawa ID menggunakan kaedah yang kurang cekap dengan masa pengiraan yang panjang pada langkah awal bagi pengecaman cerapan IO. Tambahan pula, kaedah ini menunjukkan cerapan tuasan baik sebagai IO yang mengelirukan keputusan kajian. Dalam kertas ini, kami mencadangkan penambahbaikan jarak berpengaruh pantas (FIID) yang boleh mengecam IO, cerapan tuasan yang baik dan cerapan biasa dengan jayanya dengan masa pengiraan yang pantas. Kajian Monte Carlo simulasi dan contoh data sebenar menunjukkan bahawa FIID mengecam IO dalam model linear regresi berganda dengan betul tanpa penyorokan dan kadar limpahan yang sangat kecil.

 

Kata kunci: Cerapan berpengaruh; jarak berpengaruh; titik tuasan buruk; titik tuasan tinggi baik

 


*Corresponding author; email: habshahmidi@gmail.com

 

 

 
