SAINS MALAYSIANA

Sains Malaysiana 46(2)(2017): 317–326

http://dx.doi.org/10.17576/jsm-2017-4602-17

Missing Value Estimation Methods for Data in Linear Functional Relationship Model

ADILAH ABDUL GHAPOR¹, YONG ZULINA ZUBAIRI²* & A.H.M. RAHMATULLAH IMON³

¹Institute of Graduate Studies, University of Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia

²Centre for Foundation Studies in Science, University of Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia

³Department of Mathematical Sciences, Ball State University, 47306 Indiana, United States of America

Submitted: 1 December 2015/Accepted: 9 June 2016

ABSTRACT

The missing value problem is common when analysing quantitative data. With the rapid growth of computing capabilities, advanced methods, in particular those based on maximum likelihood estimation, have been suggested as the best way to handle missing values. In this paper, two modern imputation approaches, namely expectation-maximization (EM) and expectation-maximization with bootstrapping (EMB), are proposed for two kinds of linear functional relationship model (LFRM): LFRM1, the full model, and LFRM2, the linear functional relationship model in which the slope parameter is estimated using a nonparametric approach. The performance of EM and EMB is measured using mean absolute error, root-mean-square error and estimated bias. The results of the simulation study suggest that both EM and EMB are applicable to the LFRM, with the EMB algorithm outperforming the standard EM algorithm. An illustration using a practical example and a real data set is provided.

Keywords: Bootstrap; expectation-maximization; linear functional relationship model; missing value
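
The three performance measures named in the abstract can be illustrated with a minimal sketch. The abstract does not state the formulas, so the definitions below are assumptions: mean absolute error and root-mean-square error are taken between the true and imputed values, and estimated bias is taken as the absolute difference between the mean of the parameter estimates and the true parameter value. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def mean_absolute_error(true_values, imputed_values):
    """Average absolute difference between true and imputed values."""
    n = len(true_values)
    return sum(abs(t - i) for t, i in zip(true_values, imputed_values)) / n


def root_mean_square_error(true_values, imputed_values):
    """Square root of the average squared difference."""
    n = len(true_values)
    squared = sum((t - i) ** 2 for t, i in zip(true_values, imputed_values))
    return math.sqrt(squared / n)


def estimated_bias(true_parameter, estimates):
    """Assumed definition: |mean of the estimates - true parameter|."""
    return abs(sum(estimates) / len(estimates) - true_parameter)


# Hypothetical values standing in for the entries that were set to missing.
true_values = [1.0, 2.0, 3.0, 4.0]
imputed_values = [1.5, 2.0, 2.0, 4.5]

mae = mean_absolute_error(true_values, imputed_values)
rmse = root_mean_square_error(true_values, imputed_values)

# Hypothetical slope estimates from repeated simulation runs, true slope 1.0.
bias = estimated_bias(1.0, [0.9, 1.1, 1.3])

print(mae, rmse, bias)
```

Under these assumed definitions, smaller values of all three measures indicate imputed values, or parameter estimates, closer to the truth.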

*Corresponding author; email: yzulina@um.edu.my