Sains Malaysiana 46(2)(2017): 317–326
http://dx.doi.org/10.17576/jsm-2017-4602-17
Missing Value Estimation Methods for Data in Linear Functional Relationship Model
(Kaedah Menganggar Data Lenyap menggunakan Model Linear Hubungan Fungsian)
ADILAH ABDUL GHAPOR1, YONG ZULINA ZUBAIRI2* & A.H.M. RAHMATULLAH IMON3
1Institute of Graduate Studies, University of Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia
2Centre for Foundation Studies in Science, University of Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia
3Department of Mathematical Sciences, Ball State University, 47306 Indiana, United States of America
Received: 1 December 2015/Accepted: 9 June 2016
ABSTRACT
The missing value problem is common when analysing quantitative data. With the rapid growth of computing capabilities, advanced methods, in particular those based on maximum likelihood estimation, have been suggested as the best way to handle the missing value problem. In this paper, two modern imputation approaches, namely expectation-maximization (EM) and expectation-maximization with bootstrapping (EMB), are proposed for two kinds of linear functional relationship model (LFRM), namely LFRM1 for the full model and LFRM2 for the linear functional relationship model when the slope parameter is estimated using a nonparametric approach. The performance of EM and EMB is measured using mean absolute error, root-mean-square error and estimated bias. The results of the simulation study suggest that both the EM and EMB methods are applicable to the LFRM, with the EMB algorithm outperforming the standard EM algorithm. An illustration using a practical example and a real data set is provided.
Keywords: Bootstrap; expectation-maximization; linear functional relationship model; missing value
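The three performance measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names and toy numbers are hypothetical, and the exact definitions used in the paper may differ. The sketch assumes that mean absolute error and root-mean-square error compare imputed values with the true values that were deleted, and that estimated bias is the average of a parameter estimate over simulation replicates minus the true parameter value.

```python
import math


def mean_absolute_error(true_vals, imputed_vals):
    """Average absolute difference between true and imputed values."""
    diffs = [abs(t - p) for t, p in zip(true_vals, imputed_vals)]
    return sum(diffs) / len(diffs)


def root_mean_square_error(true_vals, imputed_vals):
    """Square root of the average squared difference."""
    sq_diffs = [(t - p) ** 2 for t, p in zip(true_vals, imputed_vals)]
    return math.sqrt(sum(sq_diffs) / len(sq_diffs))


def estimated_bias(estimates, true_param):
    """Mean of the estimates across replicates minus the true value."""
    return sum(estimates) / len(estimates) - true_param


# Hypothetical toy data: three deleted observations and their imputations,
# plus slope estimates from four simulated replicates (true slope = 1.0).
true_vals = [2.0, 4.0, 6.0]
imputed_vals = [2.5, 3.5, 7.0]
slope_estimates = [1.25, 1.0, 0.75, 1.5]

mae = mean_absolute_error(true_vals, imputed_vals)
rmse = root_mean_square_error(true_vals, imputed_vals)
bias = estimated_bias(slope_estimates, true_param=1.0)
print(f"MAE={mae:.4f}  RMSE={rmse:.4f}  bias={bias:.4f}")
```

Smaller values of all three measures indicate that the imputation method recovers the deleted values and the model parameters more closely.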
ABSTRACT (translated from the Malay)
Missing data often occur in the analysis of quantitative data. With the growth of computing capability, recent methods, namely those based on maximum likelihood, are among the best ways to handle the missing data problem. In this paper, two modern imputation methods are introduced, namely expectation-maximization (EM) and expectation-maximization with bootstrapping (EMB), for use in the linear functional relationship model (LFRM), namely LFRM1 for the full model and LFRM2 for the linear functional relationship model when the slope parameter is estimated using a nonparametric method. The performance of EM and EMB is measured by mean absolute error, root-mean-square error and estimated bias. Through simulation, we find that both EM and EMB can be used with the LFRM, and the results show that the EMB algorithm is better than the EM algorithm. The study is accompanied by an example using a real data set.
Keywords: Bootstrap; missing data; expectation-maximization; linear functional relationship model
*Corresponding author; email: yzulina@um.edu.my