Sains Malaysiana 44(10)(2015): 1417–1422
Outlier Detection using Generalized Linear Model in Malaysian Breast Cancer Data
M. NAWAMA1, A.I.N. IBRAHIM1*, I.B. MOHAMED1, M.S. YAHYA1 & N.A.M. TAIB2
1Institute of Mathematical Sciences, University of Malaya, 59100 Kuala Lumpur, Malaysia
2Department of Surgery, University of Malaya Medical Centre, 59100 Kuala Lumpur, Malaysia
Received: 22 March 2013/Accepted: 15 June 2015
ABSTRACT
We consider the problem of outlier detection in bivariate exponential data fitted using the generalized linear model via a Bayesian approach. We closely follow the work outlined by Unnikrishnan (2010) and present every step of the detection procedure in detail. Because the resulting joint posterior distribution is complex, we obtain information about it from samples generated by Markov chain Monte Carlo sampling, in particular using either the Gibbs sampler or the Metropolis-Hastings algorithm. We use data from local breast cancer patients to illustrate the implementation of the method.
Keywords: Bayesian; Gibbs sampler; Metropolis-Hastings algorithm; Outlier
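The abstract gives no model equations, so the following Python sketch is only an illustration under our own assumptions, not the procedure of the authors or of Unnikrishnan (2010). It assumes a single covariate x with responses y_i ~ Exponential with mean mu_i, a log link log(mu_i) = b0 + b1 x_i, independent N(0, 10^2) priors, a random-walk Metropolis-Hastings sampler and, as a swapped-in outlier measure, the posterior predictive tail probability P(Y_rep > y_i | data). The function names log_posterior, metropolis_hastings and tail_probabilities, the step size, the cut-off and the synthetic data are all hypothetical.

```python
import math
import random


def log_posterior(beta, xs, ys, prior_sd=10.0):
    """Unnormalized log posterior for an exponential GLM with log link.

    Assumed model (illustrative only): y_i ~ Exponential with mean
    mu_i = exp(b0 + b1 * x_i), and independent N(0, prior_sd^2) priors.
    """
    b0, b1 = beta
    lp = -(b0 ** 2 + b1 ** 2) / (2.0 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x                 # linear predictor, eta = log(mu_i)
        lp += -eta - y * math.exp(-eta)   # log f(y) = -log(mu) - y / mu
    return lp


def metropolis_hastings(xs, ys, n_iter=5000, step=0.1, seed=1):
    """Random-walk Metropolis-Hastings over (b0, b1); returns draws and acceptance rate."""
    rng = random.Random(seed)
    beta = (0.0, 0.0)
    current = log_posterior(beta, xs, ys)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = tuple(b + rng.gauss(0.0, step) for b in beta)
        candidate = log_posterior(proposal, xs, ys)
        # Symmetric proposal: the Hastings ratio reduces to the posterior ratio.
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        draws.append(beta)
    return draws, accepted / n_iter


def tail_probabilities(draws, xs, ys):
    """Posterior predictive P(Y_rep > y_i | data), averaged over the MCMC draws."""
    probs = []
    for x, y in zip(xs, ys):
        # For an exponential with mean mu, P(Y > y) = exp(-y / mu).
        total = sum(math.exp(-y * math.exp(-(b0 + b1 * x))) for b0, b1 in draws)
        probs.append(total / len(draws))
    return probs


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [i / 10 for i in range(50)]
    # Synthetic data (not the breast cancer data): true b0 = 0.5, b1 = 0.3.
    ys = [rng.expovariate(1.0 / math.exp(0.5 + 0.3 * x)) for x in xs]
    ys[10] = 200.0                        # planted outlier
    draws, rate = metropolis_hastings(xs, ys)
    kept = draws[1000:]                   # discard burn-in
    probs = tail_probabilities(kept, xs, ys)
    # Two-sided check: flag observations in either extreme tail.
    flagged = [i for i, p in enumerate(probs) if min(p, 1.0 - p) < 0.005]
    print(f"acceptance rate {rate:.2f}; flagged observations: {flagged}")
```

Run as a script, the sketch should flag the planted observation at index 10; with a two-sided cut-off, an occasional genuine observation in the lower tail may be flagged as well. A Metropolis-Hastings step is convenient here because the full conditional distributions of b0 and b1 under this model are not standard distributions, so a plain Gibbs sampler would need an extra sampling device for each conditional.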
REFERENCES
Anscombe, F.J. & Guttman, I. 1960. Rejection of outliers. Technometrics 2: 123-147.
Barnett, V. & Lewis, T. 1983. Outliers in Statistical Data. Chichester: John Wiley & Sons.
Bayarri, M.J. & Morales, J. 2003. Bayesian measures of surprise for outlier detection. Journal of Statistical Planning and Inference 111: 3-22.
Collett, D. 2003. Modelling Survival Data in Medical Research. Boca Raton, FL: Chapman & Hall/CRC.
Ferguson, T.S. 1961. Rules for rejection of outliers. Review of the International Statistical Institute 29: 29-43.
Freeman, P.R. 1980. On the number of outliers in data from a linear model. In Bayesian Statistics, edited by Bernardo, J.M., DeGroot, M.H., Lindley, D.V. & Smith, A.F.M. pp. 349-365. Valencia: University Press.
Ishwaran, H. 1999. Applications of hybrid Monte Carlo to Bayesian generalized linear models: Quasicomplete separation and neural networks. Journal of Computational and Graphical Statistics 8: 779-799.
Kuhnt, S. & Pawlitschko, J. 2003. Outlier Identification Rules for Generalized Linear Models. Technical Report No. 12, Department of Statistics, University of Dortmund.
Maller, R.A. & Zhou, S. 1994. Testing for sufficient follow-up and outliers in survival data. Journal of the American Statistical Association 89: 1499-1509.
Marshall, E.C. & Spiegelhalter, D.J. 2007. Identifying outliers in Bayesian hierarchical models: A simulation-based approach. Bayesian Analysis 2: 409-444.
Nardi, A. & Schemper, M. 1999. New residuals for Cox regression and their application to outlier screening. Biometrics 55(2): 523-529.
Page, G.L. & Dunson, D.B. 2011. Bayesian local contamination models for multivariate outliers. Technometrics 53: 152-162.
Pettit, L.I. 1994. Bayesian approaches to the detection of outliers in Poisson samples. Communications in Statistics - Theory and Methods 23: 1785-1795.
Taib, N.A., Akmal, M.N., Mohamed, I.B. & Yip, C.H. 2011. Improvement in survival of breast cancer patients: Trends in survival over two time periods in a single institution in an Asia Pacific country, Malaysia. Asian Pacific Journal of Cancer Prevention 12: 345-349.
Taib, N.A., Yip, C.H. & Mohamed, I. 2008. Survival analysis of Malaysian women with breast cancer: Results from the University of Malaya Medical Centre. Asian Pacific Journal of Cancer Prevention 9: 197-202.
Therneau, T.M., Grambsch, P.M. & Fleming, T.R. 1990. Martingale-based residuals for survival models. Biometrika 77(1): 147-160.
Unnikrishnan, N.K. 2010. Bayesian analysis for outliers in survey sampling. Computational Statistics and Data Analysis 54: 1962-1974.
Williams, A.D. 1987. Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics 36: 181-191.
Zeger, S.L. & Karim, M.R. 1991. Generalized linear models with random effects: A Gibbs sampling approach. Journal of the American Statistical Association 86: 79-86.
*Corresponding author; email: adrianaibrahim@um.edu.my