Sains Malaysiana 46(2)(2017): 255–265
http://dx.doi.org/10.17576/jsm-2017-4602-10
Feature Selection Algorithms for Malaysian Dengue Outbreak Detection Model
(Pemilihan Ciri Algoritma untuk Model Pengesanan Wabak Denggi)
HUSAM I.S. ABUHAMAD1, AZURALIZA ABU BAKAR1, SUHAILA ZAINUDIN1*, MAZRURA SAHANI2 & ZAINUDIN MOHD ALI2
1Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor Darul Ehsan, Malaysia
2Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Jalan Raja Muda Abd Aziz, 50300 Kuala Lumpur, Wilayah Persekutuan, Malaysia
3Public Health Department, Ministry of Health, Jalan Rasah, 70300 Seremban, Negeri Sembilan Darul Khusus, Malaysia
Received: 11 March 2016/Accepted: 8 June 2016
ABSTRACT
Dengue fever is considered one of the most common mosquito-borne diseases worldwide. Dengue outbreak detection supports practical efforts to curb the rapid spread of the disease by providing the knowledge needed to predict the next outbreak. Many studies have modeled and predicted dengue outbreaks using different data mining techniques. This research aimed to identify the features that lead to better predictive accuracy for dengue outbreaks using three feature selection algorithms: particle swarm optimization (PSO), genetic algorithm (GA) and rank search (RS). Based on the selected features, three predictive modeling techniques (J48, DTNB and Naive Bayes) were applied for dengue outbreak detection. The dataset used in this research was obtained from the Public Health Department, Seremban, Negeri Sembilan, Malaysia. The experimental results showed that predictive accuracy improved when a feature selection process was applied before the predictive modeling process. The study also identified a set of features that represent dengue outbreak detection for Malaysian health agencies.
Keywords: Feature selection; dengue outbreak; knowledge discovery from databases; nature-based algorithms; outbreak detection
ABSTRAK
Dengue fever is a mosquito-borne disease found throughout the world. Dengue outbreak detection is useful as a practical effort to control the spread of the disease by providing the knowledge to predict the next outbreak occurrence. Previous studies have been conducted to model and predict dengue outbreak detection using various data mining techniques. This research aimed to identify the features that improve the predictive accuracy of dengue outbreaks using three feature selection algorithms: particle swarm optimization (PSO), genetic algorithm (GA) and rank search (RS). Based on the selected features, three predictive modeling techniques (J48, DTNB and Naive Bayes) were applied for dengue outbreak prediction. The dataset used in this research was obtained from the Public Health Department, Negeri Sembilan, Malaysia. The results showed that predictive accuracy improved when the feature selection process was carried out before the modeling process. The study also produced a new set of features to represent dengue outbreak detection for health-related agencies in Malaysia.
Keywords: Nature-based algorithms; feature selection; knowledge discovery from databases; outbreak control; dengue outbreak
*Corresponding author; email: suhaila.zainudin@ukm.edu.my