Sains Malaysiana 46(2)(2017): 255–265

http://dx.doi.org/10.17576/jsm-2017-4602-10

Feature Selection Algorithms for Malaysian Dengue Outbreak Detection Model

(Pemilihan Ciri Algoritma untuk Model Pengesanan Wabak Denggi)

 

HUSAM I.S. ABUHAMAD1, AZURALIZA ABU BAKAR1, SUHAILA ZAINUDIN1*,

MAZRURA SAHANI2 & ZAINUDIN MOHD ALI3

 

1Center for Artificial Intelligence Technology, Faculty of Information Science and Technology

Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Selangor Darul Ehsan, Malaysia

 

2Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Jalan Raja Muda Abd Aziz

50300 Kuala Lumpur, Wilayah Persekutuan, Malaysia

 

3Public Health Department, Ministry of Health, Jalan Rasah, 70300 Seremban, Negeri Sembilan Darul Khusus, Malaysia

 

Submitted: 11 March 2016/Accepted: 8 June 2016

 

ABSTRACT

Dengue fever is considered one of the most common mosquito-borne diseases worldwide. Dengue outbreak detection supports practical efforts to curb the rapid spread of the disease by providing the knowledge needed to predict the next outbreak. Many studies have modeled and predicted dengue outbreaks using different data mining techniques. This research aimed to identify the features that lead to better predictive accuracy for dengue outbreaks using three feature selection algorithms: particle swarm optimization (PSO), genetic algorithm (GA) and rank search (RS). Based on the selected features, three predictive modeling techniques (J48, DTNB and Naive Bayes) were applied for dengue outbreak detection. The dataset used in this research was obtained from the Public Health Department, Seremban, Negeri Sembilan, Malaysia. The experimental results showed that predictive accuracy improved when a feature selection process was applied before predictive modeling. The study also identified a set of features that represents dengue outbreak detection for Malaysian health agencies.

 

Keywords: Feature selection; dengue outbreak; knowledge discovery from databases; nature-based algorithms; outbreak detection

 

ABSTRAK

Demam denggi merupakan penyakit bawaan nyamuk yang wujud di merata dunia. Pengesanan wabak denggi bermanfaat sebagai satu usaha praktikal mengawal penyebaran penyakit ini dengan menyediakan pengetahuan untuk meramal kejadian wabak yang seterusnya. Penyelidikan lepas telah dijalankan untuk memodel dan meramal pengesanan wabak denggi menggunakan pelbagai teknik perlombongan data. Penyelidikan ini bertujuan untuk mengenal pasti ciri yang meningkatkan ketepatan ramalan wabak denggi menggunakan tiga algoritma pemilihan ciri; particle swarm optimization (PSO), genetic algorithm (GA) dan rank search (RS). Berdasarkan ciri yang dipilih, tiga teknik permodelan ramalan (J48, DTNB dan Naive Bayes) dijalankan untuk peramalan wabak denggi. Set data yang digunakan dalam penyelidikan ini diperoleh dari Jabatan Kesihatan Awam, Negeri Sembilan, Malaysia. Keputusan kajian menunjukkan bahawa ketepatan ramalan meningkat apabila proses pemilihan ciri dijalankan sebelum proses permodelan. Kajian ini turut menghasilkan set ciri baru untuk mewakilkan pengesanan wabak denggi untuk agensi berkaitan kesihatan di Malaysia.

 

Kata kunci: Algoritma berasaskan alam; pemilihan ciri; penemuan ilmu dari pangkalan data; pengawalan wabak; wabak denggi


 

 

*Corresponding author; email: suhaila.zainudin@ukm.edu.my

 

 

 

 
