SAINS MALAYSIANA

Sains Malaysiana 49(5)(2020): 1165-1174

http://dx.doi.org/10.17576/jsm-2020-4905-22

Imputation Techniques for Incomplete Load Data Based on Seasonality and Orientation of the Missing Values

(Teknik Pengimputan untuk Data Beban tak Lengkap Berdasarkan Kemusiman dan Orientasi Nilai yang Hilang)

NUR ARINA BAZILAH KAMISAN¹*, MUHAMMAD HISYAM LEE¹, ABDUL GHAPOR HUSSIN² & YONG ZULINA ZUBAIRI³

¹Mathematics Department, Faculty of Science, Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor Darul Takzim, Malaysia

²Faculty of Science and Defence Technology, Universiti Pertahanan Nasional Malaysia, 50300 Kuala Lumpur, Federal Territory, Malaysia

³Pusat Asasi Sains Universiti Malaya, Universiti Malaya, 50300 Kuala Lumpur, Federal Territory, Malaysia

Submitted: 12 August 2019/Accepted: 24 January 2020

ABSTRACT

Missing values are a recurring problem in load data. Because load data follow a seasonal pattern by day, the load for the next day is predictable most of the time. A new imputation model has been developed based on these characteristics. Data containing missing values are divided according to their seasonal pattern and, for each subdivision, the mean, the mean plus the standard deviation, and the third quartile are calculated and then rearranged to form a new set of values that replace the missing values. These three values are used as imputations for the missing values. To examine how the orientation of the missing values affects the choice of imputation, the missing values are placed in three positions: at the front, in the middle, and at the end of the data, with 5%, 15%, and 25% of the values missing. The root mean square error and mean absolute error show that the proposed techniques, particularly the mean and the third quartile, outperform the other, more complex methods when dealing with missing values. Mean imputation is adequate when the missing values are at the front or in the middle of the data, while the third quartile is superior when the missing values are at the end of the data.

Keywords: Data orientation; missing values; multiple imputation; seasonal load data; seasonality
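The procedure summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: the function names (`seasonal_impute`, `mask_block`, `rmse`, `mae`), the use of `None` to mark missing values, grouping observations by their position within a fixed seasonal period, the reading of "mean with standard deviation" as mean plus one sample standard deviation, and the treatment of the missing values as one contiguous block at the front, middle, or end of the series.

```python
import math
import statistics


def seasonal_impute(series, period, method="mean"):
    """Fill None entries using observed values at the same seasonal position.

    Assumption: the seasonal subdivision is the index modulo `period`.
    """
    groups = [[] for _ in range(period)]
    for i, x in enumerate(series):
        if x is not None:
            groups[i % period].append(x)

    def fill_value(obs):
        if len(obs) < 2:
            raise ValueError("need at least two observed values per subdivision")
        if method == "mean":
            return statistics.mean(obs)
        if method == "mean_plus_sd":
            # Assumed reading: mean plus one sample standard deviation.
            return statistics.mean(obs) + statistics.stdev(obs)
        if method == "q3":
            # Third quartile, using the statistics module's default method.
            return statistics.quantiles(obs, n=4)[2]
        raise ValueError(f"unknown method: {method}")

    return [
        fill_value(groups[i % period]) if x is None else x
        for i, x in enumerate(series)
    ]


def mask_block(series, fraction, where):
    """Return a copy with a contiguous block set to None, plus its indices.

    `where` is "front", "middle" or "end" (assumed block layout).
    """
    n = len(series)
    k = round(fraction * n)
    start = {"front": 0, "middle": (n - k) // 2, "end": n - k}[where]
    idx = list(range(start, start + k))
    masked = list(series)
    for i in idx:
        masked[i] = None
    return masked, idx


def rmse(truth, filled, idx):
    """Root mean square error over the imputed positions only."""
    return math.sqrt(sum((truth[i] - filled[i]) ** 2 for i in idx) / len(idx))


def mae(truth, filled, idx):
    """Mean absolute error over the imputed positions only."""
    return sum(abs(truth[i] - filled[i]) for i in idx) / len(idx)
```

A comparison in the spirit of the study would call `mask_block` on a complete series for each position and each missing fraction (0.05, 0.15, 0.25), impute with each of the three methods, and compare the resulting `rmse` and `mae` values.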



*Corresponding author; email: nurarinabazilah@utm.my
