Sains Malaysiana 47(8)(2018): 1931–1940

http://dx.doi.org/10.17576/jsm-2018-4708-35

 

The Extra Zeros in Traffic Accident Data: A Study on the Mixture of Discrete Distributions

(Extra Zeros in Road Traffic Accident Data: A Study of Mixed Discrete Distributions)

 

ZAMIRA HASANAH ZAMZURI*, MOHD SYAFIQ SAPUAN & KAMARULZAMAN IBRAHIM

 

School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor Darul Ehsan, Malaysia

 

Submitted: 29 March 2018/Accepted: 2 April 2018

 

 

ABSTRACT

The presence of extra zeros is commonly observed in traffic accident count data. Past research has favoured zero-altered models, explaining that the extra zeros arise from under-reporting. However, this explanation has been disputed, since the zeros could instead arise from a Poisson trial process. Motivated by this argument, we explore whether mixing several discrete distributions can give rise to extra zeros. Four simulation studies were conducted, based on two accident scenarios and two discrete distributions (Poisson and negative binomial), considering six combinations of mixing proportions for components with low, moderate and high mean values. The simulation results support the claim: extra zeros were detected in most cases of mixed Poisson and mixed negative binomial data. Data sets dominated by a low-mean Poisson (or negative binomial) component show clear evidence of extra zeros even when the sample size is only 30. An illustration using a real data set leads to the same findings. Hence, it is essential to consider mixed discrete distributions as candidate distributions when dealing with count data with extra zeros. This study raises awareness of possible alternative distributions for count data with extra zeros, especially in traffic accident applications.

 

Keywords: Hurdle models; negative binomial; Poisson; proportion; simulation study; traffic accident; zero-inflated models
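A minimal simulation sketch of the idea in the abstract, using only the Python standard library. The scenario below (70% of observations from a Poisson component with mean 0.5 and 30% from one with mean 5.0) is a hypothetical illustration, not one of the settings used in the paper, and comparing the observed share of zeros with exp(-ȳ) is a simple check of our own, not necessarily the detection procedure the authors used. The function names are likewise illustrative.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """One Poisson(lam) draw via Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_poisson_mixture(n, weights, means, rng):
    """Draw n counts: pick component j with probability weights[j], then draw Poisson(means[j])."""
    components = rng.choices(range(len(means)), weights=weights, k=n)
    return [sample_poisson(means[j], rng) for j in components]


def mixture_zero_prob(weights, means):
    """Theoretical P(Y = 0) for a finite Poisson mixture."""
    return sum(w * math.exp(-m) for w, m in zip(weights, means))


def zero_check(counts):
    """Observed share of zeros versus the share implied by a single Poisson
    fitted by maximum likelihood (whose estimate is the sample mean)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = counts.count(0) / n
    expected = math.exp(-mean)
    return observed, expected


if __name__ == "__main__":
    # Hypothetical scenario: 70% low-mean sites or periods, 30% high-mean ones.
    weights, means = [0.7, 0.3], [0.5, 5.0]
    rng = random.Random(2018)
    for n in (30, 20000):
        obs, exp_single = zero_check(sample_poisson_mixture(n, weights, means, rng))
        print(f"n={n}: observed zeros {obs:.3f}, single-Poisson expectation {exp_single:.3f}")
    print(f"theoretical mixture P(0) = {mixture_zero_prob(weights, means):.4f}")
```

With the large sample, the observed share of zeros sits near the theoretical mixture value (about 0.43), well above the roughly 0.16 that a single Poisson with the pooled mean of 1.85 would imply. At n = 30 the gap is typically still visible, though noisier.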

 

ABSTRAK

The presence of extra zeros is often observed in road accident count data. Past studies have tended to use zero-modified models, explaining that the extra zeros arise from unreported accidents. However, this statement has been challenged, since the extra zeros could instead arise from a mixture of several discrete distributions representing different times or locations. This study therefore aims to explore the theory that mixed discrete distributions can contribute to extra zeros in count data. Four simulation studies were conducted based on two accident scenarios and two discrete distributions (Poisson and negative binomial), taking into account six combinations of proportion values for low, moderate and high mean values in those distributions. The results agree with this theory: extra zeros were identified in most cases of mixed Poisson and mixed negative binomial data. Data sets dominated by a Poisson (or negative binomial) distribution with a low mean show a marked number of extra zeros even though the sample size is only 30. It is therefore very important for researchers to consider these mixed discrete distributions when dealing with count data with extra zeros. This study contributes to raising awareness of potential alternative distributions for count data with extra zeros, especially in road accident applications.

 

Keywords: Negative binomial; simulation study; road accident; zero-inflated model; hurdle model; proportion; Poisson
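Why mixing components with different means should produce extra zeros can be shown with Jensen's inequality. The notation below (K components, mixing proportions p_k summing to 1, component means λ_k or μ_k) is ours and not necessarily the paper's. Because e^{-x} is convex, a Poisson mixture always has at least as many zeros as a single Poisson with the same overall mean μ, with equality only when all component means coincide. The same argument carries over to negative binomial components if, as a simplifying assumption made here, they share a common size parameter r, since (1 + μ/r)^{-r} is also convex in μ:

```latex
P(Y = 0) = \sum_{k=1}^{K} p_k e^{-\lambda_k}
  \;\ge\; \exp\Big(-\sum_{k=1}^{K} p_k \lambda_k\Big) = e^{-\mu},
\qquad \mu = E(Y) = \sum_{k=1}^{K} p_k \lambda_k ;

P(Y = 0) = \sum_{k=1}^{K} p_k \Big(1 + \frac{\mu_k}{r}\Big)^{-r}
  \;\ge\; \Big(1 + \frac{\mu}{r}\Big)^{-r},
\qquad
\frac{d^2}{d\mu^2}\Big(1 + \frac{\mu}{r}\Big)^{-r}
  = \frac{r+1}{r}\Big(1 + \frac{\mu}{r}\Big)^{-r-2} > 0 .
```

Fitting a single Poisson (or negative binomial) to such mixed data therefore underpredicts the zero count, which is what a test for zero inflation then flags.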


 

*Corresponding author; email: zamira@ukm.edu.my
