Sains Malaysiana 47(8)(2018): 1931–1940
http://dx.doi.org/10.17576/jsm-2018-4708-35
The Extra Zeros in Traffic Accident Data: A
Study on the Mixture of Discrete Distributions
(Lebihan Sifar dalam Data Kemalangan Jalan Raya:
Satu Kajian bagi Taburan Diskret Campuran)
ZAMIRA HASANAH ZAMZURI*, MOHD SYAFIQ SAPUAN
& KAMARULZAMAN IBRAHIM
Pusat Pengajian Sains Matematik, Fakulti Sains dan Teknologi,
Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor Darul Ehsan, Malaysia
Submitted: 29 March 2018/Accepted: 2 April 2018
ABSTRACT
The presence of extra zeros is commonly observed in traffic accident count
data. Past research has opted for zero-altered models and explained that the
zeros arise from under-reporting. However, there is also an argument against
this explanation, since the zeros could arise from a Poisson trial process.
Motivated by this argument, we explore the possibility that mixing several
discrete distributions can contribute to the presence of extra zeros. Four
simulation studies were conducted based on two accident scenarios and two
discrete distributions, Poisson and negative binomial, considering six
combinations of proportion values corresponding to low, moderate and high mean
values in the distribution. The results of the simulation studies concur with
the claim, as the presence of extra zeros is detected in most cases of mixed
Poisson and mixed negative binomial data. Data sets that are dominated by a
Poisson (or negative binomial) component with a low mean show an apparent
existence of extra zeros even when the sample size is only 30. An illustration
using a real data set concurs with these findings. Hence, it is essential to
consider mixed discrete distributions as potential distributions when dealing
with count data with extra zeros. This study contributes to raising awareness
of possible alternative distributions for count data with extra zeros,
especially in traffic accident applications.
Keywords: Hurdle models; negative binomial;
Poisson; proportion; simulation study; traffic accident;
zero-inflated models
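The mixing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation design: the proportions (0.6, 0.3, 0.1), the component means (0.5, 3, 10), the seed, and the helper names are all hypothetical, and the "zero excess" measure (observed zero fraction minus the zero probability of a single Poisson with the same sample mean) is an assumed diagnostic, not the test used in the study.

```python
import numpy as np

def mixture_zero_prob(proportions, means):
    """Theoretical P(Y = 0) for a finite mixture of Poisson components."""
    p = np.asarray(proportions, dtype=float)
    lam = np.asarray(means, dtype=float)
    return float(np.sum(p * np.exp(-lam)))

def simulate_mixed_poisson(n, proportions, means, rng):
    """Draw n counts; each observation first picks a component, then a count."""
    component = rng.choice(len(means), size=n, p=proportions)
    return rng.poisson(np.asarray(means, dtype=float)[component])

def zero_excess(sample):
    """Observed zero fraction minus exp(-sample mean), the zero probability
    of a single Poisson with the same mean (assumed diagnostic)."""
    observed = np.mean(sample == 0)
    expected = np.exp(-np.mean(sample))
    return float(observed - expected)

# Hypothetical setting: data dominated by a low-mean component.
proportions = [0.6, 0.3, 0.1]   # assumed mixing proportions
means = [0.5, 3.0, 10.0]        # assumed low, moderate and high means

rng = np.random.default_rng(2018)
for n in (30, 10_000):
    sample = simulate_mixed_poisson(n, proportions, means, rng)
    print(f"n={n}: zero excess = {zero_excess(sample):.3f}")

print("theoretical mixture P(Y=0):", mixture_zero_prob(proportions, means))
```

A positive zero excess indicates more zeros than a single Poisson with the same mean would produce. Replacing `rng.poisson` with a negative binomial draw would give the analogous sketch for the mixed negative binomial case.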
ABSTRAK
The presence of excess zeros is often observed in road traffic accident count
data. Past studies tend to use zero-modified models and explain that these
excess zeros stem from unreported accidents. However, this statement has been
challenged, since the excess zeros could stem from a mixture of several
discrete distributions representing the distributions for different times or
locations. Hence, this study aims to explore the theory that mixed discrete
distributions can contribute to excess zeros in count data. Four simulation
studies were conducted based on two accident scenarios and two discrete
distributions, Poisson and negative binomial, taking into account six
combinations of proportion values for low, moderate and high mean values in
those distributions. The results agree with the theory, as excess zeros were
identified in most cases of mixed Poisson and mixed negative binomial data.
Data sets dominated by a Poisson (or negative binomial) distribution with a
low mean show a marked number of excess zeros even though the sample size is
only 30. It is therefore very important for researchers to consider these
mixed discrete distributions when faced with count data with excess zeros.
This study contributes to raising awareness of potential alternative
distributions for count data with excess zeros, especially in road traffic
accident applications.
Keywords: Negative binomial; simulation study; road traffic accident;
zero-inflated models; hurdle models; proportion; Poisson
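For the Poisson case, the reason a mixture over different times or locations produces more zeros than a single Poisson with the same overall mean can be sketched as follows. The notation (K components, weights p_k, means \lambda_k, latent label Z) is assumed here for illustration and does not come from the source; the inequality is Jensen's inequality applied to the convex function e^{-x}.

```latex
% Assumed notation: K components with weights p_k and Poisson means \lambda_k.
\[
Y \mid Z = k \sim \mathrm{Poisson}(\lambda_k), \qquad
P(Z = k) = p_k, \qquad \sum_{k=1}^{K} p_k = 1 .
\]
% Zero probability of the mixture versus a single Poisson with mean E[Y].
\[
P(Y = 0) \;=\; \sum_{k=1}^{K} p_k \, e^{-\lambda_k}
\;\ge\; \exp\!\Big( -\sum_{k=1}^{K} p_k \lambda_k \Big)
\;=\; e^{-E[Y]} .
\]
```

Equality holds only when all components with positive weight share the same mean, so any genuine mixing of low and high means yields extra zeros relative to a single Poisson fit.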
*Corresponding author; email: zamira@ukm.edu.my