Sains Malaysiana 51(3)(2022): 911-927
http://doi.org/10.17576/jsm-2022-5103-24
Improved Spatial Outlier Detection Method Within a River Network
(Kaedah Pengesanan Pencilan Reruang Diperbaik dalam Suatu Jaringan Sungai)
NUR FATIHAH MOHD ALI¹, ROSSITA MOHAMAD YUNUS¹,*, IBRAHIM MOHAMED¹ & FARIDAH OTHMAN²
¹Institute of Mathematical Sciences, Faculty of Science, Universiti Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia
²Department of Civil Engineering, Faculty of Engineering, Universiti Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia
Submitted: 1 February 2021/Accepted: 13 August 2021
Abstract
A spatial outlier is an observation whose non-spatial attribute values differ significantly from those of its neighbours. Such observations also occur in water quality data recorded at monitoring stations within a river network. However, existing spatial outlier detection procedures rely on distance measures such as the Euclidean distance between monitoring stations. As a result, they do not take the topology of the river network into account. In general, water quality downstream is affected by the flow from upstream, whereas water quality in one tributary may have little influence on that in other tributaries. Hence, we propose a method for identifying spatial outliers in a river network that accounts for river flow connectivity when determining the neighbours of each monitoring station. Both the existing and the proposed methods use the robust Mahalanobis distance, but the proposed method uses river distance instead of Euclidean distance. A simulation study on a synthetic river dataset shows that the proposed method outperforms the existing one. For illustration, we apply the proposed method to the 2016 water quality data from the Sg. Klang Basin provided by the Department of Environment, Malaysia. The results more clearly identify the stations whose water quality differs significantly from that of their neighbouring stations. Such information is useful to the authorities in planning the environmental monitoring of water quality in these areas.
Keywords: Euclidean distance; river distance; robust multivariate; spatial outlier; water quality
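The abstract only outlines the procedure. Neighbours are chosen by river distance instead of Euclidean distance, and stations are then screened with a robust Mahalanobis distance. The minimal Python sketch below illustrates that idea under several assumptions that are not taken from the paper:

- The network, coordinates and (DO, BOD) values are invented.
- Each station's neighbours are its k = 2 nearest stations by distance along the channel, computed as a shortest path over the reaches.
- Each station is summarised by the difference between its attributes and the mean of its neighbours' attributes.
- The robust location and scatter come from an exact, brute-force minimum covariance determinant (MCD) fit with the usual consistency factor. This is feasible only for a handful of stations; FAST-MCD would be used in practice.
- The cutoff is the 0.975 quantile of the chi-square distribution with 2 degrees of freedom.

All function names are hypothetical.

```python
import heapq
import math
from itertools import combinations

# Hypothetical toy network: all names, lengths, coordinates and values are
# invented. Reaches are (node, node, length in km); "J" is a confluence, not a
# station, and flow is assumed to run towards the outlet at S8.
REACHES = [("S1", "S2", 3.0), ("S2", "J", 2.0), ("S3", "J", 2.5),
           ("J", "S4", 4.0), ("S4", "S5", 3.0), ("S7", "S5", 3.5),
           ("S5", "S6", 2.0), ("S6", "S8", 2.5)]
COORDS = {"S1": (0.0, 5.0), "S2": (2.0, 3.5), "S3": (1.0, 4.6),
          "S4": (3.5, 1.0), "S5": (5.0, 0.0), "S6": (6.5, 0.5),
          "S7": (4.0, 2.2), "S8": (8.0, 0.0)}
# Two attributes per station, e.g. (DO, BOD) in mg/L; S7 is a planted outlier.
VALUES = {"S1": (7.0, 2.0), "S2": (6.6, 2.4), "S3": (7.1, 2.3),
          "S4": (6.5, 2.1), "S5": (6.2, 2.8), "S6": (6.4, 2.5),
          "S7": (3.5, 8.0), "S8": (6.0, 2.6)}
STATIONS = sorted(COORDS)


def river_distances(source):
    """Distance along the channel from source to every node (Dijkstra)."""
    graph = {}
    for a, b, length in REACHES:
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    dist, heap = {source: 0.0}, [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, length in graph[node]:
            if d + length < dist.get(nxt, math.inf):
                dist[nxt] = d + length
                heapq.heappush(heap, (d + length, nxt))
    return dist


def euclidean_distances(source):
    """Straight-line distance from source to every station."""
    x0, y0 = COORDS[source]
    return {s: math.hypot(x - x0, y - y0) for s, (x, y) in COORDS.items()}


def k_nearest(source, dist, k=2):
    """The k closest other stations under `dist` (ties broken by name)."""
    others = [s for s in STATIONS if s != source]
    return sorted(others, key=lambda s: (dist[s], s))[:k]


def river_neighbours(source):
    return k_nearest(source, river_distances(source))


def mean(vectors):
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def cov2(vectors):
    """ML covariance [[a, b], [b, c]] of 2-D vectors, returned as (a, b, c)."""
    (mx, my), n = mean(vectors), len(vectors)
    a = sum((x - mx) ** 2 for x, _ in vectors) / n
    b = sum((x - mx) * (y - my) for x, y in vectors) / n
    c = sum((y - my) ** 2 for _, y in vectors) / n
    return a, b, c


def mcd_sq_distances(vectors):
    """Squared robust Mahalanobis distances from an exact (brute-force) MCD, p = 2."""
    n = len(vectors)
    h = (n + 3) // 2  # floor((n + p + 1) / 2) with p = 2
    best = None
    for subset in combinations(vectors, h):
        a, b, c = cov2(subset)
        det = a * c - b * b
        if det > 1e-12 and (best is None or det < best[0]):
            best = (det, mean(subset), (a, b, c))
    det, (mx, my), (a, b, c) = best
    # Consistency factor for p = 2 (chi-square 2 and 4 df CDFs are closed-form).
    q = -2.0 * math.log(1.0 - h / n)
    f = (h / n) / (1.0 - math.exp(-q / 2.0) * (1.0 + q / 2.0))
    a, b, c, det = a * f, b * f, c * f, det * f * f
    # Inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det.
    return [(c * (x - mx) ** 2 - 2 * b * (x - mx) * (y - my)
             + a * (y - my) ** 2) / det for x, y in vectors]


def spatial_outliers(neighbour_fn, alpha=0.025):
    """Flag stations whose (value - neighbour mean) vector is robustly extreme."""
    diffs = []
    for s in STATIONS:
        nbr_mean = mean([VALUES[t] for t in neighbour_fn(s)])
        diffs.append(tuple(v - m for v, m in zip(VALUES[s], nbr_mean)))
    d2 = dict(zip(STATIONS, mcd_sq_distances(diffs)))
    cutoff = -2.0 * math.log(alpha)  # upper-alpha chi-square quantile, 2 df
    return d2, sorted(s for s in STATIONS if d2[s] > cutoff)
```

For the invented data, `river_neighbours("S7")` returns S5 and S6, the two stations downstream of S7. In contrast, `k_nearest("S7", euclidean_distances("S7"))` picks S4 and S2, neither of which receives flow from S7. In `spatial_outliers(river_neighbours)` the planted outlier S7 gets by far the largest robust distance. With only eight stations, a clean station may also exceed the cutoff. Passing `lambda s: k_nearest(s, euclidean_distances(s))` instead gives the Euclidean-neighbour variant for comparison.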
Abstrak
A spatial outlier refers to an observation whose non-spatial attribute values differ significantly from those of its neighbours. Such observations can be detected in water quality data collected at stations within a river network. However, current methods for identifying spatial outliers that rely on distances measured between stations, such as the Euclidean distance, do not take the topology of the river network into account. In general, water quality levels downstream in a river network are influenced by the flow from upstream. Likewise, the water quality in one part of a river network may have only a slight influence on the water quality in a different part. A method is proposed for identifying spatial outliers in a river network that takes into account the effect of river flow connectivity in determining the neighbours of a station. Although a robust Mahalanobis distance estimator is used in both methods, the proposed method uses river flow distance instead of Euclidean distance. A simulation based on a synthetic river dataset shows that the proposed method performs better. As an illustration, the proposed method is applied to 2016 water quality data from Sg. Klang provided by the Department of Environment (Jabatan Alam Sekitar), Malaysia. The results of the study help identify several stations whose water quality is much better than that of nearby stations. This information is very useful to the authorities in planning water quality monitoring in the surrounding areas.
Kata kunci: River flow distance; Euclidean distance; water quality; multivariate estimator; spatial outlier
*Corresponding author; email: rossita@um.edu.my