Sains Malaysiana 50(7)(2021): 2059-2077
http://doi.org/10.17576/jsm-2021-5007-20
Digital Economy Tax Compliance Model in Malaysia using Machine Learning Approach
(Model Pematuhan Cukai Ekonomi Digital di Malaysia menggunakan Pendekatan Pembelajaran Mesin)
RAJA AZHAN SYAH RAJA WAHAB1* & AZURALIZA ABU BAKAR2
1Sub Section of Strategic Planning, Strategic Management and Information ICT, Department of Information Technology, Inland Revenue Board of Malaysia, 63000 Cyberjaya, Selangor Darul Ehsan, Malaysia
2Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor Darul Ehsan, Malaysia
Received: 10 June 2020/Accepted: 19 November 2020
ABSTRACT
Income tax compliance in the digital economy is a field still in its infancy. Limited income tax collection has pushed the Inland Revenue Board of Malaysia (IRBM) to develop a solution that improves tax compliance in the digital economy sector, so that taxpayers either report their income voluntarily or face firm action. The ability to diagnose a taxpayer's compliance helps the IRBM collect income tax effectively and generate revenue for the country. However, extracting the necessary knowledge from a large amount of data is challenging, which creates the need for a predictive model that detects taxpayers' compliance level. This paper proposes descriptive and predictive analytics models for predicting digital economy income tax compliance in Malaysia. We conduct descriptive analytics to explore the data and extract a summary for initial understanding. In the descriptive model, histograms of the data distribution show that the extracted information gives a clear picture of what influences the classification of digital economy tax compliance. In predictive modeling, single and ensemble approaches are employed to find the best model and the important factors contributing to tax non-compliance among digital economy retailers. Seven single classifier algorithms are first validated on the training data; performance is then improved through three ensemble classification methods, namely wrapper, boosting, and voting, and two parameter-tuning techniques, grid search and evolutionary parameter optimization. The experimental results show that the ensemble method improves on the single classification models, reaching the highest classification accuracy of 87.94%, above that of the best single classification model. The knowledge analysis phase learns meaningful features and hidden knowledge, and categorizes the taxpayer contexts that could potentially influence the degree of tax compliance in the digital economy. Overall, this collection of information has the potential to help stakeholders make future decisions on tax compliance in the digital economy.
Keywords: Accuracy; compliance; ensemble; parameter tuning; single classification; taxpayer
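The voting ensemble mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the three rule-based classifiers, the feature names (late_filings, income_gap, years_registered), the thresholds, and the eight records are all hypothetical, chosen so that the base classifiers err on different records. It only shows how hard (majority) voting can exceed the accuracy of each single classifier when their errors do not overlap.

```python
from collections import Counter

# Hypothetical toy records, NOT data from the paper.
# Features: (late_filings, income_gap, years_registered)
# Label: 1 = non-compliant, 0 = compliant
RECORDS = [
    ((3, 0.5, 1), 1),
    ((0, 0.1, 5), 0),
    ((2, 0.4, 4), 1),
    ((0, 0.4, 6), 0),
    ((1, 0.1, 2), 0),
    ((4, 0.2, 2), 1),
    ((2, 0.0, 7), 0),
    ((1, 0.6, 1), 1),
]


# Three deliberately simple "single classifiers" (hypothetical rules).
def rule_late_filings(x):
    """Flag taxpayers with two or more late filings."""
    return 1 if x[0] >= 2 else 0


def rule_income_gap(x):
    """Flag taxpayers whose assumed income gap exceeds 0.3."""
    return 1 if x[1] > 0.3 else 0


def rule_new_registrant(x):
    """Flag taxpayers registered for fewer than three years."""
    return 1 if x[2] < 3 else 0


CLASSIFIERS = [rule_late_filings, rule_income_gap, rule_new_registrant]


def majority_vote(classifiers, x):
    """Hard voting: return the label predicted by most base classifiers."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


def accuracy(predict, records):
    """Fraction of records whose predicted label equals the true label."""
    correct = sum(predict(x) == label for x, label in records)
    return correct / len(records)


# Accuracy of each single classifier and of the voting ensemble.
single_acc = {clf.__name__: accuracy(clf, RECORDS) for clf in CLASSIFIERS}
ensemble_acc = accuracy(lambda x: majority_vote(CLASSIFIERS, x), RECORDS)

for name, acc in single_acc.items():
    print(f"{name}: {acc:.2f}")
print(f"majority vote: {ensemble_acc:.2f}")
```

On this constructed data each rule misclassifies two different records, so the majority vote corrects every individual error. With real classifiers, errors are partly correlated and the gain from voting is typically much smaller.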
ABSTRAK
Bidang pematuhan cukai pendapatan ekonomi digital masih di peringkat awal. Pengumpulan cukai pendapatan kerajaan yang terhad telah memaksa Lembaga Hasil Dalam Negeri Malaysia (LHDNM) untuk mengembangkan penyelesaian untuk meningkatkan kepatuhan cukai sektor ekonomi digital sehingga pembayar cukai dapat melaporkan pendapatan secara sukarela atau tindakan tegas dapat diambil. Keupayaan untuk mendiagnosis kepatuhan pembayar cukai akan memastikan LHDNM memungut cukai pendapatan dengan berkesan dan memberi pendapatan kepada negara. Namun, ini memberikan cabaran dalam mengekstrak pengetahuan yang diperlukan dari sejumlah besar data, yang menyebabkan perlunya model ramalan untuk mengesan tahap kepatuhan pembayar cukai. Makalah ini mencadangkan model analisis deskriptif dan ramalan untuk meramalkan pematuhan cukai pendapatan ekonomi digital di Malaysia. Analisis deskriptif dijalankan untuk meneroka dan mengekstrak ringkasan data untuk pemahaman awal. Melalui penerangan ringkas model deskriptif, taburan data histogram menunjukkan bahawa maklumat yang diekstrak dapat memberikan gambaran yang jelas dalam mempengaruhi hasil untuk mengelaskan pematuhan cukai ekonomi digital. Dalam pemodelan ramalan, pendekatan tunggal dan bergabung digunakan untuk mencari model terbaik dan faktor penting yang menyumbang kepada ketidakpatuhan pembayaran cukai di kalangan peruncit ekonomi digital. Berdasarkan pengesahan data latihan dengan adanya tujuh algoritma pengelasan tunggal, tiga peningkatan prestasi telah dibuat melalui pengelasan bergabung, iaitu kaedah pembalut, pemeringkatan dan undian, dan dua teknik yang melibatkan parameter pencarian dan evolusi grid. Hasil uji kaji menunjukkan bahawa kaedah bergabung dapat meningkatkan ketepatan model pengelasan tunggal dengan ketepatan tertinggi iaitu 87.94% berbanding dengan model pengelasan tunggal terbaik. Fasa analisis pengetahuan mempelajari ciri-ciri yang bermakna dan pengetahuan tersembunyi yang dapat mengelaskan konteks pembayar cukai yang berpotensi mempengaruhi tahap pematuhan cukai dalam ekonomi digital dikategorikan. Secara keseluruhan, pengumpulan maklumat ini berpotensi untuk membantu pihak berkepentingan membuat keputusan pada masa depan mengenai pematuhan cukai ekonomi digital.
Kata kunci: Ketepatan; model bergabung; pematuhan; pembayar cukai; pengelasan tunggal
*Corresponding author; email: rajazhan@hasil.gov.my