Sains Malaysiana 50(7)(2021): 2059-2077

http://doi.org/10.17576/jsm-2021-5007-20

 

Digital Economy Tax Compliance Model in Malaysia using Machine Learning Approach

(Model Pematuhan Cukai Ekonomi Digital di Malaysia menggunakan Pendekatan Pembelajaran Mesin)

 

RAJA AZHAN SYAH RAJA WAHAB1* & AZURALIZA ABU BAKAR2

 

1Sub Section of Strategic Planning, Strategic Management and Information ICT, Department of Information Technology, Inland Revenue Board of Malaysia, 63000 Cyberjaya, Selangor Darul Ehsan, Malaysia

 

2Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor Darul Ehsan, Malaysia

 

Received: 10 June 2020/Accepted: 19 November 2020

 

ABSTRACT

Research on income tax compliance in the digital economy is still in its infancy. Limited income tax collection has pushed the Inland Revenue Board of Malaysia (IRBM) to develop a solution that improves tax compliance in the digital economy sector, so that taxpayers report their income voluntarily or firm action can be taken. The ability to diagnose a taxpayer's compliance helps the IRBM collect income tax effectively and generate revenue for the country. However, extracting the necessary knowledge from a large amount of data is challenging, which creates the need for a predictive model that detects taxpayers' compliance levels. This paper proposes descriptive and predictive analytics models for predicting digital economy income tax compliance in Malaysia. We conduct descriptive analytics to explore the data and extract a summary for initial understanding. In the descriptive model, histograms of the data distribution show that the extracted information gives a clear picture of what influences the classification of digital economy tax compliance. In predictive modeling, single and ensemble approaches are employed to find the best model and the important factors contributing to tax non-compliance among digital economy retailers. Building on validation over the training data with seven single-classifier algorithms, three performance improvements are established through ensemble classification, namely the wrapper, boosting, and voting methods, together with two parameter-tuning techniques based on grid search and evolutionary parameters. The experimental results show that the ensemble method improves on the accuracy of the single classification models, reaching the highest classification accuracy of 87.94% compared with the best single classification model.
The knowledge analysis phase identifies and categorizes meaningful features and hidden knowledge that characterize taxpayer contexts and could potentially influence the degree of tax compliance in the digital economy. Overall, this body of knowledge has the potential to help stakeholders make future decisions on tax compliance in the digital economy.
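
As a minimal illustration of the voting idea mentioned in the abstract, the following Python sketch combines three single classifiers by majority vote. It is not the authors' implementation: the records are hand-made toy data, the field names (`sales`, `filed`, `years`) and the three threshold rules are hypothetical, and the data are constructed so that the classifiers' errors do not overlap, which is the situation in which voting helps most.

```python
from collections import Counter

# Toy, hand-made records (NOT the IRBM data). All field names are hypothetical.
# label: 1 = compliant, 0 = non-compliant
RECORDS = [
    {"sales": 50,  "filed": 1, "years": 5, "label": 1},
    {"sales": 150, "filed": 1, "years": 4, "label": 1},
    {"sales": 80,  "filed": 0, "years": 6, "label": 1},
    {"sales": 60,  "filed": 1, "years": 1, "label": 1},
    {"sales": 200, "filed": 0, "years": 1, "label": 0},
    {"sales": 300, "filed": 1, "years": 2, "label": 0},
    {"sales": 90,  "filed": 0, "years": 2, "label": 0},
    {"sales": 120, "filed": 0, "years": 7, "label": 0},
]


# Three hypothetical single classifiers, each a one-feature threshold rule.
def by_filing(record):
    """Predict compliant if a return was filed last year."""
    return 1 if record["filed"] == 1 else 0


def by_years(record):
    """Predict compliant if the business is at least three years old."""
    return 1 if record["years"] >= 3 else 0


def by_sales(record):
    """Predict compliant if annual sales are below 100 (arbitrary units)."""
    return 1 if record["sales"] < 100 else 0


SINGLE_CLASSIFIERS = [by_filing, by_years, by_sales]


def majority_vote(record, classifiers=SINGLE_CLASSIFIERS):
    """Hard voting: return the label predicted by most classifiers."""
    votes = Counter(clf(record) for clf in classifiers)
    # With three voters and two labels a tie is impossible.
    return votes.most_common(1)[0][0]


def accuracy(predict, records=RECORDS):
    """Fraction of records whose predicted label matches the true label."""
    hits = sum(predict(r) == r["label"] for r in records)
    return hits / len(records)


if __name__ == "__main__":
    for clf in SINGLE_CLASSIFIERS:
        print(clf.__name__, accuracy(clf))
    print("majority_vote", accuracy(majority_vote))
```

On this toy data each single rule is correct on 6 of 8 records, while the vote is correct on all 8. That gap is a property of the constructed data and says nothing about the accuracies reported in the paper.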

 

Keywords: Accuracy; compliance; ensemble; parameter tuning; single classification; taxpayer

 

ABSTRAK

Bidang pematuhan cukai pendapatan ekonomi digital masih di peringkat awal. Pengumpulan cukai pendapatan kerajaan yang terhad telah memaksa Lembaga Hasil Dalam Negeri Malaysia (LHDNM) untuk mengembangkan penyelesaian untuk meningkatkan kepatuhan cukai sektor ekonomi digital sehingga pembayar cukai dapat melaporkan pendapatan secara sukarela atau tindakan tegas dapat diambil. Keupayaan untuk mendiagnosis kepatuhan pembayar cukai akan memastikan LHDNM memungut cukai pendapatan dengan berkesan dan memberi pendapatan kepada negara. Namun, ini memberikan cabaran dalam mengekstrak pengetahuan yang diperlukan dari sejumlah besar data, yang menyebabkan perlunya model ramalan untuk mengesan tahap kepatuhan pembayar cukai. Makalah ini mencadangkan model analisis deskriptif dan ramalan untuk meramalkan pematuhan cukai pendapatan ekonomi digital di Malaysia. Analisis deskriptif dijalankan untuk meneroka dan mengekstrak ringkasan data untuk pemahaman awal. Melalui penerangan ringkas model deskriptif, taburan data histogram menunjukkan bahawa maklumat yang diekstrak dapat memberikan gambaran yang jelas dalam mempengaruhi hasil untuk mengelaskan pematuhan cukai ekonomi digital. Dalam pemodelan ramalan, pendekatan tunggal dan bergabung digunakan untuk mencari model terbaik dan faktor penting yang menyumbang kepada ketidakpatuhan pembayaran cukai di kalangan peruncit ekonomi digital. Berdasarkan pengesahan data latihan dengan adanya tujuh algoritma pengelasan tunggal, tiga peningkatan prestasi telah dibuat melalui pengelasan bergabung, iaitu kaedah pembalut, pemeringkatan dan undian, dan dua teknik yang melibatkan parameter pencarian dan evolusi grid.  Hasil uji kaji menunjukkan bahawa kaedah bergabung dapat meningkatkan ketepatan model pengelasan tunggal dengan ketepatan tertinggi iaitu 87.94% berbanding dengan model pengelasan tunggal terbaik. 
Fasa analisis pengetahuan mempelajari ciri-ciri yang bermakna dan pengetahuan tersembunyi yang dapat mengelaskan konteks pembayar cukai yang berpotensi mempengaruhi tahap pematuhan cukai dalam ekonomi digital dikategorikan. Secara keseluruhan, pengumpulan maklumat ini berpotensi untuk membantu pihak berkepentingan membuat keputusan pada masa depan mengenai pematuhan cukai ekonomi digital.

 

Kata kunci: Ketepatan; model bergabung; pematuhan; pembayar cukai; pengelasan tunggal

 


 

*Corresponding author; email: rajazhan@hasil.gov.my

 

 

 
