Implementation of Naïve Bayes Algorithm in Sentiment Analysis of Twitter Social Media Users Regarding Their Interest to Pay the Tax

  • Bagas Wahyu Andrian Universitas Multimedia Nusantara
  • Fenina Adline Twince Tobing Universitas Multimedia Nusantara
  • Ivransa Zuhdi Pane Universitas Multimedia Nusantara
  • Adhi Kusnadi Universitas Multimedia Nusantara
Keywords: Naïve Bayes, Sentiment Analysis, Tax Payment, Twitter

Abstract

Every year from 2008 onward, tax revenue fell short of the target set in the State Budget, until 2021, when revenue reached the target set in the 2021 State Budget. While tax revenue was improving, towards the end of February 2023 a case involving the son of a Directorate General of Taxes (DGT) official led to the father being summoned by the Corruption Eradication Commission (CEC) to explain his assets. After the case, there were many public calls to stop paying taxes, which Tauhid Ahmad, Executive Director of Indef, assessed as a sign of declining trust in tax-collecting institutions. This can affect the amount of tax revenue, because trust in the government is one of the factors that tend to affect public compliance in paying taxes. One widely discussed call is the tax boycott movement, which drew both support and opposition on Twitter. Because the debate over this movement on Twitter can affect tax revenue, an assessment based on sentiment analysis is needed, with tweets classified as positive, neutral, or negative. Sentiment analysis in this research is carried out using three variants of Naïve Bayes with TF-IDF word weighting: Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes. A confusion matrix is then used to evaluate each model by obtaining the accuracy, precision, recall, and f1-score values, and the Synthetic Minority Oversampling Technique (SMOTE) is used to handle the imbalanced data.
On the imbalanced data, Bernoulli Naïve Bayes with SMOTE and an 80:20 dataset split performed better than the Gaussian and Multinomial Naïve Bayes variants, with an accuracy of 91.03%, precision of 71.11%, recall of 71.43%, and f1-score of 71.18%.
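
As a rough illustration of the evaluation step, the sketch below derives accuracy, precision, recall, and f1-score from a three-class confusion matrix. The counts in `example_cm` are hypothetical and are not the study's data; the function name `macro_metrics` and the choice of macro averaging (an unweighted mean over the three classes) are assumptions, since the abstract does not state which averaging scheme was used.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, recall and F1.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j (rows = true labels, columns = predictions).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical counts for (positive, neutral, negative); NOT the paper's data.
example_cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 79],
]

if __name__ == "__main__":
    for name, value in macro_metrics(example_cm).items():
        print(f"{name}: {value:.4f}")
```

With a dominant class, as in this made-up matrix, accuracy can sit well above the macro-averaged precision, recall, and f1-score, because the smaller classes count equally in the macro averages.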

Published
2023-11-30
How to Cite
Wahyu Andrian, B., Adline Twince Tobing, F., Zuhdi Pane, I., & Kusnadi, A. (2023). Implementation of Naïve Bayes Algorithm in Sentiment Analysis of Twitter Social Media Users Regarding Their Interest to Pay the Tax. International Journal of Science, Technology & Management, 4(6), 1733-1742. https://doi.org/10.46729/ijstm.v4i6.1015
