Fraud Detection in Credit Card Transactions Using HDBSCAN, UMAP and SMOTE Methods
Abstract
Credit card misuse and fraud pose a serious threat to financial companies and consumers, so accurate and effective fraud detection is essential. In this study, we propose an approach that combines HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise), UMAP (Uniform Manifold Approximation and Projection), and SMOTE (Synthetic Minority Over-sampling Technique) to detect fraud in credit card transactions. HDBSCAN groups transactions according to their density in feature space, which allows suspicious groups of transactions to be identified. UMAP reduces the dimensionality of the transaction data, enabling better visualization and more efficient analysis. SMOTE addresses class imbalance, that is, the large difference between the numbers of fraudulent and non-fraudulent transactions. Our experiments use a dataset of credit card transactions that includes both fraudulent and non-fraudulent transactions. The results show that the proposed approach detects fraud with high accuracy: HDBSCAN effectively identifies suspicious groups of transactions, UMAP aids understanding and visualization of the data, and SMOTE successfully addresses the class imbalance, yielding more balanced detection results across the fraud and non-fraud classes. These results indicate that combining HDBSCAN, UMAP, and SMOTE is effective for detecting fraud in credit card transactions, and that the approach can help financial companies identify suspicious transactions with high accuracy, reduce fraud losses, and improve the security of credit card transactions.
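Of the three components, SMOTE is simple enough to sketch directly. The code below is a minimal NumPy sketch of the SMOTE interpolation idea (Chawla et al., 2002), not the authors' implementation: the function names `smote` and `balance_with_smote`, the default `k=5`, and the choice to oversample until both classes have equal size are illustrative assumptions, since the abstract does not report these settings. Like imbalanced-learn's `SMOTE`, it picks base samples at random rather than cycling through every minority sample in turn.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, rng=None):
    """Create n_synthetic points by interpolating between minority samples
    and randomly chosen members of their k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # Dense pairwise Euclidean distances within the minority class only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    # For each synthetic point: a random base sample and one of its k neighbours.
    base = rng.integers(0, n, size=n_synthetic)
    nb = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))  # uniform in [0, 1)
    # New point lies on the segment from the base sample towards its neighbour.
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def balance_with_smote(X, y, minority_label=1, k=5, rng=None):
    """Append synthetic minority samples until both classes are equally large
    (hypothetical helper; assumes a binary fraud / non-fraud label)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_needed = int(np.sum(y != minority_label) - len(X_min))
    if n_needed <= 0:
        return X, y
    X_new = smote(X_min, n_needed, k=k, rng=rng)
    y_new = np.full(n_needed, minority_label, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


if __name__ == "__main__":
    # Toy data: 950 "legitimate" (0) and 50 "fraud" (1) transactions, 4 features.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (950, 4)), rng.normal(3, 1, (50, 4))])
    y = np.array([0] * 950 + [1] * 50)
    Xb, yb = balance_with_smote(X, y, rng=1)
    print(np.bincount(yb))  # [950 950]
```

Each synthetic point lies on the segment between a fraud sample and one of its nearest fraud neighbours, so it stays in the region the minority class already occupies. In practice the oversampling would be applied to the training split only, so that synthetic transactions do not leak into the evaluation data. The dense distance matrix grows quadratically with the number of minority samples, which is acceptable when fraud cases are rare.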
Copyright (c) 2023 International Journal of Science, Technology & Management
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.