Customer Transaction Clustering with K-Prototype Algorithm Using Euclidean-Hamming Distance and Elbow Method

Dendy Arizki Kuswardana; Dwi Arman Prasetya; Trimono Trimono; I Gede Susrama Mas Diyasa; Wan Suryani Wan Awang

doi:10.59395/ijadis.v6i2.1381

Authors

Dendy Arizki Kuswardana Department of Data Science, University of Pembangunan Nasional Veteran Jawa Timur, Indonesia
Dwi Arman Prasetya Department Master of Information Technology, University of Pembangunan Nasional Veteran Jawa Timur, Indonesia https://orcid.org/0000-0003-0281-9928
Trimono Trimono Department of Data Science, University of Pembangunan Nasional Veteran Jawa Timur, Indonesia
I Gede Susrama Mas Diyasa Department Master of Information Technology, University of Pembangunan Nasional Veteran Jawa Timur, Indonesia
Wan Suryani Wan Awang Faculty of Informatics and Computing, University Sultan Zainal Abidin Besut Campus, Malaysia https://orcid.org/0000-0001-7662-431X

Keywords:

K-Prototype, Euclidean, Hamming, Elbow, Clustering, Customer Transaction Clustering, Euclidean-Hamming Distance, Elbow Method

Abstract

This study clusters customer transactions from a Japanese food stall using the K-Prototype algorithm with a combined Euclidean-Hamming distance and the Elbow method. Given intense industry competition, the goal is to understand customer purchasing behavior in order to increase loyalty and sales. From 9,721 initial entries, 9,705 cleaned and transformed records were analyzed. K-Prototype was chosen because it handles both numeric features (Total Sales, Product Quantity) and categorical features (Payment Method, Order Type, Day Category, and Time Category). A combined Euclidean-Hamming distance was used as the dissimilarity measure. The number of clusters was determined using the Elbow method, which indicated three clusters as optimal. A Silhouette score of 0.6191 indicates a good cluster structure, identifying three distinct customer groups: "Loyal Regulars" (49.5%), "Casual Shoppers" (42.3%), and "Premium Shoppers" (8.2%). Statistical validity was also tested using ANOVA and Chi-Square; the results showed significant differences between the clusters in both the numerical and the categorical variables (p-value < 0.0001), so the clusters are statistically valid in both respects. These insights characterize the customer base and reveal a strategically valuable cluster for targeted marketing.
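The approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common K-Prototype formulation in which the numeric part of the distance is the squared Euclidean distance and the categorical part is the Hamming (mismatch) count weighted by a factor `gamma`. The function names, the toy records, the `gamma` value, and the fixed initial prototypes are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def mixed_distance(x_num, x_cat, p_num, p_cat, gamma=1.0):
    """Squared Euclidean distance on numeric features plus
    gamma times the Hamming mismatch count on categorical features."""
    euclid = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    hamming = sum(a != b for a, b in zip(x_cat, p_cat))
    return euclid + gamma * hamming

def k_prototypes(nums, cats, init_idx, gamma=1.0, max_iter=20):
    """Tiny K-Prototype loop; init_idx picks the records used as
    initial prototypes (an assumption that keeps the sketch deterministic)."""
    protos = [(list(nums[i]), list(cats[i])) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest prototype.
        new_labels = [
            min(range(len(protos)),
                key=lambda j: mixed_distance(n, c, protos[j][0], protos[j][1], gamma))
            for n, c in zip(nums, cats)
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: numeric mean and categorical mode per cluster.
        for j in range(len(protos)):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                continue
            mean = [sum(nums[i][d] for i in members) / len(members)
                    for d in range(len(nums[0]))]
            mode = [Counter(cats[i][d] for i in members).most_common(1)[0][0]
                    for d in range(len(cats[0]))]
            protos[j] = (mean, mode)
    # Total within-cluster cost, the quantity plotted by the Elbow method.
    cost = sum(mixed_distance(nums[i], cats[i], *protos[labels[i]], gamma)
               for i in range(len(nums)))
    return labels, protos, cost

# Hypothetical, already-scaled toy records: (total sales, quantity)
nums = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
        (0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
# Hypothetical categorical values: (payment method, order type)
cats = [("cash", "dine-in"), ("cash", "dine-in"), ("qris", "dine-in"),
        ("qris", "takeaway"), ("qris", "takeaway"), ("cash", "takeaway")]

# Elbow method: compute the cost for several k and look for the bend.
init_order = [0, 3, 2]
cost_by_k = {}
labels_by_k = {}
for k in (1, 2, 3):
    labels_by_k[k], _, cost_by_k[k] = k_prototypes(nums, cats, init_order[:k])

for k, cost in cost_by_k.items():
    print(k, round(cost, 3))
```

On real data the numeric features would typically be standardized first and several random initializations compared, since the result depends on both the starting prototypes and the choice of `gamma`.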
