This study enhances the K-Means clustering algorithm to improve the analysis of earthquake occurrence patterns in the Philippines. Traditional K-Means, while effective, is sensitive to random centroid initialization and can converge slowly. To address these issues, we propose an improved K-Means algorithm with two modifications: it selects initial centroids from a distance-weighted probability distribution to improve clustering accuracy, and it processes data in small batches to reduce computation time, which improves scalability and convergence speed. Using earthquake data from the Philippine Institute of Volcanology and Seismology (PHIVOLCS), we evaluate the enhanced algorithm with the Silhouette Score and time complexity. The results show that the proposed modifications significantly improve clustering accuracy, computational efficiency, and scalability, leading to more precise identification of high-risk seismic areas. By providing a more accurate and efficient framework for seismic data analysis, this research contributes to disaster preparedness, risk mitigation, and informed decision-making in urban planning and disaster management.
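The two modifications named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the distance-weighted seeding follows the k-means++ style (each new centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen) and that the batch step uses a per-centroid learning rate of 1/count. The function names, parameters, and coordinates are hypothetical, and the data is synthetic rather than taken from PHIVOLCS.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def seed_centroids(points, k, rng):
    """Distance-weighted seeding (assumed k-means++ style).

    Assumes `points` holds at least k distinct points; otherwise all
    weights can become zero and rng.choices raises ValueError.
    """
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight each point by its squared distance to the nearest chosen centroid.
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return [list(c) for c in centroids]


def mini_batch_kmeans(points, k, batch_size=32, n_iter=100, seed=0):
    """Mini-batch K-Means sketch: each iteration looks at one small random batch."""
    rng = random.Random(seed)
    centroids = seed_centroids(points, k, rng)
    counts = [0] * k  # number of points each centroid has absorbed so far
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, then update the centroids.
        labels = [nearest(p, centroids) for p in batch]
        for p, j in zip(batch, labels):
            counts[j] += 1
            eta = 1.0 / counts[j]  # step size shrinks as the centroid sees more points
            centroids[j] = [(1 - eta) * c + eta * x for c, x in zip(centroids[j], p)]
    return centroids


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic (longitude, latitude) pairs for illustration only.
    blob_a = [(rng.gauss(121.0, 0.2), rng.gauss(14.6, 0.2)) for _ in range(200)]
    blob_b = [(rng.gauss(125.5, 0.2), rng.gauss(7.1, 0.2)) for _ in range(200)]
    for c in mini_batch_kmeans(blob_a + blob_b, k=2):
        print([round(v, 2) for v in c])
```

In this sketch each iteration touches only `batch_size` points instead of the full dataset, which is where the reduction in computation time would come from. The seeding step makes it unlikely that two initial centroids start in the same dense region.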
K-Means algorithm, mini-batch processing, disaster preparedness, seismic data analysis
Sean Marie Bayono. Corresponding author. Undergraduate student. Department of Computer Science. College of Information Systems and Technology Management - Pamantasan ng Lungsod ng Maynila. Email: smbbayono2021@plm.edu.ph
Ronanne Jcher Bulaon. Undergraduate student. Department of Computer Science. College of Information Systems and Technology Management - Pamantasan ng Lungsod ng Maynila
Richard C. Regala. Bachelor’s Degree in Information Communication Technology. Pamantasan ng Lungsod ng Maynila. Computer Laboratory Administrator
Vivien A. Agustin. Master in Information Technology. College of Information Systems and Technology Management - Pamantasan ng Lungsod ng Maynila. Associate Dean/Assistant Professor III
Khatalyn E. Mata. Doctor in Information Technology. Dean - College of Information Systems and Technology Management, Pamantasan ng Lungsod ng Maynila.
No potential conflict of interest was reported by the author(s).
This work was not supported by any funding.
The authors declare the use of Artificial Intelligence (AI) in writing this paper. In particular, the authors used ChatGPT to identify relevant literature and refine the content structure. The authors take full responsibility for ensuring that the research idea, analysis, and interpretations are original work.
This paper was presented at the 2nd International Student Research Congress (ISRC) 2025.
Cite this article:
Bayono, S.M., Bulaon, R.J., Regala, R.C., Agustin, V.A. & Mata, K.E. (2025). Enhancement of K-Means algorithm for analyzing earthquake occurrence pattern in the Philippines. International Student Research Review, 2(1), 1-18. https://doi.org/10.53378/isrr.163
License:
This work is licensed under a Creative Commons Attribution (CC BY 4.0) International License.