Enhancement of K-Means algorithm for analyzing earthquake occurrence pattern in the Philippines
Sean Marie Bayono, Ronanne Jcher Bulaon, Richard C. Regala, Vivien A. Agustin & Khatalyn E. Mata
Abstract
This study enhances the K-Means clustering algorithm to improve the analysis of earthquake occurrence patterns in the Philippines. Traditional K-Means, while effective, is limited by its random centroid initialization and slow convergence. To address these issues, we propose an improved K-Means algorithm that selects initial centroids from a distance-weighted probability distribution to improve accuracy and processes the data in smaller batches to reduce computation time, thereby improving scalability and convergence speed. Using earthquake data from the Philippine Institute of Volcanology and Seismology (PHIVOLCS), we evaluate the enhanced algorithm with metrics such as silhouette score and time complexity. Results show that the proposed modifications improve clustering accuracy, computational efficiency, and scalability, leading to more precise identification of high-risk seismic areas. By providing a more accurate and efficient framework for seismic data analysis, this research contributes to disaster preparedness, risk mitigation, and informed decision-making in urban planning and disaster management.
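The two modifications described in the abstract can be illustrated with a minimal Python sketch. It assumes that the distance-weighted seeding follows the k-means++ scheme (each new centroid is sampled with probability proportional to its squared distance from the nearest centroid already chosen) and that the batch update uses a per-centroid learning rate, as in standard mini-batch K-Means. The abstract does not specify either detail, so this should not be read as the authors' implementation. The function names, parameter defaults, and the synthetic coordinates are hypothetical, and the sample points are not PHIVOLCS data.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seed_centroids(points, k, rng):
    """Pick k initial centroids with distance-weighted (D^2) sampling."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Weight each point by its squared distance to the nearest chosen centroid.
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a centroid: fall back to uniform sampling.
            centroids.append(list(rng.choice(points)))
            continue
        centroids.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centroids


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def mini_batch_kmeans(points, k, batch_size=32, n_iter=100, seed=0):
    """Mini-batch K-Means with distance-weighted seeding (illustrative only)."""
    rng = random.Random(seed)
    centroids = seed_centroids(points, k, rng)
    counts = [0] * k  # number of points each centroid has absorbed so far
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, then update the centroids.
        assigned = [nearest(p, centroids) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-centroid learning rate shrinks over time
            centroids[j] = [(1 - eta) * c + eta * x for c, x in zip(centroids[j], p)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Synthetic (latitude, longitude) pairs for demonstration; not real earthquake records.
demo_rng = random.Random(1)
data = [(demo_rng.gauss(8.0, 0.2), demo_rng.gauss(126.0, 0.2)) for _ in range(200)]
data += [(demo_rng.gauss(16.0, 0.2), demo_rng.gauss(120.5, 0.2)) for _ in range(200)]
centroids, labels = mini_batch_kmeans(data, k=2)
```

Because each iteration touches only `batch_size` points rather than the full dataset, the cost per iteration stays fixed as the catalogue grows, which is the scalability argument the abstract makes. Cluster quality on the resulting labels could then be assessed with the silhouette score, for example via scikit-learn's `silhouette_score`.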
Keywords
K-Means algorithm, mini-batch processing, disaster preparedness, seismic data analysis
Author information & Contribution
Sean Marie Bayono. Corresponding author. Undergraduate student. Department of Computer Science. College of Information Systems and Technology Management - Pamantasan ng Lungsod ng Maynila. Email: smbbayono2021@plm.edu.ph
Ronanne Jcher Bulaon. Undergraduate student. Department of Computer Science. College of Information Systems and Technology Management - Pamantasan ng Lungsod ng Maynila
Richard C. Regala. Bachelor’s Degree in Information Communication Technology. Pamantasan ng Lungsod ng Maynila. Computer Laboratory Administrator
Vivien A. Agustin. Master in Information Technology. College of Information Systems and Technology Management - Pamantasan ng Lungsod ng Maynila. Associate Dean/Assistant Professor III
Khatalyn E. Mata. Doctor in Information Technology. Dean - College of Information Systems and Technology Management, Pamantasan ng Lungsod ng Maynila.
Author 1 primarily handled the implementation and development of the system and contributed to writing and editing the manuscript. Author 2 was responsible for data acquisition and contributed to drafting and revising the manuscript. Authors 3 and 4, as thesis advisers, provided critical feedback on the study’s validity, structure, and overall quality, including thorough review of formatting and content. Author 5, as the thesis coordinator, supervised the alignment of the manuscript with institutional requirements and provided guidance throughout the writing process. All authors reviewed and approved the final version of the manuscript and agreed to be accountable for all aspects of the work.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Funding
This work was not supported by any funding.
AI Declaration
The authors declare the use of Artificial Intelligence (AI) in writing this paper. In particular, the authors used ChatGPT to identify relevant literature and refine the content structure. The authors take full responsibility for ensuring that the research idea, analysis, and interpretations are original work.
Notes
This paper was presented at the 2nd International Student Research Congress (ISRC) 2025.
Cite this article:
Bayono, S.M., Bulaon, R.J., Regala, R.C., Agustin, V.A. & Mata, K.E. (2025). Enhancement of K-Means algorithm for analyzing earthquake occurrence pattern in the Philippines. International Student Research Review, 2(1), 1-18. https://doi.org/10.53378/isrr.163
License:
This work is licensed under a Creative Commons Attribution (CC BY 4.0) International License.