Developments in the industrial world, especially in the field of computer technology, demand solutions to needs ranging from computing resources and storage media to communication speeds. These demands have prompted many new studies in the field, including studies related to clustering. Clustering is an exploratory data analysis tool that deals with the task of grouping objects that are similar to each other. Clustering large data requires computers with high resources, which are usually expensive, while computers with more modest specifications are less reliable in handling such large data. Clustering that runs on a single-core processor takes a long time to execute, so parallel computing is needed to speed up computation, especially for Automatic Clustering. This research aims to achieve faster performance in grouping large data by combining parallel computing with the Automatic Clustering method. This technology allows data processing to be carried out in parallel and distributed across hundreds or even thousands of computers, which makes it well suited to processing very large amounts of data.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.