Laila Qadrini

Abstract

Classification is a central part of data mining: it studies a set of data in order to generate rules that can classify or recognize new data that has not been seen before. Classification can be defined as the process of assigning a data object to one of a set of predefined categories (classes). Many classification techniques have been proposed by experts, and they can be grouped into two categories: global classification techniques, which take all of the training data into account, and local classification techniques, which take only part of the training data into account. Each technique has advantages and disadvantages. A problem often encountered in classification is data imbalance. Unbalanced data is a condition in which the distribution of data classes is uneven: one class has fewer or more instances than the other classes. The class with fewer instances is called the minority class, and the other class is called the majority class. Classifying data with unbalanced classes is a major problem in the field of data mining. This study applies AdaBoost combined with SMOTE (Synthetic Minority Over-sampling Technique) to unbalanced data. The AUC value of the AdaBoost SMOTE model was 0.784, while the AUC value of the AdaBoost model was 0.664.
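As an illustration of the oversampling idea behind SMOTE, the sketch below generates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. It is a minimal standard-library sketch, not the implementation used in the study: the function name `smote`, its parameters, and the toy data are assumptions made for the example, and the AdaBoost training step is not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length numeric tuples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(
            tuple(b + gap * (o - b) for b, o in zip(base, other))
        )
    return synthetic


# Toy unbalanced data (assumed values): 8 majority points, 3 minority points.
majority = [(5.0, 5.0), (6.0, 5.5), (5.5, 6.5), (6.5, 6.0),
            (7.0, 5.0), (6.0, 7.0), (7.5, 6.5), (5.0, 7.0)]
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)]

# Add enough synthetic minority points to match the majority class size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

After this step both classes have the same number of instances, and a classifier such as AdaBoost would be trained on the balanced set. Because each synthetic point lies on a segment between two existing minority points, the new data stays inside the region the minority class already occupies.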

Article Details

How to Cite
Qadrini, Laila (2022) “Handling Unbalanced Data With Smote Adaboost”, Jurnal Mantik, 6(2), pp. 2332-2336. doi: 10.35335/mantik.v6i2.2597.