Modification of random forest method to predict student graduation data
Abstract
A school's graduation rate is an important measure of its success, because it reflects how well the school helps students complete their education. Predicting student completion lets schools identify at-risk students and offer early interventions to improve their academic performance. It can also help policymakers develop policies and programs that raise graduation rates and reduce dropout rates. The dataset used in this study was obtained from Kaggle, and the best model proposed uses the Random Forest method with hyperparameter tuning. Setting the n_estimator parameter (the number of trees in the forest) to 1000 decreases the mean squared error (MSE) from 0.5525155 to 0.5374983 and increases the R² score from 0.9984039 to 0.9984873. The study also evaluates the proposed model on other datasets sourced from the University of California, Irvine (UCI), where it performs better in every experiment: MSE decreases and R² increases on all datasets.
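To put the reported figures in context, the following is a minimal sketch of how MSE and R² are conventionally defined, together with the size of the reported improvement. The helper names `mse` and `r2` and the toy data are illustrative assumptions and are not taken from the study; only the four reported metric values come from the abstract.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Values reported in the abstract (before and after setting the tree count to 1000).
mse_before, mse_after = 0.5525155, 0.5374983
r2_before, r2_after = 0.9984039, 0.9984873

# Relative reduction in MSE and absolute gain in R².
mse_relative_drop = (mse_before - mse_after) / mse_before
r2_gain = r2_after - r2_before

# Toy data (hypothetical, for illustration only).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
toy_mse = mse(y_true, y_pred)
toy_r2 = r2(y_true, y_pred)

print(f"Relative MSE reduction: {mse_relative_drop:.2%}")
print(f"Absolute R² gain: {r2_gain:.7f}")
print(f"Toy MSE: {toy_mse}, toy R²: {toy_r2}")
```

Under these definitions, the reported change corresponds to a relative MSE reduction of roughly 2.7% and an absolute R² gain of about 0.00008.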

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.