David Kurniawan
Ega Budiman
Muhammad Fadli
Erliyan Redy Susanto

Abstract

Lung cancer remains the leading cause of cancer mortality worldwide, and early, precise diagnosis improves patient outcomes. High-dimensional datasets in medical diagnostics complicate classification because redundant and irrelevant features reduce model accuracy and increase computational cost. This research investigates how feature selection affects the performance of lung cancer classification models. The study evaluates Random Forest (RF) and XGBoost as classifiers and uses a Genetic Algorithm (GA) for feature selection to improve model efficiency. The GA ran for 50 generations and converged at the 40th generation, indicating that the selected feature subset had stabilized. With GA-based feature selection, Random Forest outperformed XGBoost on several metrics, including accuracy, precision, recall, F1-score, and AUC-ROC, which suggests that RF makes more effective use of the optimized feature subset and generalizes better. The contribution of this research is its comparison of how feature selection affects RF and XGBoost for lung cancer classification under fixed model settings. The findings demonstrate the value of combining RF with GA-based feature selection, which offers potential for building efficient and interpretable lung cancer diagnostic models in medical AI.
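
The GA-based selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`ga_select`, `toy_fitness`), the population size, the crossover and mutation rates, and the toy fitness function are all assumptions made for illustration. Only the 50-generation budget comes from the abstract. In a setting like the study's, the fitness function would presumably be a cross-validated score of RF or XGBoost trained on the columns selected by the mask.

```python
import random

def ga_select(n_features, fitness, generations=50, pop_size=20,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Binary-mask GA: returns (best_mask, best_score, history).

    history[g] is the best fitness seen up to and including generation g.
    All hyperparameters except `generations` are illustrative assumptions.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best_mask, best_score, history = None, float("-inf"), []

    for _ in range(generations):
        scored = [(fitness(m), m) for m in pop]
        gen_score, gen_mask = max(scored, key=lambda t: t[0])
        if gen_score > best_score:
            best_score, best_mask = gen_score, gen_mask[:]
        history.append(best_score)

        def tournament():
            # Pick two individuals at random and keep the fitter one.
            a, b = rng.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        next_pop = [best_mask[:]]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation toggles individual features on or off.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop

    return best_mask, best_score, history

# Hypothetical stand-in for a classifier score: rewards three "informative"
# features and lightly penalizes every other selected feature.
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    if not any(mask):
        return 0.0  # an empty feature subset cannot train a model
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * (sum(mask) - hits)

mask, score, history = ga_select(n_features=10, fitness=toy_fitness)
# First generation (1-based) at which the final best score was reached.
converged_at = history.index(history[-1]) + 1
print("selected features:", [i for i, bit in enumerate(mask) if bit])
print("best score:", score, "| stable from generation", converged_at)
```

Tracking the best-so-far score per generation is one simple way to identify a convergence point such as the 40th generation reported above. The criterion the authors actually used is not stated in the abstract.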

Article Details

How to Cite
Kurniawan, D., Budiman, E., Fadli, M. and Susanto, E. R. (2025) “A Improving lung cancer classification with feature selection: a comparative study of random forest and xgboost”, Jurnal Mantik, 9(1), pp. 68-79. doi: 10.35335/mantik.v8i5.6319.