A Hyperparameter Tuning Using GridsearchCV on Random Forest for Malware Detection

Iik Muhamad Malik Matin

Abstract

Random forest is a popular machine learning algorithm for classification tasks. In malware detection, it can identify malware with good accuracy. However, its performance depends on the hyperparameters chosen, so a tuning process is required to improve the model. GridSearchCV is a hyperparameter tuning method that evaluates every combination in a user-specified grid of hyperparameter values using cross-validation. In this paper, we conduct experiments using GridSearchCV to tune the hyperparameters of a random forest for malware detection. The experimental results show that hyperparameter tuning improves the model's accuracy in identifying malware.
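
As an illustration of the tuning procedure the abstract describes, the following is a minimal sketch using scikit-learn's `GridSearchCV` with a `RandomForestClassifier`. It is not the paper's experimental setup: the synthetic data from `make_classification`, the values in `param_grid`, the 3-fold cross-validation, and the 75/25 split are all assumptions chosen only to keep the example small and runnable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic binary data stands in for a malware dataset
# (label 1 = malware, 0 = benign). The paper's data is not reproduced here.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search grid: 2 * 2 * 2 = 8 candidate combinations.
param_grid = {
    "n_estimators": [10, 30],
    "max_depth": [3, None],
    "min_samples_split": [2, 5],
}

# GridSearchCV fits every combination with k-fold cross-validation
# and keeps the one with the best mean validation accuracy.
search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X_train, y_train)

best_params = search.best_params_             # best combination found
cv_accuracy = search.best_score_              # mean cross-validated accuracy
test_accuracy = search.score(X_test, y_test)  # accuracy on held-out data

print("Best hyperparameters:", best_params)
print("Cross-validated accuracy:", round(cv_accuracy, 3))
print("Held-out test accuracy:", round(test_accuracy, 3))
```

Because the search is exhaustive, its cost grows with the product of the number of values per hyperparameter and the number of folds, so in practice the grid is usually kept small or refined in stages. Reporting accuracy on a held-out split, rather than only the cross-validated score, gives a less optimistic estimate of how the tuned model generalizes.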

How to Cite
Muhamad Malik Matin, I. (2023). A Hyperparameter Tuning Using GridsearchCV on Random Forest for Malware Detection. MULTINETICS, 9(1), 43–50. https://doi.org/10.32722/multinetics.v9i1.5578
