Comparative Analysis of Linear Regression, Decision Tree, and Gradient Boosting Models for Predicting Drug Corrosion Inhibition Efficiency Using QSAR Descriptors

Darnell Ignasius(1), Muhamad Akrom(2*), Setyo Budi(3)

(1) Universitas Dian Nuswantoro
(2) Universitas Dian Nuswantoro
(3) Universitas Dian Nuswantoro
(*) Corresponding Author

Abstract


Corrosion in industrial environments poses significant economic and safety challenges, necessitating the development of effective inhibitors. Organic compounds, particularly pharmaceuticals, have emerged as promising corrosion inhibitors due to their efficiency and environmental benefits. However, predicting the corrosion inhibition efficiency (CIE) of these compounds remains complex and requires advanced computational methods. This study compares the predictive performance of three machine learning (ML) models, namely linear regression, decision tree regression, and gradient boosting regression, using Quantitative Structure-Activity Relationship (QSAR) descriptors. A dataset containing 14 QSAR descriptors was compiled from experimental studies on various pharmaceutical-based inhibitors and divided into training (90%) and testing (10%) subsets to evaluate model performance. The research follows the CRISP-DM methodology, a systematic framework that includes data preparation, model training, and evaluation. Model accuracy was assessed with four metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²). Among the three models, gradient boosting regression achieved the best results, with the lowest MSE (21.52) and the highest R² (0.21), which suggests that it captures non-linear relationships in the data better than the other two models. Although an R² of 0.21 is modest and leaves most of the variance in CIE unexplained, the result demonstrates the potential for improving computational approaches to corrosion inhibition prediction. This study highlights the value of machine learning in optimizing the selection of corrosion inhibitors, potentially reducing the reliance on extensive laboratory testing and accelerating the discovery of efficient, eco-friendly solutions for industrial applications.
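The four evaluation metrics named in the abstract can be computed as in the minimal sketch below. It is an illustration only, not the authors' code: the function name regression_metrics and the sample values are assumptions made for the example, and the sample values are not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R² for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n       # mean of squared errors
    mae = sum(abs(e) for e in errors) / n      # mean of absolute errors
    rmse = math.sqrt(mse)                      # same units as the target
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)        # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² = 1 - SS_res / SS_tot; undefined if all observed values are equal.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "mae": mae, "rmse": rmse, "r2": r2}


# Made-up CIE values for illustration only; NOT data from the study.
observed = [90.0, 85.0, 95.0, 80.0]
predicted = [88.0, 87.0, 91.0, 84.0]
metrics = regression_metrics(observed, predicted)

# RMSE implied by the MSE reported for gradient boosting (21.52).
reported_rmse = math.sqrt(21.52)
```

Because RMSE is the square root of MSE, the reported MSE of 21.52 corresponds to an RMSE of about 4.64 in the units of the CIE target. An R² of 0.21 means the model's squared error on the test set is about 21% lower than that of always predicting the mean of the observed values.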



DOI: http://dx.doi.org/10.30998/faktorexacta.v17i3.24679





Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
