Journal Pre-proof

Performance Comparison of Several Explainable Hybrid Ensemble Models for Predicting Carbonation Depth in Fly Ash Concrete

Meng Wang, Hani S. Mitri, Guoyan Zhao, Junxi Wu, Yihang Xu, Weizhang Liang, Ning Wang

PII: S2352-7102(24)02814-6
DOI: https://doi.org/10.1016/j.jobe.2024.111246
Reference: JOBE 111246
To appear in: Journal of Building Engineering
Received Date: 23 July 2024
Revised Date: 11 October 2024
Accepted Date: 4 November 2024

Please cite this article as: M. Wang, H.S. Mitri, G. Zhao, J. Wu, Y. Xu, W. Liang, N. Wang, Performance Comparison of Several Explainable Hybrid Ensemble Models for Predicting Carbonation Depth in Fly Ash Concrete, Journal of Building Engineering, https://doi.org/10.1016/j.jobe.2024.111246.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2024 Published by Elsevier Ltd.

Meng Wang 1, Hani S. Mitri 2, Guoyan Zhao 1, Junxi Wu 1, Yihang Xu 1, Weizhang Liang 1 and Ning Wang 1*
1 School of Resource and Safety Engineering, Central South University, Changsha, Hunan 410083, PR China;
2 Department of Mining and Materials Engineering, McGill University, 3450 University Street, Montreal, Quebec, Canada, H3A 0E8.
*Correspondence: ningwang98@csu.edu.cn

Abstract: The carbonation of fly ash concrete critically impacts the lifespan of structures, necessitating precise prediction of carbonation depth for the construction industry. This study establishes an original database comprising 883 cases, which are divided into training and testing sets in a 4:1 ratio. The Sand Cat Swarm Algorithm (SCSO) was used to optimize the hyperparameters of three ensemble models: Gradient Boosting Decision Trees (GBDT), Light Gradient Boosting Machine (LGBM), and Categorical Boosting (CatBoost), resulting in three hybrid ensemble models: SCSO-GBDT, SCSO-LGBM, and SCSO-CatBoost. Five classic models were included in the comparison, and all models used five-fold cross-validation. Model performance was evaluated using the coefficient of determination (R²), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Variance Accounted For (VAF), with the VIsekriterijumska optimizacija i KOmpromisno Resenje (VIKOR) method and Taylor diagrams used for optimal model selection. The SCSO-CatBoost model demonstrated superior performance, with R² = 0.9657, MAE = 2.0062, RMSE = 8.2147, and VAF = 96.6177. Shapley additive explanations (SHAP) analysis of the SCSO-CatBoost model identified time of exposure as the most significant factor influencing carbonation depth, followed by fly ash content and carbon dioxide concentration. To facilitate practical application by non-algorithm engineers, an intelligent program was developed, allowing for straightforward testing with the three hybrid ensemble models. This study presents three precise models for predicting the carbonation depth of fly ash concrete. These models serve as valuable tools for estimating the service life of concrete structures and can be used as simulation instruments in durability engineering.
Keywords: Carbonation Depth; Fly Ash Concrete; Hybrid Ensemble Models; Prediction

Nomenclature

Adaboost  Adaptive boosting
ANN       Artificial neural network
B         Binder
BPNN      Back propagation neural network
CatBoost  Categorical boosting
CD        Carbonation depth
CNN       Convolutional neural networks
CO2       CO2 concentration
CSO       Chicken swarm optimization
EFB       Exclusive feature bundling
EL        Ensemble learning
FA        Fly ash content
FAC       Fly ash concrete
GBDT      Gradient boosting decision trees
GOSS      Gradient-based one-side sampling
GU        Computing group utility value
GUI       Graphical user interface
GWO       Grey wolf optimization
IR        Individual regret values
KNN       K-nearest neighbor
LGBM      Light gradient boosting machine
LR        Linear regression
LSTM      Long short-term memory
MAE       Mean absolute error
ML        Machine learning
NIS       Negative ideal solution
PIS       Positive ideal solution
PSO       Particle swarm optimization
R²        Coefficient of determination
RF        Random forest model
RH        Relative humidity
RMSD      Root mean square deviation
RMSE      Root mean square error
RNN       Recurrent neural network
SC        Sand cat
SCMs      Supplementary cementitious materials
SCSO      Sand cat swarm algorithm
SHAP      Shapley additive explanations
SOA       Seagull optimization algorithm
SVR       Support vector regression
t         Time of exposure
VAF       Variance accounted for
VIKOR     Visekriterijumska optimizacija i kompromisno resenje
w/b       Water-to-binder ratio
X         Carbonation depth (when used as the output of a model)
XGBoost   Extreme gradient boosting

1. Introduction

Concrete structures are extensively used in industrial construction due to their superior mechanical properties and relatively low construction costs [1, 2]. However, as these buildings age, they are subjected to various environmental factors, leading to gradual changes in the microstructure of the concrete due to carbonation.
This process results in rebar corrosion, diminished mechanical performance, and ultimately a reduction in the lifespan of the structures [3, 4]. Currently, concrete carbonation is recognized as one of the critical factors affecting the durability of reinforced concrete structures [5, 6]. If not properly addressed, this issue can result in significant economic losses and environmental pollution [7, 8]. Quantifying and predicting the carbonation depth (CD) is essential to mitigate these problems.

Carbonation takes place when carbon dioxide (CO2) in the atmosphere infiltrates microcracks in concrete, reacting with hydrated cementitious materials to form calcium carbonate [9]. This process lowers the pH of the concrete, which leads to the corrosion of embedded steel reinforcements and undermines structural integrity [10]. In typical concrete, carbonation peaks at a relative humidity of 50-60% but decreases in dry or water-saturated conditions. High concentrations of CO2 also accelerate the carbonation process [11]. The global cement industry accounts for 5% to 7% of total CO2 emissions [12]. To minimize environmental impact, supplementary cementitious materials (SCMs) are increasingly being used as alternatives to cement and coarse aggregates in concrete [13, 14]. For instance, Ahmed [15] proposed using rubber tires as a substitute, which not only enhances the properties of concrete but also benefits the ecological environment and the construction industry. Fly ash (FA), a byproduct of coal combustion, is widely used in concrete due to its numerous benefits, including reduced permeability and enhanced workability [16, 17]. The use of FA in concrete addresses environmental concerns associated with its disposal and contributes to more sustainable and resilient concrete structures [18, 19].
References

Materials, 29 (2012) 263-269.
[38] M. Castellote, C. Andrade, Modelling the carbonation of cementitious matrixes by means of the unreacted-core model, UR-CORE, Cem. Concr. Res., 38 (2008) 1374-1384.
[39] V.G. Papadakis, C.G. Vayenas, M.N. Fardis, A reaction engineering approach to the problem of concrete carbonation, AIChE, 35 (1989) 1639-1650.
[40] V.G. Papadakis, M.N. Fardis, C.G. Vayenas, Effect of composition, environmental factors and cement-lime mortar coating on concrete carbonation, Mater. Struct., 25 (1992) 293-304.
[41] V.G. Papadakis, Experimental investigation and theoretical modeling of silica fume activity in concrete, Cem. Concr. Res., 29 (1999) 79-86.
[42] X.-Y. Wang, H.-S. Lee, A model for predicting the carbonation depth of concrete containing low-calcium fly ash, Construction and Building Materials, 23 (2009) 725-733.
[43] H. Torres, E. Correa, J.G. Castaño, F. Echeverría, Simplified Mathematical Model for Concrete Carbonation, J. Mater. Civ. Eng., 29 (2017) 04017150.
[44] A. Zurek, Numerical approximation of a concrete carbonation model: Study of the -law of propagation, Numerical Methods for Partial Differential Equations, 35 (2019) 1801-1820.
[45] R.A. Patel, S.V. Churakov, N.I. Prasianakis, A multi-level pore scale reactive transport model for the investigation of combined leaching and carbonation of cement paste, Cem. Concr. Compos., 115 (2021) 103831.
[46] I. Monteiro, F.A. Branco, J. de Brito, R. Neves, Statistical analysis of the carbonation coefficient in open air concrete structures, Construction and Building Materials, 29 (2012) 263-269.
[47] M. Wang, G. Zhao, W. Liang, N. Wang, A comparative study on the development of hybrid SSA-RF and PSO-RF models for predicting the uniaxial compressive strength of rocks, Case Studies in Construction Materials, 18 (2023) e02191.
[48] X. Zhang, M.Z. Akber, W. Zheng, Prediction of seven-day compressive strength of field concrete, Construction and Building Materials, 305 (2021) 124604.
[49] C. Zhao, J. Li, Z. Zhu, Q. Guo, X. Wu, Z. Wang, R. Zhao, Research on the carbonation resistance and carbonation depth prediction model of fly ash- and slag-based geopolymer concrete, KSCE Journal of Civil Engineering, (2024).
[50] J.-s. Zhang, M. Cheng, J.-h. Zhu, Carbonation depth model and prediction of hybrid fiber fly ash concrete, Advances in Civil Engineering, 2020 (2020).
[51] Y. Wei, P. Chen, S. Cao, H. Wang, Y. Liu, Z. Wang, W. Zhao, Prediction of carbonation depth for concrete containing mineral admixtures based on machine learning, Arabian Journal for Science and Engineering, 48 (2023) 13211-13225.
[52] I.D. Uwanuakwa, Deep learning modelling and generalisation of carbonation depth in fly ash blended concrete, Arabian Journal for Science and Engineering, 46 (2021) 4731-4746.
[53] I. Nunez, M.L. Nehdi, Machine learning prediction of carbonation depth in recycled aggregate concrete incorporating SCMs, Construction and Building Materials, 287 (2021).
[54] Y. Kellouche, M. Ghrici, B. Boukhatem, Service life prediction of fly ash concrete using an artificial neural network, Frontiers of Structural and Civil Engineering, 15 (2021) 793-805.
[55] E.F. Felix, R. Carrazedo, E. Possan, Carbonation model for fly ash concrete based on artificial neural network: Development and parametric analysis, Construction and Building Materials, 266 (2021).
[56] Z. Chen, J. Lin, K. Sagoe-Crentsil, W. Duan, Development of hybrid machine learning-based carbonation models with weighting function, Construction and Building Materials, 321 (2022).
[57] K. Liu, M.S. Alam, J. Zhu, J. Zheng, L. Chi, Prediction of carbonation depth for recycled aggregate concrete using ANN hybridized with swarm intelligence algorithms, Construction and Building Materials, 301 (2021) 124382.
[58] R. Biswas, E. Li, N. Zhang, S. Kumar, B. Rai, J. Zhou, Development of hybrid models using metaheuristic optimization techniques to predict the carbonation depth of fly ash concrete, Construction and Building Materials, 346 (2022) 128483.
[59] J. Zhou, Y. Qiu, S. Zhu, D.J. Armaghani, M. Khandelwal, E.T. Mohamad, Estimation of the TBM advance rate under hard rock conditions using XGBoost and Bayesian optimization, Underground Space, 6 (2021) 506-515.
[60] Y. Kellouche, B. Boukhatem, M. Ghrici, A. Tagnit-Hamou, Exploring the major factors affecting fly-ash concrete carbonation using artificial neural network, Neural Computing and Applications, 31 (2019) 969-988.
[61] C.-F. Chang, J.-W. Chen, The experimental investigation of concrete carbonation depth, Cem. Concr. Res., 36 (2006) 1760-1767.
[62] H. Cui, W. Tang, W. Liu, Z. Dong, F. Xing, Experimental study on effects of CO2 concentrations on concrete carbonation and diffusion mechanisms, Construction and Building Materials, 93 (2015) 522-527.
[63] J. Balayssac, C.H. Détriché, J. Grandet, Effects of curing upon carbonation of concrete, Construction and Building Materials, 9 (1995) 91-95.
[64] E. Roziere, A. Loukili, F. Cussigh, A performance based approach for durability of concrete exposed to carbonation, Construction and Building Materials, 23 (2009) 190-199.
[65] S. Hussain, D. Bhunia, S. Singh, Comparative study of accelerated carbonation of plain cement and fly-ash concrete, Journal of Building Engineering, 10 (2017) 26-31.
[66] A. Younsi, P. Turcry, A. Aït-Mokhtar, S. Staquet, Accelerated carbonation of concrete with high content of mineral additions: effect of interactions between hydration and drying, Cem. Concr. Res., 43 (2013) 25-33.
[67] P. Turcry, L. Oksri-Nelfia, A. Younsi, A. Aït-Mokhtar, Analysis of an accelerated carbonation test with severe preconditioning, Cem. Concr. Res., 57 (2014) 70-78.
[68] Y. Chen, P. Liu, Z. Yu, Effects of environmental factors on concrete carbonation depth and compressive strength, Materials, 11 (2018) 2167.
[69] C.H. Huang, G.L. Geng, Y.S. Lu, G. Bao, Z.R. Lin, Carbonation depth research of concrete with low-volume fly ash, Applied Mechanics and Materials, 155 (2012) 984-988.
[70] Y. Gao, L. Cheng, Z. Gao, S. Guo, Effects of different mineral admixtures on carbonation resistance of lightweight aggregate concrete, Construction and Building Materials, 43 (2013) 506-510.
[71] Q. Zhao, X. He, J. Zhang, J. Jiang, Long-age wet curing effect on performance of carbonation resistance of fly ash concrete, Construction and Building Materials, 127 (2016) 577-587.
[72] R. Kurda, J. de Brito, J.D. Silvestre, Carbonation of concrete made with high amount of fly ash and recycled concrete aggregates for utilization of CO2, Journal of CO2 Utilization, 29 (2019) 12-19.
[73] C.-f. Lu, W. Wang, Q.-t. Li, M. Hao, Y. Xu, Effects of micro-environmental climate on the carbonation depth and the pH value in fly ash concrete, Journal of Cleaner Production, 181 (2018) 309-317.
[74] E.F. Felix, R. Carrazedo, E. Possan, Carbonation model for fly ash concrete based on artificial neural network: Development and parametric analysis, Construction and Building Materials, 266 (2021) 121050.
[75] Y. Kellouche, B. Boukhatem, M. Ghrici, A. Tagnit-Hamou, Exploring the major factors affecting fly-ash concrete carbonation using artificial neural network, Neural Computing and Applications, 31 (2017) 969-988.
[76] J.H. Friedman, Greedy function approximation: a gradient boosting machine, Annals of Statistics, (2001) 1189-1232.
[77] J.H. Friedman, Stochastic gradient boosting, Comput. Stat. Data Anal., 38 (2002) 367-378.
[78] Y. Zhang, A. Haghani, A gradient boosting method to improve travel time prediction, Transportation Research Part C: Emerging Technologies, 58 (2015) 308-324.
[79] T. Li, Q. Xia, Y. Ouyang, R. Zeng, Q. Liu, T. Li, Prospectivity and Uncertainty Analysis of Tungsten Polymetallogenic Mineral Resources in the Nanling Metallogenic Belt, South China: A Comparative Study of AdaBoost, GBDT, and XgBoost Algorithms, Natural Resources Research, 33 (2024) 1049-1071.
[80] Z. Chen, Z. Li, H. Xia, X. Tong, Performance optimization of the elliptically vibrating screen with a hybrid MACO-GBDT algorithm, Particuology, 56 (2021) 193-206.
[81] X. Sun, M. Liu, Z. Sima, A novel cryptocurrency price trend forecasting model based on LightGBM, Finance Research Letters, 32 (2020) 101084.
[82] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, T.-Y. Liu, Lightgbm: A highly efficient gradient boosting decision tree, Adv. Neural Inf. Process. Syst., 30 (2017).
[83] W. Zhang, C. Wu, L. Tang, X. Gu, L. Wang, Efficient time-variant reliability analysis of Bazimen landslide in the Three Gorges Reservoir Area using XGBoost and LightGBM algorithms, Gondwana Res, 123 (2023) 41-53.
[84] A.V. Dorogush, V. Ershov, A. Gulin, CatBoost: gradient boosting with categorical features support, arXiv preprint arXiv:1810.11363, (2018).
[85] L. Prokhorenkova, G. Gusev, A. Vorobev, A.V. Dorogush, A. Gulin, CatBoost: unbiased boosting with categorical features, Adv. Neural Inf. Process. Syst., 31 (2018).
[86] G.Y. Zhao, M. Wang, W.Z. Liang, A Comparative Study of SSA-BPNN, SSA-ENN, and SSA-SVR Models for Predicting the Thickness of an Excavation Damaged Zone around the Roadway in Rock, Mathematics, 10 (2022).
[87] M. Wang, G. Zhao, W. Liang, N. Wang, A Comparative Study on the development of Hybrid SSA-RF and PSO-RF Models for Predicting the Uniaxial Compressive Strength of Rocks, Case Studies in Construction Materials, (2023) e02191.
[88] S. Opricovic, Multicriteria optimization of civil engineering systems, Faculty of Civil Engineering, Belgrade, 2 (1998) 5-21.
[89] Á. Delgado-Panadero, B. Hernández-Lorca, M.T. García-Ordás, J.A. Benítez-Andrades, Implementing local-explainability in gradient boosting trees: Feature contribution, Information Sciences, 589 (2022) 199-212.
[90] N. Kumar, S. Prakash, S. Ghani, M. Gupta, S. Saharan, Data-driven machine learning approaches for predicting permeability and corrosion risk in hybrid concrete incorporating blast furnace slag and fly ash, Asian Journal of Civil Engineering, (2024) 1-13.
[91] M. Kumar, M. Kumar, S. Singh, S. Kim, A. Anand, S. Pandey, S.M.M. Hasnain, A.E. Ragab, A.F. Deifalla, A hybrid model based on convolution neural network and long short-term memory for qualitative assessment of permeable and porous concrete, Case Studies in Construction Materials, 19 (2023).
[92] H. Qin, J. Wang, Probabilistic prediction model of concrete carbonation depth considering the influence of multiple factors, Structural Concrete, 24 (2023) 6209-6238.

Highlights

1. This study establishes the largest and highest-quality database for predicting carbonation depth in fly ash concrete, consisting of 883 cases. A reliable database forms the foundation for model accuracy.
2. This study proposes three novel hybrid ensemble learning models (SCSO-GBDT, SCSO-LGBM, and SCSO-CatBoost) for predicting carbonation depth in fly ash concrete, offering a reliable solution for long-term predictions.
3. The hybrid models outperform classical models (RF, BPNN, KNN, LR, and SVR) in predictive performance (R² > 0.95) on the current dataset, demonstrating strong robustness and stability.
4. SHAP analysis of the best-performing model, SCSO-CatBoost, identified time of exposure as the key factor influencing carbonation depth, followed by fly ash content and carbon dioxide concentration. To facilitate practical application, a user-friendly GUI program is developed, enabling non-algorithm engineers to use the three hybrid ensemble models effectively.

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:

However, the carbonation resistance of fly ash concrete (FAC) is controversial. Some studies indicate that replacing cement with FA accelerates carbonation [20-23]. In contrast, other studies, such as that by Khunthongkeaw et al. [13], found that concrete containing 10% FA showed minimal carbonation effects. They also found that reducing the water-to-binder ratio (w/b) and the percentage of FA can improve carbonation resistance. Atis [24] indicated that concrete with 70% FA has a higher degree of carbonation than concrete with 50% FA replacement. Khunthongkeaw et al. [25] pointed out that the CD increases with higher levels of FA, CO2, and w/b.

Regardless, carbonation shortens the service life of FA concrete [26, 27]. If improperly managed, carbonation can damage FA concrete, threatening structural integrity and overall safety [28, 29].
Accurate prediction of CD is crucial for evaluating structural durability, identifying risks, and formulating maintenance strategies. In past studies, the CD of concrete has typically been estimated using empirical formulas or theoretical models. Empirical formulas are derived by obtaining relevant data through accelerated carbonation tests in the laboratory or from natural carbonation cases, followed by fitting the data using statistical methods [4, 13, 30-37]. However, this method often relies on a small amount of data and lacks generalizability to different environmental conditions. Theoretical models, on the other hand, commonly use Fick's law to predict the carbonation process of concrete [16, 38-45]. Despite this, they tend to have significant errors, and each parameter requires a clear physical meaning, which is not well suited to the complex mechanisms of concrete carbonation [46]. Therefore, there is a need for a more efficient and effective method.

Recently, machine learning (ML) has become one of the most effective methods for addressing civil engineering challenges [47, 48]. Numerous studies have used ML models to fit the carbonation relationship of FAC [49-56]. For instance, Chen et al. [56] introduced a method based on weighted functions to predict the CD of concrete. Liu et al. [57] explored the prediction of recycled aggregate concrete CD using a combination of artificial neural network models and meta-heuristic algorithms. Biswas et al. [58] applied support vector regression models combined with meta-heuristic algorithms to predict the CD of FAC. These studies demonstrate the benefit of combining ML models with meta-heuristic algorithms.
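To make the empirical-formula approach discussed earlier in this section concrete, the following is a minimal sketch of fitting a square-root-of-time law, X = k·√t, by least squares through the origin. This functional form is a common simplification associated with Fick's-law reasoning and is an assumption here, not a formula taken from this paper; the function names and the measurements are hypothetical.

```python
import math


def fit_sqrt_time_coefficient(days, depths_mm):
    """Least-squares estimate of k in X = k * sqrt(t), with no intercept."""
    if len(days) != len(depths_mm) or not days:
        raise ValueError("days and depths_mm must be non-empty and equally long")
    # Minimizing sum((x - k*sqrt(t))^2) gives k = sum(sqrt(t)*x) / sum(t).
    numerator = sum(math.sqrt(t) * x for t, x in zip(days, depths_mm))
    denominator = sum(days)  # equals the sum of (sqrt(t))^2
    return numerator / denominator


def predict_depth(k, day):
    """Predicted carbonation depth (mm) after `day` days of exposure."""
    return k * math.sqrt(day)


# Hypothetical measurements (exposure in days, depth in mm), for illustration only.
days = [4, 9, 16]
depths = [4.0, 6.0, 8.0]
k = fit_sqrt_time_coefficient(days, depths)
print(k, predict_depth(k, 25))
```

A single coefficient k absorbs every material and environmental effect, which illustrates why such formulas tend to generalize poorly beyond the conditions under which the data were collected.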
However, research on the carbonation prediction of SCM concretes (especially those with FA) remains relatively sparse, with publications on this topic accounting for only 7% of all studies on concrete carbonation prediction [58]. This scarcity can be attributed to the difficulty in developing a comprehensive formula that considers all variables affecting CD, particularly the types and amounts of additives in the concrete. The literature indicates that no studies have yet combined meta-heuristic algorithms with ensemble models to predict the CD of FAC. Furthermore, the development of ensemble learning (EL) in this field is also relatively limited [60-62]. EL enhances model accuracy and stability by combining the predictions of multiple base learners. Compared to single ML models, EL handles data noise and biases more effectively, reduces the risk of overfitting, and exhibits greater robustness and generalization ability in complex problems. Recent studies have further indicated that ensemble models incorporating meta-heuristic algorithms outperform traditional ensemble models [59].

Therefore, this study developed three hybrid ensemble models to predict the CD of FAC. First, 883 cases were collected to establish an up-to-date database, followed by statistical analysis of the data. Next, the SCSO was used to find the optimal hyperparameter combinations for GBDT, LGBM, and CatBoost, yielding the hybrid ensemble models. These models were compared with several classical models to identify the best performer. As research progresses, the number of existing models continues to increase, making an evaluation tool for model selection essential. This study employed the VIKOR method for model selection, which yielded favorable results. Subsequently, SHAP was applied to the best hybrid ensemble model for interpretability analysis, aiding in feature selection for future research.
Lastly, to facilitate the application of our findings by non-algorithm engineers, we developed a user-friendly program that outputs the CD of FAC based on input feature values. This study effectively predicted the CD of FAC, allowing for the estimation of the lifecycle of concrete structures based on material properties and environmental characteristics. This research deepens the understanding of concrete carbonation, improves durability assessment, and promotes interdisciplinary innovation between EL and civil engineering.

2. Data analysis

Constructing a robust and comprehensive database is pivotal for effective model training and validation. This study built an extensive database from a thorough literature review, incorporating data from 17 distinct research sources and resulting in a dataset of 883 cases [13, 16, 24, 60-73]. These cases represent a broad spectrum of scenarios, providing a rich collection of sample data for model training. Detailed references and pertinent information about this database are given in Fig. 1.

Fig. 1. Detailed references and pertinent information about database.

Considering the primary factors influencing CD and the distinct characteristics of various ML models [57, 74, 75], this study incorporates seven variables: six inputs and one output. Detailed statistical information is presented in Table 1. The input variables include binder (B), fly ash content (FA), water-to-binder ratio (w/b), carbon dioxide concentration (CO2), relative humidity (RH), and time of exposure (t), while the output variable is carbonation depth (X).
These parameters are selected for their significant influence on the carbonation process: B and FA affect the concrete's fundamental properties and pore structure; w/b determines its density and porosity; CO2 and RH influence the carbonation reaction kinetics; and t dictates the progression of carbonation over time. The variable t is measured in days, with a minimum of 3 days and a maximum of 365 days. To improve the efficiency of model training, and based on relevant literature, we applied a square root transformation to t. When developing the GUI in the future, users will still be able to input the time directly in days, and the transformation will be automatically applied in the code to fit the model. Collectively, these variables provide a robust framework for assessing concrete CD and durability.

[Fig. 1 (pie chart of cases per source), labels in extraction order. Case counts: 48 (5.4%), 18 (2%), 5 (0.6%), 8 (0.9%), 33 (3.7%), 24 (2.7%), 20 (2.3%), 15 (1.7%), 72 (8.2%), 140 (15.9%), 64 (7.2%), 40 (4.5%), 16 (1.8%), 18 (2%), 60 (6.8%), 2 (0.2%), 300 (34%). Sources: Kellouche et al., Chang et al., Cui et al., Jiang et al., Balayssac et al., Rozière et al., Hussain et al., Younsi et al., Turcry et al., Chen et al., Khunthongkeaw et al., Huang et al., Gao et al., Zhao et al., Kurda et al., Lu et al., Atis et al.]

Table 1. Statistics of each feature.

Indicator    Min      Median   Max      Mean     Standard deviation
B (kg/m3)    120.00   350.00   500.00   354.21   70.92
FA (%)       0.00     25.00    70.00    22.70    22.52
w/b (-)      0.28     0.46     0.65     0.46     0.09
CO2 (%)      0.03     6.50     100.00   13.97    16.77
RH (%)       40.00    65.00    100.00   66.43    9.70
t (-)        1.73     5.29     23.24    6.94     4.77
X (mm)       0.00     9.80     67.20    13.20    12.75

Fig. 2 illustrates the correlation matrix and distribution of variables used in predicting CD in FAC.
This 124 figure illustrates the relationships between multiple variables through a combination of scatter plot matrices, 125 histograms, and a correlation matrix. The lower-left section consists of scatter plot matrices, where the 126 scatter plots and red linear fit lines depict the relationships between pairs of variables. The diagonal displays 127 histograms, showing the distribution of each variable. The upper-right section contains a correlation matrix, 128 where the color intensity indicates the strength of the correlation, and the numerical values represent the 129 correlation coefficients. This figure provides a clear visualization of the linear relationships, distribution 130 characteristics, and interdependencies among the variables, aiding in comprehensive data analysis. 131 The correlation matrix indicates that B has a low correlation with other variables, with a particularly 132 weak negative correlation with FA and /w b . FA shows a positive correlation with X (0.27), 133 suggesting that increased FA may enhance CD. Similarly, /w b has a positive correlation with X (0.23), 134 aligning with the notion that higher /w b reduce concrete density, thereby accelerating carbonation. t 135 exhibits the strongest correlation with X (0.46), indicating that CD increases over time. The diagonal 136 histograms display the distribution of each variable, with B and /w b distributions being more 137 concentrated, while FA shows a more dispersed distribution. 2CO concentration and RH exhibit 138 relatively uniform distributions, reflecting a wide range of controlled experimental conditions. Scatter plots 139 with regression lines in the lower triangle highlight linear relationships between variables, notably the 140 positive trends between FA and X , and between /w b and X . The strong positive relationship between 141 t and X is also evident. 
The correlations among all features are not very strong ( [...]

[...] construct a new decision tree that corrects the residuals from the previous iteration. Unlike random forests, which train trees in parallel, GBDT adopts a sequential training approach in which each tree is trained to reduce the error between the predictions of the preceding trees and the actual values.

Due to its iterative optimization approach, GBDT has gained widespread popularity across various application domains. To enhance its performance in practical engineering applications, researchers [79, 80] have focused on optimizing key hyperparameters. These optimizations aim to improve model accuracy and applicability. A schematic diagram illustrating the working mechanism of GBDT is depicted in Fig. 5.

Fig. 5. Architecture of GBDT algorithm.

3.3 Light Gradient Boosting Machine (LGBM)

LGBM regression is a machine learning model based on decision tree algorithms. It predicts target variables by using an ensemble of decision trees, with each tree being trained on the residuals of all previous trees in a gradient boosting framework [81, 82]. This iterative process helps to progressively minimize the prediction error. LGBM uses a leaf-wise growth strategy for constructing trees, in which the next split is chosen among the current leaf nodes rather than level by level, allowing it to handle large-scale data more efficiently [82, 83].

LGBM is known for its fast training speed and efficient memory usage, making it suitable for handling large datasets and high-dimensional features. The model accelerates the tree-building process with a histogram-based algorithm [82], significantly improving training speed, as illustrated in Fig. 6. Additionally, LGBM implements a leaf-wise growth method with constraints on depth.
This approach enables faster identification of the optimal split points.

Fig. 6. Illustration of the histogram and leaf-wise growth strategies.

To significantly enhance training speed, LGBM employs several advanced techniques. One such technique is gradient-based one-side sampling (GOSS), which selectively retains samples with larger gradients. This approach ensures that the most informative samples are prioritized during training, thereby speeding up the process. Furthermore, LGBM introduces exclusive feature bundling (EFB), which consolidates mutually exclusive features into a single bundle and thus reduces computing time. For details of these methods, readers are advised to consult the original work by Ke et al. [82].

3.4 Categorical Boosting (CatBoost)

CatBoost is a powerful gradient boosting algorithm that is particularly useful for regression tasks [84, 85]. Fig. 7 illustrates the fundamental principles of CatBoost. Starting with a dataset comprising N samples and M features, including categorical features, CatBoost transforms these categorical features into numerical values through a technique known as "weight expansion". This transformation helps maintain the predictive power of categorical features without the need for extensive preprocessing [84].

During the training phase, CatBoost develops a series of decision trees, each designed to minimize the residual errors from the preceding models. As illustrated in the figure, every subsequent predictor is trained using the weighted dataset from the previous step. These weights are modified to emphasize the samples that were previously mispredicted, facilitating a gradual correction of the model's errors.
Ultimately, the final regression output is derived by averaging the weighted predictions from all the individual trees, resulting in a robust and precise model that accurately captures the data's intrinsic patterns.

Fig. 7. Explanation of the CatBoost regressor.

3.5 Development process of hybrid models

The core objective of this study is to develop hybrid ensemble models for predicting the CD in FAC. The training efficiency of these hybrid ensemble models is largely determined by their hyperparameters. Given the proficiency of meta-heuristic algorithms in addressing optimization problems, this study employs the SCSO algorithm to optimize the hyperparameters of the GBDT, LGBM, and CatBoost models, thereby enhancing their performance. The framework of the study is shown in Fig. 8, and the specific steps are as follows:

(1) Dataset Preparation: A total of 883 measured cases were acquired from engineering cases and laboratory experiments. Using a random seed, the dataset was divided into a training set (80%) and a test set (20%).

(2) Population Initialization: During the application of the SCSO algorithm, a set of hyperparameters for the GBDT, LGBM, and CatBoost models was randomly generated as the initial solution. Algorithm parameters were also set, predominantly using randomly generated values within specified ranges.

(3) Fitness Calculation: The positions of all sand cats were updated, and the fitness values of the individuals within the population were calculated. The individual with the lowest fitness value represented the closest approximation to the optimal solution. In this context, the fitness function was the RMSE obtained from fivefold cross validation on the training set.
(4) Iterative Updating: The positions of all sand cats were continuously updated, and at each new position a new GBDT, LGBM, or CatBoost model was constructed and its fitness value computed. If the new model performed better than the previous one, it was adopted as the current model; otherwise, the preceding model was retained. This iterative process continued until the maximum number of iterations was reached or a termination criterion was satisfied.

(5) Optimal Hyperparameters Selection: The optimal hyperparameters were used to construct GBDT, LGBM, and CatBoost models, which were subsequently evaluated on the test set. A comprehensive evaluation method and Taylor diagrams were employed to assess the models, culminating in the selection of the most suitable model.

(6) SHAP Analysis: A SHAP value analysis was performed on the selected optimal model to evaluate the influence of each feature on the model's output, supporting the model's interpretability and reliability.

(7) Prediction Program Development: A simple application was created in which non-algorithm engineers can input feature values and obtain the predicted CD of FAC, making the model's predictions accessible and easy to use.

In conclusion, this study presents an innovative approach to optimizing the hyperparameters of GBDT, LGBM, and CatBoost models through the use of the SCSO algorithm. The findings substantiate the efficacy of this method in markedly enhancing model performance.

Fig. 8. Hybrid SCSO-Ensemble Models.
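Steps (2) to (4) above amount to a population-based search with greedy replacement. The following is a minimal sketch of that loop, not the authors' implementation. The function name `population_search`, the toy fitness function, and the search ranges are hypothetical, and the move rule is a simple placeholder rather than the SCSO position update. In the study, the fitness would instead be the mean RMSE from fivefold cross validation of a GBDT, LGBM, or CatBoost model built from the candidate hyperparameters.

```python
import random


def population_search(fitness, bounds, pop_size=20, n_iter=100, seed=0):
    """Minimise `fitness` over the box `bounds` with a simple population search.

    NOTE: the move rule is a placeholder, NOT the SCSO position update.
    Only the initialise / evaluate / keep-if-better loop is illustrated.
    """
    rng = random.Random(seed)
    names = list(bounds)

    # Step (2): random initial population within the specified ranges.
    population = [{k: rng.uniform(*bounds[k]) for k in names}
                  for _ in range(pop_size)]
    # Step (3): fitness of every individual; the lowest value is the best.
    scores = [fitness(p) for p in population]
    best_i = min(range(pop_size), key=scores.__getitem__)
    best, best_score = dict(population[best_i]), scores[best_i]
    history = [best_score]

    for it in range(n_iter):
        # The noise radius shrinks linearly over the iterations.
        radius = 1.0 - it / n_iter
        for i in range(pop_size):
            current = population[i]
            candidate = {}
            for k in names:
                lo, hi = bounds[k]
                pull = rng.random() * (best[k] - current[k])
                noise = radius * (hi - lo) * rng.uniform(-0.5, 0.5)
                # Clip the new position back into the allowed range.
                candidate[k] = min(max(current[k] + pull + noise, lo), hi)
            score = fitness(candidate)
            # Step (4): adopt the new position only if it is better.
            if score < scores[i]:
                population[i], scores[i] = candidate, score
                if score < best_score:
                    best, best_score = dict(candidate), score
        history.append(best_score)
    return best, best_score, history


def toy_fitness(params):
    """Hypothetical stand-in for the cross-validated RMSE of a model."""
    return ((params["learning_rate"] - 0.1) ** 2
            + ((params["max_depth"] - 4) / 10) ** 2)


# Hypothetical search ranges, chosen only for this illustration.
bounds = {"learning_rate": (0.01, 0.5), "max_depth": (1, 10)}
best, best_score, history = population_search(toy_fitness, bounds)
```

The sketch treats every hyperparameter as continuous; integer-valued ones such as max_depth would need to be rounded before a model is built. The `history` list plays the role of the convergence curve, since the best fitness can only stay the same or decrease from one iteration to the next.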
3.6 Performance Evaluation method

Four indexes were used to evaluate the performance of all models, namely R², MAE, RMSE, and VAF [86, 87].

$$ R^2 = \frac{\left( n\sum_{i=1}^{n} CD_i C_i - \sum_{i=1}^{n} CD_i \sum_{i=1}^{n} C_i \right)^2}{\left( n\sum_{i=1}^{n} CD_i^2 - \left( \sum_{i=1}^{n} CD_i \right)^2 \right)\left( n\sum_{i=1}^{n} C_i^2 - \left( \sum_{i=1}^{n} C_i \right)^2 \right)}, \tag{10} $$

where $n$ is the total number of samples, $CD_i$ is the predicted CD value, and $C_i$ is the measured CD value.

$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| CD_i - C_i \right|. \tag{11} $$

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( CD_i - C_i \right)^2}. \tag{12} $$

$$ \mathrm{VAF} = \left( 1 - \frac{\operatorname{var}(C_i - CD_i)}{\operatorname{var}(C_i)} \right) \times 100. \tag{13} $$

Among the evaluation indicators, the closer R² is to 1 and VAF is to 100, the better the model; the closer MAE and RMSE are to 0, the better the model.

Selecting models solely through scoring and plotting methods can be subjective and ineffective, especially when performance differences are minimal. This study introduces a comprehensive evaluation method for model selection: the VIKOR method (VIsekriterijumska optimizacija i KOmpromisno Resenje), proposed by Opricovic in 1998 [88]. The VIKOR method involves four main steps: determining weights, calculating the ideal solution, computing group utility and individual regret values, and ranking the compromise solution.

Step 1: Determining weights. In this study, the model evaluation metrics are R², MAE, RMSE, and VAF. For the VIKOR method, equal weights are assigned to each metric in the calculations.

Step 2: Calculating the ideal solution. The positive ideal solution (PIS) maximizes beneficial metrics and minimizes detrimental metrics, while the negative ideal solution (NIS) minimizes beneficial metrics and maximizes detrimental metrics. The formulas for calculating the ideal solutions are as follows:
$$ p_j^{+} = 1, \tag{14} $$

$$ p_j^{-} = 0, \tag{15} $$

where $p_j^{+}$ is the PIS for the $j$-th evaluation index and $p_j^{-}$ is the NIS for the $j$-th evaluation index.

Step 3: Computing group utility (GU) and individual regret (IR) values. GU measures decision-makers' subjective preferences and biases towards gains and losses, while IR reflects the loss from incorrect judgments by comparing other values to the highest value. The VIKOR method traditionally uses only the PIS for these calculations, overlooking the unique information in the NIS. This study calculates the group utility and individual regret values using both the PIS and the NIS. The formulas are as follows.

Using the PIS as a reference:

$$ S_i^{+} = \sum_{j=1}^{n} w_j \frac{p_j^{+} - r_{ij}}{p_j^{+} - p_j^{-}}, \tag{16} $$

$$ R_i^{+} = \max_j \left[ w_j \frac{p_j^{+} - r_{ij}}{p_j^{+} - p_j^{-}} \right], \tag{17} $$

Using the NIS as a reference:

$$ S_i^{-} = \sum_{j=1}^{n} w_j \frac{r_{ij} - p_j^{-}}{p_j^{+} - p_j^{-}}, \tag{18} $$

$$ R_i^{-} = \min_j \left[ w_j \frac{r_{ij} - p_j^{-}}{p_j^{+} - p_j^{-}} \right], \tag{19} $$

The overall GU and IR of a solution are then calculated as follows:

$$ S_i = \frac{S_i^{+}}{S_i^{-}}, \tag{20} $$

$$ R_i = \frac{R_i^{+}}{R_i^{-}}, \tag{21} $$

where $w_j$ is the weight of the $j$-th evaluation index and $r_{ij}$ is the value of the $j$-th evaluation index for the $i$-th solution; $S_i^{+}$ and $R_i^{+}$ are the GU and IR for the $i$-th solution using the PIS as a reference, respectively. Similarly, $S_i^{-}$ and $R_i^{-}$ are the GU and IR for the $i$-th solution using the NIS as a reference. $S_i$ and $R_i$ represent the overall GU and IR for the $i$-th solution, respectively.

Step 4: Calculating the compromise solution for ranking.

The calculation of the compromise solution combines GU and IR. The decision-making coefficient $v$ in the VIKOR method balances the maximization of GU against the minimization of IR. A larger $v$ indicates a greater emphasis on maximizing GU and less concern for IR.
The compromise solution results are ranked in ascending order, with lower values indicating better outcomes.

$$ Q_i = v\,\frac{S_i - \min_i S_i}{\max_i S_i - \min_i S_i} + (1 - v)\,\frac{R_i - \min_i R_i}{\max_i R_i - \min_i R_i}, \tag{22} $$

where $Q_i$ is the compromise solution and $v$ is the decision-making coefficient.

4. Results and discussion

4.1 Parameter settings

The performance of models is influenced by various parameter settings. Meta-heuristic optimization algorithms can search for well-performing hyperparameter combinations directly, which saves time and effort. Therefore, this study employs SCSO to optimize the hyperparameters of the ensemble models. To determine the parameters for SCSO, we used a trial-and-error approach with multiple training runs. Specifically, we trained each base ensemble model with nine population sizes: 10, 20, 30, 40, 50, 75, 100, 150, and 200. Each model was trained for 500 iterations, using the average RMSE from fivefold cross validation on the training set as the fitness function. The nine hybrid ensemble models trained for each base model are referred to as a group, giving three groups of hybrid ensemble models in total. The convergence curves of the different hybrid ensemble models are illustrated in Fig. 9: Fig. 9(a) shows the nine SCSO-GBDT models, Fig. 9(b) the nine SCSO-LGBM models, and Fig. 9(c) the nine SCSO-CatBoost models. The figures show that the SCSO-GBDT models converge relatively slowly, stabilizing only after approximately 250 iterations, with the optimal fitness values varying across population sizes.
The SCSO-LGBM models converge faster, stabilizing at around 150 iterations, although the optimal fitness values again differ across population sizes. The SCSO-CatBoost models exhibit the best performance, with all models converging rapidly and stabilizing at smaller fitness values after approximately 150 iterations. A comparison of the convergence curves shows that the SCSO-CatBoost model performs best, achieving rapid and stable convergence to the optimal fitness value, followed by the SCSO-LGBM model, with the SCSO-GBDT model performing slightly worse. Therefore, in practical applications, the SCSO-CatBoost model can be prioritized for hyperparameter optimization.

Fig. 9. Optimization of hybrid ensemble models for different population sizes: (a) SCSO-GBDT models; (b) SCSO-LGBM models; (c) SCSO-CatBoost models.

Relying solely on the fitness curve to identify the optimal model within each group may be overly simplistic. Consequently, we recorded the results of each computation, which are detailed in Tables 2-4. Table 2 presents the outcomes for the nine SCSO-GBDT models, Table 3 details the results for the nine SCSO-LGBM models, and Table 4 gives the results for the nine SCSO-CatBoost models, each with different population sizes.

To determine the optimal ensemble model within each group, we developed a comprehensive scoring system. For each performance index, the model with the best value was awarded 9 points, the second-best received 8 points, and so on, with models having equal values given the same score. This ranking was applied across the four evaluation indexes.
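The scoring rule just described can be sketched in a few lines. This is an illustrative reconstruction rather than the authors' code: the function name `score_index` is hypothetical, and the tie handling (tied values share a score, and the next distinct value receives one point less) is an assumption inferred from the ranks reported in Table 2. The input values are the training-set R² and MAE of the nine SCSO-GBDT models from Table 2.

```python
def score_index(values, higher_is_better, top_score=9):
    """Score one performance index across the models of a group.

    The best distinct value gets `top_score`, each following distinct
    value one point less; equal values share the same score.
    (Tie handling is an assumption inferred from Table 2.)
    """
    distinct = sorted(set(values), reverse=higher_is_better)
    points = {v: top_score - rank for rank, v in enumerate(distinct)}
    return [points[v] for v in values]


# Training-set results of the SCSO-GBDT group (Table 2),
# ordered by population size: 10, 20, 30, 40, 50, 75, 100, 150, 200.
r2 = [0.9873, 0.9877, 0.9865, 0.9825, 0.9860, 0.9875, 0.9841, 0.9877, 0.9877]
mae = [0.5488, 0.5543, 0.6966, 0.9417, 0.7288, 0.5797, 0.8626, 0.5513, 0.5520]

r2_scores = score_index(r2, higher_is_better=True)     # larger R2 is better
mae_scores = score_index(mae, higher_is_better=False)  # smaller MAE is better
```

Under this assumption the computed scores coincide with the Rank columns for R² and MAE in the training part of Table 2.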
Notably, we scored the results of the training and test sets separately for each model group, and the total score for each model was the sum of these scores.

Applying this methodology, we found that the highest total scores were 51 for the SCSO-GBDT model, 66 for the SCSO-LGBM model, and 66 for the SCSO-CatBoost model, thereby identifying the optimal model within each group. Based on these findings, we set the population size to 20 for the SCSO-GBDT model, 50 for the SCSO-LGBM model, and 30 for the SCSO-CatBoost model. These configurations are used in the remainder of this study.

Table 2 Performance comparison of SCSO-GBDT models under different population sizes.
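| Set | Population | R² | Rank | MAE | Rank | RMSE | Rank | VAF | Rank | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Training | 10 | 0.9873 | 7 | 0.5488 | 9 | 1.8028 | 5 | 98.7312 | 6 | 27 |
| Training | 20 | 0.9877 | 9 | 0.5543 | 6 | 1.7536 | 7 | 98.7657 | 8 | 30 |
| Training | 30 | 0.9865 | 6 | 0.6966 | 4 | 1.9248 | 4 | 98.6459 | 5 | 19 |
| Training | 40 | 0.9825 | 3 | 0.9417 | 1 | 2.4884 | 1 | 98.2532 | 2 | 7 |
| Training | 50 | 0.9860 | 5 | 0.7288 | 3 | 1.9856 | 3 | 98.6039 | 4 | 15 |
| Training | 75 | 0.9875 | 8 | 0.5797 | 5 | 1.7791 | 6 | 98.7480 | 7 | 26 |
| Training | 100 | 0.9841 | 4 | 0.8626 | 2 | 2.2560 | 2 | 98.4145 | 3 | 11 |
| Training | 150 | 0.9877 | 9 | 0.5513 | 8 | 1.7420 | 9 | 98.7740 | 9 | 35 |
| Training | 200 | 0.9877 | 9 | 0.5520 | 7 | 1.7427 | 8 | 98.7735 | 9 | 33 |
| Testing | 10 | 0.9543 | 4 | 2.2077 | 4 | 10.9529 | 4 | 95.4567 | 1 | 13 |
| Testing | 20 | 0.9570 | 5 | 2.1102 | 6 | 10.2912 | 5 | 95.7484 | 5 | 21 |
| Testing | 30 | 0.9625 | 7 | 2.0852 | 7 | 8.9758 | 7 | 96.2787 | 8 | 29 |
| Testing | 40 | 0.9661 | 9 | 2.0277 | 9 | 8.1304 | 9 | 96.6227 | 9 | 36 |
| Testing | 50 | 0.9604 | 6 | 2.1162 | 5 | 9.4872 | 6 | 96.0695 | 6 | 23 |
| Testing | 75 | 0.9541 | 2 | 2.2362 | 2 | 10.9909 | 1 | 95.4608 | 2 | 7 |
| Testing | 100 | 0.9626 | 8 | 2.0504 | 8 | 8.9664 | 8 | 96.2749 | 7 | 31 |
| Testing | 150 | 0.9542 | 3 | 2.2360 | 3 | 10.9804 | 2 | 95.4698 | 3 | 11 |
| Testing | 200 | 0.9542 | 3 | 2.2386 | 1 | 10.9726 | 3 | 95.4713 | 4 | 11 |

Table 3 Performance comparison of SCSO-LGBM models under different population sizes.

| Set | Population | R² | Rank | MAE | Rank | RMSE | Rank | VAF | Rank | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Training | 10 | 0.9833 | 4 | 0.9163 | 4 | 2.3674 | 3 | 98.3336 | 3 | 14 |
| Training | 20 | 0.9845 | 7 | 0.8512 | 6 | 2.2000 | 6 | 98.4515 | 6 | 25 |
| Training | 30 | 0.9828 | 2 | 0.9468 | 2 | 2.4458 | 1 | 98.2784 | 1 | 6 |
| Training | 40 | 0.9829 | 3 | 0.9526 | 1 | 2.4345 | 2 | 98.2864 | 2 | 8 |
| Training | 50 | 0.9856 | 9 | 0.7838 | 9 | 2.0450 | 9 | 98.5605 | 9 | 36 |
| Training | 75 | 0.9837 | 6 | 0.9016 | 5 | 2.3165 | 5 | 98.3694 | 5 | 21 |
| Training | 100 | 0.9834 | 5 | 0.9234 | 3 | 2.3540 | 4 | 98.3431 | 4 | 16 |
| Training | 150 | 0.9847 | 8 | 0.8415 | 8 | 2.1709 | 8 | 98.4719 | 8 | 32 |
| Training | 200 | 0.9845 | 7 | 0.8506 | 7 | 2.1996 | 7 | 98.4518 | 7 | 28 |
| Testing | 10 | 0.9615 | 3 | 2.0905 | 8 | 9.2355 | 3 | 96.1562 | 3 | 17 |
| Testing | 20 | 0.9626 | 5 | 2.1162 | 5 | 8.9557 | 5 | 96.2632 | 4 | 19 |
| Testing | 30 | 0.9613 | 2 | 2.0896 | 9 | 9.2618 | 2 | 96.1342 | 2 | 15 |
| Testing | 40 | 0.9585 | 1 | 2.2342 | 1 | 9.9466 | 1 | 95.8607 | 1 | 4 |
| Testing | 50 | 0.9636 | 8 | 2.1083 | 7 | 8.7294 | 8 | 96.3674 | 7 | 30 |
| Testing | 75 | 0.9640 | 9 | 2.1322 | 4 | 8.6350 | 9 | 96.4223 | 9 | 31 |
| Testing | 100 | 0.9634 | 7 | 2.1362 | 3 | 8.7583 | 7 | 96.3992 | 8 | 25 |
| Testing | 150 | 0.9632 | 6 | 2.1152 | 6 | 8.8090 | 6 | 96.3323 | 6 | 24 |
| Testing | 200 | 0.9621 | 4 | 2.1712 | 2 | 9.0711 | 4 | 96.2828 | 5 | 15 |

Table 4 Performance comparison of SCSO-CatBoost models under different population sizes.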
| Set | Population | R² | Rank | MAE | Rank | RMSE | Rank | VAF | Rank | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Training | 10 | 0.9873 | 9 | 0.6600 | 8 | 1.8084 | 8 | 98.7271 | 8 | 33 |
| Training | 20 | 0.9872 | 8 | 0.6676 | 5 | 1.8115 | 7 | 98.7249 | 7 | 27 |
| Training | 30 | 0.9873 | 9 | 0.6593 | 9 | 1.8079 | 9 | 98.7275 | 9 | 36 |
| Training | 40 | 0.9872 | 8 | 0.6616 | 7 | 1.8115 | 7 | 98.7249 | 7 | 29 |
| Training | 50 | 0.9873 | 9 | 0.6600 | 8 | 1.8084 | 8 | 98.7271 | 8 | 33 |
| Training | 75 | 0.9872 | 8 | 0.6616 | 7 | 1.8115 | 7 | 98.7249 | 7 | 29 |
| Training | 100 | 0.9872 | 8 | 0.6625 | 6 | 1.8130 | 6 | 98.7239 | 6 | 26 |
| Training | 150 | 0.9873 | 9 | 0.6600 | 8 | 1.8084 | 8 | 98.7271 | 8 | 33 |
| Training | 200 | 0.9873 | 9 | 0.6600 | 8 | 1.8084 | 8 | 98.7271 | 8 | 33 |
| Testing | 10 | 0.9657 | 8 | 2.0045 | 9 | 8.2087 | 8 | 96.6205 | 8 | 33 |
| Testing | 20 | 0.9659 | 9 | 2.0205 | 5 | 8.1772 | 9 | 96.6365 | 9 | 32 |
| Testing | 30 | 0.9657 | 8 | 2.0062 | 8 | 8.2147 | 7 | 96.6177 | 7 | 30 |
| Testing | 40 | 0.9657 | 8 | 2.0085 | 6 | 8.2207 | 4 | 96.6161 | 6 | 24 |
| Testing | 50 | 0.9657 | 8 | 2.0045 | 9 | 8.2087 | 8 | 96.6205 | 8 | 33 |
| Testing | 75 | 0.9657 | 8 | 2.0085 | 6 | 8.2200 | 5 | 96.1640 | 5 | 24 |
| Testing | 100 | 0.9657 | 8 | 2.0063 | 7 | 8.2156 | 6 | 96.6177 | 7 | 28 |
| Testing | 150 | 0.9657 | 8 | 2.0045 | 9 | 8.2087 | 8 | 96.6205 | 8 | 33 |
| Testing | 200 | 0.9657 | 8 | 2.0045 | 9 | 8.2087 | 8 | 96.6205 | 8 | 33 |

The hyperparameters for the SCSO-GBDT, SCSO-LGBM, and SCSO-CatBoost models were thus determined. Notably, the hyperparameters of each model are tailored to optimize its performance. To demonstrate the effectiveness of the proposed models, we compared them against several established regression models. The hyperparameters of these conventional models were tuned using standard approaches, such as trial and error and analysis of learning curves. Detailed parameter configurations for all models are listed in Table 5.

Table 5 Hyperparameter settings for different models.
| Model | Hyperparameter | Value |
|---|---|---|
| SCSO-GBDT | n_estimators | 453 |
| SCSO-GBDT | learning_rate | 0.0959 |
| SCSO-GBDT | max_depth | 99 |
| SCSO-GBDT | min_samples_split | 59 |
| SCSO-GBDT | population size | 20 |
| SCSO-GBDT | Iterations | 500 |
| SCSO-LGBM | n_estimators | 360 |
| SCSO-LGBM | learning_rate | 0.4604 |
| SCSO-LGBM | max_depth | 61 |
| SCSO-LGBM | min_samples_split | 11 |
| SCSO-LGBM | population size | 50 |
| SCSO-LGBM | Iterations | 500 |
| SCSO-CatBoost | learning_rate | 0.2605 |
| SCSO-CatBoost | max_depth | 4 |
| SCSO-CatBoost | population size | 30 |
| SCSO-CatBoost | Iterations | 500 |
| BPNN | Number of input layer nodes | 6 |
| BPNN | Number of hidden layer nodes | 9 |
| BPNN | Number of output layer nodes | 1 |
| BPNN | learning_rate | 0.01 |
| SVR | c | 60 |
| SVR | g | 0.2783 |
| RF | n_estimators | 100 |
| RF | max_depth | 7 |
| RF | min_samples_split | 5 |
| KNN | n_neighbors | 3 |

4.2 Analysis of model performance

Based on the settings above, all hybrid ensemble models and classical models were parameterized and trained according to their specific configurations. Subsequently, we used the randomly partitioned test set to evaluate the performance of all models, with a primary focus on comparing the three hybrid ensemble models. Their performance is illustrated in Fig. 10, which visualizes the results on both the training and test sets for each of the three models. Each subplot is annotated with the model label in the lower right corner and the evaluation index values for either the training or test set in the upper left corner. Additionally, a color bar indicates the density of the points, with denser areas appearing in purple and sparser areas in red. The figure clearly shows that most points are concentrated along the line y=x, falling between y=1.1x and y=0.9x, indicating minimal differences between the actual and predicted CDs. This demonstrates that the three proposed hybrid ensemble models exhibit robust performance and are suitable for predicting the CD of FAC.

Fig. 10.
Predictive performance of hybrid ensemble models.

To provide a more convincing comparison and highlight the necessity of the models developed in this study, we introduced several classical models for comparison: Random Forest (RF), Linear Regression (LR), k-Nearest Neighbors (KNN), Support Vector Regression (SVR), and Back Propagation Neural Network (BPNN). These classical models have been widely applied across various fields, and their performance is well validated. The computational results for all models are presented in Table 6. The results indicate that the three hybrid ensemble models exhibit significantly superior predictive performance. Among the classical models, RF also shows relatively good performance metrics, demonstrating the advantage of EL over traditional ML models.

Table 6. Performances of prediction models.

| Set | Model | R² | MAE | RMSE | VAF |
|---|---|---|---|---|---|
| Training | SCSO-GBDT | 0.9877 | 0.5543 | 1.7536 | 98.7657 |
| Training | SCSO-LGBM | 0.9856 | 0.7838 | 2.0450 | 98.5605 |
| Training | SCSO-CatBoost | 0.9873 | 0.6593 | 1.8079 | 98.7275 |
| Training | RF | 0.9116 | 2.6052 | 12.5597 | 91.1593 |
| Training | LR | 0.4577 | 6.4206 | 77.0499 | 45.7651 |
| Training | KNN | 0.8465 | 2.9762 | 21.8105 | 84.6538 |
| Training | SVR | 0.7934 | 3.6857 | 31.1248 | 78.5924 |
| Training | BPNN | 0.8026 | 3.5107 | 28.0420 | 80.2619 |
| Testing | SCSO-GBDT | 0.9570 | 2.1102 | 10.2912 | 95.7484 |
| Testing | SCSO-LGBM | 0.9636 | 2.1083 | 8.7294 | 96.3674 |
| Testing | SCSO-CatBoost | 0.9657 | 2.0062 | 8.2147 | 96.6177 |
| Testing | RF | 0.8657 | 3.8197 | 32.1809 | 86.6868 |
| Testing | LR | 0.4645 | 8.1121 | 128.2978 | 47.2912 |
| Testing | KNN | 0.8229 | 4.4205 | 42.4321 | 82.4224 |
| Testing | SVR | 0.7566 | 4.7833 | 58.3084 | 75.8353 |
| Testing | BPNN | 0.7698 | 4.5613 | 55.1392 | 77.2669 |

The Taylor diagram in Fig. 11 presents the performance indexes of the classical models (SVR, LR, BPNN, KNN, and RF) and the proposed hybrid ensemble models (SCSO-GBDT, SCSO-LGBM, and SCSO-CatBoost). The diagram compares these models based on their R², standard deviation, and root mean square deviation (RMSD).
Among the classical models, RF demonstrates a higher correlation coefficient and lower RMSD, indicating superior performance relative to the other classical models. The hybrid ensemble models, shown on the right side of the diagram, exhibit even higher correlation coefficients and lower RMSD values, clustering closely near the reference point. This indicates that the hybrid ensemble models significantly outperform the classical models, highlighting their robust predictive performance. The color gradient from purple to yellow represents increasing RMSD values, with models closer to the reference point exhibiting better performance. This diagram demonstrates the superiority of the hybrid ensemble models developed in this study.

Fig. 11. Taylor diagrams of hybrid ensembles and classical models.

The performance of the different models was comprehensively evaluated using the VIKOR method. From the perspective of GU, the SCSO-CatBoost algorithm exhibited the lowest GU, indicating the best overall performance across the evaluation criteria. In contrast, LR had the highest GU, suggesting relatively poor comprehensive performance. IR reflects the performance loss of each algorithm under a single criterion. The SCSO-CatBoost algorithm had the lowest IR, while RF and KNN had relatively higher values. The compromise solution (Q), which integrates both GU and IR, also showed that the SCSO-CatBoost algorithm performed the best, whereas LR performed the worst.

In the final ranking, the SCSO-CatBoost algorithm achieved the highest position owing to its low GU and low IR, highlighting its significant advantages when multiple evaluation criteria are considered. Conversely, the LR algorithm ranked the lowest due to poor performance across all evaluation criteria.
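As a check on how GU and IR combine into Q, the compromise value of Eq. (22) can be recomputed from the GU and IR values reported in Table 7. The sketch below is illustrative only: the function name `compromise_q` is hypothetical, and the coefficient v = 0.5 is an assumption, since the text does not state the value used. With this assumed value, the recomputed Q values agree with those in Table 7 to about three decimal places.

```python
def compromise_q(gu, ir, v=0.5):
    """Compromise value Q of Eq. (22) from overall GU and IR values.

    v = 0.5 is an assumed decision-making coefficient; the source
    does not state the value that was actually used.
    """
    s_min, s_max = min(gu), max(gu)
    r_min, r_max = min(ir), max(ir)
    return [
        v * (s - s_min) / (s_max - s_min)
        + (1 - v) * (r - r_min) / (r_max - r_min)
        for s, r in zip(gu, ir)
    ]


# GU, IR and Q as reported in Table 7.
models = ["SCSO-CatBoost", "SCSO-GBDT", "SCSO-LGBM", "RF",
          "KNN", "BPNN", "SVR", "LR"]
gu = [0.0025, 0.0087, 0.0102, 0.1951, 0.3081, 0.3930, 0.4227, 1.0000]
ir = [0.0022, 0.0022, 0.0049, 0.0334, 0.0516, 0.0630, 0.0667, 0.1250]
q_reported = [0.0001, 0.0031, 0.0148, 0.2236, 0.3543, 0.4433, 0.4733, 1.0000]

q = compromise_q(gu, ir)

# Rank in ascending order of Q: the smallest Q is the best compromise.
ranking = [m for _, m in sorted(zip(q, models))]
```

The small differences that remain, for example for SCSO-CatBoost, are consistent with the reported GU and IR values having been rounded to four decimals.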
Other algorithms, such as SCSO-GBDT, SCSO-LGBM, RF, KNN, SVR, and BPNN, received intermediate rankings based on their performance across the criteria. The results of the VIKOR comprehensive evaluation are shown in Table 7.

Table 7 Results of VIKOR comprehensive evaluation.

| Model | GU | IR | Q | Rank |
|---|---|---|---|---|
| SCSO-CatBoost | 0.0025 | 0.0022 | 0.0001 | 1 |
| SCSO-GBDT | 0.0087 | 0.0022 | 0.0031 | 2 |
| SCSO-LGBM | 0.0102 | 0.0049 | 0.0148 | 3 |
| RF | 0.1951 | 0.0334 | 0.2236 | 4 |
| KNN | 0.3081 | 0.0516 | 0.3543 | 5 |
| BPNN | 0.3930 | 0.0630 | 0.4433 | 6 |
| SVR | 0.4227 | 0.0667 | 0.4733 | 7 |
| LR | 1.0000 | 0.1250 | 1.0000 | 8 |

4.3 Shapley Additive explanations (SHAP)

SHAP is crucial for understanding the contribution and interaction of each feature in a model, providing insights that enhance interpretability and improve feature selection [89]. The SHAP summary plot is shown in Fig. 12. The figure indicates that t is the most influential feature on the model output, followed by FA, CO₂, B, w/b, and RH. Higher values of t and CO₂ generally have a positive impact on the model output, while lower values have a negative impact. The features FA and w/b show mixed impacts, with both high and low values influencing the model output in various ways. It is worth noting that while all features in the model influence carbonation, t exhibits a clear positive correlation with carbonation depth, making it a key predictive feature in the model. However, other variables, such as B and FA, also play important roles in the carbonation process. Their effects, being more complex and nonlinear, result in a less direct impact on the model's predictions. This complexity does not diminish their significance but rather highlights the intricate interactions between these factors and carbonation depth.
The plot further emphasizes that understanding the specific influence of each variable is crucial for interpreting model outputs and optimizing feature selection.

Fig. 12. The SHAP summary plot of SCSO-CatBoost model.

The parallel coordinates plot of the SCSO-CatBoost model is shown in Fig. 13. The figure illustrates the influence of the different features on the model output. The features, listed by their influence from top to bottom, are t, FA, CO₂, B, w/b, and RH. The x-axis represents the model output value, ranging from -40 to 60, while the color bar indicates feature values, with blue for lower values and red for higher values. Higher t generally leads to higher model outputs, while FA and w/b show varied impacts without a clear trend. CO₂ follows a pattern in which lower values result in lower outputs and higher values in higher outputs. Both B and RH exhibit mixed effects, with lower values pushing outputs lower and higher values pushing them higher. This visualization helps to understand the complex interactions and relative importance of each feature in determining the model's predictions.

Fig. 13. The SHAP parallel coordinates plot of SCSO-CatBoost model.

The SHAP interaction plot in Fig. 14 illustrates how pairs of features interact to influence the model output. Each subplot represents the interaction between two features, with SHAP interaction values ranging from -10 to 10 on the x-axis and color-coding indicating feature values. Notable observations include significant self-interactions for t, CO₂, B, and w/b. t shows strong interactions with CO₂ and RH, while FA interacts notably with CO₂. CO₂ has strong interactions with FA and RH, B interacts substantially with w/b, and RH interacts significantly with CO₂.
These interactions highlight the importance of considering feature pairs in model interpretation and feature selection.

Fig. 14. The SHAP interaction plot of SCSO-CatBoost model.

4.4 Prediction program development and application

FAC is an eco-friendly material that is widely used due to its economic efficiency and environmental benefits. However, the durability issues of concrete, particularly carbonation, significantly impact the service life of structures. To provide a simple and accurate tool for non-algorithm engineers, we have developed a convenient GUI application based on the three models from our research: SCSO-GBDT, SCSO-LGBM, and SCSO-CatBoost. Through this application, users can choose any of these three ensemble models for prediction. These efficient gradient boosting algorithms capture complex nonlinear relationships and provide high-precision prediction results. The tool not only enhances work efficiency but also helps in optimizing concrete mix proportions and improving construction quality and durability, while promoting the recycling of industrial waste such as FA. The GUI of the application is shown in Fig. 15.

Fig. 15. The GUI of the application.

4.5 Comparison with related studies

In addition, this study compared the proposed models with several advanced previous studies. We selected studies with larger data scales and closely related features for comparison. The results are shown in Table 8, which lists relevant literature from the past six years. Some of these studies used data that are included in this study's database, which shows that the performance of the three hybrid ensemble models proposed in this study improved even with the expansion of the data scale.
In the latest study by Kumar et al. in 2024 [90], ensemble learning models were used and yielded significant results, highlighting the trend of applying ensemble learning in this field. However, their study used more features and fewer data samples, underscoring the advantages of this study.

Table 8 Comparison with related studies.

| Year | Reference | Models | Number of inputs | Dataset size | Performance |
|---|---|---|---|---|---|
| 2019 | Kellouche et al. [75] | ANN model | 6 | 300 | R²=0.9468 |
| 2021 | Uwanuakwa et al. [52] | RNN | 18 | 534 | R²=0.9400 |
| 2021 | Liu et al. [57] | ANN-PSO model | 9 | 593 | R²=0.9380 |
| 2021 | Felix et al. [74] | ANN model | 6 | 272 | R²=0.9460 |
| 2022 | Biswas et al. [58] | CSO-SVR model | 6 | 300 | R²=0.9593 |
| 2022 | Biswas et al. [58] | PSO-SVR model | 6 | 300 | R²=0.9575 |
| 2022 | Biswas et al. [58] | SOA-SVR model | 6 | 300 | R²=0.9592 |
| 2022 | Biswas et al. [58] | GWO-SVR model | 6 | 300 | R²=0.9585 |
| 2023 | Kumar et al. [91] | 1D-CNN-LSTM model | 3 | 265 | R²=0.8000 |
| 2023 | Qin et al. [92] | Stepwise Regression model | 4 | 433 | R²=0.9025 |
| 2024 | Kumar et al. [90] | Adaboost | 9 | 766 | R²=0.9700 |
| 2024 | Kumar et al. [90] | XGBoost | 9 | 766 | R²=0.9100 |
| 2024 | Kumar et al. [90] | RF | 9 | 766 | R²=0.9200 |
| 2024 | This study | SCSO-GBDT model | 6 | 883 | R²=0.9570 |
| 2024 | This study | SCSO-LGBM model | 6 | 883 | R²=0.9636 |
| 2024 | This study | SCSO-CatBoost model | 6 | 883 | R²=0.9657 |

4.6 Research significance and limitations

This study has developed several explainable hybrid ensemble models for predicting carbonation depth in fly ash concrete (FAC), offering significant economic benefits while advancing the understanding of carbonation processes. By enhancing prediction accuracy, these models not only facilitate cost-effective maintenance planning and reduce unnecessary repairs but also improve the assessment of long-term structural durability. This supports better decision-making in infrastructure design and maintenance. Additionally, the models enable engineers and researchers to evaluate the effects of varying environmental conditions, material compositions, and exposure durations on carbonation progression, leading to more accurate risk assessments.
Ultimately, this research contributes to the sustainability and resilience of concrete structures, ensuring their safety and functionality over extended lifecycles while delivering substantial economic savings across large-scale construction and maintenance projects.

While the study has yielded promising results, the models developed are currently best suited for predicting the carbonation depth of FAC. Future research should focus on expanding and refining the database by incorporating carbonation cases from various types of concrete, different geographical locations, and diverse environmental conditions. Numerical simulations could offer a viable method to augment the dataset, thereby improving the models' generalization and applicability. Moreover, methods for combining different features and the effect of different data partitioning ratios have yet to be fully explored in this study.

Overall, the three hybrid ensemble models proposed in this study demonstrate strong performance and are well suited for predicting the carbonation depth of FAC. This provides a valuable tool for assessing the durability of engineering structures and contributes to more efficient lifecycle management.

5. Conclusions

Carbonation is one of the key factors in the deterioration of concrete structures, significantly impacting their durability and lifespan. Although FA, an industrial byproduct, is widely used in engineering projects due to its environmental and economic benefits, it also affects the carbonation process of concrete. Therefore, it is crucial to develop a model that can accurately predict the CD of FA-containing concrete.
Such a model would not only provide a scientific basis for the safety assessment of actual engineering projects but also assist engineers and decision-makers in understanding the long-term performance changes in concrete. Furthermore, it would offer valuable guidance for determining the optimal timing for repair and reinforcement, ensuring that concrete structures maintain a high level of safety and stability throughout their lifecycle.

(1) This study began by compiling and organizing data from extensive literature on indoor accelerated carbonation tests and natural carbonation cases, resulting in the creation of a reasonable, stable, and reliable database. This comprehensive database includes 883 cases, making it the most extensive carbonation database for FAC in current research.

(2) Three hybrid ensemble models were developed: SCSO-GBDT, SCSO-LGBM, and SCSO-CatBoost. The performance of these models was thoroughly evaluated using four metrics: R², MAE, RMSE, and VAF. The evaluation results demonstrated the superiority of these hybrid ensemble models in this field. Five classical models (RF, BPNN, KNN, LR, and SVR) were introduced for comparison. The proposed hybrid ensemble models exhibited significant advantages over these classical models. Additionally, the performance of the RF model, itself an ensemble method, highlighted the superiority of ensemble learning over traditional machine learning models.

(3) This study developed an innovative model selection method, combining VIKOR with Taylor diagrams to select the optimal model from multiple candidates. The method objectively addresses the challenge of minimal differences in performance indexes when models exhibit similar performance. As the field of artificial intelligence advances and the number of models increases, the method proposed in this study offers an effective means for model selection.
Using this method, it was determined that SCSO-CatBoost is the most suitable model for predicting the CD of FAC. The next best models, in order, are SCSO-GBDT, SCSO-LGBM, RF, KNN, BPNN, SVR, and LR.

(4) A SHAP analysis was conducted on the selected best model, SCSO-CatBoost, revealing that exposure time (t) is the key factor influencing carbonation, followed by FA content and CO2 concentration. This insight provides valuable recommendations for future feature selection. To facilitate the application of these research findings by non-algorithm engineers, a program with a user-friendly GUI was developed. This allows users to utilize the three developed hybrid ensemble models. By combining the results of these models, it is possible to estimate the CD of FAC, enabling early reinforcement measures and ensuring the sustainable application of buildings.

In the future, we will focus on significantly expanding our high-quality database, with a particular emphasis on natural carbonation cases, and aim to apply these findings to a wider range of concrete types and scales. Additionally, we plan to explore the use of numerical simulation techniques for building the database, as a large-scale, high-quality dataset will be essential for strengthening the application of deep learning methods. We also intend to further develop the GUI, adding more functionalities to enable predictive modeling for different types of concrete. Lastly, we will compare various intelligent optimization algorithms and data partitioning strategies to further enhance the model's accuracy.

References

[1] K. Zhang, K. Zhang, R. Bao, X.
Liu, A framework for predicting the carbonation depth of concrete incorporating fly ash based on a least squares support vector machine and metaheuristic algorithms, Journal of Building Engineering, 65 (2023).
[2] B. Bennett, P. Visintin, T. Xie, Global warming potential of recycled aggregate concrete with supplementary cementitious materials, Journal of Building Engineering, 52 (2022).
[3] F. Aslani, M. Dehestani, Probabilistic impacts of corrosion on structural failure and performance limits of reinforced concrete beams, Construction and Building Materials, 265 (2020) 120316.
[4] S. Hussain, D. Bhunia, S.B. Singh, Comparative study of accelerated carbonation of plain cement and fly-ash concrete, Journal of Building Engineering, 10 (2017) 26-31.
[5] D.E.A. Ramirez, G.R. Meira, M. Quattrone, V.M. John, A review on reinforcement corrosion propagation in carbonated concrete–Influence of material and environmental characteristics, Cem. Concr. Compos., (2023) 105085.
[6] Y. Pu, L. Li, Q. Wang, X. Shi, C. Luan, G. Zhang, L. Fu, A.E.-F. Abomohra, Accelerated carbonation technology for enhanced treatment of recycled concrete aggregates: A state-of-the-art review, Construction and Building Materials, 282 (2021) 122671.
[7] L. Peng, P. Shen, C.-S. Poon, Y. Zhao, F. Wang, Development of carbon capture coating to improve the durability of concrete structures, Cem. Concr. Res., 168 (2023) 107154.
[8] T.L.P. Ortolan, P.M. Borges, L. Silvestro, S.R. da Silva, E. Possan, J.J. de Oliveira Andrade, Durability of concrete incorporating recycled coarse aggregates: carbonation and service life prediction under chloride-induced corrosion, Construction and Building Materials, 404 (2023) 133267.
[9] R. Rumman, M.R. Kamal, T. Manzur, M.A. Noor, Optimum proportion of fly ash or slag for resisting concrete deterioration due to carbonation and chloride ingress, Structures, 41 (2022) 287-305.
[10] Y. Liu, J. Shi, Recent progress and challenges of using smart corrosion inhibitors in reinforced concrete structures, Construction and Building Materials, 411 (2024) 134595.
[11] H.-J. Ho, A. Iizuka, E. Shibata, H. Tomita, K. Takano, T. Endo, Utilization of CO2 in direct aqueous carbonation of concrete fines generated from aggregate recycling: Influences of the solid–liquid ratio and CO2 concentration, Journal of Cleaner Production, 312 (2021) 127832.
[12] C. Chen, G. Habert, Y. Bouzidi, A. Jullien, Environmental impact of cement production: detail of the different processes and cement plant variability evaluation, Journal of Cleaner Production, 18 (2010) 478-485.
[13] J. Khunthongkeaw, S. Tangtermsirikul, T. Leelawat, A study on carbonation depth prediction for fly ash concrete, Construction and Building Materials, 20 (2006) 744-753.
[14] X.-Y. Wang, Design of low-cost and low-CO2 air-entrained fly ash-blended concrete considering carbonation and frost durability, Journal of Cleaner Production, 272 (2020) 122675.
[15] A.F. Ahmed, Experimental investigation of waste rubber admixtures in concrete, Mesopotamian Journal of Civil Engineering, 2023 (2023) 1-9.
[16] L. Jiang, B. Lin, Y. Cai, A model for predicting carbonation of high-volume fly ash concrete, Cem. Concr. Res., 30 (2000) 699-702.
[17] T. Chen, M. Bai, X. Gao, Carbonation curing of cement mortars incorporating carbonated fly ash for performance improvement and CO2 sequestration, Journal of CO2 Utilization, 51 (2021) 101633.
[18] J.-s. Zhang, M. Cheng, J.-h. Zhu, Carbonation Depth model and prediction of hybrid fiber fly ash concrete, Advances in Civil Engineering, 2020 (2020) 9863963.
[19] V.Q. Tran, H.-V.T. Mai, Q.T. To, M.H. Nguyen, Machine learning approach in investigating carbonation depth of concrete containing fly ash, Structural Concrete, 24 (2023) 2145-2169.
[20] N. Bouzoubaâ, A. Bilodeau, B. Tamtsia, S. Foo, Carbonation of fly ash concrete: laboratory and field data, Can. J. Civ. Eng., 37 (2010) 1535-1549.
[21] K. Sisomphon, L. Franke, Carbonation rates of concretes containing high volume of pozzolanic materials, Cem. Concr. Res., 37 (2007) 1647-1653.
[22] P. Sulapha, S. Wong, T. Wee, S. Swaddiwudhipong, Carbonation of concrete containing mineral admixtures, J. Mater. Civ. Eng., 15 (2003) 134-143.
[23] D. Ho, R. Lewis, Carbonation of concrete and its prediction, Cem. Concr. Res., 17 (1987) 489-504.
[24] C.D. Atiş, Accelerated carbonation and testing of concrete made with fly ash, Construction and Building Materials, 17 (2003) 147-152.
[25] J. Khunthongkeaw, S. Tangtermsirikul, Model for simulating carbonation of fly ash concrete, J. Mater. Civ. Eng., 17 (2005) 570-578.
[26] S. Ekolu, Implications of global CO2 emissions on natural carbonation and service lifespan of concrete infrastructures–reliability analysis, Cem. Concr. Compos., 114 (2020) 103744.
[27] L. Chen, R.K.L. Su, Service life modelling of carbonated reinforced concrete with supplementary cementitious materials considering early corrosion propagation, Construction and Building Materials, 413 (2024) 134861.
[28] G. Chen, Y. Lv, Y. Zhang, M. Yang, Carbonation depth predictions in concrete structures under changing climate condition in China, Eng. Failure Anal., 119 (2021) 104990.
[29] W.Z. Taffese, E. Sistonen, J. Puttonen, CaPrM: Carbonation prediction model for reinforced concrete using machine learning methods, Construction and Building Materials, 100 (2015) 70-82.
[30] P. Liu, Z. Yu, Y. Chen, Carbonation depth model and carbonated acceleration rate of concrete under different environment, Cem. Concr. Compos., 114 (2020) 103736.
[31] P. Woyciechowski, P. Woliński, G. Adamczewski, Prediction of Carbonation Progress in Concrete Containing Calcareous Fly Ash Co-Binder, Materials, 12 (2019) 2665.
[32] V. Carević, I. Ignjatović, J. Dragaš, Model for practical carbonation depth prediction for high volume fly ash concrete and recycled aggregate concrete, Construction and Building Materials, 213 (2019) 194-208.
[33] K. Zhang, J. Xiao, Prediction model of carbonation depth for recycled aggregate concrete, Cem. Concr. Compos., 88 (2018) 86-99.
[34] S.C. Paul, B. Panda, Y. Huang, A. Garg, X. Peng, An empirical model design for evaluation and estimation of carbonation depth in concrete, Measurement, 124 (2018) 205-210.
[35] L. Czarnecki, P. Woyciechowski, Concrete carbonation as a limited process and its relevance to concrete cover thickness, ACI Mater. J., 109 (2012) 275.
[36] A. Silva, R. Neves, J. de Brito, Statistical modelling of carbonation in reinforced concrete, Cem. Concr. Compos., 50 (2014) 73-81.
[37] I. Monteiro, F.A. Branco, J.d. Brito, R. Neves, Statistical analysis of the carbonation coefficient in open air concrete structures, Construction and Building