A Method to improve the performance of support vector machine regression model for predicting demolition waste generation using categorical principal components analysis

Giwook Cha; Hyeunjun Moon; Jinho Kim

doi:10.22712/susb.20210023

General Article

International Journal of Sustainable Building Technology and Urban Development. 30 September 2021. 282-294
https://doi.org/10.22712/susb.20210023

Giwook Cha¹

Hyeunjun Moon²

Jinho Kim³*

¹Research Professor, Department of Architectural Engineering, Dankook University, Yongin, South Korea

²Professor, Department of Architectural Engineering, Dankook University, Yongin, South Korea

³Associate Professor, Division of Architecture and Urban Design, Incheon National University, 119 Academy-ro, Yeonsu-gu, Incheon, South Korea

*Corresponding Author

ABSTRACT

Researchers developing machine learning (ML)-based waste generation predictive models apply a variety of artificial intelligence algorithms and study how predictive performance can be improved according to the characteristics of the data. In this study, we developed a hybrid demolition waste generation (DWG) predictive model to improve predictive performance on a small dataset comprising categorical variables. For this, we constructed a DWG dataset from 690 buildings. We adopted a support vector machine regression (SVMR) model to develop a DWG predictive model and applied the categorical principal components analysis (CATPCA) technique for input variable processing, along with data pre-processing, to improve predictive performance. Furthermore, we optimized the hyper-parameters of the SVMR algorithm before developing the models and then developed SVMR and CATPCA-SVMR DWG predictive models. Leave one out cross-validation (LOOCV) was used to validate the predictive models. The CATPCA-SVMR DWG predictive model (R² = 0.594, R = 0.770), developed on a dataset whose categorical variables were converted into continuous input variables through the CATPCA technique, showed significantly better predictive performance than the SVMR DWG predictive model (R² = 0.007, R = 0.083). Based on this, we propose a novel hybrid ML model development scheme that has not been researched in the construction and demolition (C&D) waste management field until now. The results of this study are expected to help mitigate the data requirements in constructing the DWG information required for DW management strategies and to support the development of appropriate DWG predictive models in various data environments.

Keywords

machine learning

demolition waste

predictive model

support vector machine

categorical principal components analysis

MAIN

Introduction
Methods and materials
Data source of DWGR and data pre-processing
Applying categorical principal components analysis
Applying SVMR algorithm and hyper-parameters combinations
Model validation
Results
Model performance
Comparison of predictive models
Discussion and Limitation
Conclusions

Introduction

The construction industry accounts for 40% of the world’s energy consumption every year (Kulatunga et al., 2006) [1]. The amount of construction & demolition waste (C&DW) generated is steadily increasing (Wang et al., 2015) [2], and demolition waste (DW) is reported to account for 70-90% of C&DW generation (Butera et al., 2014) [3]. Therefore, accurate prediction of the amount of DW is important for establishing the information and tools needed for C&DW management (Fu et al., 2015) [4].

Waste generation rate (WGR) is a tool that provides useful information for waste management (Hurley, 2003) [5], and it can be used as basic data for predicting the size of waste generation, economic value and cost, and environmental effects (Lu et al., 2011) [6]. Recently, artificial intelligence (AI) technology, such as machine learning (ML), has been actively used to estimate WGR accurately. Many researchers have applied various ML algorithms, such as artificial neural networks (ANN), support vector machines (SVM), and linear regression (LR), to develop WGR and waste generation predictive models. For example, Milojkovic and Litovski (2008) [7] and Noori et al. (2009) [8] developed municipal solid waste (MSW) generation predictive models by applying the ANN algorithm. Abbasi and El Hanandeh (2016) [9], Kumar et al. (2018) [10], and Abunama et al. (2019) [11] applied the SVM algorithm to develop C&DW generation predictive models. Furthermore, Azadi and Karimi-Jashni (2016) [12] and Chhay et al. (2018) [13] applied LR to develop MSW generation predictive models. Other researchers, including Abbasi et al. (2013) [14], Shamshiry et al. (2014) [15], Song et al. (2016) [16], and Golbaz et al. (2019) [17], studied hybrid models to improve predictive performance. Abbasi et al. (2013) [14] improved the performance of an SVM model through a WT-SVM hybrid model using a wavelet denoising method. Shamshiry et al. (2014) [15] combined ANN and a genetic algorithm (GA) to improve the accuracy of predicted values. Song et al. (2016) [16] developed a gray model (GM) with a low error rate through a gray model-support vector regression (GM-SVR) hybrid model. Golbaz et al. (2019) [17] improved predictive performance through a least square support vector machine (LSSVM) and fuzzy logic support vector machine (FSVM) hybrid model.

Research on AI-based C&DW generation predictive models is conducted with a variety of variable types and data environments, and every step, from the selection of algorithms to data processing and validation, affects the research results. Therefore, C&DW generation predictive models using ML require algorithm selection, data processing, and validation methods that suit the inherent characteristics and environment of the data. In general, AI models are driven by large datasets, and insufficient data is a critical obstacle when applying AI systems (Abdallah et al., 2020) [18] because it is difficult to secure stable predictive performance in AI models developed on insufficient data. However, many researchers who handle field data have difficulty obtaining sufficient data (Abdallah et al., 2020) [18], and insufficient data hinder the application of ML algorithms. Therefore, there is a need to discuss ways to develop AI models with excellent predictive performance from datasets that are not large enough. The hybrid modeling approach adopted by the researchers cited above may be an appropriate solution to this data problem.

In this study, we aimed to improve the predictive performance of DWG predictive models based on a small dataset comprising mainly categorical variables. For this, (1) DW generation information was collected from 784 buildings; (2) pre-processing was conducted to improve the predictive performance of the predictive models; (3) categorical variables were converted into continuous variables through categorical principal components analysis (CATPCA); (4) the hyper-parameters of the support vector machine regression (SVMR) model were adjusted to improve predictive performance, and SVMR and CATPCA-SVMR models were developed; and (5) the leave one out cross-validation (LOOCV) technique was applied to validate the developed DWG predictive models, whose performance was determined through statistical metrics. Through this process, we applied a data processing method and a validation method that can ensure excellent predictive performance for a small dataset consisting mainly of categorical variables and proposed a hybrid DWG predictive model. Furthermore, we discussed how the research results can be used and the direction of follow-up studies.

Methods and materials

Figure 1 shows the flowchart of the processes performed in this study to develop predictive models through the application of CATPCA and examine the performance improvement. For predictive performance improvement of the predictive models, we built Dataset 1 through data pre-processing, such as outlier removal and standardization in raw data, and built Dataset 2 through data pre-processing and CATPCA. Dataset 1 and Dataset 2 were applied to the SVMR and CATPCA-SVMR models, respectively. In this study, the hyper-parameters were adjusted to derive the optimal predictive performance of the predictive models, and based on this, the SVMR and CATPCA-SVMR models were developed. For the performance evaluation of the two developed models, we tested the precision and accuracy through Pearson’s correlation coefficient (R), root mean square error (RMSE), coefficient of determination (R²), and mean absolute error (MAE).

https://cdn.apub.kr/journalsite/sites/durabi/2021-012-03/N0300120305/images/Figure_susb_12_03_05_F1.jpg

Figure 1.

Flowchart of proposed methodology for evaluating and developing the DWG predictive models in this study.

Data source of DWGR and data pre-processing

The raw data in this study include the demolition waste generation rate (DWGR) (kg/m²) information of 784 buildings along with architectural characteristic information, such as region, use of building, structure, wall material, roof material, and gross floor area (GFA). The use, structure, wall material, roof material, and GFA are key features used in the ML predictive models, and DWGR is a dependent variable. Among the features, the use, structure, wall material, and roof material are categorical variables, and the GFA and DWGR are continuous variables. The definition of DWGR in this study is shown in Eq. (1).

(1)

\mathrm{DWGR}_{i} = \frac{\sum A_{i}}{\mathrm{GFA}_{i}}

Here, DWGR is the demolition waste generation rate (kg/m²) of building i, A is the amount (quantity) of demolition waste generated by the building (kg), and GFA is the gross floor area of the building (m²).
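As a small worked illustration of Eq. (1), the sketch below divides the summed waste amounts of one building by its GFA. The waste amounts and floor area are hypothetical, and the reading of the summation as a sum over waste material types is an assumption, since the text does not state what is summed.

```python
def dwgr(waste_amounts_kg, gfa_m2):
    """Demolition waste generation rate (kg/m^2) as in Eq. (1).

    waste_amounts_kg: amounts of demolition waste from one building (kg);
    treating them as per-material amounts is an assumption of this sketch.
    gfa_m2: gross floor area of the same building (m^2).
    """
    if gfa_m2 <= 0:
        raise ValueError("GFA must be positive")
    return sum(waste_amounts_kg) / gfa_m2


# Hypothetical building: three waste amounts (kg) and a GFA of 120 m^2.
waste_kg = [52000.0, 6500.0, 1500.0]
rate = dwgr(waste_kg, 120.0)  # 60000 kg / 120 m^2 = 500 kg/m^2
```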

The construction of a reliable dataset is required to improve the predictive performance of a predictive model. Data pre-processing is performed to build a reliable dataset, and it includes techniques for cutting, adding, and converting training data (Kuhn and Johnson, 2013; Nisbet et al., 2009) [19, 20]. In this study, outlier removal and standardization were performed to build a reliable dataset. The outlier removal follows Eq. (2), and the standardization follows Eq. (3).

(2)

Q_{1} - 1.5 \times \mathrm{IQR} < \text{selected data} < Q_{3} + 1.5 \times \mathrm{IQR},

where IQR is the interquartile range (Q3 minus Q1), Q1 is the first quartile (25th percentile), and Q3 is the third quartile (75th percentile).

(3)

x_{\mathrm{standardized}} = \frac{x - \bar{x}}{σ}

Here, $x$ is the element of data, $\bar{x}$ is the average of data, and $σ$ is the standard deviation of data. After data pre-processing, the DWGR data of 690 out of 784 buildings were applied to develop the DWG predictive model.
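The two pre-processing steps of Eqs. (2) and (3) can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the DWGR values are hypothetical, the quartiles use the "inclusive" interpolation of Python's `statistics.quantiles`, and the population standard deviation is assumed, since the text specifies neither choice.

```python
import statistics


def iqr_filter(values):
    """Keep values strictly inside (Q1 - 1.5*IQR, Q3 + 1.5*IQR), as in Eq. (2)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low < v < high]


def standardize(values):
    """Z-score standardization, as in Eq. (3) (population std assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical DWGR values (kg/m^2); the last one is an obvious outlier.
dwgr_values = [900.0, 1000.0, 1100.0, 1200.0, 1300.0, 9000.0]
kept = iqr_filter(dwgr_values)   # the 9000.0 entry is dropped
z = standardize(kept)            # zero mean, unit standard deviation
```

Filtering before standardizing matters here: an extreme value left in the data would inflate both the mean and the standard deviation used in Eq. (3).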

Applying categorical principal components analysis

Principal components analysis (PCA) is a statistical method used to simplify problems involving a large number of variables (Camdevyren et al., 2005) [21]. In general, PCA aims to reduce a large number of variables to a smaller number of variables, called principal components, that explain the variance of the data. PCA works well for continuous variables but is not suitable for categorical variables (Khikmah et al., 2017) [22]. Conversely, CATPCA was developed for data such as nominal and ordinal variables that have no linear relationship with each other (Linting and Van der Kooij, 2012) [23]. In this study, we used the CATPCA technique of the Python package “Prince” to convert the categorical data into continuous data. As shown in Figure 2, the six features in this study are converted into six continuous variables through CATPCA. Here, the converted continuous variables X₁-X₆ do not correspond one-to-one to the original variables x₁-x₆ (location, structure, use, wall material (WM), roof material (RM), and GFA, respectively). Figure 3 shows the values of the continuous variables converted through CATPCA.

https://cdn.apub.kr/journalsite/sites/durabi/2021-012-03/N0300120305/images/Figure_susb_12_03_05_F2.jpg

Figure 2.

Input variable type conversion process of data set by CATPCA.

https://cdn.apub.kr/journalsite/sites/durabi/2021-012-03/N0300120305/images/Figure_susb_12_03_05_F3.jpg

Figure 3.

Results of values converted from categorical variables into continuous variables through CATPCA.
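To make the idea of turning categorical features into continuous scores concrete, the sketch below uses a deliberately simpler stand-in: indicator (one-hot) coding followed by ordinary PCA, with the first component found by power iteration. This is not CATPCA, which quantifies categories by optimal scaling, and it does not use the Prince package; the building records and all function names are hypothetical.

```python
import math
import random


def one_hot(rows):
    """Expand each categorical column into 0/1 indicator columns."""
    n_cols = len(rows[0])
    levels = [sorted({row[j] for row in rows}) for j in range(n_cols)]
    return [[1.0 if row[j] == lev else 0.0
             for j in range(n_cols) for lev in levels[j]]
            for row in rows]


def first_component_scores(matrix, n_iter=500, seed=0):
    """Scores on the first principal component of a column-centered matrix."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[k] for row in matrix) / n for k in range(p)]
    centered = [[row[k] - means[k] for k in range(p)] for row in matrix]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(p)]  # random start direction
    for _ in range(n_iter):
        # One power-iteration step: v <- X^T (X v), then normalize.
        proj = [sum(r[k] * v[k] for k in range(p)) for r in centered]
        w = [sum(centered[i][k] * proj[i] for i in range(n)) for k in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[k] * v[k] for k in range(p)) for r in centered]


# Hypothetical buildings described by (structure, wall material, roof material).
rows = [
    ("RC", "concrete", "slab"),
    ("RC", "concrete", "slab"),
    ("wood", "soil", "tile"),
    ("wood", "block", "slate"),
    ("masonry", "brick", "slab"),
    ("masonry", "block", "slate"),
]
scores = first_component_scores(one_hot(rows))  # one continuous value per building
```

Buildings with identical categories receive identical scores, and the scores are centered on zero. The sign of a principal component is arbitrary, so only relative positions are meaningful.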

Applying SVMR algorithm and hyper-parameters combinations

SVM can obtain good results, especially when solving classification or regression problems with small samples, high dimensions, and local minimum points (You et al., 2017) [24]. SVMR generalizes the SVM to predict arbitrary real values. To solve a regression problem, the input variables are mapped to a feature space through the kernel function, and a linear function is constructed in that space. There, the structural risk minimization principle is applied to obtain the optimal decision function. Standard SVM regression uses a linear $ϵ$-insensitive loss function, and its optimization objective is formulated as Eq. (4) (Noori et al., 2008) [25].

(4)

L_{SVR} = \min \frac{1}{2} \| w \|^{2} + C \sum_{i = 1}^{n} (ξ_{i} + ξ_{i}^{*})

Here, $w$ is a vector of weights in feature space, $ξ_{i}$ and $ξ_{i}^{*}$ are positive slack variables specifying the upper and lower training error subject to an error term ( $ϵ$ ), C is the penalty parameter, and n is the sample size.
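For reference, the objective in Eq. (4) is minimized under the constraints of the standard $ϵ$-insensitive formulation, which the text does not write out. The block below states them in their usual textbook form; $t_{i}$ (the observed target of sample $i$), $ϕ$ (the feature map), and $b$ (the bias) are symbols introduced here for illustration.

```latex
\begin{aligned}
\text{subject to}\quad
  & t_{i} - \left( w^{\top} \phi(x_{i}) + b \right) \le \epsilon + \xi_{i}, \\
  & \left( w^{\top} \phi(x_{i}) + b \right) - t_{i} \le \epsilon + \xi_{i}^{*}, \\
  & \xi_{i},\ \xi_{i}^{*} \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

Errors smaller than $ϵ$ incur no penalty. Only the part of an error that exceeds $ϵ$ enters the objective through the slack variables, weighted by C.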

Applying Lagrange multipliers and the Karush-Kuhn-Tucker conditions to Eq. (4) to solve for the unknowns yields the general regression equation of SVMR, Eq. (5).

(5)

y (x) = \sum_{i = 1}^{n} (α_{i} - α_{i}^{'}) K (x_{i}, x) + b

Here, $α_{i}$ and $α_{i}^{'}$ are the Lagrange multipliers, $x_{i}$ is a support vector, $x$ is the input to be predicted, and $b$ is the bias term. $K$ is the kernel function, which for any two inputs $x_{i}$ and $x_{j}$ is expressed as Eq. (6).

(6)

K (x_{i}, x_{j}) = ϕ (x_{i}) \cdot ϕ (x_{j})

In SVMR, the choice of the cost (C), gamma, and kernel parameters is the most important (Abbasi et al., 2013) [14]. The constant C (>0) is a weight that determines the balance between model complexity and training error, and the kernel function significantly influences the generalization ability (You et al., 2017) [24]. In this study, the Gaussian radial basis kernel function (also referred to as the radial basis function (Rbf)), which has high flexibility and generality, is applied as the kernel function. Furthermore, the best values of the cost (C) and gamma parameters are determined through the LOOCV process; in this study, 0.001 and 1 are applied as the cost (C) and gamma values, respectively. For other parameters, the default values of Python “sklearn.svm” are applied. Table 1 shows the hyper-parameter values used for the development of the SVMR models in this study.

Table 1.

Hyper-parameters combinations of SVMR algorithm in this study

Parameters	Applied value or reference	Definition
C_penalty	0.001	Penalty parameter of the error term
Kernel	Rbf	Kernel type in the algorithm
Tol	0.001	Tolerance for stopping criterion
gamma	1 / (6 * std of RBF kernel)	How far the influence of a single training example reaches
epsilon	0.1	The value of epsilon defines a margin of tolerance where no penalty is given to error
shrinking	True	The shrinking technique tries to identify and remove some bounded elements, so a smaller optimization problem is solved
cache_size	200	Specify the size of the kernel cache
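
To show how Eq. (5) is evaluated with an Rbf kernel, the sketch below computes a prediction from a set of support vectors. It is a plain-Python illustration, not the sklearn.svm implementation used in the study: the kernel form exp(-gamma * ||a - b||²) is the usual Rbf definition, gamma = 1 follows the value stated in the text, and the support vectors, coefficients (standing for $α_{i} - α_{i}^{'}$), and bias are hypothetical.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian radial basis kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """Evaluate Eq. (5): weighted sum of kernel values plus the bias term."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical fitted quantities (not taken from the study).
support_vectors = [[0.0, 0.0], [1.0, 0.0]]
dual_coefs = [0.5, -0.25]   # stands for (alpha_i - alpha_i') per support vector
bias = 0.1
y_hat = svr_predict([0.0, 0.0], support_vectors, dual_coefs, bias)
```

A larger gamma makes the kernel decay faster with distance, so each support vector influences a smaller neighbourhood of the input space.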

Model validation

In general, k-fold cross-validation (CV) is the most widely used validation method for ML models; however, if the dataset is small, it is appropriate to use LOOCV (Witten et al., 2011) [26]. With a small dataset, holding out a separate validation set leaves too few samples for either training or validation, whereas LOOCV trains on all samples but one and tests on the remaining sample, repeating this until every sample has been tested once. Therefore, when a small dataset is used, LOOCV has the advantage that more stable results can be obtained compared to the validation set approach or a conventional CV method (such as 10-fold CV) (Cha et al., 2020; Shao, 2016) [27, 28]. In this study, therefore, the LOOCV technique was applied as the validation method for the models, considering the size of the datasets. For the performance evaluation of the predictive models, we used four statistical metrics (i.e., MAE, RMSE, R², R). The definition of each performance evaluation indicator is shown in Eqs. (7), (8), (9), and (10), respectively.

(7)

MAE = \frac{\sum_{i = 1}^{n} | y_{i} - x_{i} |}{n}

(8)

RMSE = \sqrt{\sum_{i = 1}^{n} \frac{(y_{i} - x_{i})^{2}}{n}}

(9)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - x_{i})^{2}}{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}}

(10)

R = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}} \sqrt{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}}

Here, $x_{i}$ is the observed value of the generated DW amount, $y_{i}$ is the predicted value of the generated DW amount, $\bar{x}$ is the mean of the observed values, $\bar{y}$ is the mean of the predicted values, and n is the number of samples.
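The validation loop and the four metrics can be sketched as below. This is an illustration under stated assumptions, not the study's code: the regressor is a one-nearest-neighbour stand-in rather than SVMR, the data are hypothetical, and R² is computed with the total sum of squares of the observed values about their mean.

```python
import math


def mae(obs, pred):
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((p - o) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pearson_r(obs, pred):
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in regressor (not SVMR): target of the closest training point."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def loocv(xs, ys, predict):
    """Leave one sample out, fit on the rest, predict the held-out sample."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(predict(train_x, train_y, xs[i]))
    return preds


# Hypothetical one-feature dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [10.0, 20.0, 30.0, 40.0, 50.0]
preds = loocv(xs, ys, nearest_neighbour_predict)
```

Because every prediction is made for a sample the model has not seen, the metrics computed from `preds` estimate out-of-sample performance while still using all n samples for testing.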

Results

Model performance

In this study, we developed SVMR and CATPCA-SVMR predictive models for small datasets consisting mainly of categorical variables. Table 2 and Figure 4 show the results of the performance metrics for the two models. The predictive performance of the SVMR predictive model is low, showing an MAE value of 2020.304, RMSE value of 1116.264, R² value of 0.007, and R value of 0.083. Furthermore, there seems to be almost no correlation between the predicted and observed values, as shown in Figure 4. The reason is that categorical input variables are not suitable for the SVM algorithm because SVM is based on Mahalanobis distance (You et al., 2017) [24]. Conversely, the results of the CATPCA-SVMR model show significant improvements. The MAE, RMSE, R², and R values of the CATPCA-SVMR predictive model are 202.228, 261.445, 0.594, and 0.770, respectively. This model shows excellent predictive performance despite being developed from a dataset of only 690 DWGRs. As shown in Figure 4, the predicted and observed values of the CATPCA-SVMR predictive model are concentrated around the ideal prediction line. Therefore, the application of CATPCA to datasets comprising categorical variables is a useful method to improve the predictive performance of the SVMR model.

Table 2.

Comparison of models’ performance by MAE, RMSE, R², R

Model	MAE	RMSE	R²	R
SVMR	2020.304	1116.264	0.007	0.083
CATPCA-SVMR	202.228	261.445	0.594	0.770

https://cdn.apub.kr/journalsite/sites/durabi/2021-012-03/N0300120305/images/Figure_susb_12_03_05_F4.jpg

Figure 4.

Scatter plot of the observed and predicted DWGR using SVMR and CATPCA-SVMR.

Comparison of predictive models

In the previous section, the hybrid predictive model that applied the SVMR algorithm and CATPCA technique showed a significant increase in predictive performance. Figure 5 shows the observed and predicted values obtained by the SVMR predictive model and the CATPCA-SVMR predictive model, respectively. The values predicted by the SVMR predictive model differ significantly from the observed values. Conversely, the values predicted by the CATPCA-SVMR predictive model are quite close to the observed values. The results in Figure 5 imply that applying the CATPCA technique and developing the hybrid DWG predictive model for a small dataset comprising categorical variables, as performed in this study, are appropriate ways of improving the predictive performance of the predictive model.

https://cdn.apub.kr/journalsite/sites/durabi/2021-012-03/N0300120305/images/Figure_susb_12_03_05_F5.jpg

Figure 5.

Comparison of the observed and predicted DWGR of SVMR and CATPCA-SVMR predictive models.

Discussion and Limitation

In this study, we presented a hybrid ML model development method that can improve predictive performance in a special data environment (i.e., categorical variables and a small dataset). Previous studies (Noori et al. (2009) [8]; Noori et al. (2008) [25]) used the PCA technique mainly to simplify the input variables applied to the model. Noori et al. (2009) [8] used PCA for 13 input variables to develop a PCA-multi linear regression (PCA-MLR) model. In that study, the PCA-MLR model with three principal components (3PCs) (R=0.445) exhibited the best result. Noori et al. (2008) [25] developed a PCA-SVM model for MSW generation prediction, and the 6PCs-SVM model (R²=0.7516), which had six variables among 13 input variables, showed the best result. These studies were conducted using continuous input variables. In addition, the existing studies (Abbasi and El Hanandeh (2016) [9]; Kumar et al. (2018) [10]; Abunama et al. (2019) [11]; Noori et al. (2008) [25]; Dai et al. (2011) [29]) on prediction models using the SVM algorithm were able to secure predictive performance through datasets consisting of only continuous variables. This is because categorical variables are an inappropriate input type for the SVM algorithm. Conversely, this study focused on converting categorical variables into continuous variables by applying CATPCA. A hybrid DWG predictive model was developed by applying the CATPCA technique to improve the performance of the SVMR algorithm, which is not suitable for categorical input variables. As a result, we proposed a CATPCA-SVMR DWG predictive model that has excellent predictive performance (R² value 0.594, R value 0.770) despite having been developed from a small dataset. Therefore, this study has presented a development method for a novel hybrid DWG predictive model that can improve predictive performance in an insufficient data environment.
Considering that AI models are driven by large datasets (Abdallah et al., 2020) [18], the result of this study is quite valuable. Nevertheless, the limited size of the dataset is the fundamental limitation of this study, and securing a sufficiently large dataset remains an important challenge. In addition, the six variables used in this study cannot reflect all the characteristics of the 690 buildings, so variables other than the key features used here should be added. Therefore, additional studies are needed to improve predictive performance by securing larger datasets in the future.

Conclusions

This study was conducted on a hybrid DWG predictive model for performance improvement with a small dataset comprising categorical variables. For this, we adopted an SVMR model and applied the CATPCA technique to develop SVMR and CATPCA-SVMR DWG predictive models, respectively, and evaluated their performance. The conclusions of this study are as follows.

First, categorical variables can be converted into continuous input variables by applying the CATPCA technique, which widens the range of ML algorithms that can be applied. In this study, only the SVMR algorithm was applied, but other ML algorithms suitable for continuous variables can be applied similarly, and predictive performance improvement is expected. In short, this study proposed a method that uses CATPCA to make various ML algorithms applicable to datasets comprising categorical variables and thereby improve predictive performance.

Second, we developed a CATPCA-SVMR predictive model for DWG prediction. Herein, a hybrid model for DWG prediction using SVM and CATPCA was proposed. The CATPCA-SVMR model (R²=0.594, R=0.770) showed significantly improved results compared to the SVMR model (R²=0.007, R=0.083), and despite using a dataset of only 690 buildings, it showed stable prediction performance.

This study was conducted on CATPCA-applied DWG predictive model, which has not been researched in the C&D waste management field yet. It seems that the novel hybrid ML model development method can be sufficiently used in other regions as well. Furthermore, the results of this study are expected to help mitigate the data requirements in constructing DWG information required for DW management strategies and develop appropriate DWG predictive models in various data environments.

Abbreviations

AI, Artificial Intelligence;

CATPCA, categorical principal components analysis;

C&D waste, construction & demolition waste;

DW, demolition waste;

DWGR, demolition waste generation rate;

GFA, gross floor area;

LOOCV, leave one out cross validation;

MAE, mean absolute error;

ML, machine learning;

PCA, principal components analysis;

R, Pearson’s correlation coefficient;

R², coefficient of determination;

RMSE, root mean square error;

SVMR, support vector machine regression

Acknowledgements

This research was performed with financial support from the Incheon National University Research Grant (2020-0026).

The authors declare no conflict of interest.

References

[1] U. Kulatunga, R. Amaratunga, R. Haigh, and R. Rameezdeen, Attitudes and perceptions of construction workforce on construction waste in Sri Lanka. Management of Environmental Quality. 17(1) (2006), pp. 57-72. doi:10.1108/14777830610639440

[2] J. Wang, Z. Li, and W.Y. Vivian Tam, Identifying best design strategies for construction waste minimization. Journal of Cleaner Production. 92 (2015), pp. 237-247. doi:10.1016/j.jclepro.2014.12.076

[3] S. Butera, T.H. Christensen, and T.F. Astrup, Composition and leaching of construction and demolition waste: inorganic elements and organic compounds. Journal of Hazardous Materials. 276 (2014), pp. 302-311. doi:10.1016/j.jhazmat.2014.05.033. PMID: 24910908

[4] H.Z. Fu, Z.S. Li, and R.H. Wang, Estimating municipal solid waste generation by different activities and various resident groups in five provinces of China. Waste Management. 41 (2015), pp. 3-11. doi:10.1016/j.wasman.2015.03.029. PMID: 25861710

[5] J.W. Hurley, Valuing the pre-demolition audit process. In Proceedings of the 11th Rinker International Conference on Deconstruction and Materials Reuse, Gainesville, FL, USA, 7-10 May 2003; Chini, A.R., Ed.; International Council for Research and Innovation in Building and Construction (CIB): Rotterdam, The Netherlands, pp. 151-164.

[6] W. Lu, H. Yuan, J. Li, J.J. Hao, X. Mi, and Z. Ding, An empirical investigation of construction and demolition waste generation rates in Shenzhen city, South China. Waste Management. 31(4) (2011), pp. 680-687. doi:10.1016/j.wasman.2010.12.004. PMID: 21208794

[7] J. Milojkovic and V. Litovski, Comparison of some ANN based forecasting methods implemented on short time series. 9th Symposium on Neural Network Applications in Electrical Engineering (2008), pp. 175-178. doi:10.1109/NEUREL.2008.4685606

[8] R. Noori, M.A. Abdoli, M.J. Ghazizade, and R. Samieifard, Comparison of neural network and principal component-regression analysis to predict the solid waste generation in Tehran. Iranian Journal of Public Health. 38 (2009), pp. 74-84.

[9] M. Abbasi and A. El Hanandeh, Forecasting municipal solid waste generation using artificial intelligence modelling approaches. Waste Management. 56 (2016), pp. 13-22. doi:10.1016/j.wasman.2016.05.018. PMID: 27297046

[10] A. Kumar, S.R. Samadder, N. Kumar, and C. Singh, Estimation of the generation rate of different types of plastic wastes and possible revenue recovery from informal recycling. Waste Management. 79 (2018), pp. 781-790. doi:10.1016/j.wasman.2018.08.045. PMID: 30343811

[11] T. Abunama, F. Othman, and M. Ansari, Leachate generation rate modeling using artificial intelligence algorithms aided by input optimization method for an MSW landfill. Environmental Science and Pollution Research. 26 (2019), pp. 3368-3381. doi:10.1007/s11356-018-3749-5. PMID: 30511225

[12] S. Azadi and A. Karimi-Jashni, Verifying the performance of artificial neural network and multiple linear regression in predicting the mean seasonal municipal solid waste generation rate: A case study of Fars province, Iran. Waste Management. 48 (2016), pp. 14-23. doi:10.1016/j.wasman.2015.09.034. PMID: 26482809

[13] L. Chhay, M.A.H. Reyad, R. Suy, M.R. Islam, and M.M. Mian, Municipal solid waste generation in China: Influencing factor analysis and multi-model forecasting. Journal of Material Cycles and Waste Management. 20 (2018), pp. 1761-1770. doi:10.1007/s10163-018-0743-4

[14] M. Abbasi, M.A. Abduli, B. Omidvar, and A. Baghvand, Forecasting municipal solid waste generation by hybrid support vector machine and partial least square model. International Journal of Environmental Research. 7 (2013), pp. 27-38.

[15] E. Shamshiry, M. Mokhtar, A. Abdulai, I. Komoo, and N. Yahaya, Combining artificial neural network-genetic algorithm and response surface method to predict waste generation and optimize cost of solid waste collection and transportation process in Langkawi island, Malaysia. Malaysian Journal of Science. 33 (2014), pp. 118-141. doi:10.22452/mjs.vol33no2.1

[16] Y. Song, Y. Wang, F. Liu, and Y. Zhang, Development of a hybrid model to predict construction and demolition waste: China as a case study. Waste Management. 59 (2016), pp. 350-361. doi:10.1016/j.wasman.2016.10.009. PMID: 27777033

[17] S. Golbaz, R. Nabizadeh, and H.S. Sajadi, Comparative study of predicting hospital solid waste generation using multiple linear regression and artificial intelligence. Journal of Environmental Health Science and Engineering. 17 (2019), pp. 41-51. doi:10.1007/s40201-018-00324-z. PMID: 31297201. PMCID: PMC6582046

[18] M. Abdallah, M.A. Talib, S. Feroz, Q. Nasir, H. Abdalla, and B. Mahfood, Artificial intelligence applications in solid waste management: A systematic research review. Waste Management. 109 (2020), pp. 231-246. doi:10.1016/j.wasman.2020.04.057. PMID: 32428727

[19] M. Kuhn and K. Johnson, Applied Predictive Modeling. Springer (2013), New York, NY, USA. doi:10.1007/978-1-4614-6849-3

[20] R. Nisbet, J. Elder, and G. Miner, Handbook of Statistical Analysis and Data Mining Applications. Academic Press (2009), Massachusetts, USA.

[21] H. Camdevyren, N. Demyr, A. Kanik, and S. Keskyn, Use of principal component scores in multiple linear regression models for prediction of chlorophyll-a in reservoirs. Ecological Modelling. 181 (2005), pp. 581-589. doi:10.1016/j.ecolmodel.2004.06.043

[22] L. Khikmah, H. Wijayanto, and U.D. Syafitri, Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression. Journal of Physics: Conference Series. 824 (2017), 012027. doi:10.1088/1742-6596/824/1/012027

[23] M. Linting and A. Van der Kooij, Nonlinear Principal Components Analysis With CATPCA: A Tutorial. Journal of Personality Assessment. 94(1) (2012), pp. 12-25. doi:10.1080/00223891.2011.627965. PMID: 22176263

[24] H. You, Z. Ma, Y. Tang, Y. Wang, J. Yan, M. Ni, K. Cen, and Q. Huang, Comparison of ANN (MLP), ANFIS, SVM, and RF models for the online classification of heating value of burning municipal solid waste in circulating fluidized bed incinerators. Waste Management. 68 (2017), pp. 186-197. doi:10.1016/j.wasman.2017.03.044. PMID: 28408281

[25] R. Noori, M.A. Abdoli, A.A. Ghasrodashti, and M.J. Ghazizade, Prediction of municipal solid waste generation with combination of support vector machine and principal component analysis: A case study of Mashhad. Environmental Progress & Sustainable Energy. 28(2) (2008), pp. 249-258. doi:10.1002/ep.10317

[26] I.H. Witten, E. Frank, and M.A. Hall, Data Mining: Practical Machine Learning Tools and Techniques. 3rd Edition, Morgan Kaufmann (2011), Massachusetts, USA. doi:10.1016/B978-0-12-374856-0.00001-8. PMID: 21877416. PMCID: PMC4751866

[27] G.-W. Cha, H.J. Moon, W.-H. Hong, J.-H. Hwang, W.-J. Park, and Y.-C. Kim, Development of a Prediction Model for Demolition Waste Generation Using a Random Forest Algorithm Based on Small DataSets. International Journal of Environmental Research and Public Health. 17(19) (2020), 6997. doi:10.3390/ijerph17196997. PMID: 32987874. PMCID: PMC7579598

[28] Z. Shao and M.J. Er, Efficient Leave-One-Out Cross-Validation-based Regularized Extreme Learning Machine. Neurocomputing. 194 (2016), pp. 260-270. doi:10.1016/j.neucom.2016.02.058

[29] C. Dai, Y.P. Li, and G.H. Huang, A two-stage support-vector-regression optimization model for municipal solid waste management: A case study of Beijing, China. Journal of Environmental Management. 92 (2011), pp. 3023-3037. doi:10.1016/j.jenvman.2011.06.038. PMID: 21872384

International Journal of Sustainable Building Technology and Urban Development ISSN:2093-761X(Print) 2093-7628(Online)
