Department of Industrial Engineering
COMPARISON AND ASSESSMENT OF SHRINKAGE METHODS IN CASE OF MULTICOLLINEARITY PROBLEM (2022-06-14)
KILIÇOĞLU, Şevval; YERLİKAYA ÖZKURT, Fatma

Data analysis and interpretation are increasingly important in many fields of applied science, such as engineering, medicine, and the natural and social sciences. To this end, statistical methods are used to collect, analyze, and interpret data. Among statistical analysis methods, multiple linear regression is one of the most widely used owing to its simplicity and interpretability. It describes the relationship between several independent variables and a dependent variable. However, the independent variables in a data set to which a multiple linear regression model will be applied may exhibit multicollinearity (a near-linear relationship among them). Multicollinearity inflates the variance of the estimated coefficients, even though the estimates remain unbiased; in such cases, model predictions may be inaccurate and the reliability of the model may decrease. If multicollinearity exists between the variables in a data set, it is of great importance to detect it in advance. Many diagnostic methods exist for this purpose, and several methods have been developed to address the problem. The most popular and powerful among them are shrinkage methods, which mitigate multicollinearity by reducing the variance of the estimated parameters in the model. Ridge Regression, Lasso, and Elastic Net are the most preferred shrinkage methods; they shrink the coefficients of the variables toward zero, and Lasso and Elastic Net can set some coefficients exactly to zero. In this thesis, Ridge Regression, Lasso, and Elastic Net were applied to nine simulated data sets with different characteristics. A copula function was used to induce multicollinearity between the independent variables of the simulated data sets.
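The three shrinkage methods above can be sketched in a few lines of Python with scikit-learn. This is not the thesis code: the data below are generated with a correlated multivariate normal (a simple stand-in for the copula-based simulation the thesis describes), and the penalty strengths `alpha` and `l1_ratio` are illustrative choices, not the values used in the study.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
n, p = 100, 5

# Strongly correlated predictors: a Gaussian stand-in for the thesis's
# copula-based simulation of multicollinearity.
cov = 0.9 * np.ones((p, p)) + 0.1 * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)

beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0])   # true coefficients (two are zero)
y = X @ beta + rng.normal(scale=1.0, size=n)

# Ridge shrinks all coefficients toward zero; Lasso and Elastic Net
# can set some of them exactly to zero (variable selection).
models = [Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)]
for model in models:
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
```

With highly correlated columns, the unpenalized least-squares coefficients would vary wildly across resamples; the penalized fits trade a small bias for a much lower variance, which is the core idea the thesis evaluates.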
Following that, all of the aforementioned shrinkage methods were also applied to three real-world data sets, matched with the simulated data sets by size (classified as small, medium, or large). For the simulated data sets, 10-fold cross-validation (CV) was applied to validate the shrinkage methods; for the real-world data sets, the hold-out method, which relies on a single training/test split, was preferred. After all models were built, well-known performance measures were calculated for each method to determine which method gives better results on data sets with which characteristics: mean squared error (MSE), mean squared error based on the number of independent variables (PMSE), R-squared, mean absolute error (MAE), and explained variance. Based on these performance results, the methods were compared with TOPSIS, one of the multi-criteria decision-making methods, and a preference ordering was determined for each data set. When all the performance and TOPSIS results are examined, ridge regression generally gives the best results on small data sets; as the data set grows and complexity increases, shrinkage methods tend to perform variable selection to reduce the variance of the estimated coefficients, so Lasso or Elastic Net models give better results. In an overall ranking, the models can be listed as Lasso, Elastic Net, and ridge regression.
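The TOPSIS comparison step can be sketched as follows. This is a minimal generic TOPSIS implementation, not the thesis's own; the score matrix below is hypothetical (rows are Ridge, Lasso, Elastic Net; columns are MSE and MAE as cost criteria and R-squared as a benefit criterion), and equal weights are an assumption made only for illustration.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) by closeness to the ideal solution.

    matrix  : alternatives x criteria score matrix
    weights : criterion weights (sum need not be 1; relative size matters)
    benefit : boolean per criterion, True where larger is better
    """
    M = matrix / np.linalg.norm(matrix, axis=0)       # vector normalisation
    V = M * weights                                   # weighted scores
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)         # distance to ideal
    d_neg = np.linalg.norm(V - anti, axis=1)          # distance to anti-ideal
    return d_neg / (d_pos + d_neg)                    # closeness in [0, 1]

# Hypothetical performance scores: rows = Ridge, Lasso, Elastic Net;
# columns = MSE (cost), MAE (cost), R-squared (benefit).
scores = np.array([[0.40, 0.50, 0.80],
                   [0.30, 0.45, 0.85],
                   [0.32, 0.46, 0.84]])
closeness = topsis(scores,
                   np.array([1 / 3, 1 / 3, 1 / 3]),
                   np.array([False, False, True]))
print(np.argsort(-closeness))   # best-to-worst ordering of the three methods
```

A higher closeness coefficient means an alternative is simultaneously near the ideal point and far from the anti-ideal one, which is how the thesis turns five per-method performance measures into a single preference ordering per data set.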