Jo Nov 22, 2023

Incomplete data are a common problem in data analysis. There are many methods for treating incomplete data, including deletion and imputation.

However, these methods may be unsuitable when a relatively large share of the values is missing.

Yang Won Chol, a researcher at the Faculty of Materials Science and Technology, has proposed a method to develop a multiple regression model with block missed incomplete data.

The outline of the method is as follows: (a) separate submatrices from block missed incomplete data, (b) develop multiple regression submodels from the submatrices, and (c) develop a final multiple regression model by linear or nonlinear combination of the submodels.
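
The three steps above can be sketched in Python with NumPy. This is only an illustrative sketch under stated assumptions, not the researcher's implementation: it assumes that each submatrix pairs one missingness pattern's observed columns with every row that observes those columns, that submodels are ordinary least-squares fits, and that the combination is fitted on the fully observed rows. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_ols(A, y):
    """Least-squares fit of y on the columns of A plus an intercept."""
    design = np.column_stack([np.ones(len(A)), A])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, A):
    return coef[0] + A @ coef[1:]

def split_submatrices(X):
    """(a) One submatrix per missingness pattern (assumed rule): the pattern's
    observed columns and every row in which all of those columns are observed."""
    observed = ~np.isnan(X)
    subs = []
    for pattern in np.unique(observed, axis=0):
        if pattern.any():
            rows = np.flatnonzero(observed[:, pattern].all(axis=1))
            subs.append((np.flatnonzero(pattern), rows))
    return subs

def fit_submodels(X, y, subs):
    """(b) One regression submodel per submatrix."""
    return [(cols, fit_ols(X[rows][:, cols], y[rows])) for cols, rows in subs]

def combination_features(P, quadratic):
    """Submodel predictions, optionally extended with squares and cross terms."""
    feats = [P]
    if quadratic:
        k = P.shape[1]
        feats += [P[:, [i]] * P[:, [j]] for i in range(k) for j in range(i, k)]
    return np.hstack(feats)

def fit_combination(X, y, submodels, quadratic=False):
    """(c) Regress y on the submodel predictions, using the complete rows
    (assumed to exist), where every submodel can be evaluated."""
    complete = ~np.isnan(X).any(axis=1)
    P = np.column_stack(
        [predict_ols(coef, X[complete][:, cols]) for cols, coef in submodels]
    )
    F = combination_features(P, quadratic)
    coef = fit_ols(F, y[complete])
    rmse = np.sqrt(np.mean((predict_ols(coef, F) - y[complete]) ** 2))
    return coef, rmse

# Synthetic block-missing data (made up for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=100)
X[40:70, 2] = np.nan   # block of rows missing feature 2
X[70:, 0] = np.nan     # block of rows missing feature 0

subs = split_submatrices(X)
submodels = fit_submodels(X, y, subs)
_, rmse_linear = fit_combination(X, y, submodels)
_, rmse_quadratic = fit_combination(X, y, submodels, quadratic=True)
used_rows = np.unique(np.concatenate([rows for _, rows in subs]))
print(len(subs), len(used_rows), rmse_linear, rmse_quadratic)
```

In this sketch every row falls into at least one submatrix, so no observation is discarded. The quadratic combination can never fit the training rows worse than the linear one because the two models are nested; that in-sample fact alone says nothing about predictive performance on new data.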

Applying the proposed method, he conducted simulation experiments on three datasets and developed a model that predicts the casting density of A380 from die casting process parameters.

The results demonstrated that the performance and data usage rate of the proposed method are far superior to those of previous methods.

First, the proposed method for developing a multiple regression model with block missed incomplete data ensures high performance, statistical stability and reliability of the final multiple regression model, as well as a data usage rate of 100%.

Second, the final multiple quadratic regression model, obtained by quadratic combination of the regression submodels, performs better than the final multiple linear regression model obtained by linear combination of the submodels.

Third, when some features of a new object are not observed, the target value can still be predicted by using the final regression submodels.
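
The third point can be illustrated with a short Python sketch. It assumes, purely for illustration, that each submodel is stored with the feature indices it needs and that the prediction comes from the usable submodel with the most features; the article does not state the selection rule, and the coefficients below are made up.

```python
import numpy as np

# Hypothetical fitted submodels: (feature indices, [intercept, slopes...]).
SUBMODELS = [
    ((0, 1, 2), np.array([1.0, 2.0, -1.0, 0.5])),
    ((0, 1), np.array([1.2, 2.0, -1.0])),
    ((1, 2), np.array([0.8, -1.0, 0.5])),
]

def predict_with_missing(x, submodels=SUBMODELS):
    """Predict with the submodel that uses the most features among those
    whose inputs are all observed in x (NaN marks an unobserved feature)."""
    observed = set(np.flatnonzero(~np.isnan(x)).tolist())
    usable = [(cols, coef) for cols, coef in submodels if set(cols) <= observed]
    if not usable:
        raise ValueError("no submodel matches the observed features")
    cols, coef = max(usable, key=lambda m: len(m[0]))
    return coef[0] + x[list(cols)] @ coef[1:]

x_new = np.array([np.nan, 3.0, 4.0])       # feature 0 is not observed
y_hat = predict_with_missing(x_new)         # falls back to the (1, 2) submodel
y_full = predict_with_missing(np.array([1.0, 3.0, 4.0]))  # uses all features
print(y_hat, y_full)
```

When no stored submodel matches the observed features, the sketch raises an error instead of guessing a value.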