Published January 1, 2022 | Version v1
Conference paper
Open
A Comparison of Data Imputation Methods Utilizing Machine Learning for a New IoT System Platform
Creators
Description
IoT systems are widely used in manufacturing, and the volume of sensor data they generate is significant. In real-life scenarios, missing sensor data can cause problems, especially for data-driven machine learning (ML) models, so the resulting gaps should be handled before such models are employed. Common practice is to remove the missing data completely or to apply simple arithmetic operations. However, the literature offers more sophisticated approaches that can be applied to these real-time IoT systems while taking the native data characteristics into account. This study compares the performance of regression-based ML algorithms as missing-data imputation methods: Support Vector Regression (SVR), Decision Tree Regression (DTR), Ridge Regression, K-Nearest Neighbors Regression (KNN), MissForest (MF), and XGBoost Regression (XGB). Missing data in different positions and proportions are created from experimentally collected time-series sensor data from a newly developed IoT system platform. Initial results from the ML models on these datasets are presented together with an overview of the IoT system architecture. The average RMSE and R² values of the six ML models show that Ridge Regression outperforms the other models for missing-data imputation.
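The evaluation idea in the description (hide known sensor values, impute them with a regression model, then score the result with RMSE and R²) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the synthetic signal, the 10% missing rate, the single-lag feature, and every function name (`fit_ridge_1d`, `impute`, `rmse`, `r2`) are assumptions made for the example, and the ridge fit is a hand-written one-feature closed form rather than a library call.

```python
import math
import random


def fit_ridge_1d(xs, ys, alpha=1.0):
    """Closed-form ridge fit y ~ w*x + b with an unpenalized intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
    return w, y_mean - w * x_mean


def impute(series, alpha=1.0):
    """Fill None gaps by predicting each value from the one before it.

    Assumes series[0] is observed (hypothetical simplification).
    """
    # Train only on consecutive pairs where both values are observed.
    pairs = [(a, b) for a, b in zip(series, series[1:])
             if a is not None and b is not None]
    w, b = fit_ridge_1d([p[0] for p in pairs], [p[1] for p in pairs], alpha)
    filled = list(series)
    for t in range(1, len(filled)):
        if filled[t] is None:
            # The previous value is either observed or already imputed.
            filled[t] = w * filled[t - 1] + b
    return filled


def rmse(truth, pred):
    """Root mean squared error between true and imputed values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def r2(truth, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for a sensor time series (assumption, not the paper's data).
rng = random.Random(0)
truth = [20.0 + 5.0 * math.sin(t / 10.0) + rng.gauss(0.0, 0.2) for t in range(300)]

# Hide 10% of the values at random positions, keeping index 0 observed.
missing = sorted(rng.sample(range(1, 300), 30))
missing_set = set(missing)
observed = [None if t in missing_set else v for t, v in enumerate(truth)]

imputed = impute(observed, alpha=1.0)

# Score only the positions that were hidden.
score_rmse = rmse([truth[t] for t in missing], [imputed[t] for t in missing])
score_r2 = r2([truth[t] for t in missing], [imputed[t] for t in missing])
print(f"RMSE={score_rmse:.3f}  R2={score_r2:.3f}")
```

Scoring only the hidden positions keeps the metrics from being inflated by values that were never missing. Repeating the loop over several missing proportions and gap positions, and swapping in other regressors, would be one way to approximate the kind of comparison the description outlines.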
Files
| Name | Size |
|---|---|
| bib-0def2565-7ccb-415f-98a9-1cac5acb766a.txt (md5:bee6513462d28d79f0d27d7adb66afb1) | 237 Bytes |