Published January 1, 2020 | Version v1
Journal article Open

A supervised learning approach for detecting erroneous samples in embeddings

  • 1. Ankara Univ, Fac Engn, Dept Biomed Engn, Ankara, Turkey

Description

Visualizing multidimensional data has become a crucial task in recent years, given the growing amount of data from various sources. Dimensionality reduction algorithms are commonly used to reduce the number of dimensions so that the data can be visualized on a screen. However, these algorithms may fail to faithfully represent high-dimensional data in lower dimensions, leading to erroneous visualizations. In this work, we propose an error detection algorithm for dimensionality reduction algorithms, based on recently developed error prediction algorithms for medical image registration. The proposed algorithm matches the neighborhoods of the high- and low-dimensional data using different similarity measures and predicts the errors using a random forest classifier. Results on three datasets show that the proposed algorithm can detect errors with an accuracy of up to 86% and an area under the curve score of 0.81.
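
To make the neighborhood-matching idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a single, hypothetical similarity measure (the Jaccard overlap between each sample's k-nearest-neighbor sets in the original space and in the embedding), Euclidean distances, and toy data invented for illustration. The function names `knn_indices` and `neighborhood_overlap` are likewise assumptions. The classifier step is omitted; per-sample scores like these could serve as input features for a random forest.

```python
import math


def knn_indices(points, i, k):
    """Return the index set of the k nearest neighbors of points[i].

    Euclidean distance is an assumption; ties are broken by index.
    """
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return {j for _, j in dists[:k]}


def neighborhood_overlap(high, low, k):
    """Jaccard overlap of each sample's k-NN sets in the two spaces.

    A score of 1.0 means the neighborhood is fully preserved in the
    embedding; 0.0 means no neighbor is shared.
    """
    scores = []
    for i in range(len(high)):
        a = knn_indices(high, i, k)
        b = knn_indices(low, i, k)
        scores.append(len(a & b) / len(a | b))
    return scores


# Toy data (hypothetical): two clusters in 3-D and two 1-D embeddings.
high = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0), (12, 0, 0)]
low_good = [(0,), (1,), (2,), (10,), (11,), (12,)]
# Sample 2 is placed in the wrong cluster in this embedding.
low_bad = [(0,), (1,), (11.5,), (10,), (11,), (12,)]

good_scores = neighborhood_overlap(high, low_good, k=2)
bad_scores = neighborhood_overlap(high, low_bad, k=2)
print(good_scores)
print(bad_scores)
```

In this toy case the misplaced sample receives the lowest overlap score, which is the kind of signal a supervised classifier could learn to threshold. A realistic setup would combine several similarity measures and train on samples with known error labels.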

Files

10-3906-elk-1909-162.pdf (22.1 MB)
md5:c74f664559fab4c67922fc32cec87655