Published January 1, 2016 | Version v1
Conference paper | Open access

An Aggregated Cross-Validation Framework for Computational Discovery of Disease-Associative Genes

  • 1. Northumbria University, Biohealth Informatics Research Team, Department of Computer Science and Digital Technologies, Faculty of Engineering and Environment, Newcastle upon Tyne NE2 1XE, Tyne and Wear, England
  • 2. TUBITAK BILGEM UEKAE, Kocaeli, Turkey

Description

Numerous computational techniques have been applied to identify the key features of gene expression datasets, with the aim of making biomedical applications more efficient. Sample classification is an important task: diseased individuals must be recognized correctly by identifying a small but clinically meaningful set of genes. However, this remains a challenging problem for machine learning algorithms. In this paper, we apply a two-stage feature selection approach that combines ensemble filter methods with Pareto Optimality (PO). Although filter methods rank all features, they give no indication of the required (optimal) size of the feature subset, where the features in this study are genes. To address this issue, PO is incorporated with the filter methods. The main aim of this study is therefore to develop a robust framework that combines PO, multiple feature selection methods and cross-validated subsets of the samples, and that is applicable not only to similar datasets but also to other feature selection methods. The robustness of the framework has been demonstrated on three well-known microarray gene expression datasets, where it yields equal or higher predictive accuracy with comparatively smaller gene subsets. Furthermore, the framework incorporates cross-validation and data variation; consequently, it reduces over-fitting and was observed to make gene selection more stable under different conditions.
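The abstract describes the two-stage idea only at a high level. As a minimal sketch, assuming (a) each filter method returns a ranking of the same genes with the best gene first, (b) the rankings are combined by mean rank (a Borda-style rule chosen here for illustration), and (c) per-fold error rates for the top-k genes have already been computed by some classifier, Pareto optimality can then keep only the subset sizes that no other size beats on both size and cross-validated error. The mean-rank rule, the function names (`aggregate_rankings`, `mean_cv_error`, `dominates`, `pareto_front`), the gene IDs and the error values are hypothetical illustrations, not the paper's exact method or results.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def aggregate_rankings(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Combine several filter-method rankings (best gene first) by mean rank.

    Assumption: mean-rank (Borda-style) aggregation stands in for the
    framework's ensemble rule, which the abstract does not specify.
    A gene missing from some rankings is averaged over the ones it appears in.
    """
    positions: Dict[str, List[int]] = {}
    for ranking in rankings:
        for pos, gene in enumerate(ranking):
            positions.setdefault(gene, []).append(pos)
    # Lower mean position = more consistently ranked near the top; ties broken by name.
    return sorted(positions, key=lambda g: (mean(positions[g]), g))


def mean_cv_error(fold_errors: Dict[int, Sequence[float]]) -> List[Tuple[int, float]]:
    """Average each subset size's error rate over the cross-validation folds."""
    return [(k, mean(errs)) for k, errs in sorted(fold_errors.items())]


def dominates(a: Tuple[int, float], b: Tuple[int, float]) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one.

    Both objectives are minimised: subset size and mean cross-validated error.
    """
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(candidates: Sequence[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Return the non-dominated (size, error) pairs, sorted by subset size."""
    front = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
    return sorted(front)


if __name__ == "__main__":
    # Hypothetical rankings produced by three filter methods on toy gene IDs.
    rankings = [
        ["g3", "g1", "g7", "g2", "g5"],
        ["g1", "g3", "g2", "g7", "g5"],
        ["g3", "g2", "g1", "g5", "g7"],
    ]
    ranked = aggregate_rankings(rankings)
    # Hypothetical per-fold error rates of a classifier trained on the top-k genes.
    fold_errors = {
        1: [0.30, 0.35, 0.25],
        2: [0.20, 0.15, 0.25],
        3: [0.10, 0.15, 0.20],
        4: [0.20, 0.15, 0.20],
        5: [0.15, 0.20, 0.15],
    }
    for k, err in pareto_front(mean_cv_error(fold_errors)):
        print(k, round(err, 3), ranked[:k])
```

With these toy numbers the aggregated ranking is g3, g1, g2, g7, g5, and sizes 4 and 5 drop off the front because size 3 is both smaller and has a lower mean error. Every size left on the front is a defensible trade-off; choosing a single point from it (for example, the smallest size within some error tolerance) is a further design decision this sketch leaves open.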

Files

bib-43265578-94b6-4bf3-9a13-9c3c2175b772.txt (239 Bytes, md5:a40f78c4b43331b697e0315c08f5e5b0)