Unlabelled extra data do not always mean extra performance for semi-supervised fault prediction

Catal, Cagatay; Diri, Banu

doi:10.1111/j.1468-0394.2009.00509.x

Published January 1, 2009 | Version v1

Journal article Open


1. Sci & Technol Res Council Turkey TUBITAK, Inst Informat Technol, Marmara Res Ctr, TR-41470 Kocaeli, Turkey
2. Yildiz Tech Univ, Dept Comp Engn, TR-34349 Istanbul, Turkey

This research investigated and benchmarked several high-performance classifiers (J48, random forests, naive Bayes, KStar and artificial immune recognition systems) for software fault prediction with limited fault data. We also studied YATSI (Yet Another Two Stage Idea), a recent semi-supervised meta-algorithm that allows different classifiers to be applied in its first stage, and used each of these classifiers in that stage. Furthermore, we proposed a semi-supervised classification algorithm that applies the artificial immune systems paradigm. Experimental results showed that YATSI does not always improve the performance of naive Bayes when unlabelled data are used together with labelled data. According to our experiments, naive Bayes is the best choice for building a semi-supervised fault prediction model on small data sets, and YATSI may improve the performance of naive Bayes on large data sets. In addition, YATSI improved the performance of every classifier except naive Bayes on all the data sets.
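The two-stage idea behind YATSI can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract only states that different classifiers can be plugged into the first stage, so the second stage shown here (a weighted nearest-neighbour vote in which pre-labelled instances are down-weighted by the labelled-to-unlabelled ratio) is an assumption based on the general description of YATSI. NearestCentroid is a hypothetical stand-in for the first-stage classifier, not one of the classifiers benchmarked in the paper, and the names yatsi_predict, k and weight_factor are illustrative.

```python
import math
from collections import defaultdict


class NearestCentroid:
    """Hypothetical stand-in for the first-stage classifier (any fit/predict model works)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] += 1
        # One mean vector per class.
        self.centroids = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}
        return self

    def predict(self, X):
        return [
            min(self.centroids, key=lambda c: math.dist(row, self.centroids[c]))
            for row in X
        ]


def yatsi_predict(base, X_lab, y_lab, X_unlab, X_query, k=3, weight_factor=1.0):
    """Two-stage semi-supervised prediction (sketch; stage 2 details are assumed)."""
    # Stage 1: train on the labelled data only, then pre-label the unlabelled data.
    base.fit(X_lab, y_lab)
    y_pre = base.predict(X_unlab)

    # Assumed weighting: pre-labelled instances count less than truly labelled ones.
    w_pre = weight_factor * len(X_lab) / max(len(X_unlab), 1)
    pool = [(x, y, 1.0) for x, y in zip(X_lab, y_lab)]
    pool += [(x, y, w_pre) for x, y in zip(X_unlab, y_pre)]

    # Stage 2: weighted k-nearest-neighbour vote over labelled + pre-labelled data.
    predictions = []
    for query in X_query:
        nearest = sorted(pool, key=lambda item: math.dist(query, item[0]))[:k]
        votes = defaultdict(float)
        for _, label, weight in nearest:
            votes[label] += weight
        predictions.append(max(votes, key=votes.get))
    return predictions
```

Because the first stage is passed in as an object, swapping in another classifier only requires that it expose fit and predict. Whether the unlabelled data help then depends on how accurate the first-stage pre-labels are, which is consistent with the mixed results reported above for naive Bayes.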

Files

bib-84531902-0aa7-49f0-a42f-7be9b9cb0d11.txt (157 Bytes, md5:86ab93ee50a7c62d631d0e44298ce701)

