Published January 1, 2009 | Version v1
Journal article Open

Unlabelled extra data do not always mean extra performance for semi-supervised fault prediction

  • 1. Sci & Technol Res Council Turkey TUBITAK, Inst Informat Technol, Marmara Res Ctr, TR-41470 Kocaeli, Turkey
  • 2. Yildiz Tech Univ, Dept Comp Engn, TR-34349 Istanbul, Turkey

Description

This study investigated and benchmarked several high-performance classifiers (J48, random forests, naive Bayes, KStar, and artificial immune recognition systems) for software fault prediction with limited fault data. We also studied YATSI (Yet Another Two Stage Idea), a recent semi-supervised meta-algorithm that allows different classifiers to be applied in its first stage, and used each of the classifiers above in that first stage. Furthermore, we proposed a semi-supervised classification algorithm based on the artificial immune systems paradigm. Experimental results showed that YATSI does not always improve the performance of naive Bayes when unlabelled data are used together with labelled data. According to our experiments, naive Bayes is the best choice for building a semi-supervised fault prediction model on small data sets, and YATSI may improve the performance of naive Bayes on large data sets. In addition, YATSI improved the performance of every classifier except naive Bayes on all the data sets.
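The two-stage structure described above can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions that this abstract does not state: the second stage is a weighted k-nearest-neighbour vote, and pre-labelled instances are down-weighted by `f * (n_labelled / n_unlabelled)`, following the commonly described form of YATSI. The first-stage learner here is a nearest-centroid classifier, a hypothetical stand-in for J48, naive Bayes, or any other classifier; the names `train_centroids`, `predict_centroid`, `yatsi_predict`, `k`, and `f` are likewise illustrative.

```python
import math
from collections import defaultdict


def train_centroids(X, y):
    """Stage-1 stand-in (assumption): compute one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: math.dist(centroids[c], x))


def yatsi_predict(X_lab, y_lab, X_unl, X_test, k=3, f=1.0):
    """Two-stage semi-supervised prediction in the spirit of YATSI."""
    # Stage 1: train on the labelled data only, then pre-label the unlabelled data.
    centroids = train_centroids(X_lab, y_lab)
    y_pre = [predict_centroid(centroids, x) for x in X_unl]

    # Labelled instances get weight 1.0; pre-labelled ones get a reduced
    # weight (assumed formula: f * n_labelled / n_unlabelled).
    w_unl = f * len(X_lab) / len(X_unl) if X_unl else 0.0
    pool = [(x, y, 1.0) for x, y in zip(X_lab, y_lab)]
    pool += [(x, y, w_unl) for x, y in zip(X_unl, y_pre)]

    # Stage 2: weighted k-nearest-neighbour vote over the combined pool.
    preds = []
    for x in X_test:
        nearest = sorted(pool, key=lambda p: math.dist(p[0], x))[:k]
        votes = defaultdict(float)
        for _, label, w in nearest:
            votes[label] += w
        preds.append(max(votes, key=votes.get))
    return preds


# Toy usage: 0 = not fault-prone, 1 = fault-prone (made-up metric vectors).
X_lab = [[0.0, 0.0], [10.0, 10.0]]
y_lab = [0, 1]
X_unl = [[0.5, 0.5], [0.2, 0.1], [9.5, 9.8], [10.2, 9.9]]
print(yatsi_predict(X_lab, y_lab, X_unl, [[0.3, 0.3], [9.9, 10.0]]))
```

Because the pre-labels come from the first-stage classifier, its errors are carried into the second stage. This is one plausible reason why adding unlabelled data does not always help, although the abstract itself does not give an explanation.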
