Published January 1, 2009
| Version v1
Journal article
Open
Unlabelled extra data do not always mean extra performance for semi-supervised fault prediction
Creators
- 1. Sci & Technol Res Council Turkey TUBITAK, Inst Informat Technol, Marmara Res Ctr, TR-41470 Kocaeli, Turkey
- 2. Yildiz Tech Univ, Dept Comp Engn, TR-34349 Istanbul, Turkey
Description
This research investigated and benchmarked several high-performance classifiers (J48, random forests, naive Bayes, KStar, and artificial immune recognition systems) for software fault prediction with limited fault data. We also studied YATSI (Yet Another Two Stage Idea), a recent semi-supervised meta-algorithm that allows different classifiers to be applied in its first stage, and we used each of these classifiers in that first stage. Furthermore, we proposed a semi-supervised classification algorithm that applies the artificial immune systems paradigm. Experimental results showed that YATSI does not always improve the performance of naive Bayes when unlabelled data are used together with labelled data. According to our experiments, naive Bayes is the best choice for building a semi-supervised fault prediction model on small data sets, and YATSI may improve the performance of naive Bayes on large data sets. In addition, YATSI improved the performance of all the classifiers except naive Bayes on all the data sets.
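The two-stage idea behind YATSI can be illustrated with a minimal sketch. It is not the authors' implementation. The first-stage classifier here is a simple nearest-centroid model standing in for any of the classifiers named above, and the second stage is assumed to be a weighted nearest-neighbour vote in which pre-labelled instances are down-weighted by `f * N / M` (N labelled, M unlabelled instances). The function names, the `f` and `k` parameters, and the `"faulty"`/`"clean"` labels are all hypothetical.

```python
from collections import defaultdict
from math import dist


def nearest_centroid_fit(X, y):
    """Hypothetical stand-in for a first-stage classifier such as naive Bayes."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    # Return a predictor that picks the class with the closest centroid.
    return lambda x: min(centroids, key=lambda c: dist(x, centroids[c]))


def yatsi_predict(X_lab, y_lab, X_unlab, query, fit=nearest_centroid_fit, k=3, f=1.0):
    # Stage 1: train on the labelled data, then pre-label the unlabelled data.
    model = fit(X_lab, y_lab)
    y_pre = [model(x) for x in X_unlab]

    # Assumed weighting: labelled points weigh 1.0, pre-labelled points f * N / M.
    w_pre = f * len(X_lab) / len(X_unlab) if X_unlab else 0.0
    pool = [(x, label, 1.0) for x, label in zip(X_lab, y_lab)]
    pool += [(x, label, w_pre) for x, label in zip(X_unlab, y_pre)]

    # Stage 2: weighted vote among the k nearest neighbours in the combined pool.
    neighbours = sorted(pool, key=lambda p: dist(query, p[0]))[:k]
    votes = defaultdict(float)
    for _, label, weight in neighbours:
        votes[label] += weight
    return max(votes, key=votes.get)


# Toy module metrics (hypothetical): two labelled and three unlabelled modules.
X_lab = [[0.0, 0.0], [1.0, 1.0]]
y_lab = ["clean", "faulty"]
X_unlab = [[0.1, 0.1], [0.9, 0.9], [1.1, 1.0]]
prediction = yatsi_predict(X_lab, y_lab, X_unlab, [0.95, 0.95])
```

Because the first-stage model is passed in as `fit`, swapping classifiers only changes how the unlabelled data are pre-labelled. That is one way to picture why the benefit of the unlabelled data can depend on the base classifier, as the results above report.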
Files
| Name | Size | Checksum |
|---|---|---|
| bib-84531902-0aa7-49f0-a42f-7be9b9cb0d11.txt | 157 Bytes | md5:86ab93ee50a7c62d631d0e44298ce701 |