Journal article, Open Access

A Methodology for Comparing the Reliability of GPU-Based and CPU-Based HPCs

   Cini, Nevin; Yalcin, Gülay

Today, GPUs are widely used as coprocessors/accelerators in High-Performance Heterogeneous Computing
due to their many advantages. However, many studies emphasize that GPUs are not yet as reliable as desired.
Although GPUs are more vulnerable to hardware errors than CPUs, the use of GPUs in HPCs
continues to grow. Moreover, because of the inherent reliability problems of GPUs, combining a large number
of GPUs with CPUs can significantly increase HPCs’ failure rates. For this reason, analyzing the reliability
characteristics of GPU-based HPCs has become an important issue. In this study, we therefore evaluate
the reliability of GPU-based HPCs. To this end, we first examine field data analysis studies of GPU-based
and CPU-based HPCs and identify factors that could increase system failure/error rates. We then
compare GPU-based HPCs with CPU-based HPCs in terms of reliability using these factors, in
order to point out the reliability challenges of GPU-based HPCs. Our primary goal is to present a study that can
guide researchers in this field by indicating the current state of GPU-based heterogeneous HPCs and
their future requirements in terms of reliability. Our second goal is to offer a methodology for comparing the
reliability of GPU-based HPCs and CPU-based HPCs. To the best of our knowledge, this is the first survey
study to compare the reliability of GPU-based and CPU-based HPCs in a systematic manner.

Files (516.1 kB)
File name Size
Makale1.pdf 516.1 kB
md5:ba5273044fb4ade31a70b24dda0199ee
