Conference paper (Open Access)

Alternative PPM Model for Quality Score Compression

   Akgun, Mete; Sagiroglu, Mahmut Samil

Next Generation Sequencing (NGS) platforms generate header data and quality information for each nucleotide sequence they read. These platforms may produce gigabyte-scale datasets, and storing these datasets is one of the major bottlenecks of NGS technology. The information produced by NGS platforms is stored in the FASTQ format. In this paper, we propose an algorithm to compress the quality score information stored in a FASTQ file. We search for the model that gives the lowest entropy on quality score data, and we combine this statistical model with arithmetic coding to compress the quality scores as compactly as possible. We compare its performance with general-purpose text compression utilities (bzip2, gzip and ppmd) and with existing compression algorithms designed specifically for quality scores, and show that our algorithm outperforms both groups.
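The abstract does not describe the alternative PPM model itself, so the sketch below only illustrates the quantity being minimized: the entropy of quality symbols under a context model, and the code length an arithmetic coder would approach when driven by that model. It is a minimal sketch assuming a plain fixed-order context model, not the authors' PPM variant. The function names `conditional_entropy` and `adaptive_code_length`, the `"\0"` sentinel used to pad the start of each read, the Laplace (add-one) estimator, and the default alphabet size of 41 (Phred scores 0 to 40) are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(reads, order):
    """Empirical entropy (bits/symbol) of each quality symbol given the
    previous `order` symbols of the same read (hypothetical helper)."""
    context_counts = defaultdict(Counter)
    for read in reads:
        # Pad with a sentinel so the first symbols of a read still have a context.
        padded = "\0" * order + read
        for i in range(order, len(padded)):
            context_counts[padded[i - order:i]][padded[i]] += 1

    total = sum(sum(c.values()) for c in context_counts.values())
    if total == 0:
        return 0.0
    bits = 0.0
    for counts in context_counts.values():
        n_ctx = sum(counts.values())
        for n in counts.values():
            # Each occurrence costs -log2 p(symbol | context).
            bits -= n * math.log2(n / n_ctx)
    return bits / total


def adaptive_code_length(reads, order, alphabet_size=41):
    """Approximate total bits an arithmetic coder would emit with an adaptive
    order-k model using add-one (Laplace) smoothing; alphabet_size=41 assumes
    Phred scores 0..40 (hypothetical default)."""
    counts = defaultdict(Counter)
    bits = 0.0
    for read in reads:
        padded = "\0" * order + read
        for i in range(order, len(padded)):
            ctx_counts = counts[padded[i - order:i]]
            seen = sum(ctx_counts.values())
            # Probability is estimated from symbols seen so far, then updated,
            # exactly as a decoder could reproduce it.
            p = (ctx_counts[padded[i]] + 1) / (seen + alphabet_size)
            bits -= math.log2(p)
            ctx_counts[padded[i]] += 1
    return bits


if __name__ == "__main__":
    reads = ["IIIIIHHHHH", "IIIIIHHHHH"]
    print(round(conditional_entropy(reads, 0), 3))  # → 1.0
    print(round(conditional_entropy(reads, 1), 3))  # → 0.361
    print(round(adaptive_code_length(reads, 1), 1))
```

On these toy reads, conditioning on the previous symbol cuts the entropy from 1 bit to about 0.36 bits per symbol, which is the kind of gain a context model exploits. A PPM model differs from this fixed-order sketch in that it blends several context orders, falling back to shorter contexts through escape symbols when a longer context has not been seen.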
