Published January 1, 2020 | Version v1
Journal article Open

Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm

  • 1. Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland
  • 2. Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
  • 3. Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkey

Description

Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject's genome), which is then used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates, and a large proportion of the base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing the assembly, i.e. fixing its errors using information from alignments between the reads and the assembly (read-to-assembly alignment information). However, current assembly polishing algorithms can polish an assembly only with reads from a specific sequencing technology, or can polish only a small assembly. This technology dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms in order to use all available read sets and (ii) split a large genome into small chunks in order to polish it.
