Fast and accurate short read alignment with Burrows–Wheeler transform

Why this mattered

Li and Durbin’s BWA work made Burrows–Wheeler transform/FM-index alignment a practical foundation for high-throughput genomics. The key shift was not simply faster read mapping, but a different scaling regime: large mammalian references could be searched accurately with modest memory, while preserving enough speed for the exploding output of next-generation sequencers. This helped move whole-genome resequencing from a specialized computational bottleneck toward a routine analysis step.

After BWA, short-read alignment against large genomes became less dependent on slower hash table-based tools such as MAQ, the baseline the paper itself compares against. The paper showed that indexed, compressed reference search could deliver both accuracy and throughput, making it feasible to align millions to billions of reads as a standard precursor to SNP calling, indel detection, copy-number analysis, RNA-seq quantification, and population-scale genomics. In practice, BWA became one of the core aligners behind the sequencing pipelines that enabled projects such as the 1000 Genomes Project and later clinical and population sequencing efforts.

Its longer-term importance is that it helped establish read alignment as a stable infrastructure layer for genomics. Subsequent breakthroughs in variant calling, cancer genomics, ancient DNA, metagenomics, and clinical sequencing depended on reliable, scalable mapping to reference genomes. Later aligners and graph-based methods would revise parts of this model, especially for long reads and structurally diverse genomes, but BWA defined the dominant computational pattern for the short-read era: compact indexing, fast approximate matching, and integration into reproducible genome analysis workflows.

Abstract

MOTIVATION: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals.

RESULTS: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows-Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is approximately 10-20x faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package.

AVAILABILITY: http://maq.sourceforge.net.
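
To make the "backward search with Burrows-Wheeler Transform" mentioned in the abstract more concrete, here is a minimal Python sketch of exact-match backward search on a toy reference. It is an illustration under simplifying assumptions, not BWA's implementation: it handles exact matches only (BWA also allows mismatches and gaps), it builds the suffix array by naive sorting, and it stores a full occurrence table rather than a compressed index. The function names `build_index` and `backward_search` are hypothetical and chosen for this example.

```python
def build_index(ref):
    """Build a naive, uncompressed FM-index for a small reference string."""
    text = ref + "$"  # '$' is a sentinel that sorts before A, C, G, T
    # Suffix array by direct sorting: fine for a toy example, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' when i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in text that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted 0-based start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    # Consume the pattern from its last character to its first.
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:  # empty interval: the pattern does not occur
            return []
    return sorted(sa[lo:hi])


sa, C, occ = build_index("GATTACAGATTACA")
print(backward_search("ATTA", sa, C, occ))  # [1, 8]
```

Each pattern character narrows the suffix-array interval using only two table lookups, so search time depends on the read length rather than the reference length. That property is what makes this style of index attractive for large references.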

Sources