Fast and accurate short read alignment with Burrows–Wheeler transform
Why this mattered
Li and Durbin’s BWA work made Burrows–Wheeler transform/FM-index alignment a practical foundation for high-throughput genomics. The key shift was not simply faster read mapping, but a different scaling regime: large mammalian references could be searched accurately with modest memory, while preserving enough speed for the exploding output of next-generation sequencers. This helped move whole-genome resequencing from a specialized computational bottleneck toward a routine analysis step.
After BWA, short-read alignment against large genomes became less dependent on slower hash-table-based tools such as MAQ, the main comparison point in the paper. The paper showed that indexed, compressed reference search could deliver both accuracy and throughput, making it feasible to align millions to billions of reads as a standard precursor to SNP calling, indel detection, copy-number analysis, and population-scale genomics. In practice, BWA became one of the core aligners behind the sequencing pipelines that enabled projects such as the 1000 Genomes Project and later clinical and population sequencing efforts.
Its longer-term importance is that it helped establish read alignment as a stable infrastructure layer for genomics. Subsequent breakthroughs in variant calling, cancer genomics, ancient DNA, metagenomics, and clinical sequencing depended on reliable, scalable mapping to reference genomes. Later aligners and graph-based methods would revise parts of this model, especially for long reads and structurally diverse genomes, but BWA defined the dominant computational pattern for the short-read era: compact indexing, fast approximate matching, and integration into reproducible genome analysis workflows.
Abstract
MOTIVATION: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals.
RESULTS: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows-Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is approximately 10-20x faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package.
AVAILABILITY: http://maq.sourceforge.net.
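The "backward search with Burrows-Wheeler Transform" named in the abstract can be illustrated with a toy example. The sketch below is a minimal, exact-match-only illustration and is not BWA's implementation: it builds the suffix array by naive sorting and stores full occurrence tables, whereas a practical aligner relies on compressed, sampled structures and also handles mismatches and gaps. The function names `build_fm_index` and `backward_search` are hypothetical, chosen only for this example.

```python
def build_fm_index(text):
    """Build a toy FM-index: suffix array, C table, and Occ table.

    Naive construction for illustration only; unsuitable for real genomes.
    """
    text += "$"  # sentinel that sorts before A, C, G, T
    # Suffix array: start positions of all suffixes in sorted order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" at i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch]: number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    # Extend the match one character at a time, from the pattern's last
    # character to its first, narrowing the interval at each step.
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    reference = "ACGTACGTGACG"  # made-up toy reference
    index = build_fm_index(reference)
    print(backward_search("ACG", *index))  # → [0, 4, 9]
```

Each step of the loop costs two table lookups regardless of reference length, which is why the search time depends on the read length rather than on the genome size. Inexact matching, as described in the abstract, would additionally require exploring substitutions and gaps at each step within a bounded number of differences; that part is omitted here.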
Related
- cite → Improved tools for biological sequence comparison. — BWA cites FASTA-style heuristic sequence comparison as a foundational alignment approach and improves speed for massive short-read datasets.
- cite → The Sequence Alignment/Map format and SAMtools — BWA relates to SAMtools through the SAM alignment format used to store and process short-read mapping results.
- cite → Ultrafast and memory-efficient alignment of short DNA sequences to the human genome — BWA compares against Bowtie as another Burrows-Wheeler-transform-based short DNA read aligner for human genome mapping.
- enables → A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping — Burrows-Wheeler short-read alignment enables Hi-C read mapping needed to build kilobase-resolution chromatin contact maps.
- enables → Mutational landscape determines sensitivity to PD-1 blockade in non–small cell lung cancer — BWA enabled efficient alignment of sequencing reads, a prerequisite for calling somatic mutations used to associate lung-cancer mutational burden with PD-1 blockade sensitivity.
- enables → Minimap2: pairwise alignment for nucleotide sequences — BWA established fast indexed read mapping to large references, a pattern minimap2 extends to long noisy reads using minimizer-based rather than Burrows-Wheeler indexing.
- enables → Integrative analysis of 111 reference human epigenomes — BWA enabled efficient alignment of high-throughput sequencing reads used to build Roadmap Epigenomics chromatin, methylation, and transcriptome maps.
- enables → Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding — BWA enabled SARS-CoV-2 genomic characterization by providing fast short-read alignment for assembling and comparing viral sequences.
- cite ← A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species — The GBS pipeline uses BWA's Burrows-Wheeler short-read alignment method to map sequencing tags to reference genomes.
- cite ← A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping — The 3D genome map relies on BWA short-read alignment to map Hi-C sequencing reads to the human genome.
- cite ← Circular RNAs are a large class of animal RNAs with regulatory potency — The circular-RNA study used BWA's Burrows-Wheeler short-read alignment to map RNA-seq reads when detecting back-splice junctions.
- cite ← A framework for variation discovery and genotyping using next-generation DNA sequencing data — The GATK framework depends on BWA's Burrows-Wheeler short-read alignment to map sequencing reads before variant discovery.
- cite ← Mutational landscape determines sensitivity to PD-1 blockade in non–small cell lung cancer — The NSCLC PD-1 study uses BWA-style Burrows-Wheeler short-read alignment for sequencing-based mutation detection.
- cite ← Minimap2: pairwise alignment for nucleotide sequences — Minimap2 follows the read-mapping lineage represented by BWA, replacing Burrows-Wheeler indexing with minimizer-based seeding and adapting alignment to long noisy reads and assemblies.
- cite ← Integrative analysis of 111 reference human epigenomes — The Roadmap Epigenomics pipelines use BWA-style Burrows-Wheeler short-read alignment to map sequencing reads from epigenomic assays to the human genome.
- cite ← Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding — The SARS-CoV-2 genomic study uses BWA short-read alignment to map sequencing reads for viral genome assembly and comparison.
- enables ← Improved tools for biological sequence comparison. — FASTA advanced fast heuristic sequence comparison, motivating the speed-accuracy tradeoff later pushed much further by BWA's Burrows-Wheeler short-read indexing.