Identification of common molecular subsequences
Why this mattered
Smith and Waterman’s 1981 paper made local sequence alignment a mathematically rigorous optimization problem. Earlier methods, notably Needleman-Wunsch global alignment, scored two sequences end to end; this work gave a dynamic programming algorithm that finds the highest-scoring pair of segments, one from each sequence, under an explicit scoring scheme for matches, mismatches, and gaps. The key shift was from asking whether two full sequences could be globally aligned to asking where, within longer sequences, the best-conserved regions occur.
That mattered because biology often preserves domains, motifs, active sites, and exons rather than entire molecules end to end. The Smith-Waterman algorithm made it possible to detect these local similarities exactly, providing a foundation for inferring homology, function, and evolutionary relationships from DNA, RNA, and protein sequences. As sequence databases grew, this exact formulation became the benchmark against which faster heuristic tools were judged.
Its influence runs through later computational biology: BLAST and FASTA traded exact optimality for speed, but their central task was shaped by the local-alignment problem Smith and Waterman formalized. The paper helped turn molecular sequence comparison into a core quantitative method of genomics, enabling database search, annotation of newly sequenced genes, comparative genomics, and the interpretation of conserved functional elements across species.
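The local-alignment idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's exact formulation: the function name `smith_waterman`, the score values (`match=2`, `mismatch=-1`, `gap=-2`), and the use of a simple linear gap penalty are all assumptions chosen for readability. The defining feature is that every cell is floored at zero, so an alignment can start and end anywhere in either sequence.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Illustrative sketch: the scoring values and linear gap penalty are
    assumptions, not values taken from the original paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh alignment here
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("TTTACGTTT", "GGGACGTGGG")` returns `(8, "ACGT", "ACGT")`: the shared core is recovered even though the flanking regions are unrelated, which a global alignment would be forced to align as well. The table takes time and memory proportional to the product of the two sequence lengths, which is the cost that heuristic tools such as FASTA and BLAST were designed to avoid.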
Abstract
(no abstract available)
Related
- cite → A general method applicable to the search for similarities in the amino acid sequence of two proteins — The common molecular subsequences paper builds on Needleman-Wunsch dynamic programming for sequence similarity, adapting it from global end-to-end alignment to finding the best-matching local segments.
- enables → Initial sequencing and analysis of the human genome — Common-subsequence dynamic programming enabled sequence alignment methods used to assemble and annotate the human genome.
- enables → Accelerated Profile HMM Searches — Dynamic-programming sequence comparison supplied the alignment foundation underlying profile HMM construction and search.
- enables → Ultrafast and memory-efficient alignment of short DNA sequences to the human genome — Smith-Waterman local alignment defined dynamic-programming sequence matching, which Bowtie replaced with FM-index search to align short reads much faster.
- enables → Librispeech: An ASR corpus based on public domain audio books — Smith-Waterman local alignment, applied to word sequences rather than molecules, enabled the matching of recognized audio to book text used in building the LibriSpeech corpus.
- enables → Improved tools for biological sequence comparison. — Smith-Waterman introduced local sequence alignment, which FASTA approximated efficiently for rapid database similarity searches.
- enables → The Sequence of the Human Genome — Smith-Waterman local alignment enabled rigorous subsequence comparison, underpinning sequence assembly and annotation methods used for the human genome.
- cite ← Initial sequencing and analysis of the human genome — The Human Genome paper cites the common molecular subsequence algorithm as a foundation for sequence alignment used in genome assembly and comparison.
- cite ← Accelerated Profile HMM Searches — Accelerated Profile HMM Searches builds on Smith-Waterman dynamic programming for local sequence alignment of common molecular subsequences.
- cite ← Ultrafast and memory-efficient alignment of short DNA sequences to the human genome — Bowtie cites Smith-Waterman as the dynamic-programming approach to local alignment that its FM-index search replaces in order to align short reads much faster.
- cite ← Librispeech: An ASR corpus based on public domain audio books — LibriSpeech cites the Smith-Waterman algorithm for the dynamic-programming alignment of recognized audio to book text used in preparing its speech transcripts.
- cite ← Improved tools for biological sequence comparison. — FASTA builds on Smith-Waterman local alignment by using faster heuristics to find common molecular subsequences.
- cite ← The Sequence of the Human Genome — The human genome paper builds on Smith-Waterman local alignment as a foundational method for detecting conserved molecular subsequences.
- enables ← A general method applicable to the search for similarities in the amino acid sequence of two proteins — Needleman-Wunsch dynamic programming for pairwise sequence alignment enabled Smith-Waterman local alignment to identify conserved molecular subsequences.