Accelerated Profile HMM Searches
Why this mattered
Before this work, profile HMMs were already one of the most principled ways to detect remote sequence homology: they modeled position-specific conservation, insertions, deletions, and uncertainty more naturally than pairwise alignment heuristics. The limitation was practical rather than conceptual. Sensitive profile-HMM searches were often too slow for routine use at the scale of growing protein databases, so many workflows still depended on faster but less expressive tools such as BLAST. Eddy’s 2011 paper mattered because it changed that tradeoff. By introducing the MSV filter and sparse rescaling, HMMER3 made probabilistic profile-HMM search fast enough to become an everyday database-search instrument rather than a specialist method reserved for smaller or slower analyses.
The paradigm shift was not merely “faster HMMER.” The paper showed that a carefully designed heuristic pipeline could preserve nearly all of the sensitivity of full profile-HMM inference while rejecting most database sequences cheaply. MSV supplied a statistically interpretable, vectorized first-pass filter; promising hits then flowed into more exact Forward/Backward analysis. This made it possible to search large protein databases with profile HMMs at roughly BLAST-like speeds while retaining the advantages of profile-based probabilistic modeling. In practice, that enabled broader and more systematic annotation of protein families, domains, and remote homologs, especially through resources and workflows built around HMMER and profile-HMM libraries such as Pfam.
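The filter-then-rescore pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not HMMER3's implementation: `cheap_score` and `full_score` are hypothetical callables standing in for the MSV filter and the Forward/Backward stage, and `mu`, `lam`, and `p_threshold` are assumed calibration parameters. The sketch also assumes that filter scores follow a Gumbel (extreme value) distribution, the form commonly used for optimal local alignment scores.

```python
import math
from typing import Callable, Iterable, Iterator, Tuple


def gumbel_pvalue(score: float, mu: float, lam: float) -> float:
    """P(S >= score) under a Gumbel distribution with location mu and slope lam."""
    z = -lam * (score - mu)
    if z > 700.0:
        # exp(z) would overflow; the tail probability is 1 to machine precision.
        return 1.0
    # 1 - exp(-exp(z)), computed with expm1 to keep precision for tiny P-values.
    return -math.expm1(-math.exp(z))


def filtered_search(
    sequences: Iterable[Tuple[str, str]],
    cheap_score: Callable[[str], float],   # hypothetical fast filter score
    full_score: Callable[[str], float],    # hypothetical expensive score
    mu: float,                             # assumed Gumbel location
    lam: float,                            # assumed Gumbel slope
    p_threshold: float,                    # assumed filter P-value cutoff
) -> Iterator[Tuple[str, float]]:
    """Yield (name, full score) only for sequences that pass the cheap filter."""
    for name, seq in sequences:
        p = gumbel_pvalue(cheap_score(seq), mu, lam)
        if p > p_threshold:
            continue  # rejected cheaply; the expensive stage never runs
        yield name, full_score(seq)
```

The point of the design is that the expensive scorer is called only on the small fraction of sequences whose filter P-value falls below the cutoff, so the overall cost is dominated by the cheap stage.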
Its later importance lies in how it helped normalize a pattern now common in computational biology: use a fast, statistically calibrated filter to make a richer probabilistic or model-based method usable at scale. HMMER3 became infrastructure for genome annotation, metagenomics, protein-family curation, and comparative genomics, where millions to billions of sequence comparisons are routine. Subsequent breakthroughs in large-scale protein analysis, from massive reference databases to modern structure and function prediction pipelines, depended on reliable ways to place sequences into evolutionary families. This paper helped make that family-level search both sensitive and computationally routine.
Abstract
Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call "sparse rescaling". These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.
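The idea of scoring "an optimal sum of multiple ungapped local alignment segments" can be illustrated with a small scalar dynamic program. This is a simplified sketch under stated assumptions, not the striped vector-parallel code in HMMER3: flanking residues are free, every segment pays one fixed `seg_penalty` (a hypothetical stand-in for the model's entry and transition costs), and `toy_profile` is an invented helper that builds match scores from a consensus string.

```python
from typing import Dict, List

NEG_INF = float("-inf")


def toy_profile(consensus: str, match: float = 2.0, mismatch: float = -1.0,
                alphabet: str = "ACGT") -> List[Dict[str, float]]:
    """Hypothetical helper: one score table per model position."""
    return [{a: (match if a == c else mismatch) for a in alphabet}
            for c in consensus]


def msv_score(seq: str, match_scores: List[Dict[str, float]],
              seg_penalty: float) -> float:
    """Best total score over one or more ungapped segments (simplified MSV).

    Returns -inf for an empty sequence, since no segment can be placed.
    """
    m = len(match_scores)
    prev = [NEG_INF] * (m + 1)  # prev[k]: best score ending at model position k
    closed = NEG_INF            # best score with at least one finished segment
    for x in seq:
        # A new segment may start fresh (score 0) or follow earlier segments.
        begin = max(0.0, closed) - seg_penalty
        cur = [NEG_INF] * (m + 1)
        for k in range(1, m + 1):
            # Extend along the diagonal (no gaps) or open a new segment here.
            cur[k] = match_scores[k - 1][x] + max(prev[k - 1], begin)
        closed = max(closed, max(cur[1:]))  # a segment may end at any position
        prev = cur
    return closed
```

With `toy_profile("ACG")` and a penalty of 1.0, the sequence `ACG` scores 5.0 (three matches minus one segment penalty), and `ACGTTACG` scores 10.0 by combining two segments. Because there are no insert or delete states, each cell depends only on its diagonal neighbor and one shared `begin` value, which is what makes this kind of recursion a good fit for vectorization.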
Related
- cite → Basic local alignment search tool — Accelerated Profile HMM Searches compares HMMER's profile-HMM sequence search speed and sensitivity against BLAST's local alignment heuristic.
- cite → Identification of common molecular subsequences — Accelerated Profile HMM Searches builds on Smith-Waterman local alignment: the MSV filter reuses a striped vector-parallel approach previously described for fast Smith-Waterman alignment.
- enables → Highly accurate protein structure prediction with AlphaFold — Accelerated profile-HMM searches enabled AlphaFold's use of deep multiple-sequence alignments to extract evolutionary constraints for protein-structure prediction.
- enables → ColabFold: making protein folding accessible to all — Fast profile-based homology search made large multiple-sequence alignments practical for protein-structure prediction; ColabFold builds its alignments with MMseqs2 rather than HMMER or HHblits.
- cite ← Highly accurate protein structure prediction with AlphaFold — AlphaFold uses fast profile-HMM sequence-search tools such as HMMER to construct multiple sequence alignments for protein structure prediction.
- cite ← ColabFold: making protein folding accessible to all — ColabFold generates multiple-sequence alignments with MMseqs2 in place of the HMMER and HHblits searches used by AlphaFold.
- enables ← Basic local alignment search tool — BLAST popularized fast heuristic local sequence search, setting the speed benchmark that accelerated profile HMM searches aimed to match.
- enables ← Identification of common molecular subsequences — Dynamic-programming sequence comparison supplied the alignment foundation underlying profile HMM construction and search.