Improved tools for biological sequence comparison.
Why this mattered
TBD
Abstract
We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.
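The abstract describes evaluating a similarity score with a shuffling method that preserves local sequence composition. The idea can be illustrated with a minimal Python sketch; it is not the RDF2 implementation. The function names, the window size of 10, the number of shuffles, and the ungapped identity score `toy_score` are all assumptions chosen for illustration, and a real analysis would plug in the same scoring function used for the database search.

```python
import random
import statistics


def local_shuffle(seq, window=10, rng=None):
    """Shuffle residues only inside consecutive windows.

    Each window keeps its own residue composition, so local
    compositional bias survives the shuffle (window size is an assumption).
    """
    rng = rng or random.Random()
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        out.extend(block)
    return "".join(out)


def toy_score(a, b):
    """Hypothetical stand-in score: the largest number of identical
    residues found on any single ungapped diagonal."""
    best = 0
    for offset in range(-len(b) + 1, len(a)):
        lo = max(0, offset)
        hi = min(len(a), len(b) + offset)
        matches = sum(1 for i in range(lo, hi) if a[i] == b[i - offset])
        best = max(best, matches)
    return best


def shuffle_z_score(query, target, score=toy_score,
                    shuffles=100, window=10, seed=0):
    """Compare the observed score with scores against locally shuffled
    versions of the target and express the difference in standard deviations."""
    rng = random.Random(seed)
    observed = score(query, target)
    background = [
        score(query, local_shuffle(target, window, rng))
        for _ in range(shuffles)
    ]
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        # Degenerate background: no spread to measure against.
        z = 0.0 if observed == mean else float("inf")
    else:
        z = (observed - mean) / sd
    return observed, mean, sd, z
```

A call such as `shuffle_z_score(query, target)` returns the observed score, the mean and standard deviation of the shuffled scores, and the resulting z value. Shuffling within windows rather than across the whole sequence is the design choice that keeps regions of biased composition from inflating the apparent significance.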
Related
- cite → A general method applicable to the search for similarities in the amino acid sequence of two proteins — FASTA offers a faster heuristic alternative to the Needleman-Wunsch dynamic programming method for detecting amino acid sequence similarity.
- cite → Identification of common molecular subsequences — FASTA builds on Smith-Waterman local alignment by using faster heuristics to find common molecular subsequences.
- enables → The Protein Data Bank — Improved sequence-comparison tools enable the Protein Data Bank by supporting protein sequence alignment and annotation for structural entries.
- enables → Fast and accurate short read alignment with Burrows–Wheeler transform — FASTA advanced fast heuristic sequence comparison, motivating the speed-accuracy tradeoff later pushed much further by BWA's Burrows-Wheeler short-read indexing.
- cite ← The Protein Data Bank — The Protein Data Bank cites FASTA-style sequence comparison as a tool for finding related biological sequences.
- cite ← Fast and accurate short read alignment with Burrows–Wheeler transform — BWA cites FASTA-style heuristic sequence comparison as a foundational alignment approach, but targets much higher speed on massive short-read datasets.
- cite ← Basic local alignment search tool — BLAST extends FASTA-style heuristic sequence searching with a faster local-alignment word-hit strategy.
- enables ← A general method applicable to the search for similarities in the amino acid sequence of two proteins — Needleman-Wunsch introduced dynamic-programming sequence alignment, which FASTA extended into faster practical tools for biological sequence comparison.
- enables ← Identification of common molecular subsequences — Smith-Waterman introduced local sequence alignment, which FASTA approximated efficiently for rapid database similarity searches.