Genetics: Sequence Alignment
Source: Korf, Yandell, & Bedell. BLAST: An Essential Guide to the Basic Local Aligment Search Tool
Aligning two sequences is a nontrivial task if we are to allow for gaps. Here, we will step through two wellknown algorithms, the NeedlemanWunsch Algorithm, which is used for finding the maximum global alignment; then the SmithWaterman Algorithm, used to find local alignments. NeedlemanWunschThe NeedlemanWunsch algorithm is used to find a global alignment. The steps can be divided into three phases: the Initialization phase, the Fill phase, and the TraceBack phase. InitializationAssign values for the first row and the first column.
The arrows point back to the origin, because all alignments need to go back to the origin (as we are trying to determine the best global alignment). The scoring scheme is +1 or 1 (if two letters are the same, we +1, if not, we 1). The initial scores, then, are the gap scores multiplied by the distance from the origin. FillIn the fill phase (or the induction phase), we fill the matrix with scores and pointers. The operation that we use needs the scores from the diagonal, vertical, and horizontal neighbouring cells. We compute three scores: Compute all three values, and assign the maximum value to the cell. Then point the arrow in the direction of the maximum score. Continue until the matrix is filled. Then, each cell will contain the score and the pointer to the best possible alignment at that point. There is actually only one place you can start, the one where you have all the three required neigbours. Match score: SUM( diagonal cell = 0, score for match
[align C TO p] = 1 ) = 0 + 1 = 1 We have computed all three scores, and the maximum of those is 1. So
this is the first cells value.
Let's calculate the cell to the right. Match score: SUM( diagonal cell = 1, score for match
[align o TO p] = 1 ) = 1 + 1 = 2 And max( 2, 2, 3 ) = 2, so that is the value for your cell. ... And so on.
TraceBackIn the TraceBack phase, you recover the alignment from the matrix: Try it! SmithWatermanSmall modifications to the NeedlemanWunsch can lead to a drastically changed behaviour. If we initialize the edges to 0 and never let the maximum score be less than 0, and start the traceback at the highest score in the matrix, we get an algorithm that finds the best local alignment instead of the global alignment Try it! Amino Acidshttp://www.chemie.fuberlin.de/chemistry/bio/aminoacids_en.html 
