Genetics: Sequence Alignment
Source: Korf, Yandell, & Bedell. BLAST: An Essential Guide to the Basic Local Aligment Search Tool
Aligning two sequences is a non-trivial task if we are to allow for gaps. Here, we will step through two well-known algorithms, the Needleman-Wunsch Algorithm, which is used for finding the maximum global alignment; then the Smith-Waterman Algorithm, used to find local alignments. Needleman-WunschThe Needleman-Wunsch algorithm is used to find a global alignment. The steps can be divided into three phases: the Initialization phase, the Fill phase, and the Trace-Back phase. InitializationAssign values for the first row and the first column.
The arrows point back to the origin, because all alignments need to go back to the origin (as we are trying to determine the best global alignment). The scoring scheme is +1 or -1 (if two letters are the same, we +1, if not, we -1). The initial scores, then, are the gap scores multiplied by the distance from the origin. FillIn the fill phase (or the induction phase), we fill the matrix with scores and pointers. The operation that we use needs the scores from the diagonal, vertical, and horizontal neighbouring cells. We compute three scores: Compute all three values, and assign the maximum value to the cell. Then point the arrow in the direction of the maximum score. Continue until the matrix is filled. Then, each cell will contain the score and the pointer to the best possible alignment at that point. There is actually only one place you can start, the one where you have all the three required neigbours. Match score: SUM( diagonal cell = 0, score for match
[align C TO p] = -1 ) = 0 + -1 = -1 We have computed all three scores, and the maximum of those is -1. So
this is the first cells value.
Let's calculate the cell to the right. Match score: SUM( diagonal cell = -1, score for match
[align o TO p] = -1 ) = -1 + -1 = -2 And max( -2, -2, -3 ) = -2, so that is the value for your cell. ... And so on.
Trace-BackIn the Trace-Back phase, you recover the alignment from the matrix: Try it! Smith-WatermanSmall modifications to the Needleman-Wunsch can lead to a drastically changed behaviour. If we initialize the edges to 0 and never let the maximum score be less than 0, and start the trace-back at the highest score in the matrix, we get an algorithm that finds the best local alignment instead of the global alignment Try it! Amino Acidshttp://www.chemie.fu-berlin.de/chemistry/bio/amino-acids_en.html |
|