## Genetics: Sequence Alignment

Source: Korf, Yandell, & Bedell. BLAST: An Essential Guide to the Basic Local Aligment Search Tool

Aligning two sequences is a non-trivial task if we are to allow for gaps. Here, we will step through two well-known algorithms, the Needleman-Wunsch Algorithm, which is used for finding the maximum global alignment; then the Smith-Waterman Algorithm, used to find local alignments.

## Needleman-Wunsch

The Needleman-Wunsch algorithm is used to find a global alignment. The steps can be divided into three phases: the Initialization phase, the Fill phase, and the Trace-Back phase.

### Initialization

Assign values for the first row and the first column.

 C O E L A C A N T H 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 P -1 E -2 L -3 I -4 C -5 A -6 N -7

The arrows point back to the origin, because all alignments need to go back to the origin (as we are trying to determine the best global alignment). The scoring scheme is +1 or -1 (if two letters are the same, we +1, if not, we -1). The initial scores, then, are the gap scores multiplied by the distance from the origin.

### Fill

In the fill phase (or the induction phase), we fill the matrix with scores and pointers. The operation that we use needs the scores from the diagonal, vertical, and horizontal neighbouring cells.

We compute three scores:
The Match Score - the sum of the diagonal cell score and the score for a match (-1 or +1)
The Vertical Gap Score - The sum of the cell above and the gap score (-1)
The Horizontal Gap Score - The sum of the cell to the left and the gap score (-1)

Compute all three values, and assign the maximum value to the cell. Then point the arrow in the direction of the maximum score. Continue until the matrix is filled. Then, each cell will contain the score and the pointer to the best possible alignment at that point.

There is actually only one place you can start, the one where you have all the three required neigbours.

Match score: SUM( diagonal cell = 0, score for match [align C TO p] = -1 ) = 0 + -1 = -1
Vertical Gap Score: SUM( cell above = -1, gap score = -1 ) = -1 + -1 = -2
Horizontal Gap Score: SUM( cell to left = -1, gap score = -1 ) = -1 + -1 = -2

We have computed all three scores, and the maximum of those is -1. So this is the first cells value.

 C O E L A C A N T H 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 P -1 -1 E -2 L -3 I -4 C -5 A -6 N -7

Let's calculate the cell to the right.

Match score: SUM( diagonal cell = -1, score for match [align o TO p] = -1 ) = -1 + -1 = -2
Vertical Gap Score: SUM( cell above = -2, gap score = -1 ) = -1 + -1 = -3
Horizontal Gap Score: SUM( cell to left = -1, gap score = -1 ) = -1 + -1 = -2

And max( -2, -2, -3 ) = -2, so that is the value for your cell. ... And so on.

 C O E L A C A N T H 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 P -1 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 E -2 -2 -2 -1 -0 -3 -4 -5 -6 -7 -8 L -3 -3 -3 -2 -2 -1 -2 -3 -4 -5 -6 I -4 -4 C -5 -3 A -6 N -7

### Trace-Back

In the Trace-Back phase, you recover the alignment from the matrix:
Start at the bottom-right corner and follow the arrows til you get to the beginning. As you go, write out the letters, or a hyphen for a gap. Finally, reverse the string.

Try it!

Sorry, but this code used to much cpu power :-( You cannot try it from here anymore

## Smith-Waterman

Small modifications to the Needleman-Wunsch can lead to a drastically changed behaviour. If we initialize the edges to 0 and never let the maximum score be less than 0, and start the trace-back at the highest score in the matrix, we get an algorithm that finds the best local alignment instead of the global alignment

Try it!

Sorry, but this code used to much cpu power :-( You cannot try it from here anymore

## Amino Acids

http://www.chemie.fu-berlin.de/chemistry/bio/amino-acids_en.html