This is the first example I learned in dynamic programming. The problem can be described simply. Now here are several time intervals named $\{1,2,\cdots,n\}$, with each $i$ has a starting tiem $s_i$, a finish time $f_i$, and a weight $w_i$. Our goal is to select a subset of $\{1,\cdots,n\}$ to maximize the sum of their weights while they do not overlap.
Suppose the interval sequence $\{1,2,\cdots,n\}$ has been sorted by their finish time in an increasing order. Define \[p(i):=\max \{j\in \{1,2,\cdots i-1\}: f_j<s_i\},\] i.e. the largest index $j$ that is on the left of $i$ and does not overlap $i$. Let $\mathcal{O}_i$ be the optimal subset we select from $\{1,2,\cdots,i\}$. $OPT(i)$ is the sum of weights of intervals in $\mathcal{O}_i$, i.e. \[OPT(i):=\sum_{k\in \mathcal{O}_i}w_k.\] Then we have a dichotomy: if $i\in \mathcal{O}_i$, then \[\mathcal{O}_i = \mathcal{O}_{p(i)}\cup \{i\};\] if $i\notin \mathcal{O}_i$, then \[\mathcal{O}_i = \mathcal{O}_{i-1}.\] Therefore we have the recursive equation for $OPT(i)$, \[OPT(i)=\max \{OPT(i-1), w_i+OPT(p(i))\}\tag{1}\]
Then we can follow eq(1) to select interval $i$. Indeed, as long as we calculate $OPT(i), \,i=1,2,\cdots n$, we can find the optimal solution directly. Here is the implementation of this idea in C++ code.
#include<iostream>
#include<algorithm>
using namespace std;
//The goal of this code is to solve interval scheduling problem. There is n intervals with different value, we want to select several intervals with largest value sum.
int n;//total number of intervals
struct Intervals
{
double s;//start time
double f;//finish time
double val;//value
bool if_selected = false;
bool operator < (const Intervals& A) const {
return f < A.f;//sort intervals by finish time
}
}Itv[10000];
double OptVal[10000];// OptVal[i] means the best value sum among 1~i
void GetIntervals()
{
Itv[0].s = 0;
Itv[0].f = 0;
Itv[0].val = 0;
cout << "Please input the total number of intervals: ";
cin >> n;
//input intervals
cout << "Please input these "<< n <<" intervals:" <> Itv[i].s;
cin >> Itv[i].f;
cin >> Itv[i].val;
}
return;
}
int p(int j)//calculate the largest interval on the left of i that does not overlap j
{
if (j == 0)
return 0;
for (int i = j - 1; i >= 0; i --)
if (Itv[i].f <= Itv[j].s)
{
return i;
}
}
void SelectItv(int j)
{
for (int i = 1; i <= j; i ++)
{
double left,right;
left = Itv[i].val + OptVal[p(i)];
right = OptVal[i - 1];
if (left > right)
{
OptVal[i] = left;
Itv[i].if_selected = true;//choose interval i
for (int k = i - 1; k > p(i); k --)
Itv[k].if_selected = false;
SelectItv(p(i));//Shrink the size of problem to 1~p(i)
}
else
{
OptVal[i] = right;// do not choose interval i
}
}
return;
}
int main()
{
OptVal[0] = 0;
GetIntervals();
sort(Itv+1, Itv+1+n); //sort in the structure [1,n)
SelectItv(n);
cout << "The selected intervals are: " << endl;
for (int i = 1; i <= n; i ++ )
{
if (Itv[i].if_selected)
cout << "[" << Itv[i].s << "," << Itv[i].f << "]" << "value " << Itv[i].val<< endl;
}
cout <<"The optimal value sum is " << OptVal[n] << " ." << endl;
return 0;
}
Another famous example of dynamic programming is longest common subsequence (LCS) problem and sequence alignment problem. Let's articulate this problem first. Suppose we have two strings $A=a_1a_2\cdots a_m$ and $B=b_1b_2\cdots b_n$, where $a_i,b_i$ are characters from a fixed list, such as 26 letters in alphabet list. Sequence $S=s_1s_2\cdots s_l$ is called the subsequence of $A$ and $B$, if
Our goal is to find a subsequence $S$ of $A$ and $B$ with the longest length. In the optimal solution $S$, either $s_l=a_m=b_n$, or $a_m\neq b_n$. And if $a_m\neq b_n$, $a_m$ and $b_n$ cannot both appear in $S$, otherwise if there is $a_i(i< m)$ and $b_j(j< n)$ such that $a_m = b_j$ and $a_i = b_n$ are both in $S$, then it will contradict the second requirement in the definition of subsequence. In other word, if $a_m \neq b_n$, either $a_m$ or $b_n$ is not in $S$.
Define $OPT(i,j)$ be the length of $LCS$ of substring $A_i=a_1a_2\cdots a_i$ and $B_j=b_1b_2\cdots b_j$. Based on above analysis, we have the following recursive equation, \[OPT(i,j)=\begin{cases}OPT(i-1,j-1)+1&a_i=b_j\\ \max\{OPT(i-1,j),OPT(i,j-1)\}&a_i\neq b_j\end{cases}\tag{2}\] And define $OPT(i,0)=OPT(0,j)=0$.
Thus we can first use eq(2) to calculate all $OPT(i,j)$ from $OPT(1,1)$ to $OPT(m,n)$ recursively, and trace back the $OPT$ matrix to find the optimal solution. Below is the realization of the algorithm by Python.
def FindLCS(str1, str2):
m = len(str1)
n = len(str2)
M = [[0 for x in range(n+1)] for x in range(m+1)]
L = [[0 for x in range(n)] for x in range(m)]
for i in range(m+1):
M[i][0] = 0
for j in range(n+1):
M[0][j] = 0
for i in range(m): #0,1,...,m-1
for j in range(n): #0,1,...,n-1
if str1[i] == str2[j]: # str1[0], str2[0] are corresponding to M[1][1]
M[i+1][j+1]= M[i][j] + 1
L[i][j] = 'Hit'
elif M[i+1][j] > M[i][j+1]:
M[i+1][j+1] = M[i+1][j]
L[i][j] = 'J-1'
else:
M[i+1][j+1] = M[i][j+1]
L[i][j] = 'I-1'
#output
LCS = ''
i = m -1
j = n -1
while i > -1 and j > -1 :
if L[i][j] == 'Hit':
LCS = str1[i] + LCS
i = i-1
j = j-1
elif L[i][j] == 'I-1' :
i = i - 1
else:
j = j - 1
print('String 1 is:', str1)
print('String 2 is:', str2)
print('Their longest common subsequence is:', LCS)
return
#example
A = 'GHJKINKK'
B = 'GFHJIK'
FindLCS(A,B)
[Output] String 1 is: GHJKINKK String 2 is: GFHJIK Their longest common subsequence is: GHJIK
Alignment problem is similar to LCS problem, but needs more steps to calculate mismatch and gap penalty. We can also define the score OPT(i,j) \[OPT(i,j) = \max \{\alpha_{ij} + OPT(i-1,j-1), OPT(i-1,j) +\delta, OPT(i,j-1)+\delta\}\] where $\alpha_{ij}$ is the similarity score between character $a_i$ and $b_j$, $\delta$ is the gap penalty. For the boundary, we can define $OPT(i,0)=i\delta$ and $OPT(0,j)=j\delta$. For example, we set $\alpha_{ij}=5$ if they are the same, while $\alpha_{ij}=-4$ if they are different, and set $\delta = -2$. Below is a realization of DNA alignment algorithm.
def DNA_align(str1, str2):
m = len(str1)
n = len(str2)
GP = 2 #GAP PENALTY
M = [[0 for x in range(n+1)] for x in range(m+1)]
L = [[0 for x in range(n+1)] for x in range(m+1)]
for i in range(m+1):
M[i][0] = -i*GP
L[i][0] = 'I-1'
for j in range(n+1):
M[0][j] = -j*GP
L[0][j] = 'J-1'
for i in range(m): #0,1,...,m-1
for j in range(n): #0,1,...,n-1
if str1[i] == str2[j]:
val1 = M[i][j] + 5
elif str1[i] != str2[j]:
val1 = M[i][j] - 4
val2 = M[i+1][j] - GP
val3 = M[i][j+1] - GP
val_max = max(val1,val2,val3)
M[i+1][j+1] = val_max
if val1 == val_max:
if str1[i] == str2[j]:
L[i+1][j+1] = 'Hit'
elif str1[i] != str2[j]:
L[i+1][j+1] = 'Unmatched'
elif val2 == val_max:
L[i+1][j+1] = 'J-1'
else:
L[i+1][j+1] = 'I-1'
LCS = ''
S1 =''
S2 =''
i = m
j = n
while i > -1 and j > -1:
if i == 0 and j == 0:
break
if L[i][j] == 'Hit':
LCS = '|' + LCS
S1 = str1[i-1] + S1
S2 = str2[j-1] + S2
i = i-1
j = j-1
elif L[i][j] == 'Unmatched':
LCS = ' ' + LCS
S1 = str1[i-1] + S1
S2 = str2[j-1] + S2
i = i-1
j = j-1
elif L[i][j] == 'I-1' :
LCS = ' ' + LCS
S1 = str1[i-1] + S1
S2 = '-' + S2
i = i - 1
else:
LCS = ' ' + LCS
S1 = '-' + S1
S2 = str2[j-1] + S2
j = j - 1
print('string 1:',S1)
print(' ',LCS)
print('string 2:',S2)
return
#run an example
A = 'CGCGTTATTTCGAAACGCGCGCGCGC'
B = 'CGTAGGCTCGCAAAAAGCGCGCGCGCG'
DNA_align(A,B)
[Output]
string 1: CGCGTTA-TTTCG--AAACGCGCGCGCGC-
|| || ||| ||| ||||||||||
string 2: --CG-TAGGCTCGCAAAAAGCGCGCGCGCG