# Burrows-Wheeler-Transformation

The bwt can be read off of the last column of all sorted text rotations of a given text $T$. Let $n\in\mathbb{N}$, then we define $s:\Sigma^n\to\Sigma^n$ as function that permutates it's input text. Specifically, it takes/pops the first character and appends it to the end. Now, for convenience, $s_i$ is just $s$ applied $i$ times. Let me illustrate what we essentially have to do:

$\begin{array}{rl} i & s\_i(T) \\ \hline 1 & \mathtt{mississippi\} \\ 2 & \mathtt{ississippi\m} \\ 3 & \mathtt{ssissippi\mi} \\ 4 & \mathtt{sissippi\mis} \\ 5 & \mathtt{issippi\miss} \\ 6 & \mathtt{ssippi\missi} \\ 7 & \mathtt{sippi\missis} \\ 8 & \mathtt{ippi\mississ} \\ 9 & \mathtt{ppi\mississi} \\ 10 & \mathtt{pi\mississip} \\ 11 & \mathtt{i\mississipp} \\ 12 & \mathtt{\mississippi} \\ \end{array}\overset{sort}{\Rightarrow} \begin{array}{rl} i & s\_i(T) \\ \hline 12 & \mathtt{\mississipp{\color{green}i}}\\ 11 & \mathtt{i\mississip{\color{green}p}}\\ 8 & \mathtt{ippi\missis{\color{green}s}}\\ 5 & \mathtt{issippi\mis{\color{green}s}}\\ 2 & \mathtt{ississippi\{\color{green}m}}\\ 1 & \mathtt{mississippi{\color{green}\}}\\ 10 & \mathtt{pi\mississi{\color{green}p}}\\ 9 & \mathtt{ppi\mississ{\color{green}i}}\\ 7 & \mathtt{sippi\missi{\color{green}s}}\\ 4 & \mathtt{sissippi\mi{\color{green}s}}\\ 6 & \mathtt{ssippi\miss{\color{green}i}}\\ 3 & \mathtt{ssissippi\m{\color{green}i}}\\ \end{array}$

Our bwt is then $\mathtt{ipssm\$pissii}$. Note that$\$$is a special character that stands for the end of the string. A very interesting connection exists between suffix arrays and the bwt. To investigate this further, let's make clear what a suffix array is: We say \mathtt{pos} is an array of positions \left\{0,\dotsc,n-1\right\} which represent the beginning of a suffix. They are ordered lexicographically. So, in a similar fashion to the bwt, we can construct the suffix array: $\begin{array}{rl} i & \operatorname{suff}_i(T) \\ \hline 0 & \mathtt{mississippi\} \\ 1 & \mathtt{ississippi\} \\ 2 & \mathtt{ssissippi\} \\ 3 & \mathtt{sissippi\} \\ 4 & \mathtt{issippi\} \\ 5 & \mathtt{ssippi\} \\ 6 & \mathtt{sippi\} \\ 7 & \mathtt{ippi\} \\ 8 & \mathtt{ppi\} \\ 9 & \mathtt{pi\} \\ 10 & \mathtt{i\} \\ 11 & \mathtt{\} \\ \end{array}\overset{sort}{\Rightarrow} \begin{array}{rl} \downarrow\mathtt{pos} & \operatorname{suff}_i(T) \\ \hline 11 & \mathtt{\}\\ 10 & \mathtt{i\}\\ 7 & \mathtt{ippi\}\\ 4 & \mathtt{issippi\}\\ 1 & \mathtt{ississippi\}\\ 0 & \mathtt{mississippi\}\\ 9 & \mathtt{pi\}\\ 8 & \mathtt{ppi\}\\ 6 & \mathtt{sippi\}\\ 3 & \mathtt{sissippi\}\\ 5 & \mathtt{ssippi\}\\ 2 & \mathtt{ssissippi\}\\ \end{array}$ Now, a very interesting connection between the bwt and the suffix array hides behind a corner. Take the sorted suffixes and prepend the character right before it. So, if we have the suffix \mathtt{ssippi}, we transform it to \mathtt{{\color{green}i}ssippi}. A special case for the non-whole-suffix \mathtt{mississippi\}: We just prepend the \$$ to it. The final table will look like this: (left, bwt is right)

$\begin{array}{lr} \begin{array}{rl} \downarrow\mathtt{pos} & {\color{green}c}+\operatorname{suff}_i(T) \\ \hline 11 & \mathtt{{\color{green}i}\}\\ 10 & \mathtt{{\color{green}p}i\}\\ 7 & \mathtt{{\color{green}s}ippi\}\\ 4 & \mathtt{{\color{green}s}issippi\}\\ 1 & \mathtt{{\color{green}m}ississippi\}\\ 0 & \mathtt{{\color{green}\}mississippi\}\\ 9 & \mathtt{{\color{green}p}pi\}\\ 8 & \mathtt{{\color{green}i}ppi\}\\ 6 & \mathtt{{\color{green}s}sippi\}\\ 3 & \mathtt{{\color{green}s}sissippi\}\\ 5 & \mathtt{{\color{green}i}ssippi\}\\ 2 & \mathtt{{\color{green}i}ssissippi\}\\ \end{array} & \begin{array}{rl} i & s_i(T) \\ \hline 12 & \mathtt{\mississipp{\color{green}i}}\\ 11 & \mathtt{i\mississip{\color{green}p}}\\ 8 & \mathtt{ippi\missis{\color{green}s}}\\ 5 & \mathtt{issippi\mis{\color{green}s}}\\ 2 & \mathtt{ississippi\{\color{green}m}}\\ 1 & \mathtt{mississippi{\color{green}\}}\\ 10 & \mathtt{pi\mississi{\color{green}p}}\\ 9 & \mathtt{ppi\mississ{\color{green}i}}\\ 7 & \mathtt{sippi\missi{\color{green}s}}\\ 4 & \mathtt{sissippi\mi{\color{green}s}}\\ 6 & \mathtt{ssippi\miss{\color{green}i}}\\ 3 & \mathtt{ssissippi\m{\color{green}i}}\\ \end{array} \end{array}$

Now it is very easy to verify that $bwt[i] = T[pos[i] - 1]$, ignoring boundary conditions. ##### Matthis Kruse
###### Masters Student

My research interests include programming languages and compilers.

Next
Previous