Let’s Circle Back: The Viterbi Algorithm

Clare Dunne
3 min read · Jun 29, 2021

--

An algorithm named after one guy, but discovered independently on at least seven occasions.

The Viterbi Algorithm was developed for use in communications. It’s a dynamic programming algorithm with many present-day applications. One of these is in Natural Language Processing.

(Natural Language Processing, NLP, is a field of study combining linguistics, computer science, and AI. It focuses on the interactions between computers and human language.)

Within NLP, one challenge is speech-to-text conversion. The Viterbi Algorithm breaks down the problem of assigning text to speech sounds step by step, keeping only the most likely candidates at each step and eliminating unlikely outcomes as it goes.

A Simplified Example.

Let’s say someone leaves you a voicemail. Weird, right? Why didn’t they just text? Maybe your phone provides you with a transcript of the message so that you can avoid the anxiety of having to check your voicemails.

“Hi Mary, have you seen my keys anywhere?”

There are all sorts of ways your phone can get this wrong. Many of these words are homophones, sounding the same but with distinct meanings and spellings.

“High marry, have ewe scene my quays anywhere?”

Ewe Scene on a Quay: Belfast, 1953

The algorithm treats the sounds as evidence (observations) and the text as parameters (states). It also utilises transition probabilities, which are the likelihood of one event following another, given that they are not independent. The likelihood of rolling a six and then a one is one in 36, the product of their individual probabilities, but the likelihood of hot weather persisting from one day to the next is greater than P(HotDay)*P(HotDay).
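To make that concrete, here is a tiny numerical sketch. The dice arithmetic is from the paragraph above; the weather figures are invented purely for illustration.

```python
# Independent events: the probability of a six then a one is just the product.
p_six, p_one = 1 / 6, 1 / 6
print(p_six * p_one)            # 1 in 36, roughly 0.028

# Dependent events: hot weather tends to persist, so we use a transition
# probability instead of multiplying the base rate by itself.
p_hot = 0.3                     # hypothetical base chance of a hot day
p_hot_given_hot = 0.8           # hypothetical transition probability P(Hot tomorrow | Hot today)
print(p_hot * p_hot)            # 0.09 -- what independence would predict
print(p_hot * p_hot_given_hot)  # 0.24 -- the transition-probability view, noticeably higher
```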

Once the program determines that it has heard the sound [ˈhaɪ], the Viterbi Algorithm calculates starting probabilities, P(“High”|[ˈhaɪ]) and P(“Hi”|[ˈhaɪ]). The parameter with the higher probability is stored. Then, using a transition probability, it compares “Mary” and “marry” for the second word, given that it heard [ˈmɛrɪ] and is following “Hi”.

And so it goes over the course of the question, compounding the probabilities and eliminating the less likely text fragments so that it only ever has to compare states for the current observation, rather than every possible version of the entire sentence.
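Here is a rough sketch of that single step in Python. Every number below is made up for illustration (as is the assumption that [ˈmɛrɪ] matches “Mary” and “marry” equally well); a real speech model would supply these values.

```python
# Illustrative only: none of these probabilities come from a real speech model.
start = {"Hi": 0.6, "High": 0.4}                 # P(first word | sound [ˈhaɪ])
trans = {                                        # P(next word | previous word)
    ("Hi", "Mary"): 0.7, ("Hi", "marry"): 0.3,
    ("High", "Mary"): 0.2, ("High", "marry"): 0.8,
}
emit = {"Mary": 0.5, "marry": 0.5}               # both match the sound [ˈmɛrɪ] equally well

best = {}
for word in ("Mary", "marry"):
    # For each candidate second word, keep only its single best predecessor --
    # this pruning is what keeps the comparison per-observation, not per-sentence.
    prob, prev = max((start[p] * trans[(p, word)] * emit[word], p) for p in ("Hi", "High"))
    best[word] = (prob, [prev, word])

print(best)  # "Mary" via "Hi" (~0.21) beats "marry" via "High" (~0.16)
```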

Big Data

With toy examples, like the Healthy v. Feverish example (given observations of feeling normal, cold, or dizzy) on the Wikipedia page for the Viterbi Algorithm, these compounded probabilities get very small, very quickly.

In the village doctor example, you start with a base probability of 60% that a person is healthy and 40% that they’re ill. If you’re healthy, you have a 70% chance of being healthy the following day, and if you’re feverish, the chance that you will be feverish again tomorrow is 60%. Healthy people report feeling normal 50% of the time and cold 40% of the time, and feverish people report feeling dizzy 60% of the time and cold 30% of the time.

If you observe a patient feeling normal, cold, and dizzy on subsequent days, their condition was most likely Healthy the first and second days, and Feverish the third. However, since all the probabilities above are relatively low, the probability of someone being healthy, healthy, and feverish over those three days, given they report feeling normal, cold, and dizzy, is only about 1.5%.
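For anyone who wants to check those numbers, here is a minimal Viterbi sketch in Python for the doctor example. The figures the paragraph doesn’t state explicitly (healthy→feverish 30%, feverish→healthy 40%, healthy-but-dizzy 10%, feverish-but-normal 10%) are the complements implied by it.

```python
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
observations = ("normal", "cold", "dizzy")

def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][state] = (probability of the best path ending in `state` at time t, that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            # Keep only the most likely predecessor for each state -- the pruning
            # that avoids comparing every possible full sequence.
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p) for p in states
            )
            V[t][s] = (prob, V[t - 1][prev][1] + [s])
    return max(V[-1].values())

prob, path = viterbi(observations, states, start_p, trans_p, emit_p)
print(path, prob)  # ['Healthy', 'Healthy', 'Fever'], about 0.0151 -- the 1.5% above
```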

Given large amounts of data, you can say with far more certainty than 60% that a voicemail starts with the word “Hi” and not “High”, and so on.

--

Clare Dunne

Data Scientist | Energy Engineer | STEM Educator | C2 in my English Leaving Cert, you have been warned.