What is a Hidden Markov Model?

A Hidden Markov Model (HMM) is a graphical model that describes how a system evolves over time. When modeling a time series of $T$ observations with an HMM, we assume that the observed data $y_{1:T}$ are generated by a sequence of hidden states $x_{1:T}$ that are never observed directly. Specifically, an HMM is a state-space model in which the hidden states are discrete.

The three components of an HMM are as follows:

  • An initial state distribution ($\pi$): which hidden states we are likely to start in.
  • A transition matrix ($A$): how the hidden states evolve over time.
  • An emission model: how the hidden states generate the observed data.
StateSpaceDynamics.HiddenMarkovModel — Type
HiddenMarkovModel

Store a Hidden Markov Model (HMM) with custom emissions.

Fields

  • A::AbstractMatrix{<:Real}: Transition matrix.
  • B::AbstractVector{<:EmissionModel}: State-dependent emission models.
  • πₖ::AbstractVector{<:Real}: Initial state distribution.
  • K::Int: Number of states.

The generative model is given by:

\[\begin{align*} x_1 &\sim \text{Cat}(\pi) \\ x_t &\mid x_{t-1} \sim \text{Cat}(A_{x_{t-1}, :}) \\ y_t &\mid x_t \sim p(y_t \mid \theta_{x_t}) \end{align*}\]

Where:

  • $x_t$ is the hidden (discrete) state at time $t$
  • $y_t$ is the observed data at time $t$
  • $\pi$ is the initial state distribution
  • $\mathbf{A}$ is the state transition matrix
  • $\theta_{x_t}$ are the parameters of the emission distribution for state $x_t$

The emission model can take many forms: Gaussian, Poisson, Bernoulli, categorical, and so on. With a Gaussian emission distribution, the emission step becomes:

\[y_t \mid (x_t = k) \sim \mathcal{N}(\mu_k, \Sigma_k)\]

Where:

  • $\mu_k$ is the mean of the emission distribution for state $k$
  • $\Sigma_k$ is the covariance of the emission distribution for state $k$
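
As a concrete illustration, the Gaussian-emission generative process above can be simulated directly with Distributions.jl, independently of the StateSpaceDynamics.jl API. The parameter values below are arbitrary and chosen only for this sketch.

# Simulate a 2-state HMM with 2-dimensional Gaussian emissions.
using Distributions, Random

Random.seed!(1)

K, T = 2, 200
π0 = [0.6, 0.4]                                  # initial state distribution π
A  = [0.95 0.05; 0.10 0.90]                      # transition matrix (rows sum to 1)
μ  = [[-1.0, 0.0], [2.0, 1.0]]                   # emission mean for each state
Σ  = [[0.5 0.0; 0.0 0.5], [0.8 0.0; 0.0 0.8]]    # emission covariance for each state

x = zeros(Int, T)                                # hidden state sequence
y = zeros(2, T)                                  # observations (one column per time step)

x[1] = rand(Categorical(π0))                     # x₁ ~ Cat(π)
y[:, 1] = rand(MvNormal(μ[x[1]], Σ[x[1]]))       # y₁ | x₁ ~ N(μ_k, Σ_k)
for t in 2:T
    x[t] = rand(Categorical(A[x[t-1], :]))       # xₜ | xₜ₋₁ ~ Cat(A[xₜ₋₁, :])
    y[:, t] = rand(MvNormal(μ[x[t]], Σ[x[t]]))   # yₜ | xₜ ~ N(μ_k, Σ_k)
end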

What is a Generalized Linear Model HMM (GLM-HMM)?

A Generalized Linear Model HMM (GLM-HMM), also known as a switching regression model, is an extension of the classic HMM in which the emission models are state-dependent GLMs that link an observed input to an observed output. This formulation lets each hidden state define its own regression relationship between inputs and outputs, allowing the model to capture complex, state-dependent dynamics in the data. Currently, StateSpaceDynamics.jl supports Gaussian, Bernoulli, Poisson, and autoregressive GLMs as emission models.

The generative model is as follows:

\[\begin{align*} x_1 &\sim \text{Cat}(\pi) \\ x_t &\mid x_{t-1} \sim \text{Cat}(A_{x_{t-1}, :}) \\ y_t &\mid x_t, u_t \sim p(y_t \mid \theta_{x_t}, u_t) \end{align*}\]

Where:

  • $x_t$ is the hidden (discrete) state at time $t$
  • $y_t$ is the observed output at time $t$
  • $u_t$ is the observed input (covariate) at time $t$
  • $\theta_{x_t}$ are the parameters of the GLM emission model for state $x_t$

Example Emission Models

For example, if the emission is a Gaussian GLM:

\[y_t \mid (x_t = k), u_t \sim \mathcal{N}(\mu_k + \beta_k^\top u_t, \sigma_k^2)\]

Where:

  • $\beta_k$ are the regression weights for state $k$
  • $\sigma_k^2$ is the state-dependent variance
  • $\mu_k$ is the state-dependent bias

If the emission is Bernoulli (for binary outputs):

\[y_t \mid (x_t = k), u_t \sim \text{Bernoulli} \left( \sigma \left( \mu_k + \beta_k^\top u_t \right) \right)\]

Where:

  • $\beta_k$ are the regression weights for state $k$
  • $\sigma(\cdot)$ is the logistic sigmoid function for binary outputs
  • $\mu_k$ is the state-dependent bias
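
To make the two emission models above concrete, the sketch below evaluates and samples from both a Gaussian-GLM and a Bernoulli-GLM emission for a single time step. The parameter and covariate values are made up for illustration and are not tied to the StateSpaceDynamics.jl API.

using Distributions, LinearAlgebra

sigmoid(z) = 1 / (1 + exp(-z))         # logistic sigmoid σ(·)

# Toy parameters for one state k and one input u_t (illustrative values only)
μ_k  = 0.5                              # state-dependent bias
β_k  = [1.2, -0.7]                      # regression weights for state k
σ²_k = 0.3                              # Gaussian emission variance
u_t  = [0.4, 1.1]                       # observed covariate at time t

η = μ_k + dot(β_k, u_t)                 # linear predictor μ_k + β_kᵀ u_t

y_gauss = rand(Normal(η, sqrt(σ²_k)))   # Gaussian GLM: yₜ ~ N(η, σ²_k)
y_bern  = rand(Bernoulli(sigmoid(η)))   # Bernoulli GLM: yₜ ~ Bernoulli(σ(η))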

Sampling from an HMM

You can generate synthetic data from an HMM:

Base.rand — Function
Random.rand(
    rng::AbstractRNG,
    model::HiddenMarkovModel,
    X::Union{Matrix{<:Real}, Nothing}=nothing;
    n::Int,
    autoregressive::Bool=false)

Generate n samples from a Hidden Markov Model. Returns a tuple of the state sequence and the observation sequence.

Arguments

  • rng::AbstractRNG: The random number generator to use.
  • model::HiddenMarkovModel: The Hidden Markov Model to sample from.
  • X: The input data for switching regression models.
  • n::Int: The number of samples to generate.

Returns

  • state_sequence::Vector{Int}: The state sequence, where each element is an integer in 1:K.
  • observation_sequence::Matrix{Float64}: The observation sequence. This takes the form of the emission model's output.
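
For example, assuming model is an already-constructed HiddenMarkovModel and glmhmm a GLM-HMM with input matrix X (all placeholder names), sampling follows the signature above:

using Random

rng = MersenneTwister(42)
labels, data = rand(rng, model; n=1000)        # plain HMM: state labels and observations
# For a GLM-HMM / switching regression model, pass the inputs as well:
# labels, data = rand(rng, glmhmm, X; n=1000)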

Learning in an HMM

StateSpaceDynamics.jl implements Expectation-Maximization (EM) for parameter learning in both HMMs and GLM-HMMs. EM is an iterative method for finding maximum likelihood estimates of the parameters in graphical models with hidden variables.

StateSpaceDynamics.fit! — Method
fit!(model::HiddenMarkovModel, Y::Matrix{<:Real}, X::Union{Matrix{<:Real}, Nothing}=nothing; max_iters::Int=100, tol::Float64=1e-6)

Fit the Hidden Markov Model using the EM algorithm.

Arguments

  • model::HiddenMarkovModel: The Hidden Markov Model to fit.
  • Y::Matrix{<:Real}: The emission data.
  • X::Union{Matrix{<:Real}, Nothing}=nothing: Optional input data for fitting Switching Regression Models.
  • max_iters::Int=100: The maximum number of iterations to run the EM algorithm.
  • tol::Float64=1e-6: The convergence tolerance; the algorithm stops when the log-likelihood improves by less than this value.
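
A typical call, again with model, glmhmm, Y, and X as placeholders for an existing model and data; the model is updated in place:

fit!(model, Y)                            # standard HMM with default settings
fit!(model, Y; max_iters=200, tol=1e-5)   # tighter control over the EM run
fit!(glmhmm, Y, X)                        # GLM-HMM / switching regression, with inputs X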

Expectation Step (E-step)

In the expectation step (E-step), we calculate the posterior distribution of the latent states given the current parameters of the model:

\[p(X \mid Y, \theta_{\text{old}})\]

We use dynamic programming to efficiently calculate this posterior using the forward and backward recursions for HMMs. This posterior is then used to construct the expectation of the complete data log-likelihood, also known as the Q-function:

\[Q(\theta, \theta_{\text{old}}) = \sum_X p(X \mid Y, \theta_{\text{old}}) \ln p(Y, X \mid \theta)\]
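
Using the Markov structure of the model, this Q-function decomposes into initial-state, transition, and emission terms:

\[Q(\theta, \theta_{\text{old}}) = \sum_{k=1}^{K} \gamma_1(k) \ln \pi_k + \sum_{t=2}^{T} \sum_{j=1}^{K} \sum_{k=1}^{K} \xi_t(j, k) \ln A_{jk} + \sum_{t=1}^{T} \sum_{k=1}^{K} \gamma_t(k) \ln p(y_t \mid \theta_k)\]

where $\gamma_t(k) = p(x_t = k \mid Y, \theta_{\text{old}})$ and $\xi_t(j, k) = p(x_{t-1} = j, x_t = k \mid Y, \theta_{\text{old}})$ are the single-state and pairwise-state posteriors computed in the E-step.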

Maximization Step (M-step)

In the maximization step (M-step), we maximize this expectation with respect to the parameters $\theta$. Specifically:

  • For the initial state distribution and the transition matrix, we use closed-form analytical updates, derived using Lagrange multipliers (see the updates below).
  • For the emission models of a standard HMM, we also implement analytical updates.
  • If the emission model is a GLM, we use Optim.jl to numerically optimize the objective function.
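
The closed-form updates for the initial distribution and transition matrix take the standard form:

\[\begin{align*} \pi_k &= \frac{\gamma_1(k)}{\sum_{j=1}^{K} \gamma_1(j)} \\ A_{jk} &= \frac{\sum_{t=2}^{T} \xi_t(j, k)}{\sum_{l=1}^{K} \sum_{t=2}^{T} \xi_t(j, l)} \end{align*}\]

For Gaussian emissions, the state means and covariances are likewise weighted sample statistics, with weights given by $\gamma_t(k)$.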

Inference in an HMM

For state inference in Hidden Markov Models (HMMs), we implement two common algorithms:

Forward-Backward Algorithm

The Forward-Backward algorithm is used to compute the posterior state probabilities at each time step. Given the observed data, it calculates the probability of being in each possible hidden state at each time step, marginalizing over all possible state sequences.
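
Concretely, the forward quantities $\alpha_t(k)$ and backward quantities $\beta_t(k)$ are computed recursively and combine to give the posterior state probabilities $\gamma_t(k)$ used above:

\[\begin{align*} \alpha_1(k) &= \pi_k \, p(y_1 \mid \theta_k) \\ \alpha_t(k) &= p(y_t \mid \theta_k) \sum_{j=1}^{K} \alpha_{t-1}(j) \, A_{jk} \\ \beta_T(k) &= 1 \\ \beta_t(k) &= \sum_{j=1}^{K} A_{kj} \, p(y_{t+1} \mid \theta_j) \, \beta_{t+1}(j) \\ \gamma_t(k) &= \frac{\alpha_t(k) \, \beta_t(k)}{\sum_{j=1}^{K} \alpha_t(j) \, \beta_t(j)} \end{align*}\]

In practice, these recursions are computed with scaling factors or in log space to avoid numerical underflow on long sequences.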

StateSpaceDynamics.class_probabilities — Method
class_probabilities(model::HiddenMarkovModel, Y::Matrix{<:Real}, X::Union{Matrix{<:Real},Nothing}=nothing)

Calculate the class probabilities at each time point using the forward-backward algorithm.

Arguments

  • model::HiddenMarkovModel: The Hidden Markov Model to use.
  • Y::Matrix{<:Real}: The emission data.
  • X::Union{Matrix{<:Real},Nothing}=nothing: Optional input data for Switching Regression Models.

Returns

  • class_probabilities::Matrix{Float64}: The class probabilities at each time point.
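
Usage follows the signature above, with model, glmhmm, Y, and X as placeholders for an existing model and data:

γ = class_probabilities(model, Y)            # posterior probability of each state at each time point
γ_glm = class_probabilities(glmhmm, Y, X)    # GLM-HMM / switching regression case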

Viterbi Algorithm

The Viterbi algorithm finds the most likely sequence of hidden states given the observed data. It does so by dynamically computing the highest-probability path through the state space, i.e., the state sequence that maximizes the joint probability of the hidden states and the observations.
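
The recursion tracks the probability of the best path ending in each state, together with back-pointers used to recover the optimal sequence:

\[\begin{align*} \delta_1(k) &= \pi_k \, p(y_1 \mid \theta_k) \\ \delta_t(k) &= p(y_t \mid \theta_k) \, \max_{j} \, \delta_{t-1}(j) \, A_{jk} \\ \psi_t(k) &= \arg\max_{j} \, \delta_{t-1}(j) \, A_{jk} \end{align*}\]

The best final state is $x_T^* = \arg\max_k \delta_T(k)$, and the remaining states are recovered by backtracking, $x_t^* = \psi_{t+1}(x_{t+1}^*)$.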

StateSpaceDynamics.viterbi — Method
viterbi(model::HiddenMarkovModel, Y::Matrix{<:Real}, X::Union{Matrix{<:Real},Nothing}=nothing)

Get the most likely sequence of state labels using the Viterbi algorithm.

Arguments

  • model::HiddenMarkovModel: The Hidden Markov Model to use.
  • Y::Matrix{<:Real}: The emission data.
  • X::Union{Matrix{<:Real},Nothing}=nothing: Optional input data for Switching Regression Models.

Returns

  • best_path::Vector{Float64}: The most likely state label at each time point.
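
Usage mirrors class_probabilities, with the same placeholder names:

best_path = viterbi(model, Y)            # most likely state label at each time point
best_path_glm = viterbi(glmhmm, Y, X)    # GLM-HMM / switching regression case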

Reference

For a complete mathematical treatment of the relevant HMM and GLM-HMM learning and inference algorithms, we recommend Chapter 13 of Pattern Recognition and Machine Learning by Christopher Bishop.