What is a Hidden Markov Model?

A Hidden Markov Model (HMM) is a graphical model that describes how systems change over time. When modeling a time series with $T$ observations using an HMM, we assume that the observed data $y_{1:T}$ depends on hidden states $x_{1:T}$ that are not observed. Specifically, an HMM is a type of state-space model in which the hidden states are discrete.

The three components of an HMM are as follows:

  • An initial state distribution ($\pi$): which hidden states we are likely to start in.
  • A transition matrix ($A$): how the hidden states evolve over time.
  • An emission model: how the hidden states generate the observed data.

The generative model is given by:

$$
x_1 \sim \text{Cat}(\pi) \\
x_t \mid x_{t-1} \sim \text{Cat}(A_{x_{t-1}, :}) \\
y_t \mid x_t \sim p(y_t \mid \theta_{x_t})
$$

Where:

  • $x_t$ is the hidden (discrete) state at time $t$
  • $y_t$ is the observed data at time $t$
  • $\pi$ is the initial state distribution
  • $A$ is the state transition matrix
  • $\theta_{x_t}$ are the parameters of the emission distribution for state $x_t$

The emission model can take many forms: Gaussian, Poisson, Bernoulli, categorical, and so on. In the case of a Gaussian emission distribution, this becomes:

$$
y_t \mid (x_t = k) \sim \mathcal{N}(\mu_k, \Sigma_k)
$$

Where:

  • $\mu_k$ is the mean of the emission distribution for state $k$
  • $\Sigma_k$ is the covariance of the emission distribution for state $k$
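
To make the generative process concrete, the following sketch simulates a two-state HMM with scalar Gaussian emissions directly from the equations above. It is a minimal, self-contained example using Distributions.jl and does not rely on the StateSpaceDynamics.jl API.

```julia
using Distributions

# Simulate T steps from a K-state HMM with scalar Gaussian emissions:
# x_1 ~ Cat(πₖ), x_t ~ Cat(A[x_{t-1}, :]), y_t ~ N(μ[x_t], σ[x_t]).
function simulate_gaussian_hmm(πₖ, A, μ, σ, T)
    x = Vector{Int}(undef, T)      # hidden state sequence
    y = Vector{Float64}(undef, T)  # observation sequence
    x[1] = rand(Categorical(πₖ))
    y[1] = rand(Normal(μ[x[1]], σ[x[1]]))
    for t in 2:T
        x[t] = rand(Categorical(A[x[t-1], :]))  # transition from the previous state
        y[t] = rand(Normal(μ[x[t]], σ[x[t]]))   # emit from the current state's Gaussian
    end
    return x, y
end

# Two well-separated states with sticky transitions.
πₖ = [0.5, 0.5]
A  = [0.95 0.05; 0.10 0.90]
x, y = simulate_gaussian_hmm(πₖ, A, [-2.0, 2.0], [0.5, 0.5], 200)
```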

What is a Generalized Linear Model - Hidden Markov Model (GLM-HMM)?

A Generalized Linear Model - Hidden Markov Model (GLM-HMM), also known as a switching regression model, is an extension of the classic HMM in which the emission models are state-dependent GLMs that link an observed input to an observed output. This formulation allows each hidden state to define its own regression relationship between inputs and outputs, enabling the model to capture complex, state-dependent dynamics in the data. Currently, StateSpaceDynamics.jl supports Gaussian, Bernoulli, Poisson, and Autoregressive GLMs as emission models.

The generative model is as follows:

$$
x_1 \sim \text{Cat}(\pi) \\
x_t \mid x_{t-1} \sim \text{Cat}(A_{x_{t-1}, :}) \\
y_t \mid x_t, u_t \sim p(y_t \mid \theta_{x_t}, u_t)
$$

Where:

  • $x_t$ is the hidden (discrete) state at time $t$
  • $y_t$ is the observed output at time $t$
  • $u_t$ is the observed input (covariate) at time $t$
  • $\theta_{x_t}$ are the parameters of the GLM emission model for state $x_t$

Example Emission Models

For example, if the emission is a Gaussian GLM:

$$
y_t \mid (x_t = k), u_t \sim \mathcal{N}(\mu_k + \beta_k^\top u_t, \sigma_k^2)
$$

Where:

  • $\beta_k$ are the regression weights for state $k$
  • $\sigma_k^2$ is the state-dependent variance
  • $\mu_k$ is the state-dependent bias

If the emission is Bernoulli (for binary outputs):

$$
y_t \mid (x_t = k), u_t \sim \text{Bernoulli}\left( \sigma\left( \mu_k + \beta_k^\top u_t \right) \right)
$$

Where:

  • $\beta_k$ are the regression weights for state $k$
  • $\sigma(\cdot)$ is the logistic sigmoid function for binary outputs
  • $\mu_k$ is the state-dependent bias
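
To make the GLM-HMM generative process concrete, here is an analogous sketch that simulates a two-state Bernoulli GLM-HMM: the hidden states evolve exactly as in a plain HMM, while each binary output is drawn with a probability given by the current state's GLM applied to the input. Again this uses only Distributions.jl and is independent of the package API.

```julia
using Distributions, LinearAlgebra

logistic(a) = 1 / (1 + exp(-a))  # σ(·), the logistic sigmoid

# Simulate a K-state Bernoulli GLM-HMM given an input matrix U (T × D).
# μ holds per-state biases and β holds per-state weight vectors.
function simulate_bernoulli_glmhmm(πₖ, A, μ, β, U)
    T = size(U, 1)
    x = Vector{Int}(undef, T)
    y = Vector{Int}(undef, T)
    x[1] = rand(Categorical(πₖ))
    for t in 1:T
        t > 1 && (x[t] = rand(Categorical(A[x[t-1], :])))
        p = logistic(μ[x[t]] + dot(β[x[t]], U[t, :]))  # state-dependent GLM
        y[t] = rand(Bernoulli(p))
    end
    return x, y
end

# One strongly input-driven state and one weakly input-dependent, biased state.
U = randn(500, 1)
x, y = simulate_bernoulli_glmhmm([0.5, 0.5], [0.98 0.02; 0.02 0.98],
                                 [0.0, 1.5], [[4.0], [0.2]], U)
```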

Learning in an HMM

StateSpaceDynamics.jl implements Expectation-Maximization (EM) for parameter learning in both HMMs and GLM-HMMs. EM is an iterative method for finding maximum likelihood estimates of the parameters in graphical models with hidden variables.

Expectation Step (E-step)

In the expectation step (E-step), we calculate the posterior distribution of the latent states given the current parameters of the model:

$$
p(X \mid Y, \theta_{\text{old}})
$$

We use dynamic programming to efficiently calculate this posterior using the forward and backward recursions for HMMs. This posterior is then used to construct the expectation of the complete data log-likelihood, also known as the Q-function:

$$
Q(\theta, \theta_{\text{old}}) = \sum_X p(X \mid Y, \theta_{\text{old}}) \ln p(Y, X \mid \theta)
$$
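
For an HMM this expectation splits into separate terms for the initial distribution, the transition matrix, and the emission parameters, which is what makes the M-step tractable. Writing $\gamma_t(k) = p(x_t = k \mid Y, \theta_{\text{old}})$ for the posterior state marginals and $\xi_t(j, k) = p(x_{t-1} = j, x_t = k \mid Y, \theta_{\text{old}})$ for the pairwise posteriors (both obtained from the forward-backward recursions), the Q-function expands to:

$$
Q(\theta, \theta_{\text{old}}) = \sum_{k=1}^{K} \gamma_1(k) \ln \pi_k + \sum_{t=2}^{T} \sum_{j=1}^{K} \sum_{k=1}^{K} \xi_t(j, k) \ln A_{jk} + \sum_{t=1}^{T} \sum_{k=1}^{K} \gamma_t(k) \ln p(y_t \mid \theta_k)
$$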

Maximization Step (M-step)

In the maximization step (M-step), we maximize this expectation with respect to the parameters $\theta$. Specifically:

  • For the initial state distribution and the transition matrix, we use analytical updates for the parameters, derived using Lagrange multipliers (the resulting closed-form updates are shown after this list).
  • For emission models in the case of HMMs, we also implement analytical updates.
  • If the emission model is a GLM, we use Optim.jl to numerically optimize the objective function.
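
In the notation of the E-step above, the analytical updates for the initial state distribution and the transition matrix are the familiar Baum-Welch re-estimates, i.e. normalized expected counts:

$$
\pi_k = \frac{\gamma_1(k)}{\sum_{j=1}^{K} \gamma_1(j)}, \qquad A_{jk} = \frac{\sum_{t=2}^{T} \xi_t(j, k)}{\sum_{l=1}^{K} \sum_{t=2}^{T} \xi_t(j, l)}
$$

The analytical emission updates are likewise weighted by $\gamma_t(k)$; for a Gaussian emission, for instance, $\mu_k$ is the $\gamma_t(k)$-weighted mean of the observations.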

Inference in an HMM

For state inference in Hidden Markov Models (HMMs), we implement two common algorithms:

Forward-Backward Algorithm

The Forward-Backward algorithm is used to compute the posterior state probabilities at each time step. Given the observed data, it calculates the probability of being in each possible hidden state at each time step, marginalizing over all possible state sequences.
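
As a minimal sketch of this recursion (for exposition only, not the package's internal implementation, and assuming a precomputed per-state likelihood matrix), a scaled forward-backward pass looks like this:

```julia
# Scaled forward-backward pass.
# πₖ: initial distribution (length K), A: transition matrix (K × K),
# L:  likelihood matrix with L[t, k] = p(y_t | x_t = k)  (T × K).
# Returns γ with γ[t, k] = p(x_t = k | y_{1:T}).
function forward_backward(πₖ, A, L)
    T, K = size(L)
    α = zeros(T, K); β = zeros(T, K); c = zeros(T)
    α[1, :] = πₖ .* L[1, :]
    c[1] = sum(α[1, :]); α[1, :] ./= c[1]                 # scale to avoid underflow
    for t in 2:T
        α[t, :] = (A' * α[t-1, :]) .* L[t, :]             # forward recursion
        c[t] = sum(α[t, :]); α[t, :] ./= c[t]
    end
    β[T, :] .= 1.0
    for t in (T-1):-1:1
        β[t, :] = A * (L[t+1, :] .* β[t+1, :]) ./ c[t+1]  # backward recursion
    end
    return α .* β                                         # posterior marginals γ
end
```

The scaling constants `c` also yield the data log-likelihood as `sum(log.(c))`, the quantity monitored for convergence during EM.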

Viterbi Algorithm

The Viterbi algorithm is used for best state sequence labeling. It finds the most likely sequence of hidden states given the observed data. This is done by dynamically computing the highest probability path through the state space, which maximizes the likelihood of the observed sequence.
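
A corresponding sketch of the Viterbi recursion in log space, under the same assumptions as above:

```julia
# Viterbi decoding over a log-likelihood matrix logL[t, k] = log p(y_t | x_t = k).
function viterbi_path(πₖ, A, logL)
    T, K = size(logL)
    δ = fill(-Inf, T, K)   # best log-probability of any path ending in state k at time t
    ψ = zeros(Int, T, K)   # back-pointers
    δ[1, :] = log.(πₖ) .+ logL[1, :]
    for t in 2:T, k in 1:K
        scores = δ[t-1, :] .+ log.(A[:, k])
        ψ[t, k] = argmax(scores)
        δ[t, k] = scores[ψ[t, k]] + logL[t, k]
    end
    path = zeros(Int, T)
    path[T] = argmax(δ[T, :])
    for t in (T-1):-1:1    # trace the back-pointers
        path[t] = ψ[t+1, path[t+1]]
    end
    return path
end
```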

Reference

For a complete mathematical formulation of the relevant HMM and GLM-HMM learning and inference algorithms, we recommend Chapter 13 of Pattern Recognition and Machine Learning by Christopher Bishop.

StateSpaceDynamics.HiddenMarkovModelType
HiddenMarkovModel

A Hidden Markov Model (HMM) with custom emissions.

Fields

  • K::Int: Number of states.
  • B::Vector=Vector(): Vector of emission models.
  • emission=nothing: If B is missing emissions, clones of this model will be used to fill in the rest.
  • A::Matrix{<:Real}: Transition matrix.
  • πₖ::Vector{Float64}: Initial state distribution.
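
As a usage illustration, here is a minimal construction sketch based on the fields listed above. The keyword-style constructor is inferred from the documented field defaults, and `GaussianEmission` is a hypothetical stand-in for one of the package's emission models (Gaussian, Bernoulli, Poisson, or Autoregressive); check the emission-model documentation for the exact type names and constructors.

```julia
using StateSpaceDynamics

# Two-state HMM. `GaussianEmission()` is a hypothetical emission constructor
# used only for illustration; substitute a real emission type from the package.
A  = [0.9 0.1; 0.2 0.8]                               # transition matrix (rows sum to 1)
πₖ = [0.6, 0.4]                                       # initial state distribution
emissions = [GaussianEmission(), GaussianEmission()]  # one emission model per state (hypothetical)

hmm = HiddenMarkovModel(; K=2, B=emissions, A=A, πₖ=πₖ)
```
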
StateSpaceDynamics.class_probabilitiesFunction
function class_probabilities(model::HiddenMarkovModel, Y::Matrix{<:Real}, X::Union{Matrix{<:Real},Nothing}=nothing;)

Calculate the class probabilities at each time point using the forward-backward algorithm

Arguments

  • model::HiddenMarkovModel: The Hidden Markov Model to fit.
  • Y::Matrix{<:Real}: The emission data
  • X::Union{Matrix{<:Real},Nothing}=nothing: Optional input data for fitting Switching Regression Models

Returns

  • class_probabilities::Matrix{Float64}: The class probabilities at each timepoint
StateSpaceDynamics.class_probabilitiesFunction
function class_probabilities(model::HiddenMarkovModel, Y::Vector{<:Matrix{<:Real}}, X::Union{Vector{<:Matrix{<:Real}},Nothing}=nothing;)

Calculate the class probabilities at each time point using the forward-backward algorithm on multiple trials of data

Arguments

  • model::HiddenMarkovModel: The Hidden Markov Model to fit.
  • Y::Vector{<:Matrix{<:Real}}: The trials of emission data
  • X::Union{Vector{<:Matrix{<:Real}},Nothing}=nothing: Optional trials of input data for fitting Switching Regression Models

Returns

  • class_probabilities::Vector{<:Matrix{Float64}}: Each trial's class probabilities at each timepoint
StateSpaceDynamics.fit!Function
fit!(model::HiddenMarkovModel, Y::Vector{<:Matrix{<:Real}}, X::Union{Vector{<:Matrix{<:Real}}, Nothing}=nothing; max_iters::Int=100, tol::Float64=1e-6)

Fit the Hidden Markov Model to multiple trials of data using the EM algorithm.

Arguments

  • model::HiddenMarkovModel: The Hidden Markov Model to fit.
  • Y::Vector{<:Matrix{<:Real}}: The trialized emission data.
  • X::Union{Vector{<:Matrix{<:Real}}, Nothing}=nothing: Optional input data for fitting Switching Regression Models
  • max_iters::Int=100: The maximum number of iterations to run the EM algorithm.
  • tol::Float64=1e-6: When the log likelihood is improving by less than this value, the algorithm will stop.
StateSpaceDynamics.fit!Function
fit!(model::HiddenMarkovModel, Y::Matrix{<:Real}, X::Union{Matrix{<:Real}, Nothing}=nothing; max_iters::Int=100, tol::Float64=1e-6)

Fit the Hidden Markov Model using the EM algorithm.

Arguments

  • model::HiddenMarkovModel: The Hidden Markov Model to fit.
  • Y::Matrix{<:Real}: The emission data.
  • X::Union{Matrix{<:Real}, Nothing}=nothing: Optional input data for fitting Switching Regression Models
  • max_iters::Int=100: The maximum number of iterations to run the EM algorithm.
  • tol::Float64=1e-6: When the log likelihood is improving by less than this value, the algorithm will stop.
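
A sketch of a typical fitting call, assuming an already-constructed model `hmm` (see the construction sketch above). Here the training data are generated with `sample` so that they match the format produced by the model's emission models; in practice `Y` is your own emission data in that same format.

```julia
# Generate synthetic observations from the model, then fit to them with EM.
_, Y = sample(hmm; n=500)
fit!(hmm, Y; max_iters=200, tol=1e-6)

# For a GLM-HMM (switching regression) model, the input matrix is passed as well:
# fit!(glmhmm, Y, X; max_iters=200, tol=1e-6)
```
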
StateSpaceDynamics.sampleFunction
sample(model::HiddenMarkovModel, data...; n::Int)

Generate n samples from a Hidden Markov Model. Returns a tuple of the state sequence and the observation sequence.

Arguments

  • model::HiddenMarkovModel: The Hidden Markov Model to sample from.
  • data...: The data to fit the Hidden Markov Model. Requires the same format as the emission model.
  • n::Int: The number of samples to generate.

Returns

  • state_sequence::Vector{Int}: The state sequence, where each element is an integer 1:K.
  • observation_sequence::Matrix{Float64}: The observation sequence. This takes the form of the emission model's output.
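
A minimal usage sketch, assuming an already-constructed `hmm`. For a plain HMM no extra positional data is needed; regression (GLM) emissions would additionally take their inputs through `data...`.

```julia
# Draw 500 time steps: `states` is a Vector{Int} with labels in 1:K and
# `observations` follows the emission models' output format.
states, observations = sample(hmm; n=500)
```
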
StateSpaceDynamics.viterbiFunction
viterbi(model::HiddenMarkovModel, Y::Vector{<:Matrix{<:Real}}, X::Union{Vector{<:Matrix{<:Real}},Nothing}=nothing;)

Get the most likely class labels using the Viterbi algorithm for multiple trials of data

Arguments

  • model::HiddenMarkovModel: The Hidden Markov Model to fit.
  • Y::Vector{<:Matrix{<:Real}}: The trials of emission data
  • X::Union{Vector{<:Matrix{<:Real}},Nothing}=nothing: Optional trials of input data for fitting Switching Regression Models

Returns

  • best_path::Vector{<:Vector{Float64}}: Each trial's best state path
StateSpaceDynamics.viterbiFunction
viterbi(model::HiddenMarkovModel, Y::Matrix{<:Real}, X::Union{Matrix{<:Real},Nothing}=nothing;)

Get the most likely class labels using the Viterbi algorithm

Arguments

  • model::HiddenMarkovModel: The Hidden Markov Model to fit.
  • Y::Matrix{<:Real}: The emission data
  • X::Union{Matrix{<:Real},Nothing}=nothing: Optional input data for fitting Switching Regression Models

Returns

  • best_path::Vector{Float64}: The most likely state label at each timepoint
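
A closing usage sketch combining the two inference functions documented above, assuming a fitted model `hmm` and emission data `Y` in the same format used for fitting.

```julia
# Posterior probability of each state at every time point (forward-backward).
γ = class_probabilities(hmm, Y)

# Single most likely state sequence (Viterbi).
best_path = viterbi(hmm, Y)

# For switching regression (GLM-HMM) models, pass the inputs as well:
# γ = class_probabilities(glmhmm, Y, X)
# best_path = viterbi(glmhmm, Y, X)
```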