Understanding Non-Identifiability in Linear Dynamical Systems
This tutorial walks through the fundamental non-identifiability issues in Linear Dynamical Systems (LDS), shows them numerically, and adds Procrustes alignment so we can compare models "apples to apples". It follows a simple pattern:
- build a reference LDS and data; 2) generate equivalent models via similarity transforms; 3) show identical likelihood/predictions; 4) align states/parameters with Procrustes; 5) summarize diagnostics and discuss what is identifiable.
Load Required Packages
using StateSpaceDynamics
using LinearAlgebra
using Random
using Plots
using Statistics
using StableRNGs
using Printf
rng = StableRNG(12345);Create a Reference ("True") LDS
K_true = 3 # latent dimensionality
D = 8 # observation dimensionality
T = 200; # time steps for training/demoStable but nontrivial dynamics
A_true = [0.9 0.1 0.0;
-0.1 0.8 0.2;
0.0 0.0 0.7];
b_true = zeros(K_true)
Q_true = 0.05 * Matrix(I(K_true));Observation matrix with interpretable rows
C_true = [1.0 0.5 0.0; # Obs 1: mainly latent dim 1
0.8 0.3 0.1; # Obs 2: mix of dims 1 & 2
0.2 1.0 0.0; # Obs 3: mainly latent dim 2
0.0 0.7 0.4; # Obs 4: dims 2 & 3
0.1 0.2 0.9; # Obs 5: mainly latent dim 3
0.3 0.0 0.8; # Obs 6: dims 1 & 3
0.6 0.4 0.2; # Obs 7: mixture
0.4 0.6 0.5] # Obs 8: mixture
d_true = zeros(D)
R_true = 0.1 * Matrix(I(D))
x0_true = zeros(K_true)
P0_true = 0.2 * Matrix(I(K_true))
true_lds = LinearDynamicalSystem(
GaussianStateModel(A_true, Q_true, b_true, x0_true, P0_true),
GaussianObservationModel(C_true, R_true, d_true),
K_true, D, fill(true, 6)
);Generate data from the reference model
x_true, y_true = rand(rng, true_lds; tsteps=T, ntrials=1)
print("Generated data from reference LDS model
")
print("True latent dynamics eigenvalues: ", round.(eigvals(A_true), digits=3), "
")
print("Data variance explained by each latent dim: ", round.(var(x_true[:,:,1], dims=2)[:], digits=3), "
")Generated data from reference LDS model
True latent dynamics eigenvalues: ComplexF64[0.7 + 0.0im, 0.85 - 0.087im, 0.85 + 0.087im]
Data variance explained by each latent dim: [0.189, 0.217, 0.097]Non-Identifiability: Similarity (Rotation) Invariance
For any invertible matrix R, the transformation produces an equivalent model: A' = RAR⁻¹, C' = CR⁻¹, Q' = RQRᵀ, x₀' = Rx₀, P₀' = RP₀Rᵀ Such models yield identical likelihoods and predictions.
A helper to build transformed copies
function rotate_lds(lds, R)
A_rot = R * lds.state_model.A * inv(R)
Q_rot = R * lds.state_model.Q * R'
C_rot = lds.obs_model.C * inv(R)
x0_rot = R * lds.state_model.x0
P0_rot = R * lds.state_model.P0 * R'
return LinearDynamicalSystem(
GaussianStateModel(A_rot, Q_rot, b_true, x0_rot, P0_rot),
GaussianObservationModel(C_rot, lds.obs_model.R, d_true),
size(A_rot, 1), size(C_rot, 1), fill(true, 6)
)
endrotate_lds (generic function with 1 method)A small mix of orthogonal and non-orthogonal transforms
rotations = [
[cos(π/4) -sin(π/4) 0.0; sin(π/4) cos(π/4) 0.0; 0.0 0.0 1.0], # R1: rot in (1,2)
[1.0 0.0 0.0; 0.0 cos(π/2) -sin(π/2); 0.0 sin(π/2) cos(π/2)], # R2: rot in (2,3) 90°
Matrix(qr(randn(rng, K_true, K_true)).Q), # R3: random orthogonal
[0.0 0.0 1.0; 0.0 1.0 0.0; 1.0 0.0 0.0], # R4: axis swap (1↔3)
Diagonal([2.0, 0.5, -1.2]) |> Matrix, # R5: scaling + sign flip
[0.0 1.0 0.0; 1.0 0.0 0.0; 0.0 0.0 1.0] # R6: permutation (1↔2)
]
rot_names = [
"R1: rot(1,2, 45°)",
"R2: rot(2,3, 90°)",
"R3: random orthogonal",
"R4: axis swap (1↔3)",
"R5: scaling+sign",
"R6: permutation (1↔2)"
]6-element Vector{String}:
"R1: rot(1,2, 45°)"
"R2: rot(2,3, 90°)"
"R3: random orthogonal"
"R4: axis swap (1↔3)"
"R5: scaling+sign"
"R6: permutation (1↔2)"Helper diagnostics
isorthogonal(R; atol=1e-10) = isapprox(R' * R, I(size(R,1)), atol=atol)
function subspace_angles_deg(C1, C2)
Q1 = qr(C1).Q[:, 1:size(C1,2)]
Q2 = qr(C2).Q[:, 1:size(C2,2)]
σ = svdvals(Q1' * Q2)
σ = clamp.(σ, -1.0, 1.0) # numerical safety
θ = acos.(σ)
return θ .* (180/π)
endsubspace_angles_deg (generic function with 1 method)Build rotated models
rotated_models = [rotate_lds(true_lds, R) for R in rotations]6-element Vector{LinearDynamicalSystem{Float64, GaussianStateModel{Float64, Matrix{Float64}, Vector{Float64}}, GaussianObservationModel{Float64, Matrix{Float64}, Vector{Float64}}}}:
Linear Dynamical System:
------------------------
Gaussian State Model:
---------------------
State Parameters:
A = [0.85 0.15 -0.141; -0.05 0.85 0.141; 0.0 0.0 0.7]
Q = [0.05 0.0 0.0; 0.0 0.05 0.0; 0.0 0.0 0.05]
Initial State:
b = [0.0, 0.0, 0.0]
x0 = [0.0, 0.0, 0.0]
P0 = [0.2 0.0 0.0; 0.0 0.2 0.0; 0.0 0.0 0.2]
Gaussian Observation Model:
---------------------------
size(C) = (8, 3)
size(R) = (8, 8)
size(d) = (8,)
Parameters to update:
---------------------
x0, P0, A (and b), Q, C, R
Linear Dynamical System:
------------------------
Gaussian State Model:
---------------------
State Parameters:
A = [0.9 6.12e-18 0.1; -6.12e-18 0.7 6.12e-18; -0.1 -0.2 0.8]
Q = [0.05 0.0 0.0; 0.0 0.05 0.0; 0.0 0.0 0.05]
Initial State:
b = [0.0, 0.0, 0.0]
x0 = [0.0, 0.0, 0.0]
P0 = [0.2 0.0 0.0; 0.0 0.2 0.0; 0.0 0.0 0.2]
Gaussian Observation Model:
---------------------------
size(C) = (8, 3)
size(R) = (8, 8)
size(d) = (8,)
Parameters to update:
---------------------
x0, P0, A (and b), Q, C, R
Linear Dynamical System:
------------------------
Gaussian State Model:
---------------------
State Parameters:
A = [0.863 0.0598 -0.108; -0.145 0.65 0.0443; 0.0858 0.0186 0.887]
Q = [0.05 -1.04e-17 9.54e-18; -1.04e-17 0.05 -3.47e-18; 1.13e-17 -6.07e-18 0.05]
Initial State:
b = [0.0, 0.0, 0.0]
x0 = [0.0, 0.0, 0.0]
P0 = [0.2 -4.16e-17 3.82e-17; -4.16e-17 0.2 -1.39e-17; 4.51e-17 -2.43e-17 0.2]
Gaussian Observation Model:
---------------------------
size(C) = (8, 3)
size(R) = (8, 8)
size(d) = (8,)
Parameters to update:
---------------------
x0, P0, A (and b), Q, C, R
Linear Dynamical System:
------------------------
Gaussian State Model:
---------------------
State Parameters:
A = [0.7 0.0 0.0; 0.2 0.8 -0.1; 0.0 0.1 0.9]
Q = [0.05 0.0 0.0; 0.0 0.05 0.0; 0.0 0.0 0.05]
Initial State:
b = [0.0, 0.0, 0.0]
x0 = [0.0, 0.0, 0.0]
P0 = [0.2 0.0 0.0; 0.0 0.2 0.0; 0.0 0.0 0.2]
Gaussian Observation Model:
---------------------------
size(C) = (8, 3)
size(R) = (8, 8)
size(d) = (8,)
Parameters to update:
---------------------
x0, P0, A (and b), Q, C, R
Linear Dynamical System:
------------------------
Gaussian State Model:
---------------------
State Parameters:
A = [0.9 0.4 0.0; -0.025 0.8 -0.0833; 0.0 0.0 0.7]
Q = [0.2 0.0 0.0; 0.0 0.0125 0.0; 0.0 0.0 0.072]
Initial State:
b = [0.0, 0.0, 0.0]
x0 = [0.0, 0.0, 0.0]
P0 = [0.8 0.0 0.0; 0.0 0.05 0.0; 0.0 0.0 0.288]
Gaussian Observation Model:
---------------------------
size(C) = (8, 3)
size(R) = (8, 8)
size(d) = (8,)
Parameters to update:
---------------------
x0, P0, A (and b), Q, C, R
Linear Dynamical System:
------------------------
Gaussian State Model:
---------------------
State Parameters:
A = [0.8 -0.1 0.2; 0.1 0.9 0.0; 0.0 0.0 0.7]
Q = [0.05 0.0 0.0; 0.0 0.05 0.0; 0.0 0.0 0.05]
Initial State:
b = [0.0, 0.0, 0.0]
x0 = [0.0, 0.0, 0.0]
P0 = [0.2 0.0 0.0; 0.0 0.2 0.0; 0.0 0.0 0.2]
Gaussian Observation Model:
---------------------------
size(C) = (8, 3)
size(R) = (8, 8)
size(d) = (8,)
Parameters to update:
---------------------
x0, P0, A (and b), Q, C, R
Compute likelihoods: should match (up to numerical tolerance)
y_data = reshape(y_true, D, T, 1)
x_smooth_orig, _ = smooth(true_lds, y_data)
ll_orig = loglikelihood(x_smooth_orig[:,:,1], true_lds, y_true[:,:,1])
print("
" * "="^60 * "
")
print("ROTATION / SIMILARITY INVARIANCE DEMONSTRATION
")
print("="^60 * "
")
@printf("Original model likelihood: %.6f
", sum(ll_orig))
for (name, R, model) in zip(rot_names, rotations, rotated_models)
x_s_rot, _ = smooth(model, y_data)
ll_rot = loglikelihood(x_s_rot[:,:,1], model, y_true[:,:,1])
@printf("%-24s LL: %.6f ΔLL: %.3e cond(R): %-8.2f orth? %s
",
name, sum(ll_rot), sum(abs.(ll_rot - ll_orig)), cond(R), isorthogonal(R) ? "yes" : "no")
end
============================================================
ROTATION / SIMILARITY INVARIANCE DEMONSTRATION
============================================================
Original model likelihood: -826.154686
R1: rot(1,2, 45°) LL: -826.154686 ΔLL: 1.525e-13 cond(R): 1.00 orth? yes
R2: rot(2,3, 90°) LL: -826.154686 ΔLL: 1.681e-13 cond(R): 1.00 orth? yes
R3: random orthogonal LL: -826.154686 ΔLL: 1.703e-13 cond(R): 1.00 orth? yes
R4: axis swap (1↔3) LL: -826.154686 ΔLL: 1.041e-13 cond(R): 1.00 orth? yes
R5: scaling+sign LL: -826.154686 ΔLL: 1.539e-13 cond(R): 4.00 orth? no
R6: permutation (1↔2) LL: -826.154686 ΔLL: 1.221e-13 cond(R): 1.00 orth? yesVisualize Parameter Differences (Before Alignment)
p1 = plot(layout=(2,2), size=(1000, 800))
heatmap!(A_true, title="Original A", color=:RdBu, subplot=1, aspect_ratio=:equal)
heatmap!(rotated_models[3].state_model.A, title="Rotated A (R3)", color=:RdBu, subplot=2, aspect_ratio=:equal)
heatmap!(C_true, title="Original C", color=:RdBu, subplot=3, aspect_ratio=:equal)
heatmap!(rotated_models[3].obs_model.C, title="Rotated C (R3)", color=:RdBu, subplot=4, aspect_ratio=:equal)Procrustes Alignment: Apples-to-Apples Comparisons
The latent coordinates are arbitrary: we can rotate them without changing the model's likelihood. To compare two fits (or a rotated copy) fairly, align the states with an orthogonal Procrustes transform R̂ that minimizes ‖R̂ X - Y‖_F.
function procrustes_R(X::AbstractMatrix, Y::AbstractMatrix; proper::Bool=false)
S = svd(Y * X')
R̂ = S.U * S.Vt # (not S.U * S.V!)
if proper && det(R̂) < 0 # optionally enforce det=+1
U2 = copy(S.U); U2[:,end] .= -U2[:,end]
R̂ = U2 * S.Vt
end
return R̂
endprocrustes_R (generic function with 1 method)Choose R3 for demonstration (random orthogonal)
R_idx = 3
m_rot = rotated_models[R_idx]
x_rot, _ = smooth(m_rot, y_data)
Rhat = procrustes_R(x_rot[:,:,1], x_smooth_orig[:,:,1]) # map rotated -> original
state_align_relerr = norm(Rhat * x_rot[:,:,1] - x_smooth_orig[:,:,1]) / norm(x_smooth_orig[:,:,1])
@printf("
Procrustes state alignment (R%d): rel. error = %.3e
", R_idx, state_align_relerr)
Procrustes state alignment (R3): rel. error = 7.316e-16Align parameters via R̂ for direct visual comparison: Ã = R̂ * Arot * R̂' and C̃ = Crot * R̂'
A_rot_aligned = Rhat * m_rot.state_model.A * Rhat'
C_rot_aligned = m_rot.obs_model.C * Rhat'
ΔA = norm(A_true - A_rot_aligned)
ΔC = norm(C_true - C_rot_aligned)
@printf("Aligned parameter diffs (R%d): ||ΔA||=%.3e ||ΔC||=%.3e
", R_idx, ΔA, ΔC)Aligned parameter diffs (R3): ||ΔA||=9.429e-16 ||ΔC||=1.580e-15Visualize with matched color scales for fairness
Amin = minimum([minimum(A_true), minimum(A_rot_aligned)])
Amax = maximum([maximum(A_true), maximum(A_rot_aligned)])
Cmin = minimum([minimum(C_true), minimum(C_rot_aligned)])
Cmax = maximum([maximum(C_true), maximum(C_rot_aligned)])
p_align = plot(layout=(2,2), size=(1000, 800))
heatmap!(A_true, title="Original A", color=:RdBu, subplot=1, aspect_ratio=:equal, clims=(Amin, Amax))
heatmap!(A_rot_aligned, title="Aligned A (R3)", color=:RdBu, subplot=2, aspect_ratio=:equal, clims=(Amin, Amax))
heatmap!(C_true, title="Original C", color=:RdBu, subplot=3, aspect_ratio=:equal, clims=(Cmin, Cmax))
heatmap!(C_rot_aligned, title="Aligned C (R3)", color=:RdBu, subplot=4, aspect_ratio=:equal, clims=(Cmin, Cmax))Residual over time (after Procrustes): should be ~0 except numerical noise
restit = [norm(Rhat * x_rot[:, t, 1] - x_smooth_orig[:, t, 1]) for t in 1:T]
plot(1:T, restit, lw=2, xlabel="time", ylabel="‖R̂ x_rot − x_orig‖₂",
title="Procrustes residual over time (R3)")Invariants: What is identifiable?
Similarity transforms preserve certain summaries:
- eigenvalues of A (up to ordering), hence modal timescales τ ≈ -1/log|λ|
- column space of C (compare via principal angles)
function invariants_summary(lds)
λ = eigvals(lds.state_model.A)
τ = [-1 / log(abs(l)) for l in λ] # (real-mode heuristic)
return λ, τ
end
λ_true, τ_true = invariants_summary(true_lds)
for (i, (name, model)) in enumerate(zip(rot_names, rotated_models))
λ_i, τ_i = invariants_summary(model)
θ = subspace_angles_deg(C_true, model.obs_model.C)
@printf("Invariant check %-24s max|Δλ|=%.2e max|Δτ|=%.2e max angle(C)=%.3f°
",
name,
maximum(abs.(sort(λ_true; by=abs) - sort(λ_i; by=abs))),
maximum(abs.(sort(τ_true; by=abs) - sort(τ_i; by=abs))),
maximum(θ))
endInvariant check R1: rot(1,2, 45°) max|Δλ|=2.78e-17 max|Δτ|=0.00e+00 max angle(C)=0.000°
Invariant check R2: rot(2,3, 90°) max|Δλ|=2.78e-17 max|Δτ|=0.00e+00 max angle(C)=0.000°
Invariant check R3: random orthogonal max|Δλ|=7.77e-16 max|Δτ|=8.88e-15 max angle(C)=0.000°
Invariant check R4: axis swap (1↔3) max|Δλ|=0.00e+00 max|Δτ|=0.00e+00 max angle(C)=0.000°
Invariant check R5: scaling+sign max|Δλ|=0.00e+00 max|Δτ|=0.00e+00 max angle(C)=0.000°
Invariant check R6: permutation (1↔2) max|Δλ|=0.00e+00 max|Δτ|=0.00e+00 max angle(C)=0.000°Observational Equivalence: Predictions Match
New data from a rotated model and prediction with both models should have identical error.
x_rot_new, y_rot_new = rand(rng, rotated_models[1]; tsteps=100, ntrials=1)
test_data = reshape(y_rot_new, D, 100, 1)
x_pred_orig, _ = smooth(true_lds, test_data)
y_pred_orig = true_lds.obs_model.C * x_pred_orig[:, :, 1]
x_pred_rot, _ = smooth(rotated_models[1], test_data)
y_pred_rot = rotated_models[1].obs_model.C * x_pred_rot[:, :, 1]
mse_orig = mean((y_rot_new[:, :, 1] - y_pred_orig).^2)
mse_rot = mean((y_rot_new[:, :, 1] - y_pred_rot).^2)
@printf("
Prediction MSE (same test seq): original=%.6f rotated=%.6f
", mse_orig, mse_rot)
Prediction MSE (same test seq): original=0.084133 rotated=0.084133Summary Diagnostics Across Transforms
A compact table: ΔLL, cond(R), orthogonality, max subspace angle for C, and Procrustes alignment error versus the original smoothed states.
struct RotDiag
name::String
dLL::Float64
condR::Float64
orth::Bool
max_angle_deg::Float64
proc_relerr::Float64
end
diagnostics = RotDiag[]
for (name, R, model) in zip(rot_names, rotations, rotated_models)
x_s, _ = smooth(model, y_data)
ll = loglikelihood(x_s[:,:,1], model, y_true[:,:,1])
dLL = sum(abs.(ll - ll_orig))
Rhat_i = procrustes_R(x_s[:,:,1], x_smooth_orig[:,:,1])
relerr = norm(Rhat_i * x_s[:,:,1] - x_smooth_orig[:,:,1]) / max(norm(x_smooth_orig[:,:,1]), eps())
θ = subspace_angles_deg(C_true, model.obs_model.C)
push!(diagnostics, RotDiag(name, dLL, cond(R), isorthogonal(R), maximum(θ), relerr))
end
print("
" * "-"^90 * "
")
@printf("%-24s | %-9s | %-8s | %-6s | %-14s | %-14s
",
"Transform", "ΔLL", "cond(R)", "orth?", "max angle(C)°", "Procrustes err")
print("-"^90 * "
")
for d in diagnostics
@printf("%-24s | %-.3e | %-8.2f | %-6s | %-14.3f | %-14.3e
",
d.name, d.dLL, d.condR, d.orth ? "yes" : "no", d.max_angle_deg, d.proc_relerr)
end
print("-"^90 * "
")
------------------------------------------------------------------------------------------
Transform | ΔLL | cond(R) | orth? | max angle(C)° | Procrustes err
------------------------------------------------------------------------------------------
R1: rot(1,2, 45°) | 1.525e-13 | 1.00 | yes | 0.000 | 5.690e-16
R2: rot(2,3, 90°) | 1.681e-13 | 1.00 | yes | 0.000 | 2.838e-15
R3: random orthogonal | 1.703e-13 | 1.00 | yes | 0.000 | 7.316e-16
R4: axis swap (1↔3) | 1.041e-13 | 1.00 | yes | 0.000 | 3.091e-16
R5: scaling+sign | 1.539e-13 | 4.00 | no | 0.000 | 7.117e-01
R6: permutation (1↔2) | 1.221e-13 | 1.00 | yes | 0.000 | 5.198e-16
------------------------------------------------------------------------------------------Practical Takeaways
- Don’t interpret individual latent coordinates; they are defined only up to an invertible change of basis.
- When comparing models/fits, either align with Procrustes or report invariants (eigenvalues/timescales, subspace angles, predictive metrics).
- Watch conditioning: extreme transforms (large cond(R)) can inflate numerical errors even when theory says models are identical.
- (Optional next step) Implement a small canonicalization: whiten Q≈I, real-Schur form for A, mode ordering & sign conventions. This “fixes the gauge” so different runs are directly comparable without per-pair alignment.
This page was generated using Literate.jl.