TUTORIAL SERIES · REGIME INTELLIGENCE

Why Smart Money Abandoned HMM.

And What They Use Instead.

A technical comparison of Hidden Markov Models versus deterministic regime detection — and why the difference cost traders billions in 2008, 2020, and 2026.

Sherif Saad · Regime Intelligence · June 2026 · 20+ years in financial markets · CFA Level II

Regime Intelligence · regimeintelligence.com · Not financial advice

EXECUTIVE SUMMARY

HMM is the academic standard for regime detection. It has three fundamental flaws that make it dangerous in live markets. Regime Intelligence addresses all three by design.

Flaw 1 — Gaussian assumption

HMM assumes market returns follow a normal distribution within each state. They do not. Fat tails, the extreme moves that matter most, are dramatically underestimated. A Gaussian HMM calibrated on ordinary, non-crisis returns assigned near-zero probability to March 2020. It happened anyway.
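To put a number on "near-zero probability", the sketch below compares two probabilities for a k-sigma move. The first is under a standard normal. The second is under a Student-t with 3 degrees of freedom, rescaled to unit variance. The t(3) is an assumption: it was chosen only because it is a common fat-tailed stand-in with a closed-form CDF. It is not fitted to SPY or to any engine data.

```python
import math


def gaussian_tail(k: float) -> float:
    """P(Z > k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def student_t3_tail(k: float) -> float:
    """P(X > k) where X = T / sqrt(3) and T ~ Student-t with 3 degrees of freedom.

    T has variance 3, so X has unit variance and k is directly comparable to a
    k-sigma Gaussian move. The closed-form t(3) CDF gives
    P(X > k) = 1/2 - (k / (1 + k^2) + atan(k)) / pi.
    """
    return 0.5 - (k / (1.0 + k * k) + math.atan(k)) / math.pi


for k in (3, 5, 10):
    g, t = gaussian_tail(k), student_t3_tail(k)
    print(f"{k:>2}-sigma  gaussian={g:.2e}  fat-tailed t(3)={t:.2e}  ratio={t / g:,.0f}x")
```

The gap widens rapidly with k. At 5 sigma, this fat-tailed stand-in puts several thousand times more probability in the tail than the Gaussian does.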

Flaw 2 — State churn

A 2- or 3-state HMM fitted to returns flips states rapidly as volatility clusters. Traders cannot act on a model that calls bear, then bull, then bear again across three consecutive weeks of the same drawdown. Regime Intelligence uses hysteresis, persistence, and a Barra-style confirmation gate to prevent false transitions.

Flaw 3 — Single signal

HMM reads price returns only. Regime Intelligence reads five independent pillars simultaneously: volatility level, trend conviction, drawdown depth, drawdown speed, and macro context. Five signals detect what one cannot.

WHAT HMM ACTUALLY IS

A genuine breakthrough for its time. Built on assumptions that financial markets systematically violate.

Hidden Markov Models were introduced to quantitative finance in the early 1990s and became the dominant framework for regime detection at institutional desks by the 2000s. The appeal was real: HMM provided a mathematically rigorous way to infer hidden market states from observable price data, with formal probability theory behind every output.

The basic mechanics: you define a fixed number of hidden states (typically 2 — bull and bear, or 3 with a neutral state). The model is trained on historical return data and learns the statistical properties of each state — mean return, volatility, and transition probabilities between states. At any bar, given the sequence of observed returns, the model outputs the probability of being in each state. The state with the highest probability is the regime call.

The Gaussian variant — GHMM — additionally assumes that returns within each state follow a normal (Gaussian) distribution. It is also the version most dangerously wrong about how financial markets actually behave.
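The mechanics above follow a fixed cycle. Predict with the transition matrix, update with each state's Gaussian likelihood, normalize, then take the most probable state. The sketch below is a minimal version of the standard normalized forward filter for a two-state Gaussian HMM, using only the Python standard library. The parameters are hypothetical and were not estimated from any market.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian density: the emission model a GHMM assumes within each state."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def forward_filter(returns, means, sigmas, trans, init):
    """Normalized forward algorithm: P(state_t | r_1..r_t) for every bar t."""
    n = len(means)
    alpha = list(init)  # prior over states before the first observation
    filtered = []
    for t, r in enumerate(returns):
        if t > 0:
            # Predict: push the previous bar's filtered probabilities through the transition matrix.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Update: weight each state by the Gaussian likelihood of this bar's return, then normalize.
        alpha = [alpha[j] * normal_pdf(r, means[j], sigmas[j]) for j in range(n)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
        filtered.append(alpha)
    return filtered


# Hypothetical daily parameters, for illustration only: state 0 = "bull", state 1 = "bear".
MEANS = [0.0005, -0.001]
SIGMAS = [0.008, 0.020]
TRANS = [[0.98, 0.02],
         [0.05, 0.95]]
INIT = [0.5, 0.5]

probs = forward_filter([0.001, 0.002, -0.05], MEANS, SIGMAS, TRANS, INIT)
calls = ["bull" if p[0] >= p[1] else "bear" for p in probs]  # regime call = most probable state
print(calls)  # ['bull', 'bull', 'bear']
```

In practice, the means, volatilities, and transition matrix are estimated from historical returns, usually with Baum-Welch (EM). The sketch skips fitting to keep the focus on how the regime call is produced. The output at each bar is a probability pair, and the regime call is simply its argmax.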

The core problem

HMM was a breakthrough. The problem is not the mathematics. The problem is that every assumption it is built on is violated by real markets at precisely the moments that matter most.

FIVE ASSUMPTIONS HMM MAKES — AND WHY MARKETS BREAK EVERY ONE

Each assumption is reasonable in theory. Each one fails in practice.

Assumption: Returns are Gaussian within each state.
Financial returns have fat tails. A GHMM calibrated on historical returns assigns near-zero probability to 3–10 sigma events and therefore cannot detect them as the regime transitions they actually are.
Real example: March 2020 — the week of February 24 saw a move the Gaussian model called essentially impossible. The Regime Intelligence engine called CRISIS on February 24 at CSS 77.2% via standalone triggers — before the tail flag even confirmed.
Assumption: The number of states is known and fixed at 2 or 3.
A 2-state model cannot distinguish FRAGILE from STRESS — two states that require completely different responses. In FRAGILE, trend conviction is deteriorating but markets function normally. In STRESS, correlations are rising and diversification is breaking down.
Real example: Empirical data from 5,079 classified daily SPY bars confirms five statistically distinct behavioral clusters. CRISIS has a mean annualized return of −29.2% and annualized volatility of 32.5%. STRESS has a mean annualized return of +12.9% and volatility of 15.8%. Same “bear” label. Completely different reality.
Assumption: State transition probabilities are fixed and constant.
HMM assumes the probability of moving from state A to state B is the same today as it was in 1998. Fixed transition matrices cannot capture regime-dependent transition risk.
Real example: GFC 2008 — the engine held CRISIS from June 23 through August 2008 even as CSS percentile dropped to 3.4% because standalone triggers remained active. A fixed transition matrix would have de-escalated. It should not have.
Assumption: The model observes only price returns.
HMM fits on one signal. Regime Intelligence reads five: volatility (35%), trend conviction (20%), drawdown depth (25%), drawdown speed (10%), and macro context including credit, rates, and tail events (15%).
Real example: COVID 2020 — the standalone trigger that called CRISIS on February 24 fired on VIX and drawdown speed — neither captured by return-only HMM.
Assumption: States are globally consistent across timeframes.
A standard HMM applies one state to the entire market at one timeframe. It cannot detect that the daily is in STRESS while the weekly is still in EXPANSION — a conflict that is itself a high-value signal.
Real example: BTC February 2026 — on February 1 the daily showed EXPANSION with CSS at 81%. The weekly showed STRESS at CSS 99.7%. A single-timeframe HMM reading the daily would have called recovery. The full five-timeframe picture told the opposite story.
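Assumption 4 lists the pillar weights for the multi-signal read. The sketch below shows one way such weights could be combined. It rests on two hypothetical assumptions: that each pillar is already expressed as a 0 to 100 stress score, and that the combination is a plain weighted average. It is not the engine's CSS construction, which the text does not describe. As quoted, the weights add up to 105%, so the sketch divides by their sum rather than assuming they total 100%.

```python
# Pillar weights as quoted in the text. They sum to 1.05, so the sketch normalizes by their total.
PILLAR_WEIGHTS = {
    "volatility": 0.35,
    "trend_conviction": 0.20,
    "drawdown_depth": 0.25,
    "drawdown_speed": 0.10,
    "macro_context": 0.15,
}


def composite_score(pillars: dict, weights: dict = PILLAR_WEIGHTS) -> float:
    """Hypothetical weighted average of 0-100 pillar stress scores."""
    missing = weights.keys() - pillars.keys()
    if missing:
        raise ValueError(f"missing pillar scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[name] * pillars[name] for name in weights) / total_weight


# Hypothetical reading: volatility only moderately elevated, drawdown speed and macro stressed.
example = {"volatility": 40.0, "trend_conviction": 70.0, "drawdown_depth": 60.0,
           "drawdown_speed": 85.0, "macro_context": 90.0}
print(round(composite_score(example), 1))  # 61.9
```

Here the volatility pillar alone reads 40, while the blended score is about 62. This is the sense in which several signals can register stress that a single volatility read understates.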

THE HISTORICAL EVIDENCE — THREE EVENTS, REAL ENGINE DATA

Not a simulation. Not a backtest. Actual engine classifications from the regime store parquet files.

Chart 1 summarizes daily SPY classifications from data/regime_store/1d/. Each event chart after it shows the engine's actual weekly regime classification against what a naive HMM or percentile bucket would have called, with the CSS percentile time series underneath. Event-chart data comes directly from the data/regime_store/1w/ parquet files.

Chart 1 of 3 — Five Regime States: Empirically Distinct Behavioral Clusters (SPY 1d, 1993–2026)
Source: 5,079 classified daily bars · SPY 1d · Regime Intelligence engine · June 2026

Why CRISIS mean CSS (76.4%) sits barely above STRESS (74.5%), and why that is correct

CRISIS days include crisis_standalone and tail_flag_crash overrides that fire at CSS below 90%, sometimes well below. The engine correctly classifies a bar as CRISIS when VIX spikes and drawdown accelerates simultaneously, even if the CSS percentile has not yet reached the crisis threshold. Those early calls pull the CRISIS average down toward STRESS. A pure percentile bucket would miss them.
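The override logic can be pictured as a percentile threshold with independent escape hatches. In the sketch below, only the 90% CSS threshold and the flag names crisis_standalone and tail_flag_crash come from the text. The rule that both conditions must hold on the same bar follows the description above. The VIX level, the drawdown-speed limit, and the Bar fields are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Tuple

CRISIS_CSS_THRESHOLD = 90.0    # CSS percentile crisis threshold quoted in the text
VIX_SPIKE_LEVEL = 30.0         # hypothetical
DRAWDOWN_SPEED_LIMIT = -0.08   # hypothetical: a fall of 8% or more over the lookback window


@dataclass
class Bar:
    css_pct: float               # CSS percentile, 0-100
    vix: float                   # VIX close
    drawdown_speed: float        # hypothetical: fractional price change over a short lookback
    tail_flag_crash: bool = False


def crisis_standalone(bar: Bar) -> bool:
    """Hypothetical standalone trigger: VIX spike and fast drawdown on the same bar."""
    return bar.vix >= VIX_SPIKE_LEVEL and bar.drawdown_speed <= DRAWDOWN_SPEED_LIMIT


def is_crisis(bar: Bar) -> Tuple[bool, str]:
    """Return (is_crisis, reason). Overrides can fire below the percentile threshold."""
    if bar.css_pct >= CRISIS_CSS_THRESHOLD:
        return True, "css_threshold"
    if crisis_standalone(bar):
        return True, "crisis_standalone"
    if bar.tail_flag_crash:
        return True, "tail_flag_crash"
    return False, "none"


# A bar at CSS 70: a pure percentile bucket stays silent, the standalone trigger does not.
print(is_crisis(Bar(css_pct=70.0, vix=40.0, drawdown_speed=-0.10)))  # (True, 'crisis_standalone')
```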

Chart 2 of 3 — GFC 2008: SPY Weekly Regime Classification
Engine called CRISIS Sep 15, 2008 at CSS 13.8% via standalone triggers · data/regime_store/1w/SPY.parquet

GFC 2008 — three signals a return-only HMM missed

First: CRISIS on June 23, 2008 via crisis_standalone at CSS 50.8%, twelve weeks before the Lehman week. Second: CRISIS held through June–August 2008 even as CSS dropped to 3.4%. Third: CRISIS re-entered on September 15 at CSS 13.8%, the Lehman week, two weeks before the CSS percentile hit 94.2% on September 29.

Chart 3a of 3 — COVID 2020: SPY Weekly Regime Classification
Engine held CALM through January; CRISIS on Feb 24, two weeks before the tail flag · data/regime_store/1w/SPY.parquet

COVID 2020 — the persistence filter that saved traders from a false signal

January 27 to February 17: CSS climbed with FRAGILE pending while the label stayed CALM. A 2-state HMM would have flipped to bear weeks early. Then February 24: CRISIS at CSS 77.2% via the standalone trigger, correct and early. Recovery: the engine de-escalated to FRAGILE in May, while a vol-clustering HMM would still have shown stress.
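The "pending FRAGILE while the label stayed CALM" behavior comes from combining hysteresis with a persistence count. The sketch below is minimal and makes the following assumptions: the entry thresholds, band width, and persistence length are hypothetical, and the text does not publish the engine's actual values. The pending label mirrors the pending_state output named in the comparison table.

```python
# Hypothetical parameters: the text does not publish the engine's thresholds,
# band width, or persistence length.
STATES = ["CALM", "EXPANSION", "FRAGILE", "STRESS", "CRISIS"]
ENTRY = [0.0, 30.0, 50.0, 70.0, 90.0]  # CSS percentile needed to enter each state
HYSTERESIS = 5.0   # CSS must fall this far below the current state's entry level to leave it
PERSISTENCE = 3    # consecutive bars a new candidate must hold before it is confirmed


def raw_candidate(css: float, current: int) -> int:
    """State implied by this bar's CSS, with a hysteresis band around the current state."""
    level = max(i for i, lo in enumerate(ENTRY) if css >= lo)
    if level < current and css >= ENTRY[current] - HYSTERESIS:
        return current  # inside the band: not enough evidence to de-escalate
    return level


def run_state_machine(css_series, start: int = 0):
    """Return a list of (confirmed_label, pending_label_or_None), one per bar."""
    current, pending, count = start, None, 0
    out = []
    for css in css_series:
        cand = raw_candidate(css, current)
        if cand == current:
            pending, count = None, 0          # candidate agrees with the label: clear pending
        elif cand == pending:
            count += 1                        # same candidate again: build persistence
        else:
            pending, count = cand, 1          # new candidate: start counting
        if pending is not None and count >= PERSISTENCE:
            current, pending, count = pending, None, 0  # confirmed transition
        out.append((STATES[current], STATES[pending] if pending is not None else None))
    return out


# A one-bar spike is absorbed: the label stays CALM and FRAGILE only shows as pending.
spike = run_state_machine([20, 55, 25, 20])
# A sustained move is confirmed on its third consecutive bar, then held through a dip inside the band.
sustained = run_state_machine([20, 55, 58, 52, 47, 47])
```

The same mechanism is the source of the lag discussed in the limitations section. A real transition has to survive the persistence count before the label changes.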

Chart 3b of 3 — BTC February 2026: BTCUSD Weekly Regime Classification
CSS 94–99% for 13+ weeks but STRESS held; Barra gate blocked premature CRISIS · data/regime_store/1w/BTCUSD.parquet

BTC Feb 2026 — the false positive that cost leveraged traders everything

From November 2025 through February 2026, CSS ran 94–99% — above the 90% crisis threshold. A naive percentile bucket called CRISIS for 13+ consecutive weeks. The engine held STRESS with pending CRISIS. On February 1, the daily label read EXPANSION while CSS was 81% and EWS was active. One month later: Full Storm at 98.3% and $1.24B in liquidations.
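The BTC episode turns on the difference between a percentile bucket and a gated escalation, and the sketch below contrasts the two. The 90% threshold and the 60% figure come from the text. The rest is an assumption made for illustration: treating the 60% as the share of independent confirmation checks that must agree, and the names of those checks. This is not the engine's Barra gate.

```python
from typing import Dict, Optional, Tuple

CRISIS_CSS = 90.0     # CSS percentile crisis threshold quoted in the text
GATE_FRACTION = 0.60  # the gate's 60%; measuring it over confirmation checks is an assumption


def naive_bucket(css_pct: float) -> str:
    """Pure percentile bucket: any bar at or above the threshold is called CRISIS."""
    return "CRISIS" if css_pct >= CRISIS_CSS else "NOT CRISIS"


def gate_confirms(confirmations: Dict[str, bool], fraction: float = GATE_FRACTION) -> bool:
    """Hypothetical gate: at least `fraction` of the confirmation checks must be True."""
    if not confirmations:
        return False
    return sum(confirmations.values()) / len(confirmations) >= fraction


def classify_high_stress_bar(css_pct: float,
                             confirmations: Dict[str, bool]) -> Tuple[str, Optional[str]]:
    """(label, pending_state) for a bar that is already in STRESS."""
    if css_pct < CRISIS_CSS:
        return "STRESS", None
    if gate_confirms(confirmations):
        return "CRISIS", None
    return "STRESS", "CRISIS"  # threshold cleared, confirmation missing: escalation stays pending


# Hypothetical confirmation checks, one per pillar; only two agree in the weak case.
weak = {"volatility": True, "trend": True, "drawdown_depth": False,
        "drawdown_speed": False, "macro": False}
strong = dict(weak, drawdown_depth=True)  # three of five agree: exactly 60%

print(naive_bucket(96.0))                      # CRISIS
print(classify_high_stress_bar(96.0, weak))    # ('STRESS', 'CRISIS')
print(classify_high_stress_bar(96.0, strong))  # ('CRISIS', None)
```

This design choice is the trade-off the text describes. A gate that waits for agreement avoids a weeks-long CRISIS call driven by the percentile alone, at the cost of some lag when a real crisis does arrive.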


WHY FIVE STATES — THE EMPIRICAL JUSTIFICATION

Five is not a design choice. It is what the data shows.

State (engine label)	Days (n)	Ann. mean return	Ann. vol	Mean CSS	Key character
Clear Skies (CALM)	1,634 (32.2%)	+17.2%	13.2%	16.5%	Trending, low vol, systems work
Tailwind (EXPANSION)	727 (14.3%)	+13.7%	12.2%	40.2%	Moderate stress, positive momentum
Thin Ice (FRAGILE)	738 (14.5%)	+12.8%	13.3%	56.9%	Deteriorating conviction, caution
Storm Warning (STRESS)	580 (11.4%)	+12.9%	15.8%	74.5%	Rising correlations, breakdown risk
Full Storm (CRISIS)	1,400 (27.6%)	−29.2%	32.5%	76.4%	Liquidity crisis, all rules fail

CALM, EXPANSION, FRAGILE, and STRESS all have positive mean daily returns. CRISIS alone has a negative mean daily return, and its annualized volatility of 32.5% is about 2.5 times CALM's 13.2%. That is not a continuous spectrum. That is a qualitatively different behavioral state.

Resolution where it matters

A 2-state model lumps FRAGILE, STRESS, and CRISIS into one “bear” state. A 3-state model still merges FRAGILE and STRESS — precisely where the most actionable warning occurs. Separating them is not adding noise. It is adding resolution where resolution matters most.

Source: 5,079 classified daily SPY bars (1993–2026) · data/regime_store/1d/SPY.parquet · close-to-close returns grouped by same-day regime_state.
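The source line above describes the method behind the five-state table: close-to-close returns grouped by same-day regime_state. The sketch below does that grouping with the standard library. It assumes simple returns, annualized arithmetically as mean × 252 and sample standard deviation × √252; the text does not state its annualization convention. The function name is hypothetical, and in practice the two input lists would be read from the regime store parquet file.

```python
import math
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List

TRADING_DAYS = 252  # assumed annualization factor


def per_state_stats(closes: List[float], states: List[str]) -> Dict[str, dict]:
    """Annualized mean return and volatility of close-to-close returns by same-day state.

    The return from closes[t-1] to closes[t] is attributed to states[t].
    """
    if len(closes) != len(states):
        raise ValueError("closes and states must have the same length")
    by_state = defaultdict(list)
    for t in range(1, len(closes)):
        by_state[states[t]].append(closes[t] / closes[t - 1] - 1.0)
    stats = {}
    for state, rets in by_state.items():
        stats[state] = {
            "days": len(rets),
            "ann_return": mean(rets) * TRADING_DAYS,
            # sample standard deviation; undefined for a single observation
            "ann_vol": stdev(rets) * math.sqrt(TRADING_DAYS) if len(rets) > 1 else float("nan"),
        }
    return stats


# Toy data for illustration; real inputs would come from the regime store.
closes = [100.0, 101.0, 100.0, 102.0, 101.0]
states = ["CALM", "CALM", "CRISIS", "CALM", "CRISIS"]
stats = per_state_stats(closes, states)
```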

THE DIRECT COMPARISON

HMM versus Regime Intelligence — dimension by dimension. Every claim is verifiable from the engine codebase and historical data above.

Dimension	HMM / GHMM	Regime Intelligence
Classification method	Probabilistic — outputs P(state) for each state	Deterministic — one discrete label per bar from sequential state machine
State count	Typically 2–3 (bull/bear/neutral)	5 empirically derived states with distinct statistical profiles
Input signals	Price returns only	5 pillars: volatility, trend, drawdown depth, drawdown speed, macro context
Fat tail handling	Gaussian assumption — extreme events underestimated	Explicit tail override via crisis_standalone + tail_flag_crash
Timeframes	Single timeframe	5 simultaneous timeframes with Timeframe Agreement Score (TAS)
False signal protection	None beyond model fit	Hysteresis bands + persistence bars + Barra 60% confirmation gate
Early warning	None — lags price action	EWS fires on EVS velocity before regime label changes
Output	Probability distribution over states	Discrete state + CSS percentile + TAS + EVS + EWS + pending_state
Actionability	Cannot act on "67% probability of bear"	Single definitive state with quantified stress level
Computational model	Statistical fitting on historical returns	Numba state machine — O(1) per bar, live, no refit required
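The text does not define how TAS is computed. The sketch below offers one hypothetical reading with two measures. Agreement is scored as the share of timeframes that match the most common label. Separately, the sketch measures the severity gap between a fast and a slow timeframe, using the five states in order of their mean CSS. The timeframe keys are made up, and so is every label except the daily EXPANSION and weekly STRESS from the BTC February 2026 example.

```python
from collections import Counter
from typing import Dict, Tuple

# Assumed severity order, following the rising mean CSS in the five-state table.
SEVERITY = {"CALM": 0, "EXPANSION": 1, "FRAGILE": 2, "STRESS": 3, "CRISIS": 4}


def timeframe_agreement(labels: Dict[str, str]) -> Tuple[str, float]:
    """Hypothetical TAS: (modal label, share of timeframes that carry it)."""
    counts = Counter(labels.values())
    modal, n = counts.most_common(1)[0]
    return modal, n / len(labels)


def severity_gap(labels: Dict[str, str], fast: str = "1d", slow: str = "1w") -> int:
    """Positive when the slower timeframe is more stressed than the faster one."""
    return SEVERITY[labels[slow]] - SEVERITY[labels[fast]]


# Only the 1d and 1w labels come from the BTC February 2026 example; the rest are made up.
EXAMPLE = {"1h": "STRESS", "4h": "STRESS", "1d": "EXPANSION", "1w": "STRESS", "1M": "FRAGILE"}
print(timeframe_agreement(EXAMPLE))  # ('STRESS', 0.6)
print(severity_gap(EXAMPLE))         # 2
```

A modal-share score on its own hides which timeframes disagree, so the sketch reports the daily-versus-weekly gap separately. A gap of 2 here is the EXPANSION-versus-STRESS conflict the text treats as a high-value signal.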

WHAT THIS MEANS IN PRACTICE

A probability distribution tells you to be cautious. A regime state tells you what to do.

If a model tells you there is a 67% probability of a bear regime, what do you do? Reduce position size by 67%? Hedge 67% of the portfolio? Wait until it reaches 80%? The probability number itself does not answer any of these questions.

A deterministic regime state with a quantified stress score answers a different, more useful question: what kind of environment are you in right now, and how stressed is it? Storm Warning at CSS 81% with EWS active tells you something specific and actionable — that the environment your system was designed for has changed.

The institutional edge now available to everyone

Institutional desks have always combined quantitative regime signals with human judgment. That edge, reading the environment before placing the trade, is what the five-state deterministic regime engine provides: a clear environmental read across thousands of assets and five timeframes, with live updates. Free to start at regimeintelligence.com.

WHAT REGIME INTELLIGENCE DOES NOT DO

Every framework has limits. These are ours.

Regime Intelligence does not predict price direction. The regime state tells you what kind of environment you are in — not where price is going.

The five-state classification is calibrated on historical data. If a genuinely novel stress mechanism emerges, the engine will detect stress but may classify it differently from how a human analyst would.

The persistence filter and Barra confirmation gate that protect against false positives also introduce lag. When a market transitions rapidly, the engine will be slightly late to confirm the new state. Given the cost of acting on a false signal with leverage, as the BTC February 2026 episode demonstrates, we consider this the right trade-off.

Nothing in this framework replaces your own analysis, risk management, position sizing, or stop losses.

See the live regime read on any asset

Thousands of assets · 5 timeframes · Live updates · Free to start
