State Machine vs. Motion Matching - Demo Performance Analysis Share

Hey, I've been optimising my demo with Unreal Insights and revisited a question I've had for a while about animation performance: “new” Motion Matching vs. “old” State Machines & Blendspaces. Sharing my findings below for anyone else who is interested in that question.

Context

I profiled a traditional state machine animation blueprint (our game, BGP) against the motion matching setup in UE 5.7’s GASP (Game Animation Sample Project), using single-frame timer analysis in Unreal Insights. Both traces had full stat named events enabled.

The Key Numbers

Metric                                    State Machine (BGP)    Motion Matching (GASP)
Worker thread eval per character          ~0.43 ms               0.50 ms
Locomotion selection cost per character   0.008 ms               ~0.25 ms
Game thread anim cost per character       ~0.07 ms               ~0.14 ms
Debug cost per character (strippable)     0.003 ms               0.016 ms

What This Means

  • Motion matching locomotion selection is ~31x more expensive than state machine locomotion selection (0.25 ms vs 0.008 ms per character)
  • BUT overall worker thread cost per character is surprisingly comparable (~0.43 vs 0.50 ms) because each system spends its budget differently:
    • BGP fills its worker budget with PoseDriver (0.21 ms/char) and ControlRig (0.08 ms/char) - procedural corrections that would remain regardless of locomotion approach
    • GASP fills its worker budget with the motion matching pipeline (0.25 ms/char) - pose search, trajectory, choosers, BlendStack
  • Both test scenarios were GPU-bound, meaning worker thread animation had headroom and did not bottleneck the frame
  • If BGP adopted motion matching while keeping PoseDriver/ControlRig, net per-character worker cost would increase by ~0.24 ms (from ~0.43 to ~0.67 ms)
  • At 19 ticking characters, that’s ~4.6 ms of additional aggregate worker thread time (spread across parallel workers, not blocking the game thread), reducible via URO (Update Rate Optimization)
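For anyone who wants to redo the projection with their own numbers, here is a minimal sketch of the arithmetic behind the bullets above. It assumes that only the locomotion selection step is swapped and every other per-character cost stays the same; the function and variable names are made up for illustration and are not part of any Unreal API.

```python
# Per-character figures quoted in the post, in milliseconds.
SM_WORKER_MS = 0.43       # BGP worker thread eval per character
SM_SELECTION_MS = 0.008   # BGP state machine locomotion selection
MM_SELECTION_MS = 0.25    # GASP motion matching locomotion selection
TICKING_CHARACTERS = 19   # animation-ticking characters in the BGP scene


def project_motion_matching(sm_worker_ms, sm_selection_ms, mm_selection_ms, characters):
    """Estimate worker cost if only the locomotion selection step were swapped.

    Assumption: all other per-character costs (PoseDriver, ControlRig, ...)
    are unchanged by the swap.
    """
    delta_ms = mm_selection_ms - sm_selection_ms
    return {
        "selection_ratio": mm_selection_ms / sm_selection_ms,
        "delta_per_char_ms": delta_ms,
        "projected_per_char_ms": sm_worker_ms + delta_ms,
        "aggregate_delta_ms": delta_ms * characters,
    }


projection = project_motion_matching(
    SM_WORKER_MS, SM_SELECTION_MS, MM_SELECTION_MS, TICKING_CHARACTERS
)
for name, value in projection.items():
    print(f"{name}: {value:.3f}")
```

With the figures from the table this lands on roughly 31x, +0.24 ms per character, ~0.67 ms per character in total, and ~4.6 ms of aggregate worker time. The aggregate figure is summed CPU time across workers, so it is not the same thing as added frame time.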

The Biggest Single Cost in Either System

PoseDriver in BGP at 3.94 ms total (0.21 ms/char, 210 calls across 19 characters). This is not related to state machines or motion matching - it’s a procedural correction system that would exist in either architecture. It is the single most expensive animation timer in our entire trace.
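A quick way to see where that cost comes from is to break the total down per call and per character. This is only a rough sketch using the numbers quoted above; the variable names are illustrative, and the share calculation assumes all PoseDriver time is counted inside the ~0.43 ms per-character worker figure.

```python
# Figures quoted in the post for the BGP trace.
POSE_DRIVER_TOTAL_MS = 3.94   # total PoseDriver time in the captured frame
POSE_DRIVER_CALLS = 210       # number of PoseDriver calls in that frame
TICKING_CHARACTERS = 19       # animation-ticking characters
WORKER_PER_CHAR_MS = 0.43     # total worker thread eval per character

per_char_ms = POSE_DRIVER_TOTAL_MS / TICKING_CHARACTERS
per_call_ms = POSE_DRIVER_TOTAL_MS / POSE_DRIVER_CALLS
calls_per_char = POSE_DRIVER_CALLS / TICKING_CHARACTERS
# Assumption: PoseDriver time is fully contained in the worker eval figure.
share_of_worker = per_char_ms / WORKER_PER_CHAR_MS

print(f"per character:   {per_char_ms:.3f} ms")
print(f"per call:        {per_call_ms:.4f} ms")
print(f"calls per char:  {calls_per_char:.1f}")
print(f"share of worker: {share_of_worker:.0%}")
```

Read this way, each individual call is cheap (under 0.02 ms) and the total comes from running about 11 of them per character, which together make up close to half of the per-character worker budget.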


Test Methodology

Environment

  • Engine: Unreal Engine 5.7
  • Profiling tool: Unreal Insights, single-frame timer list export with stat named events enabled
  • Hardware: Same machine for both captures

BGP Tutorial Area (State Machine)

  • Rich game world with environment, NPCs, lighting, particles
  • 19 animation-ticking skeletal meshes (leader components running ABPs)
  • 426 total skeletal mesh component ticks (modular character system - each character has multiple body part meshes following a leader pose)
  • Animation blueprint uses traditional state machine with blendspaces, plus IK (LegIK, ControlRig), PoseDriver nodes, layered blending, montage slots
  • Game thread frame: ~9.3 ms
  • Parallel animation evaluation enabled (28 worker tasks)

GASP Motion Matching Sample

  • Simple scene, single controllable character
  • 1 animation-ticking skeletal mesh
  • Animation blueprint uses motion matching (PoseSearch), Chooser Tables, BlendStack, orientation warping, steering, foot placement
  • No PoseDriver, no ControlRig (for animation)
  • Game thread frame: ~5.0 ms (of which 3.2 ms is idle/GPU-wait)
  • Parallel animation evaluation enabled (1 worker task)

Normalization Approach

BGP costs are divided by character count (19 for most timers) to produce per-character estimates. GASP costs are taken directly (1 character). Worker thread aggregate times represent total CPU time across all workers, not wall-clock game thread impact.
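The normalization itself is simple enough to sketch in a few lines. The `TimerSample` type below is hypothetical (it is not the Unreal Insights export format), and the two samples are just the totals quoted in this post; the point is that each timer carries its own divisor, since 19 applies to most BGP timers but not necessarily all of them.

```python
from dataclasses import dataclass


@dataclass
class TimerSample:
    """One timer from a single-frame capture (hypothetical structure)."""
    name: str
    total_ms: float        # total time reported for the timer in the frame
    character_count: int   # how many characters contributed to that total

    def per_character_ms(self) -> float:
        if self.character_count <= 0:
            raise ValueError("character_count must be positive")
        return self.total_ms / self.character_count


samples = [
    TimerSample("BGP PoseDriver", 3.94, 19),                 # divided by 19
    TimerSample("GASP motion matching pipeline", 0.25, 1),   # taken directly
]

normalized = {s.name: round(s.per_character_ms(), 2) for s in samples}
print(normalized)
```

Keeping the divisor next to each timer makes it harder to accidentally divide a timer by 19 when it only ran for a subset of characters.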


Game Reference Link - The Freeblades on Steam