State machines work well for locomotion, so if you're using facial clusters you'd be better off using a Layered Blend Per Bone node to layer the face on top. If you're using morph targets you don't need any layering, as morph targets are additive and get applied at the mesh level.
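To show why morph targets don't need layering, here's a rough sketch in plain Python. This is a toy model and not Unreal's API: `apply_morph_targets`, the `smile`/`blink` names and the vertex data are all made up for illustration. The only assumption is the usual one, that each target stores per-vertex offsets which get scaled by a weight and summed onto the base mesh.

```python
# Toy model only: not Unreal Engine code. All names and data are hypothetical.

def apply_morph_targets(base, targets, weights):
    """Return the base vertices plus the weighted sum of each active target's offsets."""
    result = [list(v) for v in base]
    for name, weight in weights.items():
        for i, delta in enumerate(targets[name]):
            for axis in range(3):
                # Each target just adds its scaled offset; nothing gets overwritten.
                result[i][axis] += weight * delta[axis]
    return [tuple(v) for v in result]


# Two vertices of a made-up mesh.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

# Per-vertex offsets for two made-up shapes.
targets = {
    "smile": [(0.0, 0.5, 0.0), (0.0, 0.25, 0.0)],
    "blink": [(0.0, 0.0, 0.0), (0.0, -0.5, 0.0)],
}

both = apply_morph_targets(base, targets, {"smile": 1.0, "blink": 0.5})
swapped = apply_morph_targets(base, targets, {"blink": 0.5, "smile": 1.0})

# Same result whichever order the targets are applied in.
print(both == swapped)
```

Since every shape only adds its own offset, the order doesn't matter and one shape can't stomp another, which is the difference from a bone pose, where something has to decide which animation owns each bone.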
Montages are difficult to work with when they share the same slot, as triggering the shape will override whatever montage is currently playing in that slot.
Are you using morphing or clusters?