In theory it should work, but in practice it could become unmanageable as the number of animation clips grows.
Then there is this.
So with a montage you would not be able to index within the montage to perform a layered blend, for example, because the moment you call a different track, whatever state the montage is currently performing (and that needs to be blended) will be reset.
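To illustrate the reset behavior (a minimal sketch, not production code; `AttackMontage` and its "Recovery" section are hypothetical placeholders, though `Montage_Play` and `Montage_JumpToSection` are real `UAnimInstance` calls):

```cpp
#include "Animation/AnimInstance.h"
#include "Animation/AnimMontage.h"

void PlayAndRedirect(UAnimInstance* AnimInstance, UAnimMontage* AttackMontage)
{
    // Start the montage; it begins playing and blending its first section.
    AnimInstance->Montage_Play(AttackMontage, 1.0f);

    // Jumping to a different section snaps the playback position there
    // immediately -- any in-progress pose within the montage is cut
    // rather than layered, which is the reset described above.
    AnimInstance->Montage_JumpToSection(FName("Recovery"), AttackMontage);
}
```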
As for doing something like this, an idea I'm still working on is to get rid of the evil that is the state machine and drive everything through the AnimGraph using matched sets of blend spaces, making it 100% data driven.
Behind the scenes concept videos, so don't expect production values.
In theory, properly "matched" blend spaces can stay in sync with each other, work as intended over a network, and can be layered in the AnimGraph without having to build a state machine out of individual clips (which I'm assuming is your intent and reasoning).
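Here is a rough sketch of what the code side of that could look like, under my assumptions: a hypothetical `UMyAnimInstance` exposes shared `Speed` and `Direction` inputs, and in the AnimGraph (wired up in the editor, not shown here) two matched blend spaces, say a full-body locomotion space and an upper-body space combined with a Layered blend per bone node, both sample the same pair of values. The code only feeds the shared inputs; that shared sampling is what keeps the spaces matched.

```cpp
#include "Animation/AnimInstance.h"
#include "GameFramework/Pawn.h"
#include "MyAnimInstance.generated.h"

UCLASS()
class UMyAnimInstance : public UAnimInstance
{
    GENERATED_BODY()

public:
    // Shared inputs: every matched blend space in the AnimGraph reads
    // this same Speed/Direction pair, so they all pick equivalent poses.
    UPROPERTY(BlueprintReadOnly, Category = "Locomotion")
    float Speed = 0.0f;

    UPROPERTY(BlueprintReadOnly, Category = "Locomotion")
    float Direction = 0.0f;

    virtual void NativeUpdateAnimation(float DeltaSeconds) override
    {
        Super::NativeUpdateAnimation(DeltaSeconds);

        if (const APawn* Pawn = TryGetPawnOwner())
        {
            const FVector Velocity = Pawn->GetVelocity();
            Speed = Velocity.Size2D();

            // CalculateDirection maps velocity to a -180..180 degree angle
            // relative to the pawn's facing, a standard blend space axis.
            Direction = CalculateDirection(Velocity, Pawn->GetActorRotation());
        }
    }
};
```

Since the only state is a couple of floats derived from movement, there is nothing state-machine-like to desync over a network: replicate the movement (which the engine already does) and every client's blend spaces land on the same pose.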