The model is quite small, so we are hoping to offer it as a plugin that you can package with the game. It's for generating face animations on the fly, so it needs to run locally to be of any use to anyone.
Latency (time to first speech) can be under 1.5 seconds, but with ElevenLabs it's generally higher.
I will add a link here to sign up for alpha access as soon as one is available.