Request: Speech Recognition / Engine

Right, I have tried to watch the tutorials covering the whole integration of the Speech.Recognition namespace and how to implement it in C++, but… it’s all C++ to me (which I know not). I think someone could get very rich by implementing an Unreal speech recognition and synthesis add-on/template.

The basics of it would be to have the meat of it accessible from Blueprint (for us non-programmer nubbins). Ideally the project would be broken down into four different areas:

  • Command and Control Blueprint: This would let the developer wire spoken commands into Unreal, for example telling the Blueprint that when the user says, “main menu”, the main menu HUD should appear (a rough sketch of what this could look like follows this list).
  • Dictation Blueprint: This would allow the developer to implement speech-to-text functionality in a window / chat box (for example).
  • Text To Speech Blueprint: This one is self-explanatory; it would allow the developer to use TTS voices as character/game voices/audio. I’m not sure if this could be implemented, though another idea is to allow UE to use multiple TTS voices, instead of just the one set as default on the user’s system.
  • Grammar Blueprint: A set of variables that the speech engine uses. This Blueprint should “compile” the grammar file, either at build time or at run-time if the developer allows the user to customize it.
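I can’t write any of this myself, but to make the Command and Control idea concrete, here is roughly the kind of Blueprint-facing C++ component I am imagining. Every class, function, and delegate name here is invented for illustration; the actual recognition engine (SAPI, Microsoft.Speech, whatever) would sit behind it:

```cpp
// Hypothetical Blueprint-facing API for a command-and-control component.
// All names (USpeechCommandComponent, AddPhrase, OnPhraseRecognized) are
// placeholders; the implementation would forward to the native speech engine.
#pragma once

#include "CoreMinimal.h"
#include "Components/ActorComponent.h"
#include "SpeechCommandComponent.generated.h"

DECLARE_DYNAMIC_MULTICAST_DELEGATE_OneParam(FOnPhraseRecognized, const FString&, Phrase);

UCLASS(ClassGroup=(Audio), meta=(BlueprintSpawnableComponent))
class USpeechCommandComponent : public UActorComponent
{
    GENERATED_BODY()

public:
    // Register a phrase such as "main menu" that the recognizer should listen for.
    UFUNCTION(BlueprintCallable, Category="Speech")
    void AddPhrase(const FString& Phrase);

    // Start/stop feeding microphone audio to the underlying engine.
    UFUNCTION(BlueprintCallable, Category="Speech")
    void StartListening();

    UFUNCTION(BlueprintCallable, Category="Speech")
    void StopListening();

    // Fired on the game thread when a registered phrase is heard,
    // so a Blueprint can, e.g., open the main menu HUD.
    UPROPERTY(BlueprintAssignable, Category="Speech")
    FOnPhraseRecognized OnPhraseRecognized;
};
```

The idea being that a designer drops the component on a Pawn, calls AddPhrase on BeginPlay, and binds the OnPhraseRecognized event to whatever should happen when the phrase is spoken.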

If this was done properly, then I would purchase it for $100, give or take depending on functionality / ease of use!

What platform(s) do you want this for?

I would use a PC/Windows version; however, I am sure there would be a market for Android and PS4 as well. The main benefit of a speech recognition engine is its ability to let the user control the application/game in a natural way while in VR.

This is my specialty. It’s been on my to-do list; I’m getting Kinect 2 for Windows this October.

I really like this idea. :slight_smile:

TTS voices are really good today; I can’t understand why no developer uses them for games. It would make dialogue much easier, and it would, for example, make it possible for players to enter their name and have that name actually spoken in-game instead of only shown as text.

And with VR, speech recognition gets much more important. Text chat in VR… brrrr… an immersion breaker.
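For the name case it would not even need much. A minimal SAPI sketch (Windows only, default system voice; the function name and the spoken line are just examples) would look something like this:

```cpp
// Minimal SAPI text-to-speech sketch: speak a line that embeds the player's
// name. Uses the default system voice; picking one of several installed
// voices would go through ISpVoice::SetVoice instead.
// Link against sapi.lib and ole32.lib.
#include <windows.h>
#include <sapi.h>
#include <string>

void SpeakGreeting(const std::wstring& PlayerName)
{
    if (FAILED(::CoInitialize(nullptr)))
        return;

    ISpVoice* Voice = nullptr;
    if (SUCCEEDED(::CoCreateInstance(CLSID_SpVoice, nullptr, CLSCTX_ALL,
                                     IID_ISpVoice, (void**)&Voice)))
    {
        const std::wstring Line = L"Welcome back, " + PlayerName + L".";
        Voice->Speak(Line.c_str(), SPF_DEFAULT, nullptr); // blocks until spoken
        Voice->Release();
    }

    ::CoUninitialize();
}
```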

Is Kinect 2 a different speech-to-text processor than Windows’ SAPI? Or is it just built on top of it?

According to MSDN: Kinect - Listening with Kinect | Microsoft Learn

They are similar, but different. The Kinect 2 has its own namespace/library (Microsoft.Speech) versus the normal OS one (System.Speech).

With this in mind, I suppose there would need to be yet another requested platform: Kinect 2. The issue here would be that while it would mainly be for the Xbox One, there would also need to be cross-compatibility with the PC, since you can get the Kinect 2 for Windows version. I am not sure how difficult this would be :\

I hadn’t thought of using the .NET API; I was thinking of the COM interface (http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).aspx). It has the advantage of not needing .NET and not requiring a cross-language call, but COM isn’t any fun to work with :(.
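For reference, a bare-bones dictation loop over that COM interface looks roughly like this. This is only a sketch based on the MSDN samples: error handling is stripped, there is no Unreal glue, and the names are mine:

```cpp
// SAPI (COM) dictation sketch: shared recognizer -> recognition context ->
// dictation grammar -> wait for one recognition event and read back the text.
// Link against sapi.lib and ole32.lib.
#include <windows.h>
#include <sapi.h>
#include <cstdio>

int wmain()
{
    ::CoInitialize(nullptr);

    ISpRecognizer*  Recognizer = nullptr;
    ISpRecoContext* Context    = nullptr;
    ISpRecoGrammar* Grammar    = nullptr;

    ::CoCreateInstance(CLSID_SpSharedRecognizer, nullptr, CLSCTX_ALL,
                       IID_ISpRecognizer, (void**)&Recognizer);
    Recognizer->CreateRecoContext(&Context);

    // Get woken up via a Win32 event, and only for completed recognitions.
    Context->SetNotifyWin32Event();
    Context->SetInterest(SPFEI(SPEI_RECOGNITION), SPFEI(SPEI_RECOGNITION));

    // Free-form dictation; a command-and-control setup would load a CFG/XML
    // grammar here instead and activate its rules.
    Context->CreateGrammar(0, &Grammar);
    Grammar->LoadDictation(nullptr, SPLO_STATIC);
    Grammar->SetDictationState(SPRS_ACTIVE);

    // Block until one phrase is recognized, then print it.
    Context->WaitForNotifyEvent(INFINITE);

    SPEVENT Event = {};
    ULONG Fetched = 0;
    if (SUCCEEDED(Context->GetEvents(1, &Event, &Fetched)) && Fetched == 1 &&
        Event.eEventId == SPEI_RECOGNITION)
    {
        ISpRecoResult* Result = reinterpret_cast<ISpRecoResult*>(Event.lParam);
        wchar_t* Text = nullptr;
        if (SUCCEEDED(Result->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE,
                                      TRUE, &Text, nullptr)))
        {
            wprintf(L"Heard: %s\n", Text);
            ::CoTaskMemFree(Text);
        }
        Result->Release();
    }

    Grammar->Release();
    Context->Release();
    Recognizer->Release();
    ::CoUninitialize();
    return 0;
}
```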

I watched a few videos on using C++, voice recognition, and its implementation in CryEngine, so I figured I could attempt to do something similar, until I got lost in the first 5 seconds of the first tutorial video :mad:

(They were using the COM interface… I did understand that part!)

Do you have links to the videos?

I can get them when I get home tonight… in roughly two hours.

Oh, there are games that use it. Generally, though, unless it’s an accessibility feature that you have to turn on in the settings, it just gets abused.

Possibly (Probably) NSFW

Shame they removed that in an update, it was the only part that offered any entertainment.

@Veovis Muad’dib: that is not exactly what I mean. I mean fixed text in the game, like a computer voice, or NPCs that speak to you and use your name in an RPG and similar things. With that you could also generate complete (procedural?) spoken sentences.

If you use speech-to-text / text-to-speech for voice communication in a multiplayer/MMO setting, you can put a word filter in place before the text gets translated into speech. Of course, I never got why people make themselves into slobbering idiots when they’re playing online with other people.
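The filter part is trivial; as a sketch (the banned-word list and the asterisk replacement here are obviously just placeholders for whatever policy the game wants):

```cpp
// Toy word filter to run on a chat line before handing it to TTS.
#include <algorithm>
#include <cwctype>
#include <string>
#include <vector>

std::wstring FilterForTts(std::wstring Line)
{
    static const std::vector<std::wstring> Banned = { L"badword1", L"badword2" };

    // Case-insensitive matching against a lowered copy of the line.
    std::wstring Lower = Line;
    std::transform(Lower.begin(), Lower.end(), Lower.begin(),
                   [](wchar_t c) { return static_cast<wchar_t>(::towlower(c)); });

    for (const std::wstring& Word : Banned)
    {
        size_t Pos = 0;
        while ((Pos = Lower.find(Word, Pos)) != std::wstring::npos)
        {
            Line.replace(Pos, Word.size(), Word.size(), L'*');
            Pos += Word.size();
        }
    }
    return Line; // pass the cleaned string on to the TTS voice
}
```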

Is anyone working on this? I’m currently trying to figure out voice commands through C++, but I’m a non-programmer nubbin, so I’m completely lost, and there isn’t much accessible documentation for us non-programmers.

Tutorials

Could you please share with me what tutorials you are referring to above? I have been looking for tutorials on implementing speech recognition but haven’t found any.

Well, the really good TTS software is, well… expensive, but as a service it’s a hell of a good idea. There are already a few standalone translators that connect to a server, so from a Star Trek standpoint a universal translator is a no-brainer.

I would pay good money for something like this, especially if it allowed for different voices and moods when speaking from text.

I know this is an old-ish thread, but is anyone currently working on this? If not, I may give it a shot, though I’m not sure how elegant a solution it would be, since the only thing I currently know about C++ is that there is some kind of file with .h as its extension.

Just the tutorials / projects on the MSDN. Link