Hello everyone!
I have been working on a number of big changes and fixes for my Speech Recognition plugin.
I am looking for some feedback from the Community.
This will hopefully become the official 1.0 release.
Please have a read of the Wiki for the demo project/plugin link:
Let me know if there are any issues.
Changes:
● OnWordsSpoken returns an array of detected phrases:
Previously, a single string was returned representing all of the phrases that were detected.
This made it tricky to process and distinguish phrases (especially if multiple were spoken).
● BUG FIX: Ensure word recognition is ordered:
When multiple phrases were spoken, they would not always appear in the same order as they were detected.
The array of detected phrases is now always ordered.
● Added Grammar support:
This allows JSGF files to be used as an alternative to Keyword matching mode.
**NOTE:** This is experimental. A demonstration of the functionality can be seen in the example blueprint (key events
‘A’, ‘N’). From my experimentation, this mode accepts very poor matches.
So, you could say “blah”, and it would match to “five”, or something.
In future, I will look at ways to reject poor matches.
● Added Mac/OSX support:
My OSX skills are pretty basic, so let me know if you encounter any problems using the plugin on Mac. NOTE: I have since made some code changes and have yet to jump back on and retest on the Mac.
● Allow Sphinx parameters to be passed in dynamically:
Previously, the parameters used to initialise Pocketsphinx were hardcoded. If you wanted to change the
Sphinx parameters, you had to modify the plugin source and recompile.
Now, you can set them dynamically, on the fly (there is an example in the demo project).
● Added support for words with Multiple phonetic listings (when working In Keyword mode):
For some words, the dictionary lists multiple phonetic pronunciations, e.g. (absolve AH B Z AA L V /
absolve(2) AE B Z AA L V). Previously, only one could be used. Now, you can use both and have only one phrase
be reported. To do so, however, you must include both phonetic spellings of absolve (e.g. absolve and
absolve(2)).
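For reference, this is the CMU-pronouncing-dictionary-style format being described: one entry per line, with alternate pronunciations marked by a parenthesised index. The absolve example above would appear in the dictionary file like this:

```text
absolve AH B Z AA L V
absolve(2) AE B Z AA L V
```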
What does this do exactly? I have a lot of dialog in my game.
Is it a speech engine that processes text and plays it in the game so you don’t have to
record sound files?
That would be a text-to-speech engine; this plugin lets you detect spoken phrases to trigger game actions.
Here’s a video I made a month or so ago, showing it in action.
Hello n00854180t,
Thanks, I really want to get this working as reliably as I can.
There’s still work to do; I just hope I am going in the right direction.
Ahh, I think I see the issue with the Grammar mode… it’s only logging to the output log, and not adding to the set of recognised phrases.
Try the following fix at ~line 488. I have also updated the demo project link with the fix.
Try the following:
- Start the demo project and open the Output Log.
- Hit ‘I’, then ‘N’.
This activates the following Grammar file:
Essentially, it tries to find a match of the form <digit> <operation> <digit>
If you say something like “two add four” , then the Output Log would report:
Note: At the moment, it appears the grammar format does not enforce that all 3 must exist.
For example, saying “two” will trigger as “two”, rather than finding the closest match that fits the form <digit> <operation> <digit>.
I’ll look into whether I can enforce this; there’s very likely a way.
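As a sketch of what such a grammar can look like (this is an illustrative JSGF file I’ve written for this post, not necessarily the exact one shipped with the demo project), a <digit> <operation> <digit> rule might be written as:

```text
#JSGF V1.0;
grammar calc;

// The public rule pocketsphinx tries to match against the utterance
public <command> = <digit> <operation> <digit>;

<digit> = one | two | three | four | five;
<operation> = add | subtract | multiply;
```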
Do I have to do something other than just adding a word to the dictionary in order for it to be callable? Do I have to rebuild the plugin, or add anything more? It appears that when I use “Make RecognitionPhrase” with my word and add it to an array with existing words, not even the existing words will work. But if I remove my word from the array, the other words work just fine.
My game has nearly 15,000 text dialog segments, so it would need to read from a
database of text ID strings, flags, and speaker names. It’s easy to set up in a Windows script language, but hard to do in Unreal; Blueprints are limited. You need a text string database.
Speaker ID: Storm0391…“String Text Dialog”…%flag name%…%Flag Counter%…%Audio file pathname%
The problem is, I don’t understand C++,
so I have a huge disconnect between the language that Unreal uses and the simple script language that I know.
In other words, you need an Unreal coder to translate my script language into Unreal code.
I still don’t understand what you are trying to achieve.
Given that you mentioned 15,000 text dialog segments in an earlier comment, it sounds as if you are looking to convert text to audio.
That is not the purpose of this plugin.
Unfortunately, my spare time has been spent on developing a HTC Vive input mapping application, controlled by javascript scripts (eg. Vive input controlling mouse, keyboard, game-pad).
The port is still on my to do list, but I first want to release a super early alpha of the Vive mapper, for people to try out, before I switch back to working on the Android port of the plugin.
First of all your plugin looks awesome and I would love to see how much it will grow.
I opened the project, loaded the game map, ran the game, and then when I tried to give commands from my mic, nothing happened.
I’ve been trying to find solutions all day.
Do I have to compile anything or do anything else before trying to run the project in UE4?
I need any help I can get, because I am really interested in seeing how this plugin works.
Hi Tee,
Sorry for the late update, I haven’t checked this forum in a fair while.
The recognition plugin uses the default microphone of the system. Is this the microphone that is being used?
The plugin binaries are included in the demo project, so this shouldn’t be the issue.
Was the project updated to a later engine version? That could perhaps cause an issue.
I’ll PM you a link to a pocketsphinx test application in the next day. It can be used to test recognition that mirrors the Game map of the demo project.
This will help isolate if the issue is with pocketsphinx in general, or an issue with the unreal implementation or project.
In other news, I have ported the plugin to Android, and am looking for people to try to get the demo project working on their Android devices, and then hopefully implement it within projects of their own. Please try it out.
Thank you so much for this! It works great.
After playing around with it a fair bit, I decided to implement it into my own project. However, I’m getting memory-leak issues from the ‘Enable Keyword mode’ node. Have you experienced this before, or have any advice to fix it?
Thanks!
Edit: just found the problem, which was my silly mistake. I forgot to implement the ‘Event End Play’ event. Oops!
Glad to see it was an easy fix, Blah. I really need to add the Shutdown step to the Wiki steps, instead of just the demo project… too many things, too little time >_<