I am interested in making a karaoke game where the player sings and is judged on hitting the right note or lyric at the right time. For example, if a lyric is sung incorrectly, the player fails or loses a lot of points. But if it’s close enough, such as getting the main vowel right (the lyric is “break” and the player says “steak”), the player would lose only a few points because the two words are phonetically close.
I know that in real karaoke the words on the screen sometimes flash red when sung incorrectly, so I know it’s not impossible. I’m just unfamiliar with that aspect of UE4.
Note that it is more important to recognize the tone of the sound than the correctness of the text, and you can do that by detecting the pitch of the sound. UE4 doesn’t provide those features out of the box, so you either need to implement them yourself or get a plugin.
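If you go the do-it-yourself route, one common way to estimate pitch is autocorrelation over a short buffer of microphone samples. Below is a minimal sketch in plain C++ with no engine calls. The names `EstimatePitchHz` and `HzToMidiNote` and the 80 to 1000 Hz search range are my own assumptions, not anything UE4 provides, and getting the mono float samples out of the microphone is left to whatever capture method you use.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: estimate the fundamental frequency (in Hz) of a mono
// buffer by finding the lag with the strongest autocorrelation.
// Returns 0 when the buffer is too short, silent, or has no positive peak.
float EstimatePitchHz(const std::vector<float>& Samples, float SampleRate,
                      float MinHz = 80.0f, float MaxHz = 1000.0f)
{
    const std::size_t N = Samples.size();
    const std::size_t MinLag = static_cast<std::size_t>(SampleRate / MaxHz);
    const std::size_t MaxLag = static_cast<std::size_t>(SampleRate / MinHz);
    if (MinLag == 0 || MaxLag >= N) return 0.0f;

    // Treat near-silent buffers as "no pitch".
    double Energy = 0.0;
    for (float S : Samples) Energy += static_cast<double>(S) * S;
    if (Energy < 1e-9) return 0.0f;

    std::size_t BestLag = 0;
    double BestCorr = 0.0;
    for (std::size_t Lag = MinLag; Lag <= MaxLag; ++Lag)
    {
        double Corr = 0.0;
        for (std::size_t i = 0; i + Lag < N; ++i)
            Corr += static_cast<double>(Samples[i]) * Samples[i + Lag];
        if (Corr > BestCorr) { BestCorr = Corr; BestLag = Lag; }
    }
    if (BestLag == 0) return 0.0f;
    return SampleRate / static_cast<float>(BestLag);
}

// Hypothetical helper: map a frequency to the nearest MIDI note number
// (A4 = 440 Hz = note 69), which is easy to compare against a song chart.
int HzToMidiNote(float Hz)
{
    return static_cast<int>(std::lround(69.0 + 12.0 * std::log2(Hz / 440.0)));
}
```

You would call this on a window of a couple thousand samples at a time and compare the resulting note against the one the song expects at that moment. A real singing voice has strong harmonics, so a raw autocorrelation peak can land an octave off; treat this as a starting point rather than a finished detector.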
Be careful not to use the Sound Visualization plugin that comes with the engine, though. Yes, it can detect the frequency and amplitude of sounds that are being output, but it does not work in packaged builds. Also, it’s for output sounds; you need something that handles input sounds.
You can take a look at the RuntimeSpeechRecognizer plugin, which is based on OpenAI’s Whisper speech recognition model. It’s cross-platform and works offline. It’s also completely free and open source.
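Once a recognizer hands you the word it heard, the "close enough" scoring from the question could be roughed out with an edit distance between the expected and recognized words. This is only a sketch: `EditDistance` and `LyricPenalty` are hypothetical names, nothing here touches the plugin's API, and it compares spelling rather than sound. A truly phonetic comparison would need a pronunciation dictionary or phoneme output from the recognizer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance: number of single-character insertions, deletions,
// or substitutions needed to turn A into B.
std::size_t EditDistance(const std::string& A, const std::string& B)
{
    std::vector<std::size_t> Prev(B.size() + 1), Cur(B.size() + 1);
    for (std::size_t j = 0; j <= B.size(); ++j) Prev[j] = j;

    for (std::size_t i = 1; i <= A.size(); ++i)
    {
        Cur[0] = i;
        for (std::size_t j = 1; j <= B.size(); ++j)
        {
            const std::size_t Cost = (A[i - 1] == B[j - 1]) ? 0 : 1;
            Cur[j] = std::min({Prev[j] + 1, Cur[j - 1] + 1, Prev[j - 1] + Cost});
        }
        std::swap(Prev, Cur);
    }
    return Prev[B.size()];
}

// Hypothetical scoring rule: 0 points lost for an exact match, scaling up to
// MaxPenalty when the heard word shares nothing with the expected one.
int LyricPenalty(const std::string& Expected, const std::string& Heard,
                 int MaxPenalty = 100)
{
    const std::size_t Longest = std::max(Expected.size(), Heard.size());
    if (Longest == 0) return 0;
    return static_cast<int>(MaxPenalty * EditDistance(Expected, Heard) / Longest);
}
```

With these assumptions, `LyricPenalty("break", "steak")` costs the player 40 of a possible 100 points, while a completely unrelated word costs the full 100. Lowercase both strings first if the recognizer returns mixed case.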