Speech to Text or Voice Recognition?

How would I go about converting Sound Waves recorded by the player into text or commands?

I want to create a system where the player records their voice to be used as a command later on. For example, the ‘Move Forwards’ command could be triggered by the player saying ‘March’.
I figured a good method would be to convert the player's recording to text and compare that text against the database of previously saved commands to see if it matches anything, then execute the matching command if there is a match and do nothing otherwise.
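
To make the matching step concrete, here is a rough Python sketch of what I have in mind. It assumes some speech-to-text step has already turned the recording into a string; that part is not shown. The names `CommandBook`, `register`, `match` and `normalize` are made up for illustration, and the fuzzy fallback using the standard library's `difflib` is just one possible way to tolerate small transcription errors.

```python
import difflib


def normalize(text):
    """Lowercase and collapse whitespace so ' March' and 'march' compare equal."""
    return " ".join(text.lower().split())


class CommandBook:
    """Hypothetical store mapping the player's spoken phrases to game commands."""

    def __init__(self):
        self._phrases = {}  # normalized phrase -> command name

    def register(self, phrase, command):
        """Save a phrase the player recorded earlier as a trigger for `command`."""
        self._phrases[normalize(phrase)] = command

    def match(self, transcript, cutoff=0.8):
        """Return the command for `transcript`, or None if nothing is close enough."""
        key = normalize(transcript)
        # Exact match first: this is the plain yes/no lookup.
        if key in self._phrases:
            return self._phrases[key]
        # Assumption: fall back to the closest saved phrase to absorb small
        # transcription errors; `cutoff` is a similarity threshold in [0, 1].
        close = difflib.get_close_matches(key, list(self._phrases), n=1, cutoff=cutoff)
        return self._phrases[close[0]] if close else None
```

The idea would be to call `register("March", "Move Forwards")` when the player saves a command, then pass each new transcript to `match` and run whatever command name comes back, ignoring the input when it returns `None`.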

Any ideas on how to go about this?