Greetings fellas,
For my university project I need to create a plugin that provides lip sync functionality for characters with morph targets through a text-to-speech (TTS) system.
What I need:
The TTS runs on a server, or a server can be started locally. The connection to it is handled via HTTP: I need to send it either raw text at runtime or an EmotionML file while in the editor. EmotionML is an XML-based markup language for describing emotions.
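To make it concrete, here is roughly the kind of request I picture using Unreal's HTTP module (untested, just my sketch; the URL and content types are placeholders for my local TTS server):

```cpp
// Rough sketch of the request I have in mind, using Unreal's HTTP module
// ("HTTP" added to PublicDependencyModuleNames in the .Build.cs).
// The URL is a placeholder for wherever my TTS server actually listens.
#include "HttpModule.h"
#include "Interfaces/IHttpRequest.h"
#include "Interfaces/IHttpResponse.h"

void SendToTTS(const FString& Payload, bool bIsEmotionML)
{
    auto Request = FHttpModule::Get().CreateRequest();
    Request->SetURL(TEXT("http://localhost:59125/process")); // placeholder endpoint
    Request->SetVerb(TEXT("POST"));
    // Raw text at runtime, EmotionML (XML) in the editor
    Request->SetHeader(TEXT("Content-Type"),
        bIsEmotionML ? TEXT("application/xml") : TEXT("text/plain"));
    Request->SetContentAsString(Payload);

    Request->OnProcessRequestComplete().BindLambda(
        [](FHttpRequestPtr Req, FHttpResponsePtr Resp, bool bOk)
        {
            if (bOk && Resp.IsValid())
            {
                // Resp->GetContent() would hold the raw bytes (the .wav),
                // Resp->GetContentAsString() the timing text, depending on
                // what the server endpoint returns.
                const TArray<uint8>& Bytes = Resp->GetContent();
                // ... hand off to asset creation / playback
            }
        });
    Request->ProcessRequest();
}
```

As far as I understand, the completion delegate fires on the game thread, so handing the bytes on to asset creation from there should be safe.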
I will need to receive .wav files and text files, which I want to save as a bundled asset in a hierarchy so that every character has its own dialogue folder.
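My rough idea for that bundle is a small UDataAsset like this (all names here are placeholders I made up):

```cpp
// Sketch of the asset container I have in mind: one UDataAsset per dialogue
// line, bundling the imported audio with the lip timing text.
#include "Engine/DataAsset.h"
#include "Sound/SoundWave.h"
#include "DialogueLineAsset.generated.h"

UCLASS(BlueprintType)
class UDialogueLineAsset : public UDataAsset
{
    GENERATED_BODY()

public:
    // The .wav received from the TTS server, imported as a SoundWave
    UPROPERTY(EditAnywhere, BlueprintReadOnly, Category = "Dialogue")
    USoundWave* Audio = nullptr;

    // The raw timing/phoneme text that will drive the morph targets
    UPROPERTY(EditAnywhere, BlueprintReadOnly, Category = "Dialogue")
    FString LipSyncTiming;
};
```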
In editor mode I want the plugin to open a window with an editor that makes it easy for the user to create an EmotionML file. Within this editor window it should be possible to send the typed EmotionML file to the server, get the .wav file back, and listen to it. After adjustments the user should be able to make a new request, and once the updated audio is approved, all data should be saved to the assets of the chosen character.
The runtime module should work like a usual online chat: you type something into the chat, the raw text is sent to the server, and the audio plus the text file needed for timing the character's lips is retrieved and played back immediately together with the lip animation. Nothing has to be saved here.
My problems:
I am pretty new to Unreal and not the smartest programmer yet. I tried reading the documentation, tutorials, source code and all that stuff, but I feel like I still can't get started even though I know what I want to do.
I have a time limit and I don't want to waste more time trial-and-erroring my way through this bulk of work. I'll lay out my ideas and my thoughts on realizing the project, and if you feel like criticizing me or giving me tips, I'm glad to receive all of that.
My plan:
First of all, I already created a plugin that opens a window (or dockable tab?) based on the example plugin templates. My next step was to get the HTTP connection working. Using a batch script, I was able to start the TTS server whenever the window opens. I found some documentation online about how to talk to an HTTP server with JSON (unreal-wiki), but since I need to send EmotionML instead, I was still stuck. I have no real knowledge of HTTP, and the tutorials I found only cover JSON, so I was a bit confused.

Once the connection works, I planned to create the asset container for the .wav and the text file first. For this I thought I would need a UDataAsset, right? (That's what my container sketch above assumes.) After that I have to dive into Slate and build the layout of the custom text editor. Something simple is fine for the beginning: a text box to write plain text in, plus buttons for sending the data to the server, for playing the received .wav file, and for saving the data as a new asset. That's all I want for a start.
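For that layout, something like this minimal Slate tree is what I picture as the tab content (the EmotionMLBox member and the OnClicked handlers are placeholders I'd still have to write):

```cpp
// Minimal layout sketch for the editor tab: a multi-line text box for the
// EmotionML plus the three buttons. EmotionMLBox would be a
// TSharedPtr<SMultiLineEditableTextBox> member of the module, and the
// handlers (OnSendClicked etc.) are placeholders returning FReply.
#include "Widgets/Docking/SDockTab.h"
#include "Widgets/SBoxPanel.h"
#include "Widgets/Input/SMultiLineEditableTextBox.h"
#include "Widgets/Input/SButton.h"

TSharedRef<SDockTab> FMyLipSyncEditorModule::OnSpawnPluginTab(const FSpawnTabArgs& Args)
{
    return SNew(SDockTab)
        .TabRole(ETabRole::NomadTab)
        [
            SNew(SVerticalBox)
            + SVerticalBox::Slot().FillHeight(1.0f)
            [
                SAssignNew(EmotionMLBox, SMultiLineEditableTextBox)
            ]
            + SVerticalBox::Slot().AutoHeight()
            [
                SNew(SHorizontalBox)
                + SHorizontalBox::Slot()
                [
                    SNew(SButton)
                    .Text(FText::FromString(TEXT("Send to server")))
                    .OnClicked(FOnClicked::CreateRaw(this, &FMyLipSyncEditorModule::OnSendClicked))
                ]
                + SHorizontalBox::Slot()
                [
                    SNew(SButton)
                    .Text(FText::FromString(TEXT("Play .wav")))
                    .OnClicked(FOnClicked::CreateRaw(this, &FMyLipSyncEditorModule::OnPlayClicked))
                ]
                + SHorizontalBox::Slot()
                [
                    SNew(SButton)
                    .Text(FText::FromString(TEXT("Save as asset")))
                    .OnClicked(FOnClicked::CreateRaw(this, &FMyLipSyncEditorModule::OnSaveClicked))
                ]
            ]
        ];
}
```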
For the animation part I want to use blueprints to access the custom assets and use their data to play the asset's .wav file in sync with blending the morph targets of the character's lips. I have no feel at all for how complicated this part will be.
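To show what I mean, here is my naive sketch of driving a single morph target from the parsed timing data each frame; the data layout, the driving function, and the morph target name are pure assumptions on my part:

```cpp
// Naive sketch of the blending I picture: each frame, look up the mouth
// weight for the current audio playback time in the parsed timing data and
// push it onto the skeletal mesh. I'd call this from a tick somewhere.
#include "Components/SkeletalMeshComponent.h"

void ApplyLipSync(USkeletalMeshComponent* Mesh, float AudioTimeSeconds,
                  const TArray<TPair<float, float>>& TimedWeights) // (time, weight) pairs parsed from the text file
{
    // Find the last keyframe at or before the current audio time
    float Weight = 0.0f;
    for (const TPair<float, float>& Key : TimedWeights)
    {
        if (Key.Key > AudioTimeSeconds)
        {
            break;
        }
        Weight = Key.Value;
    }
    // "MouthOpen" is a placeholder for the character's actual morph target name
    Mesh->SetMorphTarget(FName(TEXT("MouthOpen")), Weight);
}
```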
For the runtime part I also plan to use blueprints. From the blueprint of a UMG-created textbox I want to call the necessary C++ functions to send the typed text to the server; the received data should then be packed into my custom asset so the animation part can handle and play it as described above.
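Roughly, I picture exposing that through a blueprint function library like this (declaration only, names are mine):

```cpp
// Sketch of how I'd expose the runtime request to blueprints: a function
// library the UMG textbox's OnTextCommitted event can call directly.
#include "Kismet/BlueprintFunctionLibrary.h"
#include "LipSyncChatLibrary.generated.h"

UCLASS()
class ULipSyncChatLibrary : public UBlueprintFunctionLibrary
{
    GENERATED_BODY()

public:
    // Sends the typed chat text to the TTS server; the response callback
    // (see the HTTP sketch further up) would then build the transient asset
    // and kick off playback plus the lip animation.
    UFUNCTION(BlueprintCallable, Category = "LipSync")
    static void SendChatText(const FString& Text);
};
```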
Sooooo, I have a lot of plans and I've already cut them down to the basics. I would really appreciate it if you shared your experience. I still feel kind of lost and haven't quite grasped how to achieve my ideas yet.
Thanks guys, cheers