Predicting with an ONNX neural network with dynamic input (output) shape

AwakingsWings · September 9, 2022, 4:26am

I’m trying to use WavLM_base in UE5 (my current version is 5.0.3).
I converted the PyTorch model to ONNX, keeping the dynamic input shape.
I can confirm that the ONNX model works correctly with ONNX Runtime in Python.
I imported the model into UE5; here is the code I use to run it:

bool UAutoLipAnimRuntimeBPLibrary::Func(const FString WavPath)
{
	USoundWave* sw = NewObject<USoundWave>(USoundWave::StaticClass());
	TArray<uint8> rawFile;

	FFileHelper::LoadFileToArray(rawFile, *WavPath);
	FWaveModInfo WaveInfo;
	if (WaveInfo.ReadWaveInfo(rawFile.GetData(), rawFile.Num()))
	{
		sw->InvalidateCompressedData();

		sw->RawData.Lock(LOCK_READ_WRITE);
		void* LockedData = sw->RawData.Realloc(rawFile.Num());
		FMemory::Memcpy(LockedData, rawFile.GetData(), rawFile.Num());
		sw->RawData.Unlock();

		int32 DurationDiv = *WaveInfo.pChannels * *WaveInfo.pBitsPerSample * *WaveInfo.pSamplesPerSec;
		if (DurationDiv)
		{
			sw->Duration = *WaveInfo.pWaveDataSize * 8.0f / DurationDiv;
		}
		else
		{
			sw->Duration = 0.0f;
		}

		sw->SetSampleRate(*WaveInfo.pSamplesPerSec);
		sw->NumChannels = *WaveInfo.pChannels;
		sw->RawPCMDataSize = WaveInfo.SampleDataSize;
		sw->SoundGroup = ESoundGroup::SOUNDGROUP_Default;
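	}
	else
	{
		// Without a valid header the WaveInfo pointers below must not be dereferenced
		UE_LOG(LogTemp, Error, TEXT("Failed to read WAV info from: %s"), *WavPath);
		return false;
	}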
	if (sw->NumChannels != 1 || *WaveInfo.pBitsPerSample != 16 || *WaveInfo.pSamplesPerSec != 16000)
	{
		UE_LOG(LogTemp, Error, TEXT("Invalid input: mono, 16 kHz, 16-bit audio is required"));
		return false;
	}
	TArray<float> WavLM_input;
	WavLM_input.Init(0.f, 30);

	// Get the sample data of this file
	const int16* SamplePtr = reinterpret_cast<const int16*>(WaveInfo.SampleDataStart);
	for (uint32 i = 0; i < *WaveInfo.pWaveDataSize >> 1; i++)
	{
		WavLM_input.Add(static_cast<float>(SamplePtr[i]) / 32767.f);
	}

	UNeuralNetwork* WavLM_Base = FAutoLipAnimRuntimeModule::WavLM_Base;
	if (WavLM_Base == nullptr || !WavLM_Base->IsLoaded())
	{
		return false;
	}
	const FNeuralTensor& Input = WavLM_Base->GetInputTensor();
	TArray<int64> InputShape = {1, WavLM_input.Num()};
	// Input.SetNumUninitialized(ENeuralDataType::Float, InputShape,true);
	WavLM_Base->SetInputFromArrayCopy(WavLM_input);
	WavLM_Base->Run();
	TArray<float> OutputTensor = WavLM_Base->GetOutputTensor().GetArrayCopy<float>();
	return true;
}
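
To check the audio preprocessing outside the engine, the sample conversion and duration arithmetic from the function above can be sketched in plain C++. This is only a minimal sketch: `PcmToFloat` and `WavDurationSeconds` are hypothetical helper names, and `std::vector` stands in for `TArray`; nothing here touches the UE API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert signed 16-bit PCM samples to floats.
// Dividing by 32767 mirrors the loop above, so -32768 maps slightly below -1.
std::vector<float> PcmToFloat(const std::int16_t* Samples, std::size_t NumSamples)
{
    std::vector<float> Out;
    Out.reserve(NumSamples);
    for (std::size_t i = 0; i < NumSamples; ++i)
    {
        Out.push_back(static_cast<float>(Samples[i]) / 32767.0f);
    }
    return Out;
}

// Hypothetical helper: duration in seconds from the WAV header fields,
// using the same formula as above:
// data bytes * 8 / (channels * bits per sample * samples per second).
float WavDurationSeconds(std::uint32_t DataBytes, std::uint16_t Channels,
                         std::uint16_t BitsPerSample, std::uint32_t SampleRate)
{
    const std::uint64_t BitsPerSecond =
        static_cast<std::uint64_t>(Channels) * BitsPerSample * SampleRate;
    // Guard against a zero divisor, as the original code does.
    return BitsPerSecond ? (DataBytes * 8.0f) / static_cast<float>(BitsPerSecond) : 0.0f;
}
```

For a mono, 16-bit, 16 kHz file with 32000 bytes of sample data, this gives 16000 samples and a duration of 1 second.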

I noticed that the input tensor shape is (1,1), when it should be (1,dynamic). As soon as Run() is called,
an error occurs (without any information).
Maybe I should call Input.SetNumUninitialized(ENeuralDataType::Float, InputShape, true)?
But Input is const, so I cannot do that.
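
If dynamic axes turn out not to be usable here (an unverified assumption, not something confirmed for this engine version), one possible fallback is to re-export the model with a fixed input length and force every clip to that length before inference. A minimal sketch in plain C++, where `PadOrTruncate` is a hypothetical helper, `std::vector` stands in for `TArray`, and the target length is whatever the re-exported model would expect:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: force a sample buffer to exactly TargetLen entries,
// dropping extra samples or zero-padding the tail.
std::vector<float> PadOrTruncate(const std::vector<float>& Samples, std::size_t TargetLen)
{
    std::vector<float> Out(Samples);
    Out.resize(TargetLen, 0.0f); // shrinks, or appends zeros
    return Out;
}
```

The trade-off is that long clips are cut off and short clips carry trailing silence, so this does not replace true dynamic-shape support.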
Is there an example of using a neural network with a dynamic input shape?
Thanks for your help!