Deep Mimic with Learning Agents

Hello, thank you for your message, and good luck with your project.
Yes, I am doing something similar. However, in my case I am also transforming my velocities to root space.
The problem I have is that my agent stands still at its initial position the whole time; it doesn't move. Did you run into anything similar? Also, do you have both linear and angular velocities? In MuJoCo I only have the linear velocity of the root, and for the joints I only have angular velocities. Are you using a PD controller? I tried to code a stable PD controller, but I am not sure whether what I am doing is correct. For the rewards, are you also taking the root pose into consideration, or just the joints? I was also wondering how long you trained the agent for. Are you using PPO as well?

Hey there, I am also working on jointed character models, but I'm using a more complicated setup than DeepMimic, which relies on the animations at inference time and a phase variable for synchronization.

My observations are locations, rotations, linear velocities, and angular velocities.

The locations are straightforward: get the global location of the pelvis (not the root; if you use a skeletal mesh like Manny, its root is a bone on the floor), get the location of each bone in world space, subtract the pelvis location from the bone location, and scale by the maximum distance a bone can be from the pelvis. Unreal measures distance in centimeters, so first divide the bone locations by 100, then by roughly 1.2, since no bone of an average humanoid character model should be more than 1.2 meters from the pelvis.
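As a rough sketch, the steps above look like this in plain Python (the function and argument names are just illustrative, not Learning Agents API; the 1.2 m reach is the assumption from above):

```python
def bone_location_observation(bone_location_cm, pelvis_location_cm,
                              max_reach_m=1.2):
    """Bone location relative to the pelvis, normalized to roughly [-1, 1].

    Inputs are (x, y, z) world-space positions in centimeters, as in Unreal.
    """
    # Subtract the pelvis position and convert centimeters to meters.
    relative_m = [(b - p) / 100.0
                  for b, p in zip(bone_location_cm, pelvis_location_cm)]
    # Normalize by the assumed maximum distance a bone can be from the pelvis.
    return [c / max_reach_m for c in relative_m]

obs = bone_location_observation((120.0, 60.0, 30.0), (100.0, 50.0, 20.0))
```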

If you are using Learning Agents, rotations are handled extremely well behind the scenes for you. They are first returned as quaternions, then converted to a normal-tangent vector encoding using the forward and right vectors of the bones, and they are already relative to the pelvis if its transform is passed as the relative transform.
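The idea behind that forward/right encoding can be sketched in plain Python (this illustrates the 6D "tangent vector" representation in general, not the Learning Agents internals; quaternions here are (w, x, y, z) and assumed unit-length):

```python
def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * cross(q.xyz, v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w*t + cross(q.xyz, t)
    return (vx + w * tx + (y * tz - z * ty),
            vy + w * ty + (z * tx - x * tz),
            vz + w * tz + (x * ty - y * tx))

def tangent_encoding(q):
    """Encode a rotation as its rotated forward and right axes (6 floats)."""
    forward = quat_rotate(q, (1.0, 0.0, 0.0))
    right = quat_rotate(q, (0.0, 1.0, 0.0))
    return forward + right
```

This representation avoids the discontinuities of raw quaternions or Euler angles, which is one reason it works well as a network input.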

For linear velocities, I keep the bone's relative location from the previous frame, subtract it from the current one, divide by delta time (0.0333333, i.e. 30 Hz, in my case), then divide by 12, and finally store the current location as the previous one for the next frame. Dividing by 12 scales the bone velocities to be within the maximum linear velocity they can reach: the pelvis of a basic humanoid can reach roughly 5.3 meters per second, and a swinging hand can reach double that or more; hence 12 in my case.
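A minimal sketch of that finite-difference step, assuming the 30 Hz delta time and 12 m/s cap mentioned above (the function name is illustrative):

```python
def linear_velocity_observation(current_m, previous_m,
                                dt=1.0 / 30.0, max_speed_ms=12.0):
    """Finite-difference velocity in m/s, normalized by an assumed max speed."""
    return [((c - p) / dt) / max_speed_ms
            for c, p in zip(current_m, previous_m)]

# A bone that moved 10 cm along x in one 30 Hz frame is moving at 3 m/s,
# which normalizes to 0.25 of the assumed 12 m/s maximum.
v = linear_velocity_observation([0.1, 0.0, 0.0], [0.0, 0.0, 0.0])
```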

For angular velocities, I take the quaternion delta (current rotation * previous rotation.inverse()), convert it to angle-axis form, store the angle and axis, convert the angle from degrees to radians, divide it by delta time, divide by the max angular velocity (5 rad/s or less), multiply the axis by the resulting scalar, and use that vector as the angular velocity.
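That quaternion-delta computation can be sketched like this (again just an illustration: quaternions are (w, x, y, z), the angle is computed directly in radians, and the 30 Hz delta time and 5 rad/s cap are the assumptions from above):

```python
import math

def quat_mul(a, b):
    """Hamilton product of two (w, x, y, z) quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def quat_inverse(q):
    """Inverse of a unit quaternion (its conjugate)."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def angular_velocity_observation(current_q, previous_q,
                                 dt=1.0 / 30.0, max_w=5.0):
    """Axis * angular speed from a quaternion delta, normalized by max_w."""
    dw, dx, dy, dz = quat_mul(current_q, quat_inverse(previous_q))
    # Angle-axis from the delta quaternion; clamp to guard rounding error.
    angle = 2.0 * math.acos(max(-1.0, min(1.0, dw)))
    s = math.sqrt(max(0.0, 1.0 - dw * dw))
    axis = (dx / s, dy / s, dz / s) if s > 1e-8 else (0.0, 0.0, 0.0)
    scaled = (angle / dt) / max_w  # rad/s, normalized by the assumed maximum
    return [a * scaled for a in axis]
```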

Everything is in the pelvis's frame of reference. Also, I ignore the location and rotation of the pelvis itself and only account for its linear and angular velocities.

As for actions, I use the built-in constraint system, which is very powerful as a PD controller.

As for rewards, I use the difference between the center-of-mass velocity of the skeletal mesh and a target velocity between 0 and 5. This is a starting point for simply moving in a direction at a given speed, and it can later be enhanced with other rewards.
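One common way to turn such a velocity difference into a bounded reward is the DeepMimic-style exponential of the negative squared error, which stays in (0, 1]; the sketch below assumes that shaping and an illustrative scale factor, neither of which is specified above:

```python
import math

def com_velocity_reward(com_velocity_ms, target_velocity_ms, scale=1.0):
    """Reward in (0, 1]: 1.0 at a perfect velocity match, decaying with error."""
    error = com_velocity_ms - target_velocity_ms
    return math.exp(-scale * error * error)
```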


Hello,
Thank you, I will try what you are doing. Also, I was wondering: what RL algorithm are you using? And since I am trying to use the rewards from the DeepMimic paper, my rewards should be between 0 and 1, but when I run PPO I get rewards larger than 1.