Deep Mimic with Learning Agents

@murraythis How to make an animated sequence with motion

Hey there,

I’m animating in Blender. If you were asking because you’re looking for software to animate in, I’d probably recommend looking into UE5 before Blender (unless you have an Autodesk account, in which case I’d assume you’re already using Maya haha). Based on what was shown at State of Unreal, the tools look like they have come on in leaps and bounds, and Epic seem to be throwing the kitchen sink at it as far as support goes.

1 Like

Hi murraythis, I think you’re making a lot of progress. How did you use Bullet physics in UE? And did you try the Adversarial Motion Priors method?

Hey susu506,

Thank you! It’s getting there slowly, though the move to a bipedal character is certainly presenting some challenges!

I followed this fantastic guide to get Bullet working in Unreal:

From there it was a case of learning more about the Bullet API and implementing their multibody class. Bullet comes with a bunch of examples which should help.
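If you want to get a feel for the multibody API before wiring anything into Unreal, Bullet’s official Python bindings (pybullet) make a quick sandbox. A minimal sketch, using sample assets that ship with pybullet:

```python
# Minimal pybullet sandbox for exploring Bullet's multibody behavior
# before committing to a C++ integration (pip install pybullet).
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                      # headless; use p.GUI to visualize
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

plane = p.loadURDF("plane.urdf")
# Articulated bodies load as multibody instances under the hood
robot = p.loadURDF("r2d2.urdf", basePosition=[0, 0, 0.5])

for _ in range(240):                     # one simulated second at 240 Hz
    p.stepSimulation()

pos, orn = p.getBasePositionAndOrientation(robot)
print("base position after 1s:", pos)
```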

I haven’t tried Adversarial Motion Priors but that is where all this work is currently headed. No idea when I’ll get there though haha!

1 Like

Hi @murraythis, I really like seeing your progress with this tool. I’m also taking a basics-first approach before moving into more complicated systems. I successfully followed the driving tutorial last night.

But because there are so few examples or tutorials of Learning Agents online, I am finding it hard to develop intuition for the plugin and its nodes. I’m struggling to create a cube that learns to navigate to a goal sphere.

Can you point me to any step-by-step tutorials of extremely basic, ground-up Learning Agents implementations? I’m trying to build intuition.

@Deathcalibur I feel like the driving tutorial is very exciting, and it gives beginners an immediate feeling of success. But because it is built with so many idiosyncrasies, it is really hard to generalize what a developer has learned to other scenes they might want to build themselves. For example, I’m not exactly sure what kinds of data can be handled as observations, or how to replace the input with the RL controller.

Simpler examples would be both easier for y’all to build and potentially more beneficial for developers. That’s my two cents.

Are there any simple examples of building from an empty scene online?

Best

1 Like

Thanks for the feedback! It’s definitely a good idea to have more tutorials, and simpler ones at that.

Part of the challenge is that LA is a work in progress and we know there are still breaking changes to come. Writing a variety of tutorials becomes a maintenance burden, since with each release we need to update every tutorial to the current edition, on top of writing new features, testing, etc. Once we get to a “1.0”, we can expand the breadth and depth of the documentation/tutorials.

I hope that sounds fair, and in the meantime I invite anyone in the community to put together a simple tutorial on moving to a goal. I will happily review it :smiley:

Thanks Brendan, I know it’s a lot of work and we’re really excited to use it. I’ll post any other tutorials I find here, and will write some documentation if I manage to create something myself.

The issue I’m struggling with now is how to replace an Enhanced Input Mapping with the randomized LA inputs.

I am happy to create thorough tutorial assets (video, screengrabs, text) for the community if I could have a brief screenshare convo with someone on the team who can guide me through a basic Blueprint setup.

Hey splem7, I think I can give a little assistance with the observations. LA is being developed to be very flexible: just about anything can be an observation, so long as the data can be accessed in Play-In-Editor (LA does not support non-runtime learning at the moment). So for your cube, your observations could be the location of the cube, the location of your goal, the velocity of the cube, the orientation of the cube, the angular velocity, etc.

With machine learning, the most effective approach to building intuition is to start small: what is the minimum amount and type of data needed for your agent to complete the goal? To relate it to your own capability: if you know where you are, what direction you are facing, where your goal is, how fast you are moving (linearly), how fast you turn, and you have a well-defined goal (decrease the distance from your current position to your goal position), you can probably figure out what to do. The same idea applies to a learning agent. The observations should ideally be only what is necessary to accomplish the goal, and the goal should be well defined. The simpler the goal, the easier it will be for you to understand the behavior of your agent.

Your actions could be a rotation of your cube on the XY plane, a change in location along the XY plane, a move speed, and/or a turn speed.
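If it helps to see it outside of Blueprints, here is that minimal observation/action layout as a rough plain-Python sketch (the names are made up for illustration; this is not the actual LA node setup):

```python
import math

# Illustrative only: "cube" and "goal" stand in for your Pawn and the
# goal Sphere; attribute names are hypothetical.

def build_observations(cube, goal):
    """The minimal observation set: where is the goal relative to me,
    how am I moving, and which way am I facing?"""
    return [
        goal.x - cube.x, goal.y - cube.y,        # relative goal position
        cube.vel_x, cube.vel_y,                  # linear velocity
        math.sin(cube.yaw), math.cos(cube.yaw),  # heading (wrap-safe)
        cube.yaw_rate,                           # angular velocity
    ]

def apply_actions(cube, move_speed, turn_speed):
    """The minimal action set on the XY plane: a move speed and a turn
    speed, applied along the cube's current heading."""
    cube.yaw += turn_speed
    cube.x += move_speed * math.cos(cube.yaw)
    cube.y += move_speed * math.sin(cube.yaw)
```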

I would recommend not getting hung up on the particular implementation of LA 5.3, as the changes in LA 5.4 are somewhat significant. I believe there are changes to the observation and action functions in the Interactor, to the Completions, Rewards, and Resets in the Trainer, and likely more. I am almost done implementing my first attempt at an agent in 5.4, but I am using a skeletal mesh as my agent, so it’s a bit more complicated and is taking me more time to complete. Once I’m done and it’s working, I wouldn’t mind writing a general guide to LA and helping you define your own learning agent.

Let me know if that interests you.

2 Likes

Hey Brendan, in 5.4, in the Trainer, are completions set as part of an episode reset, or do they need to be self-defined? For example, I want to reset my agent if any body part other than foot_l and foot_r touches the ground (I have the logic for it set up in my agent Blueprint). Am I accessing this event in my Completion and setting a termination or truncation, or am I calling Reset Episode in my Reset and then specifying the completion type? And how is that accessed? At the moment I am crudely looping through all agents to check for the condition, but I am also not wholly sure how to toggle that condition, given that the event lives in my agent Blueprint.
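Conceptually, my crude version looks like this plain-Python sketch (illustrative names, not my actual Blueprint):

```python
MAX_STEPS = 1000                         # illustrative episode length cap
ALLOWED_CONTACTS = {"foot_l", "foot_r"}

def gather_completions(agents):
    """Loop over agents and flag episode completions: a 'termination'
    when a non-foot bone touches the ground (a failure), a 'truncation'
    when the time limit runs out. All names here are illustrative."""
    completions = {}
    for agent in agents:
        bad_contact = any(
            bone not in ALLOWED_CONTACTS for bone in agent.ground_contacts()
        )
        if bad_contact:
            completions[agent.id] = "termination"  # body hit the ground
        elif agent.episode_steps >= MAX_STEPS:
            completions[agent.id] = "truncation"   # out of time, not a failure
    return completions
```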

2 Likes

@AuraNode I really appreciate the response. I would definitely like some general guidance. In exchange, I’m happy to collaborate on developing simple guides for other beginners, as well as share ideas for further applications. My Blueprint skill is 3/10, but I work professionally in other areas of technical art.

My goal with this project is to create non-human organisms that evolve simple motor skills, then begin evolving anatomically. This second part is still vague, but I’m interested in evolutionary simulations. Eventually I want to set up complex reward architectures that incentivize communication, specialization, and culture. Lofty, I know. For now I need to learn the absolute basics.

In the last 12 hours I learned a bit more about Setting Actions and Observations. But I now have the problem of not understanding how to replace an Input Mapping Context with the movement floats as inputs, so that the AI can explore random actions and learn.

Talk soon.

1 Like

That would be great, actually. I figured starting a discussion that explores the plugin would be a good idea.

With that in mind I will set up a Conversation Topic so we can include the community in the process. The Conversation will be on a General Guide to Learning Agents.

On your goal: I have similar goals that are exceedingly lofty, but I do have a path to developing them. I’d like an idea of your current knowledge of machine learning, and more specifically reinforcement learning, as that will help me tailor responses. If you want to discuss your ideas and see what path there might be, make a new Conversation or Question and link it here.

1 Like

Here is my current skill level: I am not sure which nodes I should use in the Event Graph of my Pawn Cube to set up its inputs, or how to convert those inputs into a variable that can be used by the BP_Interactor. (I’m trying to reward a cube for moving towards a goal Sphere Actor which randomly spawns somewhere in a bounded space, and penalize the cube for crossing a boundary.)

I’m fairly comfortable at this point with setting up the nodes that pass the math of positions and distances into Set ____ Observation nodes. And I can also scale the math of these Observations into a reward window that ranges from -1 to 1.
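In plain-Python terms, the reward shaping I have in mind is roughly this (purely illustrative, not my actual graph):

```python
def compute_reward(cube_pos, goal_pos, arena_half_size, max_dist):
    """Distance-to-goal mapped linearly into [1, -1], with a hard -1
    penalty for leaving the bounded area. All names are illustrative."""
    # Crossing the boundary earns the full penalty
    if abs(cube_pos[0]) > arena_half_size or abs(cube_pos[1]) > arena_half_size:
        return -1.0

    dx = goal_pos[0] - cube_pos[0]
    dy = goal_pos[1] - cube_pos[1]
    dist = (dx * dx + dy * dy) ** 0.5

    # dist == 0 gives +1, dist >= max_dist gives -1
    return 1.0 - 2.0 * min(dist / max_dist, 1.0)
```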

Ok, great. I’ll look out for the Conversation Topic. Did you set it up already?

My ML experience: I have 3/10 Python experience. I can call APIs, use LangChain to extract embeddings from text, and have connected the OpenAI API to the ElevenLabs API to output custom voices. I’ve played around with a few Colabs that run LLMs from Hugging Face or wherever else. I can also implement diffusion models in custom architectures for visuals. Besides that, it’s all been through AI SaaS services that anyone can use.

I have zero hands-on experience with RL, but I have researched its conceptual history. It comes from behaviorism and animal conditioning research: Pavlovian conditioning, Skinner’s operant theories.

I would like it to be possible for anyone with basic Blueprint knowledge to implement a simple experiment using geometric primitives: e.g., teaching an actor to move to a location, teaching an actor to dodge, or having an actor forage and return. Then teaching RL for physics components like joints, so that a two-jointed caterpillar could, for example, receive a reward for learning to make itself jump by rapidly contracting its joints, or a simple robot arm could learn to point at a randomly spawning actor (idk, just thinking out loud).
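For the caterpillar idea, for instance, I imagine the reward could be as simple as this hypothetical sketch:

```python
def jump_reward(caterpillar, rest_height):
    """Hypothetical sketch: reward height gained above rest while airborne,
    so rapid joint contractions that launch the body get reinforced."""
    if caterpillar.touching_ground():
        return 0.0
    return max(caterpillar.base_z - rest_height, 0.0)
```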

Perhaps one criterion for the General Guide is that the examples start from blank scenes, with only the plugin enabled.

1 Like

Ok let me see if we are on the same page.

By your agent’s ‘inputs’, do you mean how your agent moves itself?

This can be done two ways:

  1. By physics.
  2. By ‘player’ input.

For physics, you would make your cube simulate physics, lock its Z motion in the settings, and in the Interactor set up actions for adding a velocity to the cube on the X and Y axes.

For ‘player’ input, I see your issue. For normal player input you would create an event for the key press and run your logic off that event. But the Driving tutorial seems to have input nodes that set the strength of the Throttle and Brake out of the box.

Ok, so I think I figured it out (partially). Ignore the Input Mapping and go into your Interactor. In Setup Actions, specify a vector action. In Set Actions, get a reference to your movement component: create a new variable, name it, use the dropdown to search for your Pawn or Character by name, and set it as an object reference. Then drag that variable onto the Event Graph, drag off the object pin, and search for Add Force or Add Impulse. Finally, set that vector as your action using the GetVectorAction node.

I haven’t set up that guide yet; I will start it tomorrow, after I have completed stage one of my prototype and tested it so that I only recommend steps that actually work.

As for your ideas, Learning Agents is absolutely capable of all of that, and it does help that you have some background in ML. Not much is needed for the basics, but it gets much more useful as the projects get more complex. I am working on a jointed, physics-based character model much like the OP.

Reinforcement learning requires knowing not just how to create the learning algorithm but also how it interacts with an environment, so compared to other forms of ML it’s both more involved and potentially more powerful.
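Schematically, that interaction is just the loop below, whatever framework you use (this is the concept, not LA’s actual API; env and agent are hypothetical):

```python
def run_episode(env, agent, max_steps=1000):
    """The canonical agent-environment loop: observe, act, receive a
    reward, repeat, then learn from the collected experience."""
    obs = env.reset()
    for _ in range(max_steps):
        action = agent.act(obs)               # policy chooses an action
        obs, reward, done = env.step(action)  # environment responds
        agent.record(obs, reward)             # experience kept for learning
        if done:                              # termination or truncation
            break
    agent.update()                            # improve the policy
```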

I’ll make sure the Guide starts from a blank scene to lower the barrier to entry. RL gets complicated so quickly that any smoothing of the on-ramp is low-key necessary for collective progress.

1 Like

I’m not concerned with physics or collision right now.
I do mean how the agent moves itself by player input (whether ‘player’ = human or AI).

Are these the correct nodes?

I may just have to wait until tomorrow. I don’t fully understand.
The Adding and Setting of Float Actions is what I had before.
The other unconnected clusters are what I’m trying to experiment with now.

Ok I think I figured it out.

You can copy this code and paste it into your SetActions function; just make sure to set the right references for your variables.

You can remove the unconnected nodes.

If the output float is positive, you add a force in the forward direction; if it is negative, you add a force in the backward direction.

The same is true for the Right/Left Action.

SetActions for Cube
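In rough pseudo-Python, the graph in that screenshot boils down to something like this (illustrative names, not actual UE calls):

```python
FORCE_SCALE = 100_000.0  # tune to your cube's mass and desired acceleration

def set_actions(cube, interactor):
    """What the graph does, in plain terms: read each float action and
    apply it as a signed force. Illustrative names, not real UE API."""
    forward = interactor.get_float_action("MoveForwardBack")
    right = interactor.get_float_action("MoveRightLeft")

    # Positive values push along the axis, negative values push against it
    cube.add_force((forward * FORCE_SCALE,  # forward/back (X)
                    right * FORCE_SCALE,    # right/left (Y)
                    0.0))                   # Z stays locked
```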

Hope this helps!

1 Like

If you search ‘splem’ on blueprintue you’ll find my BPs.

They’re incomplete, but just so you have a sense of where I’m at.

I also have an Input Mapping Context for WASD that would move the cube in World Space.

Scene looks like this

1 Like