Hi,
I am training an agent on a turn-based game. One side of the game makes (for now) random moves; the other side is a learning agent.
I am unclear on how to handle invalid moves (think of a tic-tac-toe-like game). I am using Get Exclusive Discrete Action to get the x,y coordinates to move to, but many of the possible coordinates are not valid moves.
Ideally I could mask out the invalid moves. Is this supported?
If not, how should I handle this in the PerformAgentAction function? Is there a way to end that episode early? I see an End Training function, but not an End Episode one.
Or is there some other way to handle this? Right now my training gets stuck because the agent hasn’t made a valid move. I can tell whether the move is valid from inside the Action function. I suppose I could have it resample until it picks a valid move, but that seems wrong. I would have to give it a negative reward for the invalid pick first, and it’s not clear to me how to do that from inside the Action function.
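To make concrete what I mean by masking, here is a minimal sketch in plain Python. It is not the plugin's API: the function names, the board encoding (0 means an empty cell), and the idea of having raw per-cell scores (logits) available are all my own assumptions for illustration. Invalid cells get probability zero before sampling, so no resampling loop or penalty is needed.

```python
import math
import random


def valid_move_mask(board):
    """Flat row-major list of booleans: True where the cell is empty (assumed 0)."""
    return [cell == 0 for row in board for cell in row]


def masked_softmax(logits, mask):
    """Softmax over logits with masked-out entries forced to probability 0."""
    if not any(mask):
        raise ValueError("no valid moves available")
    # Subtract the largest valid logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_move(logits, board, rng=random):
    """Sample an (x, y) move from the masked distribution."""
    probs = masked_softmax(logits, valid_move_mask(board))
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    width = len(board[0])
    return index % width, index // width  # x is the column, y is the row


if __name__ == "__main__":
    # Hypothetical 3x3 position: 1 and 2 are the two players' pieces.
    board = [[1, 0, 2],
             [0, 1, 0],
             [2, 0, 0]]
    logits = [0.5, 1.2, -0.3, 0.0, 2.0, 0.7, -1.0, 0.1, 0.4]
    print(sample_move(logits, board))
```

Whether something equivalent can be done with the exclusive discrete action is exactly what I am asking.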
Thank you for your help
Dan