How to create a self-taught vehicle

In this post I want to talk about the AI created for Fuel Renegades, a game developed by a group of students at ESAT.

AI_FR_avoidance_33.gif

The beginning

The basic idea of the AI is to simulate the behavior of a real player: movement, overtaking, fast laps, interaction with the circuit, etc. So I started by simulating the movement of the vehicle, calculating the inputs that the vehicle receives.

The inputs are:

  • Throttle
  • Brake / Going backwards
  • Turn
  • Drift
  • Interact with elements of the circuits (e.g. boost pads)
  • Vehicle movement

The circuits are built around a spline that marks the center of the road, and they are designed so that drifting is never necessary to finish a clean lap, so drifting will not be a priority. Going backwards will not be necessary either, and unless we need to avoid an obstacle we will always accelerate as much as possible. Therefore, our biggest problem is the steering.

To start, I placed an invisible ball on the spline, a little ahead of the vehicle's position, and had the vehicle follow it. We will talk about this in more depth later.

It is as simple as that: now we have two points with which to calculate the necessary turning input, the ball and the vehicle. With just a scalar (dot) product we get the angle between the direction the vehicle is facing and the direction to the point where it wants to go (the invisible ball), and that angle is converted to an input value using the turning range of the vehicle. How is that range calculated? I would like to say that I derived it mathematically, but it was just trial and error until it felt realistic.
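To make the steering calculation concrete, here is a minimal sketch in Python (not the game's actual code). The function name, the 2D tuples and the `max_steer_angle` parameter are all assumptions made for illustration. Note that the dot product only gives the size of the angle, so this sketch also uses a 2D cross product to decide whether to turn left or right; that part is my own assumption, not something described above.

```python
import math

def steering_input(position, forward, target, max_steer_angle):
    """Return a steering input in [-1, 1] that turns the vehicle toward target.

    position, forward and target are 2D (x, y) tuples. max_steer_angle is the
    hand-tuned turning range in radians (a hypothetical parameter).
    """
    to_target = (target[0] - position[0], target[1] - position[1])
    dist = math.hypot(to_target[0], to_target[1])
    fwd_len = math.hypot(forward[0], forward[1])
    if dist == 0.0 or fwd_len == 0.0:
        return 0.0  # on top of the target, or no heading: do not steer

    # Dot product of the two directions gives the cosine of the angle.
    dot = forward[0] * to_target[0] + forward[1] * to_target[1]
    cos_angle = max(-1.0, min(1.0, dot / (dist * fwd_len)))
    angle = math.acos(cos_angle)

    # Assumption: the sign of the 2D cross product picks the turning side
    # (positive means the target is counter-clockwise from the heading).
    cross = forward[0] * to_target[1] - forward[1] * to_target[0]
    signed_angle = angle if cross >= 0.0 else -angle

    # Map the angle onto the turning range and clamp to a valid input.
    return max(-1.0, min(1.0, signed_angle / max_steer_angle))
```

With a turning range of 30 degrees, a target 15 degrees off the heading gives an input of about 0.5, and anything beyond 30 degrees saturates at full lock.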

The road

HighresScreenshot00000

Just as Formula 1 races split their circuits into sectors to compare times, we divided our road (really, the spline) into sectors, only much smaller ones. How small? Small enough to keep track of all the steering changes that the curves require, but big enough that the vehicle cannot advance more than one sector in a single frame. Also, we don’t need a new sector every frame.

spline01

So the question now is: what is a good distance? I made some tests measuring the distance covered by the vehicle at maximum speed while turning as hard as it can, but then I realized that the maximum speed could change along the way (and it does), which would make all those measurements invalid, so in the end my values came from trial and error. Basically, I tested on a circuit with tight curves until the vehicle was able to update its trace correctly and it felt realistic.
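Even though the final values were tuned by hand, the "no more than one sector per frame" condition gives a rough lower bound that can be written down. The sketch below is only an illustration: the function names, the safety margin and the numbers in the usage note are all made up, not taken from the game.

```python
def min_sector_length(max_speed, frame_time, safety_factor=1.5):
    """Shortest sector length the vehicle cannot skip within one frame.

    max_speed is in units per second and frame_time in seconds. The
    safety_factor is a hypothetical margin for changes in top speed.
    """
    if max_speed <= 0 or frame_time <= 0:
        raise ValueError("max_speed and frame_time must be positive")
    return max_speed * frame_time * safety_factor

def sector_count(spline_length, sector_length):
    """Number of equal sectors on the spline, rounded down (at least one).

    Rounding down makes each real sector slightly longer than sector_length,
    so the lower bound still holds.
    """
    return max(1, int(spline_length // sector_length))
```

For example, a vehicle moving at 120 units per second at 60 fps covers 2 units per frame, so with the 1.5 margin the sectors should be at least 3 units long. The upper bound (how much steering detail the curves need) is the part that still has to be found by testing.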

Although all this work was good, it was just the beginning. We only had a bunch of vehicles following the center of the road, not trying to take advantage of faster traces or to overtake each other. Therefore, I decided to split the width of the road into equidistant points. This way I was able to find better traces and have an alternative route for overtaking.

spline02

To store all this data I used a hash table. The key is the ratio between the position of the sector along the spline and the spline's total length (if the sector is at 100 units and the spline is 1000 units long, 100/1000 = 0.1). The value is a struct that stores the id of the checkpoint, its position, and two more tables holding the time of the trace and the total number of checkpoints that the trace has passed, with one row for each checkpoint of the next sector.
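A rough Python sketch of that layout could look like the following. The names (`Checkpoint`, `sector_key`, `add_sector`) are invented for illustration, and grouping the checkpoints of one sector in an inner dictionary keyed by checkpoint id is my assumption about how several checkpoints share one sector key.

```python
from dataclasses import dataclass, field

@dataclass
class Checkpoint:
    checkpoint_id: float  # lateral id across the road, from 0.0 to 1.0
    position: tuple       # world position of the checkpoint, e.g. (x, y)
    # Two tables with one row per checkpoint of the next sector,
    # keyed here by that next checkpoint's id.
    trace_time: dict = field(default_factory=dict)
    checkpoints_passed: dict = field(default_factory=dict)

def sector_key(distance_along_spline, spline_length):
    """Key of a sector: its position relative to the total spline length."""
    return distance_along_spline / spline_length

def add_sector(table, distance_along_spline, spline_length, checkpoints):
    """Store the checkpoints of one sector under the sector's key."""
    key = sector_key(distance_along_spline, spline_length)
    table[key] = {c.checkpoint_id: c for c in checkpoints}
    return key
```

One thing to watch with this kind of key is that floating-point ratios only match on lookup if they are computed the same way every time, so it is safer to always go through one helper such as `sector_key`.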

checkpoints

Why is this really interesting? Well, the divisions are symmetrical. With no divisions there is only one checkpoint, the spline itself, so its id is 0.5. With 1 division (per side) you have 3 checkpoints, with ids 0, 0.5 and 1. With the next division you have 5 checkpoints: 0, 0.25, 0.5, 0.75 and 1, and so on. So each division adds new checkpoints without losing the previous information. You can therefore start with a naive spline that the vehicle will “learn” in one lap, and then add new divisions until you have covered the whole road. This way you get results faster and it is easier to penalize wrong traces when training.
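The id scheme can be sketched in a few lines of Python. I am assuming here that each new division halves the gaps between the existing checkpoints, which is the reading that keeps all the old ids; the function names are made up for the example.

```python
def checkpoint_ids(divisions):
    """Lateral checkpoint ids across the road for a given number of divisions.

    0 divisions -> [0.5] (the spline itself)
    1 division  -> [0.0, 0.5, 1.0]
    2 divisions -> [0.0, 0.25, 0.5, 0.75, 1.0]
    Assumption: every further division halves each gap.
    """
    if divisions < 0:
        raise ValueError("divisions must be >= 0")
    if divisions == 0:
        return [0.5]
    steps = 2 ** divisions
    return [i / steps for i in range(steps + 1)]

def new_ids(divisions):
    """Ids that appear for the first time at this division level."""
    if divisions == 0:
        return checkpoint_ids(0)
    previous = set(checkpoint_ids(divisions - 1))
    return [i for i in checkpoint_ids(divisions) if i not in previous]
```

Because every id from one level is still present in the next, the data already learned for 0, 0.5 and 1 stays valid when 0.25 and 0.75 are added, and only the new ids need training.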

Training

At this point we have everything we need to start learning where the best paths are. The process is simple: the vehicle has to go from checkpoint to checkpoint across the different sectors, evaluating all the possible combinations and penalizing those paths that take the vehicle off the road. The key here is to penalize them instead of discarding them right away. Why? Because the vehicle almost never passes through the exact position of the checkpoints; they are just reference points for the turning calculation. So the position of the vehicle when a checkpoint is evaluated depends heavily on the previous checkpoints. For example:

badtrace

As you can see in the picture, the previous path of the vehicle makes it leave the road because it doesn’t have enough space to turn. To penalize the path I used the elapsed time as a weight: if the trace is not valid I increase the time currently stored, and if the lap is completed I store the time elapsed from the beginning of the lap until that checkpoint. The number of checkpoints passed is also stored in each checkpoint of the trace (it is only updated if the number is higher than the previous value).
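As a sketch of that update rule, under some assumptions: the record layout, the `update_trace` name and the fixed `penalty` amount are invented for the example (the post does not say by how much the stored time is increased).

```python
def update_trace(records, trace, lap_completed, penalty=5.0):
    """Update the stored weights for every checkpoint of one trace.

    records maps a checkpoint key to {"time": float, "passed": int}.
    trace is a list of (checkpoint_key, elapsed_since_lap_start) pairs in
    the order they were visited. penalty is a hypothetical fixed amount.
    """
    passed = len(trace)  # total checkpoints this trace went through
    for key, elapsed in trace:
        record = records.setdefault(key, {"time": 0.0, "passed": 0})
        if lap_completed:
            # Valid lap: store the time from lap start to this checkpoint.
            record["time"] = elapsed
        else:
            # Invalid trace: penalize instead of discarding.
            record["time"] += penalty
        # Only keep the best (highest) number of checkpoints passed.
        record["passed"] = max(record["passed"], passed)
    return records
```

A trace that leaves the road only makes its checkpoints look worse, so a later attempt that reaches the same checkpoints from a better angle can still bring their values back down.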

Let's look at a similar case:

goodtrace

The final combination of checkpoints is the same, but the earlier part of the trace is not. Now the vehicle has enough space to turn and continue the trace. That is why we should evaluate each combination more than once.

Of course, there is a limit, both for continuing to evaluate a checkpoint and for avoiding it. I had to configure two thresholds: one to consider that a checkpoint has enough information to be tagged as “learned” and another to tag a checkpoint as invalid. Therefore, we keep trying the same checkpoint in the division until we have learned it, and then continue with the next checkpoint in the same division.

What happens when all the checkpoints in the division are learned? I keep evaluating, choosing one checkpoint at random. If they are all good paths, their values will stay within the right range. And what happens if all the checkpoints in the division are invalid paths? This is highly probable, especially in the first divisions of the lap, so I reset their values to the midpoint of the range between the learned and invalid thresholds. If a checkpoint is really invalid, it will go above the threshold again right away; if not, it will correct its value slowly. As you will notice, the difference between the thresholds has to be large enough that we are not evaluating the same division all the time.
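The selection logic for one division could be sketched like this. I am assuming a single score per checkpoint where lower is better, with "learned" meaning below one threshold and "invalid" meaning above the other; the names and that interpretation are mine, and the real implementation may differ.

```python
import random

def next_checkpoint(scores, learned_below, invalid_above, rng=random):
    """Pick which checkpoint of a division to evaluate next.

    scores maps checkpoint id -> current score (assumed: lower is better).
    May reset scores in place when every checkpoint is invalid.
    """
    ids = sorted(scores)

    # Every checkpoint invalid: reset all of them to the midpoint between
    # the two thresholds and start evaluating them again.
    if all(scores[c] > invalid_above for c in ids):
        midpoint = (learned_below + invalid_above) / 2.0
        for c in ids:
            scores[c] = midpoint

    # Keep working on the first checkpoint that is neither learned nor invalid.
    pending = [c for c in ids if learned_below <= scores[c] <= invalid_above]
    if pending:
        return pending[0]

    # Nothing pending: keep evaluating a random learned checkpoint.
    # (Assumption: invalid checkpoints are skipped at this point.)
    learned = [c for c in ids if scores[c] < learned_below]
    return rng.choice(learned)
```

The gap between `learned_below` and `invalid_above` controls how many evaluations a reset checkpoint gets before it is classified again, which is why the two thresholds should not be too close together.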

Improvements

The major problems that I had to face were the table lookups, so I tried to optimize mainly the ones that were hot spots, but of course there are other parts that could be checked and improved. The game was running at 60 fps, so I stopped optimizing the AI and moved on to more necessary tasks.

Conclusion

This was my first approach to machine learning, and I know there are still some things that could be improved and done better, but in general I am very satisfied with the result. If you have any questions, advice or ideas, I would be very glad to hear them.

Thanks for reading.

 
