In my last post, I applied a reinforcement learning algorithm called Q-Learning to maximize velocity in simulated attempts at a human-powered land speed record. That approach was limited by:
- An incomplete physics model.
- A sub-optimal reward function.
- A restriction to a single driver.
Addressing these three limitations in the algorithm and simulation is the focus of this paper.
Abstract
In this paper, a reinforcement learning algorithm called Q-Learning is applied to maximize velocity and minimize driver fatigue in a human-powered vehicle attempting a land speed record. Q-Learning produces a set of Q-values over state-action pairs; acting greedily on those values yields a policy. Here, that policy is a function of power input over time for a given driver, vehicle, and environment combination, one that maximizes velocity subject to the driver's fatigue limit. This power-over-time function achieves higher speeds than alternative methods of constructing one. Several extensions of the Q-Learning algorithm are explored, along with a multi-driver vehicle simulation result.
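To make the setup concrete, here is a minimal sketch of tabular Q-Learning applied to a power-scheduling problem of this kind. The power levels, longitudinal dynamics, fatigue model, state discretization, and hyperparameters below are assumptions for illustration only, not the models used in the paper.

```python
# Minimal sketch of tabular Q-Learning for a power-scheduling problem.
# The dynamics, fatigue model, and state bins are assumed placeholders.
import random
from collections import defaultdict

ACTIONS = [100.0, 200.0, 300.0, 400.0]  # candidate power inputs in watts (assumed)
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # learning rate, discount, exploration rate
DT, MASS, DRAG = 1.0, 90.0, 0.25        # timestep (s), rider+vehicle mass (kg), lumped drag

Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def step(speed, fatigue, power):
    """Advance one timestep: toy longitudinal dynamics plus a toy fatigue model."""
    accel = (power / max(speed, 1.0) - DRAG * speed ** 2) / MASS
    speed = max(speed + accel * DT, 0.0)
    fatigue += power * DT / 1e4        # assumed fatigue accumulation with power output
    reward = speed * DT                 # reward distance covered this step
    done = fatigue >= 1.0               # episode ends at the fatigue limit
    return speed, fatigue, reward, done

def discretize(speed, fatigue):
    """Map continuous state to coarse bins (binning scheme is assumed)."""
    return (int(speed // 2), int(fatigue * 10))

def choose_action(state):
    """Epsilon-greedy action selection over the current Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(5000):
    speed, fatigue, done = 0.0, 0.0, False
    while not done:
        state = discretize(speed, fatigue)
        action = choose_action(state)
        speed, fatigue, reward, done = step(speed, fatigue, action)
        next_state = discretize(speed, fatigue)
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
        # Standard Q-Learning update toward the bootstrapped target
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

After training, reading off the greedy action in each state gives the power-over-time schedule the abstract refers to: starting from the initial state and repeatedly applying the highest-valued action traces out a power input trajectory for that driver, vehicle, and environment combination.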