Motion prediction for traffic scenarios with hybrid recurrent-convolutional neural networks

Λαγουτάρης, Βασίλειος
Self-driving is an important topic researched by both large automotive organizations and academia. The creation and deployment of self-driving vehicles is expected to dramatically reduce road accidents and improve the quality of life of millions of people. However, despite all the attention self-driving vehicles have received, they still have a long way to go before matching the performance of the best human drivers, because creating a self-driving vehicle is a very complex problem that consists of multiple subproblems. An essential prerequisite for self-driving to become a reality is the vehicle's ability to anticipate the behavior of other moving agents in its environment. If the self-driving vehicle can successfully predict the movement of the agents around it, it has an easier time generating the trajectory that it will itself follow, while maximizing safety for everyone. As in many scientific areas over the last few years, deep learning-based methods have come to dominate this field. These methods are based on neural networks, such as Recurrent Neural Networks (RNNs) or Convolutional Neural Networks (CNNs), among others. In this work, several approaches from each category are presented and compared with one another. The first approaches presented for the motion prediction problem are based on Recurrent Neural Networks, and more specifically on the encoder-decoder architecture, which has shown great results in time-series applications. In addition, a method from a recent paper based on Convolutional Neural Networks is presented, as CNNs are able to model spatial interactions in images, which often appear in self-driving applications. Finally, two variations of a novel hybrid method that combines Recurrent and Convolutional Neural Networks are presented.
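The encoder-decoder architecture mentioned above can be illustrated with a minimal PyTorch sketch. The class name, layer sizes, and history/horizon lengths below are illustrative assumptions, not the configuration actually used in the thesis: an LSTM encoder summarizes the observed (x, y) positions, and an LSTM decoder autoregressively emits future positions.

```python
# Illustrative sketch only; hyperparameters are assumptions, not the thesis's setup.
import torch
import torch.nn as nn


class TrajectoryEncoderDecoder(nn.Module):
    """Minimal seq2seq model: encode past (x, y) positions, decode future ones."""

    def __init__(self, hidden_size=64, num_future_steps=50):
        super().__init__()
        self.num_future_steps = num_future_steps
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.decoder = nn.LSTMCell(input_size=2, hidden_size=hidden_size)
        self.head = nn.Linear(hidden_size, 2)  # maps hidden state to an (x, y) point

    def forward(self, past):
        # past: (batch, T_past, 2) observed positions
        _, (h, c) = self.encoder(past)
        h, c = h.squeeze(0), c.squeeze(0)  # drop the num_layers dimension
        point = past[:, -1, :]             # start decoding from the last observed point
        outputs = []
        for _ in range(self.num_future_steps):
            h, c = self.decoder(point, (h, c))
            point = self.head(h)           # autoregressive: feed the prediction back in
            outputs.append(point)
        return torch.stack(outputs, dim=1)  # (batch, T_future, 2)


model = TrajectoryEncoderDecoder()
past = torch.randn(4, 10, 2)   # 4 agents, 10 observed timesteps each
future = model(past)           # predicted trajectories, shape (4, 50, 2)
```

In this sketch the decoder consumes its own previous output at every step; during training, teacher forcing (feeding the ground-truth point instead) is a common alternative.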
This method takes as input a Bird's Eye View image, a representation of the driving scene as it would be viewed from the sky. From this image a Convolutional Neural Network can model the spatial interactions between moving agents, and the features it extracts are used in the decoder of a sequence-to-sequence model to aid the prediction of the moving agents' future trajectories. All the examined models were trained and evaluated on the Lyft Level 5 Motion Prediction Dataset, which is the largest and most detailed publicly available self-driving dataset for motion prediction. Moreover, all methods were compared with the results achieved by researchers and professionals in the Kaggle competition "Lyft Motion Prediction for Autonomous Vehicles", which took place in late 2020.
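One plausible way to wire the hybrid model described above is to have a small CNN summarize the Bird's Eye View raster into a feature vector and concatenate that vector with the decoder input at every future timestep. The sketch below is a hedged illustration under that assumption; the thesis's actual network sizes, raster channels, and fusion scheme may differ.

```python
# Illustrative hybrid RNN-CNN sketch; architecture details are assumptions.
import torch
import torch.nn as nn


class HybridPredictor(nn.Module):
    """A CNN encodes the BEV raster; its features condition an LSTM decoder."""

    def __init__(self, hidden_size=64, feat_size=32, num_future_steps=50):
        super().__init__()
        self.num_future_steps = num_future_steps
        # Tiny CNN: BEV raster (B, 3, H, W) -> scene feature vector (B, feat_size)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_size),
        )
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        # Decoder input = previous (x, y) point concatenated with the scene features
        self.decoder = nn.LSTMCell(input_size=2 + feat_size, hidden_size=hidden_size)
        self.head = nn.Linear(hidden_size, 2)

    def forward(self, past, bev):
        # past: (B, T_past, 2) positions; bev: (B, 3, H, W) rasterized scene
        feat = self.cnn(bev)
        _, (h, c) = self.encoder(past)
        h, c = h.squeeze(0), c.squeeze(0)
        point = past[:, -1, :]
        outputs = []
        for _ in range(self.num_future_steps):
            # Scene context is injected at every decoding step
            h, c = self.decoder(torch.cat([point, feat], dim=-1), (h, c))
            point = self.head(h)
            outputs.append(point)
        return torch.stack(outputs, dim=1)  # (B, T_future, 2)


model = HybridPredictor()
past = torch.randn(2, 10, 2)        # 2 agents, 10 observed timesteps
bev = torch.randn(2, 3, 64, 64)     # matching BEV rasters
out = model(past, bev)              # shape (2, 50, 2)
```

Concatenating the CNN features at each decoder step is one simple fusion choice; initializing the decoder's hidden state from the image features is another common variant.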
Self-driving, Motion prediction, Neural networks