
Waymo Motion Forecasting

Repository: WaymoMotionEstimator
Technical Report: ConvMLP Paper

Project Overview

This project implements trajectory prediction models on the Waymo Open Motion Dataset (WOMD). Given 1 second of past agent motion (10 timesteps at 10 Hz), the models forecast 8 seconds of future positions (80 timesteps). Both models were trained on 250 TFRecord files (~25% of the WOMD training partition) using a GPU-accelerated Google Compute Engine VM, streaming data directly from Google Cloud Storage.
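The input/output contract above can be sketched in a few lines of NumPy. This is only an illustration: the helper name `split_track` and the assumption that a track arrives as a single array of 90 (x, y) positions are hypothetical and not taken from the repository, which reads TFRecord files.

```python
import numpy as np

PAST_STEPS = 10     # 1 s of history at 10 Hz
FUTURE_STEPS = 80   # 8 s of future at 10 Hz


def split_track(track_xy):
    """Split a (PAST_STEPS + FUTURE_STEPS, 2) array of x/y positions.

    Hypothetical helper: returns (past, future) with shapes (10, 2) and (80, 2).
    """
    track_xy = np.asarray(track_xy, dtype=np.float32)
    expected = (PAST_STEPS + FUTURE_STEPS, 2)
    if track_xy.shape != expected:
        raise ValueError(f"expected shape {expected}, got {track_xy.shape}")
    past = track_xy[:PAST_STEPS]      # model input
    future = track_xy[PAST_STEPS:]    # prediction target
    return past, future


# Synthetic agent moving along x at a constant 10 m/s (1 m per 0.1 s step).
t = np.arange(PAST_STEPS + FUTURE_STEPS, dtype=np.float32)
track = np.stack([t, np.zeros_like(t)], axis=1)
past, future = split_track(track)
print(past.shape, future.shape)  # (10, 2) (80, 2)
```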

Two architectures are implemented: a single-mode baseline (ConvMLP) and a multi-modal extension (MultiModalConvMLP) that predicts 6 possible future trajectories with confidence scores. Both operate on individual agent tracks without map or scene context.

Results

| Model | Avg Loss | Avg ADE (m) | Avg FDE (m) |
|---|---|---|---|
| ConvMLP | 434.36 | 14.25 | 31.83 |
| MultiModalConvMLP (best-of-6) | 87.35 | 3.85 | 10.32 |

The multi-modal model reduces ADE by 73% and FDE by 68% relative to the baseline. The multi-modal figures are best-of-6: the mode closest to the ground truth is selected for scoring, so they measure how well the six modes cover the true future rather than the accuracy of a single committed prediction. The gap is consistent with real-world motion being inherently multi-modal: a single predicted trajectory cannot capture the range of plausible futures (turning, lane changing, stopping), whereas six modes can.
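For reference, the metrics can be sketched as follows. This is a minimal NumPy version under stated assumptions: ADE is the mean Euclidean distance over all 80 timesteps, FDE is the distance at the final timestep, and the best mode is chosen by lowest ADE (the repository may select per metric instead). The function names are illustrative. The last lines recompute the reported reductions from the table values.

```python
import numpy as np


def ade(pred, gt):
    """Average displacement error: mean Euclidean distance over all timesteps."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def fde(pred, gt):
    """Final displacement error: Euclidean distance at the last timestep."""
    return float(np.linalg.norm(pred[-1] - gt[-1]))


def best_of_k(preds, gt):
    """Return (ADE, FDE, index) of the mode with the lowest ADE.

    preds: (K, T, 2) candidate trajectories; gt: (T, 2) ground truth.
    """
    per_mode_ade = np.linalg.norm(preds - gt[None], axis=-1).mean(axis=1)
    k = int(np.argmin(per_mode_ade))
    return float(per_mode_ade[k]), fde(preds[k], gt), k


# Toy example: ground truth is a straight line; each mode is offset laterally.
gt = np.stack([np.arange(1.0, 81.0), np.zeros(80)], axis=1)
offsets = np.array([3.0, 2.0, 1.0, 0.5, 4.0, 5.0])
preds = np.stack([gt + np.array([0.0, d]) for d in offsets])
best_ade, best_fde, best_idx = best_of_k(preds, gt)
print(best_idx, best_ade, best_fde)

# Relative reductions recomputed from the results table.
ade_reduction = 1.0 - 3.85 / 14.25
fde_reduction = 1.0 - 10.32 / 31.83
print(f"{ade_reduction:.0%} {fde_reduction:.0%}")  # 73% 68%
```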

Single-Mode Baseline:

[Figure: Trajectory comparison]

The baseline produces straight-line extrapolations that diverge from the ground truth when the agent turns or brakes.

Multi-Modal Predictions:

[Figures: Multimodal Predictions (two images)]

Six trajectory modes spread to cover distinct plausible futures, with the confidence head distributing probability across relevant modes.

My Contributions

Architecture Details

ConvMLP (Baseline): Two 1D causal convolution layers (64 filters, kernel size 3) encode the past trajectory, followed by a flatten layer and MLP decoder (128-unit hidden layer → 160-unit output reshaped to 80×2). Produces a single deterministic future trajectory.
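To make the layer shapes concrete, here is a NumPy forward pass with random weights. It is a shape walkthrough, not the trained model: the two input channels (x, y), the ReLU activations, and the weight initialisation are assumptions, and the repository itself presumably builds these layers with a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)


def causal_conv1d(x, w, b):
    """Causal 1D convolution: output[t] depends only on x[t-K+1 .. t].

    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,). Returns (T, C_out).
    """
    k = w.shape[0]
    x_pad = np.pad(x, ((k - 1, 0), (0, 0)))  # zero-pad the past side only
    out = np.stack([
        np.einsum("kc,kco->o", x_pad[t:t + k], w) for t in range(x.shape[0])
    ])
    return out + b


def relu(x):
    return np.maximum(x, 0.0)


def init(*shape):
    """Small random weights, for illustration only."""
    return rng.normal(scale=0.1, size=shape)


params = {
    "conv1": (init(3, 2, 64), np.zeros(64)),        # kernel size 3, 64 filters
    "conv2": (init(3, 64, 64), np.zeros(64)),
    "hidden": (init(10 * 64, 128), np.zeros(128)),  # flattened 10 x 64 -> 128
    "out": (init(128, 160), np.zeros(160)),         # 160 = 80 timesteps x 2
}


def conv_mlp_forward(past, params):
    """past: (10, 2) -> predicted future: (80, 2)."""
    h = relu(causal_conv1d(past, *params["conv1"]))           # (10, 64)
    h = relu(causal_conv1d(h, *params["conv2"]))              # (10, 64)
    h = h.reshape(-1)                                         # (640,)
    h = relu(h @ params["hidden"][0] + params["hidden"][1])   # (128,)
    out = h @ params["out"][0] + params["out"][1]             # (160,)
    return out.reshape(80, 2)


past = rng.normal(size=(10, 2))
future_pred = conv_mlp_forward(past, params)
print(future_pred.shape)  # (80, 2)
```

The left-only padding is what makes the convolution causal: changing the last observed timestep cannot affect the encoder output at any earlier timestep.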

MultiModalConvMLP: Uses the same convolutional encoder as the baseline, followed by a 128-unit shared feature layer. Six separate Dense heads each predict an independent (80, 2) trajectory. A softmax confidence head predicts a probability distribution over the six modes. Trained with winner-takes-all: only the mode closest to the ground truth receives a gradient, which encourages the heads to specialize.
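A winner-takes-all objective can be sketched as below. The details are assumptions rather than the repository's exact loss: closeness is measured by mean squared error, and the confidence head is trained with cross-entropy using the winning mode as the label. The gradient of the regression term is written out by hand to show that only the winning head receives one.

```python
import numpy as np


def winner_takes_all(preds, conf_logits, gt):
    """preds: (K, T, 2), conf_logits: (K,), gt: (T, 2).

    Returns (loss, winner index, gradient of the regression term w.r.t. preds).
    """
    err = preds - gt[None]                        # (K, T, 2)
    per_mode_mse = (err ** 2).mean(axis=(1, 2))   # (K,)
    winner = int(np.argmin(per_mode_mse))

    # Regression term: only the winning mode contributes.
    reg_loss = per_mode_mse[winner]

    # Classification term (assumption): cross-entropy with the winner as label.
    z = conf_logits - conf_logits.max()           # stabilised log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    cls_loss = -log_probs[winner]

    # Gradient of reg_loss w.r.t. preds: zero for every non-winning mode.
    grad = np.zeros_like(preds)
    grad[winner] = 2.0 * err[winner] / err[winner].size
    return float(reg_loss + cls_loss), winner, grad


# Toy example: ground truth at the origin, six modes at constant offsets.
gt = np.zeros((80, 2))
offsets = np.array([3.0, 1.0, 2.0, 4.0, 5.0, 6.0])
preds = np.ones((6, 80, 2)) * offsets[:, None, None]
loss, winner, grad = winner_takes_all(preds, np.zeros(6), gt)
modes_with_gradient = int(grad.any(axis=(1, 2)).sum())
print(winner, modes_with_gradient)  # 1 1
```

Because the non-winning heads receive no regression gradient for a given sample, each head is pushed only towards the futures it already predicts best, which is the specialization effect described above.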

Tech Stack

Future Work


Link: GitHub Repository