We are looking for a hardworking intern with expertise in Reinforcement Learning and Multi-Modal World Simulation Models to help drive the evolution of ML-centric autonomous driving and Physical AI solutions. The role focuses on model-based RL, learning world simulation models, and translating state-of-the-art (SOTA) algorithms into real-world applications that allow vehicles to interpret, anticipate, and respond intelligently in challenging, dynamic environments. This is a rare opportunity to shape the next frontier of intelligent driving, where imagination meets real-world impact. If you’re excited by the idea of building SOTA simulation technology and systems that learn, adapt, and truly “think,” we’d love to have you on board. Join a team where your contributions play a crucial role in accelerating the development of autonomous vehicles with state-of-the-art solutions.
What you'll be doing:
Develop and refine multi-modal world models and integrate them into our simulation system.
Train and evaluate self-supervised latent dynamics and sensor generation models for the joint tasks of trajectory prediction, goal-conditioned ego control, and sensor data synthesis.
Explore and prototype hybrid architectures that combine world models, generative models (e.g., diffusion, flow matching), and policy gradients for realistic and robust simulation.
Collaborate with End-to-End Driving Model teams to deploy world-model-based policies in simulated RL environments and accelerate training of the driving systems.
Contribute to system development for continuous learning and simulation adaptation (Sim2Real transfer).
What we need to see:
Pursuing a PhD in Computer Science, Machine Learning, or a related field, with a background in neural rendering, robotics, or simulation.
Strong understanding of reinforcement learning (policy gradients, actor-critic, offline RL).
Familiarity with visual representation learning (contrastive, masked, or generative modeling) and 4D scene representations (NeRF, Gaussian Splatting, occupancy networks) for world simulation.
Experience building large-scale training pipelines with temporal consistency and simulation data replay.
Publications or open-source contributions in RL, model-based control, or autonomous systems.
Passion for developing learning systems that can “imagine” and plan in the real world.