On June 4, at the GTC conference in Taipei, NVIDIA released Cosmos 3. Jensen Huang put it this way: “Thanks to breakthroughs in multimodal reasoning, language, vision, and world models, the era of physical AI explosion is approaching.”
Translation: AI is finally leaving the chat window and entering the real world.
Cosmos 3 is a fully open-source physical AI foundation model — the first of its kind. It can understand and generate text, images, video, environmental audio, and action sequences. It can simulate physics. It can predict what happens next in a scene. And it can tell a robot how to move its arm. All of it, all at once.

How It Works
The architecture is what makes this interesting. Cosmos 3 uses a dual-transformer design.
The reasoning transformer analyzes object interaction logic, motion paths, and spatiotemporal dynamics. In plain English, it figures out how the physical world works. The generation transformer uses those insights to produce accurate video frames and action trajectories.
The key insight: understand physics first, then act. Most models try to skip straight to acting. That is why they fail in the real world. Cosmos 3 takes a different approach. Understand the rules. Then follow them.
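To make the "understand first, then act" flow concrete, here is a toy sketch in Python. It is not NVIDIA's architecture or API: the names `SceneState`, `reason`, and `generate` are hypothetical, and simple finite-difference formulas stand in for the two transformers. It only illustrates the data flow, where one stage infers the dynamics of a scene and a second stage uses that inferred state to predict what happens next.

```python
from dataclasses import dataclass


@dataclass
class SceneState:
    """What the toy 'reasoning' stage infers about the scene (hypothetical type)."""
    position: float      # last observed height, in metres
    velocity: float      # estimated velocity over the last interval, in m/s
    acceleration: float  # estimated acceleration, in m/s^2


def reason(observations: list[float], dt: float) -> SceneState:
    """Stage 1: infer the dynamics from the last three observed positions."""
    if len(observations) < 3:
        raise ValueError("need at least three observations")
    p0, p1, p2 = observations[-3:]
    velocity = (p2 - p1) / dt                  # backward difference
    acceleration = (p2 - 2 * p1 + p0) / dt**2  # second difference
    return SceneState(position=p2, velocity=velocity, acceleration=acceleration)


def generate(state: SceneState, steps: int, dt: float) -> list[float]:
    """Stage 2: roll the inferred dynamics forward to predict future positions."""
    frames = []
    pos, vel = state.position, state.velocity
    for _ in range(steps):
        vel += state.acceleration * dt  # update velocity from inferred acceleration
        pos += vel * dt                 # advance position with the updated velocity
        frames.append(pos)
    return frames


# A falling object observed at three time steps (height follows 20 - t^2).
state = reason([20.0, 19.0, 16.0], dt=1.0)
future = generate(state, steps=2, dt=1.0)
print(state, future)
```

The generation stage never sees the raw observations. It works only from the state the reasoning stage inferred, which is the separation the dual-transformer design describes.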
NVIDIA positions Cosmos 3 as three things in one: a vision-language model for cross-modal understanding and reasoning, a world model and video foundation model for simulating physical environments and predicting future states, and a world action model backbone for training robots to perform specific tasks. One model. Three jobs.
The Product Lineup
Cosmos 3 comes in three flavors.
| Model | Target Use Case | Status |
|---|---|---|
| Cosmos 3 Super | High-precision compute (robotics and AV post-training) | Released |
| Cosmos 3 Nano | Resource-constrained devices, high-quality video and action reasoning | Released |
| Cosmos 3 Edge | Real-time edge inference | Coming soon |
Super handles the heavy lifting. Nano runs on smaller devices. Edge goes directly into robots and cars. There is a Cosmos 3 for every use case, from data center to dashboard.
Benchmarks
In head-to-head comparisons, Cosmos 3 ranks first among open models across multiple physical AI benchmarks: Artificial Analysis, Physics-IQ, PAI-Bench, and R-Bench for world generation accuracy; RoboLab and RoboArena for action policy; and VANTAGE-Bench and TAR for visual understanding.
First place across the board. The open-source physical AI leaderboard now has a clear winner. For now.
The Cosmos Coalition
NVIDIA is not going it alone. The company launched the Cosmos Coalition, a global developer collaboration alliance. Members include Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI.
The stated goal: advance next-generation world models and accelerate physical AI innovation and interoperability. The unstated goal: build an ecosystem so no one has to start from scratch. NVIDIA wants to be the Windows of physical AI. Open, standardized, everywhere.
Who Is Already Using It
Cosmos 3 is already in production across physical AI sectors.
| Sector | Partners |
|---|---|
| Robotics | Agile Robots, Doosan Robotics, LG Electronics, Samsung, Skild AI |
| Autonomous Vehicles | Li Auto |
| Vision AI | Centific, Fogsphere, Linker Vision, Milestone Systems, Yuan |
Li Auto, the Chinese EV maker, is using Cosmos 3 for autonomous driving development. Samsung is using it for robotics. The list is growing. And it will keep growing.
How to Get It
You can try Cosmos 3 today on build.nvidia.com. Download the open models from Hugging Face. Access resources and customization tools through GitHub. Deploy it as an NVIDIA NIM microservice. It is open, accessible, and ready to use.
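For the Hugging Face route, a minimal sketch might look like the following. It assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function. The helper names `download_model` and `local_path_for` are hypothetical, and the repo id is deliberately left as a parameter: take the exact id from the model card rather than guessing it.

```python
from pathlib import Path


def local_path_for(repo_id: str, target_dir: str = "models") -> Path:
    """Map an 'org/name' repo id to target_dir/name, rejecting malformed ids."""
    org, sep, name = repo_id.partition("/")
    if not (org and sep and name) or "/" in name:
        raise ValueError(f"expected 'org/name', got {repo_id!r}")
    return Path(target_dir) / name


def download_model(repo_id: str, target_dir: str = "models") -> Path:
    """Download every file of a Hugging Face model repo into target_dir/name."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir = local_path_for(repo_id, target_dir)
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `download_model` with the repo id copied from the model card fetches the files and returns the local folder. Gated models may also require logging in with a Hugging Face access token first.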
NVIDIA could have locked this down. They chose not to. That is a strategic decision worth paying attention to.
Why This Matters
Physical AI will benefit three industries first.
Autonomous driving gets world models, reinforcement learning, and end-to-end algorithms that accelerate deployment. Embodied intelligence gets robot simulation training, edge inference, and motion control. Industrial software gets CAE simulation, digital twins, industrial control, and energy scheduling.
ZD Net analyst commentary suggests autonomous driving will likely be the first to achieve both the “data loop” and the “commercial loop” for physical AI. Self-driving taxis, passenger vehicles, autonomous trucks: that is where the money is. Robots follow close behind.
This marks NVIDIA’s shift from AI training infrastructure to an AI deployment platform. GPUs used to be the shovel. Now NVIDIA is selling the shovel, the mine, and the logistics network. Huang said at GTC: “We are moving from just talking about AI to showing how AI acts.” That is not marketing. That is a product roadmap.

The Open Source Question
NVIDIA chose to open source Cosmos 3. That is not a small decision. The company could have kept it closed. It could have charged for access. It did not.
This suggests NVIDIA wants to be the standard, not just the supplier. Windows beat Mac because it ran everywhere. Android beat iOS because it was open. Cosmos 3 is NVIDIA’s bet that open wins again.
Whoever controls the foundation model for physical AI controls the next generation of robotics. NVIDIA just made its play. And they made it open.
P.S. Cosmos 3 is not just a model. NVIDIA released a model, a dataset, a developer coalition, a cloud service, and an edge deployment option all at once. That is not a product launch. That is an operating system announcement for the physical world. Pay attention.