Georeferenced synthetic data for VLM / VLAM
and End2End model training in Physical AI
Generate world-accurate digital twins at scale and
bridge the gap to reality with the AVES COSMOS Control UI
Eliminate Your Synthetic Data Generation Bottlenecks
Why simulator-only data still falls behind the real world
Simulators provide perfect ground truth and controllable scenarios. But relying on raw simulator output alone introduces three blockers at deployment:
A workflow that keeps simulator control without realism limits
The AVES COSMOS Control UI combines georeferenced worlds with photorealistic visuals to close the sim-to-real gap and increase scenario diversity.
Domain Gap
Even with perfect labels, simulated imagery often differs from the real world—creating a domain gap that limits model performance in real-world deployment.
Close the domain gap
Photorealistic variations that mimic real-world conditions such as weather, glare, and sensor artifacts.
Scalability constraints
Creating diverse simulation scenarios requires significant manual effort and compute, making large-scale coverage difficult to achieve.
Scale diversity efficiently
Reproducible, parameterized scenario generation instead of manual single scenarios.
Limited visual realism
Traditional simulator renders often lack the photorealism needed for reliable real-world model performance – resulting in intensive post-processing and domain adaptation.
Deployment-ready datasets
Synchronized video, labels, calibration and metadata in a consistent schema.
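For illustration, here is a minimal sketch of what such a consistent per-clip schema could look like. The field names below are assumptions for demonstration, not the actual AVES delivery format.

```python
# Illustrative per-clip delivery record; field names are assumptions,
# not the actual AVES export schema.
from dataclasses import dataclass, field

@dataclass
class CameraCalibration:
    intrinsics: list[list[float]]   # 3x3 camera matrix K
    extrinsics: list[list[float]]   # 4x4 world-to-camera transform
    distortion: list[float]         # lens distortion coefficients

@dataclass
class FrameLabels:
    frame_index: int
    boxes_2d: list[dict]            # per-object 2D bounding boxes
    boxes_3d: list[dict]            # per-object 3D bounding boxes
    semantic_mask: str              # path to the segmentation mask

@dataclass
class ClipRecord:
    video: str                      # photorealistic augmented video
    calibration: CameraCalibration
    labels: list[FrameLabels]       # synchronized per-frame annotations
    metadata: dict = field(default_factory=dict)  # weather, region, ODD, seed
```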


Georeferenced Sim-to-Real, Finally Available at Scale
The AVES COSMOS Control UI brings real-world maps into controllable simulation workflows and integrates the latest NVIDIA Cosmos Transfer models to turn simulator output into photorealistic synthetic video data and supercharge your dataset diversity.
User Specifications
Region, ODD, Cameras, Scenario mix
AVES World Generation
HD map + 3D world geometry + semantic labels
CARLA Simulation
Controlled scenarios and ground truth
NVIDIA Cosmos Transfer
Photorealistic output: Rain, Glare, Seasonal, Lens Effects
Massive Dataset Delivery
Videos, per-frame labels, Metadata
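To make the flow concrete, the sketch below shows how these five stages could be driven from a single parameterized specification. The keys and stage comments are illustrative assumptions, not the actual Control UI schema.

```python
# Hypothetical user specification driving the pipeline end to end;
# keys are illustrative, not the actual AVES COSMOS Control UI schema.
spec = {
    "region": {"lat": 48.137, "lon": 11.575, "radius_m": 500},  # example geo-location
    "odd": "urban_intersection",                                # operational design domain
    "cameras": [{"mount": "roadside", "height_m": 6.0, "fov_deg": 90}],
    "scenario_mix": {"nominal": 0.7, "jaywalking": 0.2, "occlusion": 0.1},
    "augmentations": ["rain", "night", "lens_flare"],           # Cosmos Transfer variants
    "clips": 1000,
}

# Stage order as above, each stage consuming the previous output:
# 1) AVES world generation  -> HD map + 3D geometry + semantic labels
# 2) CARLA simulation       -> controlled scenarios + ground truth
# 3) NVIDIA Cosmos Transfer -> photorealistic variants per augmentation
# 4) Dataset delivery       -> synchronized videos, per-frame labels, metadata
```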
Powering Use Cases at the Forefront of Physical AI
VLM / VLAM Training
Synthetic data enables scalable training of Vision Language Models (VLMs) and Vision Language Action Models (VLAMs) by generating diverse, controllable, and richly annotated environments that are difficult to capture in the real world.
Scenario-driven video datasets allow models to learn complex spatial, temporal, and semantic relationships across dynamic scenes, enabling robust perception, reasoning, and decision-making before real-world deployment.
End2End AD Stacks
Synthetic training data enables scalable development of end-to-end autonomous driving stacks by generating diverse, controllable driving scenarios that are difficult or unsafe to capture in the real world.
Richly annotated environments spanning weather conditions, traffic patterns, and rare edge cases allow models to learn robust driving behavior from large-scale experience, accelerating training for safe operation in complex real-world environments.
Smart City Analytics
Synthetic data enables scalable training of VLMs for smart city analytics by generating diverse, privacy-safe video scenarios across urban environments.
Simulated traffic situations, pedestrian behavior, and rare events allow models to interpret, reason about, and summarize CCTV footage more effectively. This enables cities and operators to deploy AI systems for improved traffic monitoring, safety analysis, and urban operations faster.
Experience Synthetic Video Data Generation with the AVES COSMOS Control UI
Seen the demo? Now let's explore how it works for your use case!
Want to Build Your Own Workflow? Get Started Now!
Frequently Asked Questions
Why not train directly on raw simulator output?
Even renders from high-quality 3D simulator environments still look synthetic. That gap in texture, lighting, weather, and sensor appearance can reduce performance when models trained on simulator data are deployed in real environments.
What kinds of variation can be generated?
The workflow supports variations across lighting, weather, and road-surface conditions. Currently there are 18 different preset augmentation types, including sunrise, night, fog, rain, snow, dry roads, and puddles. Additional augmentation types can be prompted.
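As a sketch of how that fan-out could be scripted: the preset names below come from the list above, while run_cosmos_transfer is a hypothetical stand-in for the actual augmentation call.

```python
# Fanning one simulated clip out across augmentation presets.
# Preset names are from the FAQ answer above; run_cosmos_transfer
# is a hypothetical placeholder, not the real API.
PRESETS = ["sunrise", "night", "fog", "rain", "snow", "dry_road", "puddles"]

def run_cosmos_transfer(clip_path: str, condition: str) -> str:
    """Placeholder for the augmentation step with a given condition."""
    out_path = clip_path.replace(".mp4", f"_{condition}.mp4")
    print(f"augmenting {clip_path} -> {out_path}")
    return out_path

variants = [run_cosmos_transfer("clip_0001.mp4", p) for p in PRESETS]
# Additional augmentation types can be free-form prompts:
variants.append(run_cosmos_transfer("clip_0001.mp4", "heavy hail at dusk"))
```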
How is ground truth preserved if the visuals change?
Ground truth is generated in simulation first, then augmented visually afterward. The workflow is designed to preserve scene structure, anomaly behavior, and annotations while changing appearance, which is exactly why the pipeline is built for synthetic training data rather than just visual content generation.
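A minimal sketch of that split, assuming a simulation output folder with label files alongside the rendered video: only the RGB stream is replaced by the augmented version, while geometry-derived ground truth passes through untouched. Paths and file names are illustrative.

```python
# Sketch of why labels survive augmentation: ground truth is written once
# in simulation, and only the RGB appearance is restyled afterward.
# Paths and file names are illustrative assumptions.
import pathlib
import shutil

def package_clip(sim_dir: str, augmented_rgb: str, out_dir: str) -> None:
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # The restyled video replaces the raw simulator render ...
    shutil.copy(augmented_rgb, out / "video.mp4")
    # ... while the simulation-time annotations are copied through untouched.
    for name in ("boxes_3d.json", "calibration.json", "semantic_masks"):
        src = pathlib.Path(sim_dir) / name
        if src.is_dir():
            shutil.copytree(src, out / name, dirs_exist_ok=True)
        else:
            shutil.copy(src, out / name)
```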
Why is georeferencing important?
Because it ties synthetic data to real deployment conditions. AVES generates simulation-ready digital twins from satellite and aerial data so models can be trained on worlds that reflect actual roads, junctions, and camera viewpoints, not just abstract test environments.
How does the workflow work?
The workflow starts with user requirements such as region, ODD, geo-location, and camera position. AVES Reality generates the virtual environment as a digital twin, CARLA simulates the scenario and produces ground-truth outputs, and NVIDIA Cosmos Transfer turns that base simulation into photorealistic variants for scalable dataset generation.
Can customers define their own location and camera setup?
Yes. The AVES Launcher is integrated into the AVES COSMOS Control UI, so you can start from user-defined specifications such as region, ODD, and geo-location. Camera positions are then selected in the subsequent CARLA simulation step.
What comes out of CARLA before augmentation?
CARLA produces the controllable simulation layer and ground truth videos. These include RGB, depth, semantic and instance segmentation, edge maps, normals, 2D and 3D bounding-box overlays, and object or collision metadata.
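For reference, attaching those camera modalities in CARLA's Python API looks roughly like the sketch below. It assumes a running CARLA server and an already-spawned vehicle; edge maps, normals, and bounding-box overlays are typically derived in post-processing from these raw outputs and the object metadata.

```python
# Attaching synchronized camera modalities to a vehicle in CARLA.
# Assumes a running CARLA server and an already-spawned vehicle.
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()
blueprints = world.get_blueprint_library()
vehicle = world.get_actors().filter("vehicle.*")[0]

modalities = [
    "sensor.camera.rgb",
    "sensor.camera.depth",
    "sensor.camera.semantic_segmentation",
    "sensor.camera.instance_segmentation",
]

mount = carla.Transform(carla.Location(x=1.5, z=2.4))  # roof-mounted camera pose
for name in modalities:
    bp = blueprints.find(name)
    bp.set_attribute("image_size_x", "1920")
    bp.set_attribute("image_size_y", "1080")
    sensor = world.spawn_actor(bp, mount, attach_to=vehicle)
    # Each sensor streams frames to disk, keyed by modality and frame number.
    sensor.listen(lambda image, n=name: image.save_to_disk(f"out/{n}/{image.frame:06d}.png"))
```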
What does NVIDIA Cosmos Transfer add?
NVIDIA Cosmos Transfer converts synthetic simulator outputs into photorealistic, diverse datasets. Its role is to expand one scenario into many realistic visual conditions while preserving the ground truth, structure, and annotations of the original simulation.
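Conceptually, the transfer is conditioned on the simulator's structural outputs while a prompt steers appearance. The layout below illustrates that idea only; it is not the actual Cosmos Transfer configuration format.

```python
# Illustrative (not actual) control-conditioned transfer job: structure
# comes from simulator outputs, appearance from the prompt and seed.
transfer_job = {
    "input_video": "sim/clip_0001_rgb.mp4",
    "controls": {                    # structural signals exported from CARLA
        "depth": "sim/clip_0001_depth.mp4",
        "segmentation": "sim/clip_0001_seg.mp4",
        "edges": "sim/clip_0001_edges.mp4",
    },
    "prompt": "overcast winter morning, wet asphalt, light snowfall",
    "seed": 42,                      # fixed seed keeps the variant reproducible
}
```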
What does AVES Reality contribute to the Synthetic Data Generation workflow?
AVES Reality provides the georeferenced base layer: HD roadmaps, 3D world geometry, and semantic structure generated from geodata. That makes the pipeline location-specific from the start, rather than relying on generic synthetic scenes.
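If the georeferenced export includes an OpenDRIVE (.xodr) HD map, which is an assumption about the export format rather than a documented guarantee, it can be loaded directly into CARLA so scenarios run on geometry that matches the real location:

```python
# Loading a georeferenced OpenDRIVE HD map into CARLA; the .xodr export
# path is an illustrative assumption about the AVES output format.
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(30.0)

with open("aves_export/junction.xodr") as f:
    xodr = f.read()

# Builds a drivable world from the real road network geometry.
world = client.generate_opendrive_world(
    xodr,
    carla.OpendriveGenerationParameters(
        vertex_distance=2.0,     # road mesh resolution in meters
        wall_height=0.0,         # no procedural boundary walls
        additional_width=0.6,    # extra shoulder width in meters
        smooth_junctions=True,
    ),
)
```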
What problem does AVES Synthetic Data Generation solve?
Simulator-only data gives you clean labels and full control, but it often falls short in deployment because of three core issues: domain gap, limited visual realism, and the cost of scaling scenario diversity. The AVES SDG workflow, including the AVES COSMOS Control UI, closes that sim-to-real gap while scaling scenario diversity and boosting photorealism.
How is the AVES COSMOS Control UI deployed and accessible?
The AVES COSMOS Control UI is currently available as a developer release and may involve toolchain setup depending on your target environment. As deployment can include simulator integration and environment-specific configuration, we currently offer access through dedicated solution discussions. Contact us to explore the right setup for your workflow and infrastructure.
Didn't find an answer to your question?