Week 7: Sim2Real Pipelines and IsaacLab

Purpose of this lecture

Reinforcement learning for robotics almost never happens directly on physical hardware at scale. The sample efficiency constraints examined in Week 6 — thousands of hours of robot time for millions of transitions — make direct hardware training impractical for all but the simplest tasks. The practical answer is simulation: train in a virtual environment that approximates the physics of the real world, then transfer the learned policy to hardware.

This is the sim2real problem — not a single technique but a full engineering pipeline spanning physics modeling, simulation infrastructure, domain randomization, and system identification. Getting it right determines whether the policy works on the real robot. Getting it wrong produces policies that are expert simulators and useless manipulators. This lecture examines the problem from both ends: what properties of simulation make transfer possible, and what failure modes make it fail.

Why simulation is indispensable

The asymmetry between simulation and hardware is stark. A modern GPU-accelerated simulator running on a cluster can execute tens of thousands of parallel environment instances, collecting millions of transitions per minute. The same experiment on a single physical robot would require years. Simulation enables the data volumes that modern RL algorithms require while providing safe, free-reset, repeatable environments where exploration cannot damage hardware or injure operators.

But simulation is an approximation. Every simulated environment encodes assumptions about rigid-body dynamics, friction models, actuator behavior, sensor noise, and contact resolution, and these assumptions diverge from physical reality in ways that the simulator designer may not anticipate or control. A policy trained in a simulator that uses a simple Coulomb friction model will encounter stiction, viscoelasticity, and asymmetric friction on the physical robot, and its behavior can degrade or fail outright. The gap between simulated and real dynamics is the fundamental obstacle that all sim2real methods attempt to address.

The Isaac Sim and IsaacLab stack

NVIDIA's Isaac Sim is a high-fidelity robotics simulator built on the Omniverse USD (Universal Scene Description) platform, using PhysX as its physics engine for rigid-body and articulated dynamics and Hydra for photorealistic rendering. The key features for robot learning are GPU acceleration of both physics simulation and rendering, articulated robot support with configurable joint models, and tight integration with ROS 2 and standard robot learning frameworks.

The physics engine in Isaac Sim, PhysX, resolves rigid-body contacts with an iterative constraint solver (projected or temporal Gauss-Seidel) and simulates articulated bodies in reduced coordinates with a Featherstone-style recursive dynamics algorithm. GPU parallelism is achieved by batching many simulation instances into shared tensor computations: joint positions, velocities, and applied torques for $N$ parallel environments are represented as $[N, n_{\text{dof}}]$ tensors and updated together in batched GPU kernels. At high parallelization factors ($N \geq 1000$), the per-environment marginal cost becomes small as long as the GPU is not saturated, and the resulting data collection rates make billion-step training feasible.

IsaacLab (formerly Isaac Orbit, and the successor to the standalone Isaac Gym preview) is the robot learning framework built on top of Isaac Sim. It provides task templates and environment interfaces that follow the Gymnasium (formerly OpenAI Gym) API, built-in domain randomization utilities for physics and appearance parameters, a manager-based architecture that separates task logic, observation construction, and reward computation for modularity, and integration with standard RL libraries (RSL-RL, RL-Games, Stable-Baselines3).

The central technical mechanism enabling IsaacLab's parallelism is tensor-based vectorized computation. All $N$ parallel environment instances share a single physics simulation, with their states represented as batched tensors. Joint positions, velocities, and forces for all environments are stored in $[N, n_{\text{dof}}]$ tensors that PhysX updates together on the GPU. Observation construction is performed by slicing and transforming these state tensors entirely on the GPU, so no data is copied to the CPU between physics steps and policy inference. Reward computation is similarly vectorized: the reward function is evaluated as a batched operation over the state and action tensors, whose leading dimension is $N$, and returns a vector of $N$ scalar rewards. Episode termination conditions (joint limit violations, task completion, maximum episode length) are evaluated in parallel as boolean masks of shape $[N]$, and terminated environments are reset by writing new initial conditions into the corresponding tensor rows without interrupting the remaining environments. As a result, the marginal cost of adding one more parallel environment is small as long as the GPU has remaining capacity, which enables training at $N = 4096$ or more parallel instances on a single high-end GPU.
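The batched pattern described above can be illustrated with a minimal sketch. It uses NumPy on the CPU and toy double-integrator dynamics rather than PhysX or the IsaacLab API, and every name in it (`step`, `JOINT_LIMIT`, `q_target`, the sizes) is a hypothetical choice made for illustration. What it shows is the shape discipline: state lives in `[N, n_dof]` arrays, rewards and termination flags have shape `[N]`, and resets touch only the rows selected by the mask.

```python
import numpy as np

rng = np.random.default_rng(0)
N, N_DOF, MAX_STEPS = 4096, 7, 200            # illustrative sizes (assumed)
JOINT_LIMIT = 2.0                             # rad, hypothetical symmetric limit

q = rng.uniform(-0.5, 0.5, size=(N, N_DOF))   # joint positions, [N, n_dof]
qd = np.zeros((N, N_DOF))                     # joint velocities, [N, n_dof]
step_count = np.zeros(N, dtype=np.int64)      # per-environment episode clock
q_target = np.zeros(N_DOF)                    # shared goal configuration


def step(q, qd, step_count, action, dt=0.01):
    """Advance all N environments at once with toy double-integrator dynamics."""
    qd = qd + dt * action
    q = q + dt * qd
    step_count = step_count + 1

    # Reward: one scalar per environment, reduced over the DoF axis.
    reward = -np.sum((q - q_target) ** 2, axis=1) - 1e-3 * np.sum(action ** 2, axis=1)

    # Termination: boolean masks of shape [N].
    out_of_limits = np.any(np.abs(q) > JOINT_LIMIT, axis=1)
    timed_out = step_count >= MAX_STEPS
    done = out_of_limits | timed_out

    # Reset only the finished rows; the other environments are untouched.
    n_done = int(done.sum())
    q[done] = rng.uniform(-0.5, 0.5, size=(n_done, N_DOF))
    qd[done] = 0.0
    step_count[done] = 0
    return q, qd, step_count, reward, done


action = rng.normal(size=(N, N_DOF))
q, qd, step_count, reward, done = step(q, qd, step_count, action)
```

In a GPU framework the same structure would be expressed with device tensors so that no array leaves the accelerator between the physics step and policy inference.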

Robot modeling: assets and fidelity

Simulation fidelity begins with the digital twin, the robot model that the simulator uses to compute dynamics. For robots with available URDF (Unified Robot Description Format) or USD files, importing is straightforward; the critical parameters are link masses and inertia tensors, actuator models (motor gain, back-EMF, friction), joint limits, and collision geometry.

Mass and inertia errors propagate through the dynamics equations. If the simulated link inertia is off by 10%, the simulated acceleration in response to a torque command is off by a comparable amount, and the dynamics the policy has implicitly adapted to will not match the physical robot. Obtaining accurate inertia parameters requires either CAD models of the robot's mechanical components (available from manufacturers for most platforms) or system identification experiments on the hardware.

Actuator models are particularly important for sim2real transfer. Physical servo actuators exhibit position-dependent friction, current-dependent torque limits, thermal derating, and communication delays (from motor controller to computer and back). A simulation that models actuators as ideal torque sources with instantaneous response will produce policies that assume perfect torque control authority — authority the physical actuator cannot provide in contact-rich situations where the motor is near its torque limit.
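To make the contrast with an ideal torque source concrete, here is a minimal sketch of a non-ideal actuator model. It is an assumption-laden toy: the class name `DelayedSaturatingActuator`, the fixed integer delay, and the linear torque-speed curve are all illustrative choices, not the model used by any particular simulator or motor vendor. Thermal derating and position-dependent friction are omitted.

```python
from collections import deque


class DelayedSaturatingActuator:
    """Toy actuator: fixed command delay plus a speed-dependent torque limit."""

    def __init__(self, delay_steps, peak_torque, no_load_speed):
        self.peak_torque = peak_torque        # N*m available at standstill
        self.no_load_speed = no_load_speed    # rad/s at which available torque reaches zero
        # Pre-fill with zeros so the first `delay_steps` outputs are zero torque.
        self.buffer = deque([0.0] * delay_steps, maxlen=delay_steps + 1)

    def torque_limit(self, joint_velocity):
        # Linear torque-speed curve, a simple first-order motor approximation (assumed).
        fraction = max(0.0, 1.0 - abs(joint_velocity) / self.no_load_speed)
        return self.peak_torque * fraction

    def apply(self, commanded_torque, joint_velocity):
        self.buffer.append(commanded_torque)
        delayed = self.buffer[0]              # command issued `delay_steps` steps ago
        limit = self.torque_limit(joint_velocity)
        return max(-limit, min(limit, delayed))


actuator = DelayedSaturatingActuator(delay_steps=3, peak_torque=10.0, no_load_speed=20.0)
applied = [actuator.apply(5.0, 0.0) for _ in range(5)]
```

Placing a model like this between the policy output and the simulated joint means the policy is trained against the torque it will actually receive, which matters most near the torque limit.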

The sim2real gap: sources and characterization

The sim2real gap is the distribution shift between the dynamics experienced in simulation and those experienced on physical hardware. Its sources are diverse:

Unmodeled actuator dynamics include transmission flexibility (backlash, cable stretch in cable-driven robots), motor friction (position-dependent, velocity-dependent, and static), and thermal effects that change motor constants as the robot warms up during use. These effects are small individually but accumulate over a trajectory to produce systematic deviations from the simulated path.

Contact and friction modeling is the single hardest problem in physics simulation for manipulation. Physical contacts involve micro-scale surface deformation, viscoelastic material response, adhesion, and velocity-dependent friction behavior (Stribeck curve) that standard rigid-body simulators approximate with a single friction coefficient. A grasping policy trained with simulated friction coefficient $\mu = 0.5$ will fail to transfer when the physical friction is $\mu = 0.3$ unless the policy's learned grasp is robust to this variation.

Sensor noise and delays differ between simulation and hardware. Real sensors exhibit correlated noise, bias drift, and saturation behavior that ideal Gaussian noise models in simulation do not capture. The round-trip delay from commanding a joint position to reading the resulting encoder state is typically 1–5 ms on physical hardware, introducing a phase lag that destabilizes policies designed for zero-delay simulation.

Rendering gaps affect visual policies: real RGB images contain specular highlights, shadow variation, motion blur, and lens distortion not present in simulated renders. A policy trained on simulation renders will encounter a visually distinct domain at deployment, which can cause substantial degradation in visuomotor policies.

Domain randomization

The principal strategy for bridging the sim2real gap is domain randomization: instead of trying to make the simulator accurate, make it variable enough that the real world is one of the cases the policy was trained on. If the policy performs well across a distribution of simulated environments with randomized physics parameters, visual appearances, and noise characteristics, and if the real environment lies within that distribution, the policy is likely to transfer.

| Dynamics | Visual | Sensors & Delay |
| --- | --- | --- |
| Randomizing masses, friction, and gains so the policy learns to compensate for physical uncertainty across episodes. | Varying lighting, textures, and camera poses to prevent the policy from overfitting to specific simulation graphics. | Injecting noise and latency that mimic real-world hardware jitter and communications overhead. |

Dynamics randomization perturbs the physical parameters of the simulation on each episode reset: link masses are drawn from intervals around their nominal values, friction coefficients are sampled uniformly from a range, actuator gains are multiplied by random scalars, and communication delays are drawn from a distribution. The policy must learn behaviors that are robust to this variation — behaviors that succeed not by exploiting specific physics values but by closed-loop correction using sensor feedback.

Appearance randomization (visual domain randomization) perturbs the visual rendering on each episode: object textures are replaced with random RGB images or procedural patterns, lighting direction and intensity are randomized, camera pose is perturbed, and background geometry is varied. Visual policies trained under aggressive appearance randomization develop feature representations that are robust to appearance changes because none of the specific appearance details are reliable — the policy must rely on shape and geometry invariants that transfer to the real domain.

Structured randomization is more targeted: instead of randomizing all parameters uniformly, it focuses randomization on the parameters that are most uncertain (poor system identification) and narrows randomization ranges for parameters that are well-characterized. This concentrates the robustness budget where it is most needed.
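The per-reset sampling and the structured choice of ranges can be sketched as follows. The parameter names, nominal values, and relative half-widths in `NOMINAL` and `REL_HALF_WIDTH` are hypothetical numbers chosen for illustration; in practice they would come from the robot model and from how well each parameter has been identified.

```python
import numpy as np

# Hypothetical nominal values for the randomized physical parameters.
NOMINAL = {"link_mass": 1.2, "friction": 0.5, "actuator_gain": 1.0, "delay_ms": 3.0}
# Structured randomization: wide where identification is poor, narrow where it is good.
# Values are half-widths expressed as a fraction of the nominal value (assumed).
REL_HALF_WIDTH = {"link_mass": 0.05, "friction": 0.60, "actuator_gain": 0.10, "delay_ms": 0.50}


def sample_episode_params(rng, num_envs):
    """Draw one parameter set per environment at episode reset (uniform ranges)."""
    params = {}
    for name, nominal in NOMINAL.items():
        half = REL_HALF_WIDTH[name] * nominal
        params[name] = rng.uniform(nominal - half, nominal + half, size=num_envs)
    return params


rng = np.random.default_rng(0)
params = sample_episode_params(rng, num_envs=4096)
```

Here link mass, which is usually known well from CAD, varies by only 5%, while friction, which is hard to measure, spans 0.2 to 0.8.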

The theoretical justification for domain randomization is rooted in a formal expected return objective. Let $\theta$ denote the vector of physical parameters (masses, friction coefficients, actuator gains, delays) sampled from the randomization distribution $p_{\text{rand}}(\theta)$ . For a given set of parameters $\theta$ , the environment transitions follow $p(\tau \mid \pi, \theta)$ — the distribution over trajectories $\tau = (s_0, a_0, r_0, s_1, \ldots)$ induced by executing policy $\pi$ in the environment with parameters $\theta$ . The domain randomization training objective is the expected return integrated over the parameter distribution:

$$J(\pi) = \mathbb{E}_{\theta \sim p_{\text{rand}}}\,\mathbb{E}_{\tau \sim p(\cdot \mid \pi, \theta)}\!\left[\sum_{t=0}^T r_t\right]$$

Maximizing $J(\pi)$ produces a policy that performs well on average across the randomization distribution. The transfer guarantee rests on a coverage condition: let $p_{\text{real}}(\theta)$ be the (unknown) distribution of physical parameters in the real world and $p_{\text{rand}}(\theta)$ be the simulation's randomization distribution. If $p_{\text{real}} \ll p_{\text{rand}}$ (the real distribution is absolutely continuous with respect to the randomization distribution), every set of parameters with positive probability in the real world also has positive probability during training, so a policy that achieves high return for almost every $\theta \sim p_{\text{rand}}$ also achieves high return under $p_{\text{real}}$. Maximizing the average alone does not ensure this, because the policy can trade poor performance in low-density regions of $p_{\text{rand}}$ for gains elsewhere, and simulator modeling errors not captured by $\theta$ lie outside the guarantee altogether. The requirement that $p_{\text{real}} \ll p_{\text{rand}}$ motivates using wide randomization ranges, but wide ranges also make $J(\pi)$ harder to maximize because the policy must succeed under the hardest physics configurations. This tension between coverage width and training tractability is the central design tradeoff in domain randomization.
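A small Monte Carlo sketch can make the objective $J(\pi)$ tangible. Everything here is an illustrative assumption: the environment is a 1-D mass with viscous damping, $\theta$ is the pair (mass, damping), the policy is a fixed proportional-derivative law, and the reward is $r_t = -x_t^2$. Because this toy environment is deterministic, the inner expectation over trajectories reduces to a single rollout per sampled $\theta$.

```python
import numpy as np


def rollout_return(gain, mass, damping, horizon=200, dt=0.01):
    """Return of a proportional-derivative policy driving a 1-D mass to x = 0."""
    x, v, total = 1.0, 0.0, 0.0
    for _ in range(horizon):
        u = -gain * x - 2.0 * v               # fixed policy pi (hypothetical gains)
        a = (u - damping * v) / mass
        v += dt * a
        x += dt * v
        total += -(x ** 2)                    # reward r_t = -x_t^2
    return total


def estimate_J(gain, rng, num_samples=256, mass_range=(0.5, 2.0), damping_range=(0.0, 1.0)):
    """Monte Carlo estimate of J(pi): average return over theta ~ p_rand."""
    masses = rng.uniform(*mass_range, size=num_samples)
    dampings = rng.uniform(*damping_range, size=num_samples)
    returns = np.array([rollout_return(gain, m, b) for m, b in zip(masses, dampings)])
    return returns.mean(), returns.min()


rng = np.random.default_rng(0)
mean_return, worst_return = estimate_J(gain=20.0, rng=rng)
```

Reporting the worst sampled return next to the mean is a cheap way to see how much the average hides: widening `mass_range` or `damping_range` typically pulls the two apart.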

System identification

Domain randomization solves the transfer problem at the cost of training efficiency: a policy that must succeed for all friction coefficients in $[0.1, 0.9]$ is harder to train than one that only needs to succeed for $\mu = 0.5$ . System identification reduces this cost by estimating the real robot's physical parameters from hardware experiments, allowing the simulation to be centered at the identified values before applying randomization.

Classical system identification for articulated robots collects torque-position-velocity data from controlled joint movements and fits parametric models for mass, inertia, friction, and actuator constants. The fitting procedure minimizes the mismatch between simulated and real joint trajectories under the same input sequence:

$$\hat{\theta} = \arg\min_\theta \sum_{t} \| q_{\text{real},t} - q_{\text{sim},t}(\theta) \|^2$$

where $\theta$ parameterizes the robot model. The identified parameters are then used as the nominal simulation values, with randomization applied as perturbations around these identified values. The randomization range reflects residual uncertainty in the identification.
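The trajectory-matching fit can be sketched on a toy 1-DoF joint. This is a minimal illustration under stated assumptions: $\theta$ is the pair (mass, damping), the "hardware" data is itself simulated with hypothetical true values plus encoder noise, and the minimization is an exhaustive grid search rather than the gradient-based or least-squares solvers a real identification pipeline would likely use.

```python
import numpy as np


def simulate(mass, damping, torques, dt=0.01):
    """Joint trajectory q_sim(theta) of a 1-DoF joint under a torque sequence."""
    q, qd, traj = 0.0, 0.0, []
    for tau in torques:
        qdd = (tau - damping * qd) / mass
        qd += dt * qdd
        q += dt * qd
        traj.append(q)
    return np.array(traj)


# Excitation: a two-frequency torque signal applied for 4 seconds.
t = np.arange(0, 4.0, 0.01)
torques = np.sin(2 * np.pi * 0.5 * t) + 0.5 * np.sin(2 * np.pi * 2.0 * t)

# Stand-in for hardware data: simulate with assumed "true" values plus encoder noise.
rng = np.random.default_rng(0)
TRUE_MASS, TRUE_DAMPING = 1.5, 0.4
q_real = simulate(TRUE_MASS, TRUE_DAMPING, torques) + rng.normal(0.0, 1e-4, size=t.size)

# theta_hat = argmin over theta of sum_t (q_real_t - q_sim_t(theta))^2, by grid search.
mass_grid = np.linspace(0.5, 3.0, 26)         # grid step 0.1
damping_grid = np.linspace(0.0, 1.0, 21)      # grid step 0.05
loss, mass_hat, damping_hat = min(
    ((np.sum((q_real - simulate(m, b, torques)) ** 2), m, b)
     for m in mass_grid for b in damping_grid),
    key=lambda item: item[0],
)
```

The grid spacing is a natural, if crude, starting point for the residual uncertainty that sets the randomization range around the identified values.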

The relationship between system identification and domain randomization is best understood as a sequential variance management strategy. Before any hardware experiments, $p_{\text{rand}}(\theta)$ has high variance: physical parameters are set to broad default ranges to ensure coverage, at the cost of training difficulty. System identification narrows this distribution by concentrating $p_{\text{rand}}(\theta)$ around the identified values $\hat{\theta}$: the post-identification randomization range for each parameter $\theta_i$ shrinks from the prior interval $[\theta_i^{\text{low}}, \theta_i^{\text{high}}]$ to a tighter interval $[\hat{\theta}_i - \delta_i, \hat{\theta}_i + \delta_i]$, where $\delta_i$ reflects residual identification uncertainty. This narrowing reduces the spread of dynamics the policy encounters, and therefore the variance of the per-episode return under $p_{\text{rand}}$, which makes $J(\pi)$ easier to optimize.

Adaptive domain randomization (ADR; OpenAI introduced the technique under the name automatic domain randomization) then applies the reverse operation: it starts with the identified, narrow $p_{\text{rand}}(\theta)$ and progressively expands the ranges as the policy demonstrates robustness. When the policy achieves a success rate above a threshold on the current distribution, the boundaries of $p_{\text{rand}}$ are widened by a small increment. When the success rate drops below a lower threshold, the boundaries are contracted. This provides a curriculum over the variance of $p_{\text{rand}}$ that keeps the training difficulty in the productive range throughout training, and ultimately reaches the broad coverage needed for the $p_{\text{real}} \ll p_{\text{rand}}$ guarantee. The interplay between System ID (narrowing initial variance to center the distribution) and ADR (progressively expanding variance to achieve coverage) is the principled sim2real strategy employed in systems like OpenAI's Dactyl and NVIDIA's Isaac-trained locomotion policies.
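The expand-and-contract rule described above can be written down in a few lines. This is a simplified sketch, not the published algorithm: it updates both boundaries of one parameter symmetrically from a single aggregate success rate, and the names `RandomizationRange` and `update_range` as well as the thresholds and step size are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RandomizationRange:
    low: float
    high: float
    hard_low: float      # physical bound the range may never go below
    hard_high: float     # physical bound the range may never exceed


def update_range(r, success_rate, step=0.02, expand_above=0.8,
                 contract_below=0.5, min_width=0.02):
    """Widen the range after high success, shrink it after low success."""
    if success_rate >= expand_above:
        low, high = r.low - step, r.high + step
    elif success_rate < contract_below:
        low, high = r.low + step, r.high - step
        if high - low < min_width:            # never collapse below a minimum width
            return r
    else:
        return r                              # productive regime: leave unchanged
    return RandomizationRange(max(low, r.hard_low), min(high, r.hard_high),
                              r.hard_low, r.hard_high)


# Friction range starting at the identified interval [0.4, 0.6].
friction = RandomizationRange(low=0.4, high=0.6, hard_low=0.1, hard_high=0.9)
for success_rate in (0.9, 0.9, 0.3):          # two expansions, then one contraction
    friction = update_range(friction, success_rate)
```

A per-boundary variant, which evaluates the policy with one parameter pinned at its current boundary, gives a more targeted signal about which edge of which range is ready to move.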

Randomization schedules and training stability

Starting training with maximum randomization from the beginning often stalls learning: if friction can be anywhere in $[0.05, 1.0]$ and delays can be anywhere in $[0, 50 \text{ms}]$ simultaneously, the policy encounters effectively independent dynamics on every episode and cannot build stable behavioral patterns. A randomization curriculum starts with narrow randomization around identified nominal values and progressively widens the ranges as the policy demonstrates robustness at each level.

The interaction between randomization strength and policy architecture is non-trivial. Policies that represent the domain parameters as part of the observation — either by directly observing them (unrealistic) or by inferring them from a history of state-action-observation transitions (domain randomization with history, DR+H) — can adapt their behavior to the specific parameters in the current episode, achieving better performance than policies that must be robust to all parameters simultaneously. This adaptive approach requires a policy with memory (recurrent or attention-based) that can maintain an implicit estimate of the current domain parameters.
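One common way to give a policy access to recent history is a rolling buffer of observation-action pairs, sketched below. The class `HistoryBuffer` and its methods are hypothetical names for illustration; the sketch only builds the policy input and does not include the policy or any explicit parameter estimator.

```python
import numpy as np


class HistoryBuffer:
    """Keeps the last H (observation, action) pairs for each of N environments."""

    def __init__(self, num_envs, history_len, obs_dim, act_dim):
        self.data = np.zeros((num_envs, history_len, obs_dim + act_dim))

    def push(self, obs, action):
        # Shift every environment's window one step back in time, newest entry last.
        self.data = np.roll(self.data, shift=-1, axis=1)
        self.data[:, -1, :] = np.concatenate([obs, action], axis=1)

    def reset(self, done_mask):
        # Clear history for environments that just started a new episode,
        # since their domain parameters were resampled at reset.
        self.data[done_mask] = 0.0

    def policy_input(self):
        # Flatten [N, H, D] to [N, H*D] for a feed-forward policy; a recurrent or
        # attention-based policy could consume the [N, H, D] array directly.
        return self.data.reshape(self.data.shape[0], -1)


buffer = HistoryBuffer(num_envs=2, history_len=3, obs_dim=2, act_dim=1)
for k in (1.0, 2.0, 3.0, 4.0):
    buffer.push(np.full((2, 2), k), np.full((2, 1), -k))
```

Clearing the window at reset matters because stale transitions from the previous episode were generated under different domain parameters and would mislead the implicit estimate.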

GenAI context: sim2real as structured data augmentation

The analogy between domain randomization and data augmentation in vision-language pretraining is precise and instructive.

| Robotics sim2real | Vision/language pretraining |
| --- | --- |
| Domain randomization | Data augmentation (crop, flip, color jitter) |
| System identification | Dataset curation and filtering |
| Sim2real gap | Covariate shift between training and test distribution |
| Appearance randomization | Rendering/style variation in synthetic data |
| Adaptive randomization | Curriculum or difficulty-aware sampling |

Both fields have converged on the same insight: training for distributional robustness is more effective than training for point-estimate accuracy. A policy trained on a narrow, perfectly simulated distribution fails when the real world deviates from that point; a policy trained on a broad, randomized distribution is robust to the inevitable deviation. The synthesis in both fields is the same: use structured domain knowledge (system identification, dataset curation) to center the distribution, then use randomization (domain randomization, augmentation) to broaden it.

Key takeaways

Simulation is the primary training environment for robot RL because it provides fast, safe, and inexpensive data collection at a scale hardware cannot match.
Isaac Sim provides GPU-accelerated physics and rendering with parallel environment support; IsaacLab wraps this with robot learning infrastructure.
The sim2real gap arises from unmodeled actuator dynamics, imprecise contact models, sensor noise and delay mismatches, and visual domain differences.
Domain randomization addresses the gap by training policies over a distribution of simulated environments broad enough to contain the real-world configuration.
System identification narrows the randomization domain to efficient ranges by estimating physical parameters from hardware experiments.
Randomization curricula that widen the randomization range progressively maintain training stability while eventually reaching broad coverage.
The design principle, robustness over accuracy, is the same principle underlying data augmentation in vision and language model training.

Conceptual questions

A legged locomotion policy trained in Isaac Sim achieves near-perfect performance across 1000 parallel simulation instances but fails immediately on the physical robot, exhibiting oscillatory joint motions at the hip. Post-hoc analysis reveals that the simulation modeled actuators as ideal torque sources with zero delay, while the physical actuators have 3 ms communication delay and 15% torque saturation at high velocities. Explain exactly how these two modeling errors would interact to produce oscillatory motion. What changes to the simulation model and domain randomization schedule would prevent this failure?
The theoretical guarantee for domain randomization requires that the real-world parameter distribution $p_{\text{real}}(\theta)$ is absolutely continuous with respect to the randomization distribution $p_{\text{rand}}(\theta)$ . For a contact-rich grasping task, identify at least three physical parameters where this condition might fail in practice (i.e., the real parameter lies outside the randomization range), explain why it is difficult to discover these failures before deployment, and propose diagnostic procedures to detect range violations before physical testing.
System identification estimates physical parameters by minimizing the mismatch between simulated and real trajectories under controlled inputs. Explain why the identified parameters from quasi-static joint movements (slow, low-acceleration motions) may not produce accurate simulation of the high-acceleration, contact-rich motions used during manipulation RL training. How would you design an identification experiment specifically targeting the dynamic regime used during training? What parameter correlation structure would make the identification problem ill-conditioned?
A team uses adaptive domain randomization (ADR) to train a bin-picking policy. During training, the friction coefficient range expands progressively from $[0.4, 0.6]$ to $[0.1, 0.9]$ as the policy improves. After deployment, the physical friction is measured at $0.35$ — within the final randomization range — but the policy's success rate is only 40%, despite 85% success in simulation at $\mu = 0.35$ . Diagnose this failure. Consider the interaction between friction and the other parameters being simultaneously randomized, and identify which parameter combination in the randomization distribution is most likely underrepresented near $\mu = 0.35$ .
Visual domain randomization replaces simulation textures with random RGB patterns. A visuomotor policy trained with aggressive visual randomization achieves robust transfer but requires significantly more training steps than a policy trained with photorealistic simulation renders. Propose a training curriculum that achieves the final robustness of visual randomization while matching the early learning speed of photorealistic training. What theoretical property of the policy's learned visual representation changes between the two approaches, and how would you measure this?

Solutions

Delay + saturation → oscillation. The 3 ms delay adds phase lag while the 15% high-velocity torque saturation drops the effective loop gain nonlinearly; a policy tuned to ideal zero-delay torque sources is effectively high-gain, so the added lag pushes the hip loop past its stability margin into a limit cycle, which saturation then sustains. Fix by modeling actuator delay, saturation, and first-order actuator dynamics in sim and randomizing delay, torque limits, and gains.
Absolute-continuity failures. Real values can fall outside the randomization range for object friction, mass/inertia of novel objects, contact stiffness/restitution, and sensor latency. They are hard to catch pre-deployment because the true value is unknown and sim success masks the gap. Diagnostics: measure parameters directly, compare real-vs-sim rollouts under known inputs (system-ID residuals), and flag when real trajectories land in low-density regions of the sim distribution.
Quasi-static system ID. Slow motions identify kinematics, gravity, and static friction but barely excite the velocity- and acceleration-dependent terms (inertia, damping, Coriolis, motor friction at speed, contact dynamics), so the fitted parameters mispredict high-acceleration contact motion. Design persistently exciting, high-acceleration trajectories spanning the training velocity range and including contact events. Ill-conditioning arises when parameters are correlated (e.g., mass and link length, or friction and damping) so that the recorded trajectories cannot separate them.
ADR friction failure. Success in sim at $\mu=0.35$ yet real failure means another co-randomized parameter combination near $\mu=0.35$ is underrepresented: ADR expands ranges on aggregate performance, so the joint distribution can be sparse at (low friction, a specific mass/stiffness) even though the friction marginal covers 0.35. The low-friction region was likely only ever sampled alongside otherwise-easy parameters, so the policy never trained on the real combination.
Visual-randomization curriculum. Start with photorealistic renders so the policy quickly learns control with clean perception, then progressively increase texture/lighting randomization to harden the visual features. The learned representation shifts from texture-dependent (overfit to render appearance) to texture-invariant/shape-based features; measure this via representation similarity across randomized views or the probing accuracy of frozen features on held-out appearances.

Looking ahead

With simulation providing training scale and domain randomization providing robustness, the next question is what architectural and algorithmic designs best leverage large-scale simulated and real-world demonstration data. The answer has shifted from standard actor-critic architectures to transformer-based foundation models trained on massive multi-task datasets.

Week 8: Foundation Models for Manipulation — ACT and Action Chunking. We examine the Action Chunking Transformer architecture, the role of temporal action chunking in reducing the effective decision frequency, and how sequence modeling fundamentally changes the way policies represent and generate multi-step manipulation behaviors.
