Week 12: Safety, Constraints, and Reliability

Purpose of this lecture#

Learning-based robot policies have reached a level of capability where they can perform complex manipulation tasks with high success rates under nominal conditions. But nominal conditions are not all conditions. Real-world deployment exposes policies to sensor failures, mechanical wear, environmental variation, and the full tail of distributions that training datasets systematically underrepresent. In these off-nominal conditions, an unconstrained learned policy may take irreversible, damaging, or dangerous actions — not because it is generally poor, but because it is confident in regions where its training provides no evidence.

Safety in robot learning is therefore not a property of the algorithm alone. It is a property of the system: the combination of the learned policy, the sensing and actuation hardware, the safety filters and monitors layered on top of the policy, and the fallback mechanisms invoked when those monitors detect anomalies. This lecture develops the technical foundations for each layer of this safety architecture.

Safety as a systems problem#

Safety failures in deployed robot systems rarely trace to a single failure mode. They emerge from the interaction between multiple imperfect components: a policy that has learned a slightly incorrect dynamics model, a sensor that drifts over time, a contact geometry that is slightly different from training, and a control loop that has no mechanism to detect that these errors have compounded to a dangerous extent.

This systems character has a fundamental implication: safety cannot be fully learned. A policy that learns safe behavior by observing demonstrations will inherit the safety assumptions implicit in those demonstrations. Demonstrations collected by skilled human operators avoid dangerous configurations because the operator's judgment prevents them; the policy learns to avoid those configurations along the trajectories it was trained on but has no explicit model of why they are dangerous or how to avoid them in novel configurations. Safety must therefore be enforced outside the learned policy, through formal constraints and monitors that are designed against the robot's physical specifications rather than inferred from data.

The three technical mechanisms that implement this principle are: (1) safety filters that modify or reject unsafe actions before execution; (2) Control Barrier Functions that provide formal guarantees of constraint satisfaction; and (3) runtime monitors that detect anomalous behavior and trigger fallback mechanisms before failures occur.

Safety filters and shielded policies#

A safety filter is a component positioned between the policy output and the robot's actuation layer. At each control cycle, the safety filter receives the policy's proposed action $\pi_\theta(s_t)$, evaluates whether it satisfies all active constraints, and either passes the action unchanged (if safe) or replaces it with the nearest safe action (if unsafe).

The minimal safety filter enforces workspace and hardware limits: joint position limits $q_i \in [q_i^{\min}, q_i^{\max}]$, velocity limits $|\dot{q}_i| \leq \dot{q}_i^{\max}$, and torque limits $|\tau_i| \leq \tau_i^{\max}$. These constraints reflect the physical specifications of the actuators and are hard: violating them can damage the robot mechanically or injure operators. Enforcing them at the filter level rather than trusting the policy to respect them is essential because training can only make violations less frequent; a learned policy offers no formal guarantee that its output stays inside the constraint set.
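A minimal sketch of such a limit filter, assuming the policy outputs joint velocity commands and the current joint positions are already inside their limits. The `JointLimits` container, the function names, and the one-step position check are illustrative assumptions, not part of any specific robot API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class JointLimits:
    """Per-joint hardware limits (hypothetical container; values come from the datasheet)."""
    q_min: float    # rad
    q_max: float    # rad
    dq_max: float   # rad/s
    tau_max: float  # N*m


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def filter_velocity_command(q: Sequence[float], dq_cmd: Sequence[float],
                            limits: Sequence[JointLimits], dt: float) -> List[float]:
    """Clamp velocity commands so position and velocity limits hold after one step."""
    safe = []
    for q_i, dq_i, lim in zip(q, dq_cmd, limits):
        # Velocities that keep q_i + dq * dt inside [q_min, q_max].
        dq_i = clamp(dq_i, (lim.q_min - q_i) / dt, (lim.q_max - q_i) / dt)
        # Velocity limit last: it only shrinks the command toward zero, so the
        # position bound above still holds when q_i starts inside its limits.
        safe.append(clamp(dq_i, -lim.dq_max, lim.dq_max))
    return safe


def filter_torque_command(tau_cmd: Sequence[float],
                          limits: Sequence[JointLimits]) -> List[float]:
    """Saturate torque commands at the actuator limits."""
    return [clamp(t, -lim.tau_max, lim.tau_max) for t, lim in zip(tau_cmd, limits)]
```

Component-wise clamping is the nearest safe action only for box constraints like these; coupled constraints need a projection or a QP.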

More sophisticated safety filters enforce collision avoidance and workspace exclusion zones. Forward kinematics maps the joint configuration to a set of link poses, and collision checking evaluates whether any link pose intersects a forbidden zone (table surface, operator exclusion volume, adjacent equipment). If the policy's proposed action would move a link into a forbidden zone, the filter projects the action to the boundary of the permissible workspace. Efficient collision checking using signed distance fields or geometric primitives can run at 1 kHz control rates.
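The collision check can be sketched with two analytic signed distance functions. This sketch assumes forward kinematics has already produced a link position, approximates the link by a sphere, and models the forbidden zones as a table plane plus one spherical exclusion volume. All names are hypothetical. The step is scaled back by bisection, which is a simpler stand-in for a true projection onto the permissible workspace and assumes the clearance crosses the margin only once along the step.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float, float]


def sdf_above_plane(p: Point, plane_z: float) -> float:
    """Signed distance to the forbidden half-space below z = plane_z (positive above)."""
    return p[2] - plane_z


def sdf_sphere(p: Point, center: Point, radius: float) -> float:
    """Signed distance to a spherical exclusion zone (positive outside)."""
    return math.dist(p, center) - radius


def link_clearance(p: Point, link_radius: float, plane_z: float,
                   zone_center: Point, zone_radius: float) -> float:
    """Smallest gap between a spherical link proxy and any forbidden region."""
    nearest = min(sdf_above_plane(p, plane_z), sdf_sphere(p, zone_center, zone_radius))
    return nearest - link_radius


def scale_step_to_clearance(p: Point, step: Point,
                            clearance: Callable[[Point], float],
                            margin: float = 0.0, iters: int = 40) -> Point:
    """Return the longest prefix of `step` that keeps clearance >= margin.

    Assumes the current position p already satisfies the margin.
    """
    def moved(s: float) -> Point:
        return tuple(a + s * b for a, b in zip(p, step))

    if clearance(moved(1.0)) >= margin:
        return step
    lo, hi = 0.0, 1.0  # lo is always feasible, hi is always infeasible
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if clearance(moved(mid)) >= margin:
            lo = mid
        else:
            hi = mid
    return tuple(lo * b for b in step)
```

In a full filter the same check would run for every link proxy, and the most restrictive scale factor would be applied to the whole action.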

Shielded policies extend the safety filter concept to multi-step lookahead: rather than evaluating safety only at the immediately proposed action, the shield evaluates whether the proposed action leads to a state from which future safe actions exist. An action can be immediately safe but put the robot in a configuration from which the only available actions are unsafe (a safety funnel). Shield-based methods use a classical safe controller (often a manually designed impedance controller or model-predictive safety filter) to guarantee that such funnels are avoided.
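The lookahead idea can be illustrated on a single joint approaching an upper position limit. In this sketch the backup controller is maximum braking, and a proposed acceleration is accepted only if, after applying it for one control period, braking could still stop the joint before the limit. The double-integrator model, the function names, and the numerical values are assumptions made for illustration; the check is performed at sample instants only.

```python
def step_joint(q, v, a, dt):
    """Constant-acceleration update over one control period."""
    return q + v * dt + 0.5 * a * dt * dt, v + a * dt


def can_still_stop(q, v, q_max, a_max):
    """True if braking at a_max from (q, v) stops the joint before q_max."""
    if q > q_max:
        return False
    if v <= 0.0:
        return True  # already moving away from the limit
    return v * v / (2.0 * a_max) <= q_max - q


def shield(q, v, a_proposed, q_max, a_max, dt):
    """Return (action, overridden): the proposal if recoverable, else the backup."""
    q_next, v_next = step_joint(q, v, a_proposed, dt)
    if can_still_stop(q_next, v_next, q_max, a_max):
        return a_proposed, False
    # Backup controller: brake as hard as possible while moving toward the limit.
    return (-a_max if v > 0.0 else 0.0), True


def rollout(policy, q, v, q_max, a_max, dt, steps):
    """Run a shielded policy and report the largest position reached."""
    max_q, interventions = q, 0
    for _ in range(steps):
        a, overridden = shield(q, v, policy(q, v), q_max, a_max, dt)
        interventions += overridden
        q, v = step_joint(q, v, a, dt)
        max_q = max(max_q, q)
    return max_q, interventions
```

A one-step filter that only checked `q_next <= q_max` would accept accelerations until the joint was too fast to stop, which is exactly the funnel the shield is meant to exclude.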

Control Barrier Functions#

Control Barrier Functions (CBFs; Ames et al., 2014) provide a formal framework for safety constraint enforcement with proven guarantees. The CBF framework defines safety as forward invariance of a safe set $\mathcal{C}$ and provides a computationally efficient method to enforce this invariance.

| Safe Set $\mathcal{C}$ | Barrier Condition | Class-$\mathcal{K}$ |
| --- | --- | --- |
| The region of the state space where safety constraints are satisfied ($h(x) \geq 0$). | The time derivative $\dot{h}$ must exceed a lower bound that approaches zero as the state nears the boundary. | Functions $\kappa$ that govern how the safety margin decays, ensuring the state never crosses the boundary. |

CBF Definition#

For a control-affine dynamical system $\dot{x} = f(x) + g(x)u$, a function $h: \mathcal{X} \to \mathbb{R}$ is a Control Barrier Function for the safe set $\mathcal{C} = \{x : h(x) \geq 0\}$ if there exists a class-$\mathcal{K}$ function $\kappa$ such that:

$$
\sup_{u \in \mathcal{U}} \left[\dot{h}(x, u)\right] \geq -\kappa(h(x)) \quad \forall x \in \mathcal{C}
$$

where $\dot{h}(x, u) = \nabla_x h(x)^\top (f(x) + g(x)u)$ is the time derivative of $h$ along the system trajectory. The CBF condition guarantees: if $h(x_0) \geq 0$ (the system starts safe), then applying any control that satisfies the CBF condition maintains $h(x_t) \geq 0$ for all $t \geq 0$ (the system remains safe forever).

CBF-QP Implementation#

The CBF-QP is the standard implementation: at each timestep, find the minimum perturbation to the policy's proposed action $u^* = \pi_\theta(s)$ that satisfies the CBF constraint:

$$
u_{\text{safe}} = \arg\min_{u} \| u - u^* \|^2 \quad \text{s.t.} \quad \dot{h}(x, u) + \kappa(h(x)) \geq 0
$$

Because $\dot{h}(x, u)$ is affine in $u$ for a control-affine system, this is a convex quadratic program; with a single constraint it reduces to a closed-form projection onto a half-space and is solvable in microseconds. The CBF-QP modifies $u^*$ minimally while formally guaranteeing safety. If the nominal policy $\pi_\theta$ is near-safe (proposes actions that violate the CBF constraint by only a small margin), the CBF-QP correction is small and the overall behavior is close to the unconstrained policy.
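For a single relative-degree-1 constraint, the CBF-QP can be written without a solver. The sketch below assumes a planar single integrator $\dot{x} = u$, a wall at $x_1 = x_{\text{wall}}$ with $h(x) = x_{\text{wall}} - x_1$, and a linear $\kappa(h) = \kappa h$. These modelling choices and all function names are illustrative assumptions.

```python
from typing import List, Sequence


def cbf_qp_single(u_nom: Sequence[float], a: Sequence[float], b: float) -> List[float]:
    """Solve argmin_u ||u - u_nom||^2 subject to a . u >= b in closed form."""
    violation = b - sum(ai * ui for ai, ui in zip(a, u_nom))
    if violation <= 0.0:
        return list(u_nom)  # nominal action already satisfies the constraint
    a_sq = sum(ai * ai for ai in a)
    if a_sq == 0.0:
        raise ValueError("constraint is independent of u; relative degree is above 1")
    # Move u_nom along a just far enough to reach the constraint boundary.
    return [ui + (violation / a_sq) * ai for ui, ai in zip(u_nom, a)]


def wall_filter(x, u_nom, x_wall, kappa):
    """Single integrator x_dot = u with h(x) = x_wall - x[0] and kappa(h) = kappa * h.

    h_dot = -u[0], so h_dot + kappa * h >= 0 reads a . u >= b with
    a = (-1, 0) and b = -kappa * h.
    """
    h = x_wall - x[0]
    return cbf_qp_single(u_nom, a=[-1.0, 0.0], b=-kappa * h)


def simulate(x0, u_nom, x_wall=1.0, kappa=2.0, dt=0.01, steps=1000):
    """Explicit Euler rollout of the filtered system; returns final state and max x[0]."""
    x, max_x0 = list(x0), x0[0]
    for _ in range(steps):
        u = wall_filter(x, u_nom, x_wall, kappa)
        x = [xi + dt * ui for xi, ui in zip(x, u)]
        max_x0 = max(max_x0, x[0])
    return x, max_x0
```

Only the component of the action that pushes toward the wall is modified; motion parallel to the wall passes through unchanged, which is the minimal-perturbation property in its simplest form. With several simultaneous constraints a numerical QP solver would be needed instead.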

Higher-Relative-Degree CBFs#

Higher-relative-degree CBFs handle constraints whose first time derivative does not depend on the control input. For a manipulator with joint position limits, the constraint $h(q) = q^{\max} - q$ has relative degree 2: the control input (joint torque $\tau$) appears only in $\ddot{h}$, not in $\dot{h}$. To see this, note that $\dot{h} = -\dot{q}$ depends only on velocity, while $\ddot{h} = -\ddot{q} = -M(q)^{-1}(\tau - C(q,\dot{q})\dot{q} - g(q))$ is where $\tau$ enters. A first-order CBF condition on $h$ cannot be enforced through the control because $\dot{h}$ is independent of $\tau$: torque changes acceleration, so the robot cannot change its joint velocity instantaneously. High-order CBFs (HOCBFs) address this by constructing an intermediate function $\psi_1$:

$$
\psi_0 = h(q), \qquad \psi_1 = \dot{\psi}_0 + \alpha_1(\psi_0) = -\dot{q} + \alpha_1(q^{\max} - q)
$$

and then requiring

$$
\dot{\psi}_1(q, \dot{q}, \tau) + \alpha_2(\psi_1) \geq 0
$$

which, after substituting $\ddot{q}$, becomes a linear inequality in $\tau$, preserving the convexity of the CBF-QP. The cascade of class-$\mathcal{K}$ functions $\alpha_1, \alpha_2$ acts as a two-stage deceleration requirement: $\alpha_1$ enforces that the system's velocity must be consistent with future constraint satisfaction, and $\alpha_2$ enforces that the torque must decelerate the joint in time. This is necessary for any position constraint on a torque-controlled arm, making HOCBFs the standard formulation for hardware-limit enforcement in manipulation systems.
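The two-stage condition can be checked on a single joint. This sketch assumes unit inertia with no Coriolis or gravity terms, so the control is the joint acceleration $\ddot{q}$ directly, and linear class-$\mathcal{K}$ functions $\alpha_1(s) = k_1 s$, $\alpha_2(s) = k_2 s$. For a real arm the same inequality would be expressed in $\tau$ through $M(q)$, $C(q,\dot{q})$, and $g(q)$. The gains and function names are illustrative assumptions.

```python
def hocbf_accel_bound(q, dq, q_max, k1, k2):
    """Upper bound on joint acceleration from a second-order CBF.

    With h = q_max - q, alpha_1(s) = k1 * s and alpha_2(s) = k2 * s:
      psi_1     = -dq + k1 * (q_max - q)
      psi_1_dot = -ddq - k1 * dq
    so psi_1_dot + k2 * psi_1 >= 0 is equivalent to ddq <= the value returned here.
    """
    return -(k1 + k2) * dq + k1 * k2 * (q_max - q)


def hocbf_filter(ddq_nom, q, dq, q_max, k1, k2):
    """One-dimensional CBF-QP: the closest acceleration to ddq_nom under the bound."""
    return min(ddq_nom, hocbf_accel_bound(q, dq, q_max, k1, k2))


def simulate(ddq_nom=3.0, q_max=1.0, k1=5.0, k2=5.0, dt=0.001, steps=5000):
    """Push constantly toward the limit and record the largest position reached."""
    q, dq, max_q = 0.0, 0.0, 0.0
    for _ in range(steps):
        ddq = hocbf_filter(ddq_nom, q, dq, q_max, k1, k2)
        q, dq = q + dt * dq, dq + dt * ddq  # explicit Euler step
        max_q = max(max_q, q)
    return q, dq, max_q
```

The guarantee requires $\psi_1 \geq 0$ at the start, meaning the initial velocity must already be compatible with stopping; a joint that begins too fast and too close to the limit lies outside the set this filter can keep invariant.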

In the context of learned VLA policies, CBFs provide a guarantee layer that does not depend on the quality of the learned policy: regardless of what the policy proposes, the CBF-QP ensures the robot never violates the defined safety constraints. The policy can be retrained, updated, or replaced without modifying the CBF layer, and the safety guarantee remains valid as long as the CBF and system model are accurate.

Failure modes: covariate shift, rare contacts, and latency#

Understanding the mechanisms of deployment failure enables targeted countermeasures.

Covariate Shift#

Covariate shift is the primary failure mode for learned robot policies. The policy is trained on a distribution of observations $d_{\text{train}}(o)$ and deployed on a distribution $d_{\text{deploy}}(o)$ that differs due to environmental changes (different lighting, different objects, different table height), sensor drift, or mechanical wear. In regions where $d_{\text{deploy}}(o)$ has high probability and $d_{\text{train}}(o)$ has low probability, the policy's predictions are extrapolations from sparse training data and are likely to be incorrect — often confidently so.

Detecting covariate shift requires monitoring the policy's input distribution at runtime. For VLA policies with a pretrained vision-language backbone, a particularly effective approach is Mahalanobis distance computed in the VLM's latent embedding space. Let $z = \phi(o) \in \mathbb{R}^d$ be the latent embedding of the current observation under the frozen VLM encoder. From the training dataset, precompute the mean $\mu_{\text{train}}$ and covariance $\Sigma_{\text{train}}$ of the embedding distribution. At runtime, compute:

$$
d_M(z) = \sqrt{(z - \mu_{\text{train}})^\top \Sigma_{\text{train}}^{-1} (z - \mu_{\text{train}})}
$$

A threshold $d_M(z) > \tau_{\text{shift}}$ triggers a covariate shift alert. The Mahalanobis distance is superior to Euclidean distance in this context because the VLM's latent space is anisotropic (different dimensions encode features of very different variabilities) and $\Sigma_{\text{train}}^{-1}$ normalizes for this anisotropy. In practice, using a diagonal approximation of $\Sigma_{\text{train}}$ (ignoring cross-dimension correlations) reduces the $O(d^2)$ storage and $O(d^2)$ computation to $O(d)$, making the check real-time feasible at 50 Hz even for $d = 4096$-dimensional embeddings. Density estimation methods (normalizing flows, Gaussian mixture models) provide more expressive covariate shift detection but at higher computational cost.
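A minimal sketch of the diagonal variant, assuming the training embeddings are available as plain lists of floats. The function names and the variance floor are illustrative choices; a real deployment would compute the statistics offline and choose the threshold from held-out nominal data.

```python
import math
from typing import List, Sequence, Tuple


def fit_diagonal_gaussian(embeddings: Sequence[Sequence[float]],
                          var_floor: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and variance of the training embeddings."""
    n, d = len(embeddings), len(embeddings[0])
    mean = [sum(z[j] for z in embeddings) / n for j in range(d)]
    # The floor keeps the later division well defined for constant dimensions.
    var = [sum((z[j] - mean[j]) ** 2 for z in embeddings) / n + var_floor
           for j in range(d)]
    return mean, var


def mahalanobis_diag(z: Sequence[float], mean: Sequence[float],
                     var: Sequence[float]) -> float:
    """Mahalanobis distance under a diagonal covariance: O(d) time and storage."""
    return math.sqrt(sum((zj - mj) ** 2 / vj for zj, mj, vj in zip(z, mean, var)))


def is_shifted(z, mean, var, threshold: float) -> bool:
    """Raise a covariate shift alert when the distance exceeds the threshold."""
    return mahalanobis_diag(z, mean, var) > threshold
```

With training variances of 1 and 100 in two dimensions, an offset of 2 in the first dimension and an offset of 20 in the second are scored as equally unusual, which is the anisotropy correction that Euclidean distance lacks.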

Rare Contact Events#

Rare contact events are systematically underrepresented in demonstration datasets. Datasets collected from skilled human operators contain few examples of unexpected collisions, near-singular grasps, or objects tipping unexpectedly — because the operators avoided them. A policy trained on this data will have poor predictions in these states and may respond to unexpected contact with actions that amplify the contact force rather than withdrawing from it. Force-torque monitoring (triggering on end-effector force magnitudes exceeding a threshold), collision detection through torque anomaly detection, and reactive withdrawal controllers that engage on unexpected contact all provide protection against rare contact failures.

Hardware Latency#

Hardware latency is a subtle failure mode that can destabilize closed-loop policies. A policy trained in simulation with zero control delay assumes that the action commanded at time $t$ is applied instantaneously at $t$. On physical hardware, the round-trip from commanding an action to observing its effect may be 5–20 ms. A policy that is marginally stable at zero delay may be unstable at 10 ms delay, exhibiting the characteristic oscillatory behavior of a phase-lagged feedback loop. Latency-aware training (adding a randomized communication delay to the simulation environment) and predictive control (commanding actions based on a forward model's prediction of the next state) are standard mitigations.
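The destabilizing effect of delay can be reproduced with a toy loop. The sketch below assumes an integrator plant under proportional feedback, a 1 kHz control rate, and a gain of 200 s⁻¹; these numbers are illustrative and not taken from any particular robot. The same gain that converges with zero delay oscillates with growing amplitude once the controller acts on a measurement that is 10 ms old.

```python
from collections import deque
from typing import List


def simulate_delayed_loop(gain: float, dt: float, delay_steps: int,
                          steps: int, x0: float = 1.0) -> List[float]:
    """Simulate x[k+1] = x[k] + dt * u[k] with u[k] = -gain * x[k - delay_steps]."""
    # Buffer of the last delay_steps + 1 states; index 0 is the oldest one,
    # which is the measurement the controller actually sees.
    history = deque([x0] * (delay_steps + 1), maxlen=delay_steps + 1)
    x, trace = x0, [x0]
    for _ in range(steps):
        u = -gain * history[0]
        x = x + dt * u
        history.append(x)
        trace.append(x)
    return trace
```

Running it with `delay_steps=0` and `delay_steps=10` at `dt=0.001` gives a decaying and a diverging trace respectively. Latency-aware training amounts to randomizing `delay_steps` in the simulator so the policy does not rely on gains that are only stable at zero delay.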

Runtime monitoring and anomaly detection#

Runtime monitoring converts the implicit safety assumptions of a learned policy into explicit metrics that can be tracked and acted on. Effective monitoring systems track:

Action Magnitude and Rate of Change#

Action magnitude and rate of change: sudden large actions (high $\| u_t \|$ or high $\| u_t - u_{t-1} \|$) indicate policy instability or distributional shift. A monitoring threshold triggers a pause or fallback when these metrics exceed values observed during nominal training.
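A minimal sketch of such a monitor, assuming actions arrive as short lists of floats and that the two thresholds were chosen from nominal data. The class name and alert labels are hypothetical.

```python
import math
from typing import List, Optional, Sequence


class ActionMonitor:
    """Flags actions whose norm or step-to-step change exceeds nominal bounds."""

    def __init__(self, norm_max: float, rate_max: float) -> None:
        self.norm_max = norm_max
        self.rate_max = rate_max
        self._previous: Optional[List[float]] = None

    def check(self, u: Sequence[float]) -> List[str]:
        alerts = []
        if math.sqrt(sum(ui * ui for ui in u)) > self.norm_max:
            alerts.append("magnitude")
        # The rate check needs a previous action, so it is skipped on the first call.
        if self._previous is not None and math.dist(u, self._previous) > self.rate_max:
            alerts.append("rate")
        self._previous = list(u)
        return alerts
```

The caller decides what an alert means; a single `"rate"` alert might trigger clipping, while repeated alerts might trigger a freeze.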

Safety Constraint Margins#

Safety constraint margins: the CBF function values $\{h_i(x_t)\}$ provide a real-time indicator of how far the robot is from each constraint boundary. Persistently low constraint margins (the robot is spending a lot of time near the limit) indicate that the policy has learned a trajectory that relies on being near the constraint — a pattern that is fragile to perturbation.

Policy Uncertainty#

Policy uncertainty: for policies that output probability distributions (diffusion policies with multiple samples, Bayesian policies with MC dropout), the entropy of the action distribution at the current state is an uncertainty estimate. High entropy indicates that the policy is uncertain about what to do — often corresponding to near-boundary states in the training distribution.
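One simple way to turn a batch of sampled actions into an entropy estimate is to fit a diagonal Gaussian to the samples and use its closed-form differential entropy. The Gaussian fit is an assumption of this sketch, not something the policy classes above prescribe; it will understate uncertainty for multimodal action distributions. The function name and variance floor are illustrative.

```python
import math
from typing import Sequence


def diagonal_gaussian_entropy(samples: Sequence[Sequence[float]],
                              var_floor: float = 1e-12) -> float:
    """Differential entropy (nats) of a diagonal Gaussian fitted to action samples."""
    n, d = len(samples), len(samples[0])
    entropy = 0.0
    for j in range(d):
        mean = sum(s[j] for s in samples) / n
        # The floor avoids log(0) when all samples agree in one dimension.
        var = sum((s[j] - mean) ** 2 for s in samples) / n + var_floor
        entropy += 0.5 * math.log(2.0 * math.pi * math.e * var)
    return entropy
```

Differential entropy can be negative, so the monitor should compare it against a threshold calibrated on nominal rollouts and should not treat zero as a reference point.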

Task Progress#

Task progress: maintaining a model of expected task progress (the robot should have grasped the object by step 30, placed it by step 60) allows the monitor to detect task failures before the episode terminates. If progress is below expectation at a monitoring checkpoint, a fallback mechanism can be triggered before the failure propagates.
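The checkpoint logic can be sketched as a lookup of milestone deadlines. The milestone names and step counts mirror the example in the text (grasp by step 30, place by step 60); how a milestone is detected as achieved is outside this sketch and is assumed to be provided by the caller.

```python
from typing import Dict, List, Set


def missed_milestones(step: int, achieved: Set[str],
                      deadlines: Dict[str, int]) -> List[str]:
    """Return milestones whose deadline has passed without being achieved."""
    overdue = [name for name, deadline in deadlines.items()
               if step >= deadline and name not in achieved]
    # Report the earliest missed deadline first.
    return sorted(overdue, key=lambda name: deadlines[name])
```

A non-empty result at a checkpoint is the trigger for a fallback, well before the episode would otherwise time out.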

Fallback and recovery strategies#

When monitoring detects an anomaly, the system must degrade gracefully. The spectrum of fallback options ranges from minor adjustments to full recovery handoffs:

Action Clipping and Filtering#

Action clipping and filtering: the mildest fallback clips the proposed action to a conservative magnitude (half the nominal maximum) and applies temporal smoothing. This limits damage from a transient anomaly while allowing the policy to continue if the anomaly was benign.
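A minimal sketch of this fallback, assuming component-wise clipping to half of a nominal per-axis maximum and first-order exponential smoothing. The class name, the smoothing coefficient, and the choice of component-wise over norm-based clipping are illustrative.

```python
from typing import List, Optional, Sequence


class ConservativeActionFilter:
    """Clips actions to a fraction of the nominal maximum and low-pass filters them."""

    def __init__(self, u_max: float, scale: float = 0.5, smoothing: float = 0.2) -> None:
        self.limit = scale * u_max   # conservative bound, half the nominal maximum by default
        self.smoothing = smoothing   # weight of the newest action in the moving average
        self._state: Optional[List[float]] = None

    def __call__(self, u: Sequence[float]) -> List[float]:
        clipped = [max(-self.limit, min(self.limit, ui)) for ui in u]
        if self._state is None:
            self._state = clipped  # first call: nothing to smooth against yet
        else:
            self._state = [(1.0 - self.smoothing) * s + self.smoothing * c
                           for s, c in zip(self._state, clipped)]
        return list(self._state)
```

Because the output is a convex combination of clipped values, it never exceeds the conservative bound, while a single spurious spike is attenuated to a fraction of its clipped size.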

Freeze-and-Notify#

Freeze-and-notify: the robot stops in place, holding its current position with the impedance controller, and alerts the operator. This is appropriate for anomalies that indicate potential collision risk or unexpected contact — stopping prevents the anomaly from becoming a failure.

Fallback Controller Engagement#

Fallback controller engagement: a pre-designed classical controller (impedance controller, joint-space PD with gravity compensation, Cartesian stiffness controller) takes over from the learned policy. The fallback controller is designed to be safe by construction but has limited task capability; it restores the robot to a known-safe configuration from which the learned policy can be re-engaged. Critically, the handoff must be smooth: abrupt switching from policy output to fallback output generates a discontinuous torque command that excites mechanical resonances and can itself cause a joint limit violation. A blending function interpolates between the two controllers over a finite transition window:

$$
u_{\text{exec}}(t) = \bigl(1 - \alpha(t)\bigr)\, u_{\text{policy}}(t) \;+\; \alpha(t)\, u_{\text{fallback}}(t)
$$

where $\alpha(t) \in [0, 1]$ ramps from 0 (full policy authority) to 1 (full fallback authority) over a transition duration $T_{\text{blend}}$ (typically 50–200 ms). A linear ramp $\alpha(t) = \min(t / T_{\text{blend}}, 1)$ is simple and effective; an exponential profile $\alpha(t) = 1 - e^{-t/\tau_{\text{blend}}}$ avoids the kink in torque rate where the linear ramp ends, at the cost of approaching full fallback authority only asymptotically. During the blending window, the commanded torque remains continuous and the robot's velocity evolves smoothly, preventing the impulsive loads that hard switching would produce. Symmetric blending on re-engagement (when the policy is re-engaged after a safe recovery) applies the same interpolation in reverse.
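The two ramp profiles and the interpolation can be written directly from the formulas. Function names are illustrative; `t` is the time since the handoff was triggered, in the same units as the blend duration.

```python
import math
from typing import List, Sequence


def alpha_linear(t: float, t_blend: float) -> float:
    """Linear ramp: 0 at t = 0, 1 from t = t_blend onward."""
    return min(max(t / t_blend, 0.0), 1.0)


def alpha_exponential(t: float, tau_blend: float) -> float:
    """Exponential profile: approaches 1 with time constant tau_blend."""
    return 1.0 - math.exp(-max(t, 0.0) / tau_blend)


def blend(u_policy: Sequence[float], u_fallback: Sequence[float],
          alpha: float) -> List[float]:
    """Convex combination of the two commands; alpha = 1 is full fallback authority."""
    return [(1.0 - alpha) * up + alpha * uf
            for up, uf in zip(u_policy, u_fallback)]
```

For re-engagement, calling `blend` with `1.0 - alpha` in place of `alpha` applies the same interpolation in reverse.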

Task Restart#

Task restart: the robot returns to the start configuration, the episode counter increments, and the task is reattempted with the learned policy. This is appropriate for recoverable task failures (dropped object, missed grasp) where restarting is feasible.

GenAI context: safety architecture parallels#

The architecture of robot safety has direct analogs in the GenAI safety stack:

| Robot safety | GenAI safety |
|---|---|
| Safety filters on action output | Output filters for harmful content |
| CBF-QP constraint satisfaction | Constraint-based decoding |
| Runtime covariate shift detection | Hallucination / OOD detection |
| Fallback to classical controller | Refusal / safe completion fallback |
| Human takeover on anomaly | Human-in-the-loop escalation |

The shared principle is that safety is enforced outside the model. A language model that has been aligned via RLHF to avoid harmful outputs can still produce harmful outputs in sufficiently unusual prompt distributions; a robot policy that has been trained with safety demonstrations can still execute dangerous actions in unusual physical configurations. In both domains, safety guarantees require external enforcement mechanisms — output filters, CBFs, runtime monitors — that do not rely on the model being correct.

Key takeaways#

Safety in robot deployment is a systems problem requiring safety enforcement at multiple layers, not just good policy training. Safety filters intercept unsafe actions before execution; CBF-QP provides formal forward-invariance guarantees for defined constraint sets by solving a real-time quadratic program that minimally modifies the policy's proposed action. Covariate shift, rare contact events, and control latency are the primary failure modes in deployed manipulation systems; each has targeted monitoring and mitigation strategies. Runtime monitoring of action magnitude, constraint margins, policy uncertainty, and task progress enables early failure detection before damage occurs. Fallback mechanisms ranging from action clipping to full controller handoff provide graceful degradation. The principle that safety is enforced externally rather than learned internally applies equally to robot policies and language model alignment.

Conceptual questions#

A 7-DoF manipulator arm has a CBF defined for joint limit safety: $h_i(q) = (q_i^{\max} - q_i)(q_i - q_i^{\min})$ for each joint $i$. The current joint $i = 3$ state is $q_3 = q_3^{\max} - 0.05$ rad, with $\dot{q}_3 = 0.3$ rad/s and the policy proposing $\ddot{q}_3 = 0.8$ rad/s². The CBF constraint requires $\dot{h}_3 + \kappa h_3 \geq 0$ with $\kappa = 2$. Compute $\dot{h}_3$ for the proposed action and determine whether the CBF-QP will modify it. If modification is required, find the maximum $\ddot{q}_3$ that satisfies the constraint.
A manufacturing robot experiences sensor drift: the force-torque sensor at the wrist develops a 5N bias in the z-axis over a 4-hour shift. The safety monitor uses a threshold of 20N to detect unexpected contacts. Analyze how this drift affects the safety monitor's sensitivity and specificity over the shift. At what bias level does the monitor fail to detect a real 15N contact force? Propose a drift-compensation method and an alternative contact detection strategy that is robust to sensor bias.
A VLA policy is deployed in a kitchen environment. The training dataset contained no examples of a wet table surface (which reduces friction by 40%). The policy attempts to slide an object across the wet surface using a motion that was optimal on the dry surface, but the object drifts off-course. Describe the full failure cascade: starting from the covariate shift (wet surface), through policy action errors, to the resulting physical failure. At each stage, identify which safety mechanism (CBF, anomaly monitor, fallback controller) would or would not detect and interrupt the cascade, and why.
Compare the computational complexity of two safety monitoring approaches: (a) fitting a normalizing flow density model to the training observations and evaluating log-likelihood at each timestep (at 50 Hz), and (b) maintaining a running nearest-neighbor index over training observations and checking the distance to the nearest training observation at each timestep. For a training dataset of 100,000 observations with 1000-dimensional visual features, estimate the latency of each approach and identify the algorithmic modifications needed to make each approach real-time feasible.
The shielded policy framework allows aggressive RL exploration while maintaining safety through a classical backup controller. During RL training, the backup controller engages frequently early in training (when the policy is random) and less frequently as training progresses. Analyze the effect of frequent backup engagement on the RL training dynamics: does the policy receive correct gradient signal for the states where the backup engaged? How does this affect the policy's ability to learn behavior near constraint boundaries? Propose a training curriculum for the shielding threshold that improves learning near boundaries without compromising safety.

Solutions

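Joint-limit CBF. Here $\dot h_3 = \dot q_3\,[(q_3^{\max}+q_3^{\min}) - 2q_3]$, which depends only on $\dot q_3$: $h$ is a function of position while the control is acceleration, so $h$ has relative degree 2 and $\ddot q_3$ does not appear in a first-order $\dot h + \kappa h \ge 0$ constraint. Writing $\Delta = q_3^{\max} - q_3^{\min}$ for the joint range (not given in the problem), the state $q_3 = q_3^{\max}-0.05$, $\dot q_3 = +0.3$ gives $h_3 = 0.05\,(\Delta - 0.05)$ and $\dot h_3 = 0.3\,(0.1 - \Delta)$, so $\dot h_3 + 2h_3 = 0.025 - 0.2\,\Delta$, which is negative for any range $\Delta > 0.125$ rad. The first-order condition is therefore already violated, and no choice of $\ddot q_3$ can repair it at this instant. The correct formulation is an exponential/higher-order CBF whose constraint involves $\ddot q_3$. Because the joint is moving toward the limit with $h_3$ small and shrinking ($\dot h_3 < 0$ for $\Delta > 0.1$ rad), the safety filter must command a deceleration; the maximum admissible $\ddot q_3$ is the largest value keeping the higher-order CBF condition satisfied, i.e. the filter caps the proposed $0.8$ rad/s² and forces braking as the joint nears its limit.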
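Force-sensor drift. The effect depends on the sign of the bias $b$ relative to the contact force. With no drift, a real 15 N contact reads 15 N, which is already below the 20 N threshold, so it is not flagged. If the bias adds to the contact force, the reading is $15 + b$; at the end-of-shift value $b = 5$ N it reads $15+5 = 20$ N, exactly at the threshold, so the effective contact threshold has dropped to 15 N and the free-space reading sits 5 N closer to a false alarm (specificity degrades). If the bias opposes the contact force, the reading is $15 - b$ and the effective threshold rises to $20 + b$, reaching 25 N by the end of the shift, so real contacts between 20 N and 25 N that a drift-free monitor would catch now go undetected (sensitivity degrades). A 15 N contact is therefore missed at every bias level except an aligned bias of at least 5 N. Compensate by periodic re-zeroing when known to be contact-free, high-pass filtering (contacts are transient, drift is slow), or subtracting a model-predicted expected force; a bias-robust alternative is a momentum/residual observer that detects contact from dynamics rather than absolute force.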
Wet-surface cascade. Covariate shift (40% less friction, unseen) → the dry-optimal slide imparts too much motion → the object drifts off course → task failure (and possibly a fall). A CBF would not catch it — no safety constraint is violated; it is a task error. An OOD/anomaly monitor could flag the novel low-friction observation or dynamics, and only then could a fallback controller intervene. So: CBF no, anomaly monitor yes (if trained), fallback only when the monitor triggers.
Monitor complexity. (a) A normalizing-flow log-likelihood is one network forward pass per step on the 1000-dim features — fixed cost, real-time at 50 Hz if the flow is shallow (GPU). (b) Exact nearest neighbor over $100{,}000 \times 1000$ is $\sim10^8$ ops/step, too slow naively. Make (b) real-time with an approximate-NN index (HNSW/FAISS) or product-quantization compression; make (a) real-time by limiting flow depth and batching. NF gives a smooth density; NN gives nonparametric coverage.
Shielded RL. When the backup engages, the executed action is not the policy's proposal, so crediting the policy with the backup's outcome gives misleading gradients — frequent early engagement means almost no correct signal near the boundary, and the policy never learns good boundary behavior. Curriculum: begin with a conservative shield (engages readily for safety) and gradually relax the shielding threshold as the policy improves so it experiences near-boundary states — crediting only the policy's own actions (off-policy correction) — tightening again only if safety degrades.

Looking ahead#

Safety mechanisms ensure that individual robot policies fail gracefully. The next challenge is scaling robot learning across tasks and embodiments — deploying systems that can perform many tasks on many robots, sharing knowledge and representations across the full diversity of deployment scenarios.

Week 13: Multi-Robot and Multi-Task Learning. We examine multi-task conditioning, hierarchical skill architectures, policy distillation, and the centralized training with decentralized execution paradigm that enables scalable multi-robot systems.
