Week 2: Dynamics and State Estimation

Purpose of this lecture#

In Week 1, we described how robots can move geometrically.
In practice, however, robots do not move freely through configuration space—they move under forces, inertia, friction, and noise.

This lecture introduces:

robot dynamics: how torques and forces produce motion,
state estimation: how robots infer their true state from noisy sensors.

These topics form the backbone of:

feedback control,
model-based planning,
sim-to-real transfer,
and safe, stable robot learning.

Learning-based methods assume a state estimate exists and is reasonably accurate. This lecture explains where that estimate comes from—and what happens when it is wrong.

Robot dynamics: the core problem#

Dynamics answers the question:

Given the robot's current state and applied forces, how will it move next?

Formally, we want equations of motion relating:

configuration $q$ ,
velocities $\dot{q}$ ,
accelerations $\ddot{q}$ ,
applied torques $\tau$ .

Two classical formulations yield the same equations through different derivation paths: Euler–Lagrange (energy-based) and Newton–Euler (force-based). Understanding both—and their structural properties—is essential for control design and for reasoning about why learning sometimes succeeds and sometimes fails.

Euler–Lagrange formulation#

The Euler–Lagrange approach derives dynamics from energy.

Lagrangian#

\mathcal{L}(q, \dot{q}) = T(q, \dot{q}) - V(q)

where $T$ is kinetic energy and $V$ is potential energy. The equations of motion are:

\frac{d}{dt} \left( \frac{\partial \mathcal{L}}{\partial \dot{q}} \right) - \frac{\partial \mathcal{L}}{\partial q} = \tau

Resulting structure (fixed-base manipulators)#

M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) = \tau

where $M(q)$ is the inertia matrix, $C(q,\dot{q})$ captures Coriolis and centrifugal terms, and $g(q)$ is the gravity vector.

Structural properties that matter#

These are not bookkeeping facts—they are the mathematical foundations of stability analysis and control design.

1. $M(q)$ is symmetric positive definite.

This means $M(q)$ is always invertible, which guarantees the forward dynamics problem:

\ddot{q} = M(q)^{-1}\bigl(\tau - C(q,\dot{q})\dot{q} - g(q)\bigr)

always has a unique solution. No configuration exists where the robot's acceleration is indeterminate. This underpins every simulation step in IsaacLab, MuJoCo, and any other physics engine—each timestep is solving exactly this equation.

2. $\dot{M}(q) - 2C(q,\dot{q})$ is skew-symmetric.

This is the passivity property. It implies that the total mechanical energy of the system can only change via external work done by $\tau$ —the internal dynamics are energy-conserving. Practically, this property:

enables energy-based Lyapunov stability proofs for feedback controllers,
is exploited by impedance and passivity-based controllers to guarantee stability without needing a perfect model,
appears in the theoretical analysis of actor-critic methods that use energy as a Lyapunov function.

If you read papers on contraction-based control or energy-shaping RL, you will encounter this property constantly.

3. The equation above applies to fixed-base systems only.

The floating-base generalization—which governs legged robots—is introduced below. The structure is similar but the contact force term fundamentally changes what is and is not directly controllable.

Why this formulation is useful#

Euler–Lagrange is compact, elegant, and generalizable to complex systems. Its main cost is that symbolic derivation becomes expensive for high-DoF robots. In practice, the structured form of $M$ , $C$ , $g$ is exploited analytically in control design even when numerical values are computed via Newton–Euler.

Newton–Euler formulation#

The Newton–Euler method derives dynamics from forces and moments applied directly to each link.

The two-pass algorithm#

Rather than deriving a global energy function, Newton–Euler processes the kinematic chain recursively:

Forward pass (base → end-effector): propagate joint velocities and accelerations link-by-link. For each link $i$ , compute the linear and angular acceleration of its center of mass given the motion of link $i-1$ and the current joint motion.

Backward pass (end-effector → base): propagate forces and torques in reverse. Starting from the end-effector (where external forces are known or zero), compute the force and torque that each link must exert on the previous link, and from that the required joint torque $\tau_i$ .

Properties#

This algorithm is $O(n)$ in the number of joints, making it the standard choice for real-time control. It is the computational backbone of most industrial robot controllers and is implemented in libraries like Pinocchio and KDL. For a 6-DoF arm, the full inverse dynamics can be evaluated in microseconds.

Both Euler–Lagrange and Newton–Euler yield identical equations of motion. The choice between them is a trade-off between analytical insight (Euler–Lagrange) and computational efficiency (Newton–Euler).

Floating-base dynamics#

Legged robots and humanoids are not fixed to the ground. This changes the problem fundamentally.

Equations of motion#

The generalized coordinates now include both the joint angles $q_j$ and the free-floating base pose, so $q = (q_b, q_j)$ with $\dim(q_b) = 6$ . The equations of motion become:

M(q)\ddot{q} + h(q,\dot{q}) = S^T \tau + J_c(q)^T f_{\text{contact}}

where:

$h(q,\dot{q})$ collects Coriolis, centrifugal, and gravity terms,
$S$ is the actuator selection matrix (zero rows for the unactuated base DoF),
$J_c(q)$ is the contact Jacobian,
$f_{\text{contact}}$ is the vector of ground reaction forces.

The contact Jacobian#

$J_c(q)$ maps contact-point velocities to generalized velocities, and by duality, maps contact forces in Cartesian space to generalized forces in joint space. It is the mathematical bridge between what happens at the feet and what happens at the joints and base.

When a foot is in contact with the ground and not slipping, the contact constraint imposes:

J_c(q)\dot{q} = 0

This constraint must be satisfied simultaneously with the equations of motion, turning locomotion into a constrained dynamics problem at every timestep.

Friction cone constraints#

Contact forces are not arbitrary. A foot can push against the ground but not pull. And the tangential (friction) force is bounded by the normal force:

\sqrt{f_x^2 + f_y^2} \leq \mu f_z, \quad f_z \geq 0

This is the friction cone constraint. It is what makes contact modeling hard:

Violating it causes slipping.
Satisfying it requires the robot to actively shape its contact forces through posture and joint torques.
In classical control, this is handled as a constrained optimization (e.g. whole-body control with QP solvers). In RL, it is typically enforced implicitly through reward penalties or terrain curriculum—the policy must learn to stay inside the cone without ever being told about it explicitly.

Hybrid dynamics and contact switching#

Legged locomotion involves discrete contact mode switches: stance legs become swing legs and vice versa. Each contact configuration defines a different constrained dynamical system. The overall locomotion dynamics are therefore hybrid: continuous within each contact mode, with discrete jumps when contact is made or broken.

This hybrid structure is one of the core reasons legged locomotion is hard:

Trajectory optimization must plan across contact mode switches.
RL policies must implicitly learn when to switch contacts.
Sim-to-real transfer is sensitive to contact timing—a small error in when a foot strikes the ground propagates into a different dynamical regime.

The "contact dynamics are discontinuous" observation from classical control is precisely this phenomenon. It is why learning-based control is attractive for legged systems: learning can implicitly absorb the complexity of contact switching that is hard to model analytically.

Domain randomization and dynamics mismatch#

Accurate dynamics modeling matters for sim-to-real, but perfect modeling is impossible. The dominant practical tool for handling dynamics mismatch is domain randomization: randomly varying physical parameters (link masses, friction coefficients, actuator damping, motor time constants) during training so that the policy learns to be robust to a distribution of dynamics rather than a single model.

Understanding the dynamics equation above clarifies which parameters matter most for randomization. Errors in $M(q)$ affect inertial response; errors in friction affect contact stability; errors in actuator models affect torque tracking. Domain randomization is not blind noise injection—it is targeted perturbation of the parameters that appear in the equations of motion.

Sensors and observation models#

Robots do not observe their true state directly. Instead, they receive sensor measurements:

z_t = h(x_t) + v_t

where $x_t$ is the true state, $h(\cdot)$ is the sensor model mapping state to expected measurement, and $v_t$ is sensor noise (typically modeled as zero-mean Gaussian with covariance $R$ ).

This equation is not just notation—it is the basis of the Kalman filter update step and every Bayesian estimator that follows. The key insight is that sensors give us probabilistic information about state, not state itself.

Common robot sensors#

Encoders#

Measure joint positions (and by differentiation, velocities). Accurate and high-rate but local—they say nothing about where the robot is in the world or what its base is doing.

IMU (Inertial Measurement Unit)#

Accelerometers measure specific force (acceleration minus gravity); gyroscopes measure angular velocity. IMUs are high-rate and don't require external reference, but they drift: integrating accelerometer readings to get velocity, and integrating velocity to get position, compounds small errors over time. Over seconds to minutes, a standalone IMU estimate of base position becomes unusable. This is why IMU alone cannot maintain long-horizon state estimates and must be fused with other modalities.

Vision and depth sensing#

Cameras and depth sensors provide rich environmental information and global position cues. They are lower-rate than IMUs (typically 10–30 Hz for depth, 30–60 Hz for RGB), sensitive to lighting and occlusion, and computationally expensive to process. In visual-inertial odometry (VIO), vision corrects IMU drift by providing long-range, globally consistent position estimates.

Depth sensors — RGB-D cameras such as the Intel RealSense or structured-light devices — provide per-pixel depth measurements alongside color, enabling 3D reconstruction of the environment. For locomotion RL, depth data is commonly processed into two intermediate representations. A heightmap is a 2D grid encoding the terrain elevation $h(x,y)$ sampled from the depth image, projected into the robot's local frame around the feet. It is compact (typically $100 \times 100$ or smaller), easy to augment with domain randomization (noise, quantization, partial occlusion), and directly relevant to foot placement planning. A point cloud retains the full 3D spatial structure of the scene, expressed as a set of 3D points $\{(x_i, y_i, z_i)\}$ , and is the natural output for grasping and manipulation tasks where object geometry matters.

Both representations are high-dimensional relative to proprioceptive signals. A $64 \times 64$ heightmap has 4,096 entries — far more than the 40–60-dimensional proprioceptive vector of a typical legged robot. Conditioning a policy directly on raw heightmaps causes a dramatic increase in sample complexity. The standard approach is to learn a latent encoder that compresses the high-dimensional visual observation into a low-dimensional representation $z_t = \phi_\psi(d_t)$ , where $d_t$ is the depth image or heightmap and $z_t \in \mathbb{R}^{k}$ with $k \ll \dim(d_t)$ . The encoder $\phi_\psi$ is typically trained alongside the policy (end-to-end) or through the teacher-student framework described in the learning connection section below: the teacher uses privileged terrain information, and the student encoder is trained to recover the latent code $z_t$ from raw depth observations.

Each sensor is partial and noisy. No single sensor is sufficient for robust locomotion or manipulation. State estimation is the problem of combining them.

State estimation: the Bayesian perspective#

State estimation is fundamentally a Bayesian inference problem.

At each timestep we maintain a belief: a probability distribution over the true state $x_t$ given all past measurements and controls. Formally:

p(x_t \mid z_{1:t}, u_{1:t})

The dynamics model $p(x_{t+1} \mid x_t, u_t)$ tells us how the state evolves under control. The sensor model $p(z_t \mid x_t)$ tells us how likely a measurement is given a state. Bayes' rule combines them.

The Kalman filter is the exact solution to this problem under linear-Gaussian assumptions. EKF, UKF, and particle filters are progressively more flexible approximations when those assumptions are violated.

Kalman filter (linear systems)#

For linear dynamics and observations with Gaussian noise:

x_{t+1} = A x_t + B u_t + w_t, \qquad w_t \sim \mathcal{N}(0, Q)

z_t = H x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R)

the Kalman filter is the optimal (minimum mean-squared error) estimator.

Prediction step#

Using the dynamics model alone, propagate the current estimate forward:

\hat{x}_{t|t-1} = A\hat{x}_{t-1|t-1} + Bu_t

P_{t|t-1} = A P_{t-1|t-1} A^T + Q

where $P$ is the error covariance matrix—a measure of our current uncertainty about the state. High $Q$ (process noise) means the dynamics model is unreliable; the prediction step increases uncertainty.

Update step#

When a measurement $z_t$ arrives, compute the Kalman gain:

K_t = P_{t|t-1} H^T \bigl(H P_{t|t-1} H^T + R\bigr)^{-1}

$K_t$ is a weighting matrix that balances trust in the model prediction vs. trust in the measurement. When $R$ is small (accurate sensors), $K_t$ is large and the measurement dominates. When $R$ is large (noisy sensors), $K_t$ is small and the prediction dominates.

Then update the estimate:

\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t(z_t - H\hat{x}_{t|t-1})

P_{t|t} = (I - K_t H) P_{t|t-1}

The term $(z_t - H\hat{x}_{t|t-1})$ is the innovation—the discrepancy between what we measured and what the model predicted we would measure. The Kalman filter is an optimal way to use that discrepancy to correct the state estimate.

Python Implementation: 1D Kalman Filter#

The following code implements a simple 1D Kalman filter to track a robot's position based on a velocity model and noisy position measurements.

python · runs in browser

import numpy as np

class KalmanFilter1D:
    def __init__(self, x0, p0, q, r):
        self.x = x0  # Initial state estimate
        self.p = p0  # Initial error covariance
        self.q = q   # Process noise covariance
        self.r = r   # Measurement noise covariance

    def predict(self, u, dt=0.1):
        """
        Prediction step: x = A*x + B*u
        A = 1 (constant position model), B = dt
        """
        # Predicted state estimate
        self.x = self.x + u * dt
        # Predicted error covariance
        self.p = self.p + self.q

    def update(self, z):
        """
        Update step: Correct estimate with measurement z
        H = 1 (direct position measurement)
        """
        # Innovation (measurement residual)
        innovation = z - self.x
        
        # Kalman Gain: K = P / (P + R)
        k = self.p / (self.p + self.r)
        
        # Updated state estimate
        self.x = self.x + k * innovation
        # Updated error covariance
        self.p = (1 - k) * self.p
        
        return self.x, self.p

# Simulation setup
kf = KalmanFilter1D(x0=0, p0=1.0, q=0.1, r=5.0)
true_pos = 0
velocity = 1.0
dt = 0.1

print("Step | True Pos | Measurement | KF Estimate | Error")
print("-" * 50)

for step in range(10):
    true_pos += velocity * dt
    measurement = true_pos + np.random.normal(0, np.sqrt(5.0))
    
    kf.predict(u=velocity, dt=dt)
    est_x, est_p = kf.update(z=measurement)
    
    print(f"{step:4d} | {true_pos:8.2f} | {measurement:11.2f} | {est_x:11.2f} | {abs(true_pos-est_x):5.2f}")

Extended Kalman Filter (EKF)#

Robot dynamics and sensors are nonlinear. The EKF extends the Kalman filter by linearizing the nonlinear functions around the current estimate at each timestep:

Python Implementation: Extended Kalman Filter (Nonlinear)#

The following code illustrates the EKF principle by tracking a robot moving along a curved path ( $x, y, \theta$ ).

python · runs in browser

def ekf_predict(x, P, u, dt, Q):
    """
    Nonlinear prediction: x = f(x, u)
    f(x, u) = [x + v*cos(theta)*dt, y + v*sin(theta)*dt, theta + omega*dt]
    """
    v, omega = u
    theta = x[2]
    
    # Nonlinear transition
    x_next = x + np.array([
        v * np.cos(theta) * dt,
        v * np.sin(theta) * dt,
        omega * dt
    ])
    
    # Jacobian of f with respect to x (F)
    F = np.array([
        [1, 0, -v * np.sin(theta) * dt],
        [0, 1,  v * np.cos(theta) * dt],
        [0, 0, 1]
    ])
    
    P_next = F @ P @ F.T + Q
    return x_next, P_next

# Initial state [x, y, theta]
x_est = np.array([0.0, 0.0, 0.0])
P_est = np.eye(3)
u_input = np.array([1.0, 0.1]) # velocity, angular velocity
dt_val = 0.1
Q_noise = np.eye(3) * 0.01

x_est, P_est = ekf_predict(x_est, P_est, u_input, dt_val, Q_noise)
print(f"Predicted State: {x_est}")

F_t = \frac{\partial f}{\partial x}\bigg|_{\hat{x}_{t|t}}, \qquad H_t = \frac{\partial h}{\partial x}\bigg|_{\hat{x}_{t|t-1}}

These Jacobians replace $A$ and $H$ in the Kalman equations, giving a locally-linear approximation that is recomputed at every step.

Limitations#

Accuracy depends on how well the local linearization approximates the true nonlinearity.
Can diverge when the system is highly nonlinear or the initial estimate is far from the truth.
Covariance propagation via $F_t P F_t^T$ does not capture higher-order uncertainty.

Despite these limitations, EKFs remain a practical workhorse in robotics.

Beyond the EKF in modern legged systems#

For legged robots specifically, EKFs have been augmented or partially replaced by:

Invariant EKFs (InEKFs): exploit the Lie group structure of $SE(3)$ to improve linearization accuracy for orientation and velocity estimation. Used in MIT Cheetah and related systems.
Factor graph estimators (e.g. GTSAM, iSAM2): model the full history of states and measurements as a probabilistic graphical model, enabling loop closure and long-horizon consistency. Standard in SLAM systems.
Contact-aided state estimators: use contact detection (from foot force sensors or contact switches) as an additional constraint to improve base velocity estimation when visual or external references are unavailable.

Most modern locomotion deployments fuse IMU, encoders, and contact detection into a proprioceptive estimator for base velocity and orientation—often implemented as an EKF or InEKF—because GPS and visual odometry are unreliable in unstructured environments.

Sensor fusion#

No single sensor is sufficient. Sensor fusion combines complementary modalities to obtain estimates that are more accurate and more robust than any individual sensor.

| Sensor | Strength | Weakness | |---|---|---| | IMU | High-rate, no external reference | Drifts over time | | Encoders | Accurate joint state | No global pose information | | Vision | Global position, loop closure | Slow, fragile to occlusion/lighting | | Contact detection | Direct contact state | Binary, noisy at transitions |

Key principle: dynamics provide short-term prediction; exteroceptive sensing (vision, lidar) provides long-term correction. This is the same structure as a predictor-corrector and mirrors the belief update in a POMDP.

Learning connection#

Understanding dynamics and state estimation is not background material—it is directly embedded in the structure of modern robot learning systems.

Privileged information and teacher-student frameworks#

In simulation, a policy can be trained with access to the true state: exact base velocity, ground reaction forces, contact states, terrain height under each foot. None of these are directly available on hardware. This gap is not an implementation detail—it is a fundamental challenge in sim-to-real transfer.

The standard solution in locomotion RL (RMA, Walk These Ways, DreamWaQ, and others) is a teacher-student framework:

A teacher policy trains in simulation with privileged state access.
A student policy (or adaptation module) learns to estimate the privileged information from the observable history—proprioception, IMU, encoder data—and then either distills the teacher's behavior or adapts the policy latent code at deployment.

The adaptation module is, in effect, a learned state estimator. It is doing the same job as the EKF described above—inferring hidden state from a sequence of noisy observations— but it learns the mapping from data rather than deriving it analytically from the dynamics model. Understanding what it is trying to estimate, and why, requires understanding the dynamics and observation models introduced in this lecture.

POMDPs and history-conditioned policies#

When state estimation is imperfect, the robot operates in a Partially Observable Markov Decision Process (POMDP). The true state $x_t$ is hidden; only observations $z_t$ are available. The optimal policy in a POMDP is a function of the belief state $p(x_t \mid z_{1:t})$ —the full posterior over states given history.

Policies that stack recent observations (a fixed-length history window) or use recurrent architectures (LSTM, transformer) are implicitly approximating this belief state. The transformer memory over proprioceptive history in recent locomotion policies is performing the same computation as the Kalman smoother—integrating new observations with prior predictions—but learned from data.

The connection is direct: the Kalman filter you derived above is a special case of optimal belief state computation. Recurrent policies are a general learned approximation to the same problem.

Sim-to-real failure and estimation errors#

Empirically, errors in state estimation are among the most common causes of sim-to-real failure—often more significant than dynamics mismatch. A policy trained assuming perfect base velocity estimation will behave unpredictably when the on-hardware estimator lags, drifts, or becomes inconsistent during fast maneuvers or at contact transitions. Designing policies that are robust to estimation noise—through domain randomization on observation noise, or by training with a realistic observation model—requires understanding the estimation pipeline described in this lecture.

Key takeaways#

The concepts in this lecture form a chain that connects physics to perception to learning:

The Euler–Lagrange equations give the structure of robot dynamics, including properties (positive definiteness of $M$ , passivity of $\dot{M} - 2C$ ) that control designers and stability theorists rely on. Newton–Euler gives us an efficient way to compute those dynamics in real time. Floating-base dynamics extends the picture to legged systems, where contact forces, friction cone constraints, and hybrid switching govern what the robot can and cannot do. Sensors provide noisy, partial windows into the true state, modeled probabilistically. The Kalman filter is the optimal estimator under linear-Gaussian assumptions, combining dynamics prediction with measurement correction in a principled way. The EKF extends this to nonlinear systems and remains the workhorse of real-robot state estimation. And all of this connects directly to learning: privileged training, teacher-student distillation, and history-conditioned policies are architectural responses to the state estimation problem described in this lecture.

Conceptual questions#

The inertia matrix $M(q)$ is positive definite. What would it mean physically if $M(q)$ were singular at some configuration, and why would this break simulation?
A legged robot is executing a trot gait. Describe the sequence of contact mode switches during one stride. Why does each switch constitute a change in the robot's dynamical system, not just a change in its configuration?
A Kalman filter is running on a legged robot's IMU and encoder data. The filter consistently underestimates the robot's base velocity during fast accelerations. What could cause this, and what would you change in the filter to address it?
An RL locomotion policy is trained in simulation with access to the true contact state (which feet are in contact). At deployment, contact state must be estimated from foot force sensors with significant noise. Describe two different architectural strategies for handling this gap and what each one assumes about the estimation problem.
Why is a recurrent policy that conditions on a window of proprioceptive history performing a computation analogous to Bayesian filtering? What assumptions does the Kalman filter make that such a policy does not?

Solutions

Singular $M(q)$ . A singular inertia matrix means some acceleration direction has zero generalized inertia, so a finite force implies unbounded acceleration — a degenerate, unconstrained motion. Forward dynamics needs $\ddot q = M^{-1}(\tau - C\dot q - g)$ , and a non-invertible $M$ makes that blow up numerically. Real rigid-body $M$ is always positive definite, so singularity signals a modeling error such as a zero-mass link.
Trot contact switches. A trot alternates diagonal pairs: one diagonal in stance while the other swings, then they swap, with brief touchdown/liftoff transitions. Each switch activates or removes contact constraints, changing the active constraint manifold and therefore the equations of motion — it is a hybrid (piecewise) dynamical system, not merely a new configuration.
Kalman underestimates velocity. The process model likely assumes near-constant velocity with too-small process noise, so the filter lags rapid acceleration by over-trusting the model. Fix by raising the process-noise covariance on velocity/acceleration, adding an acceleration state, or feeding the IMU acceleration as a control input in the prediction step.
Contact-state gap. (a) Teacher-student / privileged distillation: a teacher trained with true contact distills into a student that infers contact from a proprioceptive history — assumes contact is observable from proprioception. (b) Contact-agnostic robust policy: a recurrent policy that never conditions on explicit contact and is hardened with noise/randomization — assumes the policy can be robust without an explicit estimate.
Recurrent policy as filtering. A window of proprioceptive history lets the network infer latent state (velocity, contact, terrain) just as a filter integrates past observations into a belief — both maintain an implicit posterior over hidden state. The Kalman filter assumes linear-Gaussian dynamics with known noise models; the recurrent policy assumes none of that and learns an arbitrary nonlinear filter from data.

Looking ahead#

With dynamics and state estimation in place, we now ask:

How do we control these systems reliably?

Week 3: Control Fundamentals
We study feedback control, stability, optimal control (LQR, MPC), and how these ideas connect directly to reinforcement learning.

Purpose of this lecture#

In Week 1, we described how robots can move geometrically.
In practice, however, robots do not move freely through configuration space—they move under forces, inertia, friction, and noise.

This lecture introduces:

robot dynamics: how torques and forces produce motion,
state estimation: how robots infer their true state from noisy sensors.

These topics form the backbone of:

feedback control,
model-based planning,
sim-to-real transfer,
and safe, stable robot learning.

Learning-based methods assume a state estimate exists and is reasonably accurate. This lecture explains where that estimate comes from—and what happens when it is wrong.

Robot dynamics: the core problem#

Dynamics answers the question:

Given the robot's current state and applied forces, how will it move next?

Formally, we want equations of motion relating:

configuration $q$ ,
velocities $\dot{q}$ ,
accelerations $\ddot{q}$ ,
applied torques $\tau$ .

Euler–Lagrange formulation#

The Euler–Lagrange approach derives dynamics from energy.

Lagrangian#

\mathcal{L}(q, \dot{q}) = T(q, \dot{q}) - V(q)

where $T$ is kinetic energy and $V$ is potential energy. The equations of motion are:

\frac{d}{dt} \left( \frac{\partial \mathcal{L}}{\partial \dot{q}} \right) - \frac{\partial \mathcal{L}}{\partial q} = \tau

Resulting structure (fixed-base manipulators)#

M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) = \tau

where $M(q)$ is the inertia matrix, $C(q,\dot{q})$ captures Coriolis and centrifugal terms, and $g(q)$ is the gravity vector.

Structural properties that matter#

These are not bookkeeping facts—they are the mathematical foundations of stability analysis and control design.

1. $M(q)$ is symmetric positive definite.

This means $M(q)$ is always invertible, which guarantees the forward dynamics problem:

\ddot{q} = M(q)^{-1}\bigl(\tau - C(q,\dot{q})\dot{q} - g(q)\bigr)

2. $\dot{M}(q) - 2C(q,\dot{q})$ is skew-symmetric.

enables energy-based Lyapunov stability proofs for feedback controllers,
is exploited by impedance and passivity-based controllers to guarantee stability without needing a perfect model,
appears in the theoretical analysis of actor-critic methods that use energy as a Lyapunov function.

If you read papers on contraction-based control or energy-shaping RL, you will encounter this property constantly.

3. The equation above applies to fixed-base systems only.

Why this formulation is useful#

Newton–Euler formulation#

The Newton–Euler method derives dynamics from forces and moments applied directly to each link.

The two-pass algorithm#

Rather than deriving a global energy function, Newton–Euler processes the kinematic chain recursively:

Properties#

Floating-base dynamics#

Legged robots and humanoids are not fixed to the ground. This changes the problem fundamentally.

Equations of motion#

The generalized coordinates now include both the joint angles $q_j$ and the free-floating base pose, so $q = (q_b, q_j)$ with $\dim(q_b) = 6$ . The equations of motion become:

M(q)\ddot{q} + h(q,\dot{q}) = S^T \tau + J_c(q)^T f_{\text{contact}}

where:

$h(q,\dot{q})$ collects Coriolis, centrifugal, and gravity terms,
$S$ is the actuator selection matrix (zero rows for the unactuated base DoF),
$J_c(q)$ is the contact Jacobian,
$f_{\text{contact}}$ is the vector of ground reaction forces.

The contact Jacobian#

When a foot is in contact with the ground and not slipping, the contact constraint imposes:

J_c(q)\dot{q} = 0

This constraint must be satisfied simultaneously with the equations of motion, turning locomotion into a constrained dynamics problem at every timestep.

Friction cone constraints#

Contact forces are not arbitrary. A foot can push against the ground but not pull. And the tangential (friction) force is bounded by the normal force:

\sqrt{f_x^2 + f_y^2} \leq \mu f_z, \quad f_z \geq 0

This is the friction cone constraint. It is what makes contact modeling hard:

Violating it causes slipping.
Satisfying it requires the robot to actively shape its contact forces through posture and joint torques.
In classical control, this is handled as a constrained optimization (e.g. whole-body control with QP solvers). In RL, it is typically enforced implicitly through reward penalties or terrain curriculum—the policy must learn to stay inside the cone without ever being told about it explicitly.

Hybrid dynamics and contact switching#

This hybrid structure is one of the core reasons legged locomotion is hard:

Trajectory optimization must plan across contact mode switches.
RL policies must implicitly learn when to switch contacts.
Sim-to-real transfer is sensitive to contact timing—a small error in when a foot strikes the ground propagates into a different dynamical regime.

Domain randomization and dynamics mismatch#

Sensors and observation models#

Robots do not observe their true state directly. Instead, they receive sensor measurements:

z_t = h(x_t) + v_t

where $x_t$ is the true state, $h(\cdot)$ is the sensor model mapping state to expected measurement, and $v_t$ is sensor noise (typically modeled as zero-mean Gaussian with covariance $R$ ).

Common robot sensors#

Encoders#

Measure joint positions (and by differentiation, velocities). Accurate and high-rate but local—they say nothing about where the robot is in the world or what its base is doing.

IMU (Inertial Measurement Unit)#

Vision and depth sensing#

Each sensor is partial and noisy. No single sensor is sufficient for robust locomotion or manipulation. State estimation is the problem of combining them.

State estimation: the Bayesian perspective#

State estimation is fundamentally a Bayesian inference problem.

At each timestep we maintain a belief: a probability distribution over the true state $x_t$ given all past measurements and controls. Formally:

p(x_t \mid z_{1:t}, u_{1:t})

Kalman filter (linear systems)#

For linear dynamics and observations with Gaussian noise:

x_{t+1} = A x_t + B u_t + w_t, \qquad w_t \sim \mathcal{N}(0, Q)

z_t = H x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R)

the Kalman filter is the optimal (minimum mean-squared error) estimator.

Prediction step#

Using the dynamics model alone, propagate the current estimate forward:

\hat{x}_{t|t-1} = A\hat{x}_{t-1|t-1} + Bu_t

P_{t|t-1} = A P_{t-1|t-1} A^T + Q

Update step#

When a measurement $z_t$ arrives, compute the Kalman gain:

K_t = P_{t|t-1} H^T \bigl(H P_{t|t-1} H^T + R\bigr)^{-1}

Then update the estimate:

\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t(z_t - H\hat{x}_{t|t-1})

P_{t|t} = (I - K_t H) P_{t|t-1}

Python Implementation: 1D Kalman Filter#

The following code implements a simple 1D Kalman filter to track a robot's position based on a velocity model and noisy position measurements.

python · runs in browser

import numpy as np

class KalmanFilter1D:
    def __init__(self, x0, p0, q, r):
        self.x = x0  # Initial state estimate
        self.p = p0  # Initial error covariance
        self.q = q   # Process noise covariance
        self.r = r   # Measurement noise covariance

    def predict(self, u, dt=0.1):
        """
        Prediction step: x = A*x + B*u
        A = 1 (constant position model), B = dt
        """
        # Predicted state estimate
        self.x = self.x + u * dt
        # Predicted error covariance
        self.p = self.p + self.q

    def update(self, z):
        """
        Update step: Correct estimate with measurement z
        H = 1 (direct position measurement)
        """
        # Innovation (measurement residual)
        innovation = z - self.x
        
        # Kalman Gain: K = P / (P + R)
        k = self.p / (self.p + self.r)
        
        # Updated state estimate
        self.x = self.x + k * innovation
        # Updated error covariance
        self.p = (1 - k) * self.p
        
        return self.x, self.p

# Simulation setup
kf = KalmanFilter1D(x0=0, p0=1.0, q=0.1, r=5.0)
true_pos = 0
velocity = 1.0
dt = 0.1

print("Step | True Pos | Measurement | KF Estimate | Error")
print("-" * 50)

for step in range(10):
    true_pos += velocity * dt
    measurement = true_pos + np.random.normal(0, np.sqrt(5.0))
    
    kf.predict(u=velocity, dt=dt)
    est_x, est_p = kf.update(z=measurement)
    
    print(f"{step:4d} | {true_pos:8.2f} | {measurement:11.2f} | {est_x:11.2f} | {abs(true_pos-est_x):5.2f}")

Extended Kalman Filter (EKF)#

Robot dynamics and sensors are nonlinear. The EKF extends the Kalman filter by linearizing the nonlinear functions around the current estimate at each timestep:

Python Implementation: Extended Kalman Filter (Nonlinear)#

The following code illustrates the EKF principle by tracking a robot moving along a curved path ( $x, y, \theta$ ).

python · runs in browser

def ekf_predict(x, P, u, dt, Q):
    """
    Nonlinear prediction: x = f(x, u)
    f(x, u) = [x + v*cos(theta)*dt, y + v*sin(theta)*dt, theta + omega*dt]
    """
    v, omega = u
    theta = x[2]
    
    # Nonlinear transition
    x_next = x + np.array([
        v * np.cos(theta) * dt,
        v * np.sin(theta) * dt,
        omega * dt
    ])
    
    # Jacobian of f with respect to x (F)
    F = np.array([
        [1, 0, -v * np.sin(theta) * dt],
        [0, 1,  v * np.cos(theta) * dt],
        [0, 0, 1]
    ])
    
    P_next = F @ P @ F.T + Q
    return x_next, P_next

# Initial state [x, y, theta]
x_est = np.array([0.0, 0.0, 0.0])
P_est = np.eye(3)
u_input = np.array([1.0, 0.1]) # velocity, angular velocity
dt_val = 0.1
Q_noise = np.eye(3) * 0.01

x_est, P_est = ekf_predict(x_est, P_est, u_input, dt_val, Q_noise)
print(f"Predicted State: {x_est}")

F_t = \frac{\partial f}{\partial x}\bigg|_{\hat{x}_{t|t}}, \qquad H_t = \frac{\partial h}{\partial x}\bigg|_{\hat{x}_{t|t-1}}

These Jacobians replace $A$ and $H$ in the Kalman equations, giving a locally-linear approximation that is recomputed at every step.

Limitations#

Accuracy depends on how well the local linearization approximates the true nonlinearity.
Can diverge when the system is highly nonlinear or the initial estimate is far from the truth.
Covariance propagation via $F_t P F_t^T$ does not capture higher-order uncertainty.

Despite these limitations, EKFs remain a practical workhorse in robotics.

Beyond the EKF in modern legged systems#

For legged robots specifically, EKFs have been augmented or partially replaced by:

Invariant EKFs (InEKFs): exploit the Lie group structure of $SE(3)$ to improve linearization accuracy for orientation and velocity estimation. Used in MIT Cheetah and related systems.
Factor graph estimators (e.g. GTSAM, iSAM2): model the full history of states and measurements as a probabilistic graphical model, enabling loop closure and long-horizon consistency. Standard in SLAM systems.
Contact-aided state estimators: use contact detection (from foot force sensors or contact switches) as an additional constraint to improve base velocity estimation when visual or external references are unavailable.

Sensor fusion#

No single sensor is sufficient. Sensor fusion combines complementary modalities to obtain estimates that are more accurate and more robust than any individual sensor.

Learning connection#

Understanding dynamics and state estimation is not background material—it is directly embedded in the structure of modern robot learning systems.

Privileged information and teacher-student frameworks#

The standard solution in locomotion RL (RMA, Walk These Ways, DreamWaQ, and others) is a teacher-student framework:

A teacher policy trains in simulation with privileged state access.
A student policy (or adaptation module) learns to estimate the privileged information from the observable history—proprioception, IMU, encoder data—and then either distills the teacher's behavior or adapts the policy latent code at deployment.

POMDPs and history-conditioned policies#

The connection is direct: the Kalman filter you derived above is a special case of optimal belief state computation. Recurrent policies are a general learned approximation to the same problem.

Sim-to-real failure and estimation errors#

Key takeaways#

The concepts in this lecture form a chain that connects physics to perception to learning:

Conceptual questions#

The inertia matrix $M(q)$ is positive definite. What would it mean physically if $M(q)$ were singular at some configuration, and why would this break simulation?
A legged robot is executing a trot gait. Describe the sequence of contact mode switches during one stride. Why does each switch constitute a change in the robot's dynamical system, not just a change in its configuration?
A Kalman filter is running on a legged robot's IMU and encoder data. The filter consistently underestimates the robot's base velocity during fast accelerations. What could cause this, and what would you change in the filter to address it?
An RL locomotion policy is trained in simulation with access to the true contact state (which feet are in contact). At deployment, contact state must be estimated from foot force sensors with significant noise. Describe two different architectural strategies for handling this gap and what each one assumes about the estimation problem.
Why is a recurrent policy that conditions on a window of proprioceptive history performing a computation analogous to Bayesian filtering? What assumptions does the Kalman filter make that such a policy does not?

Solutions

Singular $M(q)$ . A singular inertia matrix means some acceleration direction has zero generalized inertia, so a finite force implies unbounded acceleration — a degenerate, unconstrained motion. Forward dynamics needs $\ddot q = M^{-1}(\tau - C\dot q - g)$ , and a non-invertible $M$ makes that blow up numerically. Real rigid-body $M$ is always positive definite, so singularity signals a modeling error such as a zero-mass link.
Trot contact switches. A trot alternates diagonal pairs: one diagonal in stance while the other swings, then they swap, with brief touchdown/liftoff transitions. Each switch activates or removes contact constraints, changing the active constraint manifold and therefore the equations of motion — it is a hybrid (piecewise) dynamical system, not merely a new configuration.
Kalman underestimates velocity. The process model likely assumes near-constant velocity with too-small process noise, so the filter lags rapid acceleration by over-trusting the model. Fix by raising the process-noise covariance on velocity/acceleration, adding an acceleration state, or feeding the IMU acceleration as a control input in the prediction step.
Contact-state gap. (a) Teacher-student / privileged distillation: a teacher trained with true contact distills into a student that infers contact from a proprioceptive history — assumes contact is observable from proprioception. (b) Contact-agnostic robust policy: a recurrent policy that never conditions on explicit contact and is hardened with noise/randomization — assumes the policy can be robust without an explicit estimate.
Recurrent policy as filtering. A window of proprioceptive history lets the network infer latent state (velocity, contact, terrain) just as a filter integrates past observations into a belief — both maintain an implicit posterior over hidden state. The Kalman filter assumes linear-Gaussian dynamics with known noise models; the recurrent policy assumes none of that and learns an arbitrary nonlinear filter from data.

Looking ahead#

With dynamics and state estimation in place, we now ask:

How do we control these systems reliably?

Week 3: Control Fundamentals
We study feedback control, stability, optimal control (LQR, MPC), and how these ideas connect directly to reinforcement learning.

Purpose of this lecture#

Robot dynamics: the core problem#

Euler–Lagrange formulation#

Lagrangian#

Resulting structure (fixed-base manipulators)#

Structural properties that matter#

Why this formulation is useful#

Newton–Euler formulation#

The two-pass algorithm#

Properties#

Floating-base dynamics#

Equations of motion#

The contact Jacobian#

Friction cone constraints#

Hybrid dynamics and contact switching#

Domain randomization and dynamics mismatch#

Sensors and observation models#

Common robot sensors#

Encoders#

IMU (Inertial Measurement Unit)#

Vision and depth sensing#

State estimation: the Bayesian perspective#

Kalman filter (linear systems)#

Prediction step#

Update step#

Python Implementation: 1D Kalman Filter#

Extended Kalman Filter (EKF)#

Python Implementation: Extended Kalman Filter (Nonlinear)#

Limitations#

Beyond the EKF in modern legged systems#

Sensor fusion#

Learning connection#

Privileged information and teacher-student frameworks#

POMDPs and history-conditioned policies#

Sim-to-real failure and estimation errors#

Key takeaways#

Conceptual questions#

Looking ahead#

Further reading#

Week 2: Dynamics and State Estimation

Purpose of this lecture#

Robot dynamics: the core problem#

Euler–Lagrange formulation#

Lagrangian#

Resulting structure (fixed-base manipulators)#

Structural properties that matter#

Why this formulation is useful#

Newton–Euler formulation#

The two-pass algorithm#

Properties#

Floating-base dynamics#

Equations of motion#

The contact Jacobian#

Friction cone constraints#

Hybrid dynamics and contact switching#

Domain randomization and dynamics mismatch#

Sensors and observation models#

Common robot sensors#

Encoders#

IMU (Inertial Measurement Unit)#

Vision and depth sensing#

State estimation: the Bayesian perspective#

Kalman filter (linear systems)#

Prediction step#

Update step#

Python Implementation: 1D Kalman Filter#

Extended Kalman Filter (EKF)#

Python Implementation: Extended Kalman Filter (Nonlinear)#

Limitations#

Beyond the EKF in modern legged systems#

Sensor fusion#

Learning connection#

Privileged information and teacher-student frameworks#

POMDPs and history-conditioned policies#

Sim-to-real failure and estimation errors#

Key takeaways#

Conceptual questions#

Looking ahead#

Further reading#