Purpose of this lecture
In Week 1, we described how robots can move geometrically.
In practice, however, robots do not move freely through configuration space—they move under forces, inertia, friction, and noise.
This lecture introduces:
- robot dynamics: how torques and forces produce motion,
- state estimation: how robots infer their true state from noisy sensors.
These topics form the backbone of:
- feedback control,
- model-based planning,
- sim-to-real transfer,
- and safe, stable robot learning.
Learning-based methods assume a state estimate exists and is reasonably accurate. This lecture explains where that estimate comes from—and what happens when it is wrong.
Robot dynamics: the core problem
Dynamics answers the question:
Given the robot's current state and applied forces, how will it move next?
Formally, we want equations of motion relating:
- configuration ,
- velocities ,
- accelerations ,
- applied torques .
Two classical formulations yield the same equations through different derivation paths: Euler–Lagrange (energy-based) and Newton–Euler (force-based). Understanding both—and their structural properties—is essential for control design and for reasoning about why learning sometimes succeeds and sometimes fails.
Euler–Lagrange formulation
The Euler–Lagrange approach derives dynamics from energy.
Lagrangian
where is kinetic energy and is potential energy. The equations of motion are:
Resulting structure (fixed-base manipulators)
where is the inertia matrix, captures Coriolis and centrifugal terms, and is the gravity vector.
Structural properties that matter
These are not bookkeeping facts—they are the mathematical foundations of stability analysis and control design.
1. is symmetric positive definite.
This means is always invertible, which guarantees the forward dynamics problem:
always has a unique solution. No configuration exists where the robot's acceleration is indeterminate. This underpins every simulation step in IsaacLab, MuJoCo, and any other physics engine—each timestep is solving exactly this equation.
2. is skew-symmetric.
This is the passivity property. It implies that the total mechanical energy of the system can only change via external work done by —the internal dynamics are energy-conserving. Practically, this property:
- enables energy-based Lyapunov stability proofs for feedback controllers,
- is exploited by impedance and passivity-based controllers to guarantee stability without needing a perfect model,
- appears in the theoretical analysis of actor-critic methods that use energy as a Lyapunov function.
If you read papers on contraction-based control or energy-shaping RLReinforcement Learning, you will encounter this property constantly.
3. The equation above applies to fixed-base systems only.
The floating-base generalization—which governs legged robots—is introduced below. The structure is similar but the contact force term fundamentally changes what is and is not directly controllable.
Why this formulation is useful
Euler–Lagrange is compact, elegant, and generalizable to complex systems. Its main cost is that symbolic derivation becomes expensive for high-DoF robots. In practice, the structured form of , , is exploited analytically in control design even when numerical values are computed via Newton–Euler.
Newton–Euler formulation
The Newton–Euler method derives dynamics from forces and moments applied directly to each link.
The two-pass algorithm
Rather than deriving a global energy function, Newton–Euler processes the kinematic chain recursively:
Forward pass (base → end-effector): propagate joint velocities and accelerations link-by-link. For each link , compute the linear and angular acceleration of its center of mass given the motion of link and the current joint motion.
Backward pass (end-effector → base): propagate forces and torques in reverse. Starting from the end-effector (where external forces are known or zero), compute the force and torque that each link must exert on the previous link, and from that the required joint torque .
Properties
This algorithm is in the number of joints, making it the standard choice for real-time control. It is the computational backbone of most industrial robot controllers and is implemented in libraries like Pinocchio and KDL. For a 6-DoF arm, the full inverse dynamics can be evaluated in microseconds.
Both Euler–Lagrange and Newton–Euler yield identical equations of motion. The choice between them is a trade-off between analytical insight (Euler–Lagrange) and computational efficiency (Newton–Euler).
Floating-base dynamics
Legged robots and humanoids are not fixed to the ground. This changes the problem fundamentally.
Equations of motion
The generalized coordinates now include both the joint angles and the free-floating base pose, so with . The equations of motion become:
where:
- collects Coriolis, centrifugal, and gravity terms,
- is the actuator selection matrix (zero rows for the unactuated base DoF),
- is the contact Jacobian,
- is the vector of ground reaction forces.
The contact Jacobian
maps contact-point velocities to generalized velocities, and by duality, maps contact forces in Cartesian space to generalized forces in joint space. It is the mathematical bridge between what happens at the feet and what happens at the joints and base.
When a foot is in contact with the ground and not slipping, the contact constraint imposes:
This constraint must be satisfied simultaneously with the equations of motion, turning locomotion into a constrained dynamics problem at every timestep.
Friction cone constraints
Contact forces are not arbitrary. A foot can push against the ground but not pull. And the tangential (friction) force is bounded by the normal force:
This is the friction cone constraint. It is what makes contact modeling hard:
- Violating it causes slipping.
- Satisfying it requires the robot to actively shape its contact forces through posture and joint torques.
- In classical control, this is handled as a constrained optimization (e.g. whole-body control with QP solvers). In RLReinforcement Learning, it is typically enforced implicitly through reward penalties or terrain curriculum—the policy must learn to stay inside the cone without ever being told about it explicitly.
Hybrid dynamics and contact switching
Legged locomotion involves discrete contact mode switches: stance legs become swing legs and vice versa. Each contact configuration defines a different constrained dynamical system. The overall locomotion dynamics are therefore hybrid: continuous within each contact mode, with discrete jumps when contact is made or broken.
This hybrid structure is one of the core reasons legged locomotion is hard:
- Trajectory optimization must plan across contact mode switches.
- RLReinforcement Learning policies must implicitly learn when to switch contacts.
- Sim-to-real transfer is sensitive to contact timing—a small error in when a foot strikes the ground propagates into a different dynamical regime.
The "contact dynamics are discontinuous" observation from classical control is precisely this phenomenon. It is why learning-based control is attractive for legged systems: learning can implicitly absorb the complexity of contact switching that is hard to model analytically.
Domain randomization and dynamics mismatch
Accurate dynamics modeling matters for sim-to-real, but perfect modeling is impossible. The dominant practical tool for handling dynamics mismatch is domain randomization: randomly varying physical parameters (link masses, friction coefficients, actuator damping, motor time constants) during training so that the policy learns to be robust to a distribution of dynamics rather than a single model.
Understanding the dynamics equation above clarifies which parameters matter most for randomization. Errors in affect inertial response; errors in friction affect contact stability; errors in actuator models affect torque tracking. Domain randomization is not blind noise injection—it is targeted perturbation of the parameters that appear in the equations of motion.
Sensors and observation models
Robots do not observe their true state directly. Instead, they receive sensor measurements:
where is the true state, is the sensor model mapping state to expected measurement, and is sensor noise (typically modeled as zero-mean Gaussian with covariance ).
This equation is not just notation—it is the basis of the Kalman filter update step and every Bayesian estimator that follows. The key insight is that sensors give us probabilistic information about state, not state itself.
Common robot sensors
Encoders
Measure joint positions (and by differentiation, velocities). Accurate and high-rate but local—they say nothing about where the robot is in the world or what its base is doing.
IMU (Inertial Measurement Unit)
Accelerometers measure specific force (acceleration minus gravity); gyroscopes measure angular velocity. IMUs are high-rate and don't require external reference, but they drift: integrating accelerometer readings to get velocity, and integrating velocity to get position, compounds small errors over time. Over seconds to minutes, a standalone IMU estimate of base position becomes unusable. This is why IMU alone cannot maintain long-horizon state estimates and must be fused with other modalities.
Vision and depth sensing
Cameras and depth sensors provide rich environmental information and global position cues. They are lower-rate than IMUs (typically 10–30 Hz for depth, 30–60 Hz for RGB), sensitive to lighting and occlusion, and computationally expensive to process. In visual-inertial odometry (VIO), vision corrects IMU drift by providing long-range, globally consistent position estimates.
Depth sensors — RGB-D cameras such as the Intel RealSense or structured-light devices — provide per-pixel depth measurements alongside color, enabling 3D reconstruction of the environment. For locomotion RLReinforcement Learning, depth data is commonly processed into two intermediate representations. A heightmap is a 2D grid encoding the terrain elevation sampled from the depth image, projected into the robot's local frame around the feet. It is compact (typically or smaller), easy to augment with domain randomization (noise, quantization, partial occlusion), and directly relevant to foot placement planning. A point cloud retains the full 3D spatial structure of the scene, expressed as a set of 3D points , and is the natural output for grasping and manipulation tasks where object geometry matters.
Both representations are high-dimensional relative to proprioceptive signals. A heightmap has 4,096 entries — far more than the 40–60-dimensional proprioceptive vector of a typical legged robot. Conditioning a policy directly on raw heightmaps causes a dramatic increase in sample complexity. The standard approach is to learn a latent encoder that compresses the high-dimensional visual observation into a low-dimensional representation , where is the depth image or heightmap and with . The encoder is typically trained alongside the policy (end-to-end) or through the teacher-student framework described in the learning connection section below: the teacher uses privileged terrain information, and the student encoder is trained to recover the latent code from raw depth observations.
Each sensor is partial and noisy. No single sensor is sufficient for robust locomotion or manipulation. State estimation is the problem of combining them.
State estimation: the Bayesian perspective
State estimation is fundamentally a Bayesian inference problem.
At each timestep we maintain a belief: a probability distribution over the true state given all past measurements and controls. Formally:
The dynamics model tells us how the state evolves under control. The sensor model tells us how likely a measurement is given a state. Bayes' rule combines them.
The Kalman filter is the exact solution to this problem under linear-Gaussian assumptions. EKF, UKF, and particle filters are progressively more flexible approximations when those assumptions are violated.
Kalman filter (linear systems)
For linear dynamics and observations with Gaussian noise:
the Kalman filter is the optimal (minimum mean-squared error) estimator.
Prediction step
Using the dynamics model alone, propagate the current estimate forward:
where is the error covariance matrix—a measure of our current uncertainty about the state. High (process noise) means the dynamics model is unreliable; the prediction step increases uncertainty.
Update step
When a measurement arrives, compute the Kalman gain:
is a weighting matrix that balances trust in the model prediction vs. trust in the measurement. When is small (accurate sensors), is large and the measurement dominates. When is large (noisy sensors), is small and the prediction dominates.
Then update the estimate:
The term is the innovation—the discrepancy between what we measured and what the model predicted we would measure. The Kalman filter is an optimal way to use that discrepancy to correct the state estimate.
Python Implementation: 1D Kalman Filter
The following code implements a simple 1D Kalman filter to track a robot's position based on a velocity model and noisy position measurements.
import numpy as np
class KalmanFilter1D:
def __init__(self, x0, p0, q, r):
self.x = x0 # Initial state estimate
self.p = p0 # Initial error covariance
self.q = q # Process noise covariance
self.r = r # Measurement noise covariance
def predict(self, u, dt=0.1):
"""
Prediction step: x = A*x + B*u
A = 1 (constant position model), B = dt
"""
# Predicted state estimate
self.x = self.x + u * dt
# Predicted error covariance
self.p = self.p + self.q
def update(self, z):
"""
Update step: Correct estimate with measurement z
H = 1 (direct position measurement)
"""
# Innovation (measurement residual)
innovation = z - self.x
# Kalman Gain: K = P / (P + R)
k = self.p / (self.p + self.r)
# Updated state estimate
self.x = self.x + k * innovation
# Updated error covariance
self.p = (1 - k) * self.p
return self.x, self.p
# Simulation setup
kf = KalmanFilter1D(x0=0, p0=1.0, q=0.1, r=5.0)
true_pos = 0
velocity = 1.0
dt = 0.1
print("Step | True Pos | Measurement | KF Estimate | Error")
print("-" * 50)
for step in range(10):
true_pos += velocity * dt
measurement = true_pos + np.random.normal(0, np.sqrt(5.0))
kf.predict(u=velocity, dt=dt)
est_x, est_p = kf.update(z=measurement)
print(f"{step:4d} | {true_pos:8.2f} | {measurement:11.2f} | {est_x:11.2f} | {abs(true_pos-est_x):5.2f}")
Extended Kalman Filter (EKF)
Robot dynamics and sensors are nonlinear. The EKF extends the Kalman filter by linearizing the nonlinear functions around the current estimate at each timestep:
Python Implementation: Extended Kalman Filter (Nonlinear)
The following code illustrates the EKF principle by tracking a robot moving along a curved path ().
def ekf_predict(x, P, u, dt, Q):
"""
Nonlinear prediction: x = f(x, u)
f(x, u) = [x + v*cos(theta)*dt, y + v*sin(theta)*dt, theta + omega*dt]
"""
v, omega = u
theta = x[2]
# Nonlinear transition
x_next = x + np.array([
v * np.cos(theta) * dt,
v * np.sin(theta) * dt,
omega * dt
])
# Jacobian of f with respect to x (F)
F = np.array([
[1, 0, -v * np.sin(theta) * dt],
[0, 1, v * np.cos(theta) * dt],
[0, 0, 1]
])
P_next = F @ P @ F.T + Q
return x_next, P_next
# Initial state [x, y, theta]
x_est = np.array([0.0, 0.0, 0.0])
P_est = np.eye(3)
u_input = np.array([1.0, 0.1]) # velocity, angular velocity
dt_val = 0.1
Q_noise = np.eye(3) * 0.01
x_est, P_est = ekf_predict(x_est, P_est, u_input, dt_val, Q_noise)
print(f"Predicted State: {x_est}")
These Jacobians replace and in the Kalman equations, giving a locally-linear approximation that is recomputed at every step.
Limitations
- Accuracy depends on how well the local linearization approximates the true nonlinearity.
- Can diverge when the system is highly nonlinear or the initial estimate is far from the truth.
- Covariance propagation via does not capture higher-order uncertainty.
Despite these limitations, EKFs remain a practical workhorse in robotics.
Beyond the EKF in modern legged systems
For legged robots specifically, EKFs have been augmented or partially replaced by:
- Invariant EKFs (InEKFs): exploit the Lie group structure of to improve linearization accuracy for orientation and velocity estimation. Used in MIT Cheetah and related systems.
- Factor graph estimators (e.g. GTSAM, iSAM2): model the full history of states and measurements as a probabilistic graphical model, enabling loop closure and long-horizon consistency. Standard in SLAM systems.
- Contact-aided state estimators: use contact detection (from foot force sensors or contact switches) as an additional constraint to improve base velocity estimation when visual or external references are unavailable.
Most modern locomotion deployments fuse IMU, encoders, and contact detection into a proprioceptive estimator for base velocity and orientation—often implemented as an EKF or InEKF—because GPS and visual odometry are unreliable in unstructured environments.
Sensor fusion
No single sensor is sufficient. Sensor fusion combines complementary modalities to obtain estimates that are more accurate and more robust than any individual sensor.
| Sensor | Strength | Weakness | |---|---|---| | IMU | High-rate, no external reference | Drifts over time | | Encoders | Accurate joint state | No global pose information | | Vision | Global position, loop closure | Slow, fragile to occlusion/lighting | | Contact detection | Direct contact state | Binary, noisy at transitions |
Key principle: dynamics provide short-term prediction; exteroceptive sensing (vision, lidar) provides long-term correction. This is the same structure as a predictor-corrector and mirrors the belief update in a POMDPPartially Observable Markov Decision Process.
Learning connection
Understanding dynamics and state estimation is not background material—it is directly embedded in the structure of modern robot learning systems.
Privileged information and teacher-student frameworks
In simulation, a policy can be trained with access to the true state: exact base velocity, ground reaction forces, contact states, terrain height under each foot. None of these are directly available on hardware. This gap is not an implementation detail—it is a fundamental challenge in sim-to-real transfer.
The standard solution in locomotion RLReinforcement Learning (RMA, Walk These Ways, DreamWaQ, and others) is a teacher-student framework:
- A teacher policy trains in simulation with privileged state access.
- A student policy (or adaptation module) learns to estimate the privileged information from the observable history—proprioception, IMU, encoder data—and then either distills the teacher's behavior or adapts the policy latent code at deployment.
The adaptation module is, in effect, a learned state estimator. It is doing the same job as the EKF described above—inferring hidden state from a sequence of noisy observations— but it learns the mapping from data rather than deriving it analytically from the dynamics model. Understanding what it is trying to estimate, and why, requires understanding the dynamics and observation models introduced in this lecture.
POMDPs and history-conditioned policies
When state estimation is imperfect, the robot operates in a Partially Observable Markov Decision Process (POMDPPartially Observable Markov Decision Process). The true state is hidden; only observations are available. The optimal policy in a POMDPPartially Observable Markov Decision Process is a function of the belief state —the full posterior over states given history.
Policies that stack recent observations (a fixed-length history window) or use recurrent architectures (LSTM, transformer) are implicitly approximating this belief state. The transformer memory over proprioceptive history in recent locomotion policies is performing the same computation as the Kalman smoother—integrating new observations with prior predictions—but learned from data.
The connection is direct: the Kalman filter you derived above is a special case of optimal belief state computation. Recurrent policies are a general learned approximation to the same problem.
Sim-to-real failure and estimation errors
Empirically, errors in state estimation are among the most common causes of sim-to-real failure—often more significant than dynamics mismatch. A policy trained assuming perfect base velocity estimation will behave unpredictably when the on-hardware estimator lags, drifts, or becomes inconsistent during fast maneuvers or at contact transitions. Designing policies that are robust to estimation noise—through domain randomization on observation noise, or by training with a realistic observation model—requires understanding the estimation pipeline described in this lecture.
Key takeaways
The concepts in this lecture form a chain that connects physics to perception to learning:
The Euler–Lagrange equations give the structure of robot dynamics, including properties (positive definiteness of , passivity of ) that control designers and stability theorists rely on. Newton–Euler gives us an efficient way to compute those dynamics in real time. Floating-base dynamics extends the picture to legged systems, where contact forces, friction cone constraints, and hybrid switching govern what the robot can and cannot do. Sensors provide noisy, partial windows into the true state, modeled probabilistically. The Kalman filter is the optimal estimator under linear-Gaussian assumptions, combining dynamics prediction with measurement correction in a principled way. The EKF extends this to nonlinear systems and remains the workhorse of real-robot state estimation. And all of this connects directly to learning: privileged training, teacher-student distillation, and history-conditioned policies are architectural responses to the state estimation problem described in this lecture.
Conceptual questions
-
The inertia matrix is positive definite. What would it mean physically if were singular at some configuration, and why would this break simulation?
-
A legged robot is executing a trot gait. Describe the sequence of contact mode switches during one stride. Why does each switch constitute a change in the robot's dynamical system, not just a change in its configuration?
-
A Kalman filter is running on a legged robot's IMU and encoder data. The filter consistently underestimates the robot's base velocity during fast accelerations. What could cause this, and what would you change in the filter to address it?
-
An RLReinforcement Learning locomotion policy is trained in simulation with access to the true contact state (which feet are in contact). At deployment, contact state must be estimated from foot force sensors with significant noise. Describe two different architectural strategies for handling this gap and what each one assumes about the estimation problem.
-
Why is a recurrent policy that conditions on a window of proprioceptive history performing a computation analogous to Bayesian filtering? What assumptions does the Kalman filter make that such a policy does not?
Looking ahead
With dynamics and state estimation in place, we now ask:
How do we control these systems reliably?
Week 3: Control Fundamentals
We study feedback control, stability, optimal control (LQR, MPC), and how these ideas
connect directly to reinforcement learning.
Further reading
- Featherstone, R. (2014). Rigid Body Dynamics Algorithms. Springer. (Crucial for understanding how physics simulators compute floating-base dynamics).
- Thrun, Burgard, Fox (2005). Chapter 3: Gaussian Filters (EKF, UKF).