Week 1: Robot Modeling and Kinematics

Purpose of this lecture#

Before robots can learn, plan, or reason, they must be modeled.

Robot learning algorithms — imitation learning, reinforcement learning, and foundation models — do not replace classical robotics. They sit on top of it. Without a precise description of how a robot moves, how its joints affect its body, and how its geometry constrains motion, learning-based methods become brittle, unsafe, or uninterpretable.

This lecture establishes the geometric and kinematic foundations required for all subsequent topics in robot learning. The emphasis is mathematical: we will derive the forward kinematic map using screw theory, formalize the numerical inverse kinematics update rule, define the manipulability measure and its geometric interpretation, and connect workspace analysis to reward function design. These are not peripheral topics — every learning algorithm in the subsequent thirteen weeks operates within the kinematic structure introduced here.

Configuration spaces#

The configuration of a robot specifies its complete geometric state. The configuration space (C-space) is the set of all possible configurations.

For a fixed-base manipulator with $n$ joints, the configuration is the vector of joint variables $q = (q_1, q_2, \dots, q_n)$ . Revolute joints contribute angles on the circle $\mathbb{T}$ ; prismatic joints contribute translations in $\mathbb{R}$ . The full C-space is thus a product manifold of circles and lines, not simply $\mathbb{R}^n$ — a distinction that matters when measuring distances between configurations or designing exploration strategies for RL.

For mobile robots, configuration includes the planar pose $(x, y, \theta)$ , while for legged robots the configuration is composed of a floating base — the 6 DoF pose of the root body — combined with the joint angles for each limb. C-spaces easily exceed 30–50 dimensions for modern legged systems, and the dimensionality directly determines the difficulty of exploration, planning, and policy learning.

Floating-base systems and underactuation#

A key distinction for legged and mobile robots: the base is an unactuated degree of freedom. Unlike a fixed-base manipulator where every DoF has a dedicated motor, a legged robot cannot directly command its base position or orientation. Base motion arises indirectly through contact forces at the feet.

This has immediate consequences for learning. The policy must reason about contact implicitly or explicitly. State estimation (Week 2) is necessary because base pose is not directly sensed with a rigid reference. Reward functions that penalize base motion must account for the fact that the robot cannot directly minimize it — it can only do so by learning the right contact strategy. This distinction between actuated joints and unactuated base is one of the most important conceptual boundaries between manipulation and locomotion RL.

Computational representation#

In practice, robot kinematics are specified in URDF (Unified Robot Description Format) or MJCF (MuJoCo XML) files. These formats encode the kinematic tree (parent-child joint hierarchy), joint types, limits, and axes, along with link masses, inertias, and collision geometry. When you load a robot in IsaacLab, MuJoCo, or PyBullet, you are instantiating this kinematic tree. Understanding the structure of these files makes it far easier to debug simulation errors, set up reward functions, and implement custom controllers.

Forward kinematics#

Forward kinematics (FK) maps joint configurations to end-effector pose: $f(q) \rightarrow T_{ee} \in SE(3)$ . Given $q$ , there is exactly one end-effector pose — FK is always well-defined and computationally inexpensive.

The classical formalization of FK uses Denavit-Hartenberg (DH) parameters, which assign four parameters per joint to describe the relative transform between adjacent link frames. While DH conventions are widely used in industry, they have a fundamental limitation: the frame assignment is not unique and the parameters are sensitive to singularities in the convention itself. The modern alternative, based on the Product of Exponentials formula from screw theory, is geometrically cleaner and integrates naturally with the $SE(3)$ structure needed for learning-based methods.

Product of Exponentials formula#

In the screw-theoretic approach, each joint axis is represented as a screw axis $\mathcal{S}_i \in \mathbb{R}^6$ , defined in the fixed base frame when the robot is at its zero configuration $M$ . The screw axis captures both the direction of joint motion and the coupling between translation and rotation:

\mathcal{S}_i = \begin{pmatrix} \omega_i \\ v_i \end{pmatrix}

where $\omega_i \in \mathbb{R}^3$ is the unit rotation axis of joint $i$ and $v_i = -\omega_i \times r_i$ for revolute joints (with $r_i$ a point on the axis) or $v_i = \hat{d}_i$ for prismatic joints (with $\hat{d}_i$ the direction of translation). The $4\times 4$ matrix exponential of the screw axis scaled by the joint variable $q_i$ yields a rigid body transform:

e^{[\mathcal{S}_i]q_i} = \begin{bmatrix} e^{[\omega_i]q_i} & (I - e^{[\omega_i]q_i})(\omega_i \times v_i) + \omega_i \omega_i^T v_i q_i \\ 0 & 1 \end{bmatrix}

where $[\omega_i]$ is the $3\times 3$ skew-symmetric matrix corresponding to $\omega_i$ , and $e^{[\omega_i]q_i}$ is computed via Rodrigues' rotation formula. The full forward kinematics map is then the Product of Exponentials (PoE):

T_{sb}(q) = e^{[\mathcal{S}_1]q_1}\, e^{[\mathcal{S}_2]q_2} \cdots e^{[\mathcal{S}_n]q_n}\, M

where $M \in SE(3)$ is the end-effector pose at the zero configuration. Each factor $e^{[\mathcal{S}_i]q_i}$ is itself an element of $SE(3)$ , and their composition is also in $SE(3)$ — the PoE formula respects the group structure exactly, with no approximation or singularity in the parameterization.

The PoE formula has several advantages over DH conventions that matter for robot learning. The screw axes $\mathcal{S}_i$ are defined in a single global frame, making them easy to specify geometrically without a DH frame-assignment procedure. The formula differentiates cleanly: the Jacobian columns are the transformed screw axes, which we will derive below. And the exponential map connects directly to Lie algebra operations that appear throughout equivariant learning architectures and geometric trajectory optimization.

Key property#

Forward kinematics is deterministic, efficient, and always well-defined. It serves as a differentiable function through which gradients from task-space losses can propagate to joint-space parameters — an essential property for trajectory optimization, operational-space control, and learning architectures that reason in end-effector coordinates.

Inverse kinematics#

Inverse kinematics (IK) solves the opposite problem: given a desired end-effector pose $T^{\text{des}} \in SE(3)$ , find joint angles $q$ such that $f(q) = T^{\text{des}}$ . This problem is generically harder than FK and has no closed-form solution for most robot geometries.

The challenges#

The IK problem exhibits three structural difficulties that have direct consequences for how learning systems handle it. First, there may be multiple solutions: a redundant robot (with more than 6 DoF) has infinitely many joint configurations that achieve any reachable end-effector pose, and even a non-redundant robot typically has a discrete set of solutions (elbow up vs. elbow down, wrist flip). Second, there may be no solution: not all points in task space are reachable from all configurations, and even within the reachable workspace, not all orientations are achievable at all positions. Third, solutions may be numerically ill-conditioned near singular configurations, where small changes in the desired pose require large changes in joint angles.

Numerical IK: the iterative update rule#

The standard approach for general robot geometries is iterative numerical IK. Starting from an initial configuration $q_0$ , the algorithm computes a small joint-space step that reduces the task-space error at each iteration. Let the task-space error be $\Delta x \in \mathbb{R}^6$ , encoding both position and orientation error between the current end-effector pose and the desired pose. The orientation error is computed as the logarithm of the relative rotation $\log(R_{\text{des}}^T R_{\text{current}}) \in \mathfrak{so}(3)$ , which gives an axis-angle representation free from gimbal lock.

The linearized relationship between joint velocity and task-space velocity is $\dot{x} = J(q)\dot{q}$ . Treating the error as a desired velocity and inverting gives the pseudoinverse IK update:

\Delta q = J^+(q)\,\Delta x

where $J^+(q) = J^T(q)(J(q)J^T(q))^{-1}$ is the Moore-Penrose pseudoinverse of the Jacobian. This update is the minimum-norm joint displacement that achieves the desired task-space step, and iterating it from $q_0$ converges to a solution (if one exists nearby) for well-conditioned configurations.

Damped Least Squares (Levenberg-Marquardt)#

Near singular configurations, $J(q)J^T(q)$ becomes ill-conditioned and its inverse amplifies errors catastrophically — small task-space errors produce enormous joint-space steps. The Damped Least Squares (DLS) method, also known as the Levenberg-Marquardt regularization, addresses this by adding a damping term $\lambda^2 I$ to the matrix being inverted:

\Delta q = J^T(q)\bigl(J(q)J^T(q) + \lambda^2 I\bigr)^{-1}\,\Delta x

The scalar $\lambda > 0$ is the damping coefficient. When the Jacobian is full rank and the configuration is well-conditioned, the damping term is negligible and the update approximates the pseudoinverse update. As the robot approaches a singularity and $\det(JJ^T) \to 0$ , the damping term dominates and bounds the joint velocities at the cost of allowing some task-space error. This graceful degradation near singularities is the key property: DLS never generates unbounded joint commands, making it safe for hardware deployment.

The choice of $\lambda$ trades off between IK accuracy (small $\lambda$ , more susceptible to singularities) and smoothness (large $\lambda$ , always safe but less accurate). Adaptive schemes set $\lambda$ as a function of the smallest singular value of $J$ , providing singularity-dependent damping that is tight away from singularities and conservative near them.

Learning connection#

The relationship between IK and learned policies is determined by the action space. Task-space policies — such as diffusion policy and ACT, which output desired end-effector poses — must solve IK as a post-processing step, either via analytical IK (for specific geometries), numerical DLS IK, or by learning an implicit IK through end-to-end training. Joint-space policies — which dominate locomotion RL and are common in torque-controlled manipulation — output joint positions, velocities, or torques directly, bypassing IK entirely. Understanding which regime the policy operates in determines what geometric structure it must learn and where IK-induced errors can enter the pipeline.

Rigid body transforms and SE(3)#

Robot motion occurs in 3D rigid body space. A rigid body pose is an element of the Special Euclidean group:

SE(3) = \left\{ \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \;\middle|\; R \in SO(3), \; t \in \mathbb{R}^3 \right\}

where $SO(3)$ is the group of 3D rotations and $t \in \mathbb{R}^3$ is the translation.

Why naive Euclidean loss on rotation is wrong#

Rotation is not a vector space. Common rotation representations each carry failure modes when treated as flat Euclidean quantities. Euler angles suffer from gimbal lock — at certain configurations, two rotational axes align and a degree of freedom is lost. Quaternions have the double cover property: $q$ and $-q$ represent the same rotation. A loss function $\|q_1 - q_2\|^2$ computed naively treats them as maximally different when they are in fact identical — the antipodal problem. Rotation matrices are overconstrained ( $9$ parameters for $3$ DoF); unconstrained optimization produces outputs that are not valid rotations.

The geometrically correct notion of distance between rotations is the geodesic distance on $SO(3)$ :

d(R_1, R_2) = \left\| \log(R_1^T R_2) \right\|_F

where $\log$ is the matrix logarithm mapping back to the Lie algebra $\mathfrak{so}(3)$ . When supervising orientation targets, geodesic or axis-angle losses should be used rather than MSE on raw quaternion components. When averaging poses (e.g., in ensemble policies or diffusion rollouts), averaging in the tangent space rather than the ambient representation space produces geometrically consistent results.

Several modern architectures exploit $SE(3)$ structure directly. Equivariant networks (SE(3)-Transformers, ESCNN) build in symmetry so that rotating the scene rotates the output predictably, improving sample efficiency. Six-DoF diffusion policies operate in $SE(3)$ and require proper geodesic noise schedules. Foundation models (ACT, $\pi_0$ , GR00T) embed geometric inductive biases either explicitly in their action representations or implicitly through large-scale training on pose-annotated data.

Jacobians and differential kinematics#

The Jacobian relates joint velocities to end-effector velocities:

\dot{x} = J(q)\,\dot{q}

where $\dot{x} \in \mathbb{R}^6$ is the task-space velocity (3 linear + 3 angular) and $\dot{q} \in \mathbb{R}^n$ is the vector of joint velocities. In the PoE formulation, the $i$ -th column of the body Jacobian is the screw axis $\mathcal{S}_i$ transformed to the current configuration by the partial forward kinematics up to joint $i$ , making the Jacobian a natural byproduct of the FK computation.

Building intuition: planar 2-DoF example#

Consider a 2-link planar arm with link lengths $l_1, l_2$ and joint angles $q_1, q_2$ . The end-effector position is:

x = l_1 \cos q_1 + l_2 \cos(q_1 + q_2), \quad y = l_1 \sin q_1 + l_2 \sin(q_1 + q_2)

Differentiating:

J(q) = \frac{\partial (x,y)}{\partial (q_1, q_2)} = \begin{bmatrix} -l_1 \sin q_1 - l_2 \sin(q_1+q_2) & -l_2 \sin(q_1+q_2) \\ l_1 \cos q_1 + l_2 \cos(q_1+q_2) & l_2 \cos(q_1+q_2) \end{bmatrix}

Python Implementation: 2-DOF Forward Kinematics#

The following code implements the forward kinematics and Jacobian calculation for the 2-link planar arm described above.

python · runs in browser

import numpy as np

def compute_forward_kinematics(q, l1=1.0, l2=1.0):
    """
    Computes the end-effector (x, y) position for a 2-DOF planar arm.
    
    Args:
        q: array-like of length 2 [q1, q2] in radians
        l1, l2: lengths of the first and second links
    """
    q1, q2 = q
    
    # x = l1*cos(q1) + l2*cos(q1 + q2)
    x = l1 * np.cos(q1) + l2 * np.cos(q1 + q2)
    
    # y = l1*sin(q1) + l2*sin(q1 + q2)
    y = l1 * np.sin(q1) + l2 * np.sin(q1 + q2)
    
    return np.array([x, y])

def compute_jacobian(q, l1=1.0, l2=1.0):
    """
    Computes the 2x2 geometric Jacobian for a 2-DOF planar arm.
    """
    q1, q2 = q
    
    # Define partial derivatives for J
    j11 = -l1 * np.sin(q1) - l2 * np.sin(q1 + q2)
    j12 = -l2 * np.sin(q1 + q2)
    j21 = l1 * np.cos(q1) + l2 * np.cos(q1 + q2)
    j22 = l2 * np.cos(q1 + q2)
    
    return np.array([[j11, j12], 
                     [j21, j22]])

# Example: Arm at [45 deg, 45 deg]
q_test = np.array([np.pi/4, np.pi/4])
pos = compute_forward_kinematics(q_test)
J = compute_jacobian(q_test)

print(f"End-effector position: {pos}")
print(f"Jacobian determinant: {np.linalg.det(J):.4f}")

Python Implementation: Numerical Inverse Kinematics#

While forward kinematics is analytical, inverse kinematics often requires numerical methods for complex chains. Here is a simple Jacobian-based IK solver.

python · runs in browser

def compute_inverse_kinematics(target_pos, q_initial, max_steps=100, tol=1e-3):
    """
    Solves for q that produces target_pos using Newton's method.
    """
    q = q_initial.copy()

    for i in range(max_steps):
        current_pos = compute_forward_kinematics(q)
        error = target_pos - current_pos

        if np.linalg.norm(error) < tol:
            print(f"IK converged in {i} steps")
            return q

        # q_next = q + J_inv * error
        J = compute_jacobian(q)
        # Use pseudo-inverse for robustness near singularities
        q = q + np.linalg.pinv(J) @ error

    return q

# Test IK
target = np.array([1.5, 0.5])
q_sol = compute_inverse_kinematics(target, np.array([0.1, 0.1]))
print(f"Solved joint angles: {q_sol}")
print(f"Verification pos: {compute_forward_kinematics(q_sol)}")

This matrix tells you how each joint velocity contributes to end-effector velocity. When $\det(J) = 0$ , the robot is at a singularity — one direction of task-space motion becomes uncontrollable.

Geometric vs. analytic Jacobian#

The geometric Jacobian $J_g$ relates joint velocities to the physical linear and angular velocity of the end-effector, with angular velocity expressed as $\omega \in \mathbb{R}^3$ directly. The analytic Jacobian $J_a$ relates joint velocities to the time derivative of a chosen rotation parametrization (e.g., $\dot{\phi}$ for Euler angles). It depends on the representation and is not defined at parametrization singularities. Mixing rotation representations and Jacobians — computing a control law with the analytic Jacobian but measuring error in quaternion space — introduces errors. The geometric Jacobian should be used unless there is a specific reason for the analytic form.

Uses#

The Jacobian appears throughout the robot learning stack: in velocity-level control and impedance control; in force control via the transpose $\tau = J^T F$ ; in sensitivity analysis and workspace characterization; and in hybrid learning-plus-control pipelines using operational-space controllers. Jacobian-based control is often embedded as a differentiable layer inside learned pipelines. Operational-space controllers use $J$ to map policy outputs from task space to joint torques. Impedance control wraps a stiffness law around $J^T$ , providing compliant behavior that RL policies can learn to exploit. Gradient-based trajectory optimization differentiates through $J(q)$ to propagate task-space costs into joint space.

Singularities and manipulability#

A singularity occurs when the Jacobian loses rank — $\text{rank}(J(q)) < \min(m, n)$ . At singular configurations, one or more directions of task-space motion become uncontrollable. Achieving motion in those directions requires theoretically infinite joint velocities, and numerically the pseudoinverse amplifies errors catastrophically, causing large commanded torques and potential hardware damage.

A concrete hardware failure mode#

Consider a 6-DoF arm approaching a wrist singularity: the last three joint axes become coplanar, collapsing the rank of the wrist Jacobian. A policy executing a trajectory that passes through this configuration generates a torque spike. In simulation this may be suppressed by joint velocity limits; on hardware it triggers emergency stops or damages actuators. This is a realistic and well-documented failure mode in sim-to-real transfer, and it arises precisely because the policy was not trained with awareness of Jacobian conditioning.

Manipulability ellipsoids#

The concept of manipulability, introduced by Yoshikawa (1985), provides a scalar measure of how far a configuration is from singularity and how well the robot can generate motion and force in different directions. The manipulability measure is defined as:

w(q) = \sqrt{\det\bigl(J(q)J(q)^T\bigr)}

This scalar is zero at singular configurations (where $\det(JJ^T) = 0$ ) and strictly positive elsewhere. Larger values of $w(q)$ indicate that the robot has greater ability to generate end-effector velocities in all directions simultaneously.

The manipulability ellipsoid gives a geometric picture of directional capability. It is the image of the unit ball in joint-velocity space under the map $\dot{q} \mapsto \dot{x} = J(q)\dot{q}$ . Formally, the set of achievable task-space velocities for unit joint speed is:

\mathcal{E}(q) = \{\dot{x} : \dot{x}^T (JJ^T)^{-1} \dot{x} \leq 1\}

This is an ellipsoid in task space whose semi-axes are the singular values $\sigma_1, \sigma_2, \ldots, \sigma_m$ of $J(q)$ , oriented along the corresponding left singular vectors. The longest axis of the ellipsoid points in the direction of maximum end-effector velocity per unit joint speed — the most dexterous direction. The shortest axis points toward the near-singular direction where even large joint velocities produce little end-effector motion.

Physically, this means a robot at a stretched-out configuration (near a boundary of the workspace) has a very elongated manipulability ellipsoid: it can move easily along one direction but has poor speed and force transmission in others. A robot near the center of its workspace typically has a more spherical ellipsoid, indicating uniform capability across directions. The same ellipsoid (inverted, via the force-velocity duality $\tau = J^T F$ ) characterizes the force ellipsoid — directions in which the robot can generate contact forces most efficiently.

Implications for learning#

Manipulability has direct implications for reward design and policy evaluation. A policy that routinely passes through low-manipulability configurations will generate large torque commands that stress actuators and cause sim-to-real failures. Dataset design for imitation learning should filter demonstrations that approach singular configurations, or add corrective perturbations that teach recovery. In RL, the manipulability measure $w(q)$ can be incorporated as an auxiliary reward term — encouraging the policy to maintain well-conditioned configurations — or as a constraint in a CBF-QP safety filter. Empirically, adding even a small manipulability penalty to the reward function significantly reduces the frequency of near-singular trajectories and improves hardware deployment success rates.

Workspace analysis#

The workspace of a robot is the set of end-effector poses that are reachable by some joint configuration. Understanding workspace geometry is not merely a static fact about the robot's hardware — it directly determines what tasks are feasible, how reward functions must be designed, and how curriculum learning should be structured. These connections are often underappreciated by practitioners who focus on algorithmic choices while treating workspace constraints as a background assumption.

The reachable workspace consists of all poses $T \in SE(3)$ for which there exists at least one $q \in \mathcal{Q}_{\text{valid}}$ (respecting joint limits) such that $f(q) = T$ . The dexterous workspace is the more restrictive subset of poses achievable across all end-effector orientations — it is typically a much smaller region concentrated near the robot's kinematic center. For manipulation tasks where orientation matters (grasping, insertion, surface-following), the dexterous workspace is the operationally relevant constraint, not the larger reachable workspace.

Workspace analysis shapes reward function design in non-obvious ways. A naive dense reward that penalizes the distance from the end-effector to a goal position will push the policy to command configurations that move toward the goal — but if the goal is outside the reachable workspace, the policy will drive joint configurations toward their limits and potentially into singularities in pursuit of an impossible target. The correct practice is to verify that all sampled goals during training lie within the reachable workspace, or to include a workspace penalty that discourages the policy from commanding end-effector targets that require configurations near joint limits or singularities. Formally, this can be implemented as an auxiliary reward term $-\mathbb{1}[f(q) \notin \mathcal{W}_{\text{reachable}}]$ or as a distance penalty from the workspace boundary.

Curriculum learning must also respect workspace geometry. As task difficulty increases during training — for instance, by sampling goal positions from progressively larger regions of space — the curriculum schedule must ensure that the sampled goals remain within the reachable workspace. Goals progressively closer to the workspace boundary increase both geometric difficulty (the robot must approach configurations with lower manipulability) and numerical difficulty (IK solutions near the boundary are sensitive to small perturbations). A well-designed curriculum tracks manipulability across the training distribution and avoids the regime where frequent near-singular configurations begin damaging hardware.

The task-specific workspace — the subset of the reachable workspace that is actually required by the deployment task — is an important practical abstraction. For a table-top pick-and-place task, the task-specific workspace is a small region above the table surface. Training a policy that explores only this region (rather than the full reachable workspace) converges faster, produces more precise policies, and avoids the waste of training capacity on configurations that will never be needed at deployment.

Why kinematics still matters in the age of learning#

Modern robot learning systems still execute actions in joint space or task space, still collide with physical joint limits and singular configurations, and still fail when geometric structure is ignored in loss functions or reward design. Even the most capable foundation models — ACT, $\pi_0$ , GR00T — either rely on kinematic structure for supervision, embed geometric inductive biases in their architectures, or implicitly learn them from massive pose-annotated datasets. In each case, the underlying geometry does not disappear. It is either handled correctly or it becomes a source of failure.

Learning does not replace kinematics. It builds on it.

Key takeaways#

The concepts introduced this week form a dependency chain, not a flat list of independent facts. C-space defines the space in which the robot lives. Forward kinematics — formalized most cleanly through the Product of Exponentials formula — maps points in that space to geometry in the world, using screw axis representations that respect the Lie group structure of $SE(3)$ . The Jacobian linearizes FK locally, giving a first-order picture of how joint motion produces task-space motion and providing the core operator for numerical IK via the pseudoinverse update $\Delta q = J^+(q)\,\Delta x$ . Damped Least Squares IK regularizes this update near singularities by adding $\lambda^2 I$ to the inversion, bounding joint commands at the cost of some task-space accuracy. Singularities are the configurations where the Jacobian loses rank; the manipulability measure $w(q) = \sqrt{\det(JJ^T)}$ quantifies proximity to singularity, and the manipulability ellipsoid characterizes directional force and velocity transmission geometrically. Workspace analysis characterizes globally where the robot can and cannot reach, and these constraints propagate directly into reward function design, curriculum scheduling, and action-space parameterization. Throughout, the correct mathematical home for rigid body poses is $SE(3)$ — a structure that must be respected in any learning system that reasons about robot geometry.

Conceptual questions#

A locomotion policy is trained entirely in joint space and never explicitly reasons about the robot's base pose. In what sense does $SE(3)$ geometry still matter for this policy? How does the fact that the base is unactuated appear in the RL problem formulation?
You are training a manipulation policy using behavioral cloning. Your action space is end-effector pose represented as a quaternion plus translation. You use MSE loss on the raw quaternion components. What can go wrong because of the antipodal problem, and how would you fix it using a geodesic loss?
A 7-DoF robot arm has more joints than the 6 DoF needed to specify an end-effector pose. How does this redundancy appear in the Jacobian — specifically, what is $\text{rank}(J)$ and what is the dimension of its null space? How might a policy exploit the null space to satisfy secondary objectives (e.g., joint limit avoidance or manipulability maximization) while achieving the primary task?
The manipulability measure $w(q) = \sqrt{\det(J(q)J(q)^T)}$ is incorporated as an auxiliary reward term during RL training with a weight of $\alpha = 0.1$ . Describe the geometric effect on the policy's preferred configurations. What failure mode could arise if $\alpha$ is set too high relative to the task reward? How does the manipulability ellipsoid of the learned policy differ from a policy trained with $\alpha = 0$ ?
A task-space policy outputs desired end-effector poses at 10 Hz, which are tracked by DLS numerical IK. The robot approaches a wrist singularity ( $\sigma_{\min}(J) < 0.05$ ) mid-trajectory. With $\lambda = 0$ (pure pseudoinverse), describe the failure mode in terms of joint velocities. With $\lambda = 0.1$ (damped), describe how the behavior changes and what task-space error is introduced. At what point would you recommend the safety filter (Week 12) should engage, and what fallback behavior would it trigger?

Solutions

Even a joint-space policy depends on $SE(3)$ because the base is a floating, unactuated body whose pose evolves only through contact forces. In the RL formulation the base pose appears in the state (it is needed for balance) but never in the action space; the unactuated base is an underactuation constraint, so the policy must steer base motion indirectly through contacts.
Quaternions double-cover $SO(3)$ ( $q$ and $-q$ are the same rotation), so MSE on the raw components penalizes identical rotations differently and gives discontinuous targets. Use a geodesic loss invariant to the sign ambiguity, e.g. $1 - |\langle q_{\text{pred}}, q_{\text{target}}\rangle|$ or the rotation angle $2\arccos|\langle q_{\text{pred}}, q_{\text{target}}\rangle|$ .
$J$ is $6\times 7$ , so generically $\operatorname{rank}(J)=6$ and the null space has dimension $7-6=1$ . The policy can add null-space motion $\dot q = (I - J^{+}J)\dot q_0$ that changes the configuration without moving the end-effector, spending that one degree of freedom on secondary objectives like joint-limit avoidance or manipulability maximization.
Adding $\alpha\, w(q)$ biases the policy toward well-conditioned configurations away from singularities (a rounder manipulability ellipsoid). If $\alpha$ is too large the policy sacrifices task reward to chase manipulability — avoiding necessary near-singular poses or contorting just to inflate $w$ — and fails the task. With $\alpha=0$ the ellipsoid is shaped only by the task and can be far more anisotropic.
With $\lambda = 0$ , as $\sigma_{\min}(J)\to 0$ the pseudoinverse blows up, producing huge, unsafe joint velocities along the singular direction. With $\lambda = 0.1$ (damped least squares) velocities stay bounded, but a task-space tracking error is introduced in that direction. The Week-12 safety filter should engage once $\sigma_{\min}$ drops below threshold (≈0.05) or velocity limits are hit, triggering a fallback that slows/halts the trajectory or reprojects the target off the singular direction.

Looking ahead#

With kinematics in place, we now add physics and uncertainty.

Week 2: Dynamics and State Estimation. We study robot dynamics through the Euler-Lagrange and Newton-Euler formalisms, floating-base dynamics for legged systems, and how robots estimate their state from noisy sensors through Kalman filtering and its nonlinear extensions — all critical for control and learning in the real world.

Purpose of this lecture#

Before robots can learn, plan, or reason, they must be modeled.

Configuration spaces#

The configuration of a robot specifies its complete geometric state. The configuration space (C-space) is the set of all possible configurations.

Floating-base systems and underactuation#

Computational representation#

Forward kinematics#

Product of Exponentials formula#

\mathcal{S}_i = \begin{pmatrix} \omega_i \\ v_i \end{pmatrix}

e^{[\mathcal{S}_i]q_i} = \begin{bmatrix} e^{[\omega_i]q_i} & (I - e^{[\omega_i]q_i})(\omega_i \times v_i) + \omega_i \omega_i^T v_i q_i \\ 0 & 1 \end{bmatrix}

T_{sb}(q) = e^{[\mathcal{S}_1]q_1}\, e^{[\mathcal{S}_2]q_2} \cdots e^{[\mathcal{S}_n]q_n}\, M

Key property#

Inverse kinematics#

The challenges#

Numerical IK: the iterative update rule#

The linearized relationship between joint velocity and task-space velocity is $\dot{x} = J(q)\dot{q}$ . Treating the error as a desired velocity and inverting gives the pseudoinverse IK update:

\Delta q = J^+(q)\,\Delta x

Damped Least Squares (Levenberg-Marquardt)#

\Delta q = J^T(q)\bigl(J(q)J^T(q) + \lambda^2 I\bigr)^{-1}\,\Delta x

Learning connection#

Rigid body transforms and SE(3)#

Robot motion occurs in 3D rigid body space. A rigid body pose is an element of the Special Euclidean group:

SE(3) = \left\{ \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \;\middle|\; R \in SO(3), \; t \in \mathbb{R}^3 \right\}

where $SO(3)$ is the group of 3D rotations and $t \in \mathbb{R}^3$ is the translation.

Why naive Euclidean loss on rotation is wrong#

The geometrically correct notion of distance between rotations is the geodesic distance on $SO(3)$ :

d(R_1, R_2) = \left\| \log(R_1^T R_2) \right\|_F

Jacobians and differential kinematics#

The Jacobian relates joint velocities to end-effector velocities:

\dot{x} = J(q)\,\dot{q}

Building intuition: planar 2-DoF example#

Consider a 2-link planar arm with link lengths $l_1, l_2$ and joint angles $q_1, q_2$ . The end-effector position is:

x = l_1 \cos q_1 + l_2 \cos(q_1 + q_2), \quad y = l_1 \sin q_1 + l_2 \sin(q_1 + q_2)

Differentiating:

J(q) = \frac{\partial (x,y)}{\partial (q_1, q_2)} = \begin{bmatrix} -l_1 \sin q_1 - l_2 \sin(q_1+q_2) & -l_2 \sin(q_1+q_2) \\ l_1 \cos q_1 + l_2 \cos(q_1+q_2) & l_2 \cos(q_1+q_2) \end{bmatrix}

Python Implementation: 2-DOF Forward Kinematics#

The following code implements the forward kinematics and Jacobian calculation for the 2-link planar arm described above.

python · runs in browser

import numpy as np

def compute_forward_kinematics(q, l1=1.0, l2=1.0):
    """
    Computes the end-effector (x, y) position for a 2-DOF planar arm.
    
    Args:
        q: array-like of length 2 [q1, q2] in radians
        l1, l2: lengths of the first and second links
    """
    q1, q2 = q
    
    # x = l1*cos(q1) + l2*cos(q1 + q2)
    x = l1 * np.cos(q1) + l2 * np.cos(q1 + q2)
    
    # y = l1*sin(q1) + l2*sin(q1 + q2)
    y = l1 * np.sin(q1) + l2 * np.sin(q1 + q2)
    
    return np.array([x, y])

def compute_jacobian(q, l1=1.0, l2=1.0):
    """
    Computes the 2x2 geometric Jacobian for a 2-DOF planar arm.
    """
    q1, q2 = q
    
    # Define partial derivatives for J
    j11 = -l1 * np.sin(q1) - l2 * np.sin(q1 + q2)
    j12 = -l2 * np.sin(q1 + q2)
    j21 = l1 * np.cos(q1) + l2 * np.cos(q1 + q2)
    j22 = l2 * np.cos(q1 + q2)
    
    return np.array([[j11, j12], 
                     [j21, j22]])

# Example: Arm at [45 deg, 45 deg]
q_test = np.array([np.pi/4, np.pi/4])
pos = compute_forward_kinematics(q_test)
J = compute_jacobian(q_test)

print(f"End-effector position: {pos}")
print(f"Jacobian determinant: {np.linalg.det(J):.4f}")

Python Implementation: Numerical Inverse Kinematics#

While forward kinematics is analytical, inverse kinematics often requires numerical methods for complex chains. Here is a simple Jacobian-based IK solver.

python · runs in browser

def compute_inverse_kinematics(target_pos, q_initial, max_steps=100, tol=1e-3):
    """
    Solves for q that produces target_pos using Newton's method.
    """
    q = q_initial.copy()

    for i in range(max_steps):
        current_pos = compute_forward_kinematics(q)
        error = target_pos - current_pos

        if np.linalg.norm(error) < tol:
            print(f"IK converged in {i} steps")
            return q

        # q_next = q + J_inv * error
        J = compute_jacobian(q)
        # Use pseudo-inverse for robustness near singularities
        q = q + np.linalg.pinv(J) @ error

    return q

# Test IK
target = np.array([1.5, 0.5])
q_sol = compute_inverse_kinematics(target, np.array([0.1, 0.1]))
print(f"Solved joint angles: {q_sol}")
print(f"Verification pos: {compute_forward_kinematics(q_sol)}")

This matrix tells you how each joint velocity contributes to end-effector velocity. When $\det(J) = 0$ , the robot is at a singularity — one direction of task-space motion becomes uncontrollable.

Geometric vs. analytic Jacobian#

Uses#

Singularities and manipulability#

A concrete hardware failure mode#

Manipulability ellipsoids#

w(q) = \sqrt{\det\bigl(J(q)J(q)^T\bigr)}

\mathcal{E}(q) = \{\dot{x} : \dot{x}^T (JJ^T)^{-1} \dot{x} \leq 1\}

Implications for learning#

Workspace analysis#

Why kinematics still matters in the age of learning#

Learning does not replace kinematics. It builds on it.

Key takeaways#

Conceptual questions#

A locomotion policy is trained entirely in joint space and never explicitly reasons about the robot's base pose. In what sense does $SE(3)$ geometry still matter for this policy? How does the fact that the base is unactuated appear in the RL problem formulation?
You are training a manipulation policy using behavioral cloning. Your action space is end-effector pose represented as a quaternion plus translation. You use MSE loss on the raw quaternion components. What can go wrong because of the antipodal problem, and how would you fix it using a geodesic loss?
A 7-DoF robot arm has more joints than the 6 DoF needed to specify an end-effector pose. How does this redundancy appear in the Jacobian — specifically, what is $\text{rank}(J)$ and what is the dimension of its null space? How might a policy exploit the null space to satisfy secondary objectives (e.g., joint limit avoidance or manipulability maximization) while achieving the primary task?
The manipulability measure $w(q) = \sqrt{\det(J(q)J(q)^T)}$ is incorporated as an auxiliary reward term during RL training with a weight of $\alpha = 0.1$ . Describe the geometric effect on the policy's preferred configurations. What failure mode could arise if $\alpha$ is set too high relative to the task reward? How does the manipulability ellipsoid of the learned policy differ from a policy trained with $\alpha = 0$ ?
A task-space policy outputs desired end-effector poses at 10 Hz, which are tracked by DLS numerical IK. The robot approaches a wrist singularity ( $\sigma_{\min}(J) < 0.05$ ) mid-trajectory. With $\lambda = 0$ (pure pseudoinverse), describe the failure mode in terms of joint velocities. With $\lambda = 0.1$ (damped), describe how the behavior changes and what task-space error is introduced. At what point would you recommend the safety filter (Week 12) should engage, and what fallback behavior would it trigger?

Solutions

Even a joint-space policy depends on $SE(3)$ because the base is a floating, unactuated body whose pose evolves only through contact forces. In the RL formulation the base pose appears in the state (it is needed for balance) but never in the action space; the unactuated base is an underactuation constraint, so the policy must steer base motion indirectly through contacts.
Quaternions double-cover $SO(3)$ ( $q$ and $-q$ are the same rotation), so MSE on the raw components penalizes identical rotations differently and gives discontinuous targets. Use a geodesic loss invariant to the sign ambiguity, e.g. $1 - |\langle q_{\text{pred}}, q_{\text{target}}\rangle|$ or the rotation angle $2\arccos|\langle q_{\text{pred}}, q_{\text{target}}\rangle|$ .
$J$ is $6\times 7$ , so generically $\operatorname{rank}(J)=6$ and the null space has dimension $7-6=1$ . The policy can add null-space motion $\dot q = (I - J^{+}J)\dot q_0$ that changes the configuration without moving the end-effector, spending that one degree of freedom on secondary objectives like joint-limit avoidance or manipulability maximization.
Adding $\alpha\, w(q)$ biases the policy toward well-conditioned configurations away from singularities (a rounder manipulability ellipsoid). If $\alpha$ is too large the policy sacrifices task reward to chase manipulability — avoiding necessary near-singular poses or contorting just to inflate $w$ — and fails the task. With $\alpha=0$ the ellipsoid is shaped only by the task and can be far more anisotropic.
With $\lambda = 0$ , as $\sigma_{\min}(J)\to 0$ the pseudoinverse blows up, producing huge, unsafe joint velocities along the singular direction. With $\lambda = 0.1$ (damped least squares) velocities stay bounded, but a task-space tracking error is introduced in that direction. The Week-12 safety filter should engage once $\sigma_{\min}$ drops below threshold (≈0.05) or velocity limits are hit, triggering a fallback that slows/halts the trajectory or reprojects the target off the singular direction.

Looking ahead#

With kinematics in place, we now add physics and uncertainty.

Purpose of this lecture#

Configuration spaces#

Floating-base systems and underactuation#

Computational representation#

Forward kinematics#

Product of Exponentials formula#

Key property#

Inverse kinematics#

The challenges#

Numerical IK: the iterative update rule#

Damped Least Squares (Levenberg-Marquardt)#

Learning connection#

Rigid body transforms and SE(3)#

Why naive Euclidean loss on rotation is wrong#

Jacobians and differential kinematics#

Building intuition: planar 2-DoF example#

Python Implementation: 2-DOF Forward Kinematics#

Python Implementation: Numerical Inverse Kinematics#

Geometric vs. analytic Jacobian#

Uses#

Singularities and manipulability#

A concrete hardware failure mode#

Manipulability ellipsoids#

Implications for learning#

Workspace analysis#

Why kinematics still matters in the age of learning#

Key takeaways#

Conceptual questions#

Looking ahead#

Further reading#

Week 1: Robot Modeling and Kinematics

Purpose of this lecture#

Configuration spaces#

Floating-base systems and underactuation#

Computational representation#

Forward kinematics#

Product of Exponentials formula#

Key property#

Inverse kinematics#

The challenges#

Numerical IK: the iterative update rule#

Damped Least Squares (Levenberg-Marquardt)#

Learning connection#

Rigid body transforms and SE(3)#

Why naive Euclidean loss on rotation is wrong#

Jacobians and differential kinematics#

Building intuition: planar 2-DoF example#

Python Implementation: 2-DOF Forward Kinematics#

Python Implementation: Numerical Inverse Kinematics#

Geometric vs. analytic Jacobian#

Uses#

Singularities and manipulability#

A concrete hardware failure mode#

Manipulability ellipsoids#

Implications for learning#

Workspace analysis#

Why kinematics still matters in the age of learning#

Key takeaways#

Conceptual questions#

Looking ahead#

Further reading#