Skip to main content
illumin8
Courses
Week 1: Robot Modeling and Kinematics
Robot Learning
01Week 1: Robot Modeling and Kinematics
02Week 2: Dynamics and State Estimation
03Week 3: Control Fundamentals
04Week 4: Teleoperation and Data Collection
05Week 5: Imitation Learning
06Week 6: Reinforcement Learning for Robotics
07Week 7: Sim2Real Pipelines and IsaacLab
08Week 8: Foundation Models for Manipulation — ACT and Action Chunking
09Week 9: Flow Matching and Diffusion for Robot Policies
10Week 10: Vision–Language–Action Models
11Week 11: Fine-Tuning and Adaptation
12Week 12: Safety, Constraints, and Reliability
13Week 13: Multi-Robot and Multi-Task Learning
14Week 14: Sim2Real Capstone
Week 1

Week 1: Robot Modeling and Kinematics

✦Learning Outcomes
  • Implement forward kinematics using the Product of Exponentials (PoE) formula
  • Solve inverse kinematics using numerical methods (Damped Least Squares)
  • Analyze Jacobians and understand differential kinematics
  • Understand singularities and manipulability measures
  • Connect kinematics to robot learning pipelines
◆Prerequisites

Background knowledge assumed:

  • Linear algebra (vectors, matrices, transformations)
  • Basic programming (Python recommended)

Recommended: Review the Course 2 Index for an overview of the learning path.

Purpose of this lecture

Before robots can learn, plan, or reason, they must be modeled.

Robot learning algorithms — imitation learning, reinforcement learning, and foundation models — do not replace classical robotics. They sit on top of it. Without a precise description of how a robot moves, how its joints affect its body, and how its geometry constrains motion, learning-based methods become brittle, unsafe, or uninterpretable.

This lecture establishes the geometric and kinematic foundations required for all subsequent topics in robot learning. The emphasis is mathematical: we will derive the forward kinematic map using screw theory, formalize the numerical inverse kinematics update rule, define the manipulability measure and its geometric interpretation, and connect workspace analysis to reward function design. These are not peripheral topics — every learning algorithm in the subsequent thirteen weeks operates within the kinematic structure introduced here.


Configuration spaces

The configuration of a robot specifies its complete geometric state. The configuration space (C-space) is the set of all possible configurations.

For a fixed-base manipulator with nnn joints, the configuration is the vector of joint variables q=(q1,q2,…,qn)q = (q_1, q_2, \dots, q_n)q=(q1​,q2​,…,qn​). Revolute joints contribute angles on the circle T\mathbb{T}T; prismatic joints contribute translations in R\mathbb{R}R. The full C-space is thus a product manifold of circles and lines, not simply Rn\mathbb{R}^nRn — a distinction that matters when measuring distances between configurations or designing exploration strategies for RLReinforcement Learning.

For mobile robots, configuration includes the planar pose (x,y,θ)(x, y, \theta)(x,y,θ), while for legged robots the configuration is composed of a floating base — the 6 DoF pose of the root body — combined with the joint angles for each limb. C-spaces easily exceed 30–50 dimensions for modern legged systems, and the dimensionality directly determines the difficulty of exploration, planning, and policy learning.

Floating-base systems and underactuation

A key distinction for legged and mobile robots: the base is an unactuated degree of freedom. Unlike a fixed-base manipulator where every DoF has a dedicated motor, a legged robot cannot directly command its base position or orientation. Base motion arises indirectly through contact forces at the feet.

This has immediate consequences for learning. The policy must reason about contact implicitly or explicitly. State estimation (Week 2) is necessary because base pose is not directly sensed with a rigid reference. Reward functions that penalize base motion must account for the fact that the robot cannot directly minimize it — it can only do so by learning the right contact strategy. This distinction between actuated joints and unactuated base is one of the most important conceptual boundaries between manipulation and locomotion RLReinforcement Learning.

Computational representation

In practice, robot kinematics are specified in URDF (Unified Robot Description Format) or MJCF (MuJoCo XML) files. These formats encode the kinematic tree (parent-child joint hierarchy), joint types, limits, and axes, along with link masses, inertias, and collision geometry. When you load a robot in IsaacLab, MuJoCo, or PyBullet, you are instantiating this kinematic tree. Understanding the structure of these files makes it far easier to debug simulation errors, set up reward functions, and implement custom controllers.


Forward kinematics

Forward kinematics (FK) maps joint configurations to end-effector pose: f(q)→Tee∈SE(3)f(q) \rightarrow T_{ee} \in SE(3)f(q)→Tee​∈SE(3). Given qqq, there is exactly one end-effector pose — FK is always well-defined and computationally inexpensive.

The classical formalization of FK uses Denavit-Hartenberg (DH) parameters, which assign four parameters per joint to describe the relative transform between adjacent link frames. While DH conventions are widely used in industry, they have a fundamental limitation: the frame assignment is not unique and the parameters are sensitive to singularities in the convention itself. The modern alternative, based on the Product of Exponentials formula from screw theory, is geometrically cleaner and integrates naturally with the SE(3)SE(3)SE(3) structure needed for learning-based methods.

Product of Exponentials formula

In the screw-theoretic approach, each joint axis is represented as a screw axis Si∈R6\mathcal{S}_i \in \mathbb{R}^6Si​∈R6, defined in the fixed base frame when the robot is at its zero configuration MMM. The screw axis captures both the direction of joint motion and the coupling between translation and rotation:

Si=(ωivi)\mathcal{S}_i = \begin{pmatrix} \omega_i \\ v_i \end{pmatrix}Si​=(ωi​vi​​)

where ωi∈R3\omega_i \in \mathbb{R}^3ωi​∈R3 is the unit rotation axis of joint iii and vi=−ωi×riv_i = -\omega_i \times r_ivi​=−ωi​×ri​ for revolute joints (with rir_iri​ a point on the axis) or vi=d^iv_i = \hat{d}_ivi​=d^i​ for prismatic joints (with d^i\hat{d}_id^i​ the direction of translation). The 4×44\times 44×4 matrix exponential of the screw axis scaled by the joint variable qiq_iqi​ yields a rigid body transform:

e[Si]qi=[e[ωi]qi(I−e[ωi]qi)(ωi×vi)+ωiωiTviqi01]e^{[\mathcal{S}_i]q_i} = \begin{bmatrix} e^{[\omega_i]q_i} & (I - e^{[\omega_i]q_i})(\omega_i \times v_i) + \omega_i \omega_i^T v_i q_i \\ 0 & 1 \end{bmatrix}e[Si​]qi​=[e[ωi​]qi​0​(I−e[ωi​]qi​)(ωi​×vi​)+ωi​ωiT​vi​qi​1​]

where [ωi][\omega_i][ωi​] is the 3×33\times 33×3 skew-symmetric matrix corresponding to ωi\omega_iωi​, and e[ωi]qie^{[\omega_i]q_i}e[ωi​]qi​ is computed via Rodrigues' rotation formula. The full forward kinematics map is then the Product of Exponentials (PoE):

Tsb(q)=e[S1]q1 e[S2]q2⋯e[Sn]qn MT_{sb}(q) = e^{[\mathcal{S}_1]q_1}\, e^{[\mathcal{S}_2]q_2} \cdots e^{[\mathcal{S}_n]q_n}\, MTsb​(q)=e[S1​]q1​e[S2​]q2​⋯e[Sn​]qn​M

where M∈SE(3)M \in SE(3)M∈SE(3) is the end-effector pose at the zero configuration. Each factor e[Si]qie^{[\mathcal{S}_i]q_i}e[Si​]qi​ is itself an element of SE(3)SE(3)SE(3), and their composition is also in SE(3)SE(3)SE(3) — the PoE formula respects the group structure exactly, with no approximation or singularity in the parameterization.

The PoE formula has several advantages over DH conventions that matter for robot learning. The screw axes Si\mathcal{S}_iSi​ are defined in a single global frame, making them easy to specify geometrically without a DH frame-assignment procedure. The formula differentiates cleanly: the Jacobian columns are the transformed screw axes, which we will derive below. And the exponential map connects directly to Lie algebra operations that appear throughout equivariant learning architectures and geometric trajectory optimization.

Key property

Forward kinematics is deterministic, efficient, and always well-defined. It serves as a differentiable function through which gradients from task-space losses can propagate to joint-space parameters — an essential property for trajectory optimization, operational-space control, and learning architectures that reason in end-effector coordinates.


Inverse kinematics

Inverse kinematics (IK) solves the opposite problem: given a desired end-effector pose Tdes∈SE(3)T^{\text{des}} \in SE(3)Tdes∈SE(3), find joint angles qqq such that f(q)=Tdesf(q) = T^{\text{des}}f(q)=Tdes. This problem is generically harder than FK and has no closed-form solution for most robot geometries.

The challenges

The IK problem exhibits three structural difficulties that have direct consequences for how learning systems handle it. First, there may be multiple solutions: a redundant robot (with more than 6 DoF) has infinitely many joint configurations that achieve any reachable end-effector pose, and even a non-redundant robot typically has a discrete set of solutions (elbow up vs. elbow down, wrist flip). Second, there may be no solution: not all points in task space are reachable from all configurations, and even within the reachable workspace, not all orientations are achievable at all positions. Third, solutions may be numerically ill-conditioned near singular configurations, where small changes in the desired pose require large changes in joint angles.

Numerical IK: the iterative update rule

The standard approach for general robot geometries is iterative numerical IK. Starting from an initial configuration q0q_0q0​, the algorithm computes a small joint-space step that reduces the task-space error at each iteration. Let the task-space error be Δx∈R6\Delta x \in \mathbb{R}^6Δx∈R6, encoding both position and orientation error between the current end-effector pose and the desired pose. The orientation error is computed as the logarithm of the relative rotation log⁡(RdesTRcurrent)∈so(3)\log(R_{\text{des}}^T R_{\text{current}}) \in \mathfrak{so}(3)log(RdesT​Rcurrent​)∈so(3), which gives an axis-angle representation free from gimbal lock.

The linearized relationship between joint velocity and task-space velocity is x˙=J(q)q˙\dot{x} = J(q)\dot{q}x˙=J(q)q˙​. Treating the error as a desired velocity and inverting gives the pseudoinverse IK update:

Δq=J+(q) Δx\Delta q = J^+(q)\,\Delta xΔq=J+(q)Δx

where J+(q)=JT(q)(J(q)JT(q))−1J^+(q) = J^T(q)(J(q)J^T(q))^{-1}J+(q)=JT(q)(J(q)JT(q))−1 is the Moore-Penrose pseudoinverse of the Jacobian. This update is the minimum-norm joint displacement that achieves the desired task-space step, and iterating it from q0q_0q0​ converges to a solution (if one exists nearby) for well-conditioned configurations.

Damped Least Squares (Levenberg-Marquardt)

Near singular configurations, J(q)JT(q)J(q)J^T(q)J(q)JT(q) becomes ill-conditioned and its inverse amplifies errors catastrophically — small task-space errors produce enormous joint-space steps. The Damped Least Squares (DLS) method, also known as the Levenberg-Marquardt regularization, addresses this by adding a damping term λ2I\lambda^2 Iλ2I to the matrix being inverted:

Δq=JT(q)(J(q)JT(q)+λ2I)−1 Δx\Delta q = J^T(q)\bigl(J(q)J^T(q) + \lambda^2 I\bigr)^{-1}\,\Delta xΔq=JT(q)(J(q)JT(q)+λ2I)−1Δx

The scalar λ>0\lambda > 0λ>0 is the damping coefficient. When the Jacobian is full rank and the configuration is well-conditioned, the damping term is negligible and the update approximates the pseudoinverse update. As the robot approaches a singularity and det⁡(JJT)→0\det(JJ^T) \to 0det(JJT)→0, the damping term dominates and bounds the joint velocities at the cost of allowing some task-space error. This graceful degradation near singularities is the key property: DLS never generates unbounded joint commands, making it safe for hardware deployment.

The choice of λ\lambdaλ trades off between IK accuracy (small λ\lambdaλ, more susceptible to singularities) and smoothness (large λ\lambdaλ, always safe but less accurate). Adaptive schemes set λ\lambdaλ as a function of the smallest singular value of JJJ, providing singularity-dependent damping that is tight away from singularities and conservative near them.

Learning connection

The relationship between IK and learned policies is determined by the action space. Task-space policies — such as diffusion policy and ACTAction Chunking with Transformers, which output desired end-effector poses — must solve IK as a post-processing step, either via analytical IK (for specific geometries), numerical DLS IK, or by learning an implicit IK through end-to-end training. Joint-space policies — which dominate locomotion RLReinforcement Learning and are common in torque-controlled manipulation — output joint positions, velocities, or torques directly, bypassing IK entirely. Understanding which regime the policy operates in determines what geometric structure it must learn and where IK-induced errors can enter the pipeline.


Rigid body transforms and SE(3)

Robot motion occurs in 3D rigid body space. A rigid body pose is an element of the Special Euclidean group:

SE(3)={[Rt01]  |  R∈SO(3),  t∈R3}SE(3) = \left\{ \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \;\middle|\; R \in SO(3), \; t \in \mathbb{R}^3 \right\}SE(3)={[R0​t1​]​R∈SO(3),t∈R3}

where SO(3)SO(3)SO(3) is the group of 3D rotations and t∈R3t \in \mathbb{R}^3t∈R3 is the translation.

Why naive Euclidean loss on rotation is wrong

Rotation is not a vector space. Common rotation representations each carry failure modes when treated as flat Euclidean quantities. Euler angles suffer from gimbal lock — at certain configurations, two rotational axes align and a degree of freedom is lost. Quaternions have the double cover property: qqq and −q-q−q represent the same rotation. A loss function ∥q1−q2∥2\|q_1 - q_2\|^2∥q1​−q2​∥2 computed naively treats them as maximally different when they are in fact identical — the antipodal problem. Rotation matrices are overconstrained (999 parameters for 333 DoF); unconstrained optimization produces outputs that are not valid rotations.

The geometrically correct notion of distance between rotations is the geodesic distance on SO(3)SO(3)SO(3):

d(R1,R2)=∥log⁡(R1TR2)∥Fd(R_1, R_2) = \left\| \log(R_1^T R_2) \right\|_Fd(R1​,R2​)=​log(R1T​R2​)​F​

where log⁡\loglog is the matrix logarithm mapping back to the Lie algebra so(3)\mathfrak{so}(3)so(3). When supervising orientation targets, geodesic or axis-angle losses should be used rather than MSE on raw quaternion components. When averaging poses (e.g., in ensemble policies or diffusion rollouts), averaging in the tangent space rather than the ambient representation space produces geometrically consistent results.

Several modern architectures exploit SE(3)SE(3)SE(3) structure directly. Equivariant networks (SE(3)-Transformers, ESCNN) build in symmetry so that rotating the scene rotates the output predictably, improving sample efficiency. Six-DoF diffusion policies operate in SE(3)SE(3)SE(3) and require proper geodesic noise schedules. Foundation models (ACTAction Chunking with Transformers, π0\pi_0π0​, GR00T) embed geometric inductive biases either explicitly in their action representations or implicitly through large-scale training on pose-annotated data.


Jacobians and differential kinematics

The Jacobian relates joint velocities to end-effector velocities:

x˙=J(q) q˙\dot{x} = J(q)\,\dot{q}x˙=J(q)q˙​

where x˙∈R6\dot{x} \in \mathbb{R}^6x˙∈R6 is the task-space velocity (3 linear + 3 angular) and q˙∈Rn\dot{q} \in \mathbb{R}^nq˙​∈Rn is the vector of joint velocities. In the PoE formulation, the iii-th column of the body Jacobian is the screw axis Si\mathcal{S}_iSi​ transformed to the current configuration by the partial forward kinematics up to joint iii, making the Jacobian a natural byproduct of the FK computation.

Building intuition: planar 2-DoF example

Consider a 2-link planar arm with link lengths l1,l2l_1, l_2l1​,l2​ and joint angles q1,q2q_1, q_2q1​,q2​. The end-effector position is:

x=l1cos⁡q1+l2cos⁡(q1+q2),y=l1sin⁡q1+l2sin⁡(q1+q2)x = l_1 \cos q_1 + l_2 \cos(q_1 + q_2), \quad y = l_1 \sin q_1 + l_2 \sin(q_1 + q_2)x=l1​cosq1​+l2​cos(q1​+q2​),y=l1​sinq1​+l2​sin(q1​+q2​)

Differentiating:

J(q)=∂(x,y)∂(q1,q2)=[−l1sin⁡q1−l2sin⁡(q1+q2)−l2sin⁡(q1+q2)l1cos⁡q1+l2cos⁡(q1+q2)l2cos⁡(q1+q2)]J(q) = \frac{\partial (x,y)}{\partial (q_1, q_2)} = \begin{bmatrix} -l_1 \sin q_1 - l_2 \sin(q_1+q_2) & -l_2 \sin(q_1+q_2) \\ l_1 \cos q_1 + l_2 \cos(q_1+q_2) & l_2 \cos(q_1+q_2) \end{bmatrix}J(q)=∂(q1​,q2​)∂(x,y)​=[−l1​sinq1​−l2​sin(q1​+q2​)l1​cosq1​+l2​cos(q1​+q2​)​−l2​sin(q1​+q2​)l2​cos(q1​+q2​)​]

Python Implementation: 2-DOF Forward Kinematics

The following code implements the forward kinematics and Jacobian calculation for the 2-link planar arm described above.

import numpy as np

def compute_forward_kinematics(q, l1=1.0, l2=1.0):
    """
    Computes the end-effector (x, y) position for a 2-DOF planar arm.
    
    Args:
        q: array-like of length 2 [q1, q2] in radians
        l1, l2: lengths of the first and second links
    """
    q1, q2 = q
    
    # x = l1*cos(q1) + l2*cos(q1 + q2)
    x = l1 * np.cos(q1) + l2 * np.cos(q1 + q2)
    
    # y = l1*sin(q1) + l2*sin(q1 + q2)
    y = l1 * np.sin(q1) + l2 * np.sin(q1 + q2)
    
    return np.array([x, y])

def compute_jacobian(q, l1=1.0, l2=1.0):
    """
    Computes the 2x2 geometric Jacobian for a 2-DOF planar arm.
    """
    q1, q2 = q
    
    # Define partial derivatives for J
    j11 = -l1 * np.sin(q1) - l2 * np.sin(q1 + q2)
    j12 = -l2 * np.sin(q1 + q2)
    j21 = l1 * np.cos(q1) + l2 * np.cos(q1 + q2)
    j22 = l2 * np.cos(q1 + q2)
    
    return np.array([[j11, j12], 
                     [j21, j22]])

# Example: Arm at [45 deg, 45 deg]
q_test = np.array([np.pi/4, np.pi/4])
pos = compute_forward_kinematics(q_test)
J = compute_jacobian(q_test)

print(f"End-effector position: {pos}")
print(f"Jacobian determinant: {np.linalg.det(J):.4f}")

Python Implementation: Numerical Inverse Kinematics

While forward kinematics is analytical, inverse kinematics often requires numerical methods for complex chains. Here is a simple Jacobian-based IK solver.

def compute_inverse_kinematics(target_pos, q_initial, max_steps=100, tol=1e-3):
    """
    Solves for q that produces target_pos using Newton's method.
    """
    q = q_initial.copy()

    for i in range(max_steps):
        current_pos = compute_forward_kinematics(q)
        error = target_pos - current_pos

        if np.linalg.norm(error) < tol:
            print(f"IK converged in {i} steps")
            return q

        # q_next = q + J_inv * error
        J = compute_jacobian(q)
        # Use pseudo-inverse for robustness near singularities
        q = q + np.linalg.pinv(J) @ error

    return q

# Test IK
target = np.array([1.5, 0.5])
q_sol = compute_inverse_kinematics(target, np.array([0.1, 0.1]))
print(f"Solved joint angles: {q_sol}")
print(f"Verification pos: {compute_forward_kinematics(q_sol)}")

This matrix tells you how each joint velocity contributes to end-effector velocity. When det⁡(J)=0\det(J) = 0det(J)=0, the robot is at a singularity — one direction of task-space motion becomes uncontrollable.

Geometric vs. analytic Jacobian

The geometric Jacobian JgJ_gJg​ relates joint velocities to the physical linear and angular velocity of the end-effector, with angular velocity expressed as ω∈R3\omega \in \mathbb{R}^3ω∈R3 directly. The analytic Jacobian JaJ_aJa​ relates joint velocities to the time derivative of a chosen rotation parametrization (e.g., ϕ˙\dot{\phi}ϕ˙​ for Euler angles). It depends on the representation and is not defined at parametrization singularities. Mixing rotation representations and Jacobians — computing a control law with the analytic Jacobian but measuring error in quaternion space — introduces errors. The geometric Jacobian should be used unless there is a specific reason for the analytic form.

Uses

The Jacobian appears throughout the robot learning stack: in velocity-level control and impedance control; in force control via the transpose τ=JTF\tau = J^T Fτ=JTF; in sensitivity analysis and workspace characterization; and in hybrid learning-plus-control pipelines using operational-space controllers. Jacobian-based control is often embedded as a differentiable layer inside learned pipelines. Operational-space controllers use JJJ to map policy outputs from task space to joint torques. Impedance control wraps a stiffness law around JTJ^TJT, providing compliant behavior that RLReinforcement Learning policies can learn to exploit. Gradient-based trajectory optimization differentiates through J(q)J(q)J(q) to propagate task-space costs into joint space.


Singularities and manipulability

A singularity occurs when the Jacobian loses rank — rank(J(q))<min⁡(m,n)\text{rank}(J(q)) < \min(m, n)rank(J(q))<min(m,n). At singular configurations, one or more directions of task-space motion become uncontrollable. Achieving motion in those directions requires theoretically infinite joint velocities, and numerically the pseudoinverse amplifies errors catastrophically, causing large commanded torques and potential hardware damage.

A concrete hardware failure mode

Consider a 6-DoF arm approaching a wrist singularity: the last three joint axes become coplanar, collapsing the rank of the wrist Jacobian. A policy executing a trajectory that passes through this configuration generates a torque spike. In simulation this may be suppressed by joint velocity limits; on hardware it triggers emergency stops or damages actuators. This is a realistic and well-documented failure mode in sim-to-real transfer, and it arises precisely because the policy was not trained with awareness of Jacobian conditioning.

Manipulability ellipsoids

The concept of manipulability, introduced by Yoshikawa (1985), provides a scalar measure of how far a configuration is from singularity and how well the robot can generate motion and force in different directions. The manipulability measure is defined as:

w(q)=det⁡(J(q)J(q)T)w(q) = \sqrt{\det\bigl(J(q)J(q)^T\bigr)}w(q)=det(J(q)J(q)T)​

This scalar is zero at singular configurations (where det⁡(JJT)=0\det(JJ^T) = 0det(JJT)=0) and strictly positive elsewhere. Larger values of w(q)w(q)w(q) indicate that the robot has greater ability to generate end-effector velocities in all directions simultaneously.

The manipulability ellipsoid gives a geometric picture of directional capability. It is the image of the unit ball in joint-velocity space under the map q˙↦x˙=J(q)q˙\dot{q} \mapsto \dot{x} = J(q)\dot{q}q˙​↦x˙=J(q)q˙​. Formally, the set of achievable task-space velocities for unit joint speed is:

E(q)={x˙:x˙T(JJT)−1x˙≤1}\mathcal{E}(q) = \{\dot{x} : \dot{x}^T (JJ^T)^{-1} \dot{x} \leq 1\}E(q)={x˙:x˙T(JJT)−1x˙≤1}

This is an ellipsoid in task space whose semi-axes are the singular values σ1,σ2,…,σm\sigma_1, \sigma_2, \ldots, \sigma_mσ1​,σ2​,…,σm​ of J(q)J(q)J(q), oriented along the corresponding left singular vectors. The longest axis of the ellipsoid points in the direction of maximum end-effector velocity per unit joint speed — the most dexterous direction. The shortest axis points toward the near-singular direction where even large joint velocities produce little end-effector motion.

Physically, this means a robot at a stretched-out configuration (near a boundary of the workspace) has a very elongated manipulability ellipsoid: it can move easily along one direction but has poor speed and force transmission in others. A robot near the center of its workspace typically has a more spherical ellipsoid, indicating uniform capability across directions. The same ellipsoid (inverted, via the force-velocity duality τ=JTF\tau = J^T Fτ=JTF) characterizes the force ellipsoid — directions in which the robot can generate contact forces most efficiently.

Implications for learning

Manipulability has direct implications for reward design and policy evaluation. A policy that routinely passes through low-manipulability configurations will generate large torque commands that stress actuators and cause sim-to-real failures. Dataset design for imitation learning should filter demonstrations that approach singular configurations, or add corrective perturbations that teach recovery. In RLReinforcement Learning, the manipulability measure w(q)w(q)w(q) can be incorporated as an auxiliary reward term — encouraging the policy to maintain well-conditioned configurations — or as a constraint in a CBF-QP safety filter. Empirically, adding even a small manipulability penalty to the reward function significantly reduces the frequency of near-singular trajectories and improves hardware deployment success rates.


Workspace analysis

The workspace of a robot is the set of end-effector poses that are reachable by some joint configuration. Understanding workspace geometry is not merely a static fact about the robot's hardware — it directly determines what tasks are feasible, how reward functions must be designed, and how curriculum learning should be structured. These connections are often underappreciated by practitioners who focus on algorithmic choices while treating workspace constraints as a background assumption.

The reachable workspace consists of all poses T∈SE(3)T \in SE(3)T∈SE(3) for which there exists at least one q∈Qvalidq \in \mathcal{Q}_{\text{valid}}q∈Qvalid​ (respecting joint limits) such that f(q)=Tf(q) = Tf(q)=T. The dexterous workspace is the more restrictive subset of poses achievable across all end-effector orientations — it is typically a much smaller region concentrated near the robot's kinematic center. For manipulation tasks where orientation matters (grasping, insertion, surface-following), the dexterous workspace is the operationally relevant constraint, not the larger reachable workspace.

Workspace analysis shapes reward function design in non-obvious ways. A naive dense reward that penalizes the distance from the end-effector to a goal position will push the policy to command configurations that move toward the goal — but if the goal is outside the reachable workspace, the policy will drive joint configurations toward their limits and potentially into singularities in pursuit of an impossible target. The correct practice is to verify that all sampled goals during training lie within the reachable workspace, or to include a workspace penalty that discourages the policy from commanding end-effector targets that require configurations near joint limits or singularities. Formally, this can be implemented as an auxiliary reward term −1[f(q)∉Wreachable]-\mathbb{1}[f(q) \notin \mathcal{W}_{\text{reachable}}]−1[f(q)∈/Wreachable​] or as a distance penalty from the workspace boundary.

Curriculum learning must also respect workspace geometry. As task difficulty increases during training — for instance, by sampling goal positions from progressively larger regions of space — the curriculum schedule must ensure that the sampled goals remain within the reachable workspace. Goals progressively closer to the workspace boundary increase both geometric difficulty (the robot must approach configurations with lower manipulability) and numerical difficulty (IK solutions near the boundary are sensitive to small perturbations). A well-designed curriculum tracks manipulability across the training distribution and avoids the regime where frequent near-singular configurations begin damaging hardware.

The task-specific workspace — the subset of the reachable workspace that is actually required by the deployment task — is an important practical abstraction. For a table-top pick-and-place task, the task-specific workspace is a small region above the table surface. Training a policy that explores only this region (rather than the full reachable workspace) converges faster, produces more precise policies, and avoids the waste of training capacity on configurations that will never be needed at deployment.


Why kinematics still matters in the age of learning

Modern robot learning systems still execute actions in joint space or task space, still collide with physical joint limits and singular configurations, and still fail when geometric structure is ignored in loss functions or reward design. Even the most capable foundation models — ACTAction Chunking with Transformers, π0\pi_0π0​, GR00T — either rely on kinematic structure for supervision, embed geometric inductive biases in their architectures, or implicitly learn them from massive pose-annotated datasets. In each case, the underlying geometry does not disappear. It is either handled correctly or it becomes a source of failure.

Learning does not replace kinematics. It builds on it.


Key takeaways

The concepts introduced this week form a dependency chain, not a flat list of independent facts. C-space defines the space in which the robot lives. Forward kinematics — formalized most cleanly through the Product of Exponentials formula — maps points in that space to geometry in the world, using screw axis representations that respect the Lie group structure of SE(3)SE(3)SE(3). The Jacobian linearizes FK locally, giving a first-order picture of how joint motion produces task-space motion and providing the core operator for numerical IK via the pseudoinverse update Δq=J+(q) Δx\Delta q = J^+(q)\,\Delta xΔq=J+(q)Δx. Damped Least Squares IK regularizes this update near singularities by adding λ2I\lambda^2 Iλ2I to the inversion, bounding joint commands at the cost of some task-space accuracy. Singularities are the configurations where the Jacobian loses rank; the manipulability measure w(q)=det⁡(JJT)w(q) = \sqrt{\det(JJ^T)}w(q)=det(JJT)​ quantifies proximity to singularity, and the manipulability ellipsoid characterizes directional force and velocity transmission geometrically. Workspace analysis characterizes globally where the robot can and cannot reach, and these constraints propagate directly into reward function design, curriculum scheduling, and action-space parameterization. Throughout, the correct mathematical home for rigid body poses is SE(3)SE(3)SE(3) — a structure that must be respected in any learning system that reasons about robot geometry.


Conceptual questions

  1. A locomotion policy is trained entirely in joint space and never explicitly reasons about the robot's base pose. In what sense does SE(3)SE(3)SE(3) geometry still matter for this policy? How does the fact that the base is unactuated appear in the RLReinforcement Learning problem formulation?

  2. You are training a manipulation policy using behavioral cloning. Your action space is end-effector pose represented as a quaternion plus translation. You use MSE loss on the raw quaternion components. What can go wrong because of the antipodal problem, and how would you fix it using a geodesic loss?

  3. A 7-DoF robot arm has more joints than the 6 DoF needed to specify an end-effector pose. How does this redundancy appear in the Jacobian — specifically, what is rank(J)\text{rank}(J)rank(J) and what is the dimension of its null space? How might a policy exploit the null space to satisfy secondary objectives (e.g., joint limit avoidance or manipulability maximization) while achieving the primary task?

  4. The manipulability measure w(q)=det⁡(J(q)J(q)T)w(q) = \sqrt{\det(J(q)J(q)^T)}w(q)=det(J(q)J(q)T)​ is incorporated as an auxiliary reward term during RLReinforcement Learning training with a weight of α=0.1\alpha = 0.1α=0.1. Describe the geometric effect on the policy's preferred configurations. What failure mode could arise if α\alphaα is set too high relative to the task reward? How does the manipulability ellipsoid of the learned policy differ from a policy trained with α=0\alpha = 0α=0?

  5. A task-space policy outputs desired end-effector poses at 10 Hz, which are tracked by DLS numerical IK. The robot approaches a wrist singularity (σmin⁡(J)<0.05\sigma_{\min}(J) < 0.05σmin​(J)<0.05) mid-trajectory. With λ=0\lambda = 0λ=0 (pure pseudoinverse), describe the failure mode in terms of joint velocities. With λ=0.1\lambda = 0.1λ=0.1 (damped), describe how the behavior changes and what task-space error is introduced. At what point would you recommend the safety filter (Week 12) should engage, and what fallback behavior would it trigger?

✦Solutions
  1. Even a joint-space policy depends on SE(3)SE(3)SE(3) because the base is a floating, unactuated body whose pose evolves only through contact forces. In the RL formulation the base pose appears in the state (it is needed for balance) but never in the action space; the unactuated base is an underactuation constraint, so the policy must steer base motion indirectly through contacts.
  2. Quaternions double-cover SO(3)SO(3)SO(3) (qqq and −q-q−q are the same rotation), so MSE on the raw components penalizes identical rotations differently and gives discontinuous targets. Use a geodesic loss invariant to the sign ambiguity, e.g. 1−∣⟨qpred,qtarget⟩∣1 - |\langle q_{\text{pred}}, q_{\text{target}}\rangle|1−∣⟨qpred​,qtarget​⟩∣ or the rotation angle 2arccos⁡∣⟨qpred,qtarget⟩∣2\arccos|\langle q_{\text{pred}}, q_{\text{target}}\rangle|2arccos∣⟨qpred​,qtarget​⟩∣.
  3. JJJ is 6×76\times 76×7, so generically rank⁡(J)=6\operatorname{rank}(J)=6rank(J)=6 and the null space has dimension 7−6=17-6=17−6=1. The policy can add null-space motion q˙=(I−J+J)q˙0\dot q = (I - J^{+}J)\dot q_0q˙​=(I−J+J)q˙​0​ that changes the configuration without moving the end-effector, spending that one degree of freedom on secondary objectives like joint-limit avoidance or manipulability maximization.
  4. Adding α w(q)\alpha\, w(q)αw(q) biases the policy toward well-conditioned configurations away from singularities (a rounder manipulability ellipsoid). If α\alphaα is too large the policy sacrifices task reward to chase manipulability — avoiding necessary near-singular poses or contorting just to inflate www — and fails the task. With α=0\alpha=0α=0 the ellipsoid is shaped only by the task and can be far more anisotropic.
  5. With λ=0\lambda = 0λ=0, as σmin⁡(J)→0\sigma_{\min}(J)\to 0σmin​(J)→0 the pseudoinverse blows up, producing huge, unsafe joint velocities along the singular direction. With λ=0.1\lambda = 0.1λ=0.1 (damped least squares) velocities stay bounded, but a task-space tracking error is introduced in that direction. The Week-12 safety filter should engage once σmin⁡\sigma_{\min}σmin​ drops below threshold (≈0.05) or velocity limits are hit, triggering a fallback that slows/halts the trajectory or reprojects the target off the singular direction.

Looking ahead

With kinematics in place, we now add physics and uncertainty.

Week 2: Dynamics and State Estimation. We study robot dynamics through the Euler-Lagrange and Newton-Euler formalisms, floating-base dynamics for legged systems, and how robots estimate their state from noisy sensors through Kalman filtering and its nonlinear extensions — all critical for control and learning in the real world.


Further reading

  • Murray, Li, Sastry (1994). Chapters 2 & 3: Rigid Body Motion and Manipulator Kinematics.
  • Siciliano et al. (2009). Chapter 2: Kinematics.
Next →
Week 2: Dynamics and State Estimation
On this page
  • Purpose of this lecture
  • Configuration spaces
  • Floating-base systems and underactuation
  • Computational representation
  • Forward kinematics
  • Product of Exponentials formula
  • Key property
  • Inverse kinematics
  • The challenges
  • Numerical IK: the iterative update rule
  • Damped Least Squares (Levenberg-Marquardt)
  • Learning connection
  • Rigid body transforms and SE(3)
  • Why naive Euclidean loss on rotation is wrong
  • Jacobians and differential kinematics
  • Building intuition: planar 2-DoF example
  • Python Implementation: 2-DOF Forward Kinematics
  • Python Implementation: Numerical Inverse Kinematics
  • Geometric vs. analytic Jacobian
  • Uses
  • Singularities and manipulability
  • A concrete hardware failure mode
  • Manipulability ellipsoids
  • Implications for learning
  • Workspace analysis
  • Why kinematics still matters in the age of learning
  • Key takeaways
  • Conceptual questions
  • Looking ahead
  • Further reading