Purpose of this lecture
Before robots can learn, plan, or reason, they must be modeled.
Robot learning algorithms — imitation learning, reinforcement learning, and foundation models — do not replace classical robotics. They sit on top of it. Without a precise description of how a robot moves, how its joints affect its body, and how its geometry constrains motion, learning-based methods become brittle, unsafe, or uninterpretable.
This lecture establishes the geometric and kinematic foundations required for all subsequent topics in robot learning. The emphasis is mathematical: we will derive the forward kinematic map using screw theory, formalize the numerical inverse kinematics update rule, define the manipulability measure and its geometric interpretation, and connect workspace analysis to reward function design. These are not peripheral topics — every learning algorithm in the subsequent thirteen weeks operates within the kinematic structure introduced here.
Configuration spaces
The configuration of a robot specifies its complete geometric state. The configuration space (C-space) is the set of all possible configurations.
For a fixed-base manipulator with joints, the configuration is the vector of joint variables . Revolute joints contribute angles on the circle ; prismatic joints contribute translations in . The full C-space is thus a product manifold of circles and lines, not simply — a distinction that matters when measuring distances between configurations or designing exploration strategies for RLReinforcement Learning.
For mobile robots, configuration includes the planar pose , while for legged robots the configuration is composed of a floating base — the 6 DoF pose of the root body — combined with the joint angles for each limb. C-spaces easily exceed 30–50 dimensions for modern legged systems, and the dimensionality directly determines the difficulty of exploration, planning, and policy learning.
Floating-base systems and underactuation
A key distinction for legged and mobile robots: the base is an unactuated degree of freedom. Unlike a fixed-base manipulator where every DoF has a dedicated motor, a legged robot cannot directly command its base position or orientation. Base motion arises indirectly through contact forces at the feet.
This has immediate consequences for learning. The policy must reason about contact implicitly or explicitly. State estimation (Week 2) is necessary because base pose is not directly sensed with a rigid reference. Reward functions that penalize base motion must account for the fact that the robot cannot directly minimize it — it can only do so by learning the right contact strategy. This distinction between actuated joints and unactuated base is one of the most important conceptual boundaries between manipulation and locomotion RLReinforcement Learning.
Computational representation
In practice, robot kinematics are specified in URDF (Unified Robot Description Format) or MJCF (MuJoCo XML) files. These formats encode the kinematic tree (parent-child joint hierarchy), joint types, limits, and axes, along with link masses, inertias, and collision geometry. When you load a robot in IsaacLab, MuJoCo, or PyBullet, you are instantiating this kinematic tree. Understanding the structure of these files makes it far easier to debug simulation errors, set up reward functions, and implement custom controllers.
Forward kinematics
Forward kinematics (FK) maps joint configurations to end-effector pose: . Given , there is exactly one end-effector pose — FK is always well-defined and computationally inexpensive.
The classical formalization of FK uses Denavit-Hartenberg (DH) parameters, which assign four parameters per joint to describe the relative transform between adjacent link frames. While DH conventions are widely used in industry, they have a fundamental limitation: the frame assignment is not unique and the parameters are sensitive to singularities in the convention itself. The modern alternative, based on the Product of Exponentials formula from screw theory, is geometrically cleaner and integrates naturally with the structure needed for learning-based methods.
Product of Exponentials formula
In the screw-theoretic approach, each joint axis is represented as a screw axis , defined in the fixed base frame when the robot is at its zero configuration . The screw axis captures both the direction of joint motion and the coupling between translation and rotation:
where is the unit rotation axis of joint and for revolute joints (with a point on the axis) or for prismatic joints (with the direction of translation). The matrix exponential of the screw axis scaled by the joint variable yields a rigid body transform:
where is the skew-symmetric matrix corresponding to , and is computed via Rodrigues' rotation formula. The full forward kinematics map is then the Product of Exponentials (PoE):
where is the end-effector pose at the zero configuration. Each factor is itself an element of , and their composition is also in — the PoE formula respects the group structure exactly, with no approximation or singularity in the parameterization.
The PoE formula has several advantages over DH conventions that matter for robot learning. The screw axes are defined in a single global frame, making them easy to specify geometrically without a DH frame-assignment procedure. The formula differentiates cleanly: the Jacobian columns are the transformed screw axes, which we will derive below. And the exponential map connects directly to Lie algebra operations that appear throughout equivariant learning architectures and geometric trajectory optimization.
Key property
Forward kinematics is deterministic, efficient, and always well-defined. It serves as a differentiable function through which gradients from task-space losses can propagate to joint-space parameters — an essential property for trajectory optimization, operational-space control, and learning architectures that reason in end-effector coordinates.
Inverse kinematics
Inverse kinematics (IK) solves the opposite problem: given a desired end-effector pose , find joint angles such that . This problem is generically harder than FK and has no closed-form solution for most robot geometries.
The challenges
The IK problem exhibits three structural difficulties that have direct consequences for how learning systems handle it. First, there may be multiple solutions: a redundant robot (with more than 6 DoF) has infinitely many joint configurations that achieve any reachable end-effector pose, and even a non-redundant robot typically has a discrete set of solutions (elbow up vs. elbow down, wrist flip). Second, there may be no solution: not all points in task space are reachable from all configurations, and even within the reachable workspace, not all orientations are achievable at all positions. Third, solutions may be numerically ill-conditioned near singular configurations, where small changes in the desired pose require large changes in joint angles.
Numerical IK: the iterative update rule
The standard approach for general robot geometries is iterative numerical IK. Starting from an initial configuration , the algorithm computes a small joint-space step that reduces the task-space error at each iteration. Let the task-space error be , encoding both position and orientation error between the current end-effector pose and the desired pose. The orientation error is computed as the logarithm of the relative rotation , which gives an axis-angle representation free from gimbal lock.
The linearized relationship between joint velocity and task-space velocity is . Treating the error as a desired velocity and inverting gives the pseudoinverse IK update:
where is the Moore-Penrose pseudoinverse of the Jacobian. This update is the minimum-norm joint displacement that achieves the desired task-space step, and iterating it from converges to a solution (if one exists nearby) for well-conditioned configurations.
Damped Least Squares (Levenberg-Marquardt)
Near singular configurations, becomes ill-conditioned and its inverse amplifies errors catastrophically — small task-space errors produce enormous joint-space steps. The Damped Least Squares (DLS) method, also known as the Levenberg-Marquardt regularization, addresses this by adding a damping term to the matrix being inverted:
The scalar is the damping coefficient. When the Jacobian is full rank and the configuration is well-conditioned, the damping term is negligible and the update approximates the pseudoinverse update. As the robot approaches a singularity and , the damping term dominates and bounds the joint velocities at the cost of allowing some task-space error. This graceful degradation near singularities is the key property: DLS never generates unbounded joint commands, making it safe for hardware deployment.
The choice of trades off between IK accuracy (small , more susceptible to singularities) and smoothness (large , always safe but less accurate). Adaptive schemes set as a function of the smallest singular value of , providing singularity-dependent damping that is tight away from singularities and conservative near them.
Learning connection
The relationship between IK and learned policies is determined by the action space. Task-space policies — such as diffusion policy and ACTAction Chunking with Transformers, which output desired end-effector poses — must solve IK as a post-processing step, either via analytical IK (for specific geometries), numerical DLS IK, or by learning an implicit IK through end-to-end training. Joint-space policies — which dominate locomotion RLReinforcement Learning and are common in torque-controlled manipulation — output joint positions, velocities, or torques directly, bypassing IK entirely. Understanding which regime the policy operates in determines what geometric structure it must learn and where IK-induced errors can enter the pipeline.
Rigid body transforms and SE(3)
Robot motion occurs in 3D rigid body space. A rigid body pose is an element of the Special Euclidean group:
where is the group of 3D rotations and is the translation.
Why naive Euclidean loss on rotation is wrong
Rotation is not a vector space. Common rotation representations each carry failure modes when treated as flat Euclidean quantities. Euler angles suffer from gimbal lock — at certain configurations, two rotational axes align and a degree of freedom is lost. Quaternions have the double cover property: and represent the same rotation. A loss function computed naively treats them as maximally different when they are in fact identical — the antipodal problem. Rotation matrices are overconstrained ( parameters for DoF); unconstrained optimization produces outputs that are not valid rotations.
The geometrically correct notion of distance between rotations is the geodesic distance on :
where is the matrix logarithm mapping back to the Lie algebra . When supervising orientation targets, geodesic or axis-angle losses should be used rather than MSE on raw quaternion components. When averaging poses (e.g., in ensemble policies or diffusion rollouts), averaging in the tangent space rather than the ambient representation space produces geometrically consistent results.
Several modern architectures exploit structure directly. Equivariant networks (SE(3)-Transformers, ESCNN) build in symmetry so that rotating the scene rotates the output predictably, improving sample efficiency. Six-DoF diffusion policies operate in and require proper geodesic noise schedules. Foundation models (ACTAction Chunking with Transformers, , GR00T) embed geometric inductive biases either explicitly in their action representations or implicitly through large-scale training on pose-annotated data.
Jacobians and differential kinematics
The Jacobian relates joint velocities to end-effector velocities:
where is the task-space velocity (3 linear + 3 angular) and is the vector of joint velocities. In the PoE formulation, the -th column of the body Jacobian is the screw axis transformed to the current configuration by the partial forward kinematics up to joint , making the Jacobian a natural byproduct of the FK computation.
Building intuition: planar 2-DoF example
Consider a 2-link planar arm with link lengths and joint angles . The end-effector position is:
Differentiating:
Python Implementation: 2-DOF Forward Kinematics
The following code implements the forward kinematics and Jacobian calculation for the 2-link planar arm described above.
import numpy as np
def compute_forward_kinematics(q, l1=1.0, l2=1.0):
"""
Computes the end-effector (x, y) position for a 2-DOF planar arm.
Args:
q: array-like of length 2 [q1, q2] in radians
l1, l2: lengths of the first and second links
"""
q1, q2 = q
# x = l1*cos(q1) + l2*cos(q1 + q2)
x = l1 * np.cos(q1) + l2 * np.cos(q1 + q2)
# y = l1*sin(q1) + l2*sin(q1 + q2)
y = l1 * np.sin(q1) + l2 * np.sin(q1 + q2)
return np.array([x, y])
def compute_jacobian(q, l1=1.0, l2=1.0):
"""
Computes the 2x2 geometric Jacobian for a 2-DOF planar arm.
"""
q1, q2 = q
# Define partial derivatives for J
j11 = -l1 * np.sin(q1) - l2 * np.sin(q1 + q2)
j12 = -l2 * np.sin(q1 + q2)
j21 = l1 * np.cos(q1) + l2 * np.cos(q1 + q2)
j22 = l2 * np.cos(q1 + q2)
return np.array([[j11, j12],
[j21, j22]])
# Example: Arm at [45 deg, 45 deg]
q_test = np.array([np.pi/4, np.pi/4])
pos = compute_forward_kinematics(q_test)
J = compute_jacobian(q_test)
print(f"End-effector position: {pos}")
print(f"Jacobian determinant: {np.linalg.det(J):.4f}")
Python Implementation: Numerical Inverse Kinematics
While forward kinematics is analytical, inverse kinematics often requires numerical methods for complex chains. Here is a simple Jacobian-based IK solver.
def compute_inverse_kinematics(target_pos, q_initial, max_steps=100, tol=1e-3):
"""
Solves for q that produces target_pos using Newton's method.
"""
q = q_initial.copy()
for i in range(max_steps):
current_pos = compute_forward_kinematics(q)
error = target_pos - current_pos
if np.linalg.norm(error) < tol:
print(f"IK converged in {i} steps")
return q
# q_next = q + J_inv * error
J = compute_jacobian(q)
# Use pseudo-inverse for robustness near singularities
q = q + np.linalg.pinv(J) @ error
return q
# Test IK
target = np.array([1.5, 0.5])
q_sol = compute_inverse_kinematics(target, np.array([0.1, 0.1]))
print(f"Solved joint angles: {q_sol}")
print(f"Verification pos: {compute_forward_kinematics(q_sol)}")
This matrix tells you how each joint velocity contributes to end-effector velocity. When , the robot is at a singularity — one direction of task-space motion becomes uncontrollable.
Geometric vs. analytic Jacobian
The geometric Jacobian relates joint velocities to the physical linear and angular velocity of the end-effector, with angular velocity expressed as directly. The analytic Jacobian relates joint velocities to the time derivative of a chosen rotation parametrization (e.g., for Euler angles). It depends on the representation and is not defined at parametrization singularities. Mixing rotation representations and Jacobians — computing a control law with the analytic Jacobian but measuring error in quaternion space — introduces errors. The geometric Jacobian should be used unless there is a specific reason for the analytic form.
Uses
The Jacobian appears throughout the robot learning stack: in velocity-level control and impedance control; in force control via the transpose ; in sensitivity analysis and workspace characterization; and in hybrid learning-plus-control pipelines using operational-space controllers. Jacobian-based control is often embedded as a differentiable layer inside learned pipelines. Operational-space controllers use to map policy outputs from task space to joint torques. Impedance control wraps a stiffness law around , providing compliant behavior that RLReinforcement Learning policies can learn to exploit. Gradient-based trajectory optimization differentiates through to propagate task-space costs into joint space.
Singularities and manipulability
A singularity occurs when the Jacobian loses rank — . At singular configurations, one or more directions of task-space motion become uncontrollable. Achieving motion in those directions requires theoretically infinite joint velocities, and numerically the pseudoinverse amplifies errors catastrophically, causing large commanded torques and potential hardware damage.
A concrete hardware failure mode
Consider a 6-DoF arm approaching a wrist singularity: the last three joint axes become coplanar, collapsing the rank of the wrist Jacobian. A policy executing a trajectory that passes through this configuration generates a torque spike. In simulation this may be suppressed by joint velocity limits; on hardware it triggers emergency stops or damages actuators. This is a realistic and well-documented failure mode in sim-to-real transfer, and it arises precisely because the policy was not trained with awareness of Jacobian conditioning.
Manipulability ellipsoids
The concept of manipulability, introduced by Yoshikawa (1985), provides a scalar measure of how far a configuration is from singularity and how well the robot can generate motion and force in different directions. The manipulability measure is defined as:
This scalar is zero at singular configurations (where ) and strictly positive elsewhere. Larger values of indicate that the robot has greater ability to generate end-effector velocities in all directions simultaneously.
The manipulability ellipsoid gives a geometric picture of directional capability. It is the image of the unit ball in joint-velocity space under the map . Formally, the set of achievable task-space velocities for unit joint speed is:
This is an ellipsoid in task space whose semi-axes are the singular values of , oriented along the corresponding left singular vectors. The longest axis of the ellipsoid points in the direction of maximum end-effector velocity per unit joint speed — the most dexterous direction. The shortest axis points toward the near-singular direction where even large joint velocities produce little end-effector motion.
Physically, this means a robot at a stretched-out configuration (near a boundary of the workspace) has a very elongated manipulability ellipsoid: it can move easily along one direction but has poor speed and force transmission in others. A robot near the center of its workspace typically has a more spherical ellipsoid, indicating uniform capability across directions. The same ellipsoid (inverted, via the force-velocity duality ) characterizes the force ellipsoid — directions in which the robot can generate contact forces most efficiently.
Implications for learning
Manipulability has direct implications for reward design and policy evaluation. A policy that routinely passes through low-manipulability configurations will generate large torque commands that stress actuators and cause sim-to-real failures. Dataset design for imitation learning should filter demonstrations that approach singular configurations, or add corrective perturbations that teach recovery. In RLReinforcement Learning, the manipulability measure can be incorporated as an auxiliary reward term — encouraging the policy to maintain well-conditioned configurations — or as a constraint in a CBF-QP safety filter. Empirically, adding even a small manipulability penalty to the reward function significantly reduces the frequency of near-singular trajectories and improves hardware deployment success rates.
Workspace analysis
The workspace of a robot is the set of end-effector poses that are reachable by some joint configuration. Understanding workspace geometry is not merely a static fact about the robot's hardware — it directly determines what tasks are feasible, how reward functions must be designed, and how curriculum learning should be structured. These connections are often underappreciated by practitioners who focus on algorithmic choices while treating workspace constraints as a background assumption.
The reachable workspace consists of all poses for which there exists at least one (respecting joint limits) such that . The dexterous workspace is the more restrictive subset of poses achievable across all end-effector orientations — it is typically a much smaller region concentrated near the robot's kinematic center. For manipulation tasks where orientation matters (grasping, insertion, surface-following), the dexterous workspace is the operationally relevant constraint, not the larger reachable workspace.
Workspace analysis shapes reward function design in non-obvious ways. A naive dense reward that penalizes the distance from the end-effector to a goal position will push the policy to command configurations that move toward the goal — but if the goal is outside the reachable workspace, the policy will drive joint configurations toward their limits and potentially into singularities in pursuit of an impossible target. The correct practice is to verify that all sampled goals during training lie within the reachable workspace, or to include a workspace penalty that discourages the policy from commanding end-effector targets that require configurations near joint limits or singularities. Formally, this can be implemented as an auxiliary reward term or as a distance penalty from the workspace boundary.
Curriculum learning must also respect workspace geometry. As task difficulty increases during training — for instance, by sampling goal positions from progressively larger regions of space — the curriculum schedule must ensure that the sampled goals remain within the reachable workspace. Goals progressively closer to the workspace boundary increase both geometric difficulty (the robot must approach configurations with lower manipulability) and numerical difficulty (IK solutions near the boundary are sensitive to small perturbations). A well-designed curriculum tracks manipulability across the training distribution and avoids the regime where frequent near-singular configurations begin damaging hardware.
The task-specific workspace — the subset of the reachable workspace that is actually required by the deployment task — is an important practical abstraction. For a table-top pick-and-place task, the task-specific workspace is a small region above the table surface. Training a policy that explores only this region (rather than the full reachable workspace) converges faster, produces more precise policies, and avoids the waste of training capacity on configurations that will never be needed at deployment.
Why kinematics still matters in the age of learning
Modern robot learning systems still execute actions in joint space or task space, still collide with physical joint limits and singular configurations, and still fail when geometric structure is ignored in loss functions or reward design. Even the most capable foundation models — ACTAction Chunking with Transformers, , GR00T — either rely on kinematic structure for supervision, embed geometric inductive biases in their architectures, or implicitly learn them from massive pose-annotated datasets. In each case, the underlying geometry does not disappear. It is either handled correctly or it becomes a source of failure.
Learning does not replace kinematics. It builds on it.
Key takeaways
The concepts introduced this week form a dependency chain, not a flat list of independent facts. C-space defines the space in which the robot lives. Forward kinematics — formalized most cleanly through the Product of Exponentials formula — maps points in that space to geometry in the world, using screw axis representations that respect the Lie group structure of . The Jacobian linearizes FK locally, giving a first-order picture of how joint motion produces task-space motion and providing the core operator for numerical IK via the pseudoinverse update . Damped Least Squares IK regularizes this update near singularities by adding to the inversion, bounding joint commands at the cost of some task-space accuracy. Singularities are the configurations where the Jacobian loses rank; the manipulability measure quantifies proximity to singularity, and the manipulability ellipsoid characterizes directional force and velocity transmission geometrically. Workspace analysis characterizes globally where the robot can and cannot reach, and these constraints propagate directly into reward function design, curriculum scheduling, and action-space parameterization. Throughout, the correct mathematical home for rigid body poses is — a structure that must be respected in any learning system that reasons about robot geometry.
Conceptual questions
-
A locomotion policy is trained entirely in joint space and never explicitly reasons about the robot's base pose. In what sense does geometry still matter for this policy? How does the fact that the base is unactuated appear in the RLReinforcement Learning problem formulation?
-
You are training a manipulation policy using behavioral cloning. Your action space is end-effector pose represented as a quaternion plus translation. You use MSE loss on the raw quaternion components. What can go wrong because of the antipodal problem, and how would you fix it using a geodesic loss?
-
A 7-DoF robot arm has more joints than the 6 DoF needed to specify an end-effector pose. How does this redundancy appear in the Jacobian — specifically, what is and what is the dimension of its null space? How might a policy exploit the null space to satisfy secondary objectives (e.g., joint limit avoidance or manipulability maximization) while achieving the primary task?
-
The manipulability measure is incorporated as an auxiliary reward term during RLReinforcement Learning training with a weight of . Describe the geometric effect on the policy's preferred configurations. What failure mode could arise if is set too high relative to the task reward? How does the manipulability ellipsoid of the learned policy differ from a policy trained with ?
-
A task-space policy outputs desired end-effector poses at 10 Hz, which are tracked by DLS numerical IK. The robot approaches a wrist singularity () mid-trajectory. With (pure pseudoinverse), describe the failure mode in terms of joint velocities. With (damped), describe how the behavior changes and what task-space error is introduced. At what point would you recommend the safety filter (Week 12) should engage, and what fallback behavior would it trigger?
Looking ahead
With kinematics in place, we now add physics and uncertainty.
Week 2: Dynamics and State Estimation. We study robot dynamics through the Euler-Lagrange and Newton-Euler formalisms, floating-base dynamics for legged systems, and how robots estimate their state from noisy sensors through Kalman filtering and its nonlinear extensions — all critical for control and learning in the real world.
Further reading
- Murray, Li, Sastry (1994). Chapters 2 & 3: Rigid Body Motion and Manipulator Kinematics.
- Siciliano et al. (2009). Chapter 2: Kinematics.