Automatic Differentiation Basics
Riemann’s automatic differentiation engine records tensor computations in a computation graph and computes derivatives efficiently through backpropagation. This is essential for training neural networks and for other gradient-based optimization tasks.
Core Concepts
Computation Graph: A directed graph automatically constructed by Riemann in the background that records the relationships between tensor operations. Each node represents a tensor, and edges represent operations.
Forward Pass: The process of executing operations starting from input tensors along the computation graph to obtain the final output.
Backward Propagation (Backprop): The process of propagating gradients backward along the computation graph starting from the output tensor to compute derivatives for each input tensor.
Gradient: The partial derivative of a scalar output tensor with respect to other tensors, representing the rate of change of the output relative to the input.
Leaf Node Tensor: A tensor created directly by the user (e.g., through rm.tensor()) with requires_grad=True. These are typically model parameters.
Intermediate Node Tensor: A tensor created as a result of operations on other tensors. By default, gradients for intermediate nodes are not retained.
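To make these concepts concrete, here is a minimal sketch of reverse-mode differentiation for scalars in plain Python. The Value class and its attributes are hypothetical names chosen for illustration and say nothing about how Riemann implements its tensors; the sketch only shows how a forward pass can record a graph and how a backward pass applies the chain rule from the output back to the leaf nodes.

```python
class Value:
    """A scalar node in a toy computation graph (illustration only)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data                # result of the forward pass
        self.grad = 0.0                 # accumulated d(output)/d(self)
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Build a topological order so that each node is processed
        # only after every node that consumes it.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # seed: d(output)/d(output)
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += node.grad * local  # chain rule, accumulated


x = Value(2.0)          # leaf node
y = Value(3.0)          # leaf node
z = x * y + x * x       # forward pass builds the graph (intermediate nodes)
z.backward()            # backward pass fills in .grad
print(x.grad, y.grad)   # 7.0 2.0
```

The gradients accumulate with `+=` because x reaches the output along several paths; the same mechanism explains why repeated backward passes add up in a grad attribute.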
Gradient Computation Methods
Riemann provides two methods for computing gradients:
backward() method: Suitable for computing gradients of multiple tensors at once. After calling, gradients for all participating leaf node tensors are computed and stored in their respective grad attributes.
grad() function: Suitable for computing gradients of specific tensors. Allows precise control over which tensors’ gradients to compute, returning a tuple of gradients without modifying the tensors’ grad attributes.
Gradient Tracking Switch
By default, tensors don’t track their gradients. To enable gradient tracking, set requires_grad=True when creating a tensor:
import riemann as rm
# Tensor without gradient tracking
x = rm.tensor([1., 2., 3.])
print(x.requires_grad) # False
# Tensor with gradient tracking
x = rm.tensor([1., 2., 3.], requires_grad=True)
print(x.requires_grad) # True
You can also enable or disable gradient tracking on existing tensors:
x = rm.tensor([1., 2., 3.])
print(x.requires_grad) # False
# Enable gradient tracking
x.requires_grad_(True)
print(x.requires_grad) # True
Computing Gradients
Riemann provides two methods for computing gradients: the backward() method and the grad() function.
Using the backward() Method
The backward() method is suitable for computing gradients of multiple tensors at once. After calling, gradients are automatically stored in the grad attributes of participating leaf node tensors.
Function Signature:
tensor_object.backward(gradient=None, retain_graph=False, create_graph=False)
Parameters:
gradient (optional): When the output tensor is not a scalar, a gradient tensor with the same shape as the output is required. For scalar outputs, this parameter can be omitted, defaulting to None (equivalent to passing scalar 1).
retain_graph (optional): Whether to retain the computation graph. Defaults to False, meaning the graph is released after backpropagation. Set to True if you need to call backward() multiple times.
create_graph (optional): Whether to record the computation graph of gradients for subsequent computation of higher-order derivatives. Defaults to False.
Use Cases:
Training neural networks, computing gradients for all trainable parameters at once
When multiple backward passes are needed (e.g., gradient accumulation)
Computing higher-order derivatives
Important Notes:
Only leaf node tensors with requires_grad=True will have their gradients computed
Intermediate node tensors do not retain gradients by default; call retain_grad() if you need gradients for intermediate nodes
Gradients accumulate in the grad attribute; manually zero gradients before multiple backward() calls
Example 1: Gradient Computation for Scalar Output
import riemann as rm
# Create tensors with gradient tracking (leaf nodes)
x = rm.tensor(2.0, requires_grad=True)
y = rm.tensor(3.0, requires_grad=True)
# Define computation (intermediate node)
z = x * y + x ** 2.
# Compute gradients
z.backward()
# Access gradients
print(x.grad) # dz/dx = y + 2*x = 3 + 4 = 7
print(y.grad) # dz/dy = x = 2
Example 2: Gradient Computation for Non-Scalar Output
import riemann as rm
# Create tensors with gradient tracking
x = rm.tensor([1., 2., 3.], requires_grad=True)
# Define computation that produces a non-scalar output
y = x * 2.
# Compute gradients with respect to a vector, gradient argument required
gradient = rm.tensor([1., 1., 1.]) # Vector for Jacobian-vector product
y.backward(gradient)
# Access gradients
print(x.grad) # [2., 2., 2.]
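The role of the gradient argument can be written out for this example. In the worked sketch below, v stands for the vector passed to backward() (a symbol introduced here for illustration), and the stored value is the vector-Jacobian product:

```latex
\frac{\partial y_i}{\partial x_j} = 2\,\delta_{ij},
\qquad
(\texttt{x.grad})_j = \sum_{i=1}^{3} v_i \,\frac{\partial y_i}{\partial x_j} = 2\,v_j
```

With v = (1, 1, 1) this gives (2, 2, 2), the value printed above. Passing a vector of ones is therefore the same as differentiating the scalar sum of y.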
Example 3: Retaining Gradients for Intermediate Nodes
import riemann as rm
x = rm.tensor(2.0, requires_grad=True)
y = x * 3 # Intermediate node
z = y ** 2 # Output
# Retain gradients for intermediate node y
y.retain_grad()
z.backward()
print(x.grad) # dz/dx = 36
print(y.grad) # dz/dy = 12 (because retain_grad() was called)
Using the grad() Function
The grad() function is suitable for computing gradients of specific tensors, allowing precise control over which tensors’ gradients to compute.
Function Signature:
riemann.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=False, create_graph=False, allow_unused=False)
Parameters:
outputs: Output tensor(s) (scalar or tensor), the starting point for gradient computation
inputs: Input tensor or tuple of tensors, specifying which tensors to compute gradients for
grad_outputs (optional): Gradient tensor required when outputs is not a scalar
retain_graph (optional): Whether to retain the computation graph, defaults to False
create_graph (optional): Whether to record the computation graph of gradients for subsequent computation of higher-order derivatives, defaults to False
allow_unused (optional): Whether to allow some input tensors to be unused, defaults to False
Use Cases:
When you only need gradients for specific tensors, not all leaf nodes
When you don’t want to modify the grad attributes of tensors
When you need more flexible control over the gradient computation process
Important Notes:
Gradients are returned as a tuple, in the same order as the inputs parameter
Only tensors specified in inputs will have their gradients computed
Does not modify the grad attributes of input tensors
Intermediate nodes, even with retain_grad() called, will not automatically have gradients computed in grad(); they must be explicitly specified
Example 1: Computing Gradients for Specific Tensors
import riemann as rm
x = rm.tensor(2.0, requires_grad=True)
y = rm.tensor(3.0, requires_grad=True)
z = rm.tensor(4.0, requires_grad=True)
# Define computation
w = x * y + z
# Only compute gradients for x and y, not z
grads = rm.autograd.grad(w, (x, y))
print(grads) # (tensor(3.), tensor(2.))
print(x.grad) # None (grad() does not modify grad attributes)
Example 2: Gradient Computation for Non-Scalar Output
import riemann as rm
x = rm.tensor([1., 2., 3.], requires_grad=True)
y = x * 2
# For non-scalar outputs, grad_outputs must be provided
grad_outputs = rm.tensor([1., 1., 1.])
grads = rm.autograd.grad(y, x, grad_outputs=grad_outputs)
print(grads) # (tensor([2., 2., 2.]),)
Gradient Accumulation
Gradients are accumulated by default. This means that if you call backward() multiple times, the gradients will add up:
import riemann as rm
# Create tensor with gradient tracking
x = rm.tensor(1.0, requires_grad=True)
# First computation
y = x * 2.
y.backward()
print(x.grad) # 2
# Second computation
y = x * 3.
y.backward()
print(x.grad) # 2 + 3 = 5 (gradients accumulate)
# Clear gradients
if x.grad is not None:
    x.grad.zero_()
print(x.grad) # 0
Gradient Computation Context Control
Riemann controls gradient tracking through functions and context managers that enable or disable it for a region of code. Disabling it during model inference saves memory, while training requires it to be enabled.
is_grad_enabled() Function
The is_grad_enabled() function checks whether gradient computation is currently enabled.
import riemann as rm
# Check current gradient status
print(rm.is_grad_enabled()) # True (enabled by default)
with rm.no_grad():
    print(rm.is_grad_enabled()) # False
no_grad() Context Manager/Decorator
no_grad() temporarily disables gradient computation. Within this context, no computation tracks gradients, which suits inference and can reduce memory usage and speed up computation.
As a context manager:
import riemann as rm
x = rm.tensor([1., 2., 3.], requires_grad=True)
with rm.no_grad():
    y = x * 2.
    print(y.requires_grad) # False
As a function decorator:
import riemann as rm
@rm.no_grad
def inference(model, x):
    # Computations within the function will not track gradients
    return model(x)
enable_grad() Context Manager/Decorator
enable_grad() temporarily enables gradient computation. It can be used to re-enable gradients inside a no_grad context.
As a context manager:
import riemann as rm
x = rm.tensor([1., 2., 3.], requires_grad=True)
with rm.no_grad():
    # Gradients are disabled here
    print(rm.is_grad_enabled()) # False
    with rm.enable_grad():
        # Gradients are temporarily enabled here
        y = x * 2.
        print(y.requires_grad) # True
    # Back to disabled state
    print(rm.is_grad_enabled()) # False
As a function decorator:
import riemann as rm
@rm.enable_grad
def train_step(model, x, target, loss_fn):
    # Computations within the function will track gradients
    pred = model(x)
    loss = loss_fn(pred, target)
    loss.backward()
    return loss
set_grad_enabled() Context Manager/Decorator
set_grad_enabled(mode) is the most flexible gradient control function, allowing explicit enabling or disabling of gradient computation.
Parameters:
mode (bool): True to enable gradient computation, False to disable
As a context manager:
import riemann as rm
x = rm.tensor([1., 2., 3.], requires_grad=True)
# Disable gradients
with rm.set_grad_enabled(False):
    y = x * 2.
    print(y.requires_grad) # False
# Enable gradients
with rm.set_grad_enabled(True):
    y = x * 2.
    print(y.requires_grad) # True
As a function decorator:
import riemann as rm
@rm.set_grad_enabled(False)
def inference(model, x):
    return model(x)
@rm.set_grad_enabled(True)
def train(model, x, target, loss_fn):
    pred = model(x)
    loss = loss_fn(pred, target)
    loss.backward()
    return loss
Nested Context Managers
Gradient control context managers support nested usage, where inner contexts temporarily override outer settings:
import riemann as rm
x = rm.tensor([1., 2., 3.], requires_grad=True)
with rm.no_grad(): # Outer: disable gradients
    y1 = x * 2.
    print(f"Outer no_grad: y1.requires_grad = {y1.requires_grad}") # False
    with rm.enable_grad(): # Inner: enable gradients
        y2 = x * 3.
        print(f"Inner enable_grad: y2.requires_grad = {y2.requires_grad}") # True
    # Back to outer context
    y3 = x * 4.
    print(f"Back to outer: y3.requires_grad = {y3.requires_grad}") # False
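The save-and-restore behavior behind nesting can be sketched in plain Python. The code below is an illustration under stated assumptions, not Riemann's implementation: the `_state` dictionary is a hypothetical stand-in for whatever flag the library keeps internally, and the two functions merely reuse the names from this section.

```python
import contextlib

# Hypothetical flag holder; enabled by default, as in the text above.
_state = {"grad_enabled": True}


def is_grad_enabled():
    return _state["grad_enabled"]


@contextlib.contextmanager
def set_grad_enabled(mode):
    """Set the flag for the duration of a with-block, then restore it."""
    previous = _state["grad_enabled"]
    _state["grad_enabled"] = mode
    try:
        yield
    finally:
        # Restoring the saved value (rather than forcing True) is what
        # makes nesting work; it also runs if the block raises.
        _state["grad_enabled"] = previous


states = []
with set_grad_enabled(False):           # outer: disabled
    states.append(is_grad_enabled())
    with set_grad_enabled(True):        # inner: temporarily enabled
        states.append(is_grad_enabled())
    states.append(is_grad_enabled())    # back to the outer setting
states.append(is_grad_enabled())        # back to the default
print(states)  # [False, True, False, True]
```

Context managers built with `contextlib.contextmanager` can also be applied as function decorators, which is one plausible way to get the dual context manager/decorator usage shown in this section.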
Tensor Methods for Graph Detaching and Data Copying
Riemann provides several tensor methods for managing computation graph dependencies and copying tensor data. The methods differ in:
Whether it creates a new tensor object or modifies in-place
Whether it shares data with the original tensor
Whether gradient tracking is preserved
Here are the key methods explained with individual examples:
detach(): Create a new tensor that shares data with the original but is detached from the computation graph
The detach() method returns a new tensor object that shares the same data memory as the original tensor, but is disconnected from the computation graph. This means:
Changes to the detached tensor will modify the original tensor
No gradients will be backpropagated through the detached tensor
import riemann as rm
x = rm.tensor([1., 2., 3.], requires_grad=True)
y = x * 2.
# Detach y from the computation graph
detached_y = y.detach()
print(f"detached_y: {detached_y}")
print(f"detached_y.requires_grad: {detached_y.requires_grad}")
print(f"Modifying detached_y will modify y: {id(detached_y.data) == id(y.data)}")
Characteristics: Creates new tensor object, shares memory with original, disables gradient tracking
detach_(): In-place operation that detaches the current tensor from the computation graph
The detach_() method is an in-place version of detach(). Instead of creating a new tensor, it modifies the current tensor to disconnect it from the computation graph.
import riemann as rm
x = rm.tensor([1., 2., 3.], requires_grad=True)
y = x * 2.
print(f"Before detach_(): y.requires_grad = {y.requires_grad}")
y.detach_() # In-place operation
print(f"After detach_(): y.requires_grad = {y.requires_grad}")
Characteristics: Modifies tensor in-place (no new object), shares memory with original (same tensor), disables gradient tracking
clone(): Create a new tensor with copied data that maintains computation graph dependencies
The clone() method creates a completely new tensor object with its own data memory, but preserves the computation graph dependencies from the original tensor. This means operations on the cloned tensor can backpropagate gradients to the original tensor.
import riemann as rm
x = rm.tensor([1., 2., 3.], requires_grad=True)
y = x * 2.
cloned_y = y.clone()
print(f"cloned_y: {cloned_y}")
print(f"cloned_y.requires_grad: {cloned_y.requires_grad}")
print(f"Modifying cloned_y won't modify y: {id(cloned_y.data) != id(y.data)}")
# Demonstrate gradient can propagate through cloned tensor to original tensor
loss = cloned_y.sum()
loss.backward()
print(f"x.grad after backward(): {x.grad}") # Gradient propagates from cloned tensor to x
Characteristics: Creates new tensor object, copies data (no memory sharing), preserves gradient tracking
copy(): Create a new tensor with copied data that is detached from the computation graph
The copy() method creates a new tensor object with its own data memory and is completely detached from the computation graph. This is equivalent to calling clone().detach_() and is useful for creating independent tensor copies without gradient tracking.
import riemann as rm
x = rm.tensor([1., 2., 3.], requires_grad=True)
y = x * 2.
copied_y = y.copy()
print(f"copied_y: {copied_y}")
print(f"copied_y.requires_grad: {copied_y.requires_grad}")
print(f"Modifying copied_y won't modify y: {id(copied_y.data) != id(y.data)}")
Characteristics: Creates new tensor object, copies data (no memory sharing), disables gradient tracking
Key Differences Between Methods
The following table summarizes the key differences between these four methods:
| Method | Creates New Object? | Shares Memory with Original Tensor? | Supports Gradient Tracking? |
|---|---|---|---|
| detach() | Yes | Yes | No |
| detach_() | No | N/A (same tensor) | No |
| clone() | Yes | No | Yes |
| copy() | Yes | No | No |
import riemann as rm
x = rm.tensor([1., 2., 3.], requires_grad=True)
# Using detach() - creates new tensor, shares data, detached from graph
y1 = x.detach()
print(f"detach() result: y1 = {y1}, requires_grad={y1.requires_grad}")
# Using detach_() - in-place operation, modifies current tensor
x2 = rm.tensor([1., 2., 3.], requires_grad=True)
print(f"Before detach_(): x2.requires_grad={x2.requires_grad}")
x2.detach_()
print(f"After detach_(): x2.requires_grad={x2.requires_grad}")
# Using clone() - creates new tensor, copies data, maintains graph dependency
y2 = x.clone()
print(f"clone() result: y2 = {y2}, requires_grad={y2.requires_grad}")
# Using copy() - creates new tensor, copies data, detached from graph
y3 = x.copy()
print(f"copy() result: y3 = {y3}, requires_grad={y3.requires_grad}")
Key differences between these methods:
Data Sharing: detach() shares data with original, while clone() and copy() create new data copies
In-place Operation: detach_() modifies the tensor in-place, others create new tensors
Gradient Tracking: clone() maintains gradient tracking (if original requires it), others disable gradient tracking
Independent Copy: copy() creates a completely independent new tensor object that does not share data with the original tensor nor preserves computational graph dependencies
In-place Operations and Gradients
In-place operations can affect gradient computation. Here are important considerations:
Leaf Variables with Gradient Tracking: In-place operations are NOT allowed on leaf tensors that require gradient tracking, as this would destroy the computational graph necessary for backpropagation.
Non-Leaf Variables with Gradient Tracking: In-place operations are allowed on non-leaf tensors (intermediate results) that require gradient tracking.
Examples:
import riemann as rm
# 1. Example: In-place operations on leaf tensors are NOT allowed
x = rm.tensor([1., 2., 3.], requires_grad=True) # Leaf tensor
try:
    x.add_(1.) # This will raise an error
except RuntimeError as e:
    print(f"Error on leaf tensor in-place operation: {e}")
# 2. Example: In-place operations on non-leaf tensors ARE allowed
y = x * 2. # Non-leaf tensor
print(f"Before in-place add on non-leaf tensor: y = {y}")
y.add_(3.) # In-place operation on non-leaf tensor
print(f"After in-place add on non-leaf tensor: y = {y}")
# Compute gradient after in-place operation on non-leaf tensor
z = y.sum()
z.backward()
print(f"Gradient of x (leaf tensor): x.grad = {x.grad}")
# Clear gradients
x.grad.zero_()
# 3. Example: In-place assignment using tensor indexing on non-leaf tensors
y = x * 2. # Non-leaf tensor
print(f"Before in-place indexing assignment: y = {y}")
y[0] = 100. # In-place indexing assignment
print(f"After in-place indexing assignment: y = {y}")
# Compute gradient after indexing assignment
z = y.sum()
z.backward()
print(f"Gradient of x after indexing assignment: x.grad = {x.grad}")
# Clear gradients
x.grad.zero_()
# 4. Example: Gradient tracking with in-place operations
x = rm.tensor(2.0, requires_grad=True) # Leaf tensor
y = rm.tensor(3.0, requires_grad=True) # Leaf tensor
a = x * y # Intermediate tensor
a.mul_(2.) # In-place multiply
b = a + x # Final tensor
b.backward()
print(f"Gradient of x (left value): x.grad = {x.grad}")
print(f"Gradient of y (right value): y.grad = {y.grad}")
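The examples above do not state the values they print. Assuming the in-place operations are recorded in the graph as this section describes, the expected gradients for the fourth example follow from ordinary calculus:

```latex
\begin{aligned}
a &= 2xy, \qquad b = a + x = 2xy + x \\
\frac{\partial b}{\partial x} &= 2y + 1 = 7, \qquad
\frac{\partial b}{\partial y} = 2x = 4
\quad \text{at } x = 2,\ y = 3
\end{aligned}
```

By the same reasoning, the second example should give x.grad = [2, 2, 2], since adding a constant does not change the slope, and the third should give [0, 2, 2], since the overwritten element y[0] no longer depends on x[0].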
Higher-Order Gradients
Riemann supports computing higher-order derivatives by setting create_graph=True:
import riemann as rm
# Create tensor with gradient tracking
x = rm.tensor(2.0, requires_grad=True)
# First-order computation
y = x ** 3.
# Compute first-order gradients with graph creation
dy_dx = rm.autograd.grad(y, x, create_graph=True)[0]
print(dy_dx) # 12
# Compute second-order gradients
d2y_dx2 = rm.autograd.grad(dy_dx, x)[0]
print(d2y_dx2) # 12
Additionally, Riemann provides two convenient tools for higher-order derivative computation: the d() method and higher_order_grad() function.
d() Method
The d() method of tensor objects is used to compute mixed partial derivatives of the current scalar tensor with respect to multiple scalar tensors. It allows for easy computation of multi-order mixed derivatives.
import riemann as rm
# Create tensors with gradient tracking
x = rm.tensor(2.0, requires_grad=True)
y = rm.tensor(3.0, requires_grad=True)
# Define function f = x^3 * y^2
f = x ** 3 * y ** 2
# Compute mixed partial derivative d²f/dxdy
d2f_dxdy = f.d(x, y)
print(d2f_dxdy) # 72.0
# Compute third-order mixed partial derivative d³f/dx²dy
d3f_dx2dy = f.d(x, x, y)
print(d3f_dx2dy) # 72.0
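Both printed values can be checked by hand. The derivation below differentiates f step by step and evaluates at x = 2, y = 3:

```latex
\begin{aligned}
f &= x^{3} y^{2} \\
\frac{\partial f}{\partial x} &= 3x^{2}y^{2}, &
\frac{\partial^{2} f}{\partial x\,\partial y} &= 6x^{2}y = 6 \cdot 4 \cdot 3 = 72 \\
\frac{\partial^{2} f}{\partial x^{2}} &= 6xy^{2}, &
\frac{\partial^{3} f}{\partial x^{2}\,\partial y} &= 12xy = 12 \cdot 2 \cdot 3 = 72
\end{aligned}
```

The two results coincide only because of the chosen point; at other values of x and y the second-order and third-order mixed derivatives differ.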
higher_order_grad() Function
The higher_order_grad() function is used to compute n-th order derivatives of scalar tensor outputs with respect to input tensors. It provides a convenient way to directly compute derivatives of a specified order.
import riemann as rm
# Create tensor with gradient tracking
x = rm.tensor(2.0, requires_grad=True)
# Define function y = x^3
y = x ** 3
# Compute second-order derivative
d2y_dx2 = rm.autograd.higher_order_grad(y, x, 2)[0]
print(d2y_dx2) # 12.0
# Compute third-order derivative
d3y_dx3 = rm.autograd.higher_order_grad(y, x, 3)[0]
print(d3y_dx3) # 6.0
# Multiple inputs case
x1 = rm.tensor(1.0, requires_grad=True)
x2 = rm.tensor(2.0, requires_grad=True)
z = x1 ** 2 + x2 ** 3
grads = rm.autograd.higher_order_grad(z, [x1, x2], 2)
print(grads) # (2.0, 12.0)
Gradient functions (Functional API)
Riemann also provides a functional API in the riemann.autograd.functional module for computing more advanced derivative structures, such as Jacobian matrices, Hessian matrices, and Jacobian-vector products.
jacobian() Function
The jacobian() function computes the Jacobian matrix of a function from input to output, showing all first-order partial derivatives of the function output with respect to the input.
import riemann as rm
# Define function f = x^2
def f(x):
    return x ** 2
# Create input tensor
x = rm.tensor([1.0, 2.0, 3.0], requires_grad=True)
# Compute Jacobian matrix
jac = rm.autograd.functional.jacobian(f, x)
print(jac)
print(jac.shape) # (3, 3) # For vector input, shape is (n_inputs, n_outputs)
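For an element-wise function such as x ** 2, the Jacobian is diagonal with entries 2 * x. The sketch below approximates it with central differences in plain Python, as a way to sanity-check a Jacobian by hand. The helper numerical_jacobian is a hypothetical name, not part of Riemann, and its layout (row i for output i, column j for input j) is an assumption of this sketch.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian: jac[i][j] approximates d f_i / d x_j."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        # Perturb only the j-th input in both directions.
        x_plus = list(x)
        x_plus[j] += eps
        x_minus = list(x)
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def square(x):
    return [xi ** 2 for xi in x]


jac = numerical_jacobian(square, [1.0, 2.0, 3.0])
for row in jac:
    print([round(v, 4) for v in row])
# [2.0, 0.0, 0.0]
# [0.0, 4.0, 0.0]
# [0.0, 0.0, 6.0]
```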
hessian() Function
The hessian() function computes the Hessian matrix of a scalar-valued function, showing all second-order partial derivatives of the function with respect to its inputs.
import riemann as rm
# Define function f = x^3
def f(x):
    return x ** 3
# Create input tensor
x = rm.tensor(2.0, requires_grad=True)
# Compute Hessian matrix
hess = rm.autograd.functional.hessian(f, x)
print(hess)
print(hess.shape) # (1, 1) # For scalar input, shape is (input_size, input_size)
derivative() Function
The derivative() function computes a derivative function for the given function. It creates a new function that, when called, returns the derivative of the original function at the specified inputs.
import riemann as rm
# Define function f = x^2
def f(x):
    return x ** 2.
# Create derivative function
df = rm.autograd.functional.derivative(f)
# Test the derivative function
x = rm.tensor(2.0, requires_grad=True)
print(df(x)) # Should return tensor(4.0)
# Multi-input example
def g(x, y):
    return x * y + x ** 2.
dg = rm.autograd.functional.derivative(g)
x = rm.tensor(2.0, requires_grad=True)
y = rm.tensor(3.0, requires_grad=True)
print(dg(x, y))
jvp() (Jacobian-Vector Product) Function
The jvp() function computes the product of a Jacobian matrix with a given vector.
import riemann as rm
# Define function f = x^2
def f(x):
    return x ** 2
# Create input tensor
x = rm.tensor([1.0, 2.0, 3.0], requires_grad=True)
# Define v vector
v = rm.tensor([1.0, 1.0, 1.0])
# Compute jvp
f_x, jvp_val = rm.autograd.functional.jvp(f, x, v)
print(jvp_val)
vjp() (Vector-Jacobian Product) Function
The vjp() function computes the product of a given vector with a Jacobian matrix.
import riemann as rm
# Define function f = x^2
def f(x):
    return x ** 2
# Create input tensor
x = rm.tensor([1.0, 2.0, 3.0], requires_grad=True)
# Define v vector
v = rm.tensor([1.0, 1.0, 1.0])
# Compute vjp
f_x, vjp_val = rm.autograd.functional.vjp(f, x, v)
print(vjp_val)
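For the element-wise example above the Jacobian is square and diagonal, so jvp() and vjp() produce vectors of the same shape. The difference shows up with a non-square Jacobian. The plain-Python sketch below uses an illustrative function from two inputs to three outputs, f(x1, x2) = (x1 * x2, x1 + x2, x1 ** 2), with its Jacobian written out by hand at (2, 3). The helpers are hypothetical, take the matrix directly, and do not mirror Riemann's signatures.

```python
# Jacobian of f(x1, x2) = (x1 * x2, x1 + x2, x1 ** 2) at (x1, x2) = (2, 3):
# rows are outputs, columns are inputs.
J = [[3.0, 2.0],   # d(x1*x2)/dx1 = x2,   d(x1*x2)/dx2 = x1
     [1.0, 1.0],   # d(x1+x2)/dx1 = 1,    d(x1+x2)/dx2 = 1
     [4.0, 0.0]]   # d(x1**2)/dx1 = 2*x1, d(x1**2)/dx2 = 0


def jvp(J, v):
    """J @ v: v has one entry per input, the result one entry per output."""
    return [sum(J[i][j] * v[j] for j in range(len(v))) for i in range(len(J))]


def vjp(J, u):
    """u^T @ J: u has one entry per output, the result one entry per input."""
    return [sum(u[i] * J[i][j] for i in range(len(u))) for j in range(len(J[0]))]


print(jvp(J, [1.0, 1.0]))       # [5.0, 2.0, 4.0]
print(vjp(J, [1.0, 1.0, 1.0]))  # [8.0, 3.0]
```

The vector passed to a Jacobian-vector product lives in input space, while the vector passed to a vector-Jacobian product lives in output space; backpropagation is the latter.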
hvp() (Hessian-Vector Product) and vhp() Functions
The hvp() and vhp() functions compute the Hessian-vector product and the vector-Hessian product respectively. Since the Hessian matrix is symmetric (for functions with continuous second derivatives), hvp() and vhp() give the same values.
import riemann as rm
# Define scalar-valued function
def f(x):
    return (x ** 3).sum()
# Create input tensor
x = rm.tensor([1.0, 2.0, 3.0], requires_grad=True)
# Define v vector
v = rm.tensor([1.0, 1.0, 1.0])
# Compute hvp
f_x, hvp_val = rm.autograd.functional.hvp(f, x, v)
print(hvp_val)
# vhp computes the same result as hvp
f_x, vhp_val = rm.autograd.functional.vhp(f, x, v)
print(vhp_val)
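The expected result of this example can be worked out by hand, since the Hessian of a sum of cubes is diagonal:

```latex
f(x) = \sum_{i} x_i^{3}, \qquad
H_{ij} = \frac{\partial^{2} f}{\partial x_i \,\partial x_j} = 6x_i\,\delta_{ij}, \qquad
(Hv)_i = 6 x_i v_i
```

At x = (1, 2, 3) and v = (1, 1, 1) this evaluates to (6, 12, 18). Because H is symmetric, the vector-Hessian product has the same entries.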
Custom Gradient Functions
Riemann provides three ways to implement custom functions with gradient tracking support:
Using Riemann Tensor Functions (Automatic Gradients)
If you implement your custom function using existing Riemann tensor functions, you get gradient tracking automatically without writing any gradient code:
import riemann as rm
def my_custom_function(x):
    """A custom function that automatically gets gradient support"""
    return rm.exp(rm.sin(x)) + x**2.
# Test automatic gradient tracking
x = rm.tensor(1.0, requires_grad=True)
y = my_custom_function(x)
y.backward()
print(f"Gradient: {x.grad}") # Will automatically compute correct gradient
Using track_grad Decorator
Use the track_grad decorator to wrap your function and provide explicit gradient computation.
Gradient Function Interface Requirements:
The gradient function passed to track_grad must follow these interface requirements:
Parameters: Must accept the same parameters as the forward function (same names and order)
Return Value: Must return a tuple containing the gradient (partial derivative) for each input tensor
Tuple Elements: Each element corresponds to the gradient of the respective input tensor. For tensors that don’t require gradients, return None for that position
Gradient Calculation: The gradient should be computed as the partial derivative of the output with respect to each input
Example for single input:
import riemann as rm
import numpy as np
def sigmoid_derivative(x):
    """Gradient function for sigmoid: returns tuple with one element"""
    sig = 1. / (1. + np.exp(-x.data))
    return (rm.tensor(sig * (1. - sig)),) # Note: must return a tuple
@rm.track_grad(sigmoid_derivative)
def custom_sigmoid(x):
    """Custom sigmoid function with gradient support"""
    return rm.tensor(1. / (1. + np.exp(-x.data)))
# Test custom sigmoid with gradient
x = rm.tensor(0.0, requires_grad=True)
y = custom_sigmoid(x)
y.backward()
print(f"Sigmoid output: {y}") # Should be 0.5
print(f"Sigmoid gradient: {x.grad}") # Should be 0.25
Example for multiple inputs:
import riemann as rm
def multiply_derivative(x, y):
    """Gradient function for multiplication: d(xy)/dx = y, d(xy)/dy = x"""
    return (y, x) # Returns tuple with gradient for each input
@rm.track_grad(multiply_derivative)
def custom_multiply(x, y):
    """Custom multiplication function with gradient support"""
    return x * y
# Test with multiple inputs
x = rm.tensor(2.0, requires_grad=True)
y = rm.tensor(3.0, requires_grad=True)
z = custom_multiply(x, y)
z.backward()
print(f"z = {z}") # Should be 6.0
print(f"dz/dx = {x.grad}") # Should be 3.0 (y)
print(f"dz/dy = {y.grad}") # Should be 2.0 (x)
Using Function Class
For more complex cases, you can subclass Function and implement both forward and backward static methods.
Function Class Interface:
To create a custom function using the Function class, you must implement two static methods:
forward(ctx, *inputs)
Purpose: Performs the forward computation
Parameters:
ctx: Context object used to save information for the backward pass. Use ctx.save_for_backward() to store tensors needed in backward
*inputs: Input tensors (variable number of arguments)
Returns: Output tensor(s) of the forward computation
Usage: Implement your custom computation logic here and save any tensors needed for gradient computation using ctx.save_for_backward()
backward(ctx, grad_output)
Purpose: Performs the backward (gradient) computation
Parameters:
ctx: Context object containing information saved during forward pass. Access saved tensors via ctx.saved_tensors
grad_output: Gradient of the output tensor (from subsequent layers in the computation graph)
Returns: Tuple of gradients, one for each input tensor. Each gradient should be the product of grad_output and the local gradient (partial derivative)
Usage: Compute gradients using the chain rule: grad_input = grad_output * local_gradient
Example:
import riemann as rm
import numpy as np
class CustomSigmoid(rm.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        """Forward computation for sigmoid
        Args:
            ctx: Context object for saving tensors
            x: Input tensor
        Returns:
            Output tensor after applying sigmoid
        """
        sig = 1. / (1. + np.exp(-x.data))
        ctx.save_for_backward(rm.tensor(sig)) # Save for backward
        return rm.tensor(sig)
    @staticmethod
    def backward(ctx, grad_output):
        """Backward computation for sigmoid
        Args:
            ctx: Context object with saved tensors
            grad_output: Gradient from output side
        Returns:
            Gradient with respect to input
        """
        sig, = ctx.saved_tensors # Retrieve saved tensor
        # Chain rule: grad_input = grad_output * local_gradient
        # local_gradient for sigmoid: sig * (1 - sig)
        return grad_output * sig * (1. - sig)
# Test CustomSigmoid
x = rm.tensor(0.0, requires_grad=True)
y = CustomSigmoid.apply(x) # Use apply() to call the function
y.backward()
print(f"Sigmoid output: {y}") # Should be 0.5
print(f"Sigmoid gradient: {x.grad}") # Should be 0.25
Key Points:
Always use the @staticmethod decorator for both forward and backward methods
Use ctx.save_for_backward() in forward to save tensors needed for gradient computation
Access saved tensors in backward via ctx.saved_tensors (returns a tuple)
The backward method must return a tuple with one gradient for each input to forward
Call the function using ClassName.apply(*inputs), not by instantiating the class
Advanced Computational Graph Manipulation
Riemann provides functions for manually manipulating the computational graph. These functions are designed for special use cases where you need to connect tensors to the computational graph without affecting forward computation values or backward gradient values. These are low-level tools typically used in framework internals (such as Riemann’s hook handling mechanism) rather than common user scenarios.
Supporting Functions and Methods
The following functions and methods are used internally by share_grad_map and are rarely needed directly by users:
fwbw_all_zero Function
Returns a scalar tensor with value 0.0 in forward pass and returns a zero tensor with the same shape as input in backward pass. Used to add a tensor to the computational graph without affecting forward or backward values.
attach_zero_grad_sources Method
Attaches multiple tensors as source tensors to a tensor. This doesn’t change the tensor’s value, but allows it to pass zero gradients to these sources during backward pass. Used internally to connect tensors to the computational graph so they receive zero gradients instead of None.
Gradient Checking
Use the gradcheck function to verify your custom gradient functions are correct:
import riemann as rm
# Define a test function for gradcheck
# CustomSigmoid is the Function subclass defined in the Custom Gradient Functions section
def test_function(x):
    return CustomSigmoid.apply(x)
# Perform gradient check
x = rm.tensor(0.0, requires_grad=True)
check_passed = rm.gradcheck(test_function, (x,))
print(f"Gradient check passed: {check_passed}")
The gradcheck function verifies that your analytical gradient matches the numerical gradient computed with the finite difference method.
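The comparison that a gradient check performs can be sketched in plain Python. The helper finite_difference_check below is a hypothetical name for illustration and is not Riemann's gradcheck; it uses a central difference, and the step size eps and tolerance tol are assumed values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Analytical derivative of the sigmoid: sig * (1 - sig)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def finite_difference_check(f, df, x, eps=1e-6, tol=1e-6):
    """Compare df(x) with the central difference (f(x+eps) - f(x-eps)) / (2*eps)."""
    numerical = (f(x + eps) - f(x - eps)) / (2.0 * eps)
    return abs(numerical - df(x)) <= tol


# A correct analytical gradient passes the check.
print(finite_difference_check(sigmoid, sigmoid_grad, 0.0))  # True
# A deliberately wrong gradient (the function itself) fails it.
print(finite_difference_check(sigmoid, sigmoid, 0.0))       # False
```

Central differences have an error proportional to eps squared, which is why they are usually preferred over one-sided differences for this kind of check.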
Gradient Computation Tips
Memory Management: Gradient computation uses memory to store the computational graph. Use no_grad() or detach() when you don’t need gradients to save memory.
Common Pitfalls
In-place Operations: Avoid performing in-place operations on leaf node tensors that require gradient tracking.
Detaching Tensors from Computational Graph: After detaching, tensors lose their computational graph dependencies and cannot perform backward propagation for gradient calculation.
Non-scalar Outputs: Remember to provide gradient arguments when calling backward() on non-scalar outputs.
Memory Leaks: Long-running computations with gradient tracking can consume significant memory.
Examples
Rosenbrock Function Optimization (Banana Function)
The Rosenbrock function (also known as the banana function) is a classic non-convex optimization test problem. It has its global minimum at (1, 1), where its value is 0.
Here’s an example of optimizing the Rosenbrock function using Riemann’s automatic differentiation and the Adam optimizer:
import riemann as rm
from riemann import optim
# Define the Rosenbrock function (banana function)
def rosenbrock_2d(x, y):
    """Rosenbrock function for 2D case"""
    return 100. * (y - x**2.)**2. + (1. - x)**2.
# Initialize parameters with gradient tracking
x = rm.tensor(-1.2, requires_grad=True) # Start from point (-1.2, 1.0)
y = rm.tensor(1.0, requires_grad=True)
params = [x, y]
# Setup optimizer
optimizer = optim.Adam(params, lr=0.05)
print("Optimizing Rosenbrock function (banana function):")
print(f"Initial x: {x.item():.4f}, y: {y.item():.4f}")
print(f"Initial loss: {rosenbrock_2d(x, y).item():.4f}")
# Perform optimization
for i in range(1000):
    loss = rosenbrock_2d(x, y)
    # Reset gradients
    optimizer.zero_grad()
    # Compute gradients automatically
    loss.backward()
    # Update parameters
    optimizer.step()
    # Print progress every 200 iterations
    if i % 200 == 0:
        print(f"Iteration {i}: loss = {loss.item():.8f}, x = {x.item():.8f}, y = {y.item():.8f}")
# Print final results
print(f"\nOptimization completed!")
print(f"Final x: {x.item():.10f}, y: {y.item():.10f}")
print(f"Final loss: {loss.item():.10f}")
print(f"Theoretical minimum: x=1.0, y=1.0, loss=0.0")
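For reference, the gradient that backward() computes in this example can also be derived by hand. The plain-Python sketch below writes out the partial derivatives and checks two things: that the gradient vanishes at (1, 1), and that it agrees with a central-difference estimate at the starting point. The function names are illustrative and not part of Riemann.

```python
def rosenbrock(x, y):
    return 100.0 * (y - x ** 2) ** 2 + (1.0 - x) ** 2


def rosenbrock_grad(x, y):
    """Hand-derived partial derivatives of the function above."""
    df_dx = -400.0 * x * (y - x ** 2) - 2.0 * (1.0 - x)
    df_dy = 200.0 * (y - x ** 2)
    return df_dx, df_dy


# The gradient is zero at the minimum (1, 1), where the function value is 0.
print(rosenbrock(1.0, 1.0))                               # 0.0
print(all(g == 0.0 for g in rosenbrock_grad(1.0, 1.0)))   # True

# Compare against central differences at the starting point (-1.2, 1.0).
eps = 1e-6
x0, y0 = -1.2, 1.0
num_dx = (rosenbrock(x0 + eps, y0) - rosenbrock(x0 - eps, y0)) / (2 * eps)
num_dy = (rosenbrock(x0, y0 + eps) - rosenbrock(x0, y0 - eps)) / (2 * eps)
gx, gy = rosenbrock_grad(x0, y0)
print(abs(num_dx - gx) < 1e-4, abs(num_dy - gy) < 1e-4)   # True True
```

At the starting point the gradient is approximately (-215.6, -88.0), which gives a sense of how steep the valley walls are compared with the flat valley floor that the optimizer has to follow.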