A Brief Introduction to Autograd in PyTorch
Background
PyTorch is known for its automatic differentiation (autograd) system, which computes derivatives of tensor operations. During the forward pass, PyTorch records the operations applied to tensors; during backpropagation, it uses that record to compute gradients efficiently, automating the gradient calculations needed to train machine learning models.
To support this, PyTorch builds computational graphs. In this graph:
- Nodes represent tensors or operations.
- Edges represent data flow between operations.
To include a tensor in the graph, set requires_grad=True. This signals PyTorch to track operations involving the tensor for later gradient computation.
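A quick sanity check (a minimal sketch; the tensor values are arbitrary): results derived from an untracked tensor carry no `grad_fn`, while results derived from a tracked tensor do:

```python
import torch

a = torch.tensor([1.0])                       # not tracked by autograd
b = torch.tensor([1.0], requires_grad=True)   # tracked by autograd

# Results of untracked inputs have no grad_fn; tracked ones do
print((a * 2).grad_fn)              # None
print((b * 2).grad_fn is not None)  # True
```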
Calling .backward() initiates reverse-mode automatic differentiation, where PyTorch applies the chain rule from the output back to the inputs. Gradients are stored in each tensor’s .grad attribute.
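A minimal end-to-end sketch (the function and values are illustrative): after `.backward()`, each leaf tensor's `.grad` holds the derivative of the output with respect to it:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = torch.tensor([3.0], requires_grad=True)
z = x * y + x        # z = xy + x
z.backward()         # reverse-mode AD: chain rule from z back to the leaves

print(x.grad)  # dz/dx = y + 1 -> tensor([4.])
print(y.grad)  # dz/dy = x     -> tensor([2.])
```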
PyTorch’s graphs are dynamic: they are built at runtime and store only operation references and tensor dependencies, unlike the static graphs of TensorFlow 1.x. Each operation records its parent tensors and the function used to produce the result.
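Because the graph is rebuilt on every forward pass, ordinary Python control flow can shape it. A small sketch (the function and input are illustrative):

```python
import torch

def f(x):
    # The branch taken on this call determines the recorded graph
    if x.item() > 0:
        return x * 3
    return x * -1

x = torch.tensor([2.0], requires_grad=True)
f(x).backward()
print(x.grad)  # positive branch was taken, so df/dx = 3 -> tensor([3.])
```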
The gradient computation follows the chain rule:

∂L/∂x = (∂L/∂z) · (∂z/∂x)

where L is the loss and z is an intermediate variable.
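The chain rule can be checked numerically with autograd. In this illustrative sketch, the loss is L = z² with intermediate z = 3x:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
z = 3 * x       # intermediate variable
L = z ** 2      # loss
L.backward()

# Chain rule: dL/dx = dL/dz * dz/dx = (2 * z) * 3 = 2 * 6 * 3 = 36
print(x.grad)  # tensor([36.])
```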
Code Example
Example 1: Basic multiplication
import torch
x = torch.tensor([2.0], requires_grad=True)
y = torch.tensor([3.0], requires_grad=True)
C = x * y
print(C)
# Output: tensor([6.], grad_fn=<MulBackward0>)
Explanation:
`C` is 6.0, and `grad_fn=<MulBackward0>` indicates that `C` is the result of a tracked multiplication operation and is part of the computational graph.
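Continuing this example (restated here so it runs standalone), calling `.backward()` on `C` populates the leaf gradients:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = torch.tensor([3.0], requires_grad=True)
C = x * y
C.backward()

print(x.grad)  # dC/dx = y -> tensor([3.])
print(y.grad)  # dC/dy = x -> tensor([2.])
```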
Example 2: Chained operations
x = torch.tensor([2.0], requires_grad=True)
y = torch.tensor([3.0], requires_grad=True)
z = x * y # multiplication
w = z + 2 # addition
q = w ** 2 # squaring
Inspecting the graph step-by-step:
curr = q.grad_fn
print(curr)
# Output: <PowBackward0> ← corresponds to `w ** 2`
curr = curr.next_functions[0][0]
print(curr)
# Output: <AddBackward0> ← corresponds to `z + 2`
curr = curr.next_functions[0][0]
print(curr)
# Output: <MulBackward0> ← corresponds to `x * y`
Each next_functions[0][0] gives the preceding operation that fed into the current result.
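The manual steps above can be rolled into a loop that walks one path of the backward graph until it reaches a leaf. This is a sketch: `grad_fn` internals are an implementation detail and node names may vary across PyTorch versions.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = torch.tensor([3.0], requires_grad=True)
q = ((x * y) + 2) ** 2

# Follow the first parent at each step; leaf tensors end in AccumulateGrad nodes
node = q.grad_fn
while node is not None:
    print(type(node).__name__)
    node = node.next_functions[0][0] if node.next_functions else None
```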
Conclusion
This example illustrates how PyTorch builds a dynamic computational graph and how operations are linked. If requires_grad=True, each tensor becomes part of the graph, enabling backpropagation via .backward(). You can explore more by chaining operations and walking through grad_fn to see how gradients are propagated.
Declaration of LLM Usage: I used an LLM to generate code and paraphrase paragraphs to better explain the concepts.