Differentiation - the chain rule (proof)
Categories: differentiation calculus

The chain rule allows us to differentiate composite functions. In this article, we will prove the chain rule, but first, we will quickly look at what the chain rule is, with a simple example (for more details, see this article). A composite function takes the form:
$$y = f(g(x))$$
where f(x) and g(x) are two functions that each take a single variable. Here is an example composite function:
$$y = \cos(x^2)$$
In this case, the composite function is formed from these two basic functions:
$$f(x) = \cos x, \qquad g(x) = x^2$$
So we first find the square of x, then we find the cosine of the result. The chain rule tells us the derivative of the composite function y. It has the following form:
$$y' = f'(g(x)) \, g'(x)$$
So to find the derivative of y, we first find the derivative of f, and apply it to the result of g(x). We then multiply it by the derivative of g. Here are the derivatives of f and g, which are both standard results:
$$f'(x) = -\sin x, \qquad g'(x) = 2x$$
Let's substitute f' and g' in the previous expression for y':
$$y' = -\sin(g(x)) \cdot 2x$$
Then we can substitute g(x) and simplify, to give the final result:
$$y' = -2x \sin(x^2)$$
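As a quick sanity check of the worked example, the result -2x sin(x²) can be compared with a numerical estimate of the slope of cos(x²). This is only a sketch: the helper name `central_difference` and the step size `h` are illustrative choices, not part of the article.

```python
import math

def y(x):
    """Composite function from the example: y = cos(x^2)."""
    return math.cos(x ** 2)

def y_prime(x):
    """Derivative predicted by the chain rule: -2x sin(x^2)."""
    return -2 * x * math.sin(x ** 2)

def central_difference(func, x, h=1e-6):
    """Numerical slope estimate (hypothetical helper, h is an assumed step)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Compare the chain rule result with the numerical estimate at a few points.
for x in (0.5, 1.0, 2.0):
    print(x, y_prime(x), central_difference(y, x))
```

The two printed values on each line should agree to several decimal places.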
Proof of the chain rule
How do we prove this rule? It isn't too difficult. We start from the standard first-principles definition of the derivative:
$$y'(x) = \lim_{h \to 0} \frac{y(x + h) - y(x)}{h}$$
It is useful to rewrite this formula to express the derivative at a:
$$y'(a) = \lim_{x \to a} \frac{y(x) - y(a)}{x - a}$$
This is the same as the previous formula, except that we have replaced the original x with a, and the original h with x - a. These two graphs show the two cases if you are unconvinced:
The two graphs are identical, except that they have been labelled differently. The LHS shows a band that starts at horizontal position x and ends at horizontal position x + h, and therefore has a width h. The RHS shows a band that starts at a and ends at x, and therefore has a width x - a.
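The equivalence of the two forms can also be checked numerically. The sketch below uses the example function cos(x²) and the assumed values a = 1 and h = 0.25, and evaluates both difference quotients over the same band.

```python
import math

def y(x):
    return math.cos(x ** 2)

def quotient_h_form(func, x, h):
    """Difference quotient over the band from x to x + h."""
    return (func(x + h) - func(x)) / h

def quotient_a_form(func, a, x):
    """Difference quotient over the band from a to x."""
    return (func(x) - func(a)) / (x - a)

a = 1.0   # assumed start of the band
h = 0.25  # assumed width of the band

# The same band, labelled in the two different ways.
print(quotient_h_form(y, a, h))
print(quotient_a_form(y, a, a + h))
```

Both calls describe the same band, so they print the same slope.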
In our case, y(x) has the form f(g(x)):
$$y'(a) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a}$$
Now we will multiply top and bottom by the term g(x) - g(a), like this:
$$y'(a) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a} \cdot \frac{g(x) - g(a)}{g(x) - g(a)}$$
It is valid to do this provided g(x) and g(a) are not equal (otherwise we would be multiplying by 0/0, which is undefined - we will ignore this special case for now, but come back to it later). Now let's swap the two denominators:
$$y'(a) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \frac{g(x) - g(a)}{x - a}$$
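Finally, we will use the rule that the limit of a product is equal to the product of the limits, provided both limits exist. This basically means that we can move the limits inside the multiplication:
$$y'(a) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \lim_{x \to a} \frac{g(x) - g(a)}{x - a}$$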
Looking at the limit on the RHS, this expression is identical to the expression for y'(a) we used earlier, but using g rather than y. This means that the expression is equal to g'(a):
$$\lim_{x \to a} \frac{g(x) - g(a)}{x - a} = g'(a)$$
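The limit on the LHS is also very similar, except that it uses the function f, with g(x) and g(a) playing the roles of x and a. Since g is differentiable at a, it is also continuous there, so g(x) tends to g(a) as x tends to a. So the expression is equal to f'(g(a)):
$$\lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} = f'(g(a))$$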
Substituting both of these into the original equation gives:
$$y'(a) = f'(g(a)) \, g'(a)$$
This equation depends only on a. Of course, we could call the variable x, so the expression becomes identical to the one we wish to prove:
$$y'(x) = f'(g(x)) \, g'(x)$$
So we have proved the chain rule.
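The key step of the proof, splitting the difference quotient into two factors, can be illustrated numerically. The sketch below assumes the example functions f = cos and g(x) = x², and the point a = 1. The function name `factored_quotient` is hypothetical. As x approaches a, the first factor should approach f'(g(a)) = -sin(1) and the second should approach g'(a) = 2.

```python
import math

def f(u):
    return math.cos(u)

def g(x):
    return x ** 2

def factored_quotient(a, x):
    """Return the two factors used in the proof (requires g(x) != g(a))."""
    outer = (f(g(x)) - f(g(a))) / (g(x) - g(a))  # tends to f'(g(a))
    inner = (g(x) - g(a)) / (x - a)              # tends to g'(a)
    return outer, inner

a = 1.0  # assumed point at which we differentiate
for step in (1e-1, 1e-3, 1e-5):
    outer, inner = factored_quotient(a, a + step)
    print(step, outer, inner, outer * inner)

# Values the factors and their product should approach.
print(-math.sin(a ** 2), 2 * a, -2 * a * math.sin(a ** 2))
```

As the step shrinks, each row should move closer to the values on the final line.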
Caveat - when g(x) = g(a) close to a
As we noted earlier, the proof requires care in the case where g(x) - g(a) might be zero, to avoid the possibility of multiplying by 0/0. But remember that we are taking the limit as x tends to a, so a situation like this is not a problem:
The red line shows some function g(x). On the LHS, g(a) and g(x) have the same value, but that doesn't matter because x is not close to a. The RHS shows the situation where x is closer to a, and in that case we can see that as x continues to get closer and closer to a (without reaching a), g(x) will never be equal to g(a) because the curve has non-zero slope in that region.
Where we do get a potential problem is if the function g(x) is horizontal (i.e. a constant value), either everywhere or just in some region around a. This is shown in the graph below:
On the LHS, g(a) and g(x) have the same value. But on the RHS, g(a) and g(x) still have the same value, because g is constant in that whole region. So we cannot use the technique of multiplying the top and bottom by g(x) - g(a) in this particular case.
Fortunately, this isn't a serious problem. We have a region where g is constant with some value that we will call C, so:
$$g(x) = C$$
This also means that y (which is f(g(x))) will be constant in that region, because g(x) is constant, so:
$$y(x) = f(g(x)) = f(C)$$
So the derivatives of y and g are 0 in the region:
$$y'(x) = 0, \qquad g'(x) = 0$$
Looking at the chain rule, in this special case, both sides of the equation have zero terms (y' and g'), so they are equal:
$$y'(x) = f'(g(x)) \, g'(x) \quad \Longrightarrow \quad 0 = f'(C) \cdot 0$$
Which proves the chain rule in the special case where g(x) - g(a) is zero over a particular region.
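The special case can be illustrated with a small sketch. The inner function below is a hypothetical example, chosen so that g equals the constant C = 1 whenever |x| ≤ 1; the outer function is assumed to be f = cos, as in the earlier example. Inside the flat region, g(x) - g(a) is exactly zero, so the multiply-and-divide step is unavailable, but the difference quotient of y is also exactly zero.

```python
import math

C = 1.0  # assumed constant value of g in the region around a

def g(x):
    """Hypothetical inner function: equal to C whenever |x| <= 1."""
    return C + max(0.0, abs(x) - 1.0) ** 2

def f(u):
    return math.cos(u)

def y(x):
    return f(g(x))

a = 0.0  # assumed point inside the flat region
for x in (0.5, 0.1, 0.001):
    g_difference = g(x) - g(a)             # zero, so we cannot divide by it
    y_quotient = (y(x) - y(a)) / (x - a)   # difference quotient of y
    print(x, g_difference, y_quotient)

# Right-hand side of the chain rule in the flat region: f'(C) * g'(a).
f_prime_at_C = -math.sin(C)
g_prime_at_a = 0.0  # g is constant near a, so its derivative there is zero
print(f_prime_at_C * g_prime_at_a)
```

Both the difference quotient of y and the product f'(C) g'(a) evaluate to zero, which agrees with the argument above.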
See also
- Slope of a curve
- Differentiation from first principles - x²
- Second derivative and sketching curves
- Differentiation - the product rule
- Differentiation - the quotient rule
- Differentiation - the chain rule
- Differentiation - derivative of an inverse function
- Differentiation from first principles - a to the power x
- Derivative of sine, geometric proof
- Differentiation - L'Hôpital's rule
