Seeing the Chain Rule

Being a believer in the Rule of Four, I have been trying for years to find a good visual (graphical) illustration of why or how the Chain Rule for derivatives works. This very simple example is the best I could come up with.

Consider the function y=\sin \left( x \right),0\le x\le 2\pi . (See figure 1. A tangent segment at \displaystyle \left( {\frac{\pi }{3},\sin \left( {\frac{\pi }{3}} \right)} \right) is drawn.) As you know, this function’s values go smoothly from 0 to 1 to 0 to –1 and back to 0. The slope of its tangent line, its derivative, appears to go from 1 to 0 to –1 to 0 to 1, as you would expect knowing its derivative is \frac{{dy}}{{dx}}=\cos \left( x \right). (See figure 2.)

Consider the function y=\sin \left( {3x} \right),0\le x\le 2\pi . (See figure 3. A tangent line at \displaystyle \left( {\frac{\pi }{9},\sin \left( {\frac{\pi }{3}} \right)} \right) is drawn.) This function takes on all the values of the sine function three times between 0 and 2\pi . It goes through the same values three times as fast and therefore, its rate of change (yeah, the derivative) should be three times as much. Compare the tangent lines in Figures 1 and 3. This agrees with the derivative found by the Chain Rule: \frac{{dy}}{{dx}}=3\cos \left( {3x} \right). (See figure 4.)

Next, consider the function y=\sin \left( {\tfrac{1}{2}x} \right),0\le x\le 2\pi . (See figure 5. A tangent line at \displaystyle \left( {\frac{{2\pi }}{3},\sin \left( {\frac{\pi }{3}} \right)} \right) is drawn.) This time the function is stretched and only goes through half its period. So it goes through the same values half as fast as the original, and the slope is only half as steep as the original. Compare the tangent lines in Figures 1 and 5. Therefore, the rate of change, the derivative, should be only half the original’s. So \frac{{dy}}{{dx}}=\frac{1}{2}\cos \left( {\tfrac{1}{2}x} \right). (See figure 6.)
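If you want to check the slope comparisons numerically, here is a minimal Python sketch (not part of the original post). It estimates each tangent slope with a symmetric difference quotient at the x-values marked in Figures 1, 3, and 5; the helper name `slope` and the step size `h` are illustrative choices.

```python
import math

def slope(f, x, h=1e-6):
    """Estimate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Marked points: x = pi/3 for sin(x), x = pi/9 for sin(3x), x = 2pi/3 for sin(x/2).
# In every case the inside of the sine equals pi/3, so the heights match.
base = slope(math.sin, math.pi / 3)                       # expect cos(pi/3) = 0.5
fast = slope(lambda x: math.sin(3 * x), math.pi / 9)      # expect 3cos(pi/3) = 1.5
slow = slope(lambda x: math.sin(x / 2), 2 * math.pi / 3)  # expect (1/2)cos(pi/3) = 0.25

print(round(base, 4), round(fast, 4), round(slow, 4))  # 0.5 1.5 0.25
print(round(fast / base, 4), round(slow / base, 4))    # 3.0 0.5
```

The ratios 3 and 1/2 are the constants the Chain Rule puts in front of the cosine.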

I hope this helps your students see what’s happening with the Chain Rule, at least a little bit. I’d be happy to hear and share any ideas you have to illustrate the Chain Rule graphically.

There is a movable Desmos graph here to help illustrate all of this.

Here are links to other posts on the Chain Rule:

Foreshadowing the Chain Rule

The Power Rule Implies Chain Rule

The Chain Rule

Derivative Practice – Numbers

Derivative Practice – Graphs

Experimenting with CAS – Chain Rule


The Chain Rule

Most of the functions students face in beginning calculus are compositions of the Elementary Functions. The Chain Rule allows you to differentiate composite functions easily. The posts listed below are ways to introduce and use the Chain Rule.

Experimenting with a CAS – Chain Rule: using a CAS to discover the Chain Rule

Power Rule Implies Chain Rule and Foreshadowing the Chain Rule: two posts exploring the same ideas

The Chain Rule

Revised from 9-19-2017






Good Question 10 – The Cone Problem

Today’s good question is an optimization problem, but its real point is choosing how to do the computation. As such, it relates to MPAC 3a and 3b: “Students can … select appropriate mathematical strategies [and] sequence algebraic/computational processes logically.” The algebra required to solve this question can be quite daunting, unless you get clever. Here’s the question.

A sector of arc length x is removed from a circle of radius 10 cm. The remaining part of the circle is formed into a cone of radius r and height h.

  1. Find the value of x so that the cone has the maximum possible volume.
  2. The sector that was removed is also formed into a cone. Find the value of x that makes this cone have its maximum possible volume. (Hint: This is an easy problem.)
  3. In the context of the problem, the expression for the volume of the cone in part a. has a domain of 0\le x\le 20\pi . Why? Ignore the physical situation and determine the domain of the expression for the volume from a. Graph the function. Discuss.


Part a: As usual, we start by assigning some variables.


Let r be the radius of the base of the cone and let h be its height. The circumference of the base of the cone is 2\pi r=20\pi -x, so r=10-\frac{x}{2\pi }. The slant height of the cone is the radius of the original circle, 10 cm, so h=\sqrt{{{10}^{2}}-{{r}^{2}}}. The volume of the cone is

\displaystyle V=\frac{\pi }{3}{{r}^{2}}h=\frac{\pi }{3}{{r}^{2}}\sqrt{{{10}^{2}}-{{r}^{2}}}=\frac{\pi }{3}{{\left( 10-\frac{x}{2\pi } \right)}^{2}}\sqrt{{{10}^{2}}-{{\left( 10-\frac{x}{2\pi } \right)}^{2}}}

To find the maximum, the next step is to differentiate the volume. The expression on the right above looks way complicated and its derivative will be even worse. Simplifying it is also a lot of trouble, and, in fact, does not make things easier.* Here’s where we can be clever and avoid a lot of algebra. Let’s just work from \displaystyle V=\frac{\pi }{3}{{r}^{2}}\sqrt{{{10}^{2}}-{{r}^{2}}}

To find the maximum, differentiate the volume with respect to x using the Chain Rule.

\displaystyle \frac{dV}{dx}=\frac{dV}{dr}\cdot \frac{dr}{dx}=\frac{\pi }{3}\left( {{r}^{2}}\frac{-2r}{2\sqrt{{{10}^{2}}-{{r}^{2}}}}+2r\sqrt{{{10}^{2}}-{{r}^{2}}} \right)\left( -\frac{1}{2\pi } \right)

Setting this equal to zero and simplifying (multiply by -6\sqrt{{{10}^{2}}-{{r}^{2}}}) gives

-{{r}^{3}}+2r\left( 100-{{r}^{2}} \right)=200r-3{{r}^{3}}=0

\displaystyle r=0,r=\sqrt{\frac{200}{3}}=\frac{10\sqrt{6}}{3}

The value r = 0 obviously gives the minimum (a cone with no volume), so the maximum occurs when \displaystyle r=10-\frac{x}{2\pi }=\frac{10\sqrt{6}}{3}. Then, solving for x gives

\displaystyle x=2\pi \left( 10-\frac{10\sqrt{6}}{3} \right)\approx 11.52986
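As a numerical sanity check, here is a short Python sketch (my own; the names `r_of_x`, `volume_remaining`, and `dV_dx` are illustrative). It codes the volume in terms of r, the Chain Rule derivative from above, and compares the critical point with a brute-force search over the physical domain 0\le x\le 20\pi .

```python
import math

R = 10.0  # radius of the original circle, in cm (also the cone's slant height)

def r_of_x(x):
    """Base radius of the cone made from the remaining piece."""
    return R - x / (2 * math.pi)

def volume_remaining(x):
    r = r_of_x(x)
    return math.pi / 3 * r**2 * math.sqrt(R**2 - r**2)

def dV_dx(x):
    """Chain Rule: dV/dx = (dV/dr)(dr/dx), with dr/dx = -1/(2 pi)."""
    r = r_of_x(x)
    s = math.sqrt(R**2 - r**2)
    dV_dr = math.pi / 3 * (r**2 * (-2 * r) / (2 * s) + 2 * r * s)
    return dV_dr * (-1 / (2 * math.pi))

x_star = 2 * math.pi * (R - R * math.sqrt(6) / 3)

# Brute-force search on a fine grid of the physical domain [0, 20 pi].
n = 200_000
grid = [20 * math.pi * k / n for k in range(n + 1)]
x_grid = max(grid, key=volume_remaining)

print(round(x_star, 5))  # 11.52986
print(abs(dV_dx(x_star)) < 1e-9, abs(x_grid - x_star) < 1e-3)  # True True
```

The grid search never looks at the derivative, so its agreement with x_star is an independent confirmation of the algebra.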

Aside: We often see questions saying, if y = f(u) and u = g(x), find dy/dx. Here we have put that idea to practical use to save doing a longer computation.

Part b: The arc of the piece cut out, x, is the circumference of the base of a cone with a radius of \displaystyle {{r}_{1}}=\frac{x}{2\pi } and a height of \displaystyle {{h}_{1}}=\sqrt{{{10}^{2}}-{{r}_{1}}^{2}}. Its volume is

\displaystyle V=\frac{\pi }{3}{{r}_{1}}^{2}\sqrt{{{10}^{2}}-{{r}_{1}}^{2}}

This is the same expression we used in part a, and it can be handled the same way, except that here \displaystyle \frac{d{{r}_{1}}}{dx}=+\frac{1}{2\pi }. The sign of this factor does not change where the derivative is zero, so the maximum again occurs at {{r}_{1}}=\frac{10\sqrt{6}}{3}, that is, at

\displaystyle x=2\pi \left( \frac{10\sqrt{6}}{3} \right)=\frac{20\pi \sqrt{6}}{3}\approx 51.30199

This should not be a surprise. The piece cut out and the piece that remains are otherwise indistinguishable, so the maximum volume is the same for both. The maximizing arc here, 51.30199, is just 20\pi -11.52986, the length of the arc that remained in part a.
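Here is a brute-force Python check of the claim that the two cones share the same maximum volume. The helper `cone_volume`, written in terms of the arc that becomes the cone's base, and the grid size are my own illustrative choices, not part of the original problem.

```python
import math

R = 10.0  # radius of the original circle, in cm

def cone_volume(arc):
    """Volume of a cone with base circumference `arc` and slant height R."""
    r = arc / (2 * math.pi)
    # max(..., 0.0) guards against a tiny negative radicand from rounding at arc = 20 pi
    return math.pi / 3 * r**2 * math.sqrt(max(R**2 - r**2, 0.0))

n = 100_000
xs = [20 * math.pi * k / n for k in range(n + 1)]  # physical domain [0, 20 pi]

best_remaining = max(cone_volume(20 * math.pi - x) for x in xs)  # cone from part a
best_removed = max(cone_volume(x) for x in xs)                   # cone from part b

print(math.isclose(best_remaining, best_removed, rel_tol=1e-9))  # True
print(round(best_removed, 2))  # 403.07
```

The printed value agrees with \frac{\pi }{3}\cdot \frac{200}{3}\sqrt{100-\frac{200}{3}}=\frac{2000\pi \sqrt{3}}{27}\approx 403.07 cubic centimeters, the volume at r=\frac{10\sqrt{6}}{3}.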

Part c: From part a we have \displaystyle V=\frac{\pi }{3}{{r}^{2}}\sqrt{{{10}^{2}}-{{r}^{2}}}=\frac{\pi }{3}{{\left( 10-\frac{x}{2\pi } \right)}^{2}}\sqrt{{{10}^{2}}-{{\left( 10-\frac{x}{2\pi } \right)}^{2}}}. To graph it, there is no need to simplify the expression in x:


The x-scale marks are at multiples of 5\pi

The domain is determined by the expression under the radical, which must be nonnegative, so

-10\le r\le 10

-10\le 10-\frac{x}{2\pi }\le 10

0\le x\le 40\pi

This is the “natural domain” of the function without regard to the physical situation given in the original problem. I cannot think of a reason for the difference.
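The domain calculation can be checked with a short Python sketch; the helper names `in_natural_domain` and `volume_natural` are hypothetical. It also confirms one feature of the graph: because r enters the formula only as r^2, and r\left( 20\pi +t \right)=-r\left( 20\pi -t \right), the expression takes the same value at 20\pi -t and 20\pi +t, so the graph is symmetric about x=20\pi .

```python
import math

def in_natural_domain(x):
    """True when the radicand 10^2 - r^2 in V is nonnegative."""
    r = 10 - x / (2 * math.pi)
    return 10**2 - r**2 >= 0

def volume_natural(x):
    """V(x) from part a, evaluated with no physical restrictions on x."""
    r = 10 - x / (2 * math.pi)
    return math.pi / 3 * r**2 * math.sqrt(10**2 - r**2)

# Points just outside and inside the interval 0 <= x <= 40 pi.
for x in (-1.0, 1.0, 20 * math.pi, 40 * math.pi - 1, 40 * math.pi + 1):
    print(round(x, 2), in_natural_domain(x))

# Symmetry about x = 20 pi: r changes sign, r^2 does not.
t = 7.0
print(math.isclose(volume_natural(20 * math.pi - t), volume_natural(20 * math.pi + t)))  # True
```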


*Fully simplified in terms of x, the volume is \displaystyle V=\frac{1}{24{{\pi }^{2}}}{{\left( 20\pi -x \right)}^{2}}\sqrt{40\pi x-{{x}^{2}}}. This isn’t really easier to differentiate and solve.
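A quick way to confirm the simplification is to evaluate both forms at a few points. This Python sketch does that; the function names and sample points are arbitrary choices of mine.

```python
import math

def v_in_r(x):
    """Volume as (pi/3) r^2 sqrt(10^2 - r^2) with r = 10 - x/(2 pi)."""
    r = 10 - x / (2 * math.pi)
    return math.pi / 3 * r**2 * math.sqrt(100 - r**2)

def v_simplified(x):
    """The simplified form (20 pi - x)^2 sqrt(40 pi x - x^2) / (24 pi^2)."""
    return (20 * math.pi - x) ** 2 * math.sqrt(40 * math.pi * x - x**2) / (24 * math.pi**2)

sample = (1.0, 11.52986, 30.0, 60.0)
print(all(math.isclose(v_in_r(x), v_simplified(x), rel_tol=1e-10) for x in sample))  # True
```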


Power Rule Implies Chain Rule

Having developed the Product Rule d\left( uv \right)=u{v}'+{u}'v and the Power Rule \frac{d}{dx}{{x}^{n}}=n{{x}^{n-1}} for derivatives in your class, you can explore similar rules for the product of more than two functions, and suddenly the Chain Rule will appear.

For three functions, use the associative property of multiplication with the rule above:

d\left( uvw \right)=d\left( \left( uv \right)w \right)=u\cdot v\cdot dw+w\cdot d(uv)=u\cdot v\cdot dw+w\left( udv+vdu \right)

Expanding, with a slight change in notation:

d\left( uvw \right)=uv{w}'+u{v}'w+u'vw

For four factors there is a similar result:

d\left( uvwz \right)=uvw{z}'+uv{w}'z+u{v}'wz+{u}'vwz

Exercise: Let {{f}_{i}} for i=1,2,3,...,n be functions. Write a general formula for the derivative of the product {{f}_{1}}{{f}_{2}}{{f}_{3}}\cdots {{f}_{n}}, both as above and in sigma notation.


d\left( {{f}_{1}}{{f}_{2}}{{f}_{3}}\cdots {{f}_{n}} \right)={{f}_{1}}{{f}_{2}}{{f}_{3}}\cdots {{{f}'}_{n}}+{{f}_{1}}{{f}_{2}}{{{f}'}_{3}}\cdots {{f}_{n}}+{{f}_{1}}{{{f}'}_{2}}{{f}_{3}}\cdots {{f}_{n}}+\cdots +{{{f}'}_{1}}{{f}_{2}}{{f}_{3}}\cdots {{f}_{n}}

\displaystyle d\left( {{f}_{1}}{{f}_{2}}{{f}_{3}}\cdots {{f}_{n}} \right)=\sum\limits_{i=1}^{n}{\frac{{{f}_{1}}{{f}_{2}}{{f}_{3}}\cdots {{f}_{n}}}{{{f}_{i}}}{{{{f}'}}_{i}}}
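The general formula can be checked numerically. In this Python sketch the four factors are arbitrary illustrative choices, and `product_rule` is a hypothetical helper that builds each term by replacing {{f}_{i}} with {{{f}'}_{i}} instead of dividing by {{f}_{i}}, so it also works where some factor is zero.

```python
import math

def product_value(fs, x):
    """f_1(x) f_2(x) ... f_n(x)."""
    return math.prod(f(x) for f in fs)

def product_rule(fs, dfs, x):
    """Sum over i of the product with f_i replaced by f_i'."""
    total = 0.0
    for i in range(len(fs)):
        term = dfs[i](x)
        for j, f in enumerate(fs):
            if j != i:
                term *= f(x)
        total += term
    return total

# Four illustrative factors and their derivatives.
fs = [math.sin, math.exp, lambda x: x**2 + 1, math.cos]
dfs = [math.cos, math.exp, lambda x: 2 * x, lambda x: -math.sin(x)]

x0, h = 0.7, 1e-6
numeric = (product_value(fs, x0 + h) - product_value(fs, x0 - h)) / (2 * h)
print(abs(numeric - product_rule(fs, dfs, x0)) < 1e-6)  # True
```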

This idea may now be used to see the Chain Rule appear. Students may guess that d{{\left( f \right)}^{4}}=4{{\left( f \right)}^{3}}, but wait, there is more to it.

Write {{\left( f \right)}^{4}}=f\cdot f\cdot f\cdot f\text{ }. Then from above

d{{\left( f \right)}^{4}}=d\left( f\cdot f\cdot f\cdot f\text{ } \right)=f\cdot f\cdot f\cdot {f}'+f\cdot f\cdot {f}'\cdot f+f\cdot {f}'\cdot f\cdot f+{f}'\cdot f\cdot f\cdot f

d{{\left( f \right)}^{4}}=4{{\left( f \right)}^{3}}{f}'\text{ }

Looks just like the Power Rule, but there’s that “extra” {f}'. Now you are ready to explain the Chain Rule in the next class.
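Here is a small numerical illustration of the four-term expansion, sketched in Python with f=\sin chosen only as an example. The four Product Rule terms add up to 4{{f}^{3}}{f}', and dropping the “extra” {f}' gives a visibly wrong answer.

```python
import math

f, df = math.sin, math.cos  # any differentiable f would do; sin is just an example
x0 = 1.2

# The four Product Rule terms: f' sits in a different position each time.
terms = []
for i in range(4):
    term = 1.0
    for j in range(4):
        term *= df(x0) if j == i else f(x0)
    terms.append(term)

chain = 4 * f(x0) ** 3 * df(x0)
h = 1e-6
numeric = (f(x0 + h) ** 4 - f(x0 - h) ** 4) / (2 * h)

print(math.isclose(sum(terms), chain), abs(numeric - chain) < 1e-6)  # True True
print(abs(numeric - 4 * f(x0) ** 3) > 0.1)  # True: forgetting f' gives the wrong value
```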

Foreshadowing the Chain Rule

I assigned another very easy but good problem this week. It was simple enough, but it gave a hint of things to come.

Use the Product Rule to find the derivative of {{\left( f\left( x \right) \right)}^{2}}.

Since we have not yet discussed the Chain Rule, the Product Rule was the only way to go.

\frac{d}{dx}{{\left( f \right)}^{2}}=\frac{d}{dx}\left( f\cdot f \right)=f\cdot {f}'+{f}'\cdot f=2f\cdot f'

And likewise for higher powers:

\frac{d}{dx}{{f}^{3}}=\frac{d}{dx}\left( f\cdot f\cdot f \right)=f\cdot f\cdot {f}'+f\cdot {f}'\cdot f+{f}'\cdot f\cdot f=3{{f}^{2}}{f}'

If you just look at the answer, it is not clear where the {f}' comes from. But the result foreshadows the Chain Rule.

Then we used the new formula to differentiate a few expressions such as {{\left( 4x+7 \right)}^{2}} and {{\sin }^{2}}\left( x \right) and a few others.
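For the two examples, the formula 2f\cdot {f}' gives 2\left( 4x+7 \right)\cdot 4=32x+56 and 2\sin \left( x \right)\cos \left( x \right). A short Python sketch can compare these with expanding the square and with a difference quotient; the helper `two_f_fprime` is a hypothetical name.

```python
import math

def two_f_fprime(f, df):
    """Derivative of f(x)^2 via the Product Rule result 2 f(x) f'(x)."""
    return lambda x: 2 * f(x) * df(x)

d_linear_sq = two_f_fprime(lambda x: 4 * x + 7, lambda x: 4)
d_sin_sq = two_f_fprime(math.sin, math.cos)

# (4x+7)^2 = 16x^2 + 56x + 49, whose derivative is 32x + 56.
print(all(d_linear_sq(x) == 32 * x + 56 for x in range(-5, 6)))  # True

# 2 sin x cos x should match a difference quotient of sin^2 x.
h, x0 = 1e-6, 0.4
numeric = (math.sin(x0 + h) ** 2 - math.sin(x0 - h) ** 2) / (2 * h)
print(abs(numeric - d_sin_sq(x0)) < 1e-6)  # True
```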

Regarding the Chain Rule: I have always been a proponent of the Rule of Four, but I have never seen a good graphical explanation of the Chain Rule. (If someone has one, PLEASE send it to me – I’ll share it.)

Here is a rough verbal explanation that might help a little.

Consider the graph of y=\sin \left( x \right). On the interval [0,2\pi ] it goes through all its values in order once – from 0 to 1 to 0 to -1 and back to zero. Now consider the graph of y=\sin \left( 3x \right). On the interval \left[ 0,\tfrac{2\pi }{3} \right] it goes through all the same values in one-third of the time. Therefore, it must go through them three times as fast. So the rate of change of y=\sin \left( 3x \right) between 0 and \tfrac{2\pi }{3} must be three times the rate of change of y=\sin \left( x \right) at the corresponding point 3x. So the rate of change of y=\sin \left( 3x \right) must be 3\cos \left( 3x \right). Of course this rate of change is the slope and the derivative.

The Chain Rule

Except for the simplest functions, a procedure known as the Chain Rule is very helpful and often necessary to find derivatives. You can start with an example such as finding the derivative of {{\left( 2x+7 \right)}^{2}}. Most students will expand the binomial to get 4{{x}^{2}}+28x+49 and differentiate the result to get 8x+28. They will try the same approach with {{\left( 2x+7 \right)}^{3}}, and then you can hit them with {{\left( 2x+7 \right)}^{53}}. They will see the need for a shortcut at once. What to do?

The explanation runs like this. Let u\left( x \right)={{x}^{53}} and let v\left( x \right)=2x+7. Then our original expression becomes {{\left( 2x+7 \right)}^{53}}=u\left( v\left( x \right) \right), a composition of functions. The Chain Rule is used for differentiating compositions, so students must get good at recognizing them. The differentiation is done from the outside, working inward, in exactly the opposite order from the procedure for evaluating the expression. To evaluate the expression above you (1) evaluate the expression inside the parentheses and then (2) raise that result to the 53rd power. To differentiate you (1) use the Power Rule to differentiate the 53rd power of whatever is inside, which gives 53{{\left( 2x+7 \right)}^{52}}, then (2) differentiate the inside, \left( 2x+7 \right), which gives 2, and multiply the results: 53{{\left( 2x+7 \right)}^{52}}\left( 2 \right)=106{{\left( 2x+7 \right)}^{52}}. Symbolically, this looks like {u}'\left( v\left( x \right) \right){v}'\left( x \right) or {f}'\left( g\left( x \right) \right){g}'\left( x \right). This can be extended to compositions of more than two functions:

\displaystyle \frac{d}{dx}f\left( g\left( h\left( x \right) \right) \right)={f}'\left( g\left( h\left( x \right) \right) \right){g}'\left( h\left( x \right) \right){h}'\left( x \right)
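The students' expand-then-differentiate route and the Chain Rule can be compared exactly with integer arithmetic. This Python sketch (helper names are mine) builds the coefficients of {{\left( 2x+7 \right)}^{n}} with the Binomial Theorem, differentiates term by term, and compares the result with the coefficients of 2n{{\left( 2x+7 \right)}^{n-1}}.

```python
from math import comb

def expand(a, b, n):
    """Coefficients c[k] of (a x + b)^n = sum of c[k] x^k, by the Binomial Theorem."""
    return [comb(n, k) * a**k * b**(n - k) for k in range(n + 1)]

def differentiate(coeffs):
    """Term-by-term derivative of a polynomial given by its coefficient list."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]

n = 53
by_expansion = differentiate(expand(2, 7, n))             # expand first, then differentiate
by_chain_rule = [n * 2 * c for c in expand(2, 7, n - 1)]  # 53 (2x+7)^52 times 2

print(by_expansion == by_chain_rule)   # True
print(differentiate(expand(2, 7, 2)))  # [28, 8], i.e. 8x + 28
```

Python integers have unlimited size, so the 53rd-power coefficients are compared exactly, with no rounding.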

The cartoon below, from Courtney Gibbons’ great collection of math cartoons, may help your kids remember this:

I have been looking for a way to illustrate the Chain Rule graphically, but to no avail. The closest I could come up with is this: Consider f\left( x \right)=\sin \left( 3x \right). This function takes on all the values of y=\sin \left( x \right) in order in one-third the time. (That is, its period is one-third of the period of y=\sin \left( x \right).) Since this is true, it must go through the values three times as fast; thus, its derivative (its rate of change) must be three times the derivative of the sine: {f}'\left( x \right)=3\cos \left( 3x \right).

The students will need some practice using the Chain Rule. I suggest a number of simple ones (single compositions) first, then a few longer ones, and maybe one or two “monsters” just for fun once they get the idea.
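For checking answers to the longer practice problems, one option is forward-mode automatic differentiation: every function in a composition passes along its value together with its derivative, multiplying the incoming derivative by its own. That is the Chain Rule applied one link at a time; it collects the factors from the inside out, but the product is the same. Here is a minimal Python sketch with a hypothetical `Dual` class (not from any library), checked against the three-function formula for the illustrative choice f\left( v \right)=\sin v, g\left( u \right)={{u}^{3}}, h\left( x \right)={{x}^{2}}+1.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """A value paired with its derivative with respect to x."""
    val: float
    der: float

def h(d):  # h(x) = x^2 + 1, h'(x) = 2x
    return Dual(d.val**2 + 1, 2 * d.val * d.der)

def g(d):  # g(u) = u^3, g'(u) = 3u^2
    return Dual(d.val**3, 3 * d.val**2 * d.der)

def f(d):  # f(v) = sin v, f'(v) = cos v
    return Dual(math.sin(d.val), math.cos(d.val) * d.der)

x0 = 0.5
result = f(g(h(Dual(x0, 1.0))))  # seed with dx/dx = 1

# The formula f'(g(h(x))) g'(h(x)) h'(x), written out by hand:
hx = x0**2 + 1
by_formula = math.cos(hx**3) * (3 * hx**2) * (2 * x0)

print(math.isclose(result.der, by_formula))  # True
```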

The Chain Rule doesn’t end with just being able to differentiate complicated expressions; it will also form the basis for implicit differentiation, finding the derivative of a function’s inverse, and Related Rates problems, among other things.

Finally, here is a way to develop the Chain Rule that is probably different from, and a little more intuitive than, what you will find in your textbook. (After a suggestion by Paul Zorn on the AP Calculus EDG, October 14, 2002.)

Let f be a function differentiable at x=a, and let g be a function that is differentiable at x=b and such that g\left( b \right)=a. Then, for x near b, g\left( x \right) is near a, so we can use the local linear approximation of f to find the derivative of f\left( g\left( x \right) \right) at x=b:

f\left( x \right)\approx f\left( a \right)+{f}'\left( a \right)\left( x-a \right)

f\left( g\left( x \right) \right)\approx f\left( a \right)+{f}'\left( a \right)\left( g\left( x \right)-a \right)=f\left( a \right)+{f}'\left( a \right)g\left( x \right)-a {f}'\left( a \right)

\displaystyle \frac{d}{dx}f\left( g\left( x \right) \right)=0+{f}'\left( a \right){g}'\left( x \right)-0

\displaystyle {{\left. \frac{d}{dx}f\left( g\left( x \right) \right) \right|}_{x=b}}={f}'\left( g\left( b \right) \right){g}'\left( b \right)
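The local linearity argument can be watched numerically. In this Python sketch, f=\exp , g\left( x \right)={{x}^{2}} and b=1 are chosen only for illustration (so a=g\left( b \right)=1 and the target is {f}'\left( g\left( b \right) \right){g}'\left( b \right)=2e). The difference quotients of f\left( g\left( x \right) \right) at x=b approach the target as h shrinks.

```python
import math

f, df = math.exp, math.exp                # f and f'
g, dg = lambda x: x * x, lambda x: 2 * x  # g and g'
b = 1.0
a = g(b)

target = df(a) * dg(b)  # f'(g(b)) g'(b) = 2e

errors = []
for h in (0.1, 0.01, 0.001, 0.0001):
    quotient = (f(g(b + h)) - f(g(b))) / h
    errors.append(abs(quotient - target))
    print(f"h = {h:<7} error = {errors[-1]:.6f}")
```

The errors shrink roughly in proportion to h, as expected for a one-sided difference quotient.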