The Derivative, Part 7

So let’s recap: we have a rule to find derivatives of basic functions using a table, a rule to handle a function that is multiplied by a constant, a rule to handle the addition or subtraction of two (or more) functions, a rule to handle the multiplication of two (or more) functions, and a rule to handle the division of two functions. I also did an example where several of these rules are used together to find the derivative of a single function. You would think that this would exhaust all the possibilities and that you can now differentiate any function in the known universe. But alas, there is one more, perhaps the most powerful, rule yet to be presented.

This new rule is called the chain rule because it allows you to find the derivative of a function of a function of a function, and so on.

Now there is a textbook way to present this rule and an intuitive way which I like to use. I find that the textbook approach can be confusing because there are several variables to keep track of. I will present both ways so that you may see the connection between the two and have a better understanding of the chain rule.

The textbook approach to the chain rule is a bit easier to see if we forego functional notation and go back to using y. However, whenever you have a function of a function f[g(x)], the chain rule is to be used. Functions like this are called composite functions. For example,

\[ f( x) =\text{sin}\left( x^{2}\right)\]

Here the inner function is g(x) = x² and the outer function is the sine, so the composite function is sin[g(x)] = sin(x²).

In the textbook approach you let u be the inner function (that is the function you are using as the argument for the outer function) and you let y be the function after you replace the inner function with u. I will give an explanation later on how to identify the inner and outer functions if that is not clear.

So in this case, u = x², so y = sin(u). The textbook chain rule is

\[\frac{dy}{dx} =\frac{dy}{du} \times \frac{du}{dx}\]

This may look scary but let me repeat this rule in English: the derivative of a composite function is the derivative of y with respect to u times the derivative of u with respect to x. So in our example, dy/du = cos(u) (using the table) and du/dx = 2x. Multiplying these together and replacing u with its definition, we get

\[\frac{dy}{dx} =\text{cos}( u)\times 2x\ =\ 2x\ \text{cos}( x^{2})\]

So to further explain what inner and outer functions are, suppose you wanted to take our example function and calculate its value for a certain number for x. The first thing you would do is take that number and square it. The squaring function is the inner function since it is the first thing you would do. Then you would take the sine of that squared number. The sine function then is the outer function as that is the last operation you would do.

So I explain the chain rule as follows: Take the derivative of the outer function of ‘something’ keeping the ‘something’ intact, but since the ‘something’ is not just ‘x’ you need to multiply the result by the derivative of that ‘something’.

In this example, the ‘something’ is x². So the derivative of the sine of that ‘something’ is cos(x²), but I then need to multiply this by the derivative of the ‘something’. The derivative of x² is 2x, so the result is 2x cos(x²).
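As a quick sanity check, here is a minimal Python sketch that compares the chain rule answer 2x cos(x²) with a numerical estimate of the gradient. The helper name central_difference and the step size h are my own choices for illustration, and the numerical value is only an approximation.

```python
import math

def f(x):
    # Composite function from the example: sine (outer) of x squared (inner).
    return math.sin(x ** 2)

def f_prime(x):
    # Chain rule result from the text: 2x cos(x^2).
    return 2 * x * math.cos(x ** 2)

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient: a numerical estimate of the gradient.
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (0.5, 1.0, 2.0):
    print(x, f_prime(x), central_difference(f, x))
```

The two printed columns should agree to several decimal places at each x.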

So now let’s reverse the roles of the inner and outer functions. Consider the derivative of [sin(x)]². A very common shortcut notation for the square of a trig function like this is [sin(x)]² = sin²(x). Again, imagine actually calculating this for a particular value of x. You would first take the sine of that number (the inner function) then square the result (the outer function). We know that the derivative of the square of ‘something’ is 2 times that ‘something’ to the first power which in this case is 2 sin(x). But to compensate for this simplification, we need to multiply the result by the derivative of that ‘something’. In this case, the derivative of sin(x) is cos(x), so the final answer is 2 sin(x)cos(x).
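For comparison, the same calculation can be written in the textbook notation, letting u stand for the inner function sin(x):

\[ u=\text{sin}( x) ,\ \ \ y=u^{2} ,\ \ \ \frac{dy}{dx} =\frac{dy}{du} \times \frac{du}{dx} =2u\times \text{cos}( x) =2\text{sin}( x)\text{cos}( x)\]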

Now to get comfortable with this, we need to do some more examples. I will do that in my next post.

The Derivative, Part 6

Last time I presented the multiplication rule of differentiation to be used when given a function that is the multiplication of two or more other functions. As you would guess, there is also a rule that handles the division of two functions.

Let’s say you have the function

\[ f( x) =\frac{x^{2}}{\text{sin}( x)}\]

This one can be solved with the multiplication rule if you remember that 1/sin(x) = csc(x). But as I haven’t told you what the derivative of csc(x) is, we are stuck using the following division rule. But this highlights the point that as we get deeper into maths, there are often several ways to solve a problem. The maths “arteest” is one that solves a problem elegantly.

So the following rule is the division rule. Again, I will use u(x) and v(x) to split the function up into its parts. If you have a function of the form

\[f( x) =\frac{u( x)}{v( x)}\]

then the derivative of f(x) is

\[f'( x) =\frac{u'( x) v( x) -u( x) v'( x)}{[ v( x)]^{2}}\]

As you can see, this rule is a bit more complex which is why you would use a simpler rule if possible. But it is still relatively easy to use if you keep track of which part is u and which part is v.

Using the example function above,

\[\begin{array}{{>{\displaystyle}l}} u( x) =x^{2} ,\ \ \ \ \ v( x) =\text{sin}( x)\\ u'( x) =2x,\ \ \ \ v'( x) =\text{cos}( x) \end{array}\]

So according to the division rule,

\[f'( x) =\frac{2x\text{sin}( x) -x^{2}\text{cos}( x)}{\text{sin}^{2}( x)}\]
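If you want to convince yourself of the order of the terms in the numerator, here is a small Python sketch. It assumes the division rule in the form (u'v − uv')/v² and compares it with a numerical estimate of the gradient of x²/sin(x). The function names quotient_rule and central_difference are hypothetical helpers of mine, not library functions.

```python
import math

def quotient_rule(u, du, v, dv, x):
    # (u'(x) v(x) - u(x) v'(x)) / v(x)^2
    return (du(x) * v(x) - u(x) * dv(x)) / v(x) ** 2

def f(x):
    return x ** 2 / math.sin(x)

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient: a numerical estimate of the gradient.
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.2  # any point where sin(x) is not zero
by_rule = quotient_rule(lambda t: t ** 2, lambda t: 2 * t, math.sin, math.cos, x)
print(by_rule, central_difference(f, x))
```

Swapping the two terms in the numerator flips the sign of the result, which the numerical estimate would expose straight away.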

Now you can use many rules in a single differentiation problem. Consider

\[f( x) =\frac{x^{2} e^{x}}{\text{sin}( x)}\]

Here, the numerator is a multiplication of two functions. So when using the division rule, you need to apply the multiplication rule for the u' part:

\[ \begin{array}{{>{\displaystyle}l}}
u( x) \ =\ x^{2} e^{x} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ v( x) =\text{sin}( x)\\
u'( x) =x^{2} e^{x} +2xe^{x} \ \ \ \ \ v'( x) =\text{cos}( x)
\end{array}\]

I’ll leave it as an exercise for you to see if I correctly found u'(x) using the multiplication rule. I used the fact that e^x (as seen from the table I provided a couple of posts before) is its own derivative. Anyway, using the division rule,

\[f'( x) =\frac{\left( x^{2} e^{x} +2xe^{x}\right)\text{sin}( x) -x^{2} e^{x}\text{cos}( x)}{\text{sin}^{2}( x)}\]

So you might be thinking that you can differentiate any function as long as you know the derivatives of the individual parts. So how would you differentiate

\[ f( x) =\text{sin}\left( x^{2}\right) ?\]

This is not a multiplication of functions, but rather a function of a function. I will introduce the very powerful chain rule as it applies to differentiation in my next post.

The Derivative, Part 5

So last time, I provided a table of derivatives given a function that is of a particular form. Because of rules 3 and 4 (you will need to see my last post to see what these are), along with the other entries in the table, you can now differentiate many functions not specifically in the table. But there are still many functions that you cannot differentiate without other rules. For example, if

\[f( x) =x^{2}\text{sin}( x)\]

there is no table entry to help you. Even though you can differentiate x² and sin(x) separately, there is no rule in the table that allows you to differentiate their multiplication together since they are both functions of x, that is, neither one is just a constant. You can’t use rule 4 here.

There is a differentiation rule that handles this. It is the multiplication rule and it states that if you have a function of the form

\[f( x) =u( x) v( x)\]

then the derivative is

\[f'( x) =u( x) v'( x) +u'( x) v( x)\]

This can be proven using the basic definition of a derivative, but you can just take my word for it.

So in the example at the beginning of this post,

\[\begin{array}{{>{\displaystyle}l}}
u( x) =x^{2} ,\ \ \ \ \ v( x) =\text{sin}( x)\\
u'( x) =2x,\ \ \ \ v'( x) =\text{cos}( x)
\end{array}\]

where I used the table in my last post to find the individual derivatives. So according to the rule,

\[f'( x) =x^{2}\text{cos}( x) +2x\text{sin}( x)\]
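A short Python sketch can be used to check this result numerically. The helper names product_rule and central_difference are my own, chosen for illustration, and the numerical gradient is only an estimate.

```python
import math

def product_rule(u, du, v, dv, x):
    # u(x) v'(x) + u'(x) v(x)
    return u(x) * dv(x) + du(x) * v(x)

def f(x):
    return x ** 2 * math.sin(x)

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient: a numerical estimate of the gradient.
    return (func(x + h) - func(x - h)) / (2 * h)

x = 0.8
by_rule = product_rule(lambda t: t ** 2, lambda t: 2 * t, math.sin, math.cos, x)
print(by_rule, central_difference(f, x))
```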

Now this rule can be extended to handle more than two functions multiplied together. If

\[f( x) =u( x) v( x) w( x)\]

then you can use the original rule twice, or

\[ f'( x) =u( x) v( x) w'( x) +u( x) v'( x) w( x) +u'( x) v( x) w( x)\]

I think you can see the pattern here. So if

\[\begin{array}{{>{\displaystyle}l}}
f( x) =x^{2}\text{sin}( x)\text{cos}( x)\\
u( x) =x^{2} ,\ \ v( x) =\text{sin}( x) ,\ \ \ w( x) =\text{cos}( x)\\
u'( x) =2x,\ \ v'( x) =\text{cos}( x) ,\ \ w'( x) =-\text{sin}( x)
\end{array}\]

then the derivative is

\[f'( x) =-x^{2}\text{sin}^{2}( x) +x^{2}\text{cos}^{2}( x) +2x\text{sin}( x)\text{cos}( x)\]

Now this can be simplified using trig identities but I will leave it here.
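For the curious, one possible simplification uses the double-angle identities cos²(x) − sin²(x) = cos(2x) and 2 sin(x)cos(x) = sin(2x):

\[f'( x) =x^{2}\left[\text{cos}^{2}( x) -\text{sin}^{2}( x)\right] +2x\text{sin}( x)\text{cos}( x) =x^{2}\text{cos}( 2x) +x\text{sin}( 2x)\]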

What about a function that’s a division of two functions? Yes, there is a rule for that as well, but I’ll cover that in my next post.

The Derivative, Part 4

Last time I provided some general rules for finding the derivatives for functions of different forms. Let me summarise these and provide some new ones as well. The new ones can be developed using the basic definition of the derivative. The letters a and n are constants, not the function variable:

\[\begin{array}{c|l|l}
\text{Rule} & f( x) & f'( x)\\
\hline
1 & a & 0\\
2 & x^{n} & nx^{n-1}\\
3 & g( x) \pm h( x) & g'( x) \pm h'( x)\\
4 & ag( x) & ag'( x)\\
5 & e^{ax} & ae^{ax}\\
6 & \text{sin}( nx) & n\text{cos}( nx)\\
7 & \text{cos}( nx) & -n\text{sin}( nx)\\
8 & \text{tan}( nx) & n\text{sec}^{2}( nx)
\end{array}\]

These rules can be used for more than the explicit function forms included, especially using rules 3 and 4. For example if

\[f( x) =3x^{3} -2x^{2} +5x-7\]

then by using rules 2, 3, and 4 you can find the derivative as

\[f'( x) =9x^{2} -4x +5\]
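Because rules 2, 3, and 4 are applied term by term, differentiating a polynomial is mechanical enough to sketch in a few lines of Python. The function name differentiate_polynomial and the choice to store coefficients lowest power first are assumptions I made for this illustration.

```python
def differentiate_polynomial(coeffs):
    """coeffs[k] is the coefficient of x**k; returns the derivative's coefficients."""
    # Rule 2 turns a*x**k into a*k*x**(k-1); rules 3 and 4 let us work term by term.
    # The constant term (k = 0) differentiates to zero, so it is dropped.
    return [k * a for k, a in enumerate(coeffs)][1:]

# f(x) = 3x^3 - 2x^2 + 5x - 7, written lowest power first
print(differentiate_polynomial([-7, 5, -2, 3]))
```

The returned list [5, -4, 9] reads as 5 − 4x + 9x², which is the same derivative as above.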

Now let’s look at a more complex function:

\[f( x) =3\text{sin}( 2x) -2\text{cos}( 3x) -0.25x^{2}\]

where x is in radians. We can use rules 2, 3, 4, 6, and 7 and take the derivative of each term to get

\[f'( x) =6\text{cos}( 2x) +6\text{sin}( 3x) -0.5x\]

Now let’s look at a common use for derivatives. It is often needed to find the maximum and minimum of a function. Let’s look at the function f(x) = 3x³-10x²+9x:

We would like to know where (the x value) the peak (local maximum) and the local minimum occur and what the values of the function are at those points. As you have seen before, the gradient of the tangent line at each of these points is zero. Since the derivative of a function gives us the gradient, we can find the derivative and find the values of x that make it zero. Using our rules for derivatives, f'(x) = 9x²-20x+9. So we want to find the solutions to

f‘(x) = 9x²-20x+9 = 0

Using the quadratic formula, the two solutions are x = 0.627 and 1.595. We can evaluate the original function at these values of x to get the two points (0.627, 2.451) as the local maximum and (1.595, 1.088) as the local minimum.
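If you would like to reproduce these numbers, here is a minimal Python sketch that applies the quadratic formula to f'(x) = 9x² − 20x + 9 and then evaluates the original function at each solution. The variable names are my own.

```python
import math

def f(x):
    return 3 * x ** 3 - 10 * x ** 2 + 9 * x

# Coefficients of f'(x) = 9x^2 - 20x + 9
a, b, c = 9.0, -20.0, 9.0
disc = b * b - 4 * a * c  # discriminant of the quadratic
roots = sorted([(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)])

for x in roots:
    # Each root is an x value where the tangent line is horizontal.
    print(round(x, 3), round(f(x), 3))
```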

A practical use of this is to find the maximum height a ball achieves that is thrown up into the air. Using physics to come up with the equations of motion of the ball, one can find the answer.
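As a sketch of how that works, assume the ball is thrown straight up from ground level with initial speed v₀, that air resistance is ignored, and that g is the constant acceleration due to gravity (these are my simplifying assumptions). The height at time t, its derivative, and the resulting maximum are then

\[h( t) =v_{0} t-\frac{1}{2} gt^{2} ,\ \ \ \ h'( t) =v_{0} -gt=0\ \Rightarrow \ t=\frac{v_{0}}{g} ,\ \ \ \ h_{\text{max}} =\frac{v_{0}^{2}}{2g}\]

For example, with v₀ = 10 m/sec and g = 9.8 m/sec², the ball peaks after about 1.02 seconds at a height of about 5.1 meters.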

Even though I have shown that we can now differentiate a plethora of functions, there are still some functional forms that we cannot differentiate using the rules presented so far. I will cover some new rules in my next post.

The Derivative, Part 3

Now that we have some confidence that the derivative definition gives correct results of functions that we know the answer to, let’s look at a functional form where the answer is not known.

Consider f(x) = x². As you know, this function plots as the standard parabola. The slope of a tangent line on this curve (its rate of change) is not constant, unlike the cases we have looked at before, but it depends on where we are on the curve:

So we again start with the basic definition of the derivative:

\[f'( x) =\lim _{h\rightarrow 0}\frac{f( x+h) -f( x)}{h} = \lim _{h\rightarrow 0}\frac{( x+h)^{2} -x^{2}}{h} =\] \[ \lim _{h\rightarrow 0}\frac{x^{2} +2xh+h^{2} -x^{2}}{h} =\lim _{h\rightarrow 0}\frac{h( 2x+h)}{h} =\lim _{h\rightarrow 0}( 2x+h) =2x\]

So again, we do some algebraic manipulation that gets rid of the h in the denominator. Remember, as we are taking the limit as h approaches 0, the x is essentially treated as a constant. So the final answer is f'(x) = 2x. Referring back to the graph, this satisfies the tangent line slopes at -1 and 1: f'(-1) = -2, f'(1) = 2. At any other point on the graph, just evaluate f'(x) = 2x to find the rate of change of f(x) = x² at a particular x.

Now do you have to evaluate the definition for every different function you come across? Thankfully, the answer is no. Mathematicians long ago did the hard work for you, and because of the properties of limits, many general rules can be made. For example, if you know the derivative of a function, but what you have is the same function multiplied by a constant, the derivative of this new function is just the same constant times the derivative of the old function. For example, we now know that for f(x) = x², f'(x) = 2x. But what about g(x) = 3x²? Well, g'(x) will just be 3 times the derivative of x², so g'(x) = 6x.

So the rule is, if g(x) = af(x) where a is a constant number, then g‘(x) = af‘(x). Another generic rule is that the derivative of a sum of functions is the sum of the individual derivatives: If h(x) = f(x) + g(x), then h‘(x) = f‘(x) + g‘(x).
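As a small worked example that uses both rules together with the derivatives found so far (the function p(x) is just one I made up for illustration):

\[p( x) =3x^{2} +5x+7\ \ \Rightarrow \ \ p'( x) =3( 2x) +5=6x+5\]

The first term uses the constant rule with the derivative of x², and the remaining 5x + 7 is a straight line with gradient 5.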

It turns out that if

\[f( x) =ax^{n}\]

where n is any real number, then

\[f'( x) =anx^{n-1}\]

So to find the derivative in this case, you just multiply the function by n and reduce the value of the exponent by 1.
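Two quick illustrations, with functions chosen only as examples; the second shows that the exponent does not have to be a whole number:

\[f( x) =5x^{3} \ \Rightarrow \ f'( x) =15x^{2} ,\ \ \ \ \ \ f( x) =4\sqrt{x} =4x^{1/2} \ \Rightarrow \ f'( x) =2x^{-1/2} =\frac{2}{\sqrt{x}}\]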

Next time, I will present a table of common derivatives and do some sample problems.

The Derivative, Part 2

I ended my last post with the rather daunting definition of the derivative:

\[f'( x) \ =\lim _{h\rightarrow 0} \ \frac{f( x+h) -f( x)}{h}\]

I will now show how this definition can be used to find much simpler ways to calculate a derivative.

Let’s start with an example that we already know the answer to and is the simplest function we can think of, f(x) = c where c is some constant. You know that if the function does not change anywhere over the values of x, its rate of change (derivative) is zero. You would see this if you plotted the function: it’s a horizontal line and a horizontal line has a gradient of zero. So f'(x) = 0. Let’s see if the derivative definition gives us the same answer.

\[f'( x) \ =\lim _{h\rightarrow 0} \ \frac{f( x+h) -f( x)}{h} =\lim _{h\rightarrow 0} \ \frac{c-c}{h} \rightarrow \frac{0}{0}\]

Well that didn’t help much – we just got an indeterminate form 0/0. But as I said in my last post, there will always be some algebraic manipulation required to remove the problem.

One common method is to multiply the numerator and denominator by the same fraction. This does not change the value of the expression but if you use the right fraction, it removes the issue. In this case, multiply top and bottom by 1/h.

\[f'( x) \ =\lim _{h\rightarrow 0} \ \frac{\frac{1}{h}( c-c)}{\frac{1}{h}( h)} =\lim _{h\rightarrow 0} \ \frac{\frac{1}{h}( 0)}{1} =\lim _{h\rightarrow 0} \ \frac{0}{1} =0\]

Notice that by doing that, we got rid of h and we are left with 0/1 which is definitely 0. So the first rule of finding a derivative: if f(x) = c, then f‘(x) = 0.

Now let’s look at a more complex function, but again, one you know the answer to. The generic equation of a line is f(x) = mx + c where m and c are specific numbers: f(x) = 3x + 7 is an example. Again, from your studies of linear equations, you know this kind of function will plot as a straight line with a gradient of m. So we know that if f(x) = mx + c, then f‘(x) = m. Does our definition give the same result?

\[f'( x) \ =\lim _{h\rightarrow 0} \ \frac{m( x+h) +c-( mx+c)}{h} =\lim _{h\rightarrow 0} \ \frac{mx+mh+c-mx-c}{h} \ \] \[=\lim _{h\rightarrow 0} \ \frac{mh}{h} =\lim _{h\rightarrow 0} \ m=m\]

So when we enter the particular function into the definition, expand it, get rid of the terms that cancel, and then cancel the common factor h, we again get rid of the dependency on h. We are left with the limit of a constant m as h approaches 0. But as m does not care what h does, the answer is just m, which is just what we expected. So we now know if f(x) = mx + c, then f'(x) = m.
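You can see the same thing numerically. The Python sketch below evaluates the fraction inside the limit for the example line f(x) = 3x + 7 at several values of h. The function names difference_quotient and line, and the starting point x = 2, are my own choices.

```python
def difference_quotient(func, x, h):
    # The fraction inside the limit: [f(x + h) - f(x)] / h
    return (func(x + h) - func(x)) / h

def line(x):
    # Example line from the text: m = 3, c = 7
    return 3 * x + 7

for h in (1.0, 0.5, 0.25, 0.125):
    print(h, difference_quotient(line, 2.0, h))
```

Every row gives the gradient 3 no matter how large or small h is, which is why taking the limit changes nothing for a straight line.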

Next time, I will do the same thing but use functions for which we don’t know the answer.

The Derivative, Part 1

In my last post, I showed that any function that plots as a straight line (a linear function) has a constant rate of change and that value is the gradient of the line. However, for a nonlinear function, its rate of change depends on the value of x, that is, where you are on its graph. I also said that a function’s rate of change is called the derivative of the function and that is what I will call it from now on.

Graphically, the derivative at a particular x value is the gradient of the tangent line at that point:

We would like to find an easy way to mathematically find this value as opposed to graphing the function and estimating the tangent line’s gradient at the desired points. Clearly, as seen above, the derivative is another function of x as its value changes depending on what x is. There are several ways to denote the derivative, but we will start with f’(x) (read as “f prime of x”). We would like to find what f’(x) is given a function f(x).

I know that the following derivation of the derivative may look complex and begs the question about how easy it will be to find the derivative, but following this will help solidify your understanding of what a derivative is and the final result will be used many times to find the easy results for various function forms.

We begin by taking the graph of a function, drawing a secant line (a line that connects two points on the graph), and calculating the gradient of that line:

We want to know the gradient of the estimated tangent line which we are using to approximate the tangent line at x. From your study of linear equations, you know that the gradient of a line can be found from any two points on the line. The two points on our estimated tangent line are (x, f(x)) and (x+h, f(x+h)) where h is a small distance away from x. Using these two points, we find the gradient by calculating the rise from the first point to the second point divided by the run between the two points. The rise is the difference between the y coordinates and the run is the difference between the x coordinates (h):

\[\text{gradient} \ =\ \frac{f( x+h) -f( x)}{h}\]

Now what happens as h gets smaller? The estimate should get closer to the actual value we are seeking. The below graphic from IkamusumeFan [CC BY-SA (https://creativecommons.org/licenses/by-sa/3.0)] illustrates this:

So it appears that we are interested in what our estimated gradient approaches as h approaches zero. This is, in fact, the formal definition of a function’s derivative. Remember my post on limits? Using limit notation then, the definition of the derivative is

\[f'( x) \ =\lim _{h\rightarrow 0} \ \frac{f( x+h) -f( x)}{h}\]

Notice that if we just substitute zero for h to evaluate the limit, we get the indeterminate form 0/0 as explained in my prior post. So again you may be saying “this doesn’t make finding a derivative easy at all”. At this point, you are correct. But in my next post, I will show how this definition is used to simplify derivatives.
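To get a feel for what the limit is doing before any algebra is involved, here is a small Python sketch that computes the secant gradient for shrinking values of h. The function f(x) = x² and the point x = 1 are assumptions I picked purely for illustration, and secant_gradient is a hypothetical helper name.

```python
def secant_gradient(func, x, h):
    # Gradient of the line through (x, f(x)) and (x + h, f(x + h)): rise over run.
    return (func(x + h) - func(x)) / h

def f(x):
    # Illustrative nonlinear function (an assumption, not taken from the graph above).
    return x ** 2

for h in (1.0, 0.1, 0.01, 0.001):
    print(h, secant_gradient(f, 1.0, h))
```

The printed gradients settle toward a single value as h shrinks, even though setting h to exactly zero would give 0/0.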

Rate of Change

We are all familiar with many physical rates of change. The rate of change of distance is called velocity. If distance is being measured in meters and time is being measured in seconds, the rate of change of distance (velocity) is measured in units meters per second (m/sec). Water filling a bucket can be measured in liters. If time is measured in minutes, then the rate of change of the amount of water in the bucket has units liters per minute (ltr/min). Or we could measure the height of the water in the bucket in centimeters. The rate of change would be in centimeters per minute (cm/min).

Graphically, the rate of change of a function indicates how fast it increases or decreases as you move along the x-axis. From your study of linear equations, the standard form of an equation of a line is y = mx + c, where m is the gradient or slope of the line. For example, the line y = 2x + 5 has a gradient of 2 which means that it rises 2 units for every unit you move to the right along the x-axis. If this was the equation of the distance of a particle moving from some reference point, where the distance (y-axis) was measured in centimeters and the time (x-axis) was measured in seconds, the velocity of the particle would be 2 cm/sec which is the same as the gradient of the graph. However, the gradient (velocity) is constant since the gradient is 2 anywhere on the graph. All linear graphs have a constant gradient (rate of change). What about non-linear graphs?

Look at the graph of a non-linear function below:

You would say that the function is increasing (positive rate of change) up to about x = -0.6, decreases (negative rate of change) between about -0.6 and 0.6, and increases after 0.6. The rate of change is different depending on where you are on the graph. For many physical problems that have been modelled with an equation, we want to know what the rate of change is at different values of x. A very common problem to solve is to find where the rate of change is zero. The solution to this would find the maximum and/or minimum points because these are the points where the rate of change goes from positive to negative or vice versa.

What does this have to do with calculus? The mathematical term for the rate of change of a function is the derivative of the function and finding the derivative of a function will be the first thing I will define in my next post. Finding the derivative of a function is an operation in calculus, and this is usually the first topic developed in a calculus subject.