Word Problems – 2

I am writing posts of word problems to give students practice and underlying methods to solve these. My first post in this series provides some techniques to use to help you get to an answer. The most important thing is to not panic. Here is an example of a problem that could cause you to panic, but as you will see, calm logical thinking will get you to the answer.

A suspension bridge has been constructed over a river. The suspension cable is parabolic in nature. The distance between the two towers holding the cable is 180 metres and the minimum height of the cable above the road is 15 metres. At a point 40 metres from the vertex of the cable the height above the road is 30 metres.
How high above the road are the cables attached up the tower?

First, let me repeat from my last post, the things you should do to get started:

  1. Read through the entire problem.
  2. Draw a picture(s) if the problem lends itself to this, and label things with known values and use letters to label other things that you feel are useful to know.
  3. Read through the problem again and as you read, write equations that express mathematically what the words are saying. Use letters for unknown things. I find it helps to use letters that easily refer to the unknown thing. For example, use “C” to be an unknown price of a child’s ticket and “A” for the adult ticket. You can also use subscripts to differentiate unknowns. For example, use “tA” for the time to travel on path A and “tB” for the time to travel on path B.
  4. Identify the thing(s) that the problem is asking for, and look over the pictures and equations you have, to come up with a way to solve for the requested unknown(s).

Hopefully, you have already done the first step. Now let’s do step 2:

This picture has all the information provided in the problem.

Looking at step 3, I do not see any equations we can write, but there is a clue in the problem. The shape of the suspension cable is a parabola and this is the shape of a quadratic equation. But before we can write down equations, we need a coordinate system.

Welcome to your first engineering lesson! When engineers are presented with a problem, they frequently must come up with a coordinate system. But they are free to place the system anywhere they want to. You can choose one that makes the equations difficult or choose one that makes the equations as simple as possible. So where do we choose ours? There is no right answer but some answers are better than others.

Here are some possibilities:

I think we can agree that the random coordinate system 1 would be a poor choice. But 2, 3, and 4 look good because the unknown value would just be the y coordinate over an appropriate value of x. Any of these three systems can be used to solve the problem, but if you review the turning point form of a quadratic equation, you should see that system 3 would generate the simplest equation of the parabola.

I will now redraw the figure using coordinate system 3 and interpreting the distances given in the problem as coordinates:

Figure 1

You may notice the two points (-40,30) and (40, 30). As a parabola is symmetric, the statement “At a point 40 metres from the vertex of the cable the height above the road is 30 metres” actually defines two points.

Now the turning point form of a quadratic equation is

y = a(x – h)² + k

where (h,k) are the coordinates of the vertex. So from figure 1, we can immediately write the equation

y = ax² + 15

Notice how simple this equation is because of our choice of coordinate system. We can find the value of a by substituting the point (40,30) into this equation:

30 = a(40)² + 15 ⟹ a = 3/320

So the equation of the suspension cable is

y = 3x²/320 + 15

Now we can find the height of the cable support by letting x = 90:

y = 3(90)²/320 + 15 = 90.9375 m
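If you would like to check this arithmetic on a computer, here is a minimal Python sketch of the same two steps. The function names `solve_a` and `cable_height` are my own, chosen just for illustration, and the code assumes coordinate system 3 (vertex at (0, 15)).

```python
from fractions import Fraction

# In coordinate system 3 the vertex is at (0, 15), so the cable is y = a*x**2 + 15.
VERTEX_HEIGHT = Fraction(15)


def solve_a(x_known, y_known):
    """Solve y_known = a * x_known**2 + 15 for the coefficient a."""
    return (Fraction(y_known) - VERTEX_HEIGHT) / Fraction(x_known) ** 2


def cable_height(a, x):
    """Height of the cable above the road at horizontal position x."""
    return a * Fraction(x) ** 2 + VERTEX_HEIGHT


a = solve_a(40, 30)                 # uses the known point (40, 30)
tower_height = cable_height(a, 90)  # each tower is 90 m from the vertex

print(a)                    # 3/320
print(float(tower_height))  # 90.9375
```

Using exact fractions rather than decimals keeps a = 3/320 in the same form as the hand calculation.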

So you see that this was not too difficult a problem.

Word Problems – 1

A constant complaint from my students is that they struggle with word problems. So I thought I would produce several posts showing examples of how to approach and solve these problems.

Now every word problem is unique and different approaches can apply (hence, why these problems are scary). But there are techniques to help you through this. I have found that, when confronted with a word problem, it helps to do the following:

  1. Read through the entire problem.
  2. Draw a picture(s) if the problem lends itself to this, and label things with known values and use letters to label other things that you feel are useful to know.
  3. Read through the problem again and as you read, write equations that express mathematically what the words are saying. Use letters for unknown things. I find it helps to use letters that easily refer to the unknown thing. For example, use “C” to be an unknown price of a child’s ticket and “A” for the adult ticket. You can also use subscripts to differentiate unknowns. For example, use “tA” for the time to travel on path A and “tB” for the time to travel on path B.
  4. Identify the thing(s) that the problem is asking for, and look over the pictures and equations you have, to come up with a way to solve for the requested unknown(s).

Example

The digits of a 3-digit number add up to 13. The sum of the first two digits is 6. The number formed when the digits are reversed and 1 is subtracted is 3 times the original number. What is the original number?

I will get to this in a minute, but as you progress through maths, you will find that, more and more, you need knowledge from your past. This is an example of that. You may be tempted to use xyz to represent the unknown number, but that would be incorrect. In maths, xyz means x×y×z. If x = 1, y = 2, and z = 3, 123 does not equal 1×2×3 = 6.

From primary school (grade school in the USA), you learned that 123 means 1×100 + 2×10 + 3×1 because 1 is in the hundreds place, 2 is in the tens place, and 3 is in the units place. So let’s get to the problem.

So reading through the problem, you see that we need to find a 3-digit number. This problem does not lend itself to drawing pictures, but we can generate some equations. From the first sentence, we can write:

x + y + z = 13     (1)

From the second sentence, we can write:

x + y = 6     (2)

Now you may notice at this point, that we can use equation (2) in equation (1) as equation (1) has x + y in it and equation (2) says that we can replace this with 6:

x + y + z = 6 + z = 13 ⟹ z = 7

Wow! We haven’t finished going through the problem yet, but we have 1/3 of the problem solved. Now what about the third sentence? The numerical value of the number when the digits are reversed is:

100z + 10y + x

The third sentence says that if we subtract 1 from this, we get 3 times the value of the original number. A key word to look for in word problems is “is”. This means “=” in maths:

100z + 10y + x – 1 = 3(100x + 10y + z)     (3)

Since we now know that z = 7, this can be simplified to:

700 + 10y + x – 1 = 3(100x + 10y + 7)

⟹ 10y + x + 699 = 300x + 30y +21     (4)

Now we can get all the variables on the left side and the numbers on the right side, combine like terms to get:

-299x-20y = -678 ⟹ 299x+20y = 678     (5)

Equations (2) and (5) are a set of simultaneous equations in the variables x and y. You can manually use the substitution or elimination methods or use your CAS calculator to solve for x and y. I will assume that you can do these things to get x = 2 and y = 4. So the number that solves the problem is 247.
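It is always worth checking an answer against the original wording. One way to do that, sketched below in Python, is to try every 3-digit number against the three sentences of the problem. This is a brute-force check, not the algebraic method used above.

```python
# Brute-force check of the digit puzzle: test every 3-digit number
# against the three conditions stated in the problem.
solutions = []
for number in range(100, 1000):
    x = number // 100          # hundreds digit
    y = (number // 10) % 10    # tens digit
    z = number % 10            # units digit
    reversed_number = 100 * z + 10 * y + x
    if x + y + z == 13 and x + y == 6 and reversed_number - 1 == 3 * number:
        solutions.append(number)

print(solutions)  # [247]
```

Only one number survives all three conditions, which agrees with the algebra.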

What is important here is to follow the process. Each word problem is unique but with practice, you will learn to look for key words and phrases that will help you to convert the word problem to a standard textbook problem. I will present more examples in future posts.

Learning to Count Again

Let’s define an event A, flipping heads on the flip of a coin for example. What is the probability of that event occurring? I will use the common notation of P(A) to represent the probability of an event A. The basic way to calculate a probability is to divide the number of ways A can occur by the number of ways anything can occur. That is,

\[\text{P(A)} =\frac{\text{Number of ways A can occur}}{\text{Number of ways anything can occur}}\]

In the case of getting a heads from a flip of a coin, there is only one way to get a heads and there are two possibilities. So the probability is 1/2.

If we flip two coins and we now let our event A be two heads, then the counting of the ways anything can happen takes just a little more thought. The possibilities are: HH, HT, TH, and TT. So there is only one way to get two heads but there are four possibilities. So the probability is 1/4. If the event is “at least one head”, there are three ways that can happen out of the total of four possibilities, so the probability is 3/4.
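For small experiments like this, the counting can be done by listing every outcome. Here is a short Python sketch that does so for two coins. The helper name `probability` is my own, and it assumes every outcome is equally likely.

```python
from fractions import Fraction
from itertools import product

# Every possible result of flipping two coins: HH, HT, TH, TT.
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]


def probability(event):
    """Ways the event can occur divided by the ways anything can occur."""
    favourable = [outcome for outcome in outcomes if event(outcome)]
    return Fraction(len(favourable), len(outcomes))


p_two_heads = probability(lambda outcome: outcome == "HH")
p_at_least_one_head = probability(lambda outcome: "H" in outcome)

print(outcomes)                          # ['HH', 'HT', 'TH', 'TT']
print(p_two_heads, p_at_least_one_head)  # 1/4 3/4
```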

We can increase the complexity of our experiment and flip three coins or ask questions about choosing certain cards in a standard deck. The counting for the numerator and the denominator gets harder but it is still possible with a little more thought. But what if I asked “what is the probability of getting a flush (all cards of the same suit) in 5 cards randomly selected from a deck of cards?” or “A four-digit number (with no repetitions) is to be formed from the set of digits {1, 2, 3, 4, 5, 6}. Find the probability that the number is even.” Now the counting gets much harder. But fortunately, there are ways to handle this.

Selection or Arrangement

Let’s say we have 8 people standing around just waiting for a math problem to show up. Someone wants to form a team of 5 people from these 8 and another person wants to arrange 5 of these people around a table. How many different teams can be made and how many different seatings around the table are possible?

Notice that in the team question, it doesn’t matter what order you pick the players but in the table question, order does matter. Just using letters to represent the people, ABCDE and EDCBA are the same for the team question so would only be counted once but these are two separate arrangements in the table question. The table question is an arrangement question whereas the team question is a selection question. In textbooks, arrangements are often called permutations and selections are called combinations. It doesn’t matter what they are called, you need to know which type you have in order to count the number of possibilities correctly.

Arrangements

Let’s look at the table question first. You have 8 ways to choose the first person, but when you do, there are only 7 people left to choose as the second person. Then you only have 6 left for the third position, 5 left for the fourth and 4 left for the fifth. So the total number of arrangements is 8×7×6×5×4 = 6720. Multiplying numbers that sequentially decrease by one is a common thing when doing these kinds of problems so, as is so common in maths, a shorthand notation was created. 8×7×6×5×4×3×2×1 is represented as 8! and is called “8 factorial”. But for our arrangement problem, we are missing the 3×2×1 part. Notice that 3×2×1 = 3!. So another way to show the solution is 8!/3!. Also notice that 8 – 5 = 3.

We can generalise this. If you have n things and want to arrange them r at a time, then the number of arrangements is n!/(n – r)!. Once again, even this is too much to write for our lazy mathematicians, so this is given a shortcut notation where the P stands for permutation:

\[
^{n} P_{r} =\frac{n!}{( n-r) !}
\]

In a CAS calculator, the function “nPr” is used to calculate permutations, so for our problem nPr(8,5) = 6720.

By the way, what if we wanted to arrange all 8 people? Then using the permutation formula, we get 8!/(8-8)! = 8!/0!. This looks illegal but mathematicians foresaw this and defined 0! as being equal to 1. Must be nice to be able to make your own rules.
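If you do not have a CAS calculator handy, Python’s standard `math` module has the same tools (`math.perm` needs Python 3.8 or later). The sketch below computes the table answer three ways and also shows the 0! = 1 convention at work.

```python
import math

n, r = 8, 5

by_multiplying = 8 * 7 * 6 * 5 * 4
by_factorials = math.factorial(n) // math.factorial(n - r)  # 8!/3!
by_library = math.perm(n, r)                                # nPr(8, 5)

print(by_multiplying, by_factorials, by_library)  # 6720 6720 6720

# Arranging all 8 people relies on 0! being defined as 1.
print(math.factorial(0))  # 1
print(math.perm(8, 8))    # 40320, the same as 8!
```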

Selections

Now let’s look at the team question. Using letters again, we are now in the scenario where ABCDE and EDCBA are the same team and this should only be counted once. So we would expect that the number of selections using the same n and r would be smaller than the number of arrangements. In fact this is true. So for our team selection, if we use our arrangement formula, each team has 5! different arrangements and the calculated number is 5! times too large. So if we divide our arrangement total of 6720 by 5! = 120, we get the correct number of possible teams, 6720/120 = 56. To generalise, if you have n things and want to select them r at a time without regard to order, then the number of selections is n!/(r!(n – r)!). Again, there is shorthand for this where the C stands for combination:

\[^{n} C_{r} =\frac{^{n} P_{r}}{r!} =\frac{n!}{r!( n-r) !}\]

In a CAS calculator, the function “nCr” is used to calculate combinations, so for our problem nCr(8,5) = 56.
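To see the “divide by 5!” idea in action, the Python sketch below actually lists every team and every seating for 8 people labelled A to H (the labels are just for illustration) and compares the counts with `math.comb`.

```python
import math
from itertools import combinations, permutations

people = "ABCDEFGH"  # 8 people, labelled with letters

teams = list(combinations(people, 5))                     # order ignored
seatings_count = sum(1 for _ in permutations(people, 5))  # order matters

print(len(teams), math.comb(8, 5))           # 56 56
print(seatings_count)                        # 6720
print(seatings_count // math.factorial(5))   # 56
```

Each team of 5 shows up 5! = 120 times among the seatings, which is exactly why the division works.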

Poker

So at the beginning of this post, I posed a hypothetical question about the probability of being dealt a flush hand of 5 cards out of a deck of 52 cards. As card order does not matter, this is a selection (combination) problem. We need to find the number of ways to get 5 card flush hands and the total number of possible 5 card hands. I assume you are familiar with the standard deck of cards that consists of 4 suits of 13 cards each.

First let’s find the total number of 5 card hands there are. Here n = 52 and r = 5:

\[^{52} C_{5} =\frac{52!}{5!( 52-5) !} =2,598,960\]

This will be our denominator to use in the probability formula. For the numerator, we need to find how many ways you can get a 5 card flush. You could get 5 clubs, 5 hearts, 5 diamonds, or 5 spades. Each one of these is a selection of 13 cards, 5 at a time or:

\[^{13} C_{5} =\frac{13!}{5!( 13-5) !} =1,287\]

We need to multiply this by 4 since there are 4 suits, 1287×4 = 5148. So the probability of being dealt a flush in 5 card poker is P(Flush) = 5148/2598960 ≈ 0.00198 = 0.198% or about once in 505 hands. I wouldn’t bet on it.
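The whole flush calculation fits in a few lines of Python. This is only a sketch of the counting above. Like the hand calculation, it counts every hand with five cards of one suit.

```python
import math
from fractions import Fraction

total_hands = math.comb(52, 5)      # every possible 5 card hand
flush_hands = 4 * math.comb(13, 5)  # 5 cards from one suit, times 4 suits

p_flush = Fraction(flush_hands, total_hands)

print(total_hands, flush_hands)   # 2598960 5148
print(round(float(p_flush), 5))   # 0.00198
print(round(1 / p_flush))         # 505
```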

An Equation Transformation Example

One of the many skills maths students develop in high school is the ability to change the position of a graph. I sometimes need to remind my students that this is not just “busy” work. It is a skill used frequently in technical fields such as science and engineering. What follows is an example from celestial mechanics.

The Two-Body Problem

Calculating orbits and their characteristics is usually a computer-intensive exercise. For example, how far from the earth is an orbiting satellite at a particular time? However, a good first approximation of this is to pretend we are in the unrealistic universe where only two point masses exist. In this universe, the shape of orbits can be perfectly modelled with equations called conic sections. Why they are called conic sections is a bit beyond the scope of this post, but please feel free to look that up.

The orbital shape I want to describe here is the ellipse. The equation of an ellipse which is centered at the origin is:\[\frac{x^{2}}{a^{2}} +\frac{y^{2}}{b^{2}} =1\] where 2a is the length of the ellipse in the x direction and 2b is the length of the ellipse in the y direction. If a > b, the line along the a intercepts is called the major axis of the ellipse and the line along the b intercepts is called the minor axis:

\[\frac{x^{2}}{a^{2}} +\frac{y^{2}}{b^{2}} =1\]

This is a possible shape of an orbit about one of the masses in our perfect two-mass universe. In reality, orbits of, say, satellites around the earth approximate this shape, but other factors, like other objects in the solar system and the earth not being a point mass and having an uneven distribution of its mass, make the orbital shape slightly different from an ellipse. Not only that, in our perfect universe, the orbit path is exactly contained in a plane and can be completely drawn on flat paper. In reality, orbits also go below and above the average plane of the orbit.

Now one of the masses, say m₂, is on the ellipse. The other mass, m₁, is not at the center of the ellipse, the origin in the above figure. So where is it?

An ellipse has two special points associated with it called focal points. These points have the property that, for any point on the ellipse, the sum of its distances to the two focal points is a constant. This constant is 2a if the longest dimension of the ellipse is along the x-axis. Otherwise, this constant length is 2b.

Total length of the red line = 2a regardless of where (x,y) is.

The other mass, m₁, is at one of these focal points. The coordinates of these points can be found using Pythagoras’ theorem. Looking at the blue triangle above, you can see that the distance c from the center to each focal point is \[c=\sqrt{a^{2} -b^{2}}\]

An Example

Suppose we want to analyze the relationship between two masses that are in orbit. In orbital mechanics, the shape of an orbit is given by its eccentricity, e. An elliptical orbit has an eccentricity between 0 and 1. By the way, a perfectly circular orbit has an eccentricity of 0. So let’s say we have an orbit with the major axis length 2a = 10 and minor axis length 2b = 6. This is an orbit with an eccentricity of 0.8. If we want to know the position of m₂ with respect to m₁, we need a coordinate system. A convenient one would be a Cartesian system aligning the major axis of the orbit along the x-axis and centering the ellipse at the origin. This way we can immediately write down the equation of the orbit:\[\frac{x^{2}}{25} +\frac{y^{2}}{9} =1\]

as a = 5 and b = 3. This means that the focal points are \[c=\sqrt{25 -9} =4\]from the center:

An example orbit
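Here is a small Python sketch that checks the numbers for this orbit. One assumption to flag: I have not given a formula for eccentricity in this post, so the code uses the standard definition e = c/a. The function name `distance_sum` is my own.

```python
import math

a, b = 5.0, 3.0             # half of the major and minor axis lengths
c = math.sqrt(a**2 - b**2)  # distance from the center to each focal point
e = c / a                   # eccentricity (standard definition, assumed here)

print(c, e)  # 4.0 0.8


def distance_sum(x, y):
    """Total distance from the point (x, y) to the two focal points (c, 0) and (-c, 0)."""
    return math.hypot(x - c, y) + math.hypot(x + c, y)


# Two points on the ellipse: the top of the minor axis and the point (3, 2.4).
print(distance_sum(0.0, 3.0))  # 10.0
print(math.isclose(distance_sum(3.0, 2.4), 2 * a))  # True
```

Both points give a total distance of 2a = 10, as the focal point property says they should.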

Two of the more basic pieces of information we would like to know (especially if this is a satellite orbiting the earth) are the distance r between m₁ and m₂ and the direction of m₂ with respect to m₁. We can define the direction as the angle θ of the line connecting the two masses with the positive x-axis. This is actually the reference used in much of celestial mechanics. This angle is called the true anomaly.

Well this is nice, but since we want to find relationships of m₂ with respect to m₁, it would be even more convenient to put the origin of our coordinate system at m₁. How do we do that?

Transformations to the Rescue

Somewhere in your year 10 or 11 maths (grades 10 or 11 in the States), you took the equation of a standard parabola, y = x², and replaced the x with x – h to get y = (x – h)². This moved the parabola h units to the left or right depending on the sign of h.

Effect of replacing x with (x – h)

It turns out that in any relationship between x and y, replacing the x with x – h has the exact same effect on its graph. So in our example, if we move the ellipse 4 units to the left (that is, h = –4, so x is replaced with x + 4), m₁ would be on the origin of our coordinate system.

\[\frac{(x+4)^{2}}{25} +\frac{y^{2}}{9} =1\]

We can now calculate r and θ a bit more easily than before the transformation. Let’s look at where m₂ is in the above figure. Here, m₂ is at (-4,3). The following process would work with any point, but at this point, the numbers are “nicer”.

The figure below shows the answers. The distance r can be found using Pythagoras’ theorem but once you see that the two sides are 3 and 4, then this is the standard 3-4-5 right triangle:

The solution
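The same calculation can be sketched in Python for any point on the shifted ellipse. The helper name `on_shifted_ellipse` is my own, and I use `math.atan2` for the angle because it picks the correct quadrant when x is negative.

```python
import math


def on_shifted_ellipse(x, y):
    """Check that (x, y) satisfies (x + 4)^2/25 + y^2/9 = 1."""
    return math.isclose((x + 4) ** 2 / 25 + y**2 / 9, 1.0)


x, y = -4.0, 3.0  # position of m2, with m1 at the origin

r = math.hypot(x, y)                          # distance between the masses
angle = math.degrees(math.atan2(y, x))        # angle from the positive x-axis

print(on_shifted_ellipse(x, y))  # True
print(r)                         # 5.0
print(round(angle, 2))           # 143.13
```

The angle is measured anticlockwise from the positive x-axis, so a point up and to the left of m₁ gives an angle between 90° and 180°.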

In this perfect universe, other orbital shapes are possible: circles, parabolas, and hyperbolas. These have their own standard equations but they all can be transformed to move them to any place you wish on the coordinate system to make your calculations easier.

Modelling and Graphs

This post presents a simplified engineering modelling and design example. I offer this as a motivation to students struggling with graphing topics like linear, quadratic, cubic, etc equations. This is not just busy work. It is the beginning of developing skills you will need to work in technical professions like engineering.

Why Modelling?

Engineers use mathematical models for many reasons. Some of these reasons are:

  1. Easier and cheaper to analyse and design than the physical thing being modelled
  2. Cheaper to tweak to do “what if” analysis
  3. Necessary when the physical object needs to be encoded, eg a control system for an aircraft – the control system needs to know how the aircraft will react to its controls.

The Example

Let’s use a mathematical model in a familiar scenario.

The Spring

Early automobiles used a suspension that just consisted of leaf springs to support the entire carriage. Later, coil springs for each wheel were used. With just springs, what is the reaction of the carriage (the part that people sit in) to a bump in the road? I think you can imagine that it would look like this:

This is a plot of a cosine function which you may have seen already. This is a model of what would happen to a car with just springs. But it’s not a good model since it assumes that the oscillations would go on forever. Of course, in real life, the springs would lose energy and there would be wind resistance as well. But engineers typically start out with a simple model and add complexity as they continue analysing it.

This particular function I just plotted is x = 5cos(3t), where x is the position of the carriage and t is time. The “5” corresponds to the strength of the bump in the road, and the “3” relates to how weak or stiff the spring is.

The letter k is usually used to represent the spring stiffness. The higher k is, the stiffer the spring and the faster the oscillations experienced by the people in the car. Changing the “5” just changes the extreme values of the curve, but changing the “3” has a more interesting effect. Below are three graphs showing the effect of changing k:

Red: k = 1
Blue: k = 3
Green: k = 5

The Damper

As I mentioned before, just using a cosine function to model the spring suspension is not good because there is natural damping that occurs due to energy loss in the spring (heat) and resistance due to the air around the car. The real reaction of the carriage would look something like this:

Spring with damper

But even this would not be comfortable as the car would be frequently bouncing as it goes over bump after bump (but certainly more comfortable than no suspension at all!). So more damping would be desirable.

A function that has a damping shape is the exponential function
\[x = a^{-dt}\] where “a” is some constant greater than 1 and “d” is some positive constant for a particular equation. As will be seen when you study calculus, a convenient “a” to use is the irrational number e. Don’t worry if you haven’t seen e before. It is just a number approximately equal to 2.71828. I say approximately because e, like π, is irrational so it has a non-repeating decimal part. The d indicates how strong the damping effect is; the larger d is, the stronger the damping.

So as an example, the plot of \[x = 5e^{-0.5t}\] looks like:

\[x = 5e^{-0.5t}\]

The “5” in this example is just to provide the initial bump. So it would be nice if we could combine the action of the spring with a damping force that a shock absorber would provide. This can be done by multiplying the shock absorber model
\[x = 5e^{-0.5t}\] and the spring model x = 5cos(3t) together:
\[x = 5e^{-0.5t}\cos(3t)\] where we just need the one “5” for the initial bump. The plot of this looks like:

\[x = 5e^{-0.5t}\cos(3t)\]

Much better but room for improvement. Depending on the type of car we are designing, we may want a stiff response where the damping is strong (like in a sports car where one wants to “feel the road”) or a softer one for a family car for example.

Below are two plots changing the damping strength and using k = 3:

Blue: d = 0.5
Red: d = 1.0

A carriage following the red curve would be a better ride. The damping can be made stronger for a sports car.
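If you want to experiment with the damping yourself, here is a minimal Python sketch of the combined spring and damper model. The function name `carriage_position` and its parameter names are my own. The default values are the ones used in the plots above.

```python
import math


def carriage_position(t, bump=5.0, d=0.5, k=3.0):
    """Damped spring model: bump * e^(-d*t) * cos(k*t)."""
    return bump * math.exp(-d * t) * math.cos(k * t)


# Compare the two damping strengths at a few times.
for t in (0.0, 1.0, 2.0, 4.0):
    blue = carriage_position(t, d=0.5)
    red = carriage_position(t, d=1.0)
    print(f"t = {t}: d = 0.5 gives {blue:.3f}, d = 1.0 gives {red:.3f}")
```

Both curves start at the full bump of 5 when t = 0. After that, the oscillation can never be larger than 5e^(-dt), so the d = 1.0 curve dies away faster.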

When the car is actually built and testing reveals that the car responds to bumps as was predicted by the engineer’s model, well that’s a very good feeling!

Why Study Maths?

The engineer in this discussion used several skills that you are studying now:

  1. Drawing graphs of equations
  2. How to change the shape of those graphs by changing the values in the equation
  3. Circular (trigonometric) and exponential functions
  4. Calculus
  5. And all of the above is based on algebra.

The calculus skill is not apparent in the discussion so far, but I will explain that further in the next section.

Creating a Model

We can generalise the model for the suspension of a car as \[x = Ae^{-dt}\cos(kt)\] where A is the size of the initial bump in the road, d is the stiffness of the shock absorbers, and k is the strength of the springs. How did I know to use the exponential and cosine functions? They are the solution to what are called differential equations which are based on calculus.

When developing a model, especially for dynamic (moving) systems, engineers start with basic physics. Frequently, Newton’s second law is the starting point:\[F = ma\] where F is the force acting on a mass m that is accelerating (or decelerating) at the rate a. Now, we want a model that predicts x, the position of the car carriage. Many applications of calculus involve the rate of change of something. The rate of change of position is called velocity which you are familiar with. And the rate of change of velocity is called acceleration. Newton’s second law explains why you feel your back press against the seat only when you accelerate (or against the seatbelt when you decelerate). The rate of change of position, that is velocity, is mathematically called the first derivative of position. The rate of change of velocity, that is acceleration, is the first derivative of velocity or the second derivative of position. So Newton’s second law is really an equation about the second derivative of position. Equations that have derivatives in them are called differential equations and these are very familiar to engineers.

Notation-wise, a single dot is shown above a variable to indicate a first derivative and two dots represent a second derivative. So another way to show Newton’s second law is \[F=m\ddot{x}\] as acceleration is the second derivative of the position, x.

Now without going into the detail, the forces exerted on the carriage by the springs and the shock absorbers are –kx and –dẋ respectively, where the negative signs show that both forces oppose the motion. So if you replace the F in Newton’s second law with these forces, you get the differential equation\[-kx-d\dot{x}=m\ddot{x}\]

An engineer uses calculus to solve this equation for the position x as a function of the time t, that is x(t). Given certain initial conditions like the size of the bump experienced by the car, and the values of k and d, a solution to this equation can be \[x = Ae^{-dt}\cos(kt)\] This is where I got the cosine and the exponential function from at the beginning.

Transforming Quadratic (Parabolas) Graphs

Forms of Quadratic Equations

Quadratic equations come in several generic forms (or patterns) but they all have several things in common:

  1. The highest power of x (the independent variable) is 2 when all expressions are expanded in polynomial form.
  2. The other integer powers, 1 meaning just x, and 0 meaning a constant term, may or may not be present. But the squared term must be present.
  3. Other powers (negative integers, and non-integers), cannot be present.

The most common general form of a quadratic equation is \[y=ax^2 +bx+c,\] where a, b, and c are constants specific to a particular equation. For example, \[y=3x^2 -2x+7.\] Here a = 3, b = -2, and c = 7. There are other forms as well:\[\begin{array}{l}
y=k\left(x+a\right)\left(x+b\right),\\
y=a{\left(x-h\right)}^2 +k
\end{array},\]where the unspecified constants a, b, h, and k are specific to each form and are not the same numbers when converting between these forms. The most basic quadratic equation is \[y=x^2.\]Choosing various values of x, then squaring them to get the corresponding y values, and plotting these on a Cartesian coordinate grid, creates the following curve:

\[y=x^2\]

This shape is called a parabola. All quadratic equations have this shape when plotted but their position, orientation, and scale may be different. Each form of quadratic equation has its advantages. This lesson however, will concentrate on the form \[y=a{\left(x-h\right)}^2 +k.\]

The ‘a’ Factor

So we will be looking at the quadratic form \[y=a{\left(x-h\right)}^2 +k.\] This form is called the turning point form. Let’s start out simple and look at the effect of a alone by setting the h and k to zero. This leaves us with the equation \[y=ax^2.\] This coefficient in front of the x² term scales and orients the parabola. If a is negative, all the y values are now negative. This flips (reflects) the parabola across the x-axis. If a is a large number, greater than 1, then the y values are larger for a given x than the y values in the basic y = x² parabola. This has the effect of making the parabola sharper, that is, it is dilated away from the x-axis. If a is a fraction between -1 and 1, then the y values increase more slowly. This has the effect of making the parabola flatter, which is a dilation toward the x-axis. Below are several graphs of y = ax² for various values of a. The basic parabola is shown (dashed curve) for comparison:

\[y=3x^2\]
\[y=\frac{1}{3}x^2\]
\[y=-3x^2\]
\[y=-\frac{1}{3}x^2\]

Notice how the negative sign flips the parabola across the x-axis.

The k Effect

Now let’s look at the equation \[y=ax^2 +k.\]I have just added a k to the previous equation form. If you add or subtract a constant number to an equation, it just raises or lowers the graph of the equation by k units. This is independent of the effect that a has on the curve. Below are examples for two choices of k using an a that was used above for comparison:

\[y=-\frac{1}{3}x^2+4\]
\[y=3x^2-4\]

The h Reaction

Now so far we have scaled and inverted our parabola and moved it up or down. What about moving it right or left? That’s what the h does in the form \[y=a{\left(x-h\right)}^2 +k.\] We get to this form by replacing x with x – h. Whenever this is done in any equation, not just a quadratic equation, this moves the curve to the right h units if h is positive or to the left if h is negative. But be careful. There is a negative sign in the form y = a(x – h)² + k, so in y = a(x – 3)² + k, h = 3 so this parabola is moved 3 units to the right. Whereas a(x + 3)² + k can be thought of as a(x – (–3))² + k, so h is –3 and this moves the parabola 3 units to the left. The effect of h on the graph of a parabola is independent of the effects of a or k.

Below are two examples of the effect of h using the last example above for comparison:

\[y=3(x-1)^2-4\]
\[y=3(x+1)^2-4\]
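To tie the three effects together, here is a short Python sketch of the turning point form. The function names `parabola` and `expand` are my own. `expand` multiplies out a(x – h)² + k to give the coefficients of the general form y = ax² + bx + c.

```python
def parabola(a, h, k):
    """Return the function y = a*(x - h)**2 + k."""
    return lambda x: a * (x - h) ** 2 + k


def expand(a, h, k):
    """Coefficients (a, b, c) of the general form y = a*x**2 + b*x + c."""
    return (a, -2 * a * h, a * h**2 + k)


right = parabola(3, 1, -4)   # y = 3(x - 1)^2 - 4, turning point at (1, -4)
left = parabola(3, -1, -4)   # y = 3(x + 1)^2 - 4, turning point at (-1, -4)

print(right(1), left(-1))    # -4 -4
print(expand(3, 1, -4))      # (3, -6, -1)
print(expand(3, -1, -4))     # (3, 6, -1)
```

Notice that both parabolas reach the same lowest value, –4. Only the x value where that happens changes with h.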

The Derivative, Part 8

Now let’s do some more examples using not only the chain rule, but using a combination of the rules we have covered.

Let me start with an example that illustrates the “chainyness” of the chain rule. Let \[f( x) =\text{sin}\left(\sqrt{x^{2} +2x-7}\right)\]

Notice that there are three operations at work here: the sine, the square root and the polynomial. Referring back to the previous post, which is the outermost function? It’s the sine as that would be the last operation you would perform if you were to actually calculate the function for a particular x value. So the derivative rule for the sine is the first differentiation rule we will use.

So we have the sine of “something”, so we start with the derivative of the sine, which is the cosine of that “something”: \[f'( x) =\text{cos}\left(\sqrt{x^{2} +2x-7} \right)\ ( …)\] Now from the last post, you know you have to multiply this by the derivative of that “something”. It will be helpful to rewrite that “something” as \[\sqrt{x^{2} +2x-7} =\left( x^{2} +2x-7\right)^{1/2}\] Looks like we need to apply the chain rule again as we have an inner (polynomial) function and an outer (power) function.

The derivative of the “something” to the 1/2 power is \[\frac{1}{2}\left( x^{2} +2x-7\right)^{-1/2}( …)\]

We are now left with the innermost function x² + 2x – 7. The chain rule says to multiply the previous results with the derivative of this innermost function which is 2x +2. So putting this in the last (…) and then putting that result in the first (…) gives \[f'( x) =\frac{1}{2}\left( x^{2} +2x-7\right)^{-1/2}( 2x+2)\text{cos}\left(\sqrt{x^{2} +2x-7}\right)\] Do you see how the successive differentiations of the functions from the outermost to the innermost works with the chain rule?
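A handy way to check a derivative like this is to compare it with a numerical slope. The Python sketch below does that at x = 3, a value I picked so that the expression under the square root is positive. The helper `central_difference` is my own and only gives an approximation.

```python
import math


def f(x):
    return math.sin(math.sqrt(x**2 + 2 * x - 7))


def f_prime(x):
    """The derivative found above with the chain rule."""
    inner = x**2 + 2 * x - 7
    return 0.5 * inner**-0.5 * (2 * x + 2) * math.cos(math.sqrt(inner))


def central_difference(func, x, step=1e-6):
    """Approximate slope of func at x from two nearby points."""
    return (func(x + step) - func(x - step)) / (2 * step)


x = 3.0
print(f_prime(x), central_difference(f, x))
```

The two printed numbers should agree to several decimal places. If they do not, a factor has probably been dropped somewhere in the chain.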

Let’s do another example. Let’s differentiate \[f( x) =\sqrt{\text{sin}( x)\text{cos}( x)}\]

As we did before, it’s easier to see the applicable differentiation rule if you convert the square root to its equivalent exponent form:\[f( x) =\left[{\text{sin}( x)\text{cos}( x)}\right]^{1/2}\]

Hopefully you can now identify the outermost operation as raising “something” to the 1/2 power. So the power rule is the one to use at first:\[f'( x) =\frac{1}{2}[\text{sin}( x)\text{cos}( x)]^{-1/2}( …)\]

So we now need to multiply this by the derivative of the “something” which is sin(x)cos(x). But this is the multiplication of two functions so we need to use the multiplication rule. Letting u = sin(x) and v = cos(x), we have u′ = cos(x) and v′ = –sin(x), so uv′ + u′v becomes cos²(x) – sin²(x). So now replacing the (…) with this results in \[f'( x) =\frac{1}{2}[\text{sin}( x)\text{cos}( x)]^{-1/2}\left[\text{cos}^{2}( x) -\text{sin}^{2}( x)\right]\]
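Here is the same kind of numerical sanity check for this example, again as a rough Python sketch of my own. The test point x = 0.5 is an assumption, chosen because sin(x)cos(x) is positive there and so the square root is defined.

```python
import math

def f(x):
    """f(x) = sqrt(sin(x) cos(x))"""
    return math.sqrt(math.sin(x) * math.cos(x))

def f_prime(x):
    """Power rule on the outside, multiplication rule on the inside."""
    something = math.sin(x) * math.cos(x)
    d_something = math.cos(x)**2 - math.sin(x)**2   # u v' + u' v
    return 0.5 * something**-0.5 * d_something

def central_difference(func, x, step=1e-6):
    """Estimate the slope of func at x from two nearby points."""
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 0.5  # test point where sin(x) cos(x) > 0
print(f_prime(x0), central_difference(f, x0))
```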

This last example highlights the point that to find the derivative of complex functions frequently requires the use of several differentiation rules. You need to be aware of where you are in a particular problem and which rule you are currently working on.

Next time, I will show some examples where the derivatives are used. In the meantime, you can use the results of derivatives found in this post to find the derivative of\[f( x) =\sqrt{\text{sin}( x)\text{cos}( x)}\text{sin}\left(\sqrt{x^{2} +2x-7}\right)\]

The Derivative, Part 7

So let’s recap: we have a rule to find derivatives of basic functions using a table, a rule to handle a function that is multiplied by a constant, a rule to handle the addition or subtraction of two (or more) functions, a rule to handle the multiplication of two (or more) functions, and a rule to handle the division of two functions. I also did an example where several of these rules can be used finding the derivative of a single function. You would think that this would exhaust all the possibilities and that you can now differentiate any function in the known universe. But alas, there is one more, perhaps the most powerful, rule yet to be presented.

This new rule is called the chain rule, so called because it allows you to find the derivative of a function, of a function, of a function, and so on.

Now there is a textbook way to present this rule and an intuitive way which I like to use. I find that the textbook approach can be confusing because there are several variables to keep track of. I will present both ways so that you may see the connection between the two and have a better understanding of the chain rule.

The textbook approach to the chain rule is a bit easier to see if we forego functional notation and go back to using y. However, whenever you have a function of a function f[g(x)], the chain rule is to be used. Functions like this are called composite functions. For example,

\[ f( x) =\text{sin}\left( x^{2}\right)\]

Here the inner function is g(x) = x² and the outer function is the sine, so the composite is sin[g(x)] = sin(x²).

In the textbook approach you let u be the inner function (that is the function you are using as the argument for the outer function) and you let y be the function after you replace the inner function with u. I will give an explanation later on how to identify the inner and outer functions if that is not clear.

So in this case, u = x², so y = sin(u). The textbook chain rule is

\[\frac{dy}{dx} =\frac{dy}{du} \times \frac{du}{dx}\]

This may look scary but let me repeat this rule in English: the derivative of a composite function is the derivative of y with respect to u times the derivative of u with respect to x. So in our example, dy/du = cos(u) (using the table) and du/dx = 2x. Multiplying these together and replacing u with its definition, we get

\[\frac{dy}{dx} =\text{cos}( u)\times 2x\ =\ 2x\ \text{cos}( x^{2})\]
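The textbook rule translates almost word for word into code. The Python sketch below is only an illustration under my own naming: `chain_rule` and `central_difference` are hypothetical helpers, and x = 1.3 is an arbitrary test point. You hand it dy/du, the inner function u, and du/dx, and it multiplies the pieces together.

```python
import math

def chain_rule(d_outer, inner, d_inner):
    """Build dy/dx = dy/du * du/dx for y = outer(u), u = inner(x)."""
    return lambda x: d_outer(inner(x)) * d_inner(x)

# y = sin(u) with u = x^2, so dy/du = cos(u) and du/dx = 2x.
dy_dx = chain_rule(math.cos, lambda x: x**2, lambda x: 2 * x)

def central_difference(func, x, step=1e-6):
    """Estimate the slope of func at x from two nearby points."""
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 1.3  # arbitrary test point
print(dy_dx(x0))                                          # 2x cos(x^2) at x0
print(central_difference(lambda x: math.sin(x**2), x0))   # numerical estimate
```

Note that dy/du is evaluated at u = x², not at x. That is the "replacing u with its definition" step.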

So to further explain what inner and outer functions are, suppose you wanted to take our example function and calculate its value for a certain number for x. The first thing you would do is take that number and square it. The squaring function is the inner function since it is the first thing you would do. Then you would take the sine of that squared number. The sine function then is the outer function as that is the last operation you would do.

So I explain the chain rule as follows: Take the derivative of the outer function of ‘something’ keeping the ‘something’ intact, but since the ‘something’ is not just ‘x‘ you need to multiply the result by the derivative of that ‘something’.

In this example, the ‘something’ is x². So the derivative of the sine of that ‘something’ is cos(x²), but I then need to multiply this by the derivative of the ‘something’. The derivative of x² is 2x, so the result is 2x cos(x²).

So now let’s reverse the roles of the inner and outer functions. Consider the derivative of [sin(x)]². A very common shortcut notation for the square of a trig function like this is [sin(x)]² = sin²(x). Again, imagine actually calculating this for a particular value of x. You would first take the sine of that number (the inner function) then square the result (the outer function). We know that the derivative of the square of ‘something’ is 2 times that ‘something’ to the first power which in this case is 2 sin(x). But to compensate for this simplification, we need to multiply the result by the derivative of that ‘something’. In this case, the derivative of sin(x) is cos(x), so the final answer is 2 sin(x)cos(x).
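For comparison, here is a sketch of the same calculation written in the textbook notation, with u as the inner function and y as the outer function of u:

\[u=\text{sin}( x) ,\ \ \ \ y=u^{2}\]

\[\frac{dy}{dx} =\frac{dy}{du} \times \frac{du}{dx} =2u\times \text{cos}( x) =2\ \text{sin}( x)\text{cos}( x)\]

Both routes give the same answer. The only difference is whether you name the ‘something’ or carry it along in your head.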

Now to get comfortable with this, we need to do some more examples. I will do that in my next post.

The Derivative, Part 6

Last time I presented the multiplication rule of differentiation to be used when given a function that is the multiplication of two or more other functions. As you would guess, there is also a rule that handles the division of two functions.

Let’s say you have the function

\[ f( x) =\frac{x^{2}}{\text{sin}( x)}\]

This one can be solved with the multiplication rule if you remember that 1/sin(x) = csc(x). But as I haven’t told you what the derivative of csc(x) is, we are stuck using the following division rule. But this highlights the point that as we get deeper into maths, there are often several ways to solve a problem. The maths “arteest” is one that solves a problem elegantly.

So the following rule is the division rule. Again, I will use u(x) and v(x) to split the function up into its parts. If you have a function of the form

\[f( x) =\frac{u( x)}{v( x)}\]

then the derivative of f(x) is

\[f'( x) =\frac{u'( x) v( x) -u( x) v'( x)}{[ v( x)]^{2}}\]

As you can see, this rule is a bit more complex, which is why you would use a simpler rule if possible. But it is still relatively easy to use if you keep track of which part is u and which part is v. Because of the minus sign in the numerator, the order of the two terms matters here, unlike in the multiplication rule.

Using the example function above,

\[\begin{array}{{>{\displaystyle}l}} u( x) =x^{2} ,\ \ \ \ \ v( x) =\text{sin}( x)\\ u'( x) =2x,\ \ \ \ v'( x) =\text{cos}( x) \end{array}\]

So according to the division rule,

\[f'( x) =\frac{2x\text{sin}( x) -x^{2}\text{cos}( x)}{\text{sin}^{2}( x)}\]
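Since the sign in the numerator is easy to get backwards, a numerical check is worthwhile here. The Python sketch below is my own illustration and assumes the standard form of the division rule, (u′v – uv′)/v². The helper names `quotient_rule` and `central_difference` are hypothetical, and x = 1 is an arbitrary test point where sin(x) is not zero.

```python
import math

def quotient_rule(u, du, v, dv):
    """Derivative of u(x)/v(x), assuming the form (u'v - uv') / v^2."""
    return lambda x: (du(x) * v(x) - u(x) * dv(x)) / v(x) ** 2

def f(x):
    """f(x) = x^2 / sin(x)"""
    return x**2 / math.sin(x)

# u = x^2, u' = 2x, v = sin(x), v' = cos(x)
f_prime = quotient_rule(lambda x: x**2, lambda x: 2 * x, math.sin, math.cos)

def central_difference(func, x, step=1e-6):
    """Estimate the slope of func at x from two nearby points."""
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 1.0  # arbitrary test point with sin(x0) != 0
print(f_prime(x0), central_difference(f, x0))
```

If the two terms in the numerator are swapped, the formula gives the negative of the numerical estimate, which is a quick way to catch that mistake.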

Now you can use many rules in a single differentiation problem. Consider

\[f( x) =\frac{x^{2} e^{x}}{\text{sin}( x)}\]

Here, the numerator is a multiplication of two functions. So when using the division rule, you need to apply the multiplication rule for the u‘ part:

\[ \begin{array}{{>{\displaystyle}l}}
u( x) \ =\ x^{2} e^{x} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ v( x) =\text{sin}( x)\\
u'( x) =x^{2} e^{x} +2xe^{x} \ \ \ \ \ v'( x) =\text{cos}( x)
\end{array}\]

I’ll leave it as an exercise for you to see if I correctly found u′(x) using the multiplication rule. I used the fact that eˣ (as seen from the table I provided a couple of posts before) is its own derivative. Anyway, using the division rule,

\[f'( x) =\frac{\left( x^{2} e^{x} +2xe^{x}\right)\text{sin}( x) -x^{2} e^{x}\text{cos}( x)}{\text{sin}^{2}( x)}\]
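If you want to check the exercise numerically rather than by hand, here is a Python sketch that does so. It is my own addition and assumes the division rule in the form (u′v – uv′)/v². The test point x = 1.2 and the helper `central_difference` are arbitrary choices for illustration.

```python
import math

def u(x):
    """Numerator u(x) = x^2 e^x"""
    return x**2 * math.exp(x)

def u_prime(x):
    """Multiplication rule applied to x^2 and e^x."""
    return x**2 * math.exp(x) + 2 * x * math.exp(x)

def f(x):
    """f(x) = x^2 e^x / sin(x)"""
    return u(x) / math.sin(x)

def f_prime(x):
    """Division rule in the assumed form (u'v - uv') / v^2, with v = sin(x)."""
    return (u_prime(x) * math.sin(x) - u(x) * math.cos(x)) / math.sin(x) ** 2

def central_difference(func, x, step=1e-6):
    """Estimate the slope of func at x from two nearby points."""
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 1.2  # arbitrary test point with sin(x0) != 0
print(u_prime(x0), central_difference(u, x0))   # checks the u' step
print(f_prime(x0), central_difference(f, x0))   # checks the final answer
```

Checking u′ on its own first is a handy habit, because a mistake in an inner step otherwise hides inside the final answer.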

So you might be thinking that you can differentiate any function as long as you know the derivatives of the individual parts. So how would you differentiate

\[ f( x) =\text{sin}\left( x^{2}\right) ?\]

This is not a multiplication of functions, but rather a function of a function. I will introduce the very powerful chain rule as it applies to differentiation in my next post.

The Derivative, Part 5

So last time, I provided a table of derivatives given a function that is of a particular form. Because of rules 3 and 4 (you will need to see my last post to see what these are), along with the other entries in the table, you can now differentiate many functions not specifically in the table. But there are still many functions that you cannot differentiate without other rules. For example, if

\[f( x) =x^{2}\text{sin}( x)\]

there is no table entry to help you. Even though you can differentiate x² and sin(x) separately, there is no rule in the table that allows you to differentiate their multiplication together since they are both functions of x, that is, neither one is just a constant. You can’t use rule 4 here.

There is a differentiation rule that handles this. It is the multiplication rule and it states that if you have a function of the form

\[f( x) =u( x) v( x)\]

then the derivative is

\[f'( x) =u( x) v'( x) +u'( x) v( x)\]

This can be proven using the basic definition of a derivative, but you can just take my word for it.
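For the curious, here is a sketch of that proof. It assumes the usual limit definition of the derivative with a small step h, and that u and v are both differentiable at x. The trick is to subtract and add the same term u(x+h)v(x) in the numerator:

\[f'( x) =\lim _{h\rightarrow 0}\frac{u( x+h) v( x+h) -u( x) v( x)}{h}\]

\[=\lim _{h\rightarrow 0}\frac{u( x+h) v( x+h) -u( x+h) v( x) +u( x+h) v( x) -u( x) v( x)}{h}\]

\[=\lim _{h\rightarrow 0} u( x+h)\frac{v( x+h) -v( x)}{h} +\lim _{h\rightarrow 0} v( x)\frac{u( x+h) -u( x)}{h}\]

\[=u( x) v'( x) +u'( x) v( x)\]

The last step uses the fact that u(x+h) approaches u(x) as h shrinks to zero, and that each remaining fraction is exactly the definition of a derivative.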

So in the example at the beginning of this post,

\[\begin{array}{{>{\displaystyle}l}}
u( x) =x^{2} ,\ \ \ \ \ v( x) =\text{sin}( x)\\
u'( x) =2x,\ \ \ \ v'( x) =\text{cos}( x)
\end{array}\]

where I used the table in my last post to find the individual derivatives. So according to the rule,

\[f'( x) =x^{2}\text{cos}( x) +2x\text{sin}( x)\]
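You can also watch the basic definition of the derivative close in on this answer. The Python sketch below is my own illustration, with x = 2 as an arbitrary test point. It computes the difference quotient for smaller and smaller steps and compares each one with the formula from the multiplication rule.

```python
import math

def f(x):
    """f(x) = x^2 sin(x)"""
    return x**2 * math.sin(x)

def f_prime(x):
    """Multiplication rule result: u v' + u' v with u = x^2, v = sin(x)."""
    return x**2 * math.cos(x) + 2 * x * math.sin(x)

x0 = 2.0  # arbitrary test point
errors = []
for step in (1e-1, 1e-3, 1e-5):
    difference_quotient = (f(x0 + step) - f(x0)) / step
    errors.append(abs(difference_quotient - f_prime(x0)))
    print(step, difference_quotient, errors[-1])
```

The last column is the gap between the difference quotient and the formula. It shrinks as the step gets smaller, which is what you would expect if the formula is right.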

Now this rule can be extended to handle more than two functions multiplied together. If

\[f( x) =u( x) v( x) w( x)\]

then you can use the original rule twice, or

\[ f'( x) =u( x) v( x) w'( x) +u( x) v'( x) w( x) +u'( x) v( x) w( x)\]

I think you can see the pattern here. So if

\[\begin{array}{{>{\displaystyle}l}}
f( x) =x^{2}\text{sin}( x)\text{cos}( x)\\
u( x) =x^{2} ,\ \ v( x) =\text{sin}( x) ,\ \ \ w( x) =\text{cos}( x)\\
u'( x) =2x,\ \ v'( x) =\text{cos}( x) ,\ \ w'( x) =-\text{sin}( x)
\end{array}\]

then the derivative is

\[f'( x) =-x^{2}\text{sin}^{2}( x) +x^{2}\text{cos}^{2}( x) +2x\text{sin}( x)\text{cos}( x)\]

Now this can be simplified using trig identities but I will leave it here.
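If you are curious what that simplification looks like, here is one way to do it. This is a sketch that assumes you know the double-angle identities cos²(x) – sin²(x) = cos(2x) and 2 sin(x)cos(x) = sin(2x). Grouping the first two terms gives

\[f'( x) =x^{2}\left[\text{cos}^{2}( x) -\text{sin}^{2}( x)\right] +2x\text{sin}( x)\text{cos}( x)\]

\[f'( x) =x^{2}\text{cos}( 2x) +x\text{sin}( 2x)\]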

What about a function that’s a division of two functions? Yes, there is a rule for that as well, but I’ll cover that in my next post.