Modeling Data with Straight Lines using Excel

Microsoft Excel, like most graphical calculators and spreadsheet programs, has the built in ability to do linear regression of measured data using certain types of functions — lines, polynomials, logarithms, and exponents for example. However, you can get it to do any type of function — sinusoidal, natural log, whatever — if you work through the spreadsheet and can use the iterative Solver tool.

This more general approach is quite useful in teaching pre-Calculus, because the primary purpose of all the functions they have to learn is to create mathematical models (functions) based on data that can be used for predictions.

The Data

I started this year’s pre-calculus class by having them collect some data. In a simplification of the snow-melt experiment I did with the middle school last year, I had them put a beaker of water (about 300ml) on a hot plate and measure the temperature every minute as warmed up.

To make the experiment a little more interesting, I had each student in each group of four take just three consecutive measurements and try to find the equation of the straight line that best fit their data, and could be used to try to predict the other measurements of their peers in their group.

Figure 1. Scatter plot of measured temperatures during the warming of a beaker of water on a hot plate. Data given in Table 1.

It did not quite work out as I’d hoped. Since you only need two points to find the equation of a straight line, having three points produced a little confusion. I’d hoped to produce that confusion, but hadn’t realized that I’d need to do a review of how to find the equation of a straight line. A large fraction of the class was a little bit rusty after hot months of summer.

So, we pooled all the data and reviewed how to find the equation of a straight line.

Table 1: The Data

Time (minutes) Measured Temperature (°C)
0 22
1 26
2 31
3 36
4 40
5 44
6 48
7 53
8 58
9 61
10 65
11 68
12 71

Finding the Equation for a Straight Line using Two Points

The general equation for a straight lines is:

(1)  y = mx + b

and we need to determine the coefficients m and b. m is the slope, which can be calculated from two points using the equation:

(2)  m = \frac{y_2 - y_1}{x_2 - x_1}

using the points at t=6 and t=11 — the points (x1, y1) = (6,48) and (x2, y2) = (11,68) respectively — for example, gives a slope of:

 m = \frac{68 - 48}{11 - 6}

 m = \frac{20}{5}

 m = 4

so our general equation becomes:

 y = 4 x + b

to find b we substitute either one of the points into the equation for x and y. If we use the first point, x = 6, and y = 48, we get:

 48 = 4 (6) + b
 48 = 24 + b

 24 + b = 48
 b = 48 - 24
 b = 24

and the equation of our line becomes:
(3)  y = 4 x + 24

Now, since we’re actually looking at a relationship between temperature and time, with temperature on the y-axis and time on the x-axis, we could relabel the terms in the equation with T = temperature and t = time to have:

(4)  T = 4 t + 24

While this equation is more satisfying to me, because I think it better describes the relationship we have, the more vocal students preferred the equation in terms of x and y (Eqn 3). These are the terms they are more familiar with in the context of a math class, and I recall seeing some evidence that students seem to learn better with the more abstract representations sometimes (though I can’t quite remember the source; I should have blogged about it).

Plotting the Data and the Modeled Straight Line

The straight line equation we came up with (Eqn. 4) is our model of the data. It’s not quite perfect. All the data do not lie on the line, although, if we did everything right, only the points (6, 48) and (11, 68) are guaranteed to be on the line.

Figure 2. The equation of our straight line model (red line) matches the data (blue diamonds) pretty well.

I showed the class how to plot the scatter graph using MS Excel, and how to draw the line to show the modeled data. The measured data are represented as points since the measurements were made at discrete points in time. The modeled equation, however, is a continuous function, hence the straight line. The Excel sheet below (Resource 1) illustrates:

Resource 1: Excel Spreadsheet of Measured versus Modeled Data

The Best Fit Curve

The Excel spreadsheet (Resource 1) was set up so that when I entered the slope (m) and intercept (b) values, the graph would quickly update. So I went through the class. Everyone called out their slope and intercept values, I plugged them in, and they could all see how the modeled line changed slightly based on the points used to calculate it. So I put the question to them, “How can we figure out which model equation is the best?”

That’s how I was able to introduce the topic of error. What if we compared the temperature predicted by the model for each data point, to the actual value. The smaller the difference in modeled versus measured temperatures, the better the fit of the model. Indeed, if we sum all the differences, or better yet take the average of the differences, we could get a single number, we’ll call the average error (ε), that could be used to compare the different models. I used this opportunity to introduce sigma notation, which the pre-calculus students had not seen much of before.

As a first pass (which, as we’ll see below, has a major problem), the error (ε) for each point (i) is:

 \epsilon_i = (T_{measured}-T_{modeled})

The average error is the sum of all the errors divided by the number of points (n) (we have 12 points so n=12 in this example):

(5)  \bar{\epsilon} = \frac{\sum\limits_{i=1}^{n} \epsilon_i}{n}

Now this works, but there is one problem. I was quite pleased and a little bit surprise that one of my students recognized what it was without any coaxing and also suggested a solution: by simply taking the difference to calculate the error, a point that is offset above the modeled line can be canceled out by a point offset by the same amount below the line. So what we really need is to use the absolute value of the error.

(6)  \epsilon_i = \left| T_{measured}-T_{modeled} \right|

This works, and is what we went with, but I did also point that what’s usually done is to use the square of the error instead of the absolute value. Squaring makes any number positive, so it accomplishes the same goal as the absolute value, and is the approach we’ll use when I go into linear regression later on.

Setting up the Excel spreadsheet to calculate the average error is fairly straightforward as shown in Resource 2:

Resource 2. Calculating the average error using Excel.

So once again, we went through the class and everyone called out their slope and intercept values and cheered when I plugged the numbers in and they saw if they had the lowest value.

It is important to remember, though, that the competition gives a somewhat random result: students’ average error is a function of the points they happened to pick, not how well they did the math (assuming everyone did the math correctly).

Figure 2. Showing the spreadsheet used to calculate the average error (Resource 2).

Practicing Plotting Points on the Co-ordinate Plain

Pre-Algebra class starts next week, so in preparation for one of the early lessons on how to plot x,y co-ordinates, I put together an interactive plotter that lets students drag points onto the co-ordinate plain.

Students practice plotting points by dragging the red dot to the coordinates given.

Usage

The program generates random coordinate pairs within the area of the chart (or you can enter values of the coordinates yourself):

  • Clicking the “Show Point” button will place a yellow dot at the point.
  • When you’re confident you understand how the coordinate pairs work, you can practice by dragging the red dot to where you think the point is and the program will tell you if you’re right or not.

About the Program

This interactive application uses the jQuery and KineticJS javascript libraries. The latter library in particular is useful for making the HTML5 canvases interactive, so you can click on points on the graph and drag them.

When I have some time, after classes settle down, I’ll see if I can figure out how to embed this type of app into this (WordPress) blog. KineticJS is based off HTML5 canvases, which is what I use for the other interactive graphs I’ve posted, so it shouldn’t be terribly hard (at least in principle).

Finding Where Two Lines Intersect: Khan Academy Lessons

Khan Academy videos on how to solve systems of linear equations:

Use a graphing calculator to double check your lines, or use WolframAlpha, or use my simple line grapher.

by Graphing

Video: Solving Linear Systems by Graphing

Most algebra texts have good problem sets for practice (Khan Academy does not – as yet).

by Substitution

Moving from the more intuitive, visual, graphical method of solution to the more exact algebraic methods, we start with solving by substitution.

Video: Solving Linear Systems by Substitution

Practice Set: Practicing Systems of equations with substitution

by Elimination

Video: Solving Systems of Equations by Elimination

Additional Video: Solving systems by elimination 2 (on YouTube).

Additional Video: Solving systems by elimination 3 (on YouTube).

Practice Set: Practicing Systems of equations with simple elimination

Exponential Growth/Decay Models (Summary)

A quick summary (more details here):

The equation that describes exponential growth is:
Exponential Growth:  N = N_0 e^{rt}

where:

  • N = number of cells (or concentration of biomass);
  • N0 = the starting number of cells;
  • r = the rate constant, which determines how fast growth occurs; and
  • t = time.

You can set the r value, but that’s a bit abstract so often these models will use the doubling time – the time it takes for the population (the number of cells, or whatever, to double). The doubling time (td) can be calculated from the equation above by:

 t_d = \frac{\ln 2}{r}

or if you know the doubling time you can find r using:

 r = \frac{\ln 2}{t_d}

Finally, note that the only difference between a growth model and a decay model is the sign on the exponent:

Exponential Decay:  N = N_0 e^{-rt}

Decay models have a half-life — the time it takes for half the population to die or change into something else.

Exponential Cell Growth

The video shows 300 seconds of purely exponential growth (uninhibited), captured from the exponential growth VAMP scenario. Like the exponential growth function itself, the video starts off slowly then gets a lot more exciting (for a given value of exciting).

The modeled growth is based on the exponential growth function:

 N = N_0 e^{rt} (1)

where:

  • N = number of cells (or concentration of biomass);
  • N0 = the starting number of cells;
  • r = the rate constant, which determines how fast growth occurs; and
  • t = time.

Finding the Rate Constant/Doubling Time (r)

You can enter either the rate constant (r) or the doubling time of the particular organism into the model. Determining the doubling time from the exponential growth equation is a nice exercise for pre-calculus students.

Let’s call the doubling time, td. When the organism doubles from it’s initial concentration the growth equation becomes:

 2N_0 = N_0 e^{r t_d}

divide through by N0:

 2  =  e^{r t_d}

take the natural logs of both sides:

 \ln 2  =  \ln (e^{r t_d})

bring the exponent down (that’s one of the rules of logarithms);

 \ln 2  =  r t_d \ln (e)

remember that ln(e) = 1:

 \ln 2  =  r t_d

and solve for the doubling time:

 \frac{\ln 2}{r}  =  t_d

Decay

A nice follow up would be to solve for the half life given the exponential decay function, which differs from the exponential growth function only by the negative in the exponent:

 N = N_0 e^{-rt}

The UCSD math website has more details about Exponential Growth and Decay.

Finding the Growth Rate

A useful calculus assignment would be to determine the growth rate at any point in time, because that’s what the model actually uses to calculate the growth in cells from timestep to timestep.

The growth rate would be the change in the number of cells with time:

 \frac{dN}{dt}

starting with the exponential growth equation:

 N = N_0 e^{rt}

since we have a natural exponent term, we’ll use the rule for differentiating natural exponents:

 \frac{d}{dx}(e^u) = e^u \frac{du}{dx}

So to make this work we’ll have to define:

 u = rt

which can be differentiated to give:

 \frac{du}{dt} = r

and since N0 is a constant:

 N = N_0 e^{u}

 \frac{dN}{dt} = N_0 e^{u} \frac{du}{dt}

substituting in for u and du/dt gives:

 \frac{dN}{dt} = N_0 e^{rt} (r)

rearranging (to make it look prettier (and clearer)):

 \frac{dN}{dt} = N_0 r e^{rt} (2)

Numerical Methods: Euler’s method

With this formula, the model could use linear approximations — like in Euler’s method — to simulate the growth of the biomass.

First we can discretize the differential so that the change in N and the change in time (t$) take on discrete values:
 \frac{dN}{dt} = \frac{\Delta N}{\Delta t}

Now the change in N is the difference between the current value Nt and the new value Nt+1:

Now using this in our differentiated equation (Eq. 2) gives:

 \frac{N^{t+1}-N^t}{\Delta t} = N_0 r e^{r\Delta t}

Which we can solve for the new biomass (N^t+1):

  N^{t+1}-N^t = N_0 r e^{r\Delta t} \Delta t

to get:
  N^{t+1}     = N_0 r e^{r\Delta t} \Delta t + N^t

This linear approximation, however, does introduce some error.

The approximated exponential growth curve (blue line) deviates from the analytical equation. The deviation compounds itself, getting worse exponentially, as time goes on.
The approximated exponential growth curve (blue line) deviates from the analytical equation. The deviation compounds itself, getting worse exponentially, as time goes on.

Excel file for graphed data: exponential_growth.xls

VAMP

This is the first, basic but useful product of my summer work on the IMPS website, which is centered on the VAMP biochemical model. The VAMP model is, as of this moment, still in it’s alpha stage of development — it’s not terribly user-friendly and is fairly limited in scope — but is improving rapidly.

Plugging Latex Equations into Webpages

I’ve figured out how to put latex equations into this WordPress website, but have been struggling trying to get it on my other math based web pages, like the parabolas page.

Now, however, I’ve discovered CodeCogs, which provides an excellent Equation Editor that allows the inclusion of latex equations on any website (html page).

The Mathematical Logic Behind Billions, Trillions and Standard Form

A billion, in continental Europe is, a million squared ( 1,000,0002 = 1,000,000,000,000), but in the English speaking world, a billion is only a thousand million (1,000,000,000). numberphile goes into the beautiful, mathematical logic of the longer form (i.e. the continental system).

Of course, the “simplest” system, which avoids all the potential for miscommunication, is the standard scientific notation, where 1,000,000,000 is written as 1×109 (or just 109).

The Dish

Bending a Soccer Ball

Students from the University of Leicester have published a beautiful short research paper (pdf) on the physics of curving a soccer ball through the air.

It has been found that the amount a football bends depends linearly on the speed of the ball and the amount of spin.

— Sandhu et al., 2011: How to score a goal (pdf) in the University of Leicester’s Journal of Physics Special Topics

They derive the relationship from Bernoulli’s equation using some pretty straightforward algebra. The force (F) perpendicular to the ball’s motion that causes it to curl is:

F = 2 \pi R^3 \rho \omega v

and the distance the ball curls can be calculated from:

D = \frac{\pi R^3 \rho \omega}{ v m } x^2

where:

  • F = force perpendicular to the direction the ball is kicked
  • D = perpendicular distance the ball moves to the direction it is kicked (the amount of curl)
  • R = radius of the ball
  • ρ = density of the air
  • ω = angular velocity of the ball
  • v = velocity of the ball (in the direction it is kicked)
  • m = mass of the ball
  • x = distance traveled in the direction the ball is kicked

The paper itself is an excellent example of what a short, student research paper should look like. And there are number of neat followup projects that advanced, high-school, physics/calculus students could take on, such as: considering the vertical dimension — how much time it take for the ball to rise and fall over the wall; creating a model (VPython) of the motion of the ball; and adding in the slowing of the ball due to air friction.

ScienceDaily