Inverse Relationships

Inverse relationships pop up everywhere. They’re pretty common in physics (see Boyle’s Law, for example: P ∝ 1/V), but there you sort of expect them. You don’t quite expect to see them in the number of views of my blog posts, as shown in the Popular Posts section of the column to the right.

Table 1: Views of the posts on the Montessori Muddle in the previous month as of October 16th, 2012.

Post | Rank | Views
Plate Tectonics and the Earthquake in Japan | 1 | 3634
Global Atmospheric Circulation and Biomes | 2 | 1247
Equations of a Parabola: Standard to Vertex Form and Back Again | 3 | 744
Cells, cells, cells | 4 | 721
Salt and Sugar Under the Microscope | 5 | 686
Google Maps: Zooming in to the 5 themes of geography | 6 | 500
Market vs. Socialist Economy: A simulation game | 7 | 247
Human Evolution: A Family Tree | 8 | 263
Osmosis under the microscope | 9 | 219
Geography of data | 10 | 171

You can plot these data to show the relationship.

Views of the top 10 blog posts on the Montessori Muddle in the last month (as of 10/16/2012).

And if you think about it, it makes some sense that this relationship should be inverse. After all, as you move to lower-ranked (less visited) posts, the number of views should asymptotically approach zero.

Questions

So, given these data, can my pre-Calculus students find the equation for the best-fit inverse function? That way I could estimate how many hits my 20th- or 100th-ranked post gets per month.
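One way to start on the first question, sketched here in Python under the assumption that the model has the simplest inverse form, views = a / rank (one constant a, no offset): choose a to minimize the sum of squared differences between the model and the views in Table 1. Setting the derivative of that sum with respect to a to zero gives a closed form, so no iteration is needed. The function names are illustrative only.

```python
# Views of the top 10 posts, copied from Table 1 (rank 1 = most viewed).
RANKS = list(range(1, 11))
VIEWS = [3634, 1247, 744, 721, 686, 500, 247, 263, 219, 171]


def fit_inverse(xs, ys):
    """Least-squares value of a for the model y = a / x.

    Setting d/da of sum((y - a/x)**2) to zero gives a = sum(y/x) / sum(1/x**2).
    """
    numerator = sum(y / x for x, y in zip(xs, ys))
    denominator = sum(1 / x**2 for x in xs)
    return numerator / denominator


a = fit_inverse(RANKS, VIEWS)


def predicted_views(rank):
    """Model estimate of monthly views for a post of the given rank."""
    return a / rank


print(f"best-fit a = {a:.1f}")
for rank in (20, 100):
    print(f"rank {rank}: about {predicted_views(rank):.0f} views per month")
```

Other inverse-style models, such as views = a / rank + c, need a second parameter and a slightly longer calculation; deciding which form fits best is itself a good question for the students.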

Can my Calculus students use the function they come up with to estimate the total number of hits on all of my posts over the last month? Or even the top 20 most popular posts?
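For the Calculus question, a sketch assuming a model of the form views = a / rank, where a is whatever constant the students' fit produces (the value in the code is a placeholder, not a fitted result). The model total for the top n posts is a(1 + 1/2 + ... + 1/n), and because a/x is decreasing, integrating it gives bounds: a ln(n + 1) ≤ total ≤ a(1 + ln n). Since ln n grows without limit, this particular model predicts that the total over "all" posts keeps growing with the number of posts, so that answer depends on how many posts there are.

```python
import math

A_PLACEHOLDER = 1000.0  # hypothetical value of a; substitute your own fitted constant


def total_views(a, n):
    """Model total for the top n posts: sum of a/k for k = 1..n."""
    return sum(a / k for k in range(1, n + 1))


def integral_bounds(a, n):
    """Bounds on total_views(a, n) from integrating the decreasing function a/x.

    Lower: integral of a/x from 1 to n + 1, which is a*ln(n + 1).
    Upper: a (the rank-1 term) plus the integral of a/x from 1 to n, a*(1 + ln(n)).
    """
    return a * math.log(n + 1), a * (1 + math.log(n))


for n in (10, 20, 100):
    low, high = integral_bounds(A_PLACEHOLDER, n)
    total = total_views(A_PLACEHOLDER, n)
    print(f"top {n}: sum = {total:.0f}, integral bounds {low:.0f} to {high:.0f}")
```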

Modeling Data with Straight Lines using Excel

Microsoft Excel, like most graphing calculators and spreadsheet programs, has the built-in ability to fit certain types of functions to measured data (regression): lines, polynomials, logarithms, and exponentials, for example. However, you can get it to fit any type of function (sinusoidal, natural log, whatever) if you work through the calculation in the spreadsheet and use the iterative Solver tool.
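To show the idea behind that kind of iterative fitting outside Excel, here is a minimal Python sketch of a very simple method, a compass (pattern) search; it is an illustration of the principle, not what Solver does internally. It nudges each parameter up or down, keeps any change that lowers the total squared error, and halves the step when nothing helps. Because the model is just a function passed in, the same routine could fit a line, a sine, or an inverse function. The names are hypothetical, and the data are the temperatures in Table 1 below.

```python
# Temperature data from Table 1: time (minutes) and measured temperature (°C).
TIMES = list(range(13))
TEMPS = [22, 26, 31, 36, 40, 44, 48, 53, 58, 61, 65, 68, 71]


def sum_squared_error(model, params, xs, ys):
    """Total squared difference between measured ys and model(x, params)."""
    return sum((y - model(x, params)) ** 2 for x, y in zip(xs, ys))


def pattern_search(f, start, step=1.0, tol=1e-6, max_iter=100_000):
    """Minimize f(params) by trying +/- step on each parameter in turn.

    Any trial that lowers f is kept; when a full pass finds no improvement,
    the step is halved. Stops once the step falls below tol.
    """
    params = list(start)
    best = f(params)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params.copy()
                trial[i] += delta
                value = f(trial)
                if value < best:
                    params, best = trial, value
                    improved = True
        if not improved:
            step /= 2
    return params, best


def line(t, p):
    """Straight-line model T = m t + b with p = [m, b]; swap in any other model."""
    return p[0] * t + p[1]


params, sse = pattern_search(lambda p: sum_squared_error(line, p, TIMES, TEMPS), [0.0, 0.0])
print(f"m = {params[0]:.3f}, b = {params[1]:.3f}, total squared error = {sse:.3f}")
```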

This more general approach is quite useful in teaching pre-Calculus, because a central purpose of all the functions the students have to learn is to create mathematical models, based on data, that can be used for predictions.

The Data

I started this year’s pre-calculus class by having them collect some data. In a simplification of the snow-melt experiment I did with the middle school last year, I had them put a beaker of water (about 300ml) on a hot plate and measure the temperature every minute as it warmed up.

To make the experiment a little more interesting, I had each student in each group of four take just three consecutive measurements and try to find the equation of the straight line that best fit their data and that could be used to predict the measurements taken by the other students in their group.

Figure 1. Scatter plot of measured temperatures during the warming of a beaker of water on a hot plate. Data given in Table 1.

It did not quite work out as I’d hoped. Since you need only two points to find the equation of a straight line, having three points produced a little confusion. I’d hoped to produce that confusion, but hadn’t realized that I’d first need to review how to find the equation of a straight line. A large fraction of the class was a little rusty after the hot summer months.

So, we pooled all the data and reviewed how to find the equation of a straight line.

Table 1: The Data

Time (minutes) | Measured Temperature (°C)
0 | 22
1 | 26
2 | 31
3 | 36
4 | 40
5 | 44
6 | 48
7 | 53
8 | 58
9 | 61
10 | 65
11 | 68
12 | 71

Finding the Equation for a Straight Line using Two Points

The general equation for a straight line is:

(1)  y = mx + b

and we need to determine the coefficients m and b. Here m is the slope, which can be calculated from any two points using the equation:

(2)  m = \frac{y_2 - y_1}{x_2 - x_1}

Using, for example, the points at t = 6 and t = 11, that is, (x₁, y₁) = (6, 48) and (x₂, y₂) = (11, 68), gives a slope of:

 m = \frac{68 - 48}{11 - 6}

 m = \frac{20}{5}

 m = 4

so our general equation becomes:

 y = 4 x + b

To find b, we substitute either one of the points into the equation for x and y. If we use the first point, x = 6 and y = 48, we get:

 48 = 4 (6) + b
 48 = 24 + b

 24 + b = 48
 b = 48 - 24
 b = 24

and the equation of our line becomes:
(3)  y = 4 x + 24
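The two-point calculation above can be sketched in Python with a small hypothetical helper: the slope comes from Eqn 2 and the intercept from substituting the first point into y = mx + b. Python's Fraction keeps the arithmetic exact even when the slope is not a whole number; the sketch assumes integer coordinates, as in the table.

```python
from fractions import Fraction


def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through two integer points.

    The slope follows Eqn 2; b = y1 - m*x1 comes from substituting p1.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have different x values")
    m = Fraction(y2 - y1, x2 - x1)
    b = y1 - m * x1
    return m, b


m, b = line_through((6, 48), (11, 68))
print(f"y = {m} x + {b}")  # prints: y = 4 x + 24
```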

Now, since we’re actually looking at a relationship between temperature and time, with temperature on the y-axis and time on the x-axis, we could relabel the terms in the equation with T = temperature and t = time to have:

(4)  T = 4 t + 24

While this equation is more satisfying to me, because I think it better describes the relationship we have, the more vocal students preferred the equation in terms of x and y (Eqn 3). These are the terms they are more familiar with in the context of a math class, and I recall seeing some evidence that students seem to learn better with the more abstract representations sometimes (though I can’t quite remember the source; I should have blogged about it).

Plotting the Data and the Modeled Straight Line

The straight-line equation we came up with (Eqn. 4) is our model of the data. It’s not quite perfect: not all the data lie on the line. If we did everything right, only the points (6, 48) and (11, 68), the ones used to calculate it, are guaranteed to be on the line.

Figure 2. The equation of our straight line model (red line) matches the data (blue diamonds) pretty well.

I showed the class how to plot the scatter graph using MS Excel, and how to draw the line to show the modeled data. The measured data are represented as points since the measurements were made at discrete points in time. The modeled equation, however, is a continuous function, hence the straight line. The Excel sheet below (Resource 1) illustrates:

Resource 1: Excel Spreadsheet of Measured versus Modeled Data

The Best Fit Curve

The Excel spreadsheet (Resource 1) was set up so that when I entered the slope (m) and intercept (b) values, the graph would quickly update. So I went through the class. Everyone called out their slope and intercept values, I plugged them in, and they could all see how the modeled line changed slightly based on the points used to calculate it. So I put the question to them, “How can we figure out which model equation is the best?”

That’s how I was able to introduce the topic of error. What if we compared the temperature predicted by the model at each data point to the actual, measured value? The smaller the difference between the modeled and measured temperatures, the better the fit of the model. Indeed, if we sum all the differences, or better yet take their average, we get a single number, which we’ll call the average error (ε), that can be used to compare the different models. I used this opportunity to introduce sigma notation, which the pre-calculus students had not seen much of before.

As a first pass (which, as we’ll see below, has a major problem), the error (ε) for each point (i) is:

 \epsilon_i = (T_{measured}-T_{modeled})

The average error is the sum of all the errors divided by the number of points, n (we have 13 points, at t = 0 through 12 minutes, so n = 13 in this example):

(5)  \bar{\epsilon} = \frac{\sum\limits_{i=1}^{n} \epsilon_i}{n}

Now this works, but there is one problem. I was quite pleased, and a little bit surprised, that one of my students recognized what it was without any coaxing and also suggested a solution: by simply taking the difference to calculate the error, a point that is offset above the modeled line can be canceled out by a point offset by the same amount below the line. So what we really need is the absolute value of the error.

(6)  \epsilon_i = \left| T_{measured}-T_{modeled} \right|
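The cancellation problem is easy to see numerically. This Python sketch (hypothetical function names) computes the average signed error (Eqn 5 with the first-pass error) and the average absolute error (Eqn 5 with Eqn 6) for the temperature data. A flat line drawn at the mean temperature ignores the warming trend entirely, yet its positive and negative errors cancel, so its average signed error is zero up to floating-point rounding; the absolute value exposes how poor that fit really is.

```python
TIMES = list(range(13))  # minutes
TEMPS = [22, 26, 31, 36, 40, 44, 48, 53, 58, 61, 65, 68, 71]  # °C, from Table 1


def errors(m, b):
    """Signed errors T_measured - T_modeled for the line T = m t + b."""
    return [T - (m * t + b) for t, T in zip(TIMES, TEMPS)]


def average_signed_error(m, b):
    e = errors(m, b)
    return sum(e) / len(e)


def average_absolute_error(m, b):
    e = errors(m, b)
    return sum(abs(x) for x in e) / len(e)


mean_temp = sum(TEMPS) / len(TEMPS)
models = [("Eqn 4: T = 4t + 24", (4, 24)), ("flat line at mean T", (0, mean_temp))]
for label, (m, b) in models:
    print(f"{label}: signed {average_signed_error(m, b):.3f}, "
          f"absolute {average_absolute_error(m, b):.3f}")
```

For Eqn 4 the signed errors sum to −1 °C and the absolute errors to 11 °C, so the two averages are −1/13 ≈ −0.08 °C and 11/13 ≈ 0.85 °C.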

This works, and is what we went with, but I did also point out that what’s usually done is to use the square of the error instead of the absolute value. Squaring makes every nonzero error positive, so it accomplishes the same goal as the absolute value, and it is the approach we’ll use when I go into linear regression later on.
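For comparison, a Python sketch of the squared-error version, together with the standard closed-form least-squares line: the slope and intercept that make the average squared error as small as possible. This goes a step beyond what we did in class, and the function names are illustrative.

```python
TIMES = list(range(13))  # minutes
TEMPS = [22, 26, 31, 36, 40, 44, 48, 53, 58, 61, 65, 68, 71]  # °C, from Table 1


def mean_squared_error(m, b):
    """Average of (T_measured - T_modeled)**2 for the line T = m t + b."""
    return sum((T - (m * t + b)) ** 2 for t, T in zip(TIMES, TEMPS)) / len(TIMES)


def least_squares_line(xs, ys):
    """Slope and intercept minimizing the mean squared error.

    Standard closed form: m = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)**2),
    and b = y_bar - m * x_bar.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


m_ls, b_ls = least_squares_line(TIMES, TEMPS)
print(f"Eqn 4 (T = 4t + 24): mean squared error {mean_squared_error(4, 24):.3f}")
print(f"least squares (T = {m_ls:.3f}t + {b_ls:.3f}): "
      f"mean squared error {mean_squared_error(m_ls, b_ls):.3f}")
```

For these data the least-squares slope works out to 760/182 ≈ 4.18 °C per minute, close to the value of 4 found from two points.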

Setting up the Excel spreadsheet to calculate the average error is fairly straightforward as shown in Resource 2:

Resource 2. Calculating the average error using Excel.

So once again, we went through the class: everyone called out their slope and intercept values, and they cheered as I plugged in the numbers and they saw who had the lowest value.

It is important to remember, though, that the competition gives a somewhat random result: students’ average error is a function of the points they happened to pick, not how well they did the math (assuming everyone did the math correctly).
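To see how much the result depends on which points a student happened to pick, this Python sketch (hypothetical helper names) builds the line through every possible pair of measurements in the table and computes each line's average absolute error, as in Eqn 6. With 13 measurements there are 13 × 12 / 2 = 78 pairs.

```python
from itertools import combinations

TIMES = list(range(13))  # minutes
TEMPS = [22, 26, 31, 36, 40, 44, 48, 53, 58, 61, 65, 68, 71]  # °C, from Table 1
POINTS = list(zip(TIMES, TEMPS))


def line_through(p1, p2):
    """Slope and intercept of the line through two points with different x."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1


def average_absolute_error(m, b):
    """Mean of |T_measured - T_modeled| over all measurements."""
    return sum(abs(T - (m * t + b)) for t, T in POINTS) / len(POINTS)


# (error, first point, second point) for every pair, sorted from best to worst fit.
results = sorted(
    (average_absolute_error(*line_through(p, q)), p, q)
    for p, q in combinations(POINTS, 2)
)
best, worst = results[0], results[-1]
print(f"{len(results)} pairs")
print(f"best:  {best[1]} and {best[2]}, average error {best[0]:.2f} °C")
print(f"worst: {worst[1]} and {worst[2]}, average error {worst[0]:.2f} °C")
```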

Figure 3. The spreadsheet used to calculate the average error (Resource 2).

DNAi: History of genetics and manipulating DNA

DNA. (from Wikipedia)

DNA Interactive (DNAi) is another great resource for studying the history of genetics and how we manipulate and use DNA today (recommended by the indispensable Anna Clarke). It has lesson plans and nice pages on the modern techniques used to work with DNA.

Image from the DNAi webpage on gel electrophoresis. Electrophoresis is a bit like chromatography, which might make for a good demonstration.

I have not done much with genetic sequencing myself, and I found the website interesting and informative. I have, however, written programs to retrieve and work with data from the GenBank database, which is not that hard since it provides some easy-to-use tools. I would love to figure out how to get a sample sequenced and then run it through GenBank to identify it. It would integrate the curriculum so nicely: a practical exercise to solve a real problem (like identifying which species are on the nature trail) using the same tools and resources that scientists use, and it would tie in wonderfully with the short stories in Mirable.