Curious Correlations

The Correlated website asks people different, apparently unrelated questions every day and mines the data for unexpected patterns.

In general, 72 percent of people are fans of the serial comma. But among those who prefer Tau as the circle constant over Pi, 90 percent are fans of the serial comma.

Correlated.org: March 23’s Correlation.

Two sets of data are said to be correlated when there is a relationship between them: the height of a fall is correlated to the number of bones broken; the temperature of the water is correlated to the amount of time the beaker sits on the hot plate (see here).

A positive correlation between the time (x-axis) and the temperature (y-axis).

In fact, if we can come up with a line that matches the trend, we can figure out how good the trend is.

The first thing to try is usually a straight line, using a linear regression, which is pretty easy to do with Excel. I put the data from the graph above into Excel (melting-snow-experiment.xls) and plotted a linear regression for only the highlighted data points that seem to follow a nice, linear trend.

Correlation between temperature (y) and time (x) for the highlighted (red) data points.

You’ll notice two things in the top right corner of the graph: the equation of the line and R², the coefficient of determination, which tells you how good the correlation is.

The equation of the line is:

  • y = 4.4945 x – 23.65

which can be used to predict the temperature where the data-points are missing (y is the temperature and x is the time).

You’ll observe that the slope of the line is about 4.5 °C/min. I had my students draw trendlines by hand, and they came up with slopes between 4.35 and 5 °C/min, depending on the data points they used.

The R² value tells how well your data line up: the better they line up, the better the correlation. A perfect match, with all points on the line, gives an R² of 1.0. Our R² is 0.9939, which is pretty good.
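
If you’d rather script the fit than use Excel, here’s a minimal sketch using Python and scipy.stats.linregress. The (time, temperature) points are made-up placeholders that roughly follow the trend in the graph, not the actual spreadsheet data, so the numbers will only come out close to the ones above.

```python
# A sketch of the same fit in Python rather than Excel.  The data points are
# hypothetical values that roughly follow the trend in the graph above.
from scipy import stats

time_min = [6, 7, 8, 9, 10, 11]                 # placeholder times (min)
temp_C = [3.5, 7.0, 12.5, 16.0, 21.5, 25.0]     # placeholder temperatures (deg C)

fit = stats.linregress(time_min, temp_C)
print(f"slope     = {fit.slope:.4f} deg C/min")
print(f"intercept = {fit.intercept:.2f} deg C")
print(f"R^2       = {fit.rvalue**2:.4f}")

# The fitted line can then predict the temperature where data points are
# missing, just as the post does with y = 4.4945x - 23.65:
t = 9.5
print(f"predicted temperature at {t} min: {fit.slope * t + fit.intercept:.1f} deg C")
```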

If we introduce a little random error to all the data points, we’d reduce the R² value, like this (R² is now 0.831):

Adding in some random error causes the data to scatter more, making for a worse correlation. The black dots are the original data, while the red dots include some random error.
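
To see that effect yourself, you can jitter a set of made-up points that lie almost exactly on a line and refit; the exact R² you get depends on the random draw, but it will reliably fall below the clean fit. A short sketch, with an arbitrary noise level of 2 °C:

```python
# Sketch of how random error lowers R^2: make hypothetical points that lie on a
# line, add normally distributed noise (2 deg C, an arbitrary choice), and refit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
time_min = np.array([6, 7, 8, 9, 10, 11])
temp_C = 4.5 * time_min - 23.7                       # points on a hypothetical line
noisy_temp = temp_C + rng.normal(0, 2, size=temp_C.size)

print(f"clean R^2: {stats.linregress(time_min, temp_C).rvalue**2:.3f}")   # 1.000
print(f"noisy R^2: {stats.linregress(time_min, noisy_temp).rvalue**2:.3f}")
```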

The correlation trend lines don’t just have to go up. Some things are negatively correlated — when one goes up the other goes down — such as the relationship between the number of hours spent watching TV and students’ grades.

The negative correlation between grades and TV watching. Image: Lanthier (2002).

Correlation versus Causation

However, just because two things are correlated does not mean that one causes the other.

A jar of water on a hot-plate will see its temperature rise with time because heat is transferred (via conduction) from the hot-plate to the water.

On the other hand, while it might seem reasonable that more TV might take time away from studying, resulting in poorer grades, it might be that students who score poorly are demoralized and so spend more time watching TV; what causes what is unclear — these two things might not be related at all.

Which brings us back to the Correlated.org website. They’re collecting a lot of seemingly random data and just trying to see what things match up.

Curiously, many scientists do this all the time — typically using a technique called multiple regression. Understandably, others are more than a little skeptical. The key problem is that people too easily leap from seeing a correlation to assuming that one thing causes the other.

Sub-atomic Physics: The Significance of 0.8%

When it comes to particle physics … [m]easuring something once is meaningless because of the high degree of uncertainty involved in such exotic, small systems. Scientists rely on taking measurements over and over again — enough times to dismiss the chance of a fluke.

— Moskowitz (2011): Is the New Physics Here? Atom Smashers Get an Antimatter Surprise in LiveScience

New research, out of the Large Hadron Collider in Switzerland, shows a 0.8% difference in the way matter and antimatter particles behave. This small difference could go a long way toward explaining why the universe is made up mostly of matter today, even though in the beginning there were about equal amounts of matter and antimatter. It would mean that the current best theory describing particle physics, the Standard Model, needs some significant tweaking.

The Standard Model of elementary particles. The LHC experiment looked at charm quarks (c) and their corresponding antiquarks, which have the opposite charge. Image by MissMJ via Wikipedia.

0.8% is small, but significant. How confident are the physicists that their measurements are accurate? Well, the more measurements you take, the more confident you can be in your average result, though you can never be 100% certain. The LHC scientists did enough measurements that they could calculate, statistically, that there is only a 0.05% chance that their result is just a fluke.
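
Physicists usually report that kind of confidence as a number of “sigmas” on a normal distribution. As a rough sketch of the conversion (the article doesn’t say whether the 0.05% figure is one-sided or two-sided, so take the exact sigma values as approximate):

```python
# Rough conversion from a "chance it's a fluke" to the sigma level physicists
# quote, assuming a normal distribution.  Whether the 0.05% in the article is a
# one-sided or two-sided probability isn't stated, so both are shown.
from scipy import stats

p = 0.0005  # 0.05%
print(f"one-sided: {stats.norm.isf(p):.2f} sigma")      # about 3.3 sigma
print(f"two-sided: {stats.norm.isf(p / 2):.2f} sigma")  # about 3.5 sigma
```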

Figuring Out Experimental Error

Using stopwatches, we measured the time it took for the tennis ball to fall 5.3 meters. Some of the individual measurements were off by over 30%, but the average time measured was only off by 7%.

I did a little exercise at the start of my high-school physics class today that introduced different types of experimental error. We’re starting the second quarter now, and it’s time for their lab reports to include more discussion about potential sources of error, how they might fix some of them, and what they might mean.

One of the stairwells just outside the physics classroom wraps around nicely, so students could stand on the steps and, using stopwatches, time it as I dropped a tennis ball 5.3 meters, from the top banister to the floor below.

Students' measured falling times (in seconds).

Random and Reading Errors

They had a variety of stopwatches, including a number of phones, at least one wristwatch, and a few of the classroom stopwatches that I had on hand. Some devices could do readings to one-hundredth of a second, while others could only do tenths of a second. So you can see that there is some error just due to how precisely the measuring device can be read. We’ll call this the reading error. If the best value your stopwatch gives you is to the tenth of a second, then you have a reading error of plus or minus 0.1 seconds (±0.1 s). And you can’t do much about this other than get a better measuring device.

Another source of error is just due to random differences that will happen with every experimental trial. Maybe you were just a fraction of a second slower stopping your watch this time compared to the last. Maybe a slight gust of air slowed the ball’s fall when it dropped this time. This type of error is usually just called random error, and its effect on the average can only be reduced by taking more and more measurements.

Our combination of reading and random errors meant that we had quite a wide range of results, ranging from a minimum time of 0.7 seconds to a maximum of 1.2 seconds.

So what was the right answer?

Well, you can calculate the falling time if you know the distance (d) the ball fell (d = 5.3 m) and its acceleration due to gravity (g = 9.8 m/s²), using the equation:

t = \sqrt{\frac{2d}{g}}

which gives:

t = \sqrt{\frac{2 \times 5.3 \,\mathrm{m}}{9.8 \,\mathrm{m/s^2}}} \approx 1.04 \,\mathrm{s}

So while some individual measurements were off by over 30%, the average value was off by only 8%, which is a nice illustration of the phenomenon that the more measurements you take, the better your result. In fact, you can plot the improvement in the data by drawing a graph of how the average of the measurements improves with the number of measurements (n) you take.

The first measurement (1.2 s) is much higher than the calculated value, but when you incorporate the next four values in the average it undershoots the actual (calculated) value. However, as you add more and more data points into the average the measured value gets slowly closer to the calculated value.

More measurements reduce the random error, but you tend to reach a point of diminishing returns, where your average just does not improve enough to make it worth the effort of taking more measurements. The graph shows the average only creeping upward after the first five measurements or so. While there are statistical techniques that can help you determine how many samples are enough, you ultimately have to base your decision on how accurate you want to be and how much time and energy you want to spend on the project. Given the large range of values we have in this example, I would not want to use fewer than six measurements.
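
You can play with this convergence in a quick simulation. The sketch below does not use the class’s data: it just assumes stopwatch readings scattered randomly (with an arbitrary spread of 0.1 s) around the calculated 1.04 s, and prints the running average as each new measurement comes in.

```python
# Simulate how the running average of repeated timing measurements settles down.
# The "true" value is the calculated fall time; the scatter (0.1 s) and the
# number of trials (15) are arbitrary choices, not the class's actual data.
import numpy as np

d, g = 5.3, 9.8                      # drop height (m), gravity (m/s^2)
t_calc = np.sqrt(2 * d / g)          # calculated fall time, about 1.04 s

rng = np.random.default_rng(0)
measurements = rng.normal(t_calc, 0.1, size=15)   # simulated stopwatch readings

running_avg = np.cumsum(measurements) / np.arange(1, measurements.size + 1)
for n, avg in enumerate(running_avg, start=1):
    print(f"after {n:2d} measurements: average = {avg:.3f} s")
```

Adding a constant offset to every simulated reading (a systematic error) shifts the whole curve: the average still settles down, but to the wrong value, which is what the next section is about.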

Systematic Error

But, as you can see from the graph, even with over a dozen measurements, the average measured value remains persistently lower than the calculated value. Why?

This is quite likely due to some systematic error in our experiment – an error you make every time you do the experiment. Systematic errors are the most interesting type of errors because they tell you that something in the way you’ve designed your experiment is faulty.

The most exciting type of systematic error would, in my opinion, be one caused by a fundamental error in your assumptions, because it challenges you to reevaluate what you’re doing from the ground up. The scientists who recently reported seeing particles moving faster than light made their discovery because of a persistent, systematic deviation in their measurements, a deviation that, if real, may result in the rewriting of the laws of physics.

In our experiment, I calculated the time the tennis ball took to fall using the gravitational acceleration at the surface of the Earth (9.8 m/s²). One important force that I did not consider in the calculation was air resistance. Air resistance would slow down the ball every single time it was dropped. It would be a systematic error. In fact, we could use the error that shows up to actually calculate the force of the air resistance.

However, since air resistance would slow the ball down, it would take longer to hit the floor. Unfortunately, our measurements were shorter than the calculated falling time, so air resistance is unlikely to explain our error. So we’re left with some error in how the experiment was done, and quite frankly, I’m not really sure what it is. I suspect it has to do with students’ reaction times: it probably took them longer to start their stopwatches when I dropped the ball than it did to stop them when the ball hit the floor, but I’m not sure. We’ll need further experiments to figure this one out.

In Conclusion

On reflection, I think I probably would have done better using a less dense ball, perhaps a styrofoam ball, that would be more affected by air resistance, so I could show how systematic errors can be useful.

Fortunately (sort of) in my demonstration I made an error in calculating the falling rate – I forgot to include the 2 under the square root sign – so I ended up with a much lower predicted falling time for the ball – which allowed me to go through a whole exercise showing the class how to use Excel’s Goal Seek function to figure out the deceleration due to air resistance.
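
For anyone who wants to redo that last step outside of Excel, here’s a rough sketch of the same idea in Python: treat air resistance as a constant deceleration a, so the drop obeys d = (1/2)(g − a)t², and solve for the a that reproduces a measured fall time. The measured time below is a placeholder, not the class average, and the constant-deceleration assumption is a simplification of real drag.

```python
# Back out a constant "air resistance" deceleration a from a measured fall time:
#   d = 0.5 * (g - a) * t^2   =>   a = g - 2*d / t^2
# This is roughly what the Goal Seek exercise was solving for numerically.
d, g = 5.3, 9.8           # drop height (m), gravity (m/s^2)
t_measured = 1.10         # placeholder fall time (s), NOT the class's average

a_drag = g - 2 * d / t_measured**2
print(f"implied deceleration from air resistance: {a_drag:.2f} m/s^2")

# Note: if the measured time is shorter than the no-drag prediction (~1.04 s),
# a_drag comes out negative -- which is exactly why air resistance could not
# explain the class's low average.
```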

My Excel Spreadsheet with all the data and calculations is included here.

There are quite a number of other things that I did not get into, since I was trying to keep this exercise short (less than half an hour), but a key one would be using significant figures.

There are a number of good, but technical websites dealing with error analysis including this, this and this.

Match Stick Rockets

A great, simple, and slightly dangerous way of making rockets. There are a number of variations. I like NASA’s because they have a very nice set of instructions.

How to make a match stick rocket. By Steve Cullivan via NASA.

With a stable launch platform that maintains consistent but changeable launch angles, these could be a great source of simple science experiments that look at the physics of ballistics and the math of parabolas (a nice video camera would be a great help here too) and statistics (matchsticks aren’t exactly precision instruments).
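
As a taste of the parabola math those experiments would involve, here’s a minimal sketch of the ideal range of a projectile as a function of launch angle. The launch speed is a made-up placeholder, and it ignores air resistance, which matters a lot for something as light as a matchstick, so treat it as a first approximation at best.

```python
# Ideal (no air resistance) projectile range as a function of launch angle:
#   R = v^2 * sin(2*theta) / g
# The launch speed is a hypothetical placeholder; real matchstick rockets have
# plenty of drag, so this is only a first approximation.
import math

v = 10.0   # placeholder launch speed (m/s)
g = 9.8    # gravitational acceleration (m/s^2)

for angle_deg in range(15, 90, 15):
    theta = math.radians(angle_deg)
    range_m = v**2 * math.sin(2 * theta) / g
    print(f"{angle_deg:2d} deg: range = {range_m:.1f} m")
```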

Ngram: The history of words

Graphs of the words Montessori and muddle created with Google Ngram.

If you take all the books ever written and draw a graph showing which words were used when, you’d end up with something like Google’s Ngram. Of course I thought I’d chart “Montessori” and “muddle”.

The “Montessori” graph is interesting. It seems to show the early interest in her work, around 1912, and then an interesting increase in interest in the 1960s and 1970s. As with all statistics, one should really be cautious about interpreting this type of data; however, I suspect this graph explains a lot about the sources of modern trends in Montessori education. I’d love to hear what someone with more experience thinks.

Alexis Madrigal has an interesting collection of graphs, while Discover has an article with much more detail about what can be done with Google’s database.

Surveys say …

Another nice resource that provides neat graphs of real data that are easy for students to understand is Pollster.com. The graphs of survey results are constantly updated and, if you want to, you can go into how they were created (survey questions, averages etc.). They’re great for current event discussions and research projects.

In addition to the national polls, like the president’s job approval (see below), the site also has charts for state level races, like for governor, which are handy around election time.

Pollster.com aggregates polls because, depending on how a question is phrased, each poll will have its own bias. However, since not all of the poll data is freely available to the public, the sites of the major polling organizations, like Gallup, are also quite useful. The polling organizations tend to have a much wider variety of poll results available. Gallup in particular provides some very nice graphs.

Statistical significance

Normal distribution with 95% unshaded. Adapted from Wikimedia Commons.

A discussion of statistical significance is probably a bit above middle school level, but I’m posting a note here because it is a reminder about the importance of statistics. In fact, students will hear about confidence intervals when they hear about the margin of error of polls in the news and the “significant” benefits of new drugs. Indeed, if you think about it, the development of formal thinking skills during adolescence should make it easier for students to see the world from a more probabilistic perspective, noticing the shades of grey that surround issues, rather than the more black-and-white, deterministic point of view young idealists tend to have. At any rate, statistics are important in life but, according to a Science Magazine article, many scientists are not using them correctly.
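
For the polls specifically, the reported “margin of error” is essentially a 95% confidence interval on a sample proportion. A minimal sketch of where that number comes from, assuming a simple random sample (real pollsters weight their samples, so this is only approximate), with placeholder values for the proportion and the sample size:

```python
# Approximate 95% margin of error for a poll, assuming a simple random sample.
# The reported proportion p and the sample size n are placeholder values.
from math import sqrt

p = 0.50    # e.g. 50% job approval
n = 1000    # people surveyed
moe = 1.96 * sqrt(p * (1 - p) / n)
print(f"margin of error: +/- {moe * 100:.1f} percentage points")   # about 3.1
```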

One key error is in understanding the term “statistically significant”. When Ronald A. Fisher came up with the concept, he arbitrarily chose 95% as the cutoff for deciding whether an experiment worked. The arbitrariness is one part of the problem: a result that clears the 95% bar still has one chance in twenty of being a fluke, and with all the scientists conducting experiments, that adds up to a lot of unrecognized false positives.
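
To put a rough number on that last point: if there is no real effect at all, each test run at the 95% level still has a 1-in-20 chance of producing a fluke, so across many independent tests the chance of at least one spurious “significant” result climbs quickly.

```python
# Chance of at least one fluke ("statistically significant" result when there is
# no real effect) among n independent tests run at the 95% level.
for n in (1, 5, 20, 100):
    print(f"{n:3d} tests: {1 - 0.95**n:.0%} chance of at least one fluke")
```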

But the big problem is the fact that people conflate statistical significance and actual significance. Just because there is a statistically significant correlation between eating apples and acne does not mean that it’s actually important. It could be that this result predicts that one person in ten million will get acne from eating apples, but is that enough reason to stop eating apples?

It is a fascinating article that deals with a number of other erroneous uses of statistics, but I’ve just spent more time on this post than I’d planned (it was supposed to be a short note). So I’d be willing to bet that there is a statistically significant correlation between my interest in an issue and the length of the post (and no correlation with the amount of time I intended to spend on the post).