One of my physics students is working on a project to demonstrate interference in sound waves, so I generated a few sound files with different wavelengths for her to experiment with.
A sound wave with a frequency of 347 cycles per second (347 Hz), which has a wavelength of approximately 1 meter. Waveform captured using the WaveWindow program.
Using SoX, you can generate waves by inputting the frequency you want (using the synth command). The frequency (f) depends on the wavelength (λ) and speed (v) of the sound waves through air.
The speed of sound through the air depends on the temperature (it’s a linear relationship). Hyperphysics has a nice Speed of Sound in Air calculator, which tells me that at room temperature (about 25 °C):
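To check the arithmetic yourself, here is a minimal Python sketch. It assumes the common linear approximation v ≈ 331.3 + 0.606·T (T in °C) for dry air, which is not necessarily the exact formula behind the Hyperphysics calculator, and it uses f = v / λ. The function names and the output filename are my own, purely for illustration.

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    # Assumption: the widely used linear approximation, not the calculator's output.
    return 331.3 + 0.606 * temp_c


def frequency_for_wavelength(wavelength_m, temp_c=25.0):
    """Frequency (Hz) of a sound wave with the given wavelength: f = v / wavelength."""
    return speed_of_sound(temp_c) / wavelength_m


def sox_synth_args(frequency_hz, seconds=3.0, filename="tone.wav"):
    """Build a SoX command line for a sine tone (the filename is hypothetical)."""
    return ["sox", "-n", filename, "synth", str(seconds), "sine", f"{frequency_hz:.1f}"]


if __name__ == "__main__":
    f = frequency_for_wavelength(1.0)  # a 1 meter wavelength at about 25 degrees C
    print(f"{f:.1f} Hz")
    print(" ".join(sox_synth_args(f)))
```

With this approximation, a 1 meter wavelength comes out within about 1 Hz of the 347 Hz tone described above. Halving the wavelength doubles the frequency.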
The Correlated website asks people different, apparently unrelated questions every day and mines the data for unexpected patterns.
In general, 72 percent of people are fans of the serial comma. But among those who prefer Tau as the circle constant over Pi, 90 percent are fans of the serial comma.
Two sets of data are said to be correlated when there is a relationship between them: the height of a fall is correlated with the number of bones broken; the temperature of the water is correlated with the amount of time the beaker sits on the hot plate.
A positive correlation between the time (x-axis) and the temperature (y-axis).
In fact, if we can come up with a line that matches the trend, we can figure out how good the trend is.
The first thing to try is usually a straight line, using a linear regression, which is pretty easy to do with Excel. I put the data from the graph above into Excel (melting-snow-experiment.xls) and plotted a linear regression for only the highlighted data points that seem to follow a nice, linear trend.
Correlation between temperature (y) and time (x) for the highlighted (red) data points.
You’ll notice two things in the top right corner of the graph: the equation of the line and R², the regression coefficient (more formally, the coefficient of determination), which tells how good the correlation is.
The equation of the line is:
y = 4.4945 x – 23.65
which can be used to predict the temperature where the data-points are missing (y is the temperature and x is the time).
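As a small sketch of how that prediction works, the fitted line can be wrapped in a couple of Python functions. The slope and intercept are the values from the equation above; the function names are my own. Since the line was fitted only to the highlighted points, treat predictions outside that range with suspicion.

```python
SLOPE = 4.4945      # change in temperature per unit of time, from the fitted line
INTERCEPT = -23.65  # temperature the line predicts at x = 0


def predict_temperature(time_x):
    """Temperature (y) predicted by the trend line at time x."""
    return SLOPE * time_x + INTERCEPT


def time_to_reach(temperature_y):
    """Invert the line: the time x at which the line reaches temperature y."""
    return (temperature_y - INTERCEPT) / SLOPE


if __name__ == "__main__":
    print(f"{predict_temperature(10):.1f}")  # about 21.3
    print(f"{time_to_reach(0.0):.2f}")       # where the line crosses zero
```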
You’ll observe that the slope of the line is about 4.5 °C/min. I had my students draw trendlines by hand, and they came up with slopes between 4.35 and 5, depending on the data points they used.
The regression coefficient (R²) tells how well your data line up. The better they line up, the better the correlation. A perfect match, with all points on the line, will have a regression coefficient of 1.0. Our regression coefficient is 0.9939, which is pretty good.
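If you want to see what a spreadsheet is doing under the hood, here is a minimal sketch of an ordinary least-squares fit and the R² calculation in plain Python. I am assuming the usual textbook formulas here; the data points are made up for illustration and are not the values from the experiment.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """R-squared: 1 minus (scatter about the line) / (scatter about the mean)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical measurements: time in minutes, temperature in degrees C.
    times = [6, 7, 8, 9, 10]
    temps = [3.5, 7.9, 12.1, 17.0, 21.2]
    slope, intercept = linear_fit(times, temps)
    print(f"y = {slope:.4f} x + ({intercept:.2f})")
    print(f"R^2 = {r_squared(times, temps, slope, intercept):.4f}")
```

When every point sits exactly on the line, the scatter about the line is zero and R² comes out as 1.0.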
If we introduce a little random error to all the data points, we’d reduce the regression coefficient like this (where R² is now 0.831):
Adding in some random error causes the data to scatter more, making for a worse correlation. The black dots are the original data, while the red dots include some random error.
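You can try the same experiment numerically. The sketch below is only an illustration under my own assumptions: it builds points that sit exactly on the trend line, adds Gaussian noise with an arbitrary spread (sigma), and compares R² before and after. The time values, sigma, and the random seed are all hypothetical choices, so the noisy R² it prints will not match the 0.831 in the graph.

```python
import random


def fit_r_squared(xs, ys):
    """Fit a least-squares line to (xs, ys) and return the R-squared of that fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def add_noise(ys, sigma, rng):
    """Return a copy of ys with independent Gaussian error added to each value."""
    return [y + rng.gauss(0.0, sigma) for y in ys]


if __name__ == "__main__":
    rng = random.Random(42)                       # fixed seed so runs are repeatable
    times = [float(t) for t in range(6, 16)]      # hypothetical sample times
    clean = [4.4945 * t - 23.65 for t in times]   # points exactly on the trend line
    noisy = add_noise(clean, sigma=5.0, rng=rng)
    print(f"clean R^2 = {fit_r_squared(times, clean):.4f}")
    print(f"noisy R^2 = {fit_r_squared(times, noisy):.4f}")
```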
The correlation trend lines don’t just have to go up. Some things are negatively correlated — when one goes up the other goes down — such as the relationship between the number of hours spent watching TV and students’ grades.
The negative correlation between grades and TV watching. Image via Lanthier (2002).
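One way to put a number on the direction of a trend is the Pearson correlation coefficient, r, which is positive when the two quantities rise together and negative when one falls as the other rises. Here is a minimal sketch using the standard formula; the TV hours and grades are made-up values for illustration, not the data in the figure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical data: hours of TV per day and the matching grade (percent).
    tv_hours = [0, 1, 2, 3, 4, 5]
    grades = [92, 88, 85, 80, 74, 70]
    r = pearson_r(tv_hours, grades)
    print(f"r = {r:.3f}")  # negative: grades fall as TV hours rise
```

For a straight-line fit like the ones above, squaring r gives the same R² value that the spreadsheet reports.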
Correlation versus Causation
However, just because two things are correlated does not mean that one causes the other.
A jar of water on a hot-plate will see its temperature rise with time because heat is transferred (via conduction) from the hot-plate to the water.
On the other hand, while it might seem reasonable that more TV might take time away from studying, resulting in poorer grades, it might be that students who score poorly are demoralized and so spend more time watching TV; what causes what is unclear — these two things might not be related at all.
Which brings us back to the Correlated.org website. They’re collecting a lot of seemingly random data and just trying to see what things match up.
Curiously, many scientists do this all the time — typically using a technique called multiple regression. Understandably, others are more than a little skeptical. The key problem is that people too easily leap from seeing a correlation to assuming that one thing causes the other.
The average change in the date of "first leaf" in the United States. Note that states farther to the north have seen greater change. Image from the interactive by Climate Central.
In Missouri, between 1981 and 2010 the average date at which trees first showed their leaves was two days earlier than the average between 1951 and 1980, according to this graphic by Climate Central.
You’ll also note the north-south trend, where change is greater as you go north. Most models predict that global warming/climate change due to increasing carbon dioxide will result in bigger changes as you get toward the poles.
Andrew Sullivan compiles some interesting commentary on the extent of global cotton production (40% of all agricultural land), and the argument that all this production for cheap clothes is exacerbating hunger problems around the world.
Note: the history of cotton makes for a fascinating read.
Where does cotton come from? "There grew there a wonderful tree which bore tiny lambs on the endes of its branches. These branches were so pliable that they bent down to allow the lambs to feed when they are hungrie" - Mandeville (1350). (Image via Wikipedia)
It takes time, practice, and the patience to watch yourself do a lot of not-so-great work before you eventually master a subject. Hard work not only builds character, it builds expertise.