Culturing Yeast: Baking Bread in Biology

Loaves of bread.

Nice and fluffy loaves of bread require the generation of bubbles in the dough. This is typically done either with an acid-base reaction (baking soda and an acid) or with yeast. Since we’re doing biology, we made some loaves and focused on how the process of bread making requires careful management of the environment so that the yeast produces the carbon dioxide gas that makes the bubbles that make the bread rise.

We followed the bread-baking recipe I’ve used before for the middle school’s student-run business, but had to shorten rising times to get it all done by the end of our class.

Yeast is a single-celled fungus (typically Saccharomyces cerevisiae). Fungi are heterotrophs, so thinking about what the yeast needs for life and metabolic activity requires consideration of:

  • water (moisture)
  • warmth (but not too warm)
  • energy source (short-chain carbohydrates, which make the energy more easily accessible)

Yeast produces carbon dioxide bubbles via fermentation (Styurf et al., 2017). It could do it through respiration, but in the bread dough there is not a lot of oxygen available (more info on respiration here).

Fermentation looks something like:

C6H12O6 → 2 C2H5OH + 2 CO2

So, the carbohydrate (glucose) is converted to ethanol and carbon dioxide.

As opposed to respiration (which requires oxygen):

C6H12O6 + 6 O2 → 6 H2O + 6 CO2
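As a rough check on how much gas each pathway gives off, here is a small R sketch that uses the coefficients in the two equations above. The molar masses are rounded standard values, and the one gram of glucose is just an illustrative amount, not something we measured in class.

```r
# Molar masses in g/mol (rounded standard values)
M_glucose <- 180.16
M_co2 <- 44.01

# Moles of CO2 produced per mole of glucose, from the balanced equations
co2_per_glucose <- c(fermentation = 2, respiration = 6)

# Grams of CO2 released per gram of glucose (1 g is an illustrative amount)
glucose_g <- 1
co2_g <- glucose_g / M_glucose * co2_per_glucose * M_co2
print(round(co2_g, 2))
```

Per gram of sugar, fermentation gives one third of the carbon dioxide that respiration would, because it releases 2 molecules of CO2 per glucose instead of 6.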

Yeast fermenting/proofing.

Identifying Birds by Sound: BirdNet

The Cornell Lab of Ornithology’s BirdNet lets you upload audio files of bird calls and identifies the birds. I tried it with this file (BirdCall01.m4a) recorded near school, and it identified Red-shouldered Hawks (about 6 seconds in).

Screen capture from sound file analysis on BirdNet’s online demo.

3d Printable Microscopes

Here are a few interesting, low-cost but potentially lab-grade microscopes that would be great Makerspace projects for students.

OpenFlexure: Out of the University of Bath, this has a Raspberry Pi at the core that can control the stage, focus, and sensor (using the RPi camera module). Since it’s modular, the cost varies with the image quality you’re aiming for, but it looks like you can achieve even high-resolution results relatively cheaply. They have great detail on their website, including their own version of Raspbian to install on the Pi, so this looks like a good starter project.

UC2: I really like the look of this building block, LEGO-style, system. It seems extremely flexible and there are some interesting projects that go beyond your standard microscope. There are a lot of designs you can go with, including an Arduino or using a Raspberry Pi and camera, but they claim to get good results just with smartphones. This is a big, sprawling project, which suggests a slightly steeper learning curve.

Hat tip to Maggie Eisenberger for introducing me to these.

Comparing Covid Cases of US States

Missouri’s confirmed cases (z-score) compared to the other U.S. states from April 20th to October 3rd, 2020. The z-score is a measure of how far you are from the average, in this case the average across all the states. A negative z-score is good because it indicates that you’re below the average number of cases (per 1000 people).

Based on my students’ statistics projects, I automated the method (using R) to calculate the z-score for all the states in the U.S. We used the Johns Hopkins daily data.

I put graphs for all of the states on the COVID: The U.S. States Compared webpage.

The R functions (test.R) assume all of the data is in a folder (COVID-19-master/csse_covid_19_data/csse_covid_19_daily_reports_us/), and output the graphs to the folder ‘images/zscore/’, which needs to exist.

covid_data <- function(infile, state="Missouri") {
    filename <- paste(file_dir, infile, sep='')
    mydata <- read.csv(filename)
    pop <- read.csv('state_populations.txt')
    mydata <- merge(mydata, pop)
    mydata$ConfirmedPerCapita1000 <- mydata$Confirmed / mydata$Population *1000
    summary(mydata$ConfirmedPerCapita1000)
    stddev <- sd(mydata$ConfirmedPerCapita1000)
    avg <- mean(mydata$ConfirmedPerCapita1000)
    cpc1k <- mydata[mydata$Province_State == state,]$ConfirmedPerCapita1000
    zscore <- (cpc1k - avg)/stddev
    #print(infile, zscore)
    return(zscore)
}


get_zScore_history <-function(state='Missouri') {
  df <- data.frame(Date=as.Date(character()), zscore=numeric())
  for (f in datafiles){
    dateString <- as.Date(substring(f, 1, 10), format='%m-%d-%Y')  # %Y = four-digit year, as in 04-20-2020
    zscore <- covid_data(f, state=state)
    df[nrow(df) + 1,] = list(dateString, zscore)
  }
  df$day <- 1:nrow(df)

  plot_zScore(df, state)


  # LINEAR REGRESSIONS:
  # http://r-statistics.co/Linear-Regression.html
  lmod <- lm(zscore ~ day, df)  # z-score as a function of day (fit is not used further)
  return(df)
}

plot_zScore <- function(df, state){
  max_z <- max( abs(max(df$zscore)), abs(min(df$zscore)))
  print(max_z)


  zplot <- plot(x=df$day, y=df$zscore, main=paste('z-score: ', state), xlab="Day since April 20th, 2020", ylab='z-score', ylim=c(-max_z,max_z))
  abline(0,0, col='firebrick')
  dev.copy(png, paste('images/zscore/', state, '-zscore.png', sep=''))
  dev.off()
}

get_states <- function(){
  lastfile <- datafiles[ length(datafiles) ]
  filename <- paste(file_dir, lastfile, sep='')
  mydata <- read.csv(filename)
  pop <- read.csv('state_populations.txt')
  mydata <- merge(mydata, pop)
  return(mydata$Province_State)
}

graph_all_states <- function(){
  states <- get_states()
  for (state in states) {
    get_zScore_history(state)
  }
}

file_dir <- 'COVID-19-master/csse_covid_19_data/csse_covid_19_daily_reports_us/'
datafiles <- list.files(file_dir, pattern="\\.csv$")  # pattern is a regular expression, not a glob

print("To get the historical z-score data for a state run (for example):")
print(" > get_zScore_history('New York')" )

df = get_zScore_history()

You can run the code in test.R in the R console using the commands:

> source('test.R')

which does Missouri by default, but to do other states use:

> get_zScore_history('New York')

To get all the states use:

> graph_all_states()
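Since the output folder has to exist before the graphs can be saved, something like the following could be run once before sourcing test.R. This is only a sketch using base R’s dir.exists() and dir.create(); the variable name out_dir is just for illustration and is not part of test.R.

```r
# Folder that plot_zScore() writes its PNG files into
out_dir <- file.path('images', 'zscore')

# Create it (and the parent 'images' folder) only if it is missing
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}
```

The recursive = TRUE option lets R create both levels of the path in one call.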

Basic R (using Covid data)

Once you start R you’ll need to figure out which directory you’re working in:

> getwd()

On a Windows machine your default working directory might be something like:

[1] "C:/Users/username"

On OSX or Linux you’ll get something like:

 [1] "/Users/username" 

To get to the directory you want to work in use setwd(). I’ve put my files into the directory: “/Users/lurba/Documents/TFS/Stats/COVID”

To get there from the working directory above I could enter the full path above, or just the relative path like:

> setwd("Documents/TFS/Stats/COVID")

Now my data is in the file named “04-20-2020.csv” (from the Johns Hopkins Covid data repository on GitHub), which I’ll read in with:

> mydata <- read.csv("04-20-2020.csv")

This creates a variable named “mydata” that has the information in it. I can see the column names by using:

> colnames(mydata)

which gives:

 [1] "Province_State"       "Country_Region"       "Last_Update"         
 [4] "Lat"                  "Long_"                "Confirmed"           
 [7] "Deaths"               "Recovered"            "Active"              
 [10] "FIPS"                 "Incident_Rate"        "People_Tested"       
 [13] "People_Hospitalized"  "Mortality_Rate"       "UID"                 
 [16] "ISO3"                 "Testing_Rate"         "Hospitalization_Rate"

Let’s take a look at the summary statistics for the number of confirmed cases, which is in the column labeled “Confirmed”:

> summary(mydata$Confirmed)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    317    1964    4499   15347   13302  253060 

This shows that the mean is 15,347 and the maximum is 253,060 confirmed cases.

I’m curious about which state has that large number of cases, so I’m going to print out the columns with the state names (“Province_State”) and the number of confirmed cases (“Confirmed”). From our colnames command above we can see that “Province_State” is column 1, and “Confirmed” is column 6, so we’ll use the command:

> mydata[ c(1,6) ]

The “c(1,6)” says that we want the columns 1 and 6. This command outputs

             Province_State Confirmed
1                   Alabama      5079
2                    Alaska       321
3            American Samoa         0
4                   Arizona      5068
5                  Arkansas      1973
6                California     33686
7                  Colorado      9730
8               Connecticut     19815
9                  Delaware      2745
10         Diamond Princess        49
11     District of Columbia      2927
12                  Florida     27059
13                  Georgia     19407
14           Grand Princess       103
15                     Guam       136
16                   Hawaii       584
17                    Idaho      1672
18                 Illinois     31513
19                  Indiana     11688
20                     Iowa      3159
21                   Kansas      2048
22                 Kentucky      3050
23                Louisiana     24523
24                    Maine       875
25                 Maryland     13684
26            Massachusetts     38077
27                 Michigan     32000
28                Minnesota      2470
29              Mississippi      4512
30                 Missouri      5890
31                  Montana       433
32                 Nebraska      1648
33                   Nevada      3830
34            New Hampshire      1447
35               New Jersey     88722
36               New Mexico      1971
37                 New York    253060
38           North Carolina      6895
39             North Dakota       627
40 Northern Mariana Islands        14
41                     Ohio     12919
42                 Oklahoma      2680
43                   Oregon      1957
44             Pennsylvania     33914
45              Puerto Rico      1252
46             Rhode Island      5090
47           South Carolina      4446
48             South Dakota      1685
49                Tennessee      7238
50                    Texas     19751
51                     Utah      3213
52                  Vermont       816
53           Virgin Islands        53
54                 Virginia      8990
55               Washington     12114
56            West Virginia       902
57                Wisconsin      4499
58                  Wyoming       317
59                Recovered         0

Looking through, we can see that New York was the state with the largest number of cases.

Note that we could have searched for the row with the maximum number of Confirmed cases using the command:

> mydata[which.max(mydata$Confirmed),]
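To see what which.max() is doing, here is a minimal sketch on a tiny data frame. The name toy is made up for this example, and it holds just three rows copied from the table above.

```r
# A small stand-in for mydata, using three rows from the table above
toy <- data.frame(
  Province_State = c("Alabama", "Alaska", "New York"),
  Confirmed = c(5079, 321, 253060)
)

# which.max() gives the position of the largest value,
# which we then use as the row index
biggest <- toy[which.max(toy$Confirmed), ]
print(biggest)
```

The comma after the row index matters: leaving the column slot empty asks for every column in that row.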

Merging Datasets

In class, we’ve been editing the original data file to add a column with the state populations (called “Population”). I have this in a separate file called “state_populations.txt” (which is also a comma-separated values file, .csv, even if not so labeled). So I’m going to import the population data:

> pop <- read.csv("state_populations.txt")

Now I’ll merge the two datasets to add the population data to “mydata”.

> mydata <- merge(mydata, pop)
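By default merge() matches rows on whatever column names the two data frames share, and drops rows that have no match. The sketch below shows that on toy data. I’m assuming here that the population file has a “Province_State” column like the case data; the data frame names are made up, and New York’s population is a round illustrative number.

```r
# Toy versions of the two datasets (names are illustrative)
cases <- data.frame(
  Province_State = c("Missouri", "New York", "Diamond Princess"),
  Confirmed = c(5890, 253060, 49)
)
pop_toy <- data.frame(
  Province_State = c("Missouri", "New York"),
  Population = c(5988927, 19500000)  # New York value is a round placeholder
)

# Default merge: joins on the shared column, keeps only matching rows
merged <- merge(cases, pop_toy)
print(merged)

# all.x = TRUE keeps every row of the first data frame, filling in NA
kept <- merge(cases, pop_toy, all.x = TRUE)
print(kept)
```

With the default merge, entries like the cruise ships disappear if they have no population listed, so it is worth checking nrow() before and after merging.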

Graphing (Histograms and Boxplots)

With the datasets together we can try doing a histogram of the confirmed cases. Note that there is a column labeled “Confirmed” in the mydata dataset, which we’ll address as “mydata$Confirmed”:

> hist(mydata$Confirmed)
Histogram of confirmed Covid-19 cases as of 04-20-2020.

Note that on April 20th, most states had very few cases, but there were a couple with a lot of cases. It would be nice to see the data that’s clumped in the 0-50000 range broken into more bins, so we’ll add an optional argument to the hist command. The option is called breaks and we’ll request 20 breaks.

> hist(mydata$Confirmed, breaks=20)
A more discretized version of the confirmed cases histogram.
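One thing to be aware of is that when breaks is a single number, R treats it as a suggestion and picks “pretty” bin edges close to that count. A quick way to check what it actually chose is to ask hist() for the binning without drawing it. The numbers below are made-up, right-skewed counts, not the real data.

```r
# Made-up, right-skewed counts standing in for mydata$Confirmed
x <- c(300, 450, 900, 1200, 2500, 4000, 5200, 9000, 33000, 250000)

# plot = FALSE returns the binning without drawing anything
h <- hist(x, breaks = 20, plot = FALSE)

print(h$breaks)          # the bin edges R actually chose
print(length(h$counts))  # number of bins, which may not be exactly 20
```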

Calculations (cases per 1000 population)

Of course, simply looking at the number of cases is not very informative because you’d expect, all else being equal, that states with the highest populations would have the highest number of cases. So let’s calculate the number of cases per capita. We’ll multiply that number by 1000 to make it more human readable:

> mydata$ConfirmedPerCapita1000 <- mydata$Confirmed / mydata$Population * 1000

Now our histogram would look like:

> hist(mydata$ConfirmedPerCapita1000, breaks=20)
Confirmed cases per 1000 people.

The dataset still has a long tail, but we can see the beginnings of a normal distribution.

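The next thing we can do is make a boxplot of our cases per 1000 people. I’m going to set the range option to zero so that the whiskers extend all the way to the most extreme values, which shows the long tail instead of plotting those points separately as outliers: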

> boxplot(mydata$ConfirmedPerCapita1000, range=0)
Boxplot of US states’ confirmed cases per 1000 people.

The boxplot shows, more or less, the same information as the histogram.

Finding Specific Data in the Dataset

We’d like to figure out how Missouri is doing compared to the rest of the states, so we’ll calculate the z-score, which tells you how many standard deviations you are away from the mean. While there is a built-in z-score function in R, we’ll first see how we can use the search and statistics methods to find the relevant information.

First, finding Missouri’s number of confirmed cases. To find all of the data in the row for Missouri we can use:

> mydata[mydata$Province_State == "Missouri",]

which gives something like this. It has all of the data but is not easy to read.

   Province_State Population Country_Region         Last_Update     Lat
26       Missouri    5988927             US 2020-04-20 23:36:47 38.4561
      Long_ Confirmed Deaths Recovered Active FIPS Incident_Rate People_Tested
26 -92.2884      5890    200        NA   5690   29      100.5213         56013
   People_Hospitalized Mortality_Rate      UID ISO3 Testing_Rate
26                 873       3.395586 84000029  USA      955.942
   Hospitalization_Rate ConfirmedPerCapita1000
26             14.82173              0.9834817

To extract just the “Confirmed” cases, we’ll add that to our command like so:

> mydata[mydata$Province_State == "Missouri",]$Confirmed
[1] 5890

Which just gives the number 5890. Or the “ConfirmedPerCapita1000”:

> mydata[mydata$Province_State == "Missouri",]$ConfirmedPerCapita1000
[1] 0.9834817

This method would also be useful later on if we want to automate things.

z-score

We have the mean from when we did the summary command, but there’s also a mean command.

> mean(mydata$ConfirmedPerCapita1000)
[1] 2.006805

Similarly you can get the standard deviation with the sd function.

> sd(mydata$ConfirmedPerCapita1000)
[1] 2.400277

We can now calculate the z-score for Missouri:

\text{z-score} = \frac{(X - \mu)}{ \sigma}

which gives a result of:

\text{z-score} =  -0.43
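The arithmetic can be checked directly in R by plugging in the values printed earlier. The second half of the sketch compares the by-hand formula to scale(), which I’m assuming is the built-in function referred to earlier; the vector v is made up for the comparison.

```r
# Values printed earlier in this walkthrough
x <- 0.9834817      # Missouri's confirmed cases per 1000 people
mu <- 2.006805      # mean across the states
sigma <- 2.400277   # standard deviation across the states

z <- (x - mu) / sigma
print(round(z, 2))

# Comparing the by-hand formula to scale() on a made-up vector
v <- c(1, 2, 3, 4, 10)
by_hand <- (v - mean(v)) / sd(v)
built_in <- as.vector(scale(v))  # scale() returns a matrix, so flatten it
print(all.equal(by_hand, built_in))
```

On the real data, something like scale(mydata$ConfirmedPerCapita1000) should give the z-score for every state at once, in the same row order as mydata.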

So it looks like Missouri was doing reasonably well back in April, at least compared to the rest of the country.

Human Migration over the last 125,000 years

Video by Tobias Friedrich out of the University of Hawaii. It’s based on a recent paper that suggests that the large fluctuations in climate over the last 120,000 years opened and closed green corridors that allowed multiple pulses of migration out of Africa.

He has some other excellent earth science scientific visualizations.

Video: From a Single Cell to an Alpine Newt

Becoming from Aeon Video on Vimeo.

Watch a single cell become a complete organism in six pulsing minutes of timelapse. A film by Jan van IJken (www.janvanijken.com).

More on this video: aeon.co/videos/watch-a-single-cell-become-a-complete-organism-in-six-pulsing-minutes-of-timelapse

An exceptional timelapse of the development of an Alpine newt by Jan van IJken.

Trophic Cascade: The Effect of Wolves on Yellowstone

The reintroduction of wolves to Yellowstone National Park resulted in enormous changes to the ecology: more plants and animals as the wolves reduced the deer population and changed the deer’s behavior. The change in vegetation resulted in stabilization of the rivers, so the wolves changed the geomorphology of the park as well.