Comparing Covid Cases of US States

Missouri’s confirmed cases (z-score) compared to the other U.S. states from April 20th to October 3rd, 2020. The z-score is a measure of how far you are away from the average. In this case, a negative z-score is good because it indicates that you’re below the average number of cases (per 1000 people). For all the states.

Based on my students’ statistics projects, I automated the method (using R) to calculate the z-score for all the states in the U.S. We used the John Hopkins daily data.

I put graphs for all of the states on the COVID: The U.S. States Compared webpage.

The R functions (test.R) assumes all of the data is in a folder (COVID-19-master/csse_covid_19_data/csse_covid_19_daily_reports_us/), and outputs the graphs to the folder ‘images/zscore/‘ which needs to exist.

covid_data <- function(infile, state="Missouri") {
    filename <- paste(file_dir, infile, sep='')
    mydata <- read.csv(filename)
    pop <- read.csv('state_populations.txt')
    mydata <- merge(mydata, pop)
    mydata$ConfirmedPerCapita1000 <- mydata$Confirmed / mydata$Population *1000
    summary(mydata$ConfirmedPerCapita1000)
    stddev <- sd(mydata$ConfirmedPerCapita1000)
    avg <- mean(mydata$ConfirmedPerCapita1000)
    cpc1k <- mydata[mydata$Province_State == state,]$ConfirmedPerCapita1000
    zscore <- (cpc1k - avg)/stddev
    #print(infile, zscore)
    return(zscore)
}


get_zScore_history <-function(state='Missouri') {
  df <- data.frame(Date=as.Date(character()), zscore=numeric())
  for (f in datafiles){
    dateString <- as.Date(substring(f, 1, 10), format='%m-%d-%y')
    zscore <- covid_data(f, state=state)
    df[nrow(df) + 1,] = list(dateString, zscore)
  }
  df$day <- 1:nrow(df)

  plot_zScore(df, state)


  # LINEAR REGRESSIONS:
  # http://r-statistics.co/Linear-Regression.html
  lmod <- lm(day ~ zscore, df)
  return(df)
}

plot_zScore <- function(df, state){
  max_z <- max( abs(max(df$zscore)), abs(min(df$zscore)))
  print(max_z)


  zplot <- plot(x=df$day, y=df$zscore, main=paste('z-score: ', state), xlab="Day since April 20th, 2020", ylab='z-score', ylim=c(-max_z,max_z))
  abline(0,0, col='firebrick')
  dev.copy(png, paste('images/zscore/', state, '-zscore.png', sep=''))
  dev.off()
}

get_states <- function(){
  lastfile <- datafiles[ length(datafiles) ]
  filename <- paste(file_dir, lastfile, sep='')
  mydata <- read.csv(filename)
  pop <- read.csv('state_populations.txt')
  mydata <- merge(mydata, pop)
  return(mydata$Province_State)
}

graph_all_states <- function(){
  states <- get_states()
  for (state in states) {
    get_zScore_history(state)
  }
}

file_dir <- 'COVID-19-master/csse_covid_19_data/csse_covid_19_daily_reports_us/'
datafiles <- list.files(file_dir, pattern="*.csv")

print("To get the historical z-score data for a state run (for example):")
print(" > get_zScore_history('New York')" )

df = get_zScore_history()

You can run the code in test.R in the R console using the commands:

> source('test.R')

which does Missouri by default, but to do other states use:

> get_zScore_history('New York')

To get all the states use:

> graph_all_states()

Leave a Reply