From Couch to Half Marathon

In the fall of 2020 I set out to be more active and took up running as a hobby. Right as I completed the Couch to 5K Program (C25K), lockdowns were being implemented across the country and I found myself with a lot more time on my hands. So I set out to improve my speed next, getting my 5K time under 30 minutes, before shifting my focus to running my first-ever half marathon. This post takes you through that journey using the data I collected along the way.

Going from couch to half marathon took me through three different running plans across two different iPhone apps. The first plan I used was the Couch to 5K Program (C25K), a standalone app and plan created by Active. To improve speed, I used the “Tempo Run: 5k” training plan, and to build distance I followed it with the “Half Marathon Goal” plan; both are found within the RunTracker Pro app. Each of these apps has simple-to-follow prompts telling you when to run, walk, or pick up the pace, and each is designed to progressively build speed and endurance over time.

The C25K training plan uses the run/walk method and includes 3 runs per week – each between 20 and 30 minutes – with the program lasting 9 weeks in total. Over the course of the 27 training runs, the proportion of walking decreases while the proportion of running increases, culminating in three 30-minute runs in the last week of the program. The “Tempo Run: 5k” plan consisted of three runs per week for a total of eight weeks, with the same structure each week: an interval run, a tempo run, and a base run. As with the C25K plan, runs progressively increase in both mileage and intensity throughout. Finally, the “Half Marathon Goal” plan consisted of four runs per week – a base run, an interval run, a tempo run, and a long run – for a total of twelve weeks. In this plan, each week ends with a long, slow distance (LSD) run, culminating in a final run of 2 hours and 15 minutes in the last week of the program. The graphs below show good examples of both a positively skewed distribution (running distances, top) and an approximately normal distribution (running paces, bottom) across these programs:

Overall Distribution of Running Distances & Paces
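As a quick sanity check on the plan structures described above, the runs per week and plan lengths can be tallied in a few lines of R. This is only a sketch of the planned schedule as stated in this post; the data frame and its column names (`runs_per_week`, `weeks`, `planned_runs`) are made up for illustration, and the number of runs actually logged may differ.

```r
# Plan structure as described in this post (runs per week, length in weeks).
# Column names are hypothetical and chosen only for this illustration.
plans <- data.frame(
  Program       = c("C25K", "Tempo Run: 5k", "Half Marathon Goal"),
  runs_per_week = c(3, 3, 4),
  weeks         = c(9, 8, 12)
)

# Planned runs per program, plus running totals across the whole journey
plans$planned_runs     <- plans$runs_per_week * plans$weeks
plans$cumulative_runs  <- cumsum(plans$planned_runs)
plans$cumulative_weeks <- cumsum(plans$weeks)

print(plans)
```

The first row reproduces the 27 C25K runs mentioned above; the other rows follow from the same multiplication.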

Given that each program had different goals, we see some clear distinctions between them. Unsurprisingly, the Half Marathon program featured the longest runs and the largest spread (i.e., variance) with respect to distance, but the least variability with respect to speed. Another expected result came from the Tempo Run: 5K program, which featured the fastest runs with the least variability in distance throughout the program. These results are clearly represented in the box plots below:

Distribution of Running Distances and Paces by Program
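The spread comparison above can also be put into numbers by computing the standard deviation of distance and pace within each program. The sketch below uses a small, made-up data frame as a stand-in for the real data: the values and the program labels are assumptions for illustration only, while the column names `Program`, `Distance`, and `Pace_MPH` mirror those used in the code at the end of this post.

```r
# Hypothetical stand-in for Couch2Half.csv. These values are invented purely
# to illustrate the calculation and are NOT the data behind the charts.
runs <- data.frame(
  Program  = rep(c("C25K", "Faster 5K", "Half Marathon"), each = 4),
  Distance = c(1.8, 2.0, 2.3, 2.6,  3.0, 3.1, 3.1, 3.2,  3.0, 4.0, 6.5, 10.0),
  Pace_MPH = c(4.6, 5.0, 5.5, 5.9,  6.0, 6.3, 6.6, 6.9,  5.8, 5.9, 6.0, 6.1)
)

# Standard deviation of distance and pace within each program
dist_sd <- tapply(runs$Distance, runs$Program, sd)
pace_sd <- tapply(runs$Pace_MPH, runs$Program, sd)

# One row per program, one column per measure of spread
print(round(cbind(dist_sd, pace_sd), 2))
```

Swapping `sd` for `IQR` gives a spread measure that is less sensitive to the handful of very long runs.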

Since these programs have an ordered component, the best way to view these data is through a scatter plot, which allows us to visualize progress over time. We can see that running pace improved at a noticeably greater rate in the C25K & Faster 5K programs than in the Half Marathon plan, which makes sense given their respective goals. This also explains the curvature in the data when looking at running pace. Looking at distance, we see that most runs stayed within 2 to 4 miles throughout each program, with the exception of the long weekend runs in the Half Marathon plan, which separate from the pack and grow roughly linearly over time:

Scatter Plots of Running Distances & Paces over Time
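One way to quantify "rate of improvement" is to fit a separate straight line of pace against session number for each program and compare the slopes. The sketch below does that on a small, made-up data frame: the values and program labels are assumptions for illustration, not the data behind the charts, and the column names `Run`, `Program`, and `Pace_MPH` mirror those used in the plotting code at the end of this post.

```r
# Hypothetical stand-in data. Values are invented for illustration only.
runs <- data.frame(
  Run      = 1:12,
  Program  = rep(c("C25K", "Faster 5K", "Half Marathon"), each = 4),
  Pace_MPH = c(4.6, 5.0, 5.5, 5.9,  6.0, 6.3, 6.6, 6.9,  6.0, 6.0, 6.1, 6.1)
)

# Fit pace ~ session separately within each program and keep the slope,
# i.e. the average change in miles per hour from one session to the next
slopes <- sapply(
  split(runs, runs$Program),
  function(d) coef(lm(Pace_MPH ~ Run, data = d))[["Run"]]
)

print(round(slopes, 3))
```

A larger slope means pace improved faster per session; with real data, the confidence intervals from `confint()` on each fit would show whether the differences are more than noise.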

Final Thoughts

While I did not initially set out to go from Couch to Half Marathon, that is what ended up happening, thanks to a few inexpensive running apps and some extra time on my hands due to a global pandemic. The C25K app is a great resource for anyone looking to get into running. Employing the run/walk method, the program consists of 27 runs spread out over 9 weeks. To run faster, I completed the Tempo Run: 5K (i.e., Faster 5K) plan before tackling the Half Marathon Goal plan, both of which are included in the RunTracker Pro app. Both of these apps are inexpensive and helpful resources for anyone interested in getting into running or improving their running.

One word of caution: many people who have completed this program stress that you should not be afraid to add extra rest days or repeat workouts as needed. I agree. More importantly, you absolutely should not skip ahead, nor should you run on back-to-back days in the beginning. The quickest way to halt any progress is through injury, so take your time and enjoy the run!

Below are links to posts breaking down each of the programs individually, along with the raw data and code used to create the charts and analysis.

Thanks for reading!

Couch to 5K

Faster 5K

Half Marathon Goal


# Clean up (this clears out the previous environment)
rm(list = ls())

# Load Packages 
library(tidyverse)
library(wordcloud2)
library(mosaic)
library(readxl)
library(hrbrthemes)
library(viridis)

# Likert Data Packages
library(psych)
library(FSA)
library(lattice)
library(boot)
library(likert)

#install.packages("wordcloud")
library(wordcloud)
library(tm)


# Grid Extra for Multiplots
library("gridExtra")

# Multiple plot function (just copy paste code)

multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)

  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)

  numPlots = length(plots)

  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                    ncol = cols, nrow = ceiling(numPlots/cols))
  }

  if (numPlots==1) {
    print(plots[[1]])

  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))

      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}


# Couch to Half

# Import data from CSV, no factors

Couch2Half <- read.csv("Couch2Half.csv", stringsAsFactors = FALSE)

Couch2Half <- Couch2Half %>%
  na.omit()

Couch2Half

Couch2Half %>% 
  count(Program)

ggplot(Couch2Half, aes(x = Program, fill = Program)) +
  geom_bar() + 
  labs( x ="", y = "Number of Runs", title = "Runs by Program",  subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank(),
    legend.position = "none") +
  scale_fill_manual(values=c('#999999','#E69F00', '#56B4E9'))

# Plot 1 - Density Plot of Running Distances

p1 <- ggplot(Couch2Half, aes(x=Distance)) + 
  geom_density(color="#E69F00", fill="#999999") + labs( x ="Distance (Miles)", y = "", title = "Running Distances",  subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(hjust = 1, face = "italic"), 
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background = element_blank())

# Plot 2 - Density Plot of Running Paces

p2 <- ggplot(Couch2Half, aes(x=Pace_MPH)) + 
  geom_density(color="#E69F00", fill="#56B4E9") + 
  labs( x ="Pace (Miles per Hour)", y = "", title = "Running Paces",  subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(hjust = 1, face = "italic"), 
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background = element_blank())

# Combine plots using multi-plot function:

multiplot( p1, p2, cols=1)


# Plot 3 - Box Plot of Running Distances by Program
# (fill colors are set once, by scale_fill_manual() at the end of the chain)
p3 <- Couch2Half %>%
  ggplot( aes(x=Program, y= Distance, fill=Program)) +
    geom_boxplot() +
    geom_jitter(color="Black", size=0.4, alpha=0.9) + 
  labs( x ="", y = "Distance (Miles)", title = "Distance by Workout",  subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank(),
    legend.position = "none") +
  scale_fill_manual(values=c('#999999','#E69F00', '#56B4E9'))
  

# Plot 4 - Box Plot of Running Paces by Program
# (fill colors are set once, by scale_fill_manual() at the end of the chain)
p4 <- Couch2Half %>%
  ggplot( aes(x=Program, y= Pace_MPH, fill=Program)) +
  geom_boxplot() +
    geom_jitter(color="Black", size=0.4, alpha=0.9) + 
  labs( x ="", y = "Speed (Miles per Hour)", title = "Speed by Workout",  subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank(),
    legend.position = "none") +
  scale_fill_manual(values=c('#999999','#E69F00', '#56B4E9'))


# Combine plots using multi-plot function
multiplot( p3, p4, cols=2)


# Plot 5 - Scatter Plot of Running Pace over Time, with a linear fit per program
p5 <- ggplot(Couch2Half, aes(x=Run, y= Pace_MPH, color = Program)) +
  geom_point() +
  geom_smooth(method=lm , color="Black", se=TRUE) +
  labs( x ="Training Session", y = "Pace (Miles per Hour)", title = "Running Pace",  subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank()) +
  scale_color_manual(values=c('#999999','#E69F00', '#56B4E9'))



# Plot 6 - Scatter Plot of Running Distance over Time, with a linear fit per program
p6 <- ggplot(Couch2Half, aes(x=Run, y= Distance, color = Program)) +
  geom_point() +
  geom_smooth(method=lm , color="Black", se=TRUE) +
  labs( x ="Training Session", y = "Distance (Miles)", title = "Running Distance",  subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank()) +
  scale_color_manual(values=c('#999999','#E69F00', '#56B4E9'))

# Combine plots using multi-plot function:

multiplot( p5, p6, cols=1)


# Summary Statistics of Distance
favstats(Couch2Half$Distance)

# Summary Statistics of Pace
favstats(Couch2Half$Pace_MPH)

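# Note: the scatter plots above index sessions with the `Run` column; the tests
# below assume the data also has a `Session` column. If it does not, substitute
# Couch2Half$Run for Couch2Half$Session.

# Pearson Product-Moment Correlation of Distance over Time (session)
cor.test(Couch2Half$Session, Couch2Half$Distance, method = "pearson")

# Pearson Product-Moment Correlation of Pace over Time (session)
cor.test(Couch2Half$Session, Couch2Half$Pace_MPH, method = "pearson")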

Author: Scott Atchison

I am a statistician and data scientist who enjoys writing, visualizing, and talking about data, especially when we can use it to answer interesting questions.
