From Couch to Half Marathon

In the fall of 2020 I set out to be more active and took up running as a hobby. Right as I completed the Couch to 5K Program (C25K), lockdowns were being implemented across the country and I found myself with a lot more time on my hands. So I set out to improve my speed next, getting my 5K time under 30 minutes before shifting my focus to running my first-ever half marathon. This blog post takes you on that journey through the data I collected along the way.

Going from couch to half marathon took me through three different running plans, using two different iPhone apps. The first running plan I used was the Couch to 5K Program (C25K), a standalone app and plan created by Active. To improve speed, I used the “Tempo Run: 5k” training plan, followed by the “Half Marathon Goal” plan to build distance, both found within the RunTracker Pro app. Both apps have simple-to-follow prompts telling you when to run, walk, or pick up the pace, and are designed to progressively build speed and endurance over time.

The C25K training plan uses the run/walk method and includes 3 runs per week (each between 20 and 30 minutes), with the program lasting 9 weeks in total. Over the course of the 27 training runs, the proportion of walking decreases while the proportion of running increases, culminating in three 30-minute runs in the last week of the program. The “Tempo Run: 5k” plan consisted of three runs per week for a total of eight weeks, with the same structure each week: an interval run, a tempo run, and a base run. Similar to the C25K plan, runs progressively increase in both mileage and intensity throughout. Finally, the “Half Marathon Goal” running plan consisted of four runs per week (a base run, an interval run, a tempo run, and a long run) for a total of twelve weeks. In this plan, each week ends with a long, slow distance (LSD) run, culminating in a final run of 2 hours and 15 minutes in the last week of the program. Looking at the distances and speeds run throughout these programs, the density plots below show a positively skewed distribution for distance (top) and an approximately normal distribution for pace (bottom):

Overall Distribution of Running Distances & Paces
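The session counts quoted for the three plans follow directly from runs per week times number of weeks. Here is a minimal sketch of that arithmetic in base R; the `plans` data frame and its column names are made up for illustration and are not part of the original data. These are planned sessions only: the number of runs actually logged can differ (the Half Marathon post below analyzes 46 runs, for example).

```r
# Planned sessions per program, taken from the schedules described above.
# `plans` and its column names are illustrative, not from the original data.
plans <- data.frame(
  Program       = c("C25K", "Tempo Run: 5k", "Half Marathon Goal"),
  Runs_per_week = c(3, 3, 4),
  Weeks         = c(9, 8, 12)
)

# Total planned runs = runs per week x number of weeks
plans$Planned_runs <- plans$Runs_per_week * plans$Weeks

plans
sum(plans$Planned_runs)  # planned runs across all three programs
```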

Given that each program had different goals, we see some clear distinctions between them. Unsurprisingly, the Half Marathon program featured the longest runs and the largest spread (i.e., variance) in distance, but the least variability in speed. Also as expected, the Tempo Run: 5K program featured the fastest runs with the least variability in distance. These results are clearly represented in the box plots below:

Distribution of Running Distances and Paces by Program
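The "spread" compared across programs here can also be put into numbers as a per-program standard deviation. The sketch below uses a small invented data frame (`toy`) with the same column names as the real data (`Program`, `Distance`, `Pace_MPH`); the values and program labels are placeholders, so only the pattern matters, not the numbers.

```r
# Hypothetical stand-in for the real data: values are invented for illustration.
toy <- data.frame(
  Program  = rep(c("C25K", "Faster 5K", "Half Marathon"), each = 4),
  Distance = c(1.8, 2.0, 2.3, 2.6,  3.0, 3.2, 3.4, 3.5,  3.0, 4.5, 8.0, 13.1),
  Pace_MPH = c(4.1, 4.6, 5.0, 5.4,  5.3, 5.8, 6.1, 6.5,  5.4, 5.5, 5.6, 5.7)
)

# Standard deviation of distance and pace within each program
sd_distance <- tapply(toy$Distance, toy$Program, sd)
sd_pace     <- tapply(toy$Pace_MPH, toy$Program, sd)

# One row per program, one column per variable
round(cbind(sd_distance, sd_pace), 2)
```

Swapping `toy` for `Couch2Half` should give the same kind of table for the actual runs, assuming the CSV loads with those column names.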

Since there was an ordered component to these programs, the best way to view these data is through a scatter plot, which allows us to visualize progress over time. We can see that running pace improved at a much greater rate in the C25K and Faster 5K programs than in the Half Marathon plan, which makes sense given their respective goals. This also explains the curvature in the data when looking at running pace. When investigating distance, we see that most runs stayed within 2 to 4 miles throughout each program, with the exception of the long weekend runs in the Half Marathon plan, which clearly separate themselves from the pack and grow linearly over time:

Scatter Plots of Running Distances & Paces over Time
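One way to put a number on "improved at a greater rate" is to fit a separate straight line per program and compare the slopes. This is only a sketch on invented values (`toy` is a placeholder; the column names `Program`, `Run`, and `Pace_MPH` mirror the plotting code), not a reproduction of the fits behind the charts.

```r
# Hypothetical pace data for two programs: values are invented for illustration.
toy <- data.frame(
  Program  = rep(c("C25K", "Half Marathon"), each = 5),
  Run      = 1:10,
  Pace_MPH = c(4.0, 4.3, 4.7, 5.0, 5.4,  5.5, 5.5, 5.6, 5.6, 5.7)
)

# Fit Pace_MPH ~ Run separately within each program and keep only the slope
# (the change in mph per training session).
slopes <- sapply(split(toy, toy$Program), function(d) {
  unname(coef(lm(Pace_MPH ~ Run, data = d))["Run"])
})

round(slopes, 3)
```

A larger slope means pace climbed faster from session to session. Whether the difference between programs is more than noise would need a formal comparison, which this sketch does not attempt.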

Final Thoughts

While I did not initially set out to go from couch to half marathon, that is what ended up happening, thanks to a few inexpensive running apps and some extra time on my hands due to a global pandemic. The C25K app is a great resource for anyone looking to get into running. Employing the run/walk method, the program consists of 27 runs spread out over 9 weeks. To run faster, I completed the Tempo Run: 5K (i.e., Faster 5K) plan before tackling the Half Marathon Goal plan, both of which are included in the RunTracker Pro app. Both of these apps are inexpensive and helpful resources for those who are interested in getting into running or improving it.

One word of caution: many people who have completed this program stress that you should not be afraid to add extra rest days or repeat workouts as needed. I would agree with that. More importantly, you absolutely should not skip ahead, nor should you run on back-to-back days in the beginning. The quickest way to halt any progress is through injury, so take your time and enjoy the run!

Below are links to posts breaking down each of the programs individually, along with the raw data and code used to create the charts and analysis.

Thanks for reading!

Couch to 5K

Faster 5K

Half Marathon Goal


# clean up (this clears out the previous environment)
rm(list = ls())

# Load Packages 
library(tidyverse)
library(wordcloud2)
library(mosaic)
library(readxl)
library(hrbrthemes)
library(viridis)

# Likert Data Packages
library(psych)
library(FSA)
library(lattice)
library(boot)
library(likert)

#install.packages("wordcloud")
library(wordcloud)
library(tm)


# Grid Extra for Multiplots
library("gridExtra")

# Multiple plot function (just copy paste code)

multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)

  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)

  numPlots = length(plots)

  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                    ncol = cols, nrow = ceiling(numPlots/cols))
  }

 if (numPlots==1) {
    print(plots[[1]])

  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))

      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}


# Couch to Half

# Import data from CSV, no factors

Couch2Half <- read.csv("Couch2Half.csv", stringsAsFactors = FALSE)

Couch2Half <- Couch2Half %>%
  na.omit()

Couch2Half

Couch2Half %>% 
  count(Program)

ggplot(Couch2Half, aes(x = Program, fill = Program)) +
  geom_bar() + 
  labs( x ="", y = "Number of Runs", title = "Runs by Program",  subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank(),
    legend.position = "none") +
  scale_fill_manual(values=c('#999999','#E69F00', '#56B4E9'))

# Plot 1 - Density Plot of Running Distances

p1 <- ggplot(Couch2Half, aes(x=Distance)) + 
  geom_density(color="#E69F00", fill="#999999") + labs( x ="Distance (Miles)", y = "", title = "Running Distances",  subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(hjust = 1, face = "italic"), 
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background = element_blank())

# Plot 2 - Density Plot of Running Paces

p2 <- ggplot(Couch2Half, aes(x=Pace_MPH)) + 
  geom_density(color="#E69F00", fill="#56B4E9") + 
  labs( x ="Pace (Miles per Hour)", y = "", title = "Running Paces",  subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(hjust = 1, face = "italic"), 
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background = element_blank())

# Combine plots using multi-plot function:

multiplot( p1, p2, cols=1)


# Plot 3 - Box Plot of Distance by Program
p3 <- Couch2Half %>%
  ggplot(aes(x = Program, y = Distance, fill = Program)) +
  geom_boxplot() +
  geom_jitter(color = "Black", size = 0.4, alpha = 0.9) +
  labs(x = "", y = "Distance (Miles)", title = "Distance by Program", subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank(),
    legend.position = "none") +
  # Only one fill scale is kept: a second fill scale would replace the first
  scale_fill_manual(values = c('#999999', '#E69F00', '#56B4E9'))
  

# Plot 4 - Box Plot of Pace by Program
p4 <- Couch2Half %>%
  ggplot(aes(x = Program, y = Pace_MPH, fill = Program)) +
  geom_boxplot() +
  geom_jitter(color = "Black", size = 0.4, alpha = 0.9) +
  labs(x = "", y = "Speed (Miles per Hour)", title = "Speed by Program", subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank(),
    legend.position = "none") +
  # Only one fill scale is kept: a second fill scale would replace the first
  scale_fill_manual(values = c('#999999', '#E69F00', '#56B4E9'))


# Combine plots using multi-plot function
multiplot( p3, p4, cols=2)


p5 <- ggplot(Couch2Half, aes(x=Run, y= Pace_MPH, color = Program)) + geom_point() +  geom_smooth(method=lm , color="Black", se=TRUE) + labs( x ="Training Session", y = "Pace (Miles per Hour)", title = "Running Pace",  subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank()) + scale_color_manual(values=c('#999999','#E69F00', '#56B4E9'))



p6<- ggplot(Couch2Half, aes(x=Run, y= Distance, color = Program)) + geom_point() +  geom_smooth(method=lm , color="Black", se=TRUE) + labs( x ="Training Session", y = "Distance (Miles)", title = "Running Distance",  subtitle = "Couch to Half Marathon", caption = "Data source: TheDataRunner.com") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank()) + scale_color_manual(values=c('#999999','#E69F00', '#56B4E9'))

# Combine plots using multi-plot function:

multiplot( p5, p6, cols=1)


# Summary Statistics of Distance
favstats(Couch2Half$Distance)

# Summary Statistics of Pace
favstats(Couch2Half$Pace_MPH)

# Pearson Product Correlation of Distance over Time (session)
cor.test(Couch2Half$Session, Couch2Half$Distance, method = "pearson")

# Pearson Product Correlation of Pace over Time (session)
cor.test(Couch2Half$Session, Couch2Half$Pace_MPH, method = "pearson")

Running Through the Data: Half Marathon Goal by RunTracker

In the latter half of 2020, I set a new goal for myself: run 13.1 miles by the end of the year. Earlier in the year I had completed the Couch to 5K program and later set the goal of improving my 5K time to under 30 minutes. Given the extra time at home thanks to a global pandemic, I set my sights on the half marathon distance. Since I was already familiar with the RunTracker app, I decided to stick with it and used their “Half Marathon Goal” training plan.

The RunTracker app, made by the Fitness 22 company, features a series of running plans tailored to individuals’ current fitness levels and goals. The “Half Marathon Goal” running plan consisted of four runs per week for a total of twelve weeks, with a consistent structure throughout most of the program. After a series of base runs in the first week, the next ten weeks featured a base run on Tuesdays, segments on Thursdays, intervals on Fridays, and a long run on Sundays. Workout durations increase steadily over the course of the first ten weeks before tapering in the final two weeks of the program.

My experience with this running plan was great once I got used to the structure. Previously, the most I had run was three days a week, while this program requires four. This meant there would be runs on consecutive days, which I was not used to. Having just finished a training plan geared towards speed work, I quickly learned I would need to slow down if I was going to keep from getting injured. Once I got settled into the format, mileage built progressively and speed eventually followed. By the end of the twelve-week program, I was able to confidently run 13.1 miles using my usual training route, which coincidentally looked like a shoe:

Distance & Pace

Since my goal was to complete a half marathon, the primary variable of interest was obviously distance. Like most runners, I also tend to focus on times, so average running pace served as the secondary variable of interest. Distances run throughout the training program ranged from 2.14 to 13.12 miles per run, with a mean of 4.85 miles per run. Running paces ranged from 5.16 to 6.1 miles per hour (11:38 to 9:50 min/mile), with a mean of 5.54 miles per hour (10:50 min/mile). The distributions of my runs by distance and speed for this program can be seen in the density plots below:
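The min/mile figures in parentheses are just the mph values turned around: 60 minutes divided by miles per hour. A small helper makes the conversion repeatable. `mph_to_pace` is a name made up for this sketch and does not appear in the original code.

```r
# Convert speed in miles per hour to a "minutes:seconds per mile" string.
# `mph_to_pace` is a hypothetical helper, not part of the original analysis.
mph_to_pace <- function(mph) {
  total_sec <- as.integer(round(3600 / mph))  # seconds needed to cover one mile
  sprintf("%d:%02d", total_sec %/% 60L, total_sec %% 60L)
}

# Slowest, fastest, and mean pace quoted for the Half Marathon plan
mph_to_pace(c(5.16, 6.1, 5.54))
# "11:38" "9:50"  "10:50"
```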

Comparing Workouts

When taking a closer look at these distributions by workout type, we can see some clear patterns in the data. Distances for base runs, interval sessions, and segments remained relatively close to one another, ranging from 2.14 to 6.02 miles per run. The long runs on Sundays, though, lived up to their name, ranging from 5.7 to 13.12 miles, with an average of 9.16 miles. Running pace was fairly consistent across workout types, with each type averaging between 5.5 and 5.6 miles per hour. Distributions by workout type for distance and pace can be seen in the box plots below:

Training Progress

Given that there is an ordered component to training, we can look at these data linearly (i.e., with regression). Below are scatter plots of distances covered and running speeds over the course of the 46 training runs in the program. We see a slightly positive association between training session and volume (mileage), while intensity (pace) remained relatively stable throughout the training program. Taking a closer look at the distance plot, we can see how the majority of volume is gained through the long runs on weekends, which is typical of most long-distance training programs:

Cadence & Heart Rate

Two important considerations for runners are heart rate and cadence. When runners let their heart rates get too high, they tire much more quickly, so distance runners constantly work to keep their heart rate down while still running quickly. This can be aided by increasing cadence to approximately 180 steps per minute. Increasing cadence allows runners to develop better efficiency in their technique, typically by shortening the stride, which over time can lead to a lower heart rate. This translates into better performance with respect to both speed and endurance. In the plot below we can see that both cadence and heart rate are positively associated with running pace, but at different rates: the two regression lines have different slopes and cross one another as speed increases, which is what an interaction looks like on a plot:
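The "crossing slopes" idea can be expressed as a regression with an interaction term: if the slope of the measurement against pace differs between heart rate and cadence, the interaction coefficient is non-zero. The sketch below uses invented numbers in the same long format as the plotting data (`Pace_MPH`, `Measurement`, `BPM`); `toy` is a placeholder, not the real runs.

```r
# Hypothetical long-format data: one row per run per measurement.
# All values are invented for illustration.
toy <- data.frame(
  Pace_MPH    = rep(c(5.2, 5.4, 5.6, 5.8, 6.0), times = 2),
  Measurement = rep(c("Avg_Heart_Rate", "Cadence"), each = 5),
  BPM         = c(150, 154, 157, 161, 165,  160, 161, 163, 164, 166)
)

# Pace_MPH * Measurement expands to both main effects plus their interaction.
fit <- lm(BPM ~ Pace_MPH * Measurement, data = toy)

# "Pace_MPH" is the slope for the baseline level (Avg_Heart_Rate);
# "Pace_MPH:MeasurementCadence" is how much the Cadence slope differs from it.
round(coef(fit), 2)
```

In this toy data the interaction coefficient is negative, meaning cadence rises more slowly with pace than heart rate does. With the real data, `summary(fit)` would show whether the difference in slopes is distinguishable from zero.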

Final Thoughts

The “Half Marathon Goal” plan on the RunTracker app is geared towards regular runners who are ready to tackle the 13.1-mile distance. The training structure consists of four runs per week: a base run, a session of mile repeats, an interval session, and one long run on the weekend. The variety of workouts in the program is designed primarily to build the strength and endurance to run a half marathon, with some speed work included to build anaerobic capacity as well. For anyone who has been running for a while and is ready to tackle longer distances, this program could be an excellent option.

Below are some links related to running a first half marathon, along with the raw data and code used to create the charts and analysis.

Thanks for reading!

Resources & Code

# FRONT MATTER

### Note: The HM_1.xlsx file will need to be converted to HM_1.csv to read in correctly. Also, all packages can be installed using the install.packages() function. This only needs to be done once before loading.

## clean up (this clears out the previous environment)
rm(list = ls())

## Load Packages 
library(tidyverse)
library(wordcloud2)
library(mosaic)
library(readxl)
library(hrbrthemes)
library(viridis)

## Likert Data Packages
library(psych)
library(FSA)
library(lattice)
library(boot)
library(likert)

## Grid Extra for Multiplots
library("gridExtra")

## Multiple plot function (just copy paste code)

multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)

  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)

  numPlots = length(plots)

  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                    ncol = cols, nrow = ceiling(numPlots/cols))
  }

 if (numPlots==1) {
    print(plots[[1]])

  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))

      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}


# HALF MARATHON GOAL by RUNTRACKER

## Import data from CSV, no factors

HM_1 <- read.csv("HM_1.csv", stringsAsFactors = FALSE)

HM_1 <- HM_1  %>%
  na.omit()

HM_1 


## Plot 1

p1 <- ggplot(HM_1 , aes(x=Distance)) + 
  geom_density(color="Pink", fill="Pink") + labs( x ="Distance (Miles)", y = "", title = "Running Distances",  subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(hjust = 1, face = "italic"), 
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background = element_blank())


## Plot 2

p2 <- ggplot(HM_1, aes(x=Pace_MPH)) + 
  geom_density(color="light blue", fill="light blue") + 
  labs( x ="Speed (Miles per Hour)", y = "", title = "Running Pace",  subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(hjust = 1, face = "italic"), 
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background = element_blank())


## Combine plots using multi-plot function:

multiplot( p1, p2, cols=1)

## Plot 3

p3 <- ggplot(HM_1 , aes(x= Session, y= Distance)) + geom_point(color="Black") +  geom_smooth(method=lm , color="Red", se=TRUE) + labs(x ="Training Session", y = "Distance (Miles)", title = "Running Distance",  subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
   theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank())

## Plot 4

p4<- ggplot(HM_1 , aes(x=Session, y= Pace_MPH)) + geom_point(color="Black") +  geom_smooth(method=lm , color="Blue", se=TRUE) + labs( x ="Training Session", y = "Speed (Miles per Hour)", title = "Running Pace",  subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank())

## Combine plots using multi-plot function
multiplot( p3, p4, cols=1)

## Summary Statistics of Distance
favstats(HM_1$Distance)

## Summary Statistics of Pace
favstats(HM_1$Pace_MPH)



## Pearson Product Correlation of Distance over Time (session)
cor.test(HM_1$Session, HM_1$Distance, method = "pearson")

## Pearson Product Correlation of Pace over Time (session)
cor.test(HM_1$Session, HM_1$Pace_MPH, method = "pearson")


## Plot 5
p5 <- HM_1 %>%
  filter(Workout != "Race") %>%
  ggplot(aes(x = Workout, y = Distance, fill = Workout)) +
  geom_boxplot() +
  geom_jitter(color = "Black", size = 0.4, alpha = 0.9) +
  labs(x = "Workout Type", y = "Distance (Miles)", title = "Comparing Distances", subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank(),
    legend.position = "none") +
  # Only one fill scale is kept: a second fill scale would replace the first
  scale_fill_brewer(palette = "Reds")
  
## Plot 6
p6 <- HM_1 %>%
  filter(Workout != "Race") %>%
  ggplot(aes(x = Workout, y = Pace_MPH, fill = Workout)) +
  geom_boxplot() +
  geom_jitter(color = "Black", size = 0.4, alpha = 0.9) +
  labs(x = "Workout Type", y = "Speed (Miles per Hour)", title = "Comparing Paces", subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank(),
    legend.position = "none") +
  # Only one fill scale is kept: a second fill scale would replace the first
  scale_fill_brewer(palette = "Blues")

## Combine plots using multi-plot function
multiplot( p5, p6, cols=2)

## Plot 7

p7 <- ggplot(HM_1 , aes(x= Cadence, y= Distance)) + geom_point(color="Black") +  geom_smooth(method=lm , color="Red", se=TRUE) + labs(x ="Average Running Cadence", y = "Distance (Miles)", title = "Cadence by Distance",  subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
   theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank())


## Plot 8

p8<- ggplot(HM_1 , aes(x=Cadence, y= Pace_MPH)) + geom_point(color="Black") +  geom_smooth(method=lm , color="Green", se=TRUE) + labs( x ="Average Running Cadence", y = "Speed (Miles per Hour)", title = "Cadence by Pace",  subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank())


## Plot 9

p9 <- ggplot(HM_1 , aes(x= Avg_Heart_Rate, y= Distance)) + geom_point(color="Black") +  geom_smooth(method=lm , color="Blue", se=TRUE) + labs(x ="Average Heart Rate", y = "Distance (Miles)", title = "Heart Rate by Distance",  subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
   theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank())

## Plot 10

p10<- ggplot(HM_1 , aes(x=Avg_Heart_Rate, y= Pace_MPH)) + geom_point(color="Black") +  geom_smooth(method=lm , color="Purple", se=TRUE) + labs( x ="Average Heart Rate", y = "Speed (Miles per Hour)", title = "Heart Rate by Pace",  subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank())

## Combine plots using multi-plot function
multiplot( p7, p8, p9, p10, cols=2)

## Pivot data from wide to long for next chart

HM_1A <- gather(HM_1, Measurement, BPM, Cadence, Avg_Heart_Rate)

HM_1A

## Plot 11

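# Aesthetic names are case-sensitive: use lowercase "color" so points are colored by Measurement
p11 <- ggplot(HM_1A, aes(x = Pace_MPH, y = BPM, color = Measurement)) +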
     geom_point() +
     geom_smooth(method = "lm", alpha = .15, aes(fill = Measurement)) + labs(x ="Average Pace (Miles per Hour)", y = "Beats per Minute", title = "Heart Rate & Cadence by Pace",  subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank())

p11

## Plot 12

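# Aesthetic names are case-sensitive: use lowercase "color" so points are colored by Measurement
p12 <- ggplot(HM_1A, aes(x = Distance, y = BPM, color = Measurement)) +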
     geom_point() +
     geom_smooth(method = "lm", alpha = .15, aes(fill = Measurement)) + labs( x ="Average Distance in Miles", y = "Beats per Minute", title = "Heart Rate & Cadence by Distance",  subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank())

p12

# Combine plots using multi-plot function
multiplot( p11, p12, cols=1)



## Plot 13
p13 <- ggplot(HM_1A , aes(x = Pace_MPH, y = BPM, color = Measurement) ) +
     geom_point() +
     geom_smooth(method = "lm", alpha = .15, aes(fill = Measurement)) + labs(x ="Average Pace (Miles per Hour)", y = "Beats per Minute", title = "Heart Rate & Cadence by Pace",  subtitle = "Half Marathon Goal by Runtracker", caption = "Data source: TheDataRunner.com") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"))

Running Through the Data: Tempo Run: 5K by Runtracker

In the summer of 2020, I set a really simple goal for myself: run a 5K in under 30 minutes. At the time, I had just completed the Couch to 5K (C25K) program and was able to complete the distance in around 32-33 minutes, but I couldn’t seem to get much quicker than that and wanted to see if a different training plan would help. After some experimenting, I settled on the Tempo Run: 5k plan on the RunTracker app to help me break the 30-minute mark.

RunTracker is an app made by the Fitness 22 company, featuring a series of running plans tailored to individuals’ current fitness levels and goals. Since I could already run the 5K distance and ran about 3 times a week, the app recommended the “Tempo Run: 5k” plan. This running plan consisted of three runs per week for a total of eight weeks, with the same structure each week. The first run of the week was interval training, with intervals of various lengths throughout the program, while the second run of the week was always a tempo run of steadily increasing duration. The third and final run each week was a 35-minute base run at an easy pace. This format remained consistent over the course of all 8 weeks and was built to progressively increase both mileage and intensity throughout.

Tempo Run: 5K Training Plan, by Runtracker

My experience with this running plan was great for a variety of reasons. The most structured kind of running I had done before was the run/walk method used in Couch to 5K (C25K). Interval sessions, which included high-intensity running, easy-pace running, and walking, helped me build power and figure out pacing. Tempo sessions pushed me to find the gear between interval and easy pace, which helped develop the habit of running the second half of my runs faster than the first (i.e., “negative splits”). The long easy sessions on the weekends helped build confidence and efficiency. By the end of the program, I had taken minutes off my 5K time and had a much better understanding of pacing, which was the biggest takeaway for me. Many of the things I do now as a runner mirror the types of workouts I was first introduced to in this app, so this data has been fun to look at a few years removed.

Training Progress 

To get a better picture of my progress throughout the program, three primary variables came into focus: pace, measured in miles per hour (mph); distance, measured in miles; and training session, numbered 1 to 24 and completed in order. Running paces ranged from 5.09 to 6.58 mph (11:47 min/mile to 9:07 min/mile), with a mean of 5.83 mph (10:18 min/mile), while distances ranged from 2.4 to 5.43 miles, with a mean of 3.44 miles per run. Since there is an ordered component to these workouts (by session), progress can be visualized through scatter plots. Below are plots of running distance and pace over the course of the 24 workout sessions. Notice how the spread of the data opens up as training progresses, especially with respect to distance. This “fanning effect” (heteroscedasticity, in statistical terms) would normally be a problem for a regression model, but in running it is often a desired feature of training, since it reflects a growing mix of shorter and longer sessions:

Image by Author
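A rough way to check for the fanning effect described above is to fit a straight line and compare how widely the residuals scatter early versus late in the program. This is a sketch on invented numbers (`toy` is a placeholder with the same `Session` and `Distance` column names used for the charts), and splitting the sessions in half is an arbitrary choice made for illustration, not a formal test.

```r
# Hypothetical distances that fan out over time: values invented for illustration.
toy <- data.frame(
  Session  = 1:12,
  Distance = c(3.0, 3.1, 2.9, 3.2, 3.0, 3.3,  2.6, 4.0, 2.8, 4.6, 3.0, 5.2)
)

# Fit a straight line, then look at what the line does not explain
fit <- lm(Distance ~ Session, data = toy)
res <- resid(fit)

# Compare residual spread in the first half vs. the second half of training
early <- res[toy$Session <= 6]
late  <- res[toy$Session > 6]

round(c(early_sd = sd(early), late_sd = sd(late)), 2)
```

A noticeably larger `late_sd` is the numeric counterpart of the widening spread in the scatter plot.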

Workout Type

As I mentioned above, the biggest takeaway of the program for me was my understanding of pacing. Interval sessions, tempo runs, and base runs require very different kinds of effort, all of which can improve performance. Interval sessions remained the most consistent with respect to running pace, but had the largest range and highest average number of miles run. Tempo runs and base runs remained relatively consistent in terms of mileage, with tempo runs having the widest range along with the highest average running pace. These findings can be better visualized through the box plots below for both paces and distances:

Comparing with C25K

In my previous blog post, we went through the data of the C25K program. Since both of these training plans were focused on the same distance, I thought it would be fun to compare progress side by side on the primary variable of interest, pace. The C25K program had a range of 4.01 to 5.51 mph, with an average of 4.79 mph, while the Tempo Run: 5K program had a range of 5.09 to 6.58 mph, with an average of 5.83 mph. Given that both programs had a sequential component (i.e., “training session”), these data can also be expressed as a regression. Below are box plots of running pace distributions (left) and scatter plots of running pace throughout training (right) for both programs. Notice how paces in the Faster 5K program are noticeably higher on average than in the C25K program, while the C25K program has a steeper positive slope. Since the Couch to 5K program is designed to take runners from sedentary to being able to complete a 3.1-mile run, there are naturally going to be much greater gains (i.e., a higher slope) in the beginning, with later improvements occurring more incrementally:

Image by Author
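
One way to put a number on the difference in slopes described above is a single linear model with a Session-by-Program interaction. The sketch below is only an illustration: the data are synthetic values I made up for the example (they are not my running data), the toy data frame and its starting speeds and slopes are assumptions, and the charts in this post use separate geom_smooth lines rather than this model.

```r
# Synthetic, illustrative data only (NOT the measurements from this post):
# two programs with different starting speeds and rates of improvement.
set.seed(42)
sessions <- 1:24
toy <- data.frame(
  Session = rep(sessions, times = 2),
  Program = rep(c("C25K", "Tempo5K"), each = length(sessions))
)
toy$Pace_MPH <- ifelse(
  toy$Program == "C25K",
  4.0 + 0.06 * toy$Session,   # assumed: slower start, faster improvement
  5.5 + 0.02 * toy$Session    # assumed: faster start, slower improvement
) + rnorm(nrow(toy), sd = 0.1)

# Session         = slope for the reference program (C25K)
# ProgramTempo5K  = difference in intercepts between programs
# Session:Program = difference in slopes between programs
fit <- lm(Pace_MPH ~ Session * Program, data = toy)
coef(summary(fit))
```

A negative Session:ProgramTempo5K coefficient would indicate that pace climbs more slowly per session in the second program than in C25K, which is the pattern the scatter plots suggest.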

Final Thoughts

The Tempo Run: 5K plan in the RunTracker Pro app is geared towards regular runners who can currently run a 5K and are interested in improving performance. The training structure consists of three runs per week: one interval session, one tempo run, and one 35-minute steady-state run. The variety of workouts in the program is designed to build both aerobic (endurance) and anaerobic (speed) capacity in runners. For anyone who is new to running, or hasn’t had structured training before, this program could be an excellent introduction.

Below are some links related to improving 5K times, along with the raw data and code used to create the charts and analysis. If you are interested in my experience with Couch to 5K, you can find that post here, and for my first half marathon, you can find that here.

Thanks for reading! 

Resources & Code:

# FRONT MATTER

### Note: All packages can be downloaded using the install.packages() function. This only needs to be done once before loading. 

# clean up (this clears out the previous environment)
# Note: ls() only lists objects; rm(list = ls()) actually removes them.
rm(list = ls())

# Load Packages 
library(tidyverse)
library(wordcloud2)
library(mosaic)
library(readxl)
library(hrbrthemes)
library(viridis)

# Likert Data Packages
library(psych)
library(FSA)
library(lattice)
library(boot)
library(likert)

#install.packages("wordcloud")
library(wordcloud)
library(tm)


# Grid Extra for Multiplots
library("gridExtra")

# Multiple plot function (just copy paste code)

multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)

  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)

  numPlots = length(plots)

  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                    ncol = cols, nrow = ceiling(numPlots/cols))
  }

  if (numPlots==1) {
    print(plots[[1]])

  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))

      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}



# FASTER 5K

# Data Intake

Faster5K <- read.csv("https://raw.githubusercontent.com/scottatchison/The-Data-Runner/master/Faster5k.csv")

Faster5K <- Faster5K %>%
  na.omit()

Faster5K

# Plot 1 - Density Plot of Running Distances

p1 <- ggplot(Faster5K, aes(x=Distance)) + 
  geom_density(color="light blue", fill="Pink") + labs( x ="Distance (Miles)", y = "", title = "Running Distances",  subtitle = "Tempo Run: 5K Training Plan", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(hjust = 1, face = "italic"), 
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background = element_blank())

p1

# Plot 2 - Density Plot of Running Speeds

p2 <- ggplot(Faster5K, aes(x=Pace_MPH)) + 
  geom_density(color="Pink", fill="light blue") + 
  labs( x ="Speed (Miles per Hour)", y = "", title = "Running Speeds",  subtitle = "Tempo Run: 5K Training Plan", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(hjust = 1, face = "italic"), 
    axis.text.y=element_blank(),
    axis.ticks.y=element_blank(),
    panel.background = element_blank())

p2

# Combine plots using multi-plot function:

multiplot( p1, p2, cols=1)

# Plot 3 - Scatter Plot of Running Distance over Time (with linear fit)

p3 <- ggplot(Faster5K, aes(x= Session, y= Distance)) + geom_point(color="Purple") +  geom_smooth(method=lm , color="Green", se=TRUE) + labs(x ="Training Session", y = "Distance (Miles)", title = "Running Distance",  subtitle = "Tempo Run: 5K Training Plan", caption = "Data source: TheDataRunner.com") +
   theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank())

p3

# Plot 4 - Scatter Plot of Running Speed over Time (with linear fit)

p4<- ggplot(Faster5K, aes(x=Session, y= Pace_MPH)) + geom_point(color="Green") +  geom_smooth(method=lm , color="Purple", se=TRUE) + labs( x ="Training Session", y = "Speed (Miles per Hour)", title = "Running Speed",  subtitle = "Tempo Run: 5K Training Plan", caption = "Data source: TheDataRunner.com") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank())

p4

# Combine plots using multi-plot function
multiplot( p3, p4, cols=1)

# Summary Statistics of Distance
favstats(Faster5K$Distance)

# Summary Statistics of Pace
favstats(Faster5K$Pace_MPH)

# Pearson Product Correlation of Distance over Time (session)
cor.test(Faster5K$Session, Faster5K$Distance, method = "pearson")

# Pearson Product Correlation of Pace over Time (session)
cor.test(Faster5K$Session, Faster5K$Pace_MPH, method = "pearson")


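# Pearson Product Correlation of Pace over Time (session) for C25K
# Note: the C25K data frame is not loaded in this script; it must already exist
# in the environment (e.g. from the Couch to 5K analysis) for this test to run.
if (exists("C25K")) cor.test(C25K$Session, C25K$Pace_MPH, method = "pearson")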

# Simple Linear Model of Distance & Session
Distance <- lm(Distance ~ Session, data = Faster5K)
summary(Distance)

# Simple Linear Model of Pace & Session
Speed <- lm(Pace_MPH ~ Session, data = Faster5K)
summary(Speed)


# Import data from CSV, no factors

Plans_5K <- read.csv("5K_Plans.csv",  stringsAsFactors = FALSE)

Plans_5K

# Plot 7 - Box Plot of Distance by Workout Type
# (only one fill scale is kept; a second one would silently replace the first)
p7 <- Faster5K %>%
  ggplot( aes(x=Workout, y= Distance, fill=Workout)) +
    geom_boxplot() +
    geom_jitter(color="Black", size=0.4, alpha=0.9) + 
  labs( x ="", y = "Distance (Miles)", title = "Distance by Workout",  subtitle = "Tempo Run: 5K Running Plan", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank(),
    legend.position = "none") +
    scale_fill_brewer(palette="Greens")
  

# Plot 8 - Box Plot of Speed by Workout Type
# (only one fill scale is kept; a second one would silently replace the first)
p8 <- Faster5K %>%
  ggplot( aes(x=Workout, y= Pace_MPH, fill=Workout)) +
  geom_boxplot() +
    geom_jitter(color="Black", size=0.4, alpha=0.9) + 
  labs( x ="", y = "Speed (Miles per Hour)", title = "Speed by Workout",  subtitle = "Tempo Run: 5K Running Plan", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank(),
    legend.position = "none") +
    scale_fill_brewer(palette="Purples")


# Combine plots using multi-plot function
multiplot( p7, p8, cols=1)

# Combine plots using multi-plot function
multiplot( p7, p8, cols=2)


# Combine plots using multi-plot function
multiplot( p1, p7, cols=2)


# Combine plots using multi-plot function
multiplot( p2, p8, cols=2)
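
# Mean Pace by Workout Type (base R): the value to average comes first,
# the grouping variable second
aggregate(Faster5K$Pace_MPH, list(Workout = Faster5K$Workout), FUN = mean)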


# Summarize Mean Distance & Pace by Workout Type
Faster5K  %>%
  group_by(Workout) %>%
  summarise_at(vars(Distance, Pace_MPH), list(Average = mean))

Plans_5K  %>%
  group_by(Program) %>%
  summarise_at(vars(Distance, Pace_MPH), list(Average = mean))

# Plot 5 - Box Plot of Pace by Program
# (only one fill scale is kept; the x-axis shows Program, not Training Session)
p5 <- Plans_5K %>%
  ggplot( aes(x=Program, y= Pace_MPH, fill=Program)) +
  geom_boxplot() +
    geom_jitter(color="Black", size=0.4, alpha=0.9) + 
  labs( x ="Program", y = "Speed (Miles per Hour)", title = "Comparing Paces",  subtitle = "C25K & Tempo Run: 5K Training Plans", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank(),
    legend.position = "none") +
    scale_fill_brewer(palette="BuPu")

p5

# Plot 6 - Box Plot of Distance by Program
# (only one fill scale is kept; the x-axis shows Program, not Training Session)
p6 <- Plans_5K %>%
  ggplot( aes(x=Program, y= Distance, fill=Program)) +
    geom_boxplot() +
    geom_jitter(color="Black", size=0.4, alpha=0.9) + 
  labs( x ="Program", y = "Distance (Miles)", title = "Comparing Distances",  subtitle = "C25K & Tempo Run: 5K Training Plans", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank(),
    legend.position = "none") +
    scale_fill_brewer(palette="PRGn")

p6


multiplot( p5, p6, cols=2)

# Welch Two Sample t-test of Pace by Program
t.test(Pace_MPH ~ Program, data = Plans_5K)

# Welch Two Sample t-test of Distance by Program
t.test(Distance ~ Program, data = Plans_5K)

# Plot 10 - Scatter Plot of Pace over Time by Program (with linear fits)

p10 <- ggplot(Plans_5K, aes(x=Session, y= Pace_MPH, color = Program )) + geom_point() +  geom_smooth(method=lm , se=TRUE,aes(color=Program)) + labs( x ="Training Session", y = "Speed (Miles per Hour)", title = "Pace Through Training",  subtitle = "C25K & Tempo Run: 5K Training Plans", caption = "Data source: TheDataRunner.com") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12), 
    plot.caption = element_text(hjust = 1, face = "italic"),
    panel.background = element_blank()) + 
  scale_color_manual(values=c('blue', 'orange'))+
  theme(legend.position="none")


p10


multiplot( p5, p10, cols=2)