## Running Through the Data: C25K

I am not typically one for New Year’s resolutions, but in 2020 I made a really small one: keeping better track of my activity (using my Apple Watch). I thought that simply measuring my activity might lead to an overall increase. I was pretty sedentary at the beginning, but after a few weeks of tracking I saw a noticeable improvement in my activity level, and I felt better. So, I decided to see if I could raise the bar a bit by completing the Couch to 5K program, using the C25K running app.

The C25K running app is based on Josh Clark’s running programs, which scaffold participants through a series of manageable expectations. The training plan includes 3 runs per week, each between 20 and 30 minutes, with the program lasting 9 weeks in total. The most noticeable feature of the training plan is that it combines both running and walking. Over the course of the 27 training runs, the proportion of walking decreases while the proportion of running increases, culminating in three 30-minute runs in the last week of the program.

This was my second time using the C25K program. The first time I tried the app, I completed it, generally enjoyed it, and even ran a few 5Ks afterwards. However, I ended up getting hurt / burned out, and within a few years I was definitely back to square one. This time, I made it my primary goal to stay injury and pain free, so I focused more closely on listening to my body, slowing down, and taking rest as needed. Since I am still running today, I decided to take a look back at those training runs and share the data with anyone who is interested.

## Speed, Distance, & Progress

The two most obvious variables to look at were speed and distance. The distances run throughout this program ranged from 2.01 to 3.74 miles, with an average of 2.65 miles per run. Running speed ranged from 4.02 mph (14:55 min/mile) to 5.51 mph (10:53 min/mile), with an average of 4.79 mph (12:31 min/mile). The distributions of my runs by distance and speed for the C25K program can be seen in the density plots below:

Since people are generally more interested in seeing progress, below are scatter plots of distances covered and running speeds over the course of the 27 training runs. At first glance, we see a strong positive association between training session and both training volume (mileage) and intensity (speed) throughout the duration of the training program. When you take a closer look at both scatter plots, you see clear cycles ebbing and flowing along the positive slopes. Most training plans are designed to take on this kind of shape, so neither of these results is surprising:

Looking back at the data two years removed, a number of interesting things stand out to me. The first is how tightly packed, and predictable, the data are. Both speed and distance remain very similar in adjacent runs. This is how the program is designed, and it makes complete sense when developing a fitness base. However, most of the training I do now is very different. Speed and distance vary widely from run to run, to allow for different kinds of stress and recovery. The second thing that stood out was how steep the slope was for both variables. When first starting out, the good news is you are probably going to improve very quickly, although it may not feel like it at the time. The longer you have been running, the more the rate of improvement slows down. Most of my work now as a runner is built on slow, gradual gains, so improvement like this over such a short period would put me at risk of injury now. The key difference for me now is that I can run much longer distances and have a much higher top speed, but the rate of progress is far less noticeable.

## Final Thoughts

With millions of downloads, the C25K app has consistently been one of the most popular training apps for new runners, and for good reason. Based on a series of running plans developed by Josh Clark in the ’90s, the C25K training plans are structured to build runners up slowly, using a run / walk method. Whenever I talk with people who are interested in starting a running routine, one of the first things I recommend is that they get this app, primarily because it employs the run / walk method. Many people think running should not include walking, or that walking is cheating or a sign of weakness. Objectively, it is not. The longer you run, the more important it is that you find your ideal pace, in order to keep your heart rate down, your breathing under control, and your running form good. The run / walk method accomplishes this by slowly increasing the proportion of running to walking over time. Also, you would be freaked out by how fast and how far some people who use the run / walk method can go.

A couple of words of caution about the program, though. First and foremost, no one training app is going to fit everyone. Depending on your current level of fitness and a variety of other factors, the training program may take longer than 9 weeks. One of the most consistent pieces of advice you will find on the C25K program is that you should not be afraid to repeat runs, repeat weeks, or add extra rest if your body needs it. I couldn’t agree with this more. There were a few times when the increase in running volume felt like a lot (week 5, for example), so don’t be scared to slow down or add some extra rest. Definitely don’t skip ahead or run on back-to-back days. The app is built so you will get faster and you will run further as you progress through the program. That’s baked in, but none of that will matter if you get hurt. Increasing speed or volume too quickly is the fastest way to injury, but if you listen to your body and aren’t afraid to slow down (i.e. walk more), then C25K could be a great way to get started.

Below are some links to C25K reviews, along with the raw data and code used to create the charts and analysis. For my next post, I plan to break down the data for the Faster 5k Training Plan that I used to shave a few minutes off my 5k time by introducing speed work.

Thanks for reading!

### Resources & Code:

C25K Running Data can be found here. The code I used (in R) to create plots and analysis is below:

``````
# FRONT MATTER

### All packages can be downloaded using the install.packages() function. This only needs to be done once before loading.

## Load Packages
library(tidyverse)
library(wordcloud2)
library(mosaic)
library(readxl)

## Grid Extra for Multiplots
library("gridExtra")

## Multiple plot function
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)

  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)

  numPlots = length(plots)

  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                     ncol = cols, nrow = ceiling(numPlots/cols))
  }

  if (numPlots==1) {
    print(plots[[1]])

  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))

      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}

# COUCH TO 5K

# Data Intake

C25K<- read.csv("https://raw.githubusercontent.com/scottatchison/The-Data-Runner/8c1162e60a0c3af4e900ed38c222304da1542cb9/Half_1_2.csv")

C25K

## Plot 1 - Density Plot of Running Distances

p1 <- ggplot(C25K, aes(x=Distance)) +
  geom_density(color="Green", fill="Purple") +
  labs(x = "Distance (Miles)", y = "", title = "Distribution of Running Distances",
       subtitle = "Couch to 5K Training Plan", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 14),
        plot.caption = element_text(hjust = 1, face = "italic"),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        panel.background = element_blank())

## Plot 2 - Density Plot of Running Speeds

p2 <- ggplot(C25K, aes(x=Pace_MPH)) +
  geom_density(color="Purple", fill="Green") +
  labs(x = "Speed (Miles per Hour)", y = "", title = "Distribution of Running Speeds",
       subtitle = "Couch to 5K Training Plan", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 14),
        plot.caption = element_text(hjust = 1, face = "italic"),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        panel.background = element_blank())

## Combine plots using multi-plot function:

multiplot( p1, p2, cols=1)

## Plot 3 - Scatter Plot of Running Distance over Time (with linear trend line)

p3 <- ggplot(C25K, aes(x=Session, y=Distance)) +
  geom_point(color="blue") +
  geom_smooth(method=lm, color="red", se=TRUE) +
  labs(x = "Training Session", y = "Distance (Miles)", title = "Progression of Running Distance",
       subtitle = "Couch to 5K Training Plan", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 14),
        plot.caption = element_text(hjust = 1, face = "italic"),
        panel.background = element_blank())

## Plot 4 - Scatter Plot of Running Speed over Time (with linear trend line)

p4 <- ggplot(C25K, aes(x=Session, y=Pace_MPH)) +
  geom_point(color="red") +
  geom_smooth(method=lm, color="blue", se=TRUE) +
  labs(x = "Training Session", y = "Speed (Miles per Hour)", title = "Progression of Running Speed",
       subtitle = "Couch to 5K Training Plan", caption = "Data source: TheDataRunner.com") +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 14),
        plot.caption = element_text(hjust = 1, face = "italic"),
        panel.background = element_blank())

## Combine plots using multi-plot function
multiplot( p3, p4, cols=1)

## Summary Statistics of Distance
favstats(C25K$Distance)

# Summary Statistics of Pace
favstats(C25K$Pace_MPH)

# Pearson Product Correlation of Distance over Time (session)
cor.test(C25K$Session, C25K$Distance, method = "pearson")

# Pearson Product Correlation of Pace over Time (session)
cor.test(C25K$Session, C25K$Pace_MPH, method = "pearson")

# Simple Linear Model of Distance & Session
Distance <- lm(Distance ~ Session, data =C25K)
summary(Distance)

# Simple Linear Model of Pace & Session
Speed <- lm(Pace_MPH ~ Session, data =C25K)
summary(Speed)
``````
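The min/mile paces quoted next to the speeds in the post (for example, 4.02 mph as 14:55 min/mile) come from a simple conversion: an hour has 3600 seconds, so seconds per mile is 3600 divided by the speed in mph. Here is a minimal sketch of that conversion. The helper name `mph_to_pace` is hypothetical and not part of the script above, and truncating to whole seconds is an assumption, chosen because it reproduces the three figures quoted in the post.

```r
# Hypothetical helper (not part of the original analysis script):
# convert a speed in miles per hour to a "minutes:seconds" per-mile pace.
mph_to_pace <- function(mph) {
  # Seconds needed to cover one mile, truncated to whole seconds (assumption)
  total_seconds <- floor(3600 / mph)
  minutes <- as.integer(total_seconds %/% 60)
  seconds <- as.integer(total_seconds %% 60)
  sprintf("%d:%02d", minutes, seconds)
}

# Slowest, fastest, and average speeds reported in the post
mph_to_pace(c(4.02, 5.51, 4.79))
# [1] "14:55" "10:53" "12:31"
```

Because the function is vectorized, it could also be applied to a whole column, e.g. `mph_to_pace(C25K$Pace_MPH)`, once the data has been loaded as in the script above.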