ARTICLE: Data Mining for Runners – Intro

I used some of my injury time to file the data for every race and time trial I have done in my life. The end result is a file I called “Race Statistics”, which you are free to steal. If you have some general Excel knowledge, you’ll be able to adapt the file for your own purposes.

Purpose

The file has two purposes: to allow an informed, year-on-year evaluation of improvement, and to make post-race evaluation objective rather than subjective. We have evolved to be pattern-seeking mammals, but unfortunately our perceptions are prone to bias and to seeing correlations where none exist.

Over time we developed one tool that lets us make truly objective observations: the scientific method, which ushered in the era of progress we enjoy today. To apply the scientific method, you need reliable data, and you need to know exactly what you wish to measure and how.

Sampling

I ended up with 107 samples of races and time trials, mostly from my active running years (2007-2010). Luckily, these make up one hundred percent of the races I have done, so nothing is missing from the sample, and with your trusty Garmin you’ll never lack accurate data.

I was primarily interested in race performance, so I began with the standard measures that describe a race: time, overall position, percentage of the winner’s time, pace, distance, average heart rate, ascent, etc.
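Some of these measures are derived from others rather than recorded directly. The file itself is an Excel workbook, so the following is only a minimal Python sketch, assuming times are stored in seconds and distances in kilometres; the class, field names and the example race are hypothetical and the numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Race:
    """One row of a hypothetical race log; field names are illustrative only."""
    name: str
    distance_km: float
    time_s: float         # my finishing time, in seconds
    winner_time_s: float  # the winner's finishing time, in seconds

    def pace_s_per_km(self) -> float:
        """Average pace in seconds per kilometre."""
        return self.time_s / self.distance_km

    def pct_of_winner(self) -> float:
        """My time as a percentage of the winner's time (100 means I won)."""
        return 100.0 * self.time_s / self.winner_time_s


def format_pace(seconds_per_km: float) -> str:
    """Render a pace such as 250.0 seconds per km as '4:10/km'."""
    minutes, seconds = divmod(round(seconds_per_km), 60)
    return f"{minutes}:{seconds:02d}/km"


# Invented example: a 20:50 5k where the winner ran 16:40.
race = Race("Example 5k", distance_km=5.0, time_s=20 * 60 + 50, winner_time_s=16 * 60 + 40)
print(format_pace(race.pace_s_per_km()))  # 4:10/km
print(f"{race.pct_of_winner():.1f}%")     # 125.0%
```

Percentage of the winner’s time is useful because it partly cancels out course and conditions: a slow, muddy day slows the winner too.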

Secondly, I wanted to be able to “stratify” my race data. Stratification is a statistical practice used when subpopulations within the data differ substantially. Given that I run on many different terrains, and that a hill race is hard to compare with a track race, I assigned each race a “Type” with a value of Cross, Hill, Road, Track or Trail.
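In practice, stratifying means grouping before summarising, so that hill and track races are never averaged together. A minimal sketch using only the Python standard library; the rows are invented for illustration and the `stratify` helper is hypothetical, not part of the Excel file.

```python
from collections import defaultdict
from statistics import mean

# Invented rows: (race Type, my time as a % of the winner's time).
races = [
    ("Hill", 118.0),
    ("Road", 112.0),
    ("Hill", 122.0),
    ("Track", 108.0),
    ("Road", 110.0),
]


def stratify(rows):
    """Group the % of winner's time values by race Type."""
    strata = defaultdict(list)
    for race_type, pct in rows:
        strata[race_type].append(pct)
    return dict(strata)


# Summarise each stratum separately instead of pooling every race together.
summary = {race_type: mean(values) for race_type, values in stratify(races).items()}
print(summary)  # {'Hill': 120.0, 'Road': 111.0, 'Track': 108.0}
```

Pooled, these five races average 114.0%, a figure that describes none of the three race types well.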

I also assigned each race a “League”, since the races within each hill-running league or championship are of similar character, and it’s useful to be able to look at each one separately.

Enriching the data

What more did I want to know? Well, I was interested in capturing any factor that could have influenced the outcome of an individual race, so I added three yes/no “flags” to each race: Did I get lost during the race? Was I injured? Was I sick?

Finally, to better understand how well I currently judge race performances subjectively, I added one more “flag” to each race: Was I happy afterwards? From this I hoped to determine whether my judgement was already realistic, and how much I would need to lean on the objective data in future.
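One way to turn the “Was I happy afterwards?” flag into a measurement is to count how often it agrees with an objective verdict. A minimal sketch, under an assumption of my own rather than a rule from the file: a race counts as objectively good when its % of winner’s time is strictly below my median for that race Type. The rows and the `judgement_agreement` helper are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Invented rows: (race Type, % of winner's time, happy afterwards?).
races = [
    ("Road", 110.0, True),
    ("Road", 115.0, False),
    ("Road", 112.0, True),
    ("Hill", 120.0, False),
    ("Hill", 125.0, False),
    ("Hill", 118.0, True),
]


def judgement_agreement(rows):
    """Fraction of races where the happy flag matches the objective 'good race' verdict."""
    by_type = defaultdict(list)
    for race_type, pct, _ in rows:
        by_type[race_type].append(pct)
    # Median % of winner's time per Type, so hill races are judged against hill races.
    medians = {race_type: median(values) for race_type, values in by_type.items()}
    # Lower % of winner's time is better, so "good" means strictly below the median.
    matches = sum((pct < medians[race_type]) == happy for race_type, pct, happy in rows)
    return matches / len(rows)


print(judgement_agreement(races))
```

Here five of the six invented races agree; the disagreement is a road race right on the median that I was nonetheless happy with. A score near 1 suggests subjective judgement already tracks the data, while a score near 0.5 is little better than a coin toss.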

Data gathering

Next, I just had to populate the file with all the data I had: Garmin recordings, race results from websites and my Excel trackers from over the years. With that done, the reporting was ready to go. From the raw data I can generate any pivot report I want. For instance, I can track my progression in 5k races, compare my percentage of the winner’s time in the Leinster League year-on-year, or compare my performances on the roads with those on the track or in the hills.
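The same kind of pivot report can be sketched outside Excel. The example below is a minimal Python sketch with hypothetical field names and invented numbers: it builds a year-by-league table of mean % of winner’s time and skips any race flagged as lost, injured or sick. Excluding flagged races is my assumption here; keeping them in is an equally valid choice if you want the full picture.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class RaceRow:
    """One hypothetical row of the race log, including the three flags."""
    year: int
    league: str
    pct_of_winner: float
    lost: bool = False
    injured: bool = False
    sick: bool = False


def pivot_mean_pct(rows):
    """Return {(year, league): mean % of winner's time}, skipping flagged races."""
    cells = defaultdict(list)
    for r in rows:
        if r.lost or r.injured or r.sick:
            continue  # leave out races compromised by a known external factor
        cells[(r.year, r.league)].append(r.pct_of_winner)
    # Sort by (year, league) so the report reads chronologically.
    return {key: mean(values) for key, values in sorted(cells.items())}


# Invented rows, not real results.
rows = [
    RaceRow(2008, "Leinster League", 124.0),
    RaceRow(2008, "Leinster League", 130.0, lost=True),
    RaceRow(2008, "Leinster League", 122.0),
    RaceRow(2009, "Leinster League", 119.0),
    RaceRow(2009, "Leinster League", 121.0, sick=True),
]

for (year, league), pct in pivot_mean_pct(rows).items():
    print(f"{year}  {league:<16} {pct:6.1f}%")
```

In Excel, the equivalent is a PivotTable with Year as rows, League as columns, the average of % of winner’s time as values, and the three flags as filters.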

So join me in the next instalment, where I will show examples of the sort of conclusions you can draw from the data, and how they compare with the subjective judgements I made when I originally reacted to each race.
