Graphing Ironman Race Data


This past weekend, I was a volunteer helping with an Ironman 70.3 race. I was really impressed and inspired by the athletes. I was also excited about the quantity and variety of data generated by this type of race (compared to a regular marathon). And as a 'graph guy' I just had to try my hand at visualizing that data!

Here's the basic race course. They started early in the morning at Jordan Lake (west of Raleigh), where the athletes swam a 1.2-mile triangle starting and ending at Vista Point. Next, they hopped on their bicycles and biked down to Shearon Harris Reservoir (where the nuclear power plant is, southwest of Raleigh), then east by Lake Wheeler (which is south of Raleigh), and finally headed north to downtown Raleigh (a total of 56 miles). For the last leg of the Ironman, they ran west from downtown along Hillsborough Street, almost to the edge of town, and then looped back to downtown. They ran that loop twice, racking up 13.1 miles (yes, basically a half-marathon in 90-degree heat, with some gnarly hills along the way).

About a dozen members of our dragon boat paddling team volunteered to help with the race. Here's a picture our club president Alicia took of one of the elite athletes at the front of the pack, running past her station near downtown Raleigh:

Soon after the race, I found the results on the Ironman website. There didn't appear to be an easy way to download them, so I copy-n-pasted all 88 pages into a text file and then wrote some SAS code to import it. I then created a simple scatter plot of the almost 1,500 athletes who completed the race, to see how their finish times compared. The graph definitely shows that a handful of elite athletes finished well ahead of everyone else (these are at the left side of the graph below).

Next, I wanted to see a little more detail about each racer. I wondered if the times for the swim/bike/run were fairly consistent from athlete to athlete, or if certain athletes were faster in one and slower in another. For this, I used a stacked bar chart. Below is the top portion of the bar chart, to show you how I organized it. I tried to make things very logical, for example with the swim/bike/run bar segments stacked in the order the events occurred (rather than alphabetical order). Click the image below to see the full-size interactive version of my chart -- you can then scroll up/down to see all of the approximately 1,500 athletes (the graph is 9000 pixels up/down), and the bars have mouse-over text so you can see each runner's name and time data:

A few things about the data jumped out at me in the graph. As a 'data guy,' and not an Ironman/race guy, I'm not sure whether these things are data problems, or just aspects of the race that I don't understand. Perhaps some of you athletes out there can help with this part! (Feel free to give your thoughts, or provide extra insight, in the comments.)

There are some 'gaps' in the bar chart. For example, there is a gap at overall_rank=60. Does that mean there was nobody in the race with overall_rank=60, or was that person maybe disqualified? I double-checked the Ironman website, and their data jumps from 59 to 61. Here's a screen-capture:
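A gap like this is easy to miss by eye in a 9000-pixel-tall chart, so it can help to scan the rank column programmatically. The original analysis was done in SAS; the following is only a minimal Python sketch (not the author's code), assuming the overall ranks have already been parsed into a list of integers that should otherwise run consecutively. The function name `missing_ranks` is hypothetical.

```python
def missing_ranks(ranks):
    """Return rank values absent from an otherwise consecutive sequence.

    Assumes ranks are integers that should run from min(ranks) to
    max(ranks) with no gaps (e.g. 1, 2, ..., N).
    """
    present = set(ranks)
    if not present:
        return []
    return [r for r in range(min(present), max(present) + 1) if r not in present]


# Hypothetical excerpt mirroring the jump from 59 to 61 on the results page.
print(missing_ranks([57, 58, 59, 61, 62]))  # [60]
```

A check like this only says *which* ranks are missing; it can't say *why* (disqualification, a data-entry slip, etc.), which still takes a look at the source data.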

Another 'oddity' -- the swim+bike+run times don't sum up to the 'finish time'. I guess maybe there's some time between these 3 activities, and that time gets counted in their total 'finish time'? Here's the data for the top 10 finishers.

Also, there were a few cases where the bar segments didn't line up consistently with the athletes having a similar overall rank. For example, Jill Ganley's biking segment of the race seems to have been really slow, compared to her swim and run. Perhaps her bike had a mechanical problem, or she had to repair a flat tire? Or perhaps she's just a very fast runner, and a slow biker?

Here's another example that looks odd in the chart: Richard Holden seems to have completed the running portion of the race much faster than the other runners in the upper-700s overall rank. The Ironman website data says his run time was less than 2 hours, which is as fast as people with an overall rank of ~300. Perhaps he's a very fast runner and a slower biker (or had problems with his bike), or maybe there was an error in recording his data?
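Rather than spotting cases like these by scrolling, one could sweep for them: compare each athlete's time in one segment against the median time of athletes with nearby overall ranks, and flag large departures. This is a minimal Python sketch of that idea, not the method used for the chart. The window size and the 0.8/1.25 thresholds are arbitrary assumptions, and the sample times are made up for illustration. A flagged row is a candidate for a closer look, not proof of a data error.

```python
from statistics import median


def flag_segment_outliers(times, window=25, low=0.8, high=1.25):
    """Flag athletes whose segment time differs sharply from rank-neighbors.

    times: one segment's times in seconds, ordered by overall rank.
    window, low, high: hypothetical tuning values, not taken from the race.
    Returns (index, time, neighbor_median) tuples for flagged athletes.
    """
    flagged = []
    n = len(times)
    for i, t in enumerate(times):
        # Neighbors within +/- window ranks, excluding the athlete itself.
        start = max(0, i - window)
        stop = min(n, i + window + 1)
        neighbors = times[start:i] + times[i + 1:stop]
        if not neighbors:
            continue
        m = median(neighbors)
        if t < low * m or t > high * m:
            flagged.append((i, t, m))
    return flagged


# Made-up run times (seconds) for six consecutively ranked athletes.
run_times = [7200, 7250, 7180, 5000, 7220, 7300]
print(flag_segment_outliers(run_times, window=2))  # [(3, 5000, 7235.0)]
```

Using the median of the neighbors (rather than the mean) keeps one unusual time from dragging the baseline toward itself.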

Anyway, this was fun data to try to visually analyze, and it was an interesting challenge to try to plot all the data so you could see the data for each of the individual ~1,500 runners who finished the race. I used quite a few tricks in my SAS code to get the graph "just so," and here's a link if you'd like to see the code.

Update:

Based on some helpful feedback from actual triathletes, it appears that the table on the Ironman website leaves out the data fields for the 'transition' times (between swimming/biking, and biking/running), when the athletes change shoes, etc. It would be ideal to have the data values for both of the transition times separately, but since that data is not available in the table I have calculated a single value for the total transition time. I added this to the bar chart as a single gray bar segment on the end of each bar.
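The arithmetic behind that gray segment is simply the finish time minus the three leg times. Here is a minimal Python sketch of it (the post's own implementation was in SAS and isn't shown here), assuming times arrive as `H:MM:SS` strings; the helper names and the sample times are hypothetical.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Convert a non-negative number of seconds back to 'H:MM:SS'."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


def transition_time(swim, bike, run, finish):
    """Total transition time (both transitions combined), in seconds,
    inferred as finish time minus the swim, bike, and run times."""
    legs = to_seconds(swim) + to_seconds(bike) + to_seconds(run)
    return to_seconds(finish) - legs


# Made-up times for a single athlete.
t = transition_time("0:30:00", "2:30:00", "1:30:00", "4:35:30")
print(to_hms(t))  # 0:05:30
```

Because only the total is derivable from the table, the two transitions can't be separated this way; a negative result would point to a problem in the imported data rather than a real transition.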

Here's a snapshot of the top portion of the new/improved graph. You can click it to see the full chart:

This content was reposted from the SAS Learning Post. Go there to view the original.

Robert Allison, The Graph Guy!, SAS

Robert Allison has worked at SAS for more than 20 years and is perhaps the foremost expert in creating custom graphs using SAS/GRAPH. His educational background is in computer science, and he holds a BS, MS, and PhD from North Carolina State University. He is the author of several conference papers, has won a few graphic competitions, and has written a book called SAS/GRAPH: Beyond the Basics.



Re: Swimming
  • 6/30/2017 11:38:00 PM

Or we'll already be measuring every activity. Data!!

Re: Swimming
  • 6/19/2017 10:37:28 PM

That gives hope to all us non-swimmers out there. One of the reasons I have always shied away from any triathlon is my lack of comfort and skills in the water (that and the grueling exercise, of course!). But now we have data proof that the biking and running legs are long enough for participants to make up significant time after dogging it with the doggie paddle.

Re: Swimming
  • 6/16/2017 11:39:20 AM

Maybe as we progress in analytics we will have the races set up for better data gathering. I think most will find it useful.

Re: Swimming
  • 6/16/2017 10:36:46 AM

And as noted, there's usually going to be a bit of a hiccup in gathering and putting the data together as it was later found "Based on some helpful feedback from actual triathletes, it appears that the table on the Ironman website leaves out......" Those little details can surely mess up our premise if not cause some real confusion in trying to figure out what was really going on. Hopefully we'll catch those before it's too late.

Re: Swimming
  • 6/12/2017 3:19:59 PM

I agree. What a great way to find areas for improvement and a way to compare yourself in races.

Swimming
  • 6/12/2017 12:08:08 PM

This data format is wonderful for being able to compare Ironman competitors at a glance.

I've known that some people are stronger (or weaker) in one of the events. What I see in this data is that the swim times have enormous variability.

I would not have expected to see people placing in the top 100 that are 10 minutes behind when they get out of the water after a relatively short swim component.

Scatter plots
  • 6/10/2017 6:41:35 PM

Thank you for taking the time to offer us this nice analysis.

As a 70.3 enthusiast myself, it doesn't surprise me to see big relative differences between segments. As I come from a running background with some swimming in my early youth, the bike segment is always (and by far) my weakest segment.

It would be really nice to see some scatter plots for all athletes: swim x bike, bike x run, swim x run. With those plots it would be easy (and interesting) to spot the oddities and the general trends.

Thanks again
