Wildflower Long Course 2008: Using Tableau to Analyze the Results of a Half-Ironman Triathlon
Posted by Jeff Solomon on August 5, 2008
If you've ever competed in a triathlon and wondered how you did beyond your overall time and place, this blog post is for you. I've taken the results of the 2008 Wildflower Long Course triathlon and dumped them into Tableau. Let's see what we find.
The Wildflower Long Course Triathlon is a half-ironman triathlon (1.2-mile swim, 56-mile bike, 13.1-mile run) held each year at Lake San Antonio, CA. The 2008 race had 1637 finishers (1226 male, including me, and 411 female), broken down into the following age groups:
The split between men and women and the age group breakdown are typical of a triathlon.
I downloaded a full spreadsheet of race results from the race website. This spreadsheet includes every racer's finishing time, gender, age group, swim time, t1 time, bike time, t2 time, and run time. I then used Tableau to generate additional results and put everything into this packaged workbook. You'll need Tableau Desktop 4.0 or higher, or the free Tableau Reader if you would like to play with the data yourself.
The Field's Performance at a Glance
The first question to ask about the race is how did the field do in each leg of the race compared to how they did overall? This question is answered below in a stacked bar chart.
The view is made up of 1600+ columns, where the height of each column represents a racer's finishing time. The colors within each column show how much of the race that racer spent swimming, biking, running and transitioning. What immediately jumps out is that swim performance has so little bearing on overall performance. It would be interesting to see a similar graph for a full ironman or an Olympic-distance event to see how much the swim portion matters in those races.
Notice also the occasional bars of green that shoot down into blue. Each one is a racer whose bike leg was much faster, relative to their run, than the field average. Said another way, it's a racer who went too hard on the bike and blew up on the run.
Now let's look at a different view of the same data.
Each mark in the view is a racer's time in a particular leg given their overall finishing place. Each racer is represented by three marks showing their time in each leg, which line up vertically depending on their finishing place. Trend lines are enabled (per color) to show average expected performance given final overall place. This view works because the range of expected finishing times for each sport is disjoint.
The conclusion I draw from this view is that the cluster of times in each sport is tighter the higher your overall place, which means that you must be fast in all three events to place high overall.
Individual Racer Performance at a Glance
The next interesting question to ask is how an individual racer did in each leg given their overall finishing place. For example, was their swim extremely fast compared to their run? We can answer that question by using the same view that we used above, but this time we'll highlight my performance and that of five members of my triathlon team. If a time in an event is below the trend line, then that time is fast compared to the expected finish, and vice versa. The goal of every racer would be to have each leg time exactly on the trend line, which would mean that they did equally well in all three sports.
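The "above or below the trend line" reading can be sketched in a few lines of Python. This is a minimal sketch that assumes a straight-line least-squares fit per leg; the trend model Tableau actually used for the view is not stated here, and the data, `fit_line` and `residual` are hypothetical names and values for illustration only.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def residual(place, leg_minutes, slope, intercept):
    """Leg time minus the trend's prediction for that overall place.
    Negative means faster than expected, positive means slower."""
    return leg_minutes - (slope * place + intercept)

# Hypothetical (overall place, bike minutes) pairs, not real race data.
places = [1, 2, 3, 4, 5]
bike_minutes = [150.0, 155.0, 158.0, 166.0, 171.0]

slope, intercept = fit_line(places, bike_minutes)
gap = residual(3, 158.0, slope, intercept)  # negative: faster than the trend
```

A racer whose residuals are close to zero in all three legs is balanced across the sports; a large positive residual in one leg points to the weakest event.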
Notice that I scaled the y-axis on all five plots to encompass the range of times for that event.
The first thing to notice is that bike leg performance is a very good indicator of overall performance, which makes sense given that the bike leg is the longest. Look at the orange mark, a teammate who finished about 280th overall. It is obvious from the graph that his swim time is very slow compared to his bike and run times, although his bike and run times aren't that much faster than the average given his place, which reinforces the idea that swim performance has the least effect on overall performance. This visualization gives the viewer an instantaneous assessment of relative performance in each leg.
Finally, our coach can be happy with our team's performance in transition. Besides a slow T1 for my teammate represented by the green mark, we all have average to very fast transition times, which many coaches see as "free speed."
Relative Gender Performance
Next we ask the question, how do male overall finishing times compare against female finishing times? To show this, I plotted overall finish times versus percentage of gender place so we could compare, for example, the time difference between the 50th percentile male and 50th percentile female finishers. This is below.
Males are consistently about 35 minutes faster than the corresponding female finisher, regardless of their finishing place. Next, I plotted how the genders did in each leg and put the results into a Tableau dashboard.
The dashboard demonstrates that men and women are most closely matched in the swim and farthest apart on the bike. This result is intuitive, but the view shows it explicitly. Even though the time scales are different for each leg, since the time axis covers the same percentage range, it's accurate to compare the distance between the gender curves for the different sports.
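The "percentage of gender place" axis used in this section can be approximated outside Tableau as well. The sketch below assumes each finisher is a (name, gender, finish minutes) tuple and ranks racers within their own gender; the function name and the sample data are hypothetical, not taken from the race spreadsheet.

```python
from collections import defaultdict

def percent_of_gender_place(finishers):
    """Map each name to its place within its gender, as a percentage.
    The slowest finisher of each gender gets 100.0."""
    groups = defaultdict(list)
    for name, gender, minutes in finishers:
        groups[gender].append((minutes, name))
    result = {}
    for members in groups.values():
        members.sort()  # fastest first
        n = len(members)
        for rank, (_, name) in enumerate(members, start=1):
            result[name] = 100.0 * rank / n
    return result

# Hypothetical finishers, not real race data.
finishers = [
    ("m1", "M", 300), ("m2", "M", 320), ("m3", "M", 340), ("m4", "M", 360),
    ("f1", "F", 335), ("f2", "F", 355),
]
pct = percent_of_gender_place(finishers)
```

Because both genders are mapped onto the same 0 to 100 range, racers with equal percentages (here `m2` and `f1`) can be compared directly even though the field sizes differ.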
Relative Age Group Performance
The last question I asked is how age affects race performance. The conventional wisdom is that once a triathlete turns 45, their performance severely degrades. Is that true? Let's see. Graphs for male and female race performance broken down by age group are below.
For males, it's good to be young. Males 20-24 are the fastest age group. Then there is a tight clustering of males between 25 and 44, although performance does degrade slightly with age. Performance then drops off steadily between the ages of 45 and 60, but doesn't really fall off a cliff until age 60 and above. It's good news to know that you can be fast until relatively late in life.
For females, it's really good to be young. There is a big gap in performance between females 20-24 and everyone else. Starting at age 25 until 54, there is a steady drop in performance, but no performance cliff. Not enough women over 54 competed to add their results to the graph.
Conclusion
I hope you have found looking at triathlon results in Tableau as much fun as I have. Each one of the views showed a different aspect of an entire database full of results in a single picture. That's what visual analysis is all about.
If you have any ideas of other ways to look at the results, I'd love to hear them. Please leave me a comment below. Happy triathloning!
Can You Improve this Graph?
Posted by Robert Morton on July 31, 2008
One of the blogs I read regularly is Flowing Data, which discusses effective visualization techniques for making sense of data. A recurring topic is a challenge to the readers: can you improve this graph?
The most recent challenge at Flowing Data is a graph that attempts to demonstrate a correlation between suicide rates and unemployment levels in Japan. Nathan identifies some areas for improvement and links to the source data, which I've used to build a Tableau visualization. You can see my results in the attached image.
The first step I took was to transpose the row/column orientation of the Excel file, and then connect to it with Tableau. Both the "Unemployment Rate" and "Suicide Rate" have missing data points, which were fairly straightforward to resolve. In the former case, I converted "Unemployment Rate" to a numeric Measure instead of a textual Dimension, and then filtered the data to start at the year 1980. I created a simple line graph to show the unemployment rate against time, and used "Suicide Rate" to control the width of the line. To fill in the missing data points, I used a Table Calculation in Tableau to make a moving window for the suicide rate, averaging up to two data points within +/- 4 years.
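For readers without Tableau, the gap-filling idea can be sketched in Python. This is one possible reading of the moving window described above (average whatever known values fall within +/- 4 years of each year), not the actual Table Calculation; the function name and the sample rates are hypothetical.

```python
def window_average(series, year, radius=4):
    """Average the known values within +/- radius years of `year`.
    Returns None when no known value falls inside the window."""
    known = [value for y, value in series.items()
             if value is not None and abs(y - year) <= radius]
    return sum(known) / len(known) if known else None

# Hypothetical rates with gaps (None), not the actual source data.
rates = {1980: 17.0, 1981: None, 1982: None, 1983: None, 1984: None,
         1985: 19.0, 1986: None}

filled = {year: window_average(rates, year) for year in rates}
```

With observations five years apart, a window of +/- 4 years never contains more than two known points, which matches the "up to two data points" behavior described in the post.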
I've attached a Tableau 4.0 Packaged Workbook for Beta users to explore. One week from today we release Tableau 4.0, and you will be able to download the free trial if you're interested in exploring Tableau Desktop!
EagerEyes Blog Lists Top 10 Information Visualization Influences on Tableau’s Jock Mackinlay
Posted by Elissa Fink on July 30, 2008
I love it when people much more knowledgeable than I give me inside views into what shaped them. It’s fascinating to see what influenced them as they developed into the industry experts we know. EagerEyes.org, a terrific blog with lots of resources on information visualization, recently asked Tableau’s own Dr. Jock Mackinlay to name his top 10 influences.
Jock responded with a thoughtful list of 10 books, theses and articles that shaped him even as he helped found and define the modern information visualization industry.
I was particularly moved by Jock’s description of how he discovered Jacques Bertin’s book The Semiology of Graphics (1967). He describes how he saw a copy in French and knew it was an important book even though he doesn’t read French! When he later discovered the English translation as a Ph.D. candidate, he knew he had to have it despite his budget constraints.
Jock’s list contains sources you may expect (like Tufte) but others I wasn’t at all familiar with. So check out the entry on EagerEyes blog. By the way, you’ll also find Pat Hanrahan’s list of top 10 influences there as well.
Learning from Tableau, and from you
Posted by Raif Majeed on July 23, 2008
This old Tableau dog learned a few new tricks over the past couple of days. And most of it was not from fellow Tableau employees.
Here's a sampling from the Tableau presentations I saw. (I'll post links and further instructions and screenshots, as they become available, to show you how to do these things.)
* From Ross Bunker's advanced calculations talk, I learned about Pareto analysis.
* From a couple of the advanced talks (including Ross's and Marc Reuter's), I learned how to introduce a rank to sorted fields.
* From Michael Drumheller, I learned about the F-statistic and its importance for determining whether categorical factors are significant in a trend model.
* From Ty Alevizos, I learned that you can make box plots in Tableau!
* From Dan Jewett, I learned a number of things about the server experience, but most particularly the simple pleasure of reordering your favorites. (Note: This is a 4.0 feature, still in beta.)
These were all pretty cool, but the best part was learning from you, the customers of Tableau. From pharmaceuticals to space science to beverages to education to government, you welcomed us into your industries for a couple of short days and gave us a tremendous number of new ideas for the future, while showing us what you've managed to do with Tableau. The interactions with you all were simply amazing.
More to come -- but for now, after a couple of very intense days, I really need some BBQ....
Online videos of Stephen Few and Pat Hanrahan keynotes at Tuesday's Tableau Conference
Posted by Elissa Fink on July 23, 2008
Check out our on-the-spot video recordings of keynote speakers from Day Two (Tuesday, July 22) of Tableau's first ever Customer Conference. They include videos of Stephen Few and Pat Hanrahan.
+ Stephen Few of Perceptual Edge kicked us off in the morning with a talk about "Now You See It". His talk was smart, inspirational, aspirational, funny and thoughtfully reflective. Overall, a rousing hit.
+ Right after that, Steve along with Tableau's Jock Mackinlay and Erin Easter did an interactive "Extreme Makeover: Viz Edition". We asked conference attendees to submit their visualizations and Steve, Jock and Erin reacted real-time with positive comments and suggestions for improvements.
+ Stanford Professor and Tableau CTO Pat Hanrahan closed the official part of our conference with an inspiring and thoughtful talk that described how a picture can be worth 10,000 words.
Online videos of Monday's keynotes - nearly live and in person
Posted by Elissa Fink on July 22, 2008
Check out our on-the-spot video recordings of keynote speakers from Day One (Monday, July 21)...
+ Christian Chabot assesses where analytics has been and is going, highlights key capabilities of Tableau 4.0 and shares his vision of Tableau's future. Watch the video - about 45 minutes.
+ Chris Stolte, along with 3 of Tableau's key developers, showcases and demos the new capabilities of Tableau 4.0 in real time. Watch the video - about 45 minutes.
We'll post more videos from Day Two's general sessions tomorrow.
Understanding Surveys using Highlight Tables
Posted by Robert Morton on July 22, 2008
Surveys on topics such as customer satisfaction are rich with qualitative data, but analysis often requires quantitative comparisons, aggregation and so on. Steve Wexler, Director of Research at the eLearning Guild, discusses how some straightforward techniques in Tableau lead to "visualizations that people can grok from the back of a conference room."
The first samples of customer survey results that Steve demonstrated used stacked bar charts to reveal the proportion of responses in each of the categories, such as "Disagree" or "Strongly Agree". Sorting the data on one of the categories could reveal strong customer preference across a range of products, for example. However, overall customer satisfaction should take into account all categories, and stacked bar charts made the remaining categories difficult to compare visually. The root of this problem is the qualitative nature of the data.
Steve tackled this challenge by building calculated fields to convert responses into quantitative values on a 0-5 scale, known as a Likert scale. This allows for aggregation such as averages to measure responses across a range of questions. To demonstrate this, Steve showed results from the eLearning Guild research which compared corporate plans / progress in developing mobile learning platforms versus these corporations' opinions of the value of mobile learning.
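The conversion from response labels to numbers can be illustrated with a short Python sketch. The exact labels and numeric values Steve used in his calculated fields are not given here, so the `LIKERT` mapping below is an assumption for illustration, as are the sample responses.

```python
from statistics import mean

# Assumed mapping; only "Disagree" and "Strongly Agree" are named in the talk.
LIKERT = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

def average_score(responses):
    """Convert each label to its numeric value and average them."""
    return mean(LIKERT[label] for label in responses)

# Hypothetical answers to one survey question.
responses = ["Agree", "Strongly Agree", "Disagree", "Agree"]
score = average_score(responses)
```

Once every question has a numeric score like this, the scores can be averaged per question and per respondent group, which is the kind of value a highlight table colors.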
To render this in a way that could be grokked from a distance, Steve turned to a visualization known as highlight tables. Like heatmaps applied to textual tables, highlight tables quickly identify the magnitude of values in each cell. While one dimension separates responses to each question, the second dimension of the table could be used in many ways: for example to aggregate responses by company size, by mobile platform preference, or by the company's plans for mobile learning. In interacting with the audience, Steve demonstrated that the survey response trends and correlations are clearly visible.
Raising the Bar: Team-Oriented Brainstorming with Tableau
Posted by Robert Morton on July 22, 2008
"Insight brings value out of data," explains George Smirnoff. As Managing Director at Trexin Consulting, George enlists a multidisciplinary staff for collaborative root-cause analysis. Tableau is well suited for the iterative process of developing insight, and is an exciting centerpiece of his teams' dynamic brainstorming sessions.
His teams have investigated unexplained revenue shortfalls, examined securities fraud and shored up corporate defense strategies. This often revolves around outlier analysis: when slicing data in different ways, the same trend becomes apparent. Using simple rules to segment the data helps isolate outliers by establishing conditions which become red flags in combination.
George's presentation was exciting and fast-paced, drawing in excellent questions from the audience. He paced the presentation with some humor as well, making a dig at lawyers' superiority complex: "they are looking at this awesome Tableau stuff and acting like it's normal!" For some time afterward, a number of clustered conversations lingered in the room; customers with similar backgrounds were discussing their successes with Tableau in their own interactive settings. One customer encouraged another to "go for it" in an upcoming, high-profile storytelling meeting - take Tableau to the CEO, and leave PowerPoint as an afterthought.
Some key takeaways for successful collaborative sessions with Tableau:
- Accept that you will find some erroneous data, and resolve to address it as soon as possible.
- Explain the nature of the meeting in advance to the participants: it is a problem-solving session, not a blame-finding one. Defensiveness can completely kill the productivity of these sessions.
- Explore the data in different ways to find recurring outliers. Beware of sampling issues, since aggregations of low-count data will have a high margin of error.
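The warning about low-count data in the last takeaway can be made concrete with the usual normal-approximation formula for a proportion. This is a generic statistical sketch, not something shown in George's presentation; the function name and the sample sizes are chosen only for illustration.

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p seen in n samples,
    using the normal approximation z * sqrt(p * (1 - p) / n)."""
    return z * sqrt(p * (1 - p) / n)

small = margin_of_error(0.5, 10)    # a segment with only 10 records
large = margin_of_error(0.5, 1000)  # a segment with 1000 records
```

With 100 times fewer records, the margin of error is 10 times larger, so an apparent outlier in a tiny segment deserves a second look before it is treated as a red flag.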
Liveblogging the Tableau Customer conference
Posted by Robert Morton on July 21, 2008
Welcome to the first-ever Tableau Customer Conference! Whether you are joining us in person or following this event remotely, follow our blog posts as Tableau employees liveblog this exciting event across a variety of social networks.
Several Tableau employees will liveblog on their favorite social networks, ranging from LinkedIn and Facebook to Twitter and Flickr. Search for the Social Media Tag #TableauConf08 (Twitter, Ustream.tv, YouTube and Flickr), or follow the links below:
Tableau Software Blog
Twitter
LinkedIn
Facebook
While you are visiting these groups feel free to link, join, friend or subscribe to keep up to date. And if you're sharing your experiences from the conference, be sure to include the Social Media Tag #TableauConf08.
Battling Anecdote with Analysis using Tableau
Posted by Robert Morton on July 21, 2008
Jon Nakamoto, M.D., Ph.D. describes a key challenge of his job as battling the anecdotes and innuendos that fuel his customers' stereotypes about inherent inefficiency. As Managing Director at the Quest Diagnostics Nichols Institute, his successes with Tableau have inspired a sense of trust and partnership with his very demanding customers.
Dr. Nakamoto prefers analysis over anecdote, and true to form much of his presentation involved live storytelling with Tableau! He demonstrated how Quest helps customers identify operational efficiency problems, for example with medical test turnaround-time, whose solutions are often surprisingly simple. Dr. Nakamoto showed how a simple day-of-week view revealed a customer's peak volume each Tuesday for medical test processing, elucidating the need for increased staffing. Exploring a bit deeper, he examined time-of-day for test processing to reveal that much of Tuesday's volume came from sample collection late in the day on Monday, when it could reasonably be pulled in earlier on that day.
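The day-of-week view described above amounts to counting records per weekday. Here is a minimal Python sketch, assuming each processed test carries an ISO-format timestamp; the timestamps and the function name are hypothetical and not drawn from Quest's data.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def volume_by_weekday(timestamps):
    """Count how many timestamps fall on each day of the week."""
    return Counter(DAYS[datetime.fromisoformat(ts).weekday()]
                   for ts in timestamps)

# Hypothetical processing times, not real lab data.
processed = [
    "2008-07-21T09:00:00",
    "2008-07-22T08:15:00",
    "2008-07-22T09:40:00",
    "2008-07-22T11:05:00",
    "2008-07-23T10:30:00",
]
counts = volume_by_weekday(processed)
busiest_day, busiest_count = counts.most_common(1)[0]
```

Grouping the same timestamps by hour instead of weekday would give the time-of-day view used to trace the Tuesday peak back to late Monday collection.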
Typically within a week of customer meetings they have demonstrated substantial improvements to their processes, and within four weeks they have completed their implementation. Quest's presentations to their customers are a lot like Dr. Nakamoto's presentation today: using Tableau for a live analysis of their data allows immediate answers to questions, fueling decisive and effective meetings.
What is Dr. Nakamoto looking forward to most in the conference? He expects the hands-on training to help break some of his bad habits, or "Tableau ruts", for performing analysis in Tableau in a rigid fashion as one would with Excel. Tableau's ease of non-linear exploration allows for successively better insight, as each answer prompts new questions and deeper understanding of the data.