Tuesday, October 23, 2012

Visual Analytics and the Athlete: Part 3

"Sometimes you learn more from what you don't know than what you do." - Another anonymous comment.

So now you've got the data. You're doing all your weekly workouts. Weight is good. Eating is good. The "A" race comes up and boom...you come up short. What happened? Let's go back to the very first charts from the very first post.
What do we know? We know that in the past there were two defined peaks, both of which correspond to a race. We know that between the first and the second there is a rapid drop and climb, a somewhat jagged stretch, and then a leveling and a climb to another peak. In other words, following Lake Placid in 2009 I shut it all down for a time to give some attention to my life. 2010 was defined as a "bridge" year to get to 2011 without a significant degradation in ability. And 2011 was the build-up to Coeur D'Alene. If that's all you got, that is actually a pretty fair interpretation. But let's dive a little more into the weeds, which means discussing the negative space, i.e. the stuff we don't see and how it influences the chart as much as the training we've quantified and recorded.

See that climb and baby peak just before Lake Placid followed by the immediate retreat? That's Mooseman 2009, arguably one of my best races ever. The lead-up to that race was very trying emotionally, and following it my family life came under such duress that the very thought of competing in Lake Placid seemed unrealistic. I won't rehash it here, but it is documented elsewhere in this blog. Anyway, once it was determined LP was a go, there's the final push to the race and race day itself. So the chart has the steep slope not because the process of peaking for the race brought about some great physiological improvement, but because the data leading up to the race was fairly inconsistent and spotty. Life didn't normalize much until the middle of 2010. With the exception of one dip, which I believe was related to late summer, kids, family, time off, etc., the lines actually start to track on a fairly consistent pattern to Coeur D'Alene, with a climb into race day which is supported more by peaking than anything else.

The point of all this is that forces outside of the data and outside of the graph are influencing what is in the graph in the first place. It's based on the premise that a type-A athlete, and if you're going through this stuff you're type-A, will perform precisely on plan if there are no other "interruptions" from life. So when we see that they are not tracking to plan, we can assume something is going on. The components of the negative space, the items that are not cast on paper or on a computer screen, actually contribute to the uniqueness of each athlete's graph as much as does their training plan. As a coach or someone evaluating the visual, the mere existence of randomness in the patterns is an indication that something might be going on with the athlete, so maybe it's worth a follow-up.

"But if this other data is so important, does that mean we should stop recording our current data?" No. But if feasible you should find a way to record these other events. This may come in the form of a training log, comments, a blog, etc... And yes to build a predictive model we will need to somehow quantify these outside influences, but that is a topic for the next post.

Saturday, October 13, 2012

Visual Analytics and the Athlete: Part 2

Creating Measures with an Eye Towards the Future

Feedback from the person who read both parts 1a and 1b has been incredible! And he asked an important question.

"Is the Run Score a subjective measure?"

The answer is 'yes'...sort of. It is subjective insomuch as it does not exist anywhere else. It was invented. It is not, however, arbitrary. There is a formula behind it, and the formula is an important piece in the evolution of any analytics model. Creating measures is a full-time job for many people. In banking we referred to the people who invented the measures as "quantitative modelers," or quants for short. In baseball you need go no further than sabermetrics and Bill James to see similar, though obviously more highly evolved, measures in action. And measures aren't a "do it once, set it and forget it" artifact. You evolve them over time as inputs which were previously unknown become known. So in the case of run score there is an equation which is based on the non-linear relationship of a person's heart rate to their ground speed. Furthermore, it attempts to account for how that changes over the period of the run. The measure itself is in its infancy, but that does not mean it has no value. Because it is an objective calculation, the results are consistent based on the data set provided. As that first data set is actuals, meaning we know what actually happened, we can begin to tune the measure so it gains accuracy.
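The actual Run Score equation isn't spelled out in this post, so what follows is only a hypothetical sketch of what a measure of this general shape could look like. Everything in it is an assumption made up for illustration: the function name `run_score`, the `resting_hr` and `exponent` parameters, the scaling, and the way the "change over the run" is handled. It is not the formula behind the charts.

```python
from statistics import mean


def run_score(speeds_mph, heart_rates_bpm, resting_hr=50.0, exponent=1.5):
    """Hypothetical run score; NOT the formula behind the charts in these posts."""
    if len(speeds_mph) != len(heart_rates_bpm) or len(speeds_mph) < 2:
        raise ValueError("need at least two paired speed/heart-rate samples")
    if any(s <= 0 for s in speeds_mph):
        raise ValueError("every speed sample must be positive")
    if any(hr <= resting_hr for hr in heart_rates_bpm):
        raise ValueError("every heart-rate sample must exceed resting_hr")

    # Non-linear effort: heart rate above resting, scaled and raised to a power.
    # The exponent and the /100 scaling are illustrative assumptions.
    efficiency = [
        speed / ((hr - resting_hr) / 100.0) ** exponent
        for speed, hr in zip(speeds_mph, heart_rates_bpm)
    ]

    # Change over the run: compare second-half efficiency to first-half.
    half = len(efficiency) // 2
    fade = mean(efficiency[half:]) / mean(efficiency[:half])

    # Penalize a fade, but do not reward a late surge (capped at 1.0).
    return mean(efficiency) * min(fade, 1.0)
```

The point of the sketch is the "objective" half of the argument: feed it the same samples and it returns the same number every time, and the constants are exactly the kind of thing you would tune against actuals.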

So the run score is an objective subjective measure. What's that you ask? Why not just plot speed, HR, period, etc...? Well, we absolutely could do that, and they do exist in the model which produced the graphs you saw in the earlier posts. However, more often than not, looking at the individual pieces of data gathered clouds the issue. For example, you just did a run and had an average pace of 7:30/mile. Is that good or bad? Answer: not enough information. So you see, plotting all those individual points creates a lot of "not enough information" scenarios. "But they do have a relationship to each other." Yes, yes they do. And what exactly is the nature of those relationships? Perhaps HR (how hard you ran) is related to speed somehow. And time is related. Maybe terrain. So if you look at all those points, all those numbers together, maybe there is a pattern. And that pattern, if you could represent it in a number, is ... wait for it ... Run Score! It is much easier to look at a single visual that conveys the message of the whole data set than to try and process the whole set at the same time, all the time.

Great so now what? Well I figure most non-geeks stopped reading after part 1b, so I'm going to explore a little more in depth here. The evolution of the measure is such that after you tune it for your actuals, you begin to project it into the future by applying a range of probabilities. This is your predictive model.

 "What? Okay you're losing me again."

Below is a visual for a well known predictive model.
In the graphic you see the line depicting the actual track of Isaac at the time and then a range of possibilities as to where the storm could end up. If you saw this on a spreadsheet, and no doubt it exists on one somewhere, your eyes would glaze over as there'd be lots of numbers, percents, longitude and latitude, etc... The visual presents it in an easier to understand fashion. It communicates the clear message that you might need to load up the family truckster and get out!
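To make the "cone" idea a little more concrete for a training measure, here is a minimal sketch. It assumes a straight-line trend fitted to past scores and a band that widens with the square root of how far ahead you look. Both choices, and the names `project_with_cone` and `z`, are illustrative assumptions, not the probability approach referred to in these posts.

```python
from statistics import mean, stdev


def project_with_cone(actuals, steps_ahead, z=1.0):
    """Return (low, center, high) tuples for each future step.

    Assumptions for illustration only: a linear trend, and uncertainty that
    grows with the square root of the horizon.
    """
    n = len(actuals)
    if n < 3:
        raise ValueError("need at least three actuals to estimate a spread")

    # Ordinary least-squares line through (0, y0), (1, y1), ...
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(actuals)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, actuals)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar

    # How far the actuals strayed from the trend sets the width of the cone.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, actuals)]
    spread = stdev(residuals)

    cone = []
    for k in range(1, steps_ahead + 1):
        center = intercept + slope * (n - 1 + k)
        half_width = z * spread * k ** 0.5
        cone.append((center - half_width, center, center + half_width))
    return cone
```

Plot the three values per step and you get the same kind of picture as the storm track: one line for what has happened, and a widening band for where things could end up.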

I think I'll stop for now, because we've covered some important stuff. We now understand measures :) and have introduced another key element we don't have, probability. So how do we come up with the probability? There are many ways people do this. I have my own approach and in the case of training data we need to undertake another exercise which can be tedious, un-glamorous, and entirely insightful. We need to gather as much data as we can that you never tracked. In part 3 I'll take you through that exercise, which is very much related to the topic I teased in part 1, "Negative Space" analytics. And note, this part is more art than engineering, which thus makes it my favorite!

Monday, October 08, 2012

Visual Analytics and the Athlete: Part 1B

"What do the pictures mean?!"

Yes, I understand I didn't do "the great reveal" in the last post. It was on purpose. My first step in analyzing visuals, as I mentioned previously, is to do so in a vacuum: do as much as you can to keep preconceived ideas from skewing the analysis. For the record I actually had Leanna look at the 2nd chart and tell me what she saw. She immediately noticed the two locations where the blue line crossed the tan line on an upswing. These two peaks are also visible in chart 3, which is the same data spread over months (the previous chart is a quarterly spread). So we have identified two points of interest. What are they? Well, I've already let you know that the category line, the 'X' axis, is time. Those two peaks are July 26, 2009 and June 26, 2011. The measures being illustrated are my run speed and a "run score", a calculated measure I produced to be able to "grade" very different types of runs. In both cases the score, which typically tracks just below the run speed, jumps distinctly above the speed line. The significance of those two dates is that they are both race dates: Ironman Lake Placid 2009 and Ironman Coeur D'Alene 2011. It's not entirely unexpected that this would happen, as the run score is weighted towards longer runs (I am an endurance athlete after all!), but this also corresponds with how I felt about the runs. The races were not perfect, but I felt like I ran them really well.
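What the eye picks out here, one line crossing above another on an upswing, is also easy to find in the numbers. A minimal sketch, assuming both series are sampled at the same points in time and plotted on the same scale (the function name `upward_crossings` is made up for illustration):

```python
def upward_crossings(score, speed):
    """Return the indices where `score` moves from at-or-below `speed` to above it."""
    if len(score) != len(speed):
        raise ValueError("both series must have the same number of points")
    return [
        i
        for i in range(1, len(score))
        if score[i - 1] <= speed[i - 1] and score[i] > speed[i]
    ]
```

Each index returned is a point of interest worth checking against the calendar.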


Friday, October 05, 2012

Visual Analytics and the Athlete: Part 1


"There is no failure. Only feedback. "
~ Robert Allen

Athletes are human beings. Human beings are physiologically, mentally, and emotionally creatures of habit. We typically gravitate towards those things we do well, especially during times of stress or failure in other areas of our lives. So what if you could somehow capture these retreats to familiar ground over time in a picture? Maybe the picture would be an affirmation. Or maybe it would indicate a destructive tendency.

People think of analytics simply as a way of validating, or “scoring”, a current course of action, i.e. what went right or wrong and am I doing better or worse. And while analytics is very capable of that, you quickly discover the usefulness of this type of scoring is limited, because more often than not you create a picture that you already know, especially if it is simply a picture of what you are doing right now. Visualizations, especially those that span time, extend analytics, allowing you to see your habits, both known and more importantly unknown. And these habits are frequently the things that prevent us from reaching a new level of performance. Let’s face it, if what we did over and over and over were really working, we wouldn’t still be searching for ways to improve! “But these things worked for me before?” Yes, they did. And when they worked they were reasonably if not entirely new to you. At that point in time they were an unfamiliar stressor on your mind, body, or soul. And you adapted to handle them. And those adaptations led you to some success. And you did it again. And maybe had a bit more success, though not quite as striking a change as before. And again, and less striking success. You see, as you adapt those stressors become the norm. They become “comfortable,” a refuge.

Visual Analytics Basics: I know how you THINK you did, but what do your EYES tell you?

So as an analyst what am I looking for? Well I’ll start by telling you what I’m not looking for…counts, absolute numbers of any kind (total miles, total hours, etc…). While they are usually the easiest measures to capture, they tell you so little really, especially if you are trying to normalize your findings across multiple datasets (e.g. compare two or more athletes). Sure they might be relevant to each individual, but they don’t explain much. Changes, slopes, ratios, area under the curve or between two curves…those tell you more. Wow, this sounds a whole lot like geometry and calculus! Yes…yes it does. And that is why we use visuals, because many times our eyes can tell us the story without having to write the equation. And for the mathematically inclined, the visuals show you which equations to write!
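For the mathematically inclined, two of those equations are short enough to write down. This is a minimal sketch, assuming evenly spaced samples and using the trapezoid rule for the area; the function names `slopes` and `area_between` are made up for illustration.

```python
def slopes(values, dx=1.0):
    """Change per unit of time between consecutive points."""
    return [(b - a) / dx for a, b in zip(values, values[1:])]


def area_between(curve_a, curve_b, dx=1.0):
    """Signed trapezoidal area of (curve_a - curve_b); positive when a sits above b."""
    if len(curve_a) != len(curve_b):
        raise ValueError("both curves must have the same number of points")
    gaps = [a - b for a, b in zip(curve_a, curve_b)]
    return sum((g0 + g1) / 2.0 * dx for g0, g1 in zip(gaps, gaps[1:]))
```

Unlike total miles or total hours, both of these describe how two lines behave relative to time or to each other, which is what makes them easier to compare across datasets.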

Because we humans are creatures of habit, I look at these visuals for patterns, both repeating and original; consistency and breaks, expected and especially unexpected. At the bottom you’ll see 4 line charts. These basic graphs are all the same data set. I’m using a technique I’ve really come to appreciate: masking all series, labels, and legends. Sure I know which measures were put on the chart, but I don’t know which line is which. I also know the full date range used, but cannot see which day is which. Finally the charts follow a traverse down my time “dimension”, i.e. to finer and finer slices of time. The first is a yearly view (3 yrs). Next you see the quarterly view of the same data. We move then to monthly, and then finally daily (which is only a partial chart). Traversing the time state is useful in finding the right amount of detail to tell you the story but not confuse things with outliers.
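If you want to try the same traverse on your own log, here is a minimal sketch. It assumes each entry is a `(date, value)` pair and that averaging is a sensible way to roll values up; the charts may well have used a different aggregation, and the name `roll_up` is made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# How each grain turns a date into a bucket key.
GRAINS = {
    "year": lambda d: (d.year,),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "month": lambda d: (d.year, d.month),
    "day": lambda d: (d.year, d.month, d.day),
}


def roll_up(samples, grain):
    """Average (date, value) samples into buckets at the requested time grain."""
    if grain not in GRAINS:
        raise ValueError(f"grain must be one of {sorted(GRAINS)}")
    key = GRAINS[grain]
    buckets = defaultdict(list)
    for day, value in samples:
        buckets[key(day)].append(value)
    # Sorted so the result reads left to right along the time axis.
    return {k: mean(vs) for k, vs in sorted(buckets.items())}
```

Calling it once per grain on the same samples gives you the yearly, quarterly, monthly, and daily series to chart side by side.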

Okay, anyone still awake? No? Well, if you care to learn more, stay tuned for Part 2 where I get crazy with more detail about creating relevant measures, and perhaps delve into “Negative Space” analytics (my term so don’t bother looking it up!).