Saturday, October 13, 2012

Visual Analytics and the Athlete: Part 2

Creating Measures with an Eye Towards the Future

Feedback from the person who read both parts 1a and 1b has been incredible! And he asked an important question.

"Is the Run Score a subjective measure?"

The answer is 'yes'...sort of. It is subjective insofar as it does not exist anywhere else. It was invented. It is not, however, arbitrary. There is a formula behind it, and the formula is an important piece in the evolution of any analytics model. Creating measures is a full-time job for many people. In banking we referred to the people who invented the measures as "quantitative modelers," or quants for short. In baseball you need go no further than sabermetrics and Bill James to see similar, though obviously more highly evolved, measures in action. And measures aren't a "do it once, set it and forget it" artifact. You evolve them over time as previously unknown inputs become known. So in the case of Run Score there is an equation based on the non-linear relationship of a person's heart rate to their ground speed. Furthermore, it attempts to account for how that relationship changes over the course of the run. The measure itself is in its infancy, but that does not mean it has no value. Because it is an objective calculation, the results are consistent for a given data set. As that first data set is actuals, meaning we know what actually happened, we can begin to tune the measure so it gains accuracy.
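For the geeks, here is a minimal sketch of what a measure like this could look like in Python. To be clear, this is not the actual Run Score formula, which isn't spelled out in this post. The `Sample` fields, the `exponent` parameter, and the first-half versus second-half "drift" adjustment are all assumptions made up for illustration. What it does show is the shape of the idea: a non-linear heart rate to speed relationship, adjusted for how it changes over the run, boiled down to one repeatable number.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading from a run. Field names are hypothetical."""
    minute: float      # elapsed time since the start of the run
    speed_mph: float   # ground speed
    heart_rate: float  # beats per minute


def efficiency(sample: Sample, exponent: float = 2.0) -> float:
    # Assumed non-linear relationship: speed earned per unit of
    # heart rate raised to a power. The exponent is a tuning knob,
    # not a value taken from the post.
    return sample.speed_mph / (sample.heart_rate / 100.0) ** exponent


def run_score(samples: list[Sample], exponent: float = 2.0) -> float:
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples, key=lambda s: s.minute)
    effs = [efficiency(s, exponent) for s in ordered]
    mean_eff = sum(effs) / len(effs)
    # Compare the second half of the run to the first half.
    # A drift below 1.0 means efficiency faded as the run went on.
    half = len(effs) // 2
    first = sum(effs[:half]) / half
    second = sum(effs[half:]) / (len(effs) - half)
    drift = second / first
    return mean_eff * drift
```

The same samples always give the same score, which is the "objective" part. Choosing the exponent and the drift adjustment is the "subjective" part, and those are exactly the knobs you would tune against your actuals.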

So the Run Score is an objective subjective measure. What's that, you ask? Why not just plot speed, HR, period, etc.? Well, we absolutely could do that, and they do exist in the model which produced the graphs you saw in the earlier posts. However, more often than not, looking at the individual pieces of data clouds the issue. For example, you just did a run and had an average pace of 7:30/mile. Is that good or bad? Answer: not enough information. So you see, plotting all those individual points creates a lot of "not enough information" scenarios. "But they do have a relationship to each other." Yes, yes they do. And what exactly is the nature of those relationships? Perhaps HR (how hard you ran) is related to speed somehow. And time is related. Maybe terrain. So if you look at all those points, all those numbers together, maybe there is a pattern. And that pattern, if you could represent it as a number, is ... wait for it ... Run Score! It is much easier to look at a single visual that conveys the message of the whole data set than to try to process the whole set at the same time, all the time.

Great, so now what? Well, I figure most non-geeks stopped reading after part 1b, so I'm going to explore in a little more depth here. The evolution of the measure is such that after you tune it against your actuals, you begin to project it into the future by applying a range of probabilities. This is your predictive model.

"What? Okay, you're losing me again."

Below is a visual for a well-known predictive model.
In the graphic you see the line depicting the actual track of Isaac at the time, and then a range of possibilities as to where the storm could end up. If you saw this on a spreadsheet, and no doubt it exists on one somewhere, your eyes would glaze over, as there'd be lots of numbers, percents, longitude and latitude, etc. The visual presents it in an easier-to-understand fashion. It communicates the clear message that you might need to load up the family truckster and get out!
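The same "cone" idea can be sketched for a series of run scores. What follows is a rough illustration, not the approach used in this series and certainly not how storm forecasts are built. It assumes a plain least-squares trend line through past scores and a band that widens with the square root of how far ahead you look. The function names and the `width` parameter are made up for the example, and the band is a spread, not a calibrated probability.

```python
import math
import statistics


def fit_line(ys: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for ys at x = 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def project(scores: list[float], steps_ahead: int, width: float = 1.0):
    """Return (low, center, high) for each future step."""
    if len(scores) < 3:
        raise ValueError("need at least three past scores")
    slope, intercept = fit_line(scores)
    # How far the actuals sat from the trend line.
    residuals = [y - (intercept + slope * x) for x, y in enumerate(scores)]
    spread = statistics.pstdev(residuals)
    last = len(scores) - 1
    cone = []
    for h in range(1, steps_ahead + 1):
        center = intercept + slope * (last + h)
        # Assumption: uncertainty grows with the square root of the horizon.
        half_width = width * spread * math.sqrt(h)
        cone.append((center - half_width, center, center + half_width))
    return cone
```

Plot the center values as the track and shade between low and high, and you have a (very) poor man's version of the storm graphic: the further out you look, the wider the range of places you might end up.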

I think I'll stop for now, because we've covered some important stuff. We now understand measures :) and have introduced another key element we don't have yet: probability. So how do we come up with the probability? There are many ways people do this. I have my own approach, and in the case of training data we need to undertake another exercise which can be tedious, unglamorous, and entirely insightful. We need to gather as much as we can of the data that you never tracked. In part 3 I'll take you through that exercise, which is very much related to the topic I teased in part 1, "Negative Space" analytics. And note, this part is more art than engineering, which makes it my favorite!