Vancouver Canucks rookie Bo “Scorevat” Horvat has scored some goals this season. More goals, even, than people expected. This is making Canucks fans like me excited. And, until recently, Horvat scoring was something that only happened when the Canucks played good games. Even after Horvat scored in an ignominious loss to Columbus, the Canucks are 11-1-1 in games where he scores. As goes Horvat, so go the Canucks, apparently. Which would be great news, because, being a 19-year-old rookie, he is likely to keep getting better.
But what does the above stat actually mean? Before you scream that you know all about small sample sizes, and how a statistic based on 13 goals is meaningless, let me assure you that that’s not what this will be about. That’s still true, mind, it’s just that it’s not a very interesting thing to bring up any more. Everyone already knows that. I also don’t think Horvat either (a) inspires his team to win or (b) only scores when they’re winning anyway.
One possibility is that this shows that Vancouver wins when it gets secondary scoring. But I don’t think so. The win-loss stats for primary scorer Radim Vrbata, for example (from my count 21-5-2), are only marginally less impressive. These stats come about because the Canucks score more when they win than when they lose – last time I counted, which was several games back, they had scored 151 goals in games they’d won compared to 48 in games they’d lost. This can be thought of as a Monty Hall type problem. The games where a given person (say Bo Horvat) scores are going to be much higher scoring games for the Canucks than “games where a Canuck scores”. Thus the thing we think we are picking at random (Canucks games) is actually something we aren’t picking at random at all. This is the main reason I’m making this post. It seems really cool that there is a real-life version of Monty Hall that pops up in hockey stats.
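To see how strong this selection effect can be, here is a minimal simulation sketch. Everything in it is an assumption made for illustration, not real Canucks data: goals for and against are drawn as independent Poisson counts, ties are settled by a coin flip, and a hypothetical player is credited with each team goal with a fixed probability (`share`) that has nothing to do with the result.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson-distributed integer using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(n_games=100_000, team_rate=2.8, opp_rate=2.7, share=0.07, seed=1):
    """Return (overall win rate, win rate in games where the player scores).

    All parameters are invented for illustration.
    """
    rng = random.Random(seed)
    wins = games_scored = wins_scored = 0
    for _ in range(n_games):
        gf = poisson(rng, team_rate)  # goals for
        ga = poisson(rng, opp_rate)   # goals against
        # Ties are settled by a coin flip, standing in for overtime/shootout.
        won = gf > ga or (gf == ga and rng.random() < 0.5)
        # Each team goal belongs to the player with probability `share`,
        # independent of the score or the result.
        player_scored = any(rng.random() < share for _ in range(gf))
        wins += won
        if player_scored:
            games_scored += 1
            wins_scored += won
    return wins / n_games, wins_scored / games_scored


overall, when_scoring = simulate()
print(f"win rate, all games: {overall:.3f}")
print(f"win rate, games where the player scores: {when_scoring:.3f}")
```

In this toy model the player has no influence on the outcome at all, yet the team's record in "games where he scores" comes out clearly better than its overall record, simply because those games are drawn from the high-scoring end of the schedule.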
Is that the only thing we can learn from this stat? And is everything about the timing of Bo Horvat’s (or some other player’s) goals meaningless? I think not, and, especially if you extend your purview to the timing of Bo’s shots, as well, you can potentially learn something. Very shortly before his blog went dark, Tyler Dellow had a post about the “Gazdic index” of particular players, which went like this: Edmonton Oilers enforcer Luke Gazdic sucks as a player, so it would theoretically be really awful to have him play on your NHL team. And it is, but it’s actually not as bad as it might seem, because during all the important parts of the game, Gazdic is stapled to the bench. It’d be a huge handicap to have such a comparatively shitty player be the one on the ice when you’re down by one with 5 minutes left in the third, for example. But he never is, so it doesn’t really matter.
My sense is that Bo Horvat is not like that. He both doesn’t suck, and is trusted by Coach Willie Desjardins. And I think this would be reflected in what proportion of Horvat’s goals and shots come at an important time in the game (i.e. when the game is close). My guess is that this number would be high, and also that a high number correlates with being a good player. There are two sort of related issues here. One is: how much do the coaches trust Horvat? The second: what does he do with that trust? The trust is well represented by his score-close ice time in the 2nd and 3rd periods. The quality of play can be given by a shot rating during that time (with all the usual caveats about using raw shot stats). Dividing the two figures will give a rating that has something to do with trustworthiness. There are several ways to do this. One possibility is to score trust as a “perceived clutchiness rating” (PCR), which looks at how much more or less a player is used in important situations compared to unimportant ones:
PCR = [(player score-close 2nd & 3rd period ice time) / (overall player ice time)] × [(team score-close 2nd & 3rd period ice time) / (overall team ice time)]
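As a quick sketch of the arithmetic, the formula can be written as a small function. The function name, argument names and the minutes below are all made up for illustration; they are not real ice-time figures for any player.

```python
def pcr(player_close_toi, player_total_toi, team_close_toi, team_total_toi):
    """Perceived clutchiness rating, following the formula as written above.

    All arguments are ice times in the same unit (e.g. minutes); "close"
    means score-close time in the 2nd and 3rd periods.
    """
    player_share = player_close_toi / player_total_toi
    team_share = team_close_toi / team_total_toi
    return player_share * team_share


# Hypothetical numbers, for illustration only.
example = pcr(
    player_close_toi=210.0,
    player_total_toi=600.0,
    team_close_toi=1200.0,
    team_total_toi=3000.0,
)
print(f"PCR = {example:.3f}")
```

Since the team term is the same for everyone on a roster, it only rescales the number, and the ranking of teammates comes entirely from the player term. One variation (my assumption, not part of the formula above) would be to divide by the team share instead, so that 1.0 means a player is used in close situations exactly as often as the team finds itself in them.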
For the performance, you can use standard time-weighted shot stats. Shot differential per 60 minutes is one that I think makes sense overall. You can also separate the offensive and defensive sides of the game, since you are not necessarily going to have the same people out protecting a lead as you will pushing for an equalizer. Basically we already have the SA/60 and SF/60 metrics to tell us which players are generating and preventing shots. Dividing SF/60 by PCR we get a rating where a large number indicates a player that is strangely under-used, whereas a small number is one that is over-used. To get a similar rating from SA/60, we need to convert it to a number where larger is better, for instance by subtracting it from the largest SA/60 on the team. To make the two symmetric we can use (SA/60_max – SA/60) and (SF/60 – SF/60_min) as the defensive and offensive numerators.
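Here is a rough sketch of how the offensive and defensive ratings described above could be computed for a roster. The player names, shot rates and PCR values are invented placeholders, and the function name and dictionary layout are my own choices for the example.

```python
def usage_ratings(players):
    """Compute offensive and defensive usage ratings for each player.

    `players` maps a name to a dict with keys 'sf60' (shots for per 60),
    'sa60' (shots against per 60) and 'pcr' (perceived clutchiness rating).
    A larger rating suggests a player who is under-used relative to results.
    """
    sf_min = min(p["sf60"] for p in players.values())
    sa_max = max(p["sa60"] for p in players.values())
    ratings = {}
    for name, p in players.items():
        ratings[name] = {
            # (SF/60 - SF/60_min) / PCR
            "offensive": (p["sf60"] - sf_min) / p["pcr"],
            # (SA/60_max - SA/60) / PCR
            "defensive": (sa_max - p["sa60"]) / p["pcr"],
        }
    return ratings


# Hypothetical roster, for illustration only.
roster = {
    "Player A": {"sf60": 30.0, "sa60": 26.0, "pcr": 0.10},
    "Player B": {"sf60": 27.0, "sa60": 29.0, "pcr": 0.05},
    "Player C": {"sf60": 25.0, "sa60": 31.0, "pcr": 0.02},
}

for name, r in usage_ratings(roster).items():
    print(f"{name}: offensive {r['offensive']:.1f}, defensive {r['defensive']:.1f}")
```

One quirk worth knowing about: because the shot rates are measured from the team's worst value, the player with the lowest SF/60 (or highest SA/60) always gets a rating of zero on that side, no matter how much or how little he is used.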
What advantage could such metrics have over other metrics which can tell us similar information? A coach’s defensive trust in a player can be gleaned from defensive zone start percentage, for example. But not exactly. Say the Sedins start in the offensive zone a very high proportion of the time, like they did under Alain Vigneault. Are they being “sheltered” because they are not trusted defensively, or are they put in an offense-maximizing position because Vigneault needs them to score? We know the answer to that because we’ve watched the Sedins play, but we wouldn’t know just from d-zone start % alone.
Another point is that, though these are “player” metrics, they could actually be useful for finding out about what coaches are doing. For example, are the players with the highest defensive PCR the same ones who are playing the penalty kill? That would indicate a very defensive mindset. Who are the players with the highest PCR in score-tied situations? That can indicate whether the coach is playing aggressively or cautiously.
A final potential of our stat is that you can use perceived clutchiness as a starting point to calculate “actual clutchiness.” That is, take the shot rates of a player during important times of the game (this is necessary because of score effects and late game effects). Of course the whole concept is highly questionable, but the only thing that I can think of that gets at it right now is a stat like “game winning goals.” And that is just about the most useless of useless stats. Apart from being plagued by sample size issues (there is no such thing as “game winning shots attempted”), it is also swamped by comebacks. Say a team goes up 5-2 but the other guys make it back to 5-4. The fifth goal will be “game winning” but was not actually scored in a game-on-the-line situation. Whereas with our metric, only what a player does while the game is actually close counts. This means that if I wanted to, I could actually calculate who on the Canucks is most statistically clutch. But I refuse to even try because everyone knows Burrows is the clutchest.