Are Usage Rate and Involvement Rate good metrics for football?
It’s time for November’s Get Your Stats Right, and this month Ben has actually gone above and beyond. He’s adapted an NBA metric to basically invent a whole new data point for football. And it’s exclusively being premiered right here on UTAS.com. He’s then looked at the numbers for this new data point across the league, the U’s specifically, and our upcoming opponents.
Read on to find out more, and let us know in the comments what you make of it.
This season these pieces are only for Coconut Tier members. You can sign up below for £5 a month and get all of Ben’s pieces, as well as other weekly articles, extra exclusive monthly pods, and first dibs on upcoming merch releases 🥥
A couple of weeks ago, Jules from UTAS sent me a video that took inspiration from the NBA and tried to recreate its “usage rate” metric with football data. It’s a very interesting video, so be sure to check it out!
In the NBA world, usage rate is a key metric to see roughly how central a player might be to their team’s play. It’s calculated as the proportion of their team’s possessions that a player ends (with a shot, a turnover, or by winning a foul, for example) while that player was on the court. In basketball, this is a pretty good metric since there are loads of possessions and shots per game.
But in football, there are fewer possessions and, importantly, far fewer shots. Another complicating factor is that it can be difficult to get access to data with possession chains included. The video attempts to address this by taking aggregate data (shots, incomplete passes, times dispossessed, miscontrols, and failed dribbles) and turning it into a per-90-minutes number. They then divide each player’s number by the sum of all of these actions their team makes per game.
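To make the video’s approach concrete, here is a minimal Python sketch of the calculation as described above. The field names and the example numbers are made up for illustration; they are not taken from the video or from any real data set.

```python
from dataclasses import dataclass


@dataclass
class SeasonTotals:
    """Season totals for one player (hypothetical field names)."""
    minutes: float
    shots: int
    incomplete_passes: int
    dispossessed: int
    miscontrols: int
    failed_dribbles: int


def ending_actions_per90(p: SeasonTotals) -> float:
    """Possession-ending actions per 90 minutes played."""
    total = (p.shots + p.incomplete_passes + p.dispossessed
             + p.miscontrols + p.failed_dribbles)
    return total / p.minutes * 90


def approx_usage(p: SeasonTotals, team_actions_per_game: float) -> float:
    """The player's per-90 number as a share of the team's per-game total."""
    return ending_actions_per90(p) / team_actions_per_game


# Made-up example: 50 possession-ending actions in 900 minutes,
# for a team that averages 100 such actions per game.
player = SeasonTotals(minutes=900, shots=10, incomplete_passes=25,
                      dispossessed=5, miscontrols=6, failed_dribbles=4)
share = approx_usage(player, team_actions_per_game=100)
print(f"{share:.1%}")
```

Note that nothing in this calculation knows how many possessions the team had while the player was actually on the pitch, since the denominator is a whole-team, whole-game average.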
However, that is very different from the usage rate in basketball. It’s not “bad” at all, it’s a very interesting metric, but it’s no longer the NBA’s usage rate. A key difference is that it doesn’t use possessions, so we don’t know how many possessions a team had while a given player was on the pitch. This is just a limitation of the data used. Kieran Doyle, writing for American Soccer Analysis, addresses this limitation by only including possessions when that player was on the pitch. Termed “action usage %”, this is much closer to the NBA’s metric.
But if we really want to get a solid look at the possession/playmaking responsibility a player has in a football team, we need to look at more than just actions that end possessions. Because, in theory, a key playmaker who is also quite good at completing passes would have a low usage rate since the overwhelming majority of possession-ending actions in football are incomplete passes.
So, for this GYSR edition, I coded usage rate and something I’m calling “involvement rate”. Rather than only looking at the actions that end possessions, involvement rate is the number of possessions a player is involved in, in any way (recovery, complete pass, incomplete pass, shot, etc.), divided by the total possessions their team had when the player was on the pitch.
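As a quick illustration of the two definitions, here is a small Python sketch. The function names and all of the numbers are hypothetical, chosen only to show how a tidy playmaker can have a low usage rate but a high involvement rate.

```python
def usage_rate(possessions_ended: int, team_possessions_on_pitch: int) -> float:
    """Share of the team's on-pitch possessions that the player ended."""
    if team_possessions_on_pitch == 0:
        return 0.0
    return possessions_ended / team_possessions_on_pitch


def involvement_rate(possessions_involved: int, team_possessions_on_pitch: int) -> float:
    """Share of the team's on-pitch possessions the player touched in any way."""
    if team_possessions_on_pitch == 0:
        return 0.0
    return possessions_involved / team_possessions_on_pitch


# Made-up playmaker: involved in 180 of 400 possessions, ended only 20 of them.
playmaker = (usage_rate(20, 400), involvement_rate(180, 400))

# Made-up striker: involved in 80 of 400 possessions, ended 60 of them.
striker = (usage_rate(60, 400), involvement_rate(80, 400))

print(f"playmaker: usage {playmaker[0]:.0%}, involvement {playmaker[1]:.0%}")
print(f"striker:   usage {striker[0]:.0%}, involvement {striker[1]:.0%}")
```

Both rates share the same denominator, so the only difference is whether a possession counts for the player when they end it or when they appear in it at all.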
I also only consider possessions with at least 2 passes. The reason for this is that I am using raw event data from Opta that doesn’t tag unique possessions. I had to code possessions myself (and I compared them to official Opta possession-sequence data that I did have access to for some games, and it matches 90%+ of the time so I think it’s solid), and can’t consider phases where neither team actually had possession. Which, in League Two, does happen a bit. For example, if a team cleared the ball, then the opponent headed it right back to that team, who headed it back to the opponent, etc… this is a phase where neither team has possession and I need to exclude it.
So I settled on including possessions with at least one complete pass and then a further pass, complete or incomplete. These can have some issues too, but I think requiring 2 complete passes could bias the data.
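That rule can be written as a one-line check. This is only a sketch of the rule as stated, assuming each possession has already been reduced to an ordered list of pass outcomes; the function name and the input format are assumptions, not the actual code behind this article.

```python
def counts_as_possession(pass_outcomes: list[bool]) -> bool:
    """Return True if a possession meets the inclusion rule.

    pass_outcomes holds one entry per pass, in order:
    True for a complete pass, False for an incomplete one.
    """
    # A complete pass has to be followed by at least one more pass,
    # so the qualifying complete pass cannot be the last in the sequence.
    return any(pass_outcomes[:-1])


print(counts_as_possession([True, False]))  # complete, then incomplete
print(counts_as_possession([True]))         # a single complete pass
print(counts_as_possession([False, True]))  # no complete pass followed by another
```

A stricter rule of two complete passes would be `sum(pass_outcomes) >= 2`, which throws out the complete-then-incomplete case. Those are exactly the possessions that end on a misplaced pass, so dropping them would remove many of the possession-ending actions that usage rate is meant to count.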
Right, so, after getting possessions on every event, it’s easy to find the possessions a player has been involved with. It’s also easy to find the possessions they ended, so we can calculate both the usage rate and involvement rate at the same time. The final step is using starting lineup and substitution times to iterate through every player and get the number of possessions their team had when they were on the pitch.
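Here is a rough Python sketch of that last step for a single player in a single match. Everything in it is a simplifying assumption for illustration: the class and field names are hypothetical, each possession is given one timestamp, and the player has a single stint on the pitch. The real Opta event data is far richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Possession:
    pid: int        # possession id (hypothetical field)
    team: str
    minute: float   # match minute at which the possession started


@dataclass(frozen=True)
class Event:
    pid: int                # possession the event belongs to
    player: str
    ends_possession: bool   # True if this action ended the possession


def player_rates(player, team, on_minute, off_minute, possessions, events):
    """Return (usage_rate, involvement_rate) for one player in one match."""
    # Possessions the player's team had while the player was on the pitch.
    on_pitch = {p.pid for p in possessions
                if p.team == team and on_minute <= p.minute < off_minute}
    if not on_pitch:
        return 0.0, 0.0

    # Use sets so a possession is counted once, however many
    # touches the player had within it.
    involved = {e.pid for e in events
                if e.player == player and e.pid in on_pitch}
    ended = {e.pid for e in events
             if e.player == player and e.ends_possession and e.pid in on_pitch}
    return len(ended) / len(on_pitch), len(involved) / len(on_pitch)


# Made-up match: player "A" starts and is substituted off in minute 70.
possessions = [Possession(1, "U", 5), Possession(2, "U", 20),
               Possession(3, "Opp", 30), Possession(4, "U", 50),
               Possession(5, "U", 80)]
events = [Event(1, "A", False), Event(1, "B", True),
          Event(2, "A", False), Event(2, "A", True),
          Event(4, "B", True), Event(5, "B", True)]

usage, involvement = player_rates("A", "U", 0, 70, possessions, events)
print(f"usage {usage:.2f}, involvement {involvement:.2f}")
```

For a full season, the same counts would be summed across matches before dividing, so that a player’s rate is weighted by how many possessions their team had in each game.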
And one last note… usage rate will be heavily (and I mean heavily) skewed towards attackers in football. Maybe some centerbacks or goalkeepers who are instructed to kick it long every single time they touch the ball as well. And the involvement rate will likely be skewed to midfielders, but will almost certainly lead to strikers having low numbers. It’s important to see raw numbers, but that could lead to some important players in specific positions being “hidden”.
The positions we’ll see in this article are generated by a model I built, rather than just a generalised “he’s a full back” team-sheet position. The reason is that, particularly with a project like this one on usage and involvement rate, we need to ensure players are compared appropriately based on the opportunity they have.
I use the same event data to classify a player’s position based on where they tend to have actions on the pitch. Please see a Twitter thread I made a while back with more info. Almost every player will be the position we expect, like a FB, winger, CM, etc. But there might be a fullback that’s classified as a winger because they tend to push up very high and are operating in winger areas, passing like a winger, maybe cutting inside like a winger, and shooting like a winger. Or maybe a FB who is deep and narrow, almost resembling a CB. I’m sorry if this is a bit complex, but I have a problem of spending too much time on explaining my methodologies for things like this… bear with me!
So, let’s finally see some data! All data here has a minimum threshold of 540 minutes played, or 6 full matches. When looking at rates, we need to weed out some low-minute players.