arstechnica.com

March Madness: A few statistical tips could give you an edge

Crafting the perfect bracket is nearly impossible, but there are solid strategies to improve your odds

March Madness is upon us yet again as the NCAA basketball tournament kicks off in earnest: 68 teams facing off in a series of matchups until only one champion is left standing. Millions of sports fans take part by creating their own bracket predictions. The challenge is that there are so many variables and potential outcomes that it's almost impossible to devise the perfect bracket, correctly calling the outcome of every matchup. Online advice abounds, from amateur enthusiasts to professional sports analysts at media giants. (CBS SportsLine's bracket projection model, for instance, simulates every game in the tournament 10,000 times to optimize the accuracy of its predictions.)

But what if you're just looking to, say, beat the office pool, or you don't want to simply blindly follow the seedings or predictions of those elaborate models? What simpler strategies might one employ to gain an edge in a crowded field? Albert Cohen of Michigan State University, who specializes in statistics and actuarial science, including sports analytics, isn't a gambler himself. (Someone who "looks at life based on risk" is understandably rather risk averse when it comes to gambling.) But he did offer Ars some insight into the science of "bracketology," along with a few handy general tips.

Ars Technica: There are so many different possible brackets. What are the actual odds of someone picking the perfect one?

Albert Cohen: If you look at 64 teams, you're going to have 63 games. How come? Well, it's a geometric series. You have 32 plus 16 plus 8 plus 4 plus 2 plus 1. That's 63 games, or 63 coin flips. There are going to be 63 teams that lost, and you can map each of their losses to a game. There's one team that never lost. If you write that as a formula, the number of possible brackets is 2^(N-1) (in this case 2^(64-1), or 2^63), which is about 9.2 quintillion, or roughly 9 followed by 18 zeros. With complete randomness, you'd have a 1 in 9.2 quintillion chance of picking the perfect NCAA bracket.
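The counting argument above can be checked in a few lines. This is only an illustrative sketch: it assumes a 64-team single-elimination field (ignoring the play-in games) and treats every game as a fair coin flip. The variable names are ours, not Cohen's.

```python
# Count games and possible brackets for a single-elimination field.
# Assumption: 64 teams, play-in games ignored, every game a 50/50 coin flip.
teams = 64

games_per_round = []
remaining = teams
while remaining > 1:
    remaining //= 2              # half of the remaining teams are eliminated
    games_per_round.append(remaining)

total_games = sum(games_per_round)   # geometric series 32 + 16 + ... + 1
total_brackets = 2 ** total_games    # two possible outcomes per game

print(games_per_round)               # games played in each round
print(total_games)                   # one loss per eliminated team
print(f"{total_brackets:,}")         # number of distinct brackets
print(f"{1 / total_brackets:.3e}")   # chance a random bracket is perfect
```

The total number of games always equals the number of teams minus one, since every team except the champion loses exactly once.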

Obviously, each of those coin flips isn't 50/50; they're more like 1 out of 140 for the 16 versus 1s, for instance. If you're looking at a 16 seed beating a 1 seed, I think that's happened once out of 140 games. I'm more of a probabilist than anything else, a mathematician. But a statistician would say, "Oh, maximum likelihood, the parameter to fit for that coin flip would be P, the probability of a heads is 1 out of 140."
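The maximum likelihood estimate Cohen mentions is simply the observed fraction of upsets. The sketch below uses the figures as he recalled them in the interview (one upset in 140 games); those numbers are his approximation and are not verified here. The function name and the independence assumption in the second calculation are ours.

```python
def bernoulli_mle(successes: int, trials: int) -> float:
    """Maximum likelihood estimate of p for a biased coin: successes / trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


# Figures as recalled in the interview, not verified.
upsets, games = 1, 140
p_upset = bernoulli_mle(upsets, games)

# Assumption: the four 1-vs-16 games in a tournament are independent.
p_all_four_top_seeds_advance = (1 - p_upset) ** 4

print(f"Estimated chance a 16 seed beats a 1 seed: {p_upset:.4f}")
print(f"Chance all four 1 seeds advance: {p_all_four_top_seeds_advance:.4f}")
```

Replacing a fair coin with an estimate like this for each matchup is what makes a sensible bracket far more likely than the pure coin-flip odds suggest.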

One thing I'm interested in research-wise is team formation. My background is in quantitative finance and mathematical finance specifically, so we're interested in valuation. What's this thing worth? What's this call option worth? What's this annuity worth? What are the risks underneath it? And so for valuation, it's swapping random play for a fixed contract.

Ars Technica: So what should one look at in a basketball team?

Albert Cohen: I think it's similar to soccer. In soccer, you see passing networks within, and you see that evolve over time. I think that you're going to see that centrality in a basketball team. You're going to see these leaders pop up, like Oakland last year, or St. Peter's. These teams are the underdogs, but if you paid attention, you'd maybe see some of those seeds were already there. So those are the numbers I'm interested in. How many people have gotten a perfect first-round bracket? I think that folks focus on the extremes, but getting 32 coin flips right—it's happened. I'm interested in these derivative events. I think sports is ripe for this kind of thinking.

If a team is undervalued, they might be more likely to take risks because they have nothing to lose in a one-and-done situation like the NCAA tournament. If you've got a team that's bold, understands each other, is willing to take risks but also holds each other accountable, I think that's a dangerous team.

Ars Technica: Poker is sometimes described as a game of incomplete information. It strikes me that there's an element of that here. People do have some information when they're making their picks. They've been following the teams, they know the seed rankings, they have a sense of who the players are, the team strengths and weaknesses. How does that factor into the probabilities?

Albert Cohen: That's a great question. The biggest probability in this case would be the S curve or the logistic curve, which you could use in what's called the Elo ranking, created by a Hungarian physicist [Arpad Elo]. The question he had was, "How do I figure out the probability of two ranked opponents that maybe have never met before? What's the probability of A winning over B?" So you look at the ranking of A and the ranking of B, and you look at the difference of those rankings, and you put that in the logistic function modified by a constant, and that's your prediction.

Why am I mentioning this? It's the recipe, and the ingredients are those rankings. You have the inputs of these rankings, and the difference of the rankings is what you put in the exponent. Are the seedings related to some kind of ranking? Do you agree with the seedings? That's where I think the edge could be. Do you have a better ranking system or one that you think has better information? Have you watched these teams more closely? Bryant versus Michigan State—how many people have been watching Bryant, a 15 seed, very closely?
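The recipe described above, a logistic function of the rating difference, can be sketched as follows. The interview only says the difference is "modified by a constant"; the base-10 form with a scale of 400 used here is the common chess convention and is an assumption on our part, as are the example ratings, which are hypothetical and not real team ratings.

```python
def elo_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under a logistic (Elo-style) model.

    Assumption: base-10 logistic curve with scale 400, as in chess ratings.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Hypothetical ratings for illustration only.
favorite, underdog = 1900.0, 1500.0

p_favorite = elo_win_probability(favorite, underdog)
p_underdog = elo_win_probability(underdog, favorite)

print(f"Favorite wins: {p_favorite:.3f}")
print(f"Underdog wins: {p_underdog:.3f}")
```

Only the difference between the two ratings matters, so anyone who believes their own ratings are better than the seedings can plug them into the same function and compare the resulting probabilities.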

When you make these kinds of bets one at a time, that little edge of information doesn't matter. But when you look at investment theory, it's leverage, or in poker, when you're having a higher wager—that little extra edge multiplied by a high wager, that's where you start to see the results coming back in your favor. Making one bet at a slight little edge is not going to have a huge effect on someone's bracket.

Ars Technica: Why do you advise people to look at historical matchups when making their choices?

Albert Cohen: The fact that you have two teams that haven't met each other, could you find a proxy for that team? This is why pro sports teams have scout teams. You're going to simulate your opponent. Has there been a team like Bryant that MSU has faced? Probably more likely in the preseason, I would guess, but still, did they have any trouble during a certain stretch of the time? Was that stretch closer toward the end of the game where things are really chaotic? That's what I would probably look for if I were to do a deeper dive.

A player like Steve Nash really punched above his weight. But he was the NBA's most valuable player for multiple years because he made everybody around him better. Have you noticed a player on a smaller team that you think could be an upset, who is that rock driving a lot of the performance share? That's what I would look for to reseed the rankings. That might give you an edge over someone just following the seedings. I'd also consider filling out multiple brackets because maybe one of them has a chance.

Ars Technica: How likely is the number 1 or number 2 seed to win the whole tournament? Did you actually look at that statistically?

Albert Cohen: I don't have an answer, but I have the same question. I've been thinking about this because as a Canucks ice hockey fan in Vancouver, we lost against the Boston Bruins, and the next year, we were the number 1 seed. We won the Presidents' Trophy. We were number 1 in the league going up against the number 8 LA Kings in a seven-game series for the Stanley Cup, which we lost. I think a one-and-done top seed is probably more likely to win than a long slog like the Stanley Cup playoffs.

[***Note***: *Per Wikipedia, historically, there have been ten years when two number 1 seeds have faced off in the championship game and 18 instances since 1985 when at least one number 1 seed reached the championship game. There have only been eight years when no number 1 seed made it to the final round. Make of that what you will.*]
