Distributed Power Rating: How I Built The Athletic’s NFL Predictions Model using team strengths an

This NFL season I’m going to be releasing a suite of numbers on a weekly basis, including updated team strengths, weekly game projections, and season-long playoff and Super Bowl odds. These are all based on a novel framework I’ve created to discern which NFL teams are contenders and which are pretenders. I’m calling this framework and its associated metrics Distributed Power Rating, or DPR. I’m very excited about this model and think it can go a long way in advancing the public discourse on team strengths. Because this isn’t my first time releasing a new metric, I’ve gone ahead and compiled some responses to questions I am sure to be asked. If I’ve left off something you’d like to know, please feel free to leave a comment below or tweet at me.

What is Distributed Power Rating (DPR)?

DPR is simply an estimate of how good each team is, as measured by how much we’d favor them to beat an average team on a neutral field. Each team has a DPR for their individual units (Passing Offense, Passing Defense, Rushing Offense, Rushing Defense, and Special Teams), hence the “Distributed” in the title, which combine to form one total rating.

How are you estimating team strength?

DPR leverages two key algorithms:

A decades-old rating system known as Elo, named after Arpad Elo, its creator.

A modern NFL Expected Points model provided by nflfastR.

Blending the granularity and credit distribution that an Expected Points framework provides with the simplicity and natural opponent adjustments offered by Elo ratings proves to be an exceptional way to capture and appropriately weight team performance. Let me explain.

Through a simple look at team record, scoring differential, or recent Lombardi Trophies, we know the Super Bowl LV champions, the Tampa Bay Buccaneers, are a good team. However, what specifically makes the Bucs good? Is it their rushing attack? Their staunch passing defense? These are all things that dedicated fans innately pick up on, but they can be hard to tell apart from box scores. Expected Points gives us a framework through which we can start to distribute credit to a team’s units for their success.

Further, football is a unique sport in that while two teams always play head-to-head, the majority of players on one team’s roster will rarely go up against the majority of players on the opposing team’s roster. While you may have seen Super Bowl LV branded as “Mahomes vs Brady,” those two players were not truly competing against each other; rather, each was competing against the opposing defense.

We can use Expected Points to leverage this concept and better account for opponent strength. A defense in 2020 smothering the Chiefs’ explosive passing unit is a much different proposition than a defense halting their rather pedestrian rushing attack. How I systematically account for that difference and update team ratings accordingly is where Elo comes into play.

In my framework, each team has six units, each with its own Elo rating, and every game each unit faces its counterpart on the opposing team: passing offenses vs. passing defenses, rushing offenses vs. rushing defenses, and special teams coverage vs. special teams kicking/punting. Each game, these units are assigned a “win” or a “loss” and a “point margin,” determined by the Expected Points Added by that unit over the course of the game, which allows their ratings to be updated through the traditional Elo framework.
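To make the mechanics concrete, here is a minimal Python sketch of a single offense-vs-defense Elo update driven by Expected Points Added. It is not the actual DPR code: the K-factor of 20, the logarithmic margin multiplier, and the function names are all assumptions chosen for illustration. Only the win expectancy formula is the standard Elo one.

```python
import math


def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo win expectancy for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def update_matchup(off_rating: float, def_rating: float,
                   offense_epa: float, k: float = 20.0) -> tuple[float, float]:
    """Update one offense-vs-defense matchup from the offense's EPA in a game.

    Positive EPA counts as a win for the offense, negative EPA as a win for
    the defense. `k` and the margin multiplier are illustrative guesses.
    """
    if offense_epa > 0:
        outcome = 1.0
    elif offense_epa < 0:
        outcome = 0.0
    else:
        outcome = 0.5
    # Bigger EPA margins move the ratings more, with diminishing returns.
    margin_multiplier = 1.0 + math.log1p(abs(offense_epa))
    surprise = outcome - expected_score(off_rating, def_rating)
    delta = k * margin_multiplier * surprise
    # Zero-sum: whatever the offense gains, the defense loses.
    return off_rating + delta, def_rating - delta


# Two average units; the passing offense finishes the game at +5 EPA.
new_off, new_def = update_matchup(1500.0, 1500.0, offense_epa=5.0)
```

Because the update is zero-sum, an offense that beats a highly rated defense gains more than one that beats a weak defense, which is the "natural opponent adjustment" Elo brings to the table.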

Before the Expected Points are summed up for each unit, some modifications are made. Certain aspects of a team’s performance are more indicative of future performance than others, so Expected Points are weighted at the play level to add more predictive value. For example, early-down passes are weighted more heavily than late-down passes, fumbles on sacks more heavily than fumbles on rushing plays, and punt/kickoff plays more heavily than field goal attempts.
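As a rough illustration of play-level weighting, the sketch below sums weighted EPA per unit. The numeric weights, the category labels, and the `unit`/`category`/`epa` field names are hypothetical; only the ordering of the weights (early-down passes above late-down passes, and so on) comes from the description above.

```python
from collections import defaultdict

# Hypothetical weights; only their relative ordering follows the text.
PLAY_WEIGHTS = {
    "early_down_pass": 1.0,
    "late_down_pass": 0.7,
    "sack_fumble": 1.0,
    "rush_fumble": 0.4,
    "punt": 1.0,
    "kickoff": 1.0,
    "field_goal": 0.5,
}


def weighted_unit_epa(plays):
    """Sum weighted EPA per unit; unknown categories get a weight of 1.0."""
    totals = defaultdict(float)
    for play in plays:
        weight = PLAY_WEIGHTS.get(play["category"], 1.0)
        totals[play["unit"]] += weight * play["epa"]
    return dict(totals)


plays = [
    {"unit": "pass_offense", "category": "early_down_pass", "epa": 0.8},
    {"unit": "pass_offense", "category": "late_down_pass", "epa": 0.8},
    {"unit": "rush_offense", "category": "rush_fumble", "epa": -3.0},
]
totals = weighted_unit_epa(plays)
```

Here the two passes have identical raw EPA, but the late-down pass contributes less to the passing offense’s game total, and the rushing fumble is heavily discounted.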

At the highest level, that’s all DPR is doing: distributing team performance by unit, weighting that performance, and then using that performance to update unit ratings with Elo. I then linearly combine these unit ratings to form one team strength rating, weighting each unit’s strength by how well it predicts eventual game outcomes.
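The final linear combination can be sketched in a few lines of Python. The weights below are made up for illustration (in practice they would be fit against game outcomes), and the 1500 average and unit names are assumptions as well.

```python
# Hypothetical weights: points of team strength per Elo point above average.
UNIT_WEIGHTS = {
    "pass_offense": 0.012,
    "pass_defense": 0.008,
    "rush_offense": 0.004,
    "rush_defense": 0.003,
    "st_kicking": 0.002,
    "st_coverage": 0.002,
}


def total_rating(unit_elos, average=1500.0):
    """Weighted sum of each unit's Elo distance from the league average."""
    return sum(UNIT_WEIGHTS[unit] * (elo - average)
               for unit, elo in unit_elos.items())


average_team = {unit: 1500.0 for unit in UNIT_WEIGHTS}
great_passing_team = {**average_team, "pass_offense": 1600.0}
rating = total_rating(great_passing_team)
```

An entirely average team rates 0.0 under this scheme, and the passing-heavy team rates about 1.2 points better than average.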

How do you know your estimates are any good?

Great question. We never really know if our models are any good. The NFL is full of small sample sizes (we only see 256, now 272, regular-season games per season), and the landscape of the league is constantly changing. I’ve decided to assess how good my team strength rating system is by using it to predict upcoming games and measuring how far off my point prediction is from the final game result. Specifically, I’ve set aside the 2016, 2017, 2018, 2019, and 2020 seasons one at a time, letting the model learn from every prior season before assessing how well it performs on the unseen data (e.g. to get my 2018 error, the model is trained on 1999-2017).
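This kind of season-by-season holdout can be sketched as follows. The `fit` and `predict` callables stand in for whatever model is being evaluated, the toy model here just predicts the average past margin, and using mean absolute error as the metric is an assumption on my part.

```python
from statistics import mean


def walk_forward_mae(games, holdout_seasons, fit, predict):
    """For each holdout season, train on all earlier seasons and score it."""
    errors = {}
    for season in holdout_seasons:
        train = [g for g in games if g["season"] < season]
        test = [g for g in games if g["season"] == season]
        model = fit(train)
        errors[season] = mean(abs(predict(model, g) - g["margin"])
                              for g in test)
    return errors


# Toy stand-in model: always predict the average home margin seen so far.
def fit_mean(train):
    return mean(g["margin"] for g in train)


def predict_mean(model, game):
    return model


games = [
    {"season": 2015, "margin": 3}, {"season": 2015, "margin": 7},
    {"season": 2016, "margin": 1}, {"season": 2016, "margin": 9},
    {"season": 2017, "margin": 5},
]
errors = walk_forward_mae(games, [2016, 2017], fit_mean, predict_mean)
```

The key property is that the model never sees a game from the season it is being scored on, or from any later season.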

Here’s how that looks in comparison to all of the public models available at thepredictiontracker.com.

Chart showing how well DPR performs compared to other public models. DPR has the lowest error with the exception of the closing lines

Each line here represents a public team strength rating system. I’ve specifically highlighted a few important benchmarks. The dashed black line is a team’s Pythagorean rating, a simple algorithm that turns a team’s point differential (points scored – points allowed) into a predictive team strength rating. The solid black line is the betting market’s closing spread. My team strength rating, DPR, is the bolded blue line. Averaging the error from 2016 through 2020, my model beats every public rating system on the chart with the exception of the betting market closing lines, which my model actually beats in 2019 and nearly matches in 2020. While anything can happen in 2021, the above gives me confidence that my model will have strong results moving forward.

How do you use these estimates to make projections?

For each upcoming game, we have 12 team strength ratings, six for each team. I find the difference in ratings between the two units set to face off, e.g. the difference between the home team’s offensive pass DPR and the away team’s defensive pass DPR. These differences, along with contextual information about the game such as how much rest each team has had and who has the home-field advantage, are fed into a new model to predict the score of the upcoming game.
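Building the inputs for that second model might look something like the sketch below. The unit names, feature names, and the `neutral_site` flag are hypothetical, and the downstream score model itself is not shown because its form isn’t described here.

```python
# Each attacking unit is paired with the opposing unit it lines up against.
UNIT_MATCHUPS = [
    ("pass_offense", "pass_defense"),
    ("rush_offense", "rush_defense"),
    ("st_kicking", "st_coverage"),
]


def matchup_features(home, away, home_rest_days, away_rest_days,
                     neutral_site=False):
    """Turn 12 unit ratings plus game context into model inputs."""
    features = {}
    for attack, counter in UNIT_MATCHUPS:
        features[f"home_{attack}_edge"] = home[attack] - away[counter]
        features[f"away_{attack}_edge"] = away[attack] - home[counter]
    features["rest_edge"] = home_rest_days - away_rest_days
    features["home_field"] = 0 if neutral_site else 1
    return features


units = [unit for pair in UNIT_MATCHUPS for unit in pair]
home = {**{u: 1500.0 for u in units}, "pass_offense": 1600.0}
away = {u: 1500.0 for u in units}
features = matchup_features(home, away, home_rest_days=7, away_rest_days=10)
```

The six rating differences cover every unit on both teams exactly once, so nothing is double counted before the context features are added.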

This model also allows me to “simulate” seasons, forming season-long projections. Specifically, I run 100,000 Monte Carlo simulations, adding uncertainty to my game predictions as the season progresses to account for the fact that our initial team strength ratings are not likely to hold up by Week 18.
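A stripped-down version of that simulation loop is sketched below. The normal distribution for game margins, the 13-point base standard deviation, the per-week growth in uncertainty, and all names are assumptions for illustration, and the example uses far fewer than 100,000 runs to keep it quick.

```python
import random


def simulate_mean_wins(schedule, n_sims=10_000, base_sd=13.0,
                       sd_growth=0.1, seed=0):
    """Average simulated wins per team over `n_sims` simulated seasons."""
    rng = random.Random(seed)
    wins = {}
    for game in schedule:
        wins.setdefault(game["home"], 0)
        wins.setdefault(game["away"], 0)
    for _ in range(n_sims):
        for game in schedule:
            # Later weeks get a wider spread of outcomes around the forecast.
            sd = base_sd + sd_growth * (game["week"] - 1)
            margin = rng.gauss(game["pred_home_margin"], sd)
            winner = game["home"] if margin > 0 else game["away"]
            wins[winner] += 1
    return {team: count / n_sims for team, count in wins.items()}


schedule = [
    {"week": 1, "home": "A", "away": "B", "pred_home_margin": 7.0},
    {"week": 18, "home": "A", "away": "B", "pred_home_margin": 7.0},
]
mean_wins = simulate_mean_wins(schedule, n_sims=2_000)
```

A fuller version would also update team ratings within each simulated season and track playoff seeding, but the core idea is the same: draw many plausible seasons and count how often each outcome occurs.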

How do you account for roster turnover & injury?

I don’t! At least, not directly. I do take general roster turnover into account by regressing each unit’s rating a little closer to the mean between seasons, varying the amount of that regression by unit. For instance, a pass defense DPR will be pulled closer to average between seasons than a pass offense DPR, because pass defense performance has historically varied more from season to season. However, I don’t account for specific roster turnover on any particular team. This means that the Jaguars rolling out Trevor Lawrence in Week 1 is not reflected in their initial ratings for the season, but the general fact that some teams may be bringing in new talent at QB is accounted for. This may cause some small but notable discrepancies between my model and the betting markets early in the season, and is definitely a source of bias. However, my framework is built to update quickly: if Trevor Lawrence ends up being a big upgrade to the Jaguars’ passing offense once he starts taking snaps, their passing offense Elo rating will update to reflect that. As you can see in the chart above, the team strength ratings are valid predictive tools despite this limitation.
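The between-season regression can be sketched as below. The regression fractions are invented for illustration; the only property taken from the description above is that pass defense is pulled toward the mean more than pass offense.

```python
# Hypothetical share of each unit's distance from average that is removed
# between seasons.
REGRESSION_FRACTION = {
    "pass_offense": 0.25,
    "pass_defense": 0.45,
}


def regress_to_mean(ratings, average=1500.0):
    """Pull each unit rating part of the way back toward the league average."""
    return {
        unit: average + (1.0 - REGRESSION_FRACTION[unit]) * (rating - average)
        for unit, rating in ratings.items()
    }


preseason = regress_to_mean({"pass_offense": 1600.0, "pass_defense": 1600.0})
```

Both units finished last season 100 points above average, but the pass defense starts the new season closer to 1500 than the pass offense does.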

Why have you excluded betting market information?

Incorporating information from betting markets is surely the fastest way to improve your model accuracy. However, my goal here isn’t just accuracy, it’s to use these ratings to tell a story. How are teams progressing throughout the season? Which units are strongest, and which contenders have some key weaknesses? How does on-field performance differ from the betting markets, and what might that tell us?

Including betting market data (whose inputs we don’t have access to) muddles our ability to tell a clear story. Plus, I enjoyed the challenge of achieving high accuracy without it!

Why do you hate my team?

For fans of 31/32 teams, good news, I don’t hate your team! For the remainder of you, I’ve still got good news! I may hate one team, but DPR doesn’t know which one. The model is, well, just that — a statistical model. It doesn’t have emotions, it doesn’t hate, love, or even feel anything about any team at all! All it does is take numbers, perform calculations, and output more numbers. Kind of a sad existence, when you think about it.

In fact, I’ll often personally disagree with what the model outputs, and there will likely be many scenarios where it would behoove you to believe yourself over what these silly ratings say. That said, I’m doubtful of my ability to discern team strengths at the level of precision necessary to match the performance of closing point spreads, and thus I often favor the model over my own intuition.

Why should I care?

Surprisingly, not everyone is keen to geek out on amalgams of accurate algorithms. There are, however, a few reasons why you, a curious reader, should care about this specific amalgam of accurate algorithms. For one, it’s possibly the most accurate public team strength model on the internet, so you can leverage it to help you win office pools, impress your friends, or bet. Two, we can actually use outputs from this model to learn a lot about the game itself, like how winning the passing game is key to sustained NFL success. And lastly, and most certainly most importantly, you can and should use it to settle arguments on the internet about which team is actually better.

(Photo of Kyler Murray: Mike Ehrmann / Getty Images)
