Appendix A — NFL Analytics Quick Reference Guide

A.1 Air Yards

Air yards is the measure that the ball travels through the air, from the line of scrimmage, to the exact point where the wide receivers catches, or does not catch, the football. It does not take into consideration the amount of yardage gained after the catch by the wide receiver (which would be yards after catch).

For an example, please see the below illustration. In it, the line of scrimmage is at the 20-yardline. The QB completes a pass that is caught at midfield (the 50-yardline). After catching the football, the wide receiver is able to advance the ball down to the opposing 30-yardline before getting tackled. First and foremost, the quarterback is credited with a total of 50 passing yards on the play, while the wide receiver is credited with the same.

However, because air yards is a better metric to explore a QB’s true impact on a play, he is credited with 30 air yards while the wide receiver is credited with 20 yards after catch.

In the end, quarterbacks with higher air yards per attempt are generally assumed to be throwing the ball deeper downfield than QBs with lower air yards per attempt.

Visual representation of air yards

There are multiple ways to collect data pertaining to air yards. However, the most straightforward way is to use load_player_stats:

data <- nflreadr::load_player_stats(2021)

air.yards <- data %>%
  filter(season_type == "REG") %>%
  group_by(player_id) %>%
  summarize(
    attempts = sum(attempts),
    name = first(player_name),
    air.yards = sum(passing_air_yards),
    avg.ay = mean(passing_air_yards)) %>%
  filter(attempts >= 100) %>%
  select(name, air.yards, avg.ay) %>%
  arrange(-air.yards)

air.yards
# A tibble: 42 x 3
   name       air.yards avg.ay
   <chr>          <dbl>  <dbl>
 1 T.Brady         5821   342.
 2 J.Allen         5295   311.
 3 M.Stafford      5094   300.
 4 D.Carr          5084   299.
 5 J.Herbert       5069   298.
 6 P.Mahomes       4825   284.
 7 T.Lawrence      4732   278.
 8 D.Prescott      4612   288.
 9 K.Cousins       4575   286.
10 J.Burrow        4225   264.
# i 32 more rows

In the above example, we can see that Tom Brady led the NFL during the 2021 regular season with a comined total of 5,821 air yards which works out to an average of 342 air yards per game.

A.2 Air Yards Share

A receiving statistics, air yards share is the measure of a player’s share of the team’s total air yards in a game/season. This metric can be found using load_player_stats().

nfl_stats <- nflreadr::load_player_stats()

total_ay_share <- nfl_stats %>%
  filter(position == "WR") %>%
  group_by(player_name) %>%
  summarize(
    total_rec = sum(receptions, na.rm = TRUE),
    ay_share = sum(air_yards_share, na.rm = TRUE)) %>%
  filter(total_rec >= 100) %>%
  arrange(-ay_share) %>%
  slice(1:10)

total_ay_share
# A tibble: 10 x 3
   player_name total_rec ay_share
   <chr>           <int>    <dbl>
 1 D.Adams           103     7.61
 2 D.Moore           101     7.53
 3 A.Brown           106     7.10
 4 T.Hill            119     7.04
 5 C.Lamb            135     6.14
 6 S.Diggs           107     5.77
 7 J.Chase           100     5.67
 8 P.Nacua           105     5.56
 9 K.Allen           108     5.42
10 M.Pittman         109     5.42

A.3 Average Cushion

The average cushion measures the distance, in yards, between a WR/TE and the defender lined up against them at the line of scrimmage. This metric is included in the load_nextgen_stats() function.

nextgen_stats <- nflreadr::load_nextgen_stats(stat_type = "receiving")

wr_cushion <- nextgen_stats %>%
  filter(week == 0 & season == 2022 & receptions >= 100) %>%
  select(player_display_name, avg_cushion) %>%
  arrange(-avg_cushion) %>%
  slice(1:10)

wr_cushion
# A tibble: 8 x 2
  player_display_name avg_cushion
  <chr>                     <dbl>
1 Chris Godwin               6.68
2 CeeDee Lamb                6.56
3 Amon-Ra St. Brown          6.52
4 Tyreek Hill                6.38
5 Travis Kelce               6.28
6 Davante Adams              5.55
7 Justin Jefferson           5.43
8 Stefon Diggs               5.36

A.4 Average Separation

Average separation measures the distance (in yards) between the receivers and the nearest defender at the time of catch/incompletion.

nextgen_stats <- nflreadr::load_nextgen_stats(stat_type = "receiving")

wr_separation <- nextgen_stats %>%
  filter(week == 0 & season == 2022 & receptions >= 100) %>%
  select(player_display_name, avg_separation) %>%
  arrange(-avg_separation) %>%
  slice(1:10)

wr_separation
# A tibble: 8 x 2
  player_display_name avg_separation
  <chr>                        <dbl>
1 Tyreek Hill                   3.31
2 Amon-Ra St. Brown             3.10
3 Justin Jefferson              3.09
4 CeeDee Lamb                   3.07
5 Chris Godwin                  2.98
6 Davante Adams                 2.95
7 Travis Kelce                  2.88
8 Stefon Diggs                  2.83

A.5 Average Depth of Target

As mentioned above, a QB’s air yards per attempt can highlight whether or not he is attempting to push the ball deeper down field than his counterparts. The official name of this is Average Depth of Target (or ADOT). We can easily generate this statistic using the load_player_stats function within nflreader:

data <- nflreadr::load_player_stats(2021)

adot <- data %>%
  filter(season_type == "REG") %>%
  group_by(player_id) %>%
  summarize(
    name = first(player_name),
    attempts = sum(attempts),
    air.yards = sum(passing_air_yards),
    adot = air.yards / attempts) %>%
  filter(attempts >= 100) %>%
  arrange(-adot)

adot
# A tibble: 42 x 5
   player_id  name       attempts air.yards  adot
   <chr>      <chr>         <int>     <dbl> <dbl>
 1 00-0035704 D.Lock          111      1117 10.1 
 2 00-0029263 R.Wilson        400      3955  9.89
 3 00-0036945 J.Fields        270      2636  9.76
 4 00-0034796 L.Jackson       382      3531  9.24
 5 00-0036389 J.Hurts         432      3882  8.99
 6 00-0034855 B.Mayfield      418      3651  8.73
 7 00-0026498 M.Stafford      601      5094  8.48
 8 00-0031503 J.Winston       161      1340  8.32
 9 00-0034857 J.Allen         646      5295  8.20
10 00-0029604 K.Cousins       561      4575  8.16
# i 32 more rows

As seen in the results, if we ignore Drew Lock’s 10.1 ADOT on just 111 attempts during the 2021 regular season, Russell Wilson attempted to push the ball, on average, furthest downfield among QBs with at least 100 attempts.

A.6 Completion Percentage Over Expected (CPOE)

At the conclusion fo the 2016 season, Sam Bradford, the quarterback of the Minnesota Vikings, recorded the highest completion percentage in NFL history, connecting on 71.6% of his attempts during the season. However, Bradford achieved this record by averaging just 6.4 yards per attempt. Bradford’s record-breaking completion percentage is suddenly less impressive when one realizes that he was rarely attempting downfield passes.

Because of this example, we can conclude that a quarterback’s completion percentage may not tell us the whole “story.” To adjust a quarterback’s completion percentage to include such contextual inputs such as air yards, we can turn to using completion percentage over expected (CPOE). A pre-calculated metric based on historical attempts in similar situations, CPOE take into account multiple variables, including: field position, down, yards to go, total air yards, etc.

The CPOE metric in the nflverse was developed by Ben Baldwin with a further explanation of it here: nflfastR EP, WP, CP, xYAC, and xPass Models. The data is included in the load_pbp() function of nflreadR.

pbp <- nflreadr::load_pbp(2022) %>%
  filter(season_type == "REG")

cpoe_2022 <- pbp %>%
  group_by(passer) %>%
  filter(complete_pass == 1 |
           incomplete_pass == 1 |
           interception == 1,
         !is.na(down)) %>%
  summarize(total_attempts = n(),
            mean_cpoe = mean(cpoe, na.rm = TRUE)) %>%
  filter(total_attempts >= 300) %>%
  arrange(-mean_cpoe) %>%
  slice(1:10)

cpoe_2022
# A tibble: 10 x 3
   passer       total_attempts mean_cpoe
   <chr>                 <int>     <dbl>
 1 G.Smith                 571      5.68
 2 P.Mahomes               648      3.59
 3 J.Brissett              366      2.86
 4 J.Burrow                605      2.74
 5 J.Hurts                 460      2.73
 6 D.Jones                 467      2.32
 7 T.Lawrence              582      1.45
 8 T.Tagovailoa            399      1.43
 9 J.Herbert               697      1.35
10 K.Cousins               640      1.26

A.7 DAKOTA

A QB’s DAKOTA score is the adjusted EPA+CPOE composite that is based on the coefficients which best predicted the adjusted EPA/play in the prior year. The DAKOTA score is available in the load_player_stats() function within the nflverse.

nfl_stats <- nflreadr::load_player_stats()

mean_dakota <- nfl_stats %>%
  filter(position == "QB") %>%
  group_by(player_name) %>%
  summarize(
    total_cmp = sum(completions, na.rm = TRUE),
    mean_dakota = mean(dakota, na.rm = TRUE)) %>%
  filter(total_cmp >= 250) %>%
  arrange(-mean_dakota) %>%
  slice(1:10)

mean_dakota
# A tibble: 10 x 3
   player_name  total_cmp mean_dakota
   <chr>            <int>       <dbl>
 1 B.Purdy            308       0.218
 2 D.Prescott         410       0.164
 3 T.Tagovailoa       388       0.151
 4 J.Allen            385       0.147
 5 L.Jackson          307       0.134
 6 J.Love             372       0.132
 7 J.Hurts            352       0.129
 8 J.Goff             407       0.124
 9 P.Mahomes          401       0.119
10 D.Carr             375       0.116

A.8 Running Back Efficiency

A running back’s efficiency is measured by taking the total distance traveled, according to NextGen Stats, per the total of yards gained on the run. A lower number indicates a more North/South type runner, while a higher number indicates a running back that “dances” and runs laterally relevant to the line of scrimmage.

nextgen_stats <- nflreadr::load_nextgen_stats(stat_type = "rushing")

rb_efficiency <- nextgen_stats %>%
  filter(week == 0 & rush_attempts >= 300) %>%
  select(player_display_name, efficiency) %>%
  arrange(efficiency)

rb_efficiency
# A tibble: 12 x 2
   player_display_name efficiency
   <chr>                    <dbl>
 1 Jonathan Taylor           3.17
 2 Josh Jacobs               3.33
 3 Derrick Henry             3.45
 4 Ezekiel Elliott           3.49
 5 Ezekiel Elliott           3.57
 6 Derrick Henry             3.57
 7 Ezekiel Elliott           3.65
 8 Dalvin Cook               3.74
 9 Nick Chubb                3.80
10 Derrick Henry             3.85
11 Najee Harris              3.97
12 Le'Veon Bell              4.17

A.9 Expected Points Added (EPA)

Expected Points Added is a measure of how well a team/player performed on a single play against relative expectations. At its core, EPA is the difference in expected points before and after each play. Because of this, on any given play, a team has the ability to either increase or decrease the expected points, with EPA being that specific difference. Importantly, EPA includes various contextual factors into its calculation such as down, distance, position on the field, etc. Then, based on historical data, an estimation of how many points, on average, a team is expected to score on a given situation is provided.

For instance, if the Chiefs are on their own 20-yard line with a 1st and 10, the expected points might be 0.5 based on historical data. This is the expected points before the play. If Mahomes completes a 10-yard pass, and now it is 1st and 10 on their own 30-yard line, the expected points for this new situation may increase to 0.8.

Therefore, the expected points added (EPA) of that 10-yard pass would be 0.8 - 0.5 = 0.3 points. The resulting positive number indicates that the pass was beneficial and increased the team’s expected points.

The EP and EPA values are provided for each play in the play-by-play data.

ep_and_epa <- nflreadr::load_pbp(2022) %>%
  filter(season_type == "REG" & posteam == "KC") %>%
  filter(!play_type %in% c("kickoff", "no_play")) %>%
  select(posteam, down, ydstogo, desc, play_type, ep, epa)

ep_and_epa
# A tibble: 1,252 x 7
   posteam  down ydstogo desc                   play_type    ep    epa
   <chr>   <dbl>   <dbl> <chr>                  <chr>     <dbl>  <dbl>
 1 KC          1      10 (15:00) (Shotgun) 25-~ run       0.931  0.841
 2 KC          2       1 (14:27) (Shotgun) 15-~ run       1.77  -0.263
 3 KC          1      10 (13:52) 25-C.Edwards-~ run       1.51   0.454
 4 KC          2       3 (13:15) (Shotgun) 25-~ run       1.96   1.37 
 5 KC          1      10 (12:36) (Shotgun) 15-~ pass      3.34  -0.573
 6 KC          2      10 (12:30) (Shotgun) 15-~ pass      2.76   1.32 
 7 KC          1      10 (11:54) 15-P.Mahomes ~ pass      4.09  -0.178
 8 KC          2       7 (11:11) (Shotgun) 25-~ run       3.91  -0.634
 9 KC          3       7 (10:28) (Shotgun) 15-~ pass      3.28   2.03 
10 KC          1       9 (9:47) (Shotgun) 15-P~ pass      5.30  -0.565
# i 1,242 more rows

As well, the load_player_stats() function provides calculated passing_epa, rushing_epa, and receiving_epa per player on a weekly basis.

weekly_epa <- nflreadr::load_player_stats() %>%
  filter(player_display_name == "Tom Brady") %>%
  select(player_display_name, week, passing_epa)

weekly_epa
# A tibble: 0 x 3
# i 3 variables: player_display_name <chr>, week <int>,
#   passing_epa <dbl>

A.10 Expected Yards After Catch Success (xYAC Success)

xYAC Success is the probability that a play results in positive EPA (relative to where the play started) based on where the receiver caught the ball.

pbp <- nflreadr::load_pbp(2022) %>%
  filter(season_type == "REG")

xyac <- pbp %>%
  filter(!is.na(xyac_success)) %>%
  group_by(receiver) %>%
  summarize(
    completetions = sum(complete_pass == 1, na.rm = TRUE),
    mean_xyac = mean(xyac_success, na.rm = TRUE)) %>%
  filter(completetions >= 100) %>%
  arrange(-mean_xyac) %>%
  slice(1:10)

xyac
# A tibble: 8 x 3
  receiver    completetions mean_xyac
  <chr>               <int>     <dbl>
1 S.Diggs               100     0.872
2 T.Kelce               106     0.872
3 J.Jefferson           124     0.863
4 T.Hill                125     0.856
5 C.Lamb                103     0.832
6 A.St. Brown           103     0.810
7 C.Godwin              103     0.788
8 A.Ekeler              106     0.548

Considering only those receivers with 100 or more receptions during the 2022 regular season, Stefon Diggs and Travis Kelcoe had the highest expected yards after catch success rate, with the model predicting that just over 82% of their receptions would result with a positive EPA once factoring in yards after catch.

A.11 Expected Yards After Catch Mean Yardage (xYAC Mean Yardage)

Just as above XYAC Success is the probability that the reception results in a positive EPA, xYAC Mean Yardage is the expected yards after catch based on where the ball was caught. We can use this metric to determine how much impact the receiver had after the reception against what the xYAC Mean Yardage model predicted.

pbp <- nflreadr::load_pbp(2022) %>%
  filter(season_type == "REG")

xyac_meanyardage <- pbp %>%
  filter(!is.na(xyac_mean_yardage)) %>%
  group_by(receiver) %>%
  mutate(mean_yardage_result = ifelse(yards_after_catch >= xyac_mean_yardage,
                                      1, 0)) %>%
  summarize(total_receptions = sum(complete_pass == 1,
                                   na.rm = TRUE),
            total_higher = sum(mean_yardage_result,
                               na.rm = TRUE),
            pct = total_higher / total_receptions) %>%
  filter(total_receptions >= 100) %>%
  arrange(-pct) %>%
  slice(1:10)

xyac_meanyardage
# A tibble: 8 x 4
  receiver    total_receptions total_higher   pct
  <chr>                  <int>        <dbl> <dbl>
1 T.Kelce                  106           51 0.481
2 A.Ekeler                 106           47 0.443
3 A.St. Brown              103           44 0.427
4 J.Jefferson              124           47 0.379
5 S.Diggs                  100           37 0.37 
6 C.Lamb                   103           35 0.340
7 T.Hill                   125           41 0.328
8 C.Godwin                 103           29 0.282

A.12 Pass Over Expected

Pass Over Expected is the probability that a play will be a pass scaled from 0 to 100 and is based on multiple factors, including yard line, score differential, to who is the home team. The numeric value indicates how much over (or under) expectation each offense called a pass play in a given situation.

For example, we can use the metric to determine the pass over expected value for Buffalo, Cincinnati, Kansas City, and Philadelphia on 1st down with between 1 and 10 yards to go.

pbp <- nflreadr::load_pbp(2022) %>%
  filter(season_type == "REG")

pass_over_expected <- pbp %>%
  filter(down == 1 & ydstogo <= 10) %>%
  filter(posteam %in% c("BUF", "CIN", "KC", "PHI")) %>%
  group_by(posteam, ydstogo) %>%
  summarize(mean_passoe = mean(pass_oe, na.rm = TRUE))

ggplot(pass_over_expected, aes(x = ydstogo, y = mean_passoe,
                               group = posteam)) +
  geom_smooth(se = FALSE, aes(color = posteam), size = 2) +
  nflplotR::scale_color_nfl(type = "primary") +
  scale_x_continuous(breaks = seq(1,10, 1)) +
  scale_y_continuous(breaks = scales::pretty_breaks(),
                     labels = scales::percent_format(scale = 1)) +
  nfl_analytics_theme() +
  xlab("1st Down - Yards To Go") +
  ylab("Average Pass Over Expected") +
  labs(title = "**Average Pass Over Expected**",
       subtitle = "*1st Down: BUF, CIN, KC, PHI*",
       caption = "*An Introduction to NFL Analytics with R*<br>
       **Brad J. Congelio**")

Pass Over Expected Values on 1st Down for Buffalo, Cincinnati, Kansas City, and Philadelphia

Noticeably, the Chiefs pass well over expected (especially compared to the other three teams) when it is 1st down with six yards to go.

A.13 Success Rate

Prior to the formulation of EP and EPA, success rate was calculated based on the percentage of yardage gained on a play (40% of the necessary yards on 1st down, 60% on 2nd down, and 100% of the yards needed for a 1st down on both 3rd and 4th downs). However, modern success rate is determined simply by whether or not the specific play had an EPA greater than 0. Using success rate allows us to determine, for example, whether an offensive unit is stronger in rushing or passing attempts, as well as serving as a baseline indicator of a team’s consistency.

pbp <- nflreadr::load_pbp(2022) %>%
  filter(season_type == "REG")

success_rate <- pbp %>%
  filter(play_type %in% c("pass", "run")) %>%
  group_by(posteam, play_type) %>%
  summarize(mean_success = mean(success, na.rm = TRUE)) %>%
  filter(posteam %in% c("BAL", "CIN", "CLE", "PIT"))

success_rate
# A tibble: 8 x 3
# Groups:   posteam [4]
  posteam play_type mean_success
  <chr>   <chr>            <dbl>
1 BAL     pass             0.423
2 BAL     run              0.484
3 CIN     pass             0.492
4 CIN     run              0.439
5 CLE     pass             0.433
6 CLE     run              0.449
7 PIT     pass             0.434
8 PIT     run              0.470

As seen in the output of success_rate, the teams in the AFC North were generally evenly matched. The Ravens had more success running the ball (.48 to .42) while the Bengals found more success in the air (.49 to .43). The Browns’ success rate on passing and rushing attempts were nearly equal (.43 to .44).

A.14 Time to Line of Scrimmage

Measured by NextGen Stats, this is a calculation (to the 1/10th of a second), regarding how long it take the running back to cross the line of scrimmage.

nextgen_stats <- nflreadr::load_nextgen_stats(stat_type = "rushing")

avg_los <- nextgen_stats %>%
  filter(week == 0 & season == 2022 & rush_attempts >= 200) %>%
  select(player_display_name, avg_time_to_los) %>%
  arrange(avg_time_to_los) %>%
  slice(1:10)

avg_los
# A tibble: 10 x 2
   player_display_name avg_time_to_los
   <chr>                         <dbl>
 1 Jamaal Williams                2.64
 2 D'Onta Foreman                 2.65
 3 Joe Mixon                      2.66
 4 Saquon Barkley                 2.70
 5 Derrick Henry                  2.70
 6 Najee Harris                   2.72
 7 Ezekiel Elliott                2.72
 8 Rhamondre Stevenson            2.73
 9 Alvin Kamara                   2.75
10 Austin Ekeler                  2.76