Identifying Value: Regression, Randomness, and Running Backs, Part 2

*Jonathan Bales is the author of **Fantasy Football for Smart People: How to Dominate Your Draft**.*

*If you missed Part 1 of this series, read it here.*

One of the most frequent mistakes fantasy football owners make is assuming all correlations reflect a causal effect. Lots of things in life are related yet have no effect on one another. For example, the old notion that great running teams win football games is an illusion based on a misunderstanding of the correlation/causation distinction. Yes, winning teams average more rushing yards than losing teams, but that’s because teams that are *already winning* run the ball late in games. In reality, they usually gain the lead by passing the football effectively.

A prevalent fantasy football “truism” is that overworked running backs struggle in subsequent seasons. Numerous studies detail how running backs struggle after a season with 350 touches, or 370 touches, or however many touches the study needs to make its point. The exact number is usually chosen ex post facto and then treated as a “magical threshold” that spells doom the following season if crossed.

The graph above shows just how silly some of this analysis can get. In one study on the effect of 370-plus carries on running backs, the cutoff seems to have been chosen after the fact because it makes the numbers more extreme. You can see the abundance of running backs who actually improved their yards-per-carry yet came just a few carries short of 370. Are we really to believe a running back who carried the ball 365 times in a season is to be trusted the following season, but one with 370 carries is doomed?
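To see why a cutoff picked after the fact is suspect, here is a hypothetical Python simulation using random data, not real running back seasons. Workload has no effect at all by construction, yet scanning several cutoffs and reporting the one with the steepest apparent decline will still single out a “scariest” threshold. The carry range, the 0.6 YPC spread, and the helper `mean_change_above` are all assumptions for illustration.

```python
import random

# Hypothetical simulation with random data, not real running back seasons.
# Each back's YPC change is drawn independently of his carries, so by
# construction there is NO workload effect at any cutoff.
random.seed(1)
backs = [(random.randint(300, 400), random.gauss(0.0, 0.6)) for _ in range(200)]

def mean_change_above(threshold):
    """Average YPC change (year Y+1 minus year Y) for backs at or above a carry cutoff."""
    changes = [delta for carries, delta in backs if carries >= threshold]
    return sum(changes) / len(changes)

# Try cutoffs after looking at the data, as a post hoc study might.
cutoffs = range(340, 381, 5)
results = {cutoff: mean_change_above(cutoff) for cutoff in cutoffs}
worst = min(results, key=results.get)

for cutoff in cutoffs:
    print(f"{cutoff}+ carries: mean YPC change {results[cutoff]:+.2f}")
print(f"Most 'alarming' cutoff in this random data: {worst}+ carries")
```

Whichever cutoff comes out on top here means nothing, because the data contain no workload effect to find.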

In reality, though, **running backs who garner a large number of touches in a season are generally more likely to see a drop in production and health in the following year, but this information is both insignificant and irrelevant.**

Think about what it takes to acquire nearly 400 touches in a season. First, a running back needs to be healthy. Really healthy. Second, chances are he is running efficiently. Running backs who average 3.5 yards-per-carry over the first half of the season don’t generally keep getting the roughly 23 carries a game needed to break the 370 threshold over a 16-game season. Thus, our sample of high-carry backs is skewed toward those performing well.

This is where regression toward the mean comes in. Selecting backs with a high number of carries filters out injured and underperforming backs, which means we are selecting outliers in more than one area. We aren’t isolating players based on carries alone, but on health and efficiency as well. So when we draw conclusions about health and efficiency, all we’re really saying is that players with unusually good health and a higher-than-normal YPC are likely to have worse health and a lower YPC the following year. Uh, yeah... no crap.
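A small simulation makes the selection effect concrete. This Python sketch is hypothetical: the skill spread, the luck term, the 70 percent health rate, and the 4.8 YPC cutoff for earning a heavy workload are all assumptions, and the next season’s YPC ignores workload entirely by design. Because only healthy, efficient backs get heavy workloads, the heavy-workload group still posts a lower YPC the following year.

```python
import random
import statistics

# Hypothetical simulation; every number here is an assumption, not real NFL data.
# By construction, a heavy workload has NO effect on the following season.
random.seed(7)
LEAGUE_AVG = 4.2

def season_ypc(skill):
    """Observed YPC = true skill + season-specific luck."""
    return skill + random.gauss(0.0, 0.5)

backs = []
for _ in range(2000):
    skill = random.gauss(LEAGUE_AVG, 0.3)
    healthy = random.random() < 0.7      # assumed share of backs who stay healthy
    year1 = season_ypc(skill)
    heavy = healthy and year1 > 4.8      # only healthy, efficient backs get the carries
    year2 = season_ypc(skill)            # next season: same skill, fresh luck, no workload term
    backs.append((heavy, year1, year2))

heavy_y1 = statistics.mean(y1 for is_heavy, y1, _ in backs if is_heavy)
heavy_y2 = statistics.mean(y2 for is_heavy, _, y2 in backs if is_heavy)
print(f"Heavy-workload backs: {heavy_y1:.2f} YPC in year Y, {heavy_y2:.2f} in year Y+1")
```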

So yeah, **running backs with a lot of carries in year Y usually see a drop in production in year Y+1, but it’s a product of regression, not a heavy workload**. The graph below supports this idea.

You can see **the efficiency of all running backs tends to regress to the mean, not just those coming off seasons with heavy workloads**. If a back runs for 6.0 YPC in a year, he will probably see a decline in efficiency whether he had 50 carries or 400.
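The same point can be checked with another hypothetical simulation: give backs either 50 or 400 carries, keep only those who topped 5.0 YPC, and look at the next season. The 6-yard per-carry spread, the 250-carry follow-up season, and the helper `change_after_big_year` are assumptions; prior workload never enters the next-season model.

```python
import random
import statistics

# Hypothetical simulation, not real data. Assumptions: per-carry yardage has a
# spread of about 6 yards, so season YPC luck shrinks as carries grow, and the
# following season's YPC does not depend on the prior workload at all.
random.seed(3)
LEAGUE_AVG = 4.2
PER_CARRY_SD = 6.0

def observed_ypc(skill, carries):
    """True skill plus luck whose size shrinks with the square root of carries."""
    return skill + random.gauss(0.0, PER_CARRY_SD / carries ** 0.5)

def change_after_big_year(carries, cutoff=5.0, n=20_000):
    """Mean YPC change (Y+1 minus Y) for backs who hit `cutoff` on `carries` carries."""
    changes = []
    for _ in range(n):
        skill = random.gauss(LEAGUE_AVG, 0.3)
        year1 = observed_ypc(skill, carries)
        if year1 >= cutoff:
            year2 = observed_ypc(skill, 250)  # assumed typical workload next season
            changes.append(year2 - year1)
    return statistics.mean(changes)

light = change_after_big_year(50)
heavy = change_after_big_year(400)
print(f"5.0+ YPC on 50 carries:  mean change {light:+.2f}")
print(f"5.0+ YPC on 400 carries: mean change {heavy:+.2f}")
```

In this sketch the 50-carry group usually falls further, since a small sample holds more luck, but both groups decline even though workload plays no role.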

Thus, **while the production of a running back coming off a season with a heavy workload is likely to decrease, it is not a legitimate reason to avoid that player in fantasy drafts**. The (probable) decrease in production is due to the previous season being a statistical outlier (a result that is unusually far from the mean).

The best way to look at the situation is this: **what is the running back’s chance of generating production comparable to the previous year’s?** It is actually the same as it was before the start of the previous season; i.e., the workload has no noticeable effect on his ability to produce.

For example, if a running back has a 20 percent chance of gaining 2,000 total yards in a season, that percentage remains stable from year to year (assuming his skill level does the same). Thus, it is unlikely this player will follow a 2,000-yard season with another, but not because of the heavy workload (a necessity for such productive output). Rather, he only had a 20 percent chance of doing so from the start. We wrongly (and ironically) attribute the decrease in production to the player’s prior success when, in reality, no such causal relationship exists.
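The 20 percent example can be sketched as a hypothetical Python simulation in which each season is an independent draw, as the example assumes. Before either season, the chance of back-to-back 2,000-yard years is 0.2 × 0.2 = 0.04; once the first big year has happened, the chance of the second is still 0.2.

```python
import random

# Hypothetical: a fixed 20% chance of a 2,000-yard season every year, with
# seasons independent of each other (no workload carryover), per the text's example.
random.seed(11)
P_BIG_YEAR = 0.20
TRIALS = 100_000

big_first = 0
big_both = 0
for _ in range(TRIALS):
    year1 = random.random() < P_BIG_YEAR
    year2 = random.random() < P_BIG_YEAR
    if year1:
        big_first += 1
        if year2:
            big_both += 1

repeat_rate = big_both / big_first   # P(big year Y+1 | big year Y), about 0.20
back_to_back = big_both / TRIALS     # P(big year Y and big year Y+1), about 0.04
print(f"Repeat rate after a 2,000-yard season: {repeat_rate:.3f}")
print(f"Chance of back-to-back 2,000-yard seasons: {back_to_back:.3f}")
```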

As I wrote earlier, fantasy owners need to determine which aspects of players’ games are repeatable, and which are a matter of luck. Understanding the position consistency I detailed in previous sections is a start, and it gives us a foundation from which we can make projections of specific statistics.

Regression toward the mean is a factor in all projections. Rather than guessing arbitrarily, we can use formulas to make more educated predictions (albeit still “guesses”). To show how much regression matters in projections, let’s examine how to go about predicting a running back’s yards-per-carry.

**It turns out yards-per-carry has a season-to-season correlation of about 0.43.** That number is similar to the 0.50 correlation we saw with year-to-year rushing yards-per-game. It also means a large part of predicting running backs’ YPC is simply accounting for the “luck” they experienced the season before.
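For readers who want to compute a year-to-year correlation themselves, here is a minimal Python sketch of the standard Pearson coefficient applied to paired seasons. The `pearson` helper and the YPC pairs are hypothetical illustrations, not the data behind the 0.43 figure. Roughly speaking, a value near 1 would mean YPC is mostly repeatable skill, and a value near 0 would mean it is mostly luck.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up (year Y, year Y+1) YPC pairs for illustration only, not real players.
pairs = [(4.8, 4.5), (3.9, 4.1), (5.6, 4.6), (4.2, 4.4), (3.6, 3.9), (4.5, 4.0)]
year_y = [a for a, _ in pairs]
year_y1 = [b for _, b in pairs]
print(f"Year-to-year YPC correlation in this toy sample: {pearson(year_y, year_y1):.2f}")
```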

After all is said and done, we can make a solid YPC projection with the following formula:

$$\text{YPC}_{Y+1} = \text{LgAvgYPC} + 0.43 \times (\text{YPC}_Y - \text{LgAvgYPC})$$
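Writing the difference term out as last season’s YPC minus the league average and expanding it shows where the 3/7 and 4/7 weights come from, since 0.43 is roughly 3/7:

$$
\begin{aligned}
\text{YPC}_{Y+1} &= \text{LgAvgYPC} + 0.43\,(\text{YPC}_Y - \text{LgAvgYPC}) \\
&= 0.43\,\text{YPC}_Y + 0.57\,\text{LgAvgYPC} \\
&\approx \tfrac{3}{7}\,\text{YPC}_Y + \tfrac{4}{7}\,\text{LgAvgYPC}
\end{aligned}
$$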

In layman’s terms, the most accurate simple YPC projection we can make is to take 3/7 (about 0.43) of the previous year’s YPC and add it to 4/7 of the league average (about 4.2). For a running back who averaged 6.0 yards-per-carry, the projection would be 6.0 (3/7) + 4.2 (4/7) = 4.97 YPC.

Notice the formula lowers the projected YPC of any back who registered above 4.2 but raises the projection for anyone below that figure. A back who mustered only 3.8 YPC in year Y is most likely to total 3.8 (3/7) + 4.2 (4/7) = 4.03 YPC in year Y+1.
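The projection can be written as a short Python function. This is a minimal sketch: `project_ypc` and its constants are hypothetical names, the 4.2 league average comes from the text, and 3/7 stands in for the 0.43 weight. It reproduces both worked examples above.

```python
LEAGUE_AVG_YPC = 4.2     # league average cited in the text
YEAR_TO_YEAR_R = 3 / 7   # stands in for the ~0.43 season-to-season correlation

def project_ypc(last_ypc, league_avg=LEAGUE_AVG_YPC, r=YEAR_TO_YEAR_R):
    """Regress last season's YPC toward the league average by weight r."""
    return league_avg + r * (last_ypc - league_avg)

for ypc in (6.0, 3.8):
    print(f"{ypc:.1f} YPC in year Y -> {project_ypc(ypc):.2f} projected for Y+1")
# 6.0 YPC in year Y -> 4.97 projected for Y+1
# 3.8 YPC in year Y -> 4.03 projected for Y+1
```

A back who ran exactly at the league average is projected to stay there, since the difference term is zero.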

There will be more information on projecting specific stats for each position in later sections. Much of it will be based on regression toward the mean, but there are certainly plenty of other factors at play. Remember that these formulas aren’t a definitive source for final projections, but rather a solid base from which to work.

- Regression toward the mean is the tendency of “extreme” results to be followed by results closer to the average, and fantasy owners can use it to acquire value. Just as traders buy low and sell high, fantasy owners can pinpoint which players are due for boosts or declines in production based on how much of their previous production was caused by random factors.

- Your job as an owner isn’t to select players who had poor seasons, but rather those who are being undervalued due to production that was below their “average season.” In effect, you are buying low on players whose value will “regress” upward.

- Regression toward the mean shows us running backs coming off seasons with heavy workloads are likely to see a decline in production, but not because of the workload itself. Instead, these backs are necessarily outliers from the previous season, and statistically likely to regress. In practical terms, there’s no reason for owners to purposely avoid running backs who had a lot of touches the prior season.

- One of the easiest and most accurate ways to predict a running back’s yards-per-carry is to multiply his YPC from the previous year by 3/7, then add 4/7 of the league average YPC (with a 4.2 league average, that second term works out to 2.4). Other factors are of course relevant, but this is a great foundation from which to work.

**—————————————————**

You can **buy Fantasy Football for Smart People at Amazon**.

Be sure to check out other great articles at Fantasy Knuckleheads.