Almost nothing you’re going to see in this series is an original idea of mine. Every ingredient in the work I’ve spent months on is built on the shoulders of others. If you missed Part 1, you can read it here.
You might notice I’m linking to Part 1 instead of Part 2 in the preamble. Part 2 turned out to be completely incorrect. This is one reason I stick to blogging: as I gather data and information, there’s always a chance what I’m posting contains errors, and Part 2 had many. Some records in the database were wrong, which flipped the model upside down; it never made sense for explosiveness to be negatively correlated in the first place. The errors are now fixed, and the model has been re-run and upgraded.
So let’s start from the beginning. What we know so far is that college linebackers transitioning to the NFL tend to be successful when they are productive, young, and fast. But how important is each statistic? Correlation is one thing; are these numbers actually predictive? That’s what I decided to test.
Using Gretl, I ran a backwards elimination logistic regression. That means I threw all the data I assumed had value into the model, removed the variable that was least predictive, re-ran the model, and repeated the process until I was left with only predictive variables. For this model the significance level is set at .05, though the model performed better with one variable above that threshold kept in.
Here’s what I threw into the model:
- BMI – Body Mass Index formula
- Height-Adjusted Speed Score (HASS) / Speed Score / 40 Time – Each number is a measurement of straight-line speed. HASS and Speed Score adjust for size, while 40 time is raw speed only. Tested separately due to collinearity.
- Agility Score – Sum of the 3-Cone and 20-Yard Shuttle times.
- Lower Body Explosiveness – Contextualizes the broad jump/vertical jump in relation to the player’s size.
- Tackle Radius – Measurement of a player’s range to tackle, both laterally across the field and with their length. Uses 40 time, arm length, and height. Tested separately due to collinearity with the speed tests.
- Power Five – Yes/No of whether or not the player played in a Power Five conference.
- Breakout Age – Age at the end of the season in which a player first surpasses 8% of his college team’s total solo tackles.
- College Solo Tackle Share / College Tackle Share – The player’s best college season by percentage of team solo or total tackles. If the final season wasn’t the best season, it’s averaged with the best season. Tested separately due to collinearity.
- Age At Draft – Player’s age at draft.
I’m sure you already have an idea of what is and isn’t important just from reading that list, and you’ve already set some expectations for what will and won’t be predictive. If you haven’t, go do it now. I want you to have an idea of what you think is important.
Here’s the best model that I could create out of these inputs:
That’s it. Three factors make up the predictive elements for repeat Top-24 off-ball linebackers: Height-Adjusted Speed Score, College Solo Tackle Share, and Age at Draft. It should make sense. Players who are fast in general are good, players who are fast for their size are versatile, and both have a greater margin for error. Players who produced a lot of solo tackles in college were better than their teammates at taking down ball carriers and at diagnosing plays. Younger players have more time to adapt to coaching and a higher chance of becoming prodigies; they probably grasped the college game at a younger age than their contemporaries, too.
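To show how a fitted logistic model turns those three variables into a single number, here’s a toy scoring function. The coefficients below are invented purely for illustration; the model’s actual fitted values are what matter, and the signs here just encode the story above (faster and more productive is good, older is bad).

```python
# Toy logistic scoring function. Coefficients are ILLUSTRATIVE ONLY,
# not the fitted values from the actual Gretl model.
import math

def success_probability(hass, solo_tackle_share, age_at_draft,
                        b0=6.0, b_hass=0.03, b_share=10.0, b_age=-0.5):
    """Logistic response: p = 1 / (1 + exp(-(b0 + b.x)))."""
    z = b0 + b_hass * hass + b_share * solo_tackle_share + b_age * age_at_draft
    return 1 / (1 + math.exp(-z))

# A young, fast, productive prospect should score higher than an
# older, slower, less productive one.
young_fast = success_probability(hass=115, solo_tackle_share=0.16, age_at_draft=21.5)
old_slow = success_probability(hass=95, solo_tackle_share=0.09, age_at_draft=23.5)
```

The point is only the shape of the calculation: each variable nudges the log-odds up or down, and the logistic function squashes the total into a probability between 0 and 1.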
But how good is this model at predicting success?
Turns out, not that great. It’s accurate 87.4% of the time, with most of that accuracy coming from predicting who will fail (177 correct). It correctly predicted only 4 linebackers who went on to succeed, incorrectly assumed 25 successful linebackers would fail, and incorrectly flagged 1 failed linebacker as a success. So the model can’t be trusted on its own, but it can at least be used to screen out the players who are likely to fail.
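Those counts are enough to recompute the headline numbers yourself, and they show why accuracy alone flatters the model:

```python
# Metrics from the confusion counts in the text:
# 177 correct fails, 4 correct hits, 25 hits called fails, 1 fail called a hit.
tn, tp, fn, fp = 177, 4, 25, 1
total = tn + tp + fn + fp             # 207 linebackers

accuracy = (tn + tp) / total          # 181 / 207 -> ~0.874, the 87.4% above
precision = tp / (tp + fp)            # when it says "hit," how often is it right?
recall = tp / (tp + fn)               # what share of actual hits did it catch?

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# -> accuracy=0.874 precision=0.800 recall=0.138
```

A recall of about 14% is exactly the “can’t be trusted on its own” problem: the model misses most of the hits, even though it rarely cries wolf when it does call one.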
But that’s just for linebackers who were repeat top-24 performers throughout their careers. What if we want to find players who hit top-24 status in their first three years? Three years is a long time to hold onto a player who isn’t producing, so it makes sense to check that range, right?
Turns out it’s the exact same set of variables, except the model is even more certain of their values than it was before. Great!
Let’s see if this model is any better at predicting failures and successes:
This one has a lower accuracy rate, but it did correctly predict more hits. Again, this model is better used for filtering out the bad options than as the word of God.
I’m sure the burning question is: what does this mean for the rookie class? Well, it means these are the players the model prefers, along with the chance of success it gives each of them:
Top 24 Repeaters
Top 24 In First Three Years
Note that the model can’t score profiles that are missing any of the three major variables, so Reuben Foster, with no combine testing or pro day, is not in here.
The model doesn’t discriminate between small-school and big-school production the way the analyst (me) does, so it helps to understand that off the bat. Beyond that, I’m going to let the results sit before I delve into them in another post.
This model is not the be-all and end-all, though, so if you’re planning to go out and buy Dylan Cole, you may want to wait on that.