
Official Statistical Analysis Thread

Discussion in 'Buckeye Football' started by DaddyBigBucks, Aug 7, 2007.

  1. utahosu63

    utahosu63 Quicksilver


    Those stats rock. I will be spending the rest of the day trying to memorize them so I can look somewhat intelligent at my next BBQ.

    Thanks, love the site.
  2. Jaxbuck

    Jaxbuck I hate tsun ‘18 Fantasy Baseball Champ

    Great work as always.

    Regarding TO's, their correlation to win % and our regression in that area, I would offer that they are misleading because so much random luck is involved in the finished product of an actual TO. In 2003 the team caused an ass load of fumbles but just couldn't ever seem to get the bounces to go their way and land on them.

    I wish they would track TO opportunities and actual TO's. I would then venture the team that consistently creates opportunities will have, over the long term, a greater chance to convert them into actual TO's (duh) and you could then maybe get truer correlation to win% from opportunities vs TO's. Dunno for sure, just a thought.

    They probably already do this, but your post seemed like a good time to discuss it.


    I would also say the correlation between TO's and win % is off because TO's are such a flawed stat, much like BA in baseball. For example, last-minute chuck-ups to the end zone before the end of halves get "intercepted" all the time. If you want to show the TO for what people assume it to be (a critical change of possession), then you have to find a way to weed out the ones that don't really "matter".

    Also it does a defense no good to give them credit for an unforced TO (from the stats POV). It doesn't do you any good to draw conclusions from flawed data and taking credit for forcing a TO when some putz just drops the ball is a flawed assumption.

    The can of worms that opens is how you count opportunities, forced vs unforced, critical vs non-important, etc. It would become very similar to the official scoring of errors in baseball: too subjective and varied from venue to venue and scorer to scorer.

    A coaching staff would have to keep track themselves from the game film and have a uniform system in place as to what was what. We as fans would not be able to track this.
    Last edited: Aug 10, 2007
  3. palmbuck

    palmbuck Newbie

    Very interesting!
    But I have to ask a question: Is it possible to have a correlation of 1.0 with NO causality whatsoever?
    I was under the impression that it was and you now have me unsure.
  4. DaddyBigBucks

    DaddyBigBucks Still Calculating Buckeye DSC... Staff Member Bookie

    It is absolutely possible to have a correlation of 1.0 with NO causality. Think about what happens when the power goes out. If two lights share a circuit breaker, you could track every incidence of the lights going out spontaneously and you might very well end up with a 1.0 correlation as long as neither of the bulbs burned out during your study (light bulbs can last a very long time these days). Every time one light goes out spontaneously, so does the other, so the correlation is 1.0. But one light going out does not cause the other to go out.

    So when two things share a cause, they can have a high level of correlation without there being a causal relationship BETWEEN them. Some might suggest that this is the case between scoring defense and winning percentage: that they are both caused by preventing the other team from scoring. My response to that would be that scoring defense IS preventing the other team from scoring.
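
    The shared-breaker example can be simulated in a few lines. This is only a sketch: the trip probability, the number of observations, and every name in it are made up for illustration. Both lights simply copy the breaker's state, so neither one causes the other to go out.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)

# Hypothetical study: 1000 observations, breaker trips about 10% of the time.
breaker_tripped = [rng.random() < 0.1 for _ in range(1000)]

# Each light goes out exactly when the shared breaker trips (the common cause).
light_a_out = [int(tripped) for tripped in breaker_tripped]
light_b_out = [int(tripped) for tripped in breaker_tripped]

r = pearson(light_a_out, light_b_out)
print(f"correlation between the two lights: {r:.4f}")
```

    The two series are identical by construction, so the correlation is perfect even though the only causal arrow runs from the breaker to each light.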

    I guess it all depends on the definition of the word IS.
  5. palmbuck

    palmbuck Newbie

    OK, very good example.
    But what if the occurrences don't share a cause? Isn't it possible that the correlation between them could be 1?
    For example, with the trillions of occurrences in the universe, isn't it probable that some of them will perfectly correlate, even though there is absolutely no connection between them? Wouldn't a few of the trillions of random occurrences perfectly correlate, even though they may exist in different galaxies?
    It seems to me there would be some, but I would be interested in what you think. There is probably a mathematical proof of the question, but I wouldn't know where to find it.
  6. DaddyBigBucks

    DaddyBigBucks Still Calculating Buckeye DSC... Staff Member Bookie

    Definitely possible, but the probability becomes vanishingly small as the amount of data grows.

    For the amount of data shown here, there is just no way that a correlation of -0.93 is a matter of random chance. I realize that you're not suggesting that; but I think it's a worthwhile example.
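
    A rough way to see how fast chance correlations die out is to correlate pairs of unrelated random series and count how often the result looks strong. The sketch below is an illustration under assumed settings (uniform random data, a 0.9 cutoff, 2000 trials per sample size); none of it comes from the football data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def share_of_strong_chance_correlations(n_points, trials=2000, threshold=0.9, seed=1):
    """Fraction of unrelated random pairs whose |r| reaches the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n_points)]
        ys = [rng.random() for _ in range(n_points)]
        if abs(pearson(xs, ys)) >= threshold:
            hits += 1
    return hits / trials


# Print the share of "strong" chance correlations for growing sample sizes.
for n_points in (3, 5, 10, 50):
    share = share_of_strong_chance_correlations(n_points)
    print(f"n = {n_points:>2}: share with |r| >= 0.9 is {share:.4f}")
```

    With only a handful of points, strong correlations between unrelated series are common; by a few dozen points they all but disappear.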

    I think I better come up with some more stats to post before we start getting into Kant, Hegel, Nietzsche, et al.
  7. Jaxbuck

    Jaxbuck I hate tsun ‘18 Fantasy Baseball Champ

    Kant wasn't a team player, Hegel couldn't get his timing down and Nietzsche was a cancer in the clubhouse.
    daddyphatsacs likes this.
  8. palmbuck

    palmbuck Newbie

    Yeah, I think I remember that Nietzsche guy; IIRC, he was a linebacker for Illinois and Green Bay. I bet he would agree that defense wins more often than offense.
    Those other two guys must have been before my time.:wink2:
    Good Post!
    Fungo Squiggly and DaddyBigBucks like this.
  9. DaddyBigBucks

    DaddyBigBucks Still Calculating Buckeye DSC... Staff Member Bookie

    Thanks for creating this for me. I often agonize over which thread to place something in, probably more than I should. This will be awfully convenient.

    I have now moved a post I made in Mili's "Winning %" thread over here because I think some of the data in it illuminates points made by me and others in this thread. I have also moved the posts related to mine as they would have seemed awfully out-of-place in their former thread without my post there.

    While not the guy I meant, (I know you knew that), there are a ton of youngsters here that don't know that the man pictured below is the man to whom you refer.


    Every football fan should know the name Ray Nitschke.
  10. DaddyBigBucks

    DaddyBigBucks Still Calculating Buckeye DSC... Staff Member Bookie

    In a post above I showed how different statistics relate to winning percentage. The most obvious point illuminated in that post is that, of those 18 statistics, the ones most closely associated with winning are Scoring Defense and Scoring Offense, in that order.

    But this made me wonder: How do the 16 other statistics relate to Scoring Offense and Defense respectively? After all, I now have all the data for every I-A team for the last 6 years; that's one hell of a sample size. So the data given will be reliable. And the new calculations would only take a few minutes.

    But what a gold mine. I am very interested in hearing the opinions of some of the football gurus on this board about what some of this data means. I'll offer some of my own interpretations; but with any luck, this will be far from the last word on the subject.

    The Data

    The table below shows:

    Column 1: The 18 stats in descending order of their correlation to winning percentage
    Column 2: Their correlation to Scoring Offense
    Column 3: Their correlation to Scoring Defense

    Category____________________corr. SO_______corr. SD
    Scoring Defense_____________0.5781__________1.0000
    Scoring Offense_____________1.0000__________0.5781
    Pass Efficiency Defense_____0.5336__________0.8849
    Rushing Defense_____________0.5871__________0.8650
    Total Defense_______________0.4504__________0.9088
    Pass Efficiency_____________0.8338__________0.4989
    Turnover Margin_____________0.5832__________0.6639
    Total Offense_______________0.8819__________0.3639
    Net Punting_________________0.2323__________0.5330
    Punt Return Defense_________0.2694__________0.5423
    Rushing Offense_____________0.4969__________0.4782
    Punt Returns________________0.3954__________0.3566
    Kickoff Return Defense______0.3413__________0.3280
    Pass Offense________________0.4914__________0.0550
    Pass Defense________________0.0547__________0.5243
    Kickoff Returns_____________0.2547__________0.1746
    Penalty Yards______________-0.1608__________0.0324
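
    For anyone who wants to reproduce a table of this shape, here is a minimal sketch. It assumes the correlations are taken between national rankings in each category (the penalties discussion later in this post suggests as much), and the five teams and their ranks are invented purely to show the mechanics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ranks (1 = best) for five made-up teams; NOT real data.
ranks = {
    "scoring_offense": [1, 2, 3, 4, 5],
    "total_offense": [2, 1, 3, 5, 4],
    "pass_offense": [4, 1, 5, 2, 3],
}

# Correlate every category's ranks with the Scoring Offense ranks.
target = ranks["scoring_offense"]
table = {stat: pearson(values, target) for stat, values in ranks.items()}

# List the categories from most to least correlated with the target.
for stat, r in sorted(table.items(), key=lambda item: -item[1]):
    print(f"{stat:<20}{r:>8.4f}")
```

    Correlating ranks with ranks this way is the same thing as a Spearman rank correlation, and a category always shows 1.0 against itself, which is the sanity check mentioned below.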


    First off, one of the many checks to make sure the math was right was verifying that yes, Scoring Defense has a 1.0 correlation to itself, and so does Scoring Offense. Likewise, it is noteworthy, but just barely, that there is a 0.5781 correlation between the two. Very good teams tend to be good in both areas. Very bad teams tend to suck at both. The teams in the middle are mixed. Whatever.

    Further, it is not terribly surprising that Total Defense has the highest correlation to Scoring Defense and Total Offense has the highest correlation of any of the stats to Scoring Offense. If anything, this only serves to give me confidence in the data.


    Now this starts to get fun. Let's take another look at the rushing rows of the table by themselves.

    Category____________________corr. SO_______corr. SD
    Scoring Defense_____________0.5781__________1.0000
    Scoring Offense_____________1.0000__________0.5781
    Rushing Defense_____________0.5871__________0.8650
    Rushing Offense_____________0.4969__________0.4782

    First off, look at Rushing Defense's correlation to Scoring Offense. Is that cool or what? There are only two stats more closely correlated to scoring offense than is Rushing DEFENSE, but Rushing OFFENSE is NOT one of them!!

    To me, the reason for this seems simple and I'll call it "The Blow-Out Effect". When you have an extremely high-powered offense, the other team typically ends up throwing the ball all over the joint trying to catch up. Not only does this drag down the # of carries per game, but even drags down the yards per carry as the number of sacks goes up (due to the NCAA's perverse habit of subtracting sacks from rushing yards). The Rushing Defense ends up with better numbers, but may not necessarily BE better.
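
    The sack arithmetic is easy to see with made-up numbers. The sketch below assumes the usual college scoring convention in which a sack is charged as a rushing attempt for negative yards; the function name and the stat line are hypothetical.

```python
def rushing_line(carries, rush_yards, sacks, sack_yards_lost):
    """Return (attempts, net yards, yards per carry) with sacks folded into rushing."""
    attempts = carries + sacks  # assumption: each sack counts as a rushing attempt
    net_yards = rush_yards - sack_yards_lost
    return attempts, net_yards, net_yards / attempts


# Hypothetical opponent playing catch-up: 25 real carries for 100 yards,
# plus 5 sacks that lose 35 yards.
attempts, net_yards, ypc = rushing_line(25, 100, 5, 35)
ypc_without_sacks = 100 / 25

print(f"official line: {attempts} attempts, {net_yards} yards, {ypc:.2f} per carry")
print(f"designed runs only: {ypc_without_sacks:.2f} per carry")
```

    The runners in this example averaged 4 yards a carry, but the official rushing line makes the defense look far better against the run than it actually was.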

    But this phenomenon seems to exacerbate the problem of the already unexpectedly low correlation between Rushing Offense and Scoring Offense. When you're getting killed, you tend to not rush as much; when you're blowing the other team out, you rush more so as to kill the clock. Shouldn't that INFLATE rushing (per game) numbers? Why on earth is Rushing Offense SIXTH on the list of things that are most correlated to Scoring Offense, and only barely above Passing Offense at that? Better football minds than mine will have to sort that out. All I can tell you is that I have checked and rechecked the data and the calculations.

    What is less mysterious but still up for conjecture is why Rushing Offense has only a slightly better correlation to scoring offense than to scoring defense. Some might suggest that the offense that is good at rushing is keeping the ball away from the opposition. Personally, I have always been suspicious of this reasoning. When you count the number of possessions per game, each team almost always gets between 9 (extreme Tressel-ball) and 14 (WAC ball), with the number of possessions seldom differing by more than 1. That comes out to about an equal number of chances to score.

    But maybe there is something to the idea of letting the other team's qb get "cold". I have always admitted that this is a factor, but have wondered just how much.

    On the other hand, isn't it true that old-school coaches who focus on Rushing Offense also tend to appreciate the importance of a stout D? And isn't it also true that WAC-a-doodle, pass-happy coaches often focus on the offense to the detriment of the defense? It seems to me that this might explain the data more than anything. Let's call this "The Coaching Effect" for future reference.


    The questions don't stop there. Here are the data for passing, taken by themselves:

    Category____________________corr. SO_______corr. SD
    Scoring Defense_____________0.5781__________1.0000
    Scoring Offense_____________1.0000__________0.5781
    Pass Efficiency Defense_____0.5336__________0.8849
    Pass Efficiency_____________0.8338__________0.4989
    Pass Offense________________0.4914__________0.0550
    Pass Defense________________0.0547__________0.5243

    What makes these numbers most interesting is when you compare them to the numbers for rushing statistics. While Rushing Defense has a much higher correlation to Scoring Defense than does Pass Defense (yardage), Pass EFFICIENCY Defense BEATS Rushing Defense with its correlation to Scoring Defense. This, I would expect, is also due to "The Blow-Out Effect": You know the pass is coming, so the other team's QB is throwing the ball while running for his life into a defensive backfield that is less concerned with run support than they otherwise might be. This also explains Pass Efficiency Defense's correlation to Scoring Offense, though it is notable that Rushing Defense's correlation to Scoring Offense is higher still.

    Similarly explainable, though with less certainty, is the looseness of the connection between passing yardage and scoring. Pass offense's relatively low correlation to scoring offense, and likewise pass defense's loose association with scoring defense, both seem attributable to "The Blow-Out Effect". You might throw less when you've scored a lot; and the other team will throw more when they're having a hard time scoring, because they're probably playing from behind for much of the game.

    But look at the other side of that coin; and better yet, compare it to the situation with rushing offense and defense. Passing offense has seemingly NO CONNECTION AT ALL with scoring defense, and Passing defense likewise has NO CONNECTION AT ALL with scoring offense. Recall that this was clearly not the case with the rushing numbers, as each rushing category has a moderate positive correlation with the opposite scoring category.

    It seems to me that both "The Blow-Out Effect" and "The Coaching Effect" would apply negative pressure to both of these correlations, and therefore the number I would expect would be moderately negative. I guess it is possible that very good teams being adept in all areas and very bad teams being uniformly awful might cancel that out though. It is also possible that Passing simply has no "Cross-Correlation" effect the way that rushing does.

    For those of you who are engineers and are therefore familiar with a very different definition for "Cross-Correlation", spare me. No one who isn't already familiar with it wants to know the "classical" definition of that term.


    Turnover Margin, it turns out, has a fairly significant impact on Scoring Offense, and an even greater impact on Scoring Defense. I wonder if the greater effect on defense is caused by the shift in momentum attendant to turnovers ("sudden change" is the generalized term used by tOSU coaching staff).

    Special Teams

    Net Punting and Punt Return Defense both have a moderate impact on defense as you would expect. Their effect on the field position battle would seem to explain their small but clear effect on scoring offense. What is interesting about these two is that Net Punting has a higher correlation to winning percentage, but Punt Return Defense has a higher correlation to both Scoring Offense and Scoring Defense. As the difference is not great, it isn't clear if there is anything to be gained by employing inductive reasoning to fathom the cause.

    It is notable that the effect Kick Return Defense has on the field position battle is seen more in Scoring Offense than in Scoring Defense, while the reverse was true for Punt Return Defense. It seems to me that this is attributable to the fact that a good Kick Return Defense will leave the opponent pinned deep in its own territory every time (punt defense only sometimes), ultimately resulting in the offense getting good field position; bad Kick Return Defense results in average field position much of the time, but bad Punt Return Defense can result in disastrous field position to your defense much of the time.


    Finally, there is the matter of the zebras. It is fascinating to me that penalties and penalty yards seem to have no effect whatsoever on defense. But look at the effect on offense. Bear in mind that this is the ranking for scoring offense correlated to the ranking for fewest-penalties and fewest-penalty-yards. So the lower number is better for both, thus resulting in an expectation of a positive correlation. But there is a small but measurable negative correlation. With this large a set of data, this correlation can be taken to be quite real, even if it is perplexing.
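
    How "real" a correlation of -0.1608 is depends heavily on how many data points sit behind it. The sketch below uses the standard Fisher z-transform approximation to estimate how often a correlation at least that strong would show up by chance if the true correlation were zero. Both sample sizes are assumptions for illustration only: roughly 700 if every team-season counts as one point, roughly 119 if each team counts once.

```python
import math


def two_sided_p_value(r, n):
    """Approximate two-sided p-value for 'true correlation is zero' (Fisher z-transform)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


r_penalty_yards = -0.1608  # Penalty Yards vs. Scoring Offense, from the table above

# Hypothetical sample sizes; the actual number of data points is not stated here.
p_team_seasons = two_sided_p_value(r_penalty_yards, 700)
p_teams_only = two_sided_p_value(r_penalty_yards, 119)

print(f"n = 700: p is about {p_team_seasons:.6f}")
print(f"n = 119: p is about {p_teams_only:.4f}")
```

    The same small correlation looks far more convincing with several hundred points behind it than with about a hundred, so the sample size matters as much as the coefficient itself.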

    My first thought was that this small correlation is a result of there being a difference between the way games are officiated from conference to conference in conjunction with the fact that there are vast differences in the average offensive output of each conference. But wouldn't that then show an effect on the defense as well? Better football minds than mine will have to hash that one out too.

    What seems very clear from this though is that the officials affect the game far more on calls regarding possession than on called (or not) penalties.
    Last edited: Aug 11, 2007
    BB73, LordJeffBuck and schwab like this.
  11. schwab

    schwab Donkey Punch Ann Arbor

  12. lvbuckeye

    lvbuckeye Silver Surfer

    holy crap... i think i'm gonna have to read that about six more times before all the nuances really sink in... absolutely AMAZING work, DBB...
  13. BB73

    BB73 Loves Buckeye History Staff Member Bookie '16 & '17 Upset Contest Winner

    This is great stuff, DBB!

    It seems logical that total offense is more important than either rushing or passing offense, as balance is needed for real success.

    It's reassuring to see that pass efficiency, on both offense and defense, is more important than passing yards gained or allowed. Too often analysts look only at passing yardage, rather than efficiency.

    I am somewhat surprised that rushing defense doesn't have a higher correlation to winning.
  14. DaddyBigBucks

    DaddyBigBucks Still Calculating Buckeye DSC... Staff Member Bookie

    Agreed on all counts. I was astonished that rushing defense was not more closely associated to winning. Your point about efficiency is well taken too. Efficiency turns out to be one of the most important stats; whereas passing yards, both offensively and defensively, are near the bottom of the list.


    The thing that fascinates me the most is the moderate "cross-correlation" of rushing and the complete absence of it for passing (yardage):
    • While Rushing Offense is moderately correlated to Scoring Defense...
    • and Rushing Defense is moderately correlated to Scoring Offense...
    • Passing Offense has practically zero correlation to Scoring Defense...
    • and Passing Defense has practically zero correlation to Scoring Offense
    Nevertheless, Pass Efficiency, and its defensive counterpart both display moderate "cross-correlation", even more so than Rushing.

    FWIW: I had one of the admins change the thread title for me. There are quite a few people here who have posted great stat analysis in the past (BB73, DiHard, Jax', and many others). With input from those I've mentioned and others that I will be embarrassed that I've forgotten for the moment; this thread has a chance to be one of the best on the 'Planet.
    Last edited: Aug 12, 2007
  15. DaddyBigBucks

    DaddyBigBucks Still Calculating Buckeye DSC... Staff Member Bookie

    Top to Bottom:

    First, let me repeat something that I said off-hand in another thread. If anyone wants to get the raw Excel files for these stats, let me know which post has the data you're interested in. If your only purpose is to ensure that I'm not making this stuff up, that's fine. If on the other hand you want to see how the calculations work, make a point of that in your request. Some of the spreadsheets were "dumbed down" in order to make sorting easier and will require some modifications to make them useful for you.


    While I was thinking about "The 18 stats" and their correlation to winning percentage, I started to wonder if some of them don't matter a lot more at the top of the standings than they do at the bottom. In other words: Which stats make the cream rise to the top? Conversely, which really separate the teams at the bottom but matter less at the top?

    It turns out that the answers to these questions may be the most interesting that I've stumbled across.

    To get these answers, I did the same calculations as before; but on subsets of Div. IA instead of on every team. I calculated how "The 18 stats" (plus the composite of them to make 19) correlated to winning % for:
    • The top 30 teams (Top Quartile)
    • The top 59 teams (Top Half)
    • The bottom 60 teams (Bottom Half)
    • The bottom 30 teams (Bottom Quartile)
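
    The subset calculation can be sketched as follows: order the teams by winning percentage, slice out the group of interest, and correlate within that slice only. The 119 teams below are randomly generated stand-ins, not the real Div. IA data, and the field names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def subset_correlation(teams, stat, start, stop):
    """Correlate `stat` with winning % using only teams ranked start..stop-1 by winning %."""
    ordered = sorted(teams, key=lambda team: team["win_pct"], reverse=True)
    subset = ordered[start:stop]
    return pearson([team[stat] for team in subset], [team["win_pct"] for team in subset])


# Made-up league: a defensive quality score that partly drives winning %.
rng = random.Random(7)
teams = []
for _ in range(119):
    defense = rng.gauss(0, 1)
    teams.append({"scoring_defense": defense,
                  "win_pct": 0.5 + 0.1 * defense + rng.gauss(0, 0.1)})

everyone = subset_correlation(teams, "scoring_defense", 0, 119)
top_30 = subset_correlation(teams, "scoring_defense", 0, 30)
bottom_30 = subset_correlation(teams, "scoring_defense", 89, 119)

print(f"all teams: {everyone:.3f}  top 30: {top_30:.3f}  bottom 30: {bottom_30:.3f}")
```

    Because each slice covers a narrower range of winning percentages than the full field, correlations inside a slice can differ a good deal from the overall figure.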
    Here are the highlights:


    The table below shows:

    Column 1: Defensive Statistical Categories
    Column 2: Correlation of each stat to winning % for the Top 30 teams
    Column 3: Correlation of each stat to winning % for the Bottom 30 teams

    Defensive Stat__________Top 30__________Bottom 30
    Scoring D________________0.650______________0.486
    Total D__________________0.675______________0.004
    Pass Eff. D______________0.624______________0.352
    Rushing D________________0.647______________0.249
    Passing D________________0.354_____________-0.151

    • Defense makes MUCH more difference at the top than at the bottom.
    • Total Defense does not differentiate teams at the bottom, but is about as important as Scoring Defense at the top.
    • While Pass Efficiency D. has a higher correlation to winning than does Rushing D. among all Div. IA teams (and among the bottom 30); Rushing D. is more important (slightly) at the top.
    • The negative correlation for Passing D. (YPG) at the bottom indicates that the teams with better passing defense (of those in the bottom 30) tend to be ranked lower than those with worse passing defenses. This is probably because the worst of the worst get blown out so often that more of their opponents spend more of the games just running the clock.

    The table below shows:

    Column 1: Offensive Statistical Categories
    Column 2: Correlation of each stat to winning % for the Top 30 teams
    Column 3: Correlation of each stat to winning % for the Bottom 30 teams

    Offensive Stat__________Top 30__________Bottom 30
    Scoring O________________0.305______________0.706
    Total O__________________0.087______________0.467
    Pass Eff. O______________0.483______________0.581
    Rushing O________________0.204______________0.282
    Passing O________________0.076______________0.396

    • This is a mirror image of the results for defense. Offense makes MUCH more difference at the bottom than at the top; probably because bad defense at the bottom is a given.
    • Total Offense does not differentiate teams at the top!
    • Unlike the situation with defense, pass efficiency maintains a higher correlation to winning than rushing offense does, both at the top and at the bottom.
    Separating Wheat from Chaff

    The following stats were more important in the top 1/4 than in the top 1/2, showing that the higher the level of competition, the more important they are:
    • Scoring Defense
    • Rushing Defense
    • Pass Defense
    • Total Defense
    • Punt Return Defense
    • Net Punting
    • Rushing Offense
    • Kickoff Returns
    One final observation that didn't fit anywhere else: Net Punting was the 3rd most important stat to the bottom 30. It was not nearly that important to any other quartile, and its overall importance is 9th.

    More to follow, I have to run....
    Last edited: Aug 14, 2007
