How to lie with statistics, academia edition: Here’s what your $40,000 a year is paying for

December 23rd, 2007, 12:57 pm

In yesterday’s New York Times, a pair of academics, Columbia professor of sociology Jonathan Cole and University of Chicago professor of statistics Stephen Stigler, published an article titled “More Juice, Less Punch,” which set out to answer the question: “Do [PEDs] make a difference sufficient to be detected in the players’ performance records?” Their answer, not surprisingly, is no (otherwise there wouldn’t have been any point in publishing their story in the first place): “An examination of the data on the players featured in the Mitchell report suggests that in most cases the drugs had either little or a negative effect.”

I feel sorry for the students who are forced to sit through these boobs’ courses.

Cole and Stigler try to prove their point by comparing stats from before and after a given player is accused of using roids (or HGH, or whatever). They explain their methodology thusly: “For pitchers identified by the report, we looked at the annual earned run average for their major league careers. For hitters we examined batting averages, home runs and slugging percentages. We then compared each player’s yearly performance before and after he is accused of having started using performance-enhancing drugs. After excluding those with insufficient information for a comparison, we were left with 48 batters and 23 pitchers.” The results, they say, show no net gain in performance.
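For the record, here is a rough sketch of what that kind of before-and-after comparison boils down to. The article doesn't say exactly how the yearly numbers were combined, so the unweighted average used here is my assumption, and the function name and input layout are hypothetical.

```python
from statistics import mean


def before_after_change(seasons, first_accused_year):
    """Naive before/after comparison of one yearly stat (e.g. ERA).

    seasons: dict mapping season year -> stat value for that year.
    first_accused_year: first season counted as "after" (hypothetical input).
    Returns mean(after) - mean(before), or None if either side is empty.
    Assumption: seasons are averaged unweighted, ignoring innings or at-bats.
    """
    before = [value for year, value in seasons.items() if year < first_accused_year]
    after = [value for year, value in seasons.items() if year >= first_accused_year]
    if not before or not after:
        # Mirrors "excluding those with insufficient information".
        return None
    return mean(after) - mean(before)
```

Notice what the function never looks at: the player's age, the league he played in, or the park he played in. A positive number for a pitcher's ERA just means "worse later," with no baseline for how much worse a player of that age would normally be expected to get.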

This in itself would seem to demonstrate that PEDs do, in fact, work. Baseball players, like mathematicians and physicists, show a dramatic tail-off at a very young age (for the geeks, their best work is usually done in their 20s; for ballplayers, the peak years usually come between 28 and 32), so if players with extended careers don’t show any decline in performance, that by itself is an unusual pattern.

Anyone with even a slight degree of sophistication would also realize that it’s next to meaningless to compare raw data; you need to make sure you understand what the data you’re looking at actually means. In this case, that means realizing that comparing stats like ERA or home runs or OPS or anything else tells you much less about a player’s relative performance than ERA+ or OPS+. (OPS+ normalizes OPS for the park and the league the player played in; ERA+ shows the player’s ERA in relation to the league’s ERA. This explains why Pedro’s 1.74 ERA in 2000, when the league ERA was 5.07, earned him an ERA+ of 291, while Sandy Koufax’s 1.74 ERA in 1964, when the league ERA was 3.25, garnered him an ERA+ of only 187. It also helps show why Pedro’s 2000 season was arguably the best ever: it resulted in the highest ERA+ since 1880, and the second best ever. Koufax’s top season ranks 56th.)
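For anyone who wants to check the arithmetic, here is a minimal sketch of the ERA+ calculation. It assumes the simplified form, 100 times league ERA divided by the pitcher's ERA, and treats the league ERA figures quoted above as already reflecting whatever park adjustment applies; the function name is mine, not an official one.

```python
def era_plus(era, league_era):
    """Simplified ERA+: 100 * league ERA / pitcher ERA.

    Assumption: no separate park factor is applied here; any park
    adjustment is taken to be baked into league_era already.
    """
    if era <= 0:
        raise ValueError("ERA must be positive")
    return 100 * league_era / era


# Season figures as quoted in the post above.
for name, era, league_era in [
    ("Pedro Martinez, 2000", 1.74, 5.07),
    ("Sandy Koufax, 1964", 1.74, 3.25),
]:
    print(f"{name}: ERA {era:.2f}, ERA+ {round(era_plus(era, league_era))}")
# prints:
# Pedro Martinez, 2000: ERA 1.74, ERA+ 291
# Sandy Koufax, 1964: ERA 1.74, ERA+ 187
```

Same raw ERA, wildly different seasons once the league context is accounted for, which is exactly the information a raw before-and-after ERA comparison throws away.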

Let’s drill down a little more. Cole and Stigler write, “Roger Clemens is a case in point: a great pitcher before 1998, a great (if increasingly fragile) pitcher after he is supposed to have received treatment. But when we compared Clemens’s E.R.A. through 1997 with his E.R.A. from 1998 on, it was worse by 0.32 in the later period.” As I pointed out last year, the salient point here is how Clemens performed in his late 30s and early 40s compared to his mid 20s. In the 12 years from Clemens’ breakout year in 1986, when he was 23, he had an ERA+ above 180 twice; in the 10 years from age 35 to 44, he had two more. Compare that to other Hall of Fame-caliber pitchers from this era like Greg Maddux, who had four years with an ERA+ of 180 or higher before age 35 and none afterwards, or Tom Glavine, whose five best years all came before age 35. Heck, compare it to Tom Seaver, the guy who was voted into the Hall with the highest percentage ever: his six best years all came before age 34.

Cole and Stigler are just as ignorant when it comes to hitters. “What should not be overlooked,” they write, “is that Bonds’s profile is strikingly like Babe Ruth’s high performance level until near the end of his career, with one standout home run year — a year in which other players on other teams also exceeded their previous levels.”

Actually, what should not be overlooked is the fact that Bonds has put up an OPS+ of greater than 200 in three out of the last six years, compared with comparable numbers in three of his first 14 years in the bigs. Ruth also had an OPS+ higher than 200 in three of his final six years…and another eight in the previous 14. (Another thing that should not be overlooked: Bonds has played the majority of his career in a home ballpark that has a spacious right field, unlike Ruth, who got to hit in Yankee Stadium.)

I know it’s not a shocker that a pair of academics don’t really understand baseball; it has taken autodidacts like Bill James to help illuminate the game. What is shocking is how little Cole and Stigler, professors who not only deal with numbers but teach at elite institutions, seem to understand about analyzing data.


Post Categories: Jonathan Cole & New York Times & Statistics & Stephen Stigler & Steroids

2 Comments → “How to lie with statistics, academia edition: Here’s what your $40,000 a year is paying for”


  1. Gee

    16 years ago

    What is shocking is how little Cole and Stigler — professors who not only deal with numbers but teach at elite institutions — seem to understand about analyzing data.

    Yes. Their thesis is essentially “The Bell Curve: MLB Edition.”


  2. mikeb

    16 years ago

    Seth, your wonderful dissection of the study’s problems aside, none of that matters. Without an accurate timeline of the subjects’ PED use — including duration, frequency and volume — and a true control group, it’s an exercise in nonsense. Anyone who teaches stats for a living should know that.

