Thursday, June 3, 2010

6.03.2010 Let us take a brief diversion into baseball

The recent spate of perfect and almost perfect games led Dave Bug to wonder:



Well, luckily, baseball people are fanatic datahogs, meaning that public sites like Baseball Reference have stacks of data for everything we could possibly want.

First things first, the number of perfect games during a single season doesn't tell us all that much, because the number of games per season has changed over time. In 1964, when Jim Bunning threw a perfect game against the Mets, a total of 1620 games were played over the course of the season. When Cy Young threw his perfect game in 1904, a total of 1220 games were played. In 1880, which saw two perfect games, a mere 332 games were played*.

So, obviously, what is important in determining the odds of a perfect game in a season is the total number of chances for the event to occur. Since 1876, there have been a total of 188,921 baseball games played (as of 6.2.2010 at around 11pm) with 20 perfect games divided up between them, giving us a 0.0106% chance that in any single game, 27 hitters will not make it to a base as a result of hits, walks, or errors (though for Kip Wells, that number is probably smaller). Or, roughly, one for every 9,446 games. Alternatively, we can look at the number of perfect games per total games played per season. Currently, we are at 0.002551 perfect games/total games played (hereafter PG/TG) in 2010. Since 1876, the average PG/TG is 0.000134. The standard deviation of the PG/TG ratio is 0.0005887.
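For anyone who wants to check the arithmetic, here is a minimal sketch of the per-game calculation. The two counts are the ones quoted above; the variable names are just mine.

```python
# Counts quoted in the post (as of June 2, 2010).
total_games = 188_921
perfect_games = 20

# Chance that any single game is a perfect game.
per_game_rate = perfect_games / total_games

# The same figure expressed as "one perfect game every N games".
games_per_perfect_game = total_games / perfect_games

print(f"Per-game chance: {per_game_rate:.4%}")
print(f"One perfect game every {games_per_perfect_game:,.0f} games")
```

This treats every game as an equally likely chance for a perfect game, which is a simplification.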

This means that the current number of perfect games is about a 4-standard-deviation event (assuming, of course, that perfect games are an evenly distributed random occurrence). Cy Young's perfect game was roughly a 1-standard-deviation event. And 1880, when both Lee Richmond and Monte Ward pitched perfect games? Well, that year was 10 standard deviations away from the expected value.
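The standard deviation figures are just z-scores: take a season's PG/TG ratio, subtract the long-run average, and divide by the standard deviation. A rough sketch, using the mean and standard deviation quoted in this post and the season game totals given earlier (the function and variable names are my own):

```python
# Long-run PG/TG mean and standard deviation, as quoted in the post.
MEAN_PG_TG = 0.000134
STD_PG_TG = 0.0005887


def z_score(ratio, mean=MEAN_PG_TG, std=STD_PG_TG):
    """Number of standard deviations a season's ratio sits above the mean."""
    return (ratio - mean) / std


# Season ratios: 2010 is the ratio quoted in the post; 1904 and 1880 are
# perfect games divided by the season game totals given in the post.
seasons = {
    "2010 (to date)": 0.002551,
    "1904": 1 / 1220,
    "1880": 2 / 332,
}

for label, ratio in seasons.items():
    print(f"{label}: {z_score(ratio):.1f} standard deviations")
```

Reading these as odds only makes sense under the same assumption as above, that perfect games are evenly distributed random events; the counts per season are tiny, so the z-scores are better taken as a rough gauge than as a precise probability.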

As a side note, for just plain old no-hitters, there have been a total of 258, or a 0.1366% chance per game. The No-Hitter/Total Games average is 0.0016969, with a standard deviation of 0.002268, meaning that this year's three no-hitters fall just under one standard deviation above the expected value.
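The same check works for no-hitters. One caveat on this sketch: the post never states how many games had been played in 2010, so the code backs that number out of the 0.002551 PG/TG ratio on the assumption that it reflects two perfect games. That inferred game count is my assumption, not a quoted figure.

```python
# Figures quoted in the post.
total_games = 188_921
no_hitters = 258
mean_nh_tg = 0.0016969
std_nh_tg = 0.002268

# Per-game chance of a no-hitter across all games since 1876.
no_hitter_rate = no_hitters / total_games
print(f"Per-game chance: {no_hitter_rate:.4%}")

# ASSUMPTION: the 2010 PG/TG ratio of 0.002551 reflects two perfect games,
# which implies roughly this many games played so far in 2010.
games_2010 = round(2 / 0.002551)

# Three no-hitters so far in 2010, per the post.
ratio_2010 = 3 / games_2010
z_2010 = (ratio_2010 - mean_nh_tg) / std_nh_tg
print(f"2010 no-hitters: {z_2010:.2f} standard deviations above the mean")
```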

update: Roger Lowenstein ruminates with far greater skill than I on outliers in financial markets and baseball.



* I compiled these stats by adding up the number of wins for each team in each league (National, American, and the short-lived Federal League), since each decided game produces exactly one win.