Stats & Info: Stat Week

Stat Week: Defensive Storylines To Watch

March 24, 2010
With a sharply increased focus on defense this offseason, let’s take a look at the most compelling defensive subplots heading into the 2010 season:

1. Can Seattle top 2009?

It's widely believed, by those who study the numbers, that the Seattle Mariners are the best defensive team in baseball, but by how much? At Baseball Info Solutions, we’ve estimated Defensive Runs Saved (as introduced in The Fielding Bible – Volume II) for the 2010 season.

A player’s Runs Saved value indicates how many runs he saved or cost his team in the field compared to the average player at his position, combining eight different aspects of defense. A player near zero Runs Saved is about average. A positive number of Runs Saved indicates above-average defense, while below-average fielders post negative Runs Saved totals.

We took each player’s defensive performance over the past three years and prorated their performance based on the number of innings each is projected to play this year. Our projections expect Seattle to nearly match their 110 Runs Saved from 2009 and to dwarf the second-best Phillies, 2008’s best defensive team.
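The proration step described above can be sketched roughly as follows. This is a minimal illustration, not Baseball Info Solutions' actual method: the `FieldingRecord` structure, the `project_runs_saved` function, and the sample numbers are all hypothetical, and the sketch assumes a simple per-inning rate with no weighting by season or regression to the mean.

```python
from dataclasses import dataclass


@dataclass
class FieldingRecord:
    """Three-year defensive totals for one player (hypothetical structure)."""
    runs_saved: float  # total Runs Saved over the three-year window
    innings: float     # total defensive innings over the same window


def project_runs_saved(record: FieldingRecord, projected_innings: float) -> float:
    """Scale a player's three-year Runs Saved rate to a projected innings total."""
    if record.innings <= 0:
        # No track record: treat the player as average (an assumption).
        return 0.0
    rate_per_inning = record.runs_saved / record.innings
    return rate_per_inning * projected_innings


# Made-up example: +30 Runs Saved over 3,600 innings, projected for 1,200 innings.
example = FieldingRecord(runs_saved=30.0, innings=3600.0)
projected = project_runs_saved(example, projected_innings=1200.0)
print(f"Projected Runs Saved: {projected:.1f}")
```

A team projection under this sketch would simply be the sum of the individual projections across the roster.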

As Dave Cameron mentioned in a post earlier this month, the Mariners are experimenting with a Chone Figgins/Jose Lopez position swap. While Figgins rates highly at third and Lopez is roughly average at second, the Mariners coaching staff will have to assess whether the shift is a net positive for the team.

2. How much has Boston improved?

The Boston Red Sox have gotten a lot of attention this offseason for signing Adrian Beltre, Marco Scutaro, and Mike Cameron while moving incumbent centerfielder Jacoby Ellsbury to left field and letting free agent Jason Bay sign elsewhere. We’re projecting their defense to improve 87 runs over last year’s total of -52 runs saved, largely as a result of their offseason transactions.

3. Which Mariner will be the most valuable defender in 2010?

Franklin Gutierrez led baseball with 32 Runs Saved in center field in 2009, though new Mariners teammates Chone Figgins (31 Runs Saved) and Jack Wilson (27 Runs Saved) finished close behind. We project Gutierrez to hold his crown, but Wilson and 2008 champion Chase Utley should give him a run for his money.

4. Are the Astros this year’s version of the Mariners?

With the out-of-position Miguel Tejada at short and the combination of two “Jeffs” (Geoff Blum and Jeff Keppinger) at third, the left side of the Astros’ infield was porous in 2009. Free agent signee Pedro Feliz and highly regarded rookie Tommy Manzella will join Michael Bourn and Hunter Pence on a much-improved Houston defense.

Even assuming the unproven Manzella is merely an average defensive shortstop, we project the Astros to improve by 39 Runs Saved this season, almost four full wins due to defensive upgrades.
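The "almost four full wins" figure is consistent with the common sabermetric rule of thumb that roughly 10 runs are worth one win. The article does not state its conversion rate, so the `RUNS_PER_WIN` constant below is an assumption, and `runs_to_wins` is a hypothetical helper for illustration.

```python
# Rule-of-thumb conversion; an assumption, not a figure given in the article.
RUNS_PER_WIN = 10.0


def runs_to_wins(runs: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert a run differential into an approximate number of wins."""
    return runs / runs_per_win


# The Astros' projected defensive improvement from the article: 39 Runs Saved.
astros_wins = runs_to_wins(39)
print(f"Approximate wins added: {astros_wins:.1f}")
```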

5. Will Albert Pujols win his fifth consecutive Fielding Bible Award?

The Fielding Bible Awards are voted on by a panel of 10 experts in the industry, including Bill James, John Dewan, and ESPN's Rob Neyer. Each November 1st, we announce the results of the balloting and declare the single best major league defender at each position. Since the inaugural Fielding Bible Awards in 2006, Albert Pujols has been voted the best first baseman every single year. Is there anything this guy can’t do?

Stat Week: Another look at "quality" starts

March 23, 2010
Baseball Tonight continues its look at statistical analysis by examining pitching evaluation methods. This piece takes a closer look at evaluating starting pitching.

A few years ago, ESPN baseball columnist Rob Neyer wrote a piece about why the quality start is actually a quality statistic. The key argument in his article is that in 2005 there were 2,447 quality starts in the majors, and the starter's team won 67.4% of them.

This got us thinking: what if we could somehow predict the winning percentage of ANY team given ANY starting pitching line? This seems like it would improve the quality start metric, as it would give a better indication of how a pitcher impacted his team’s chance of winning. Others, such as Bill James, have devised methods to do something similar (you may be familiar with "Game Score," listed in box scores). We tried another approach.

First, we compiled the starting pitching lines from every starter for the last five seasons. There are approximately 2,430 MLB games per season, two starters per game, for five seasons (2005-2009). That's 24,300 observations.

We then used a statistical technique known as binary, or logistic, regression to predict a team's probability of winning based on the starter’s pitching line. Essentially, we plug basic stats from a box score (such as innings pitched, earned runs, walks, strikeouts) into a mathematical model and the model then spits out the team’s chance of winning the game based on those stats.
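For readers who want to see the mechanics, here is a minimal sketch of a logistic regression fit by plain gradient ascent, using only the Python standard library. The sixteen starts in `TOY_STARTS` are invented for illustration and are not drawn from the 24,300 real observations; the function names, learning rate, and step count are likewise assumptions, and a real analysis would more likely use a statistics package.

```python
import math

# Toy starts, invented for illustration only:
# (innings pitched, earned runs, 1 if the team won else 0)
TOY_STARTS = [
    (9, 0, 1), (8, 1, 1), (7, 2, 1), (7, 1, 1),
    (6, 3, 0), (6, 2, 1), (5, 4, 0), (6, 3, 1),
    (4, 5, 0), (7, 3, 0), (8, 2, 1), (5, 3, 0),
    (3, 6, 0), (6, 1, 1), (7, 4, 0), (6, 2, 0),
]


def sigmoid(z: float) -> float:
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, learning_rate=0.02, steps=5000):
    """Fit win ~ intercept + IP + ER by gradient ascent on the log-likelihood."""
    b0 = b_ip = b_er = 0.0
    n = len(rows)
    for _ in range(steps):
        g0 = g_ip = g_er = 0.0
        for ip, er, won in rows:
            # Residual: observed outcome minus the current predicted probability.
            err = won - sigmoid(b0 + b_ip * ip + b_er * er)
            g0 += err
            g_ip += err * ip
            g_er += err * er
        b0 += learning_rate * g0 / n
        b_ip += learning_rate * g_ip / n
        b_er += learning_rate * g_er / n
    return b0, b_ip, b_er


def win_probability(coefs, ip: float, er: float) -> float:
    """Predicted team win probability for a given starter line."""
    b0, b_ip, b_er = coefs
    return sigmoid(b0 + b_ip * ip + b_er * er)


coefs = fit_logistic(TOY_STARTS)
print(f"8 IP, 1 ER: {win_probability(coefs, 8, 1):.3f}")
print(f"5 IP, 4 ER: {win_probability(coefs, 5, 4):.3f}")
```

Because the inputs are made up, the fitted numbers mean nothing on their own. The point is only the shape of the technique: box score stats go in, a probability between 0 and 1 comes out.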

In order to simplify the model and keep it as close as possible to the current criteria for a quality start, we used only innings pitched and earned runs as variables for the regression. It’s also worth noting that those two stats (IP and ER) had the largest statistical impact on a team’s win probability.

Let’s get to the data: Here’s the predicted team winning percentage for a few different combinations of IP and ER by the starting pitcher:

Pitcher A: 6 IP, 3 ER Team win pct = 49.6%
Pitcher B: 7 IP, 3 ER Team win pct = 55.0%
Pitcher C: 9 IP, 4 ER Team win pct = 54.5%
Pitcher D: 6 IP, 2 ER Team win pct = 60.9%

What’s most interesting here is that we see at least one non-quality start stat line (9 IP, 4 ER) which gives the team a better chance of winning than the minimum quality start criteria of 6 IP and 3 ER.
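The four published lines also allow a rough consistency check. If the model is linear in IP and ER on the log-odds scale (which is what a two-variable logistic regression without interaction terms implies), then three of the lines are enough to back out approximate coefficients, and the fourth can be predicted from them. The sketch below does that. The resulting coefficients are back-of-the-envelope estimates from rounded percentages, not the actual fitted values, which the article does not publish.

```python
import math


def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Published lines from the article: (IP, ER, team win pct).
A = (6, 3, 0.496)
B = (7, 3, 0.550)
C = (9, 4, 0.545)
D = (6, 2, 0.609)

# B differs from A by one extra inning, so the gap in log-odds
# approximates the per-inning coefficient.
b_ip = logit(B[2]) - logit(A[2])

# A differs from D by one extra earned run, so the gap in log-odds
# approximates the per-earned-run coefficient.
b_er = logit(A[2]) - logit(D[2])

# Solve for the intercept using line A.
b0 = logit(A[2]) - b_ip * A[0] - b_er * A[1]

# Predict line C (9 IP, 4 ER) from the backed-out coefficients.
predicted_c = inv_logit(b0 + b_ip * C[0] + b_er * C[1])
print(f"Approx. per-inning effect on log-odds:  {b_ip:+.3f}")
print(f"Approx. per-earned-run effect:          {b_er:+.3f}")
print(f"Predicted win pct for 9 IP, 4 ER:       {predicted_c:.1%}")
```

The prediction for Pitcher C lands within a fraction of a percentage point of the published 54.5%, which suggests the four figures are internally consistent. It also shows why the 9 IP, 4 ER line beats the 6 IP, 3 ER line: under these estimates, three extra innings outweigh one extra earned run.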

As for how we can improve the existing quality start metric, here is a quick example of how our model helps better judge which starting pitchers truly impacted their team’s chance of winning the game.

Using the current definition of quality starts, Roy Halladay and Doug Davis tied for 15th in the majors last season with 22 quality starts. We think most fans would agree that Halladay is arguably a better pitcher than Davis and likely helped his team win more games.

That’s where our new model can help.

If we set our quality start threshold to any start where the starting pitcher gave his team at least a 75 percent chance of winning, Halladay had 13, nearly twice as many as the seven by Davis. Halladay’s 13 tied for sixth-most in the majors. Here’s a look at the top five from last season:

Now, let’s remember that this is just a start (no pun intended), as this regression model can certainly be improved by adding more variables and conducting further tests. Hopefully, though, this is a good primer for a different way to judge starting pitching success, and will spark some interesting discussions in ballparks and bars across the country this season.

Stat Week: An Argument for OPS

March 22, 2010
Throughout the week, Baseball Tonight will be taking a closer look at the use of "non-traditional" statistics in baseball analysis, with "Stat Week" segments on its program. The Max Info will provide further analysis on the topics discussed, as well as delve into additional subjects to enhance those discussed on the show.

OPS, defined as on-base percentage plus slugging percentage, has become a widely used term in baseball analysis. However, OPS is often criticized by both baseball traditionalists and sabermetricians. Traditionalists prefer to measure offensive worth with more established stats, such as batting average.
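For reference, the calculation behind OPS can be sketched in a few lines. The formulas are the standard definitions of on-base percentage and slugging percentage. The function names and the sample batting line are made up for illustration.

```python
def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = ab + bb + hbp + sf
    return (h + bb + hbp) / denominator if denominator else 0.0


def slugging_percentage(singles: int, doubles: int, triples: int,
                        hr: int, ab: int) -> float:
    """SLG = total bases / at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab if ab else 0.0


def ops(singles: int, doubles: int, triples: int, hr: int,
        bb: int, hbp: int, ab: int, sf: int) -> float:
    """OPS = OBP + SLG."""
    hits = singles + doubles + triples + hr
    return (on_base_percentage(hits, bb, hbp, ab, sf)
            + slugging_percentage(singles, doubles, triples, hr, ab))


# Made-up season line: 500 AB, 150 hits (100 1B, 30 2B, 5 3B, 15 HR),
# 50 BB, 5 HBP, 5 SF.
sample_ops = ops(singles=100, doubles=30, triples=5, hr=15,
                 bb=50, hbp=5, ab=500, sf=5)
print(f"OPS: {sample_ops:.3f}")
```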

Sabermetricians feel OPS does not go deep enough. They prefer to use a more in-depth breakdown, such as replacement player analysis. Clearly, there is no right or wrong philosophy. Simplicity is always good, but so is validity. Something easy to understand works, but at what cost to the statistical significance?

Now the question must be addressed: does OPS really do a good job of evaluating offensive production?

It is fairly common practice (and deservedly so) to measure an offense by the number of runs it scores. Therefore, a simple way to check which statistics work well in judging the value of an offense is to observe which statistics correlate well with runs scored.

Below is a table showing the correlation between frequently used baseball statistics and runs scored, using team data from 2000 to 2009.

In statistical terms, a correlation coefficient of +1.000 is a perfect positive linear relationship, a coefficient of -1.000 is a perfect negative linear relationship, and a coefficient of 0 means no linear relationship.

In other words, the closer a coefficient is to one in absolute value, the stronger the relationship between the two stats.
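The correlation coefficient described here is the Pearson coefficient, which can be computed as in the sketch below. The team OPS and runs-scored values are invented for illustration and are not the 2000-2009 data used in this piece; `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return covariance / math.sqrt(var_x * var_y)


# Made-up team seasons: OPS and runs scored.
team_ops = [0.700, 0.725, 0.740, 0.760, 0.785, 0.810]
team_runs = [650, 700, 715, 770, 800, 860]

r = pearson_r(team_ops, team_runs)
print(f"Correlation between OPS and runs scored (toy data): {r:.3f}")
```

Running the same function against each candidate statistic and runs scored, then ranking the results, is the kind of comparison the table summarizes.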

As shown above, OPS has the strongest relationship with runs scored by a team over the past 10 seasons among the statistics checked. In fact, in each year since 2002 the team that has scored the most runs in Major League Baseball has also had the highest OPS.

I am not suggesting that there is no use for more advanced statistical research in baseball. Far from it. The intelligent front offices and sharp fantasy players who utilize cutting-edge analysis will certainly have a leg up on their competitors.

The issue of simplicity vs statistical significance is not a problem specific to baseball. The answer most have come up with is known as Occam's razor, which states “entities must not be multiplied beyond necessity.”

In this case, if an uncomplicated stat does just as good a job of explaining offense as a more complex one, we should use the simpler stat.

While a clever sabermetrician certainly can find (and surely has found) other replacement player or linear weight metrics that show a higher correlation to runs scored than OPS, the simplicity, as well as the utility, of OPS should be appreciated by all.
