Wednesday, January 30, 2013
Q&A: New Cubs 'saberist' Tom Tango
By Jon Greenberg
On Monday, the advanced statistics-loving community of Chicago Cubs fans got a welcome surprise when noted "saberist" Tom Tango announced he was working exclusively for the Cubs. In a post titled "Psst…wanna work for the Cubs (and with me)?" Tango noted the Cubs were looking for a Director of Research and Development in Baseball Operations and mentioned that "I am now providing my consulting services exclusively to the Cubs."
The Cubs confirmed the hire to ESPNChicago.
The move from writer to team employee is not uncommon in the baseball world (baseball writer Kevin Goldstein joined the Houston Astros as pro scouting coordinator recently).
Tango (a pseudonym) is Canadian-born and lives in New Jersey. He has consulted for several Major League teams and has written for ESPN.com, the Hardball Times and many other outlets, including his popular blog. He created the widely used statistics wOBA (Weighted On-Base Average) and FIP (Fielding Independent Pitching), and co-authored "The Book: Playing the Percentages in Baseball."
He's kind of a big deal.
As someone who wants to learn about advanced statistics, I thought it would be interesting to touch base with the Cubs' not-so-secret weapon.
So I e-mailed him (three times) some questions about his role (which he mostly ignored for contractual reasons) and for the blissfully ignorant, some questions about his work and the role of advanced stats in baseball. There has been slight editing for grammar and spelling, but I kept my lame questions intact and in order.
You're Internet famous and front office famous, but most fans don't know Tom Tango from Jose Lambada. How did you get your start in analytics? Why devote so much time to this?
Tom Tango: I've been fascinated with numbers for as long as I can remember. I started following MLB, NHL, WHA, and CFL in the late 1970s. As a teenager I read an article about Pete Palmer's Linear Weights in Baseball Digest, then read his book, The Hidden Game, was absorbed by Bill James' Baseball Abstracts, and played plenty of rotisserie/fantasy games in my 20s. Throw in majoring in computer science, with plenty of probability and stats classes, and, well, I'm lucky that I've been able to turn the things I like into a vocation.
I know you've done analysis for other teams, and you obviously aren't the first analyst to sign an exclusive deal with a team, but in Chicago, this is new ground (for Chicago teams I mean). Can you talk about what the exclusivity means? What kind of work are you doing for the Cubs?
TT: I can't provide any consulting services to any other MLB team, and I'm limited to what I can do on my blog. Because of my (non-disclosure agreement), I don't discuss any of the particulars of my work.
Theo Epstein talked about how, and I can't think of the phrase here, there are fewer places for teams to find an edge anymore. For example, the reporters who didn't read "Moneyball" but hate it anyway always bring up the A's old obsession with OBP, etc. Is pitcher health one to examine? I know the Cubs are pouring money into video analysis, etc. Is health the great unknown for teams right now?
TT: It does seem as if we were as fascinated with Stephen Strasburg's performance as we were with his usage last year. I was anyway. I think Will Carroll will tell you that there's a tremendous amount of injury time for pitchers every year, which means a tremendous amount of money in the dugout. Keeping the money on the field is a win-win-win for the team, the player, and the fans.
How about defense? Everyone uses UZR or (range factor) to try to sound smart, and yet when the Gold Gloves come out, it's the same guys who won last year. I think Greg Maddux won again in 2012. How can one judge defense, or better yet, predict defensive improvement in players?
TT: The issue with the fielding stats we have now is that we have to infer a lot, simply because we aren't recording enough. You'd rather record the fielder's positioning than infer it. You'd rather know how many hops a ball takes to get to the shortstop than infer it. Basically, all the things we see and know and take for granted as baseball fans aren't being recorded. Even something as simple as hang time took forever to finally get recorded. You and I know that a seven-second lazy fly ball is going to be caught by every outfielder in MLB, and is therefore noise. But if the system isn't told that it was a seven-second fly ball, it tries to guess the ball's difficulty from other parameters, and might therefore suggest it had a 90 percent chance of being caught rather than 99.9 percent. Instead of that data being treated as noise, the fielding system treats it as valid, useful data.
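To make the hang-time point concrete, here is a minimal Python sketch of a simplified, hypothetical credit rule (not UZR or any specific published system): a fielder earns (1 - p) for a catch and loses p for a miss, where p is the estimated catch probability. The 90 percent and 99.9 percent figures come from Tango's answer; the function names and the 100-play sample are illustrative assumptions.

```python
def play_credit(caught: bool, catch_prob: float) -> float:
    """Outs above average for one play under a simplified, hypothetical rule.

    A catch earns (1 - p) and a miss costs p, where p is the estimated
    probability that an average fielder makes the play.
    """
    if not 0.0 <= catch_prob <= 1.0:
        raise ValueError("catch_prob must be between 0 and 1")
    return (1.0 - catch_prob) if caught else -catch_prob


def credit_for_catches(n_plays: int, catch_prob: float) -> float:
    """Total credit for n plays that were all caught, each rated at catch_prob."""
    return sum(play_credit(True, catch_prob) for _ in range(n_plays))


# 100 routine seven-second fly balls, every one of them caught.
knows_hang_time = credit_for_catches(100, 0.999)   # system is told they were easy
guesses_difficulty = credit_for_catches(100, 0.90)  # system has to infer difficulty
print(f"{knows_hang_time:.1f} vs {guesses_difficulty:.1f} outs above average")
```

Under these assumptions, the same 100 routine catches are worth about 0.1 outs of credit when the system knows the hang time, but about 10 outs when it has to guess 90 percent: noise being booked as skill.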
But just because a metric has bias or noise doesn't mean we should discard it altogether. We need SOMETHING. As long as the bias and noise aren't too extensive, something is better than nothing.
I'm on Twitter too much so I know how Cubs bloggers and many, many fans are really in love with Theo's new regime and the approach they're taking to rebuild the team. But a lot of fans, and reporters, are still pretty unsure about the acronym-heavy world of baseball analysis. I try, but everything blends together. I'm not a numbers person. (At his introductory press conference a reporter asked Theo, "What's the deal with that computer?" Which I thought was hilarious.) Can you inform the less-informed among us, I mean them, about some of the more important stats to look at when doing amateur analysis? You created the FIP (Fielding Independent Pitching) stat, for instance.
TT: FIP is great because it is such a simple construction, but it tells us so much about the pitcher. We don't have a problem with assigning different weights to each hit to come up with a slugging average, for example. And that's what FIP is. FIP limits itself to looking at strikeouts, walks, and home runs, gives each a specific weight, and scales the result to ERA.
I don't know that I'd tout other stats like that one, but the most important thing is to make sure you understand the context of the stat. Don't just look at RBIs; look also at how many runners were on base. And not just how many runners, but where they were on the bases. And not just where, but how many outs there were, especially with a runner on third. Once you realize how much bias there is in a player's RBI totals, you learn to move away from them and focus on the skills that lead to RBIs.
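A rough sketch of the context Tango describes: tally how many runners were on base ahead of a hitter, how often a runner stood on third with fewer than two outs, and what share of those runners he drove in. The PlateAppearance record, its fields, and the two sample hitters are hypothetical; subtracting home runs from RBIs (so the batter isn't counted as driving himself in) is one simple way to isolate runners driven in, not an official stat.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    runners_on: tuple[int, ...]  # bases occupied before the PA, e.g. (1, 3)
    outs: int                    # outs before the PA (0-2)
    rbi: int                     # runs batted in on the PA
    home_run: bool = False       # a homer also "drives in" the batter himself


def rbi_context(pas: list[PlateAppearance]) -> dict[str, float]:
    """Summarize the opportunities hiding behind an RBI total."""
    runners_on = sum(len(pa.runners_on) for pa in pas)
    third_fewer_than_two_outs = sum(
        1 for pa in pas if 3 in pa.runners_on and pa.outs < 2
    )
    rbi = sum(pa.rbi for pa in pas)
    # Runners driven in, not counting the batter scoring on his own home runs.
    others_driven_in = rbi - sum(1 for pa in pas if pa.home_run)
    return {
        "rbi": rbi,
        "runners_on": runners_on,
        "third_fewer_than_two_outs": third_fewer_than_two_outs,
        "pct_runners_driven_in": others_driven_in / runners_on if runners_on else 0.0,
    }


# Two hypothetical hitters with identical RBI totals.
hitter_a = [
    PlateAppearance(runners_on=(1, 2, 3), outs=0, rbi=2),
    PlateAppearance(runners_on=(2, 3), outs=1, rbi=1),
    PlateAppearance(runners_on=(), outs=2, rbi=0),
]
hitter_b = [
    PlateAppearance(runners_on=(), outs=0, rbi=1, home_run=True),
    PlateAppearance(runners_on=(1,), outs=2, rbi=2, home_run=True),
    PlateAppearance(runners_on=(), outs=1, rbi=0),
]
for name, pas in (("A", hitter_a), ("B", hitter_b)):
    print(name, rbi_context(pas))
```

Both made-up hitters finish with three RBIs, but one came to the plate with five runners on base and the other with one, which the RBI column alone hides.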
If the Cubs win a World Series, will you reveal your identity and ride a float downtown with Steve Bartman? (Editor's note: This was a joke question he answered seriously. Which is cool.)
TT: You know, when I introduced Leverage Index to a larger audience, I did it the day after Game 6.
And what was clear then was that Bartman had very little effect on the game. There were three clear plays that had a far bigger impact: (Alex) Gonzalez flubbing a potential double play and getting no outs, (Derrek) Lee doubling as (Mark) Prior's last batter, and (Mike) Mordecai doubling as (Kyle) Farnsworth's last batter. It's easier to focus on the unusual play, but the focus should have been on the substantive plays.
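Leverage Index compares how much a game situation can swing the win probability with how much an average situation can. A minimal sketch, assuming you already have a change in win probability for each possible outcome: weight the absolute swings by their probabilities, then divide by an average swing. Real calculations use many outcomes and a full win-expectancy table; the two-outcome sets and the 0.04 average below are placeholders, not figures from Game 6.

```python
import math


def expected_swing(outcomes: list[tuple[float, float]]) -> float:
    """Probability-weighted absolute change in win probability.

    outcomes: (probability, change in win probability) for each possible
    result of the plate appearance; probabilities must sum to 1.
    """
    if not math.isclose(sum(p for p, _ in outcomes), 1.0):
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * abs(delta) for p, delta in outcomes)


def leverage_index(outcomes: list[tuple[float, float]], average_swing: float) -> float:
    """Swing of this situation relative to an average situation (1.0 = average)."""
    return expected_swing(outcomes) / average_swing


AVERAGE_SWING = 0.04  # placeholder; a real value comes from a win-expectancy table

# Placeholder outcome sets, collapsed to "batter succeeds" / "batter fails".
close_and_late = [(0.30, 0.15), (0.70, -0.05)]
blowout = [(0.30, 0.01), (0.70, -0.004)]

print(round(leverage_index(close_and_late, AVERAGE_SWING), 2))
print(round(leverage_index(blowout, AVERAGE_SWING), 3))
```

An index near 2.0 means the plate appearance carries about twice the average stakes. Applying the same win-probability changes to what actually happened is what separates the substantive plays from the memorable ones.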
Explain to regular fans, not the diehard saberists or the amateur FanGraphs junkies, why they should learn wOBA or FIP, and what they can gain from knowing these stats that you created? I think there's still a barrier for a lot of people, not to mention reporters, because the terms are so unfamiliar.
TT: What these metrics try to do is focus on something specific, a subset of a player's performance, and present that as a number. OBP does that by focusing on a player's performance in reaching base, and so it treats a walk the same as a home run. SLG does that by focusing on a player's ability to generate bases with his bat, and so it treats a HR as four times a single, while ignoring walks and steals.
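Both trade-offs fall straight out of the standard formulas, OBP = (H + BB + HBP) / (AB + BB + HBP + SF) and SLG = total bases / AB. A short Python sketch with a made-up stat line shows one extra walk and one extra home run producing exactly the same OBP, and one home run counting the same as four singles in SLG.

```python
def on_base_pct(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP: every way of reaching base counts the same."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_pct(singles: int, doubles: int, triples: int, hr: int, ab: int) -> float:
    """SLG: total bases per at-bat; walks and steals are ignored entirely."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


# Made-up line: 30 hits in 100 at-bats, 10 walks, 1 HBP, 1 sacrifice fly.
base_line = dict(h=30, bb=10, hbp=1, ab=100, sf=1)

# One more walk vs. one more home run (a homer is a hit and an at-bat; a walk is neither).
plus_walk = on_base_pct(**{**base_line, "bb": 11})
plus_homer = on_base_pct(**{**base_line, "h": 31, "ab": 101})
print(plus_walk == plus_homer)  # True: both are 42/113

# Four singles and one home run produce identical slugging.
print(slugging_pct(4, 0, 0, 0, ab=10) == slugging_pct(0, 0, 0, 1, ab=10))  # True
```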
So, with FIP, the focus is on those things a pitcher does that don't involve his fielders (outside of an occasional home run robbed by Mike Trout). Therefore, we want to focus on walks, hit batters, strikeouts, and HR. It's nice to see that Felix (Hernandez) walks 7 percent of his batters, strikes out 23 percent, and doesn't give up many home runs. But we'd like to express all those different aspects as a single number, much like OBP and SLG are single numbers. The question therefore becomes how to combine, how to weight, walks and strikeouts and HR. It's not as obvious as OBP and SLG. So the FIP metric weights the HR the most (13), the walk and hit batter next (3), and the strikeout the least (negatively, at -2).
If we just weight it like that, we'll get a single number that properly encapsulates his non-fielder performance. But we still have a problem with the scale, since the number returned will be meaningless. We therefore turn it into an ERA-scaled number.
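Here is a minimal FIP sketch using those weights (13 for HR, 3 for BB and HBP, -2 for K), divided by innings pitched. The ERA scaling adds a league constant chosen so that league-wide FIP equals league-wide ERA, which is how FIP constants are commonly computed. The league totals and the pitcher's line are invented placeholders, not any real season.

```python
def fip_raw(hr: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """Unscaled FIP: weighted HR, walks + HBP, and strikeouts per inning."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_constant(lg_era: float, lg_hr: int, lg_bb: int, lg_hbp: int,
                 lg_k: int, lg_ip: float) -> float:
    """Constant that makes league-wide FIP equal league-wide ERA."""
    return lg_era - fip_raw(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float) -> float:
    """ERA-scaled FIP. ip must be in true thirds (200 1/3 innings -> 200.333...)."""
    return fip_raw(hr, bb, hbp, k, ip) + constant


# Placeholder league totals (not a real season), picked to give a round constant.
C = fip_constant(lg_era=4.10, lg_hr=150, lg_bb=500, lg_hbp=50, lg_k=1200, lg_ip=1200)

# Hypothetical pitcher: 200 IP, 200 K, 40 BB, 5 HBP, 10 HR.
pitcher_fip = fip(hr=10, bb=40, hbp=5, k=200, ip=200, constant=C)
print(f"constant = {C:.2f}, FIP = {pitcher_fip:.3f}")
```

One practical catch: box scores write 200 1/3 innings as "200.1", which has to be converted to thirds before dividing.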
The fascinating part is that once you do that, and you look at a pitcher's career, his FIP number and his ERA are extremely close. You can go to FanGraphs, and look at Pedro (Martinez), or (Greg) Maddux or RJ (Randy Johnson) or (Roger) Clemens, and you'll be shocked how close their career ERA is to their career FIP.
Remember, we completely ignored every single ball in play. We ignored how a pitcher pitches with men on base or the bases empty. We simply zeroed in on one aspect of his performance: how he does in the plate appearances that don't involve his fielders, which make up some 20-35 percent of all his plate appearances. And we end up explaining a great deal of his results.
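The 20-35 percent figure is simply the share of plate appearances ending in a walk, hit batter, strikeout, or home run. A quick Python check using the 7 percent walk and 23 percent strikeout rates mentioned for Felix; the 800 plate appearances, 6 hit batters, and 16 home runs are assumptions for illustration.

```python
def fielder_independent_share(pa: int, bb: int, hbp: int, k: int, hr: int) -> float:
    """Share of plate appearances that end without a fielder making a play."""
    return (bb + hbp + k + hr) / pa


pa = 800
bb = round(0.07 * pa)  # 7 percent walk rate -> 56
k = round(0.23 * pa)   # 23 percent strikeout rate -> 184
hbp, hr = 6, 16        # assumed for illustration
print(f"{fielder_independent_share(pa, bb, hbp, k, hr):.1%}")
```

Under these assumptions roughly a third of this pitcher's plate appearances never involve his fielders, near the top of Tango's range.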
The biggest exceptions we came across over long careers are Tom Glavine and Javier Vazquez. Glavine ended up with results better than FIP suggested, while Javy ended up with worse overall results.
wOBA does the same thing in terms of trying to combine OBP and SLG into a single number, without the clumsiness and bad math of OPS.
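For readers who want to see what wOBA computes: each way of reaching base gets a weight roughly proportional to its run value, and the total is divided by a single denominator, rather than adding two ratios with different denominators the way OPS does. The weights below are approximate values of the kind FanGraphs publishes; they are recalculated every season, so treat them, and the stat line, as illustrative assumptions.

```python
# Approximate linear weights (assumed, roughly the 2013 scale; real values vary by season).
WEIGHTS = {"ubb": 0.69, "hbp": 0.72, "1b": 0.89, "2b": 1.27, "3b": 1.62, "hr": 2.10}


def woba(ab: int, bb: int, ibb: int, hbp: int, sf: int,
         singles: int, doubles: int, triples: int, hr: int,
         weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted on-base average with one denominator.

    Intentional walks are excluded from both numerator and denominator.
    """
    numerator = (
        weights["ubb"] * (bb - ibb)
        + weights["hbp"] * hbp
        + weights["1b"] * singles
        + weights["2b"] * doubles
        + weights["3b"] * triples
        + weights["hr"] * hr
    )
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator


# Made-up season line.
line = dict(ab=500, bb=60, ibb=5, hbp=5, sf=5, singles=100, doubles=30, triples=3, hr=25)
print(f"wOBA = {woba(**line):.3f}")
```

Because every event shares one denominator, a home run's extra value over a walk is set directly by the weights instead of falling out of how OBP and SLG happen to be added together.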
How excited should Cubs fans be about the organization now? I think fans are more optimistic than most would be coming off a 101-loss season, because of Theo's reputation.
TT: Every team always tries to make the best moves possible, given the constraints at hand. The issue is always how to best value each piece, each move, for both the short- and long-term. And Cubs fans should feel confident that the front office has and will have the best people in place to make those decisions.
A Wrigley myth type question: Do you buy that day games affect performance one way or another?
TT: People are human, and that means that every variable you introduce has some effect, to one degree or another. The answer is never yes or no, but always "to what degree" the effect exists, and whether a certain player or group of players is more or less affected. And that's really what my job is all about.