From diamond to hardwood: lessons in data
March 9, 2012
By Tom Haberstroh
Mike Ehrmann/Getty Images
Dirk throwing out a World Series first pitch isn't the only crossover between baseball and basketball.
What’s happening now in the NBA happened to the MLB about five years ago.
There’s no denying that the basketball world sits on the brink of a data explosion. A good chunk of the basketball community -- NBA execs, writers, students, casual observers -- who attended this past weekend’s MIT Sloan Sports Analytics Conference in Boston left with a feeling of fiery anticipation of what’s on the horizon.
Thanks to an invasion of super-fancy technology and tracking devices like STATS LLC’s SportVU, the basketball community is on the verge of something big and scary and wonderful. SportVU is a camera system being installed in NBA arenas that will track every movement on the basketball court. The ball, the 10 players, the referees, flying bats, everything. All of it will be digitally tracked and the results will be spit out in a report that features about a million data points per game.
These are exciting times in the sport and there’s no question that it can feel daunting as well. But we’re not alone in this journey. Because about a half-decade ago, a tidal wave of data plowed through another sport, baseball, and it’s never been the same since.
There’s a SportVU already in place for baseball and it’s called Pitch f/x. Back in 2007, technology similar to what is now crashing the hardwood hit the baseball diamond, and it has altered the way analysts, teams, writers and fans digest the so-called national pastime. Since then, other products from the Pitch f/x company, Sportvision, have arrived on the scene. Pitch f/x tracks pitches, Hit f/x tracks batted balls, and Field f/x tracks player movement on the field.
Stat geeks -- and I say that with the utmost respect for fellow numerically-slanted brethren -- pounced on the data and tirelessly crunched the numbers so we could make fun charts that you see in the mainstream today. Or instead of looking at batting average rankings, we can now glance at FanGraphs to see which starting pitcher’s slider has the most horizontal movement. And that’s just what’s out in public.
But the data wasn’t useful for just the nerds. Nowadays, in the palm of your hand, you can follow a Dodgers-Giants game and learn just how fast Tim Lincecum threw his blazing 0-2 pitch to Matt Kemp, precisely how many inches it broke before it reached the plate and where it hit the catcher’s mitt. Not only that, you can watch an animated trajectory of the pitch within a few seconds of its release from Lincecum’s hand. All on your handy smartphone.
We don’t have anything like that in the NBA, but if you’re looking into the future of basketball, take a glance at what’s going on in MLB. Depending on who you ask, it appears that the baseball world, at least in sheer volume of data and what they’re doing with it, is about 5-10 years ahead of the basketball world. We’re catching up though, thanks to SportVU, Synergy Sports Technology and other tracking services.
So what can we learn from the baseball world?
1. Patience is a virtue
After talking to baseball folks in and around the game, it’s imperative that we preach patience. The revolution will not happen overnight. Having the data and being able to do something meaningful with it are two very different things. And it takes time.
Consider this. According to SportVU, each game produces about 800,000 data points. There are 1,230 NBA games played in a full 82-game season (remember those?). Use the trusty multiplication function on your calculator and you’ll discover that we’re talking 984,000,000 data points in a regular season. Throw in the playoffs and we’re past the billion mark. And you thought the box score had a lot of numbers.
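For anyone who would rather not dig out the calculator, here is a quick sketch of that arithmetic in Python. The 800,000-per-game figure and the 1,230-game schedule come from the numbers above; the playoff game count is purely an assumption for illustration, since the real total changes every year with series lengths.

```python
# Back-of-the-envelope count of tracking data points in one NBA season.
POINTS_PER_GAME = 800_000       # per-game figure quoted above
REGULAR_SEASON_GAMES = 1_230    # 30 teams * 82 games / 2 teams per game
PLAYOFF_GAMES = 85              # ASSUMPTION: illustrative only, varies by year


def season_data_points(games: int, points_per_game: int = POINTS_PER_GAME) -> int:
    """Total data points for a given number of games."""
    return games * points_per_game


regular = season_data_points(REGULAR_SEASON_GAMES)
with_playoffs = season_data_points(REGULAR_SEASON_GAMES + PLAYOFF_GAMES)

print(f"Regular season: {regular:,}")
print(f"With playoffs:  {with_playoffs:,}")
```

Under that assumed playoff count, the total comes to roughly a billion data points, about 7 percent more than the regular season alone.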
The lesson is that there will be times early on where ambitious writers can find trends on the spreadsheet surface and do something about them. Take for instance FanGraphs and ESPN Insider writer Dave Cameron, who wrote to the Mariners pitching coach and asked him to let Felix Hernandez know that he throws too many fastballs early on, something he discovered playing around with the data. And it worked. Cameron, through Pitch f/x data, actually altered Hernandez’ pitch selection. Something similar could happen with, say, Kevin Durant and his shot selection, but it’s going to take months and possibly years before we get to that place.
2. Computer geeks are the new market inefficiency
There’s a reason why Mike Zarren, the assistant general manager of the Boston Celtics, actually announced to the audience during the Basketball Analytics panel at Sloan that he was looking to hire someone who can build and manage a database from scratch. This really happened. As expected, a stampede of super-smart computer programmers and SQL experts rushed over to Zarren after the panel. Zarren survived. I think.
These quants are in demand. More of this will happen in the NBA and that wave has already happened in baseball (just look at the alumni list of Baseball Prospectus and Hardball Times stats guys – several are with teams now). When Pitch f/x fell into their laps, MLB clubs scooped up computer geeks faster than you can say, “Troy Tulowitzki.”
Because when you look at it, there’s a five-step process that NBA teams will adopt in the coming years: Acquire the data, harness the data, analyze the data, translate the data, apply the data. Those last two steps might be the trickiest but the first three tasks will be the jobs of computer geeks. Sure, we could come up with tons of fun but mostly trivial superlatives (Who throws the fastest fastball? Which center jumps the highest for rebounds?) just by sorting a column in the spreadsheet. But the more important stuff comes when you have geophysicists trying to build a model that can detect how Jamie Moyer’s arm angle changes for off-speed pitches (the Rays actually did this very thing prepping for the World Series).
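To show just how little work the "sort a column" kind of superlative takes, here is a minimal Python sketch. Everything in it is hypothetical: the record layout, the field names and the velocities are made up for illustration and do not reflect the actual Pitch f/x or SportVU formats.

```python
from dataclasses import dataclass


@dataclass
class Pitch:
    pitcher: str       # hypothetical placeholder names, not real players
    pitch_type: str
    speed_mph: float


# Made-up sample rows standing in for one column-heavy spreadsheet.
pitches = [
    Pitch("Pitcher A", "fastball", 94.1),
    Pitch("Pitcher B", "fastball", 98.7),
    Pitch("Pitcher A", "slider", 85.3),
    Pitch("Pitcher C", "fastball", 96.2),
]


def rank_fastballs(rows: list[Pitch]) -> list[Pitch]:
    """Keep only fastballs and sort them from fastest to slowest."""
    fastballs = [p for p in rows if p.pitch_type == "fastball"]
    return sorted(fastballs, key=lambda p: p.speed_mph, reverse=True)


ranked = rank_fastballs(pitches)
print(ranked[0].pitcher, ranked[0].speed_mph)
```

The trivial part is the single sort. The arm-angle model described above is a different animal, because it needs the data to be acquired, cleaned and structured long before anyone can ask the question.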
3. The myth of scouts vs. stats
With this data in hand, soon we’ll begin to answer questions like: Who’s the best shooter when given a foot of space to fire off his shot? Who tallies the most hockey assists in the game? Who is the most frequent dribbler across the league? Who’s the slowest baseline-to-baseline player in the game?
We could dabble in those questions from now until the end of time, but really, what can you do with that information? With data analysts, we can answer the “what” part of the question, but oftentimes, the “why” part is the one that matters. Sure, it could be helpful to know who scores the most when entering the paint, but diagramming and preparing for that is what will end up changing the NBA landscape.
And in order to apply the kernels of data, there needs to be a conversation with the scouts and the coaching staff. The rise of Pitch f/x and data analysts didn’t make scouts extinct; it brought them closer together. If a computer geek discovers that Jonathan Papelbon’s curveball generates more swings and misses on the outside part of the plate, good luck trying to tell him how to pitch. That’s where the scouts and managers (or coaches) come in. If they don’t listen or embrace the data, then how will the team ever get any use out of it?
Quants won’t replace advance scouts in the NBA, just like they didn’t in MLB. It’s all about the quest for information. That’s why at the Baseball Analytics panel at Sloan, the father of sabermetrics Bill James was probably just as eager to hear what former baseball player Rocco Baldelli had to say as Baldelli was to hear James speak. It's a two-way street. Every team seeks information in all shapes and sizes because every team craves that next competitive edge.
The NBA will look a lot different in 2017 when SportVU and other technologies take their place in the game. But as we’re learning in baseball, there will always be a seat at the table for both scouts and the stats. We’ll never have perfect knowledge of the sport, but with a wave of data in our sights, we’re probably moving in the right direction toward that unreachable ideal.