Baseball Graph Details

Questions

Why baseball graphs?
So what's the theory behind your graphs?
Exactly how do you construct your graphs?
Then why Win Shares?
How are Win Shares calculated?
Is it legitimate to calculate Win Shares inseason?
Where can I find other baseball graphs?
Who created your cool artwork?
What's your privacy policy?

Answers

Why baseball graphs?

Baseball fans have learned a lot over the past twenty years. Bill James, and many others, have helped many of us better understand the essential dynamics of this game we love. Their insights into how runs are scored, how teams win and lose, etc. etc. have enhanced our enjoyment of the game tremendously.

Unfortunately, the notion of "sabermetrics" is still held in contempt by many baseball fans. Even though many sabermetric insights are simple common sense, most fans (and even sportswriters) choose to ignore them.

I believe one of the problems is the way baseball information is presented. Baseball analysts like to research things in minute detail, presenting their results in nuanced numeric tables. The problem is that most of these tables are unintelligible to the average fan. And very few people have tried to bridge the gap between the things we've learned about baseball and the way we present baseball statistics.

So that's what I'm trying to do with this site. Over the years, I've learned that information is best processed through pictures, or in the case of numbers, graphs. In the business world, accountants and actuaries use numeric tables, but real business decision makers use graphs and conceptual charts. So I've tried to convey basic baseball information via a few graphs, focusing on team statistics.

The expert in this field is Edward Tufte, and I'd recommend his books to anyone who would like to pursue the same path.

So what's the theory behind your graphs?

Glad you asked. These graphs are organized around three fundamental truths that have emerged from baseball analysis.

First, the number of games that a team wins is generally attributable to the difference between that team's runs scored and runs allowed, also known as the Run Differential. In other words, teams win games by outscoring the other team. Over a full season, the teams with the most wins are those that have achieved the greatest total differential between runs scored and runs allowed. Sounds simple, right?

Unfortunately, many fans overlook this simple fact. Certain myths, such as "pitching is 95% of the game," stubbornly persist. I believe that one of the reasons they persist is that fans can't easily see the difference between runs scored and runs allowed for their favorite team. So my graphs are designed around the simple concept of Run Differential.

Second, the ability to score runs comes down to two things: getting on base and moving around the bases. These are represented by two well-known statistics: On Base Percentage (OBP) and Slugging Percentage (SLG). This is the reason you hear some baseball analysts cite OPS (OBP plus SLG) as a batting metric.

This truth may be a bit less apparent to you than Run Differential. Many fans focus on the "triple crown" of batting: Batting Average, home runs, and RBIs. These three stats have been ingrained into most baseball fans' minds as the most important batting stats of all.

However, batting average is not nearly as powerful a statistic as it appears. Many times in baseball history, the team with the most runs scored has not been the team with the best batting average. Also, RBIs are situational in nature. Good hitters tend to have lots of RBIs, yes, but only if they come to bat with lots of runners on base. RBIs tend to be a function of the batters in front of a hitter as much as the hitter himself.

As it turns out, OBP and SLG are two elegant offensive statistics. If you take the statistical totals of any league in baseball history and multiply its OBP by its total bases (the key component of SLG), you will get a number that is almost always within 1% of total league runs scored! When you apply this math to individual teams, you usually get a number within 5% of team runs scored.
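
Here is a rough sketch of that arithmetic in Python. The function names and the sample totals are invented purely for illustration; they are not real league data.

```python
def estimate_runs(obp: float, total_bases: float) -> float:
    """Estimate runs scored as On Base Percentage times total bases."""
    return obp * total_bases


def percent_error(estimate: float, actual: float) -> float:
    """Return how far the estimate is from the actual figure, in percent."""
    return abs(estimate - actual) / actual * 100


# Hypothetical league totals, made up only to show the calculation.
league_obp = 0.330
league_total_bases = 30_000
league_runs = 9_950

estimated = estimate_runs(league_obp, league_total_bases)
print(f"Estimated runs: {estimated:.0f}")
print(f"Off by {percent_error(estimated, league_runs):.2f}%")
```

Plug in real league or team totals to see how close the estimate comes to actual runs scored.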

This is an astounding mathematical concept. The person who discovered this basic truth must have felt like Archimedes, running down the hall naked and shouting "Eureka." The essence of offense comes down to two simple averages. So I have drawn graphs and accompanying tables that highlight them.

Third, allowing runs to score is a function of fielding and pitching, right? Well, one of the newer insights of sabermetrics is that pitchers may not have a lot of impact on balls hit in the park. That is, once a ball is hit fairly by a batter (and stays in the park) the likelihood of a single vs. an out may not depend a great deal on who threw the pitch.

There is a lot of research occurring in this field, and firm conclusions are elusive. Voros McCracken's article is the one that started the brouhaha. Another good article is Tom Tippett's research.

Still, I've built graphs around two metrics that divide responsibility for runs allowed between pitching and fielding. You can read more about the precise metrics below.

Bottom line, these graphs are designed to present a structured way for readers to see and understand each team's run differential and its contributing causes.

Exactly how do you construct your graphs?

Now we're really getting into it.

Run Differential graphs are pretty simple, displaying Runs Scored and Runs Allowed by team. Scoring runs is just as important as stopping runs from scoring, so the scale of the two axes is the same. The only wrinkle is that I have adjusted the figures by the average of the last three years' Park Factors at Baseball-Reference.com, to iron out differences between ballparks.

I've also added a number after each team name on the graph that is the difference between actual wins and projected wins, based on the Pythagorean Theorem, which is a formula that very accurately predicts wins based on Run Differential. Variances against the Pythagorean Theorem are often a function of "luck" and tend not to persist over time.
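
The formula itself isn't spelled out above, so here is a minimal sketch that assumes the classic Bill James version, with an exponent of 2 (other exponents are sometimes used). The function names and the sample team are hypothetical.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Projected winning percentage from runs scored and runs allowed.

    Assumes the classic form RS^x / (RS^x + RA^x) with x = 2.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def wins_vs_projection(wins: int, games: int,
                       runs_scored: float, runs_allowed: float) -> float:
    """Actual wins minus projected wins (the number shown after each team name)."""
    projected_wins = pythagorean_win_pct(runs_scored, runs_allowed) * games
    return wins - projected_wins


# Hypothetical team, invented for illustration only.
diff = wins_vs_projection(wins=95, games=162, runs_scored=800, runs_allowed=700)
print(f"Wins above projection: {diff:+.1f}")
```

A positive number means the team has won more games than its run differential would suggest; a negative number means fewer.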

OBP and SLG are self-explanatory, I hope.

Although attributing runs allowed to pitching and fielding is certainly not a straightforward task, I've chosen to use two metrics that seem best suited for the task. The results of an at bat can be separated into two pots: those in which only the pitcher and batter play a role (strikeouts, walks and home runs, broadly speaking) and those in which fielding also plays a role (Balls in Play, or BIP). To calculate the first pot of events, I used Tangotiger's Fielding Independent Pitching (FIP), which calculates the relative run impact of each event.

The calculation for FIP is simple: (13*HR+3*BB-2*K)/IP. The number you get from this calculation is the proportion of ERA that can be directly attributed to a pitcher. If you add 3.20 to FIP, this number answers the question: if the pitcher did not have the benefit of his fielders, how would he perform compared to an average team defense, including fielders?
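
As a quick sketch, the calculation looks like this in Python. The function names and the sample pitching line are made up for illustration; the 3.20 constant is the one quoted above.

```python
def fip_core(home_runs: int, walks: int, strikeouts: int,
             innings_pitched: float) -> float:
    """The basic FIP calculation: (13*HR + 3*BB - 2*K) / IP.

    innings_pitched should be a true decimal (200 1/3 innings is 200.333,
    not the box-score shorthand 200.1).
    """
    return (13 * home_runs + 3 * walks - 2 * strikeouts) / innings_pitched


def fip_on_era_scale(home_runs: int, walks: int, strikeouts: int,
                     innings_pitched: float, constant: float = 3.20) -> float:
    """FIP plus the 3.20 constant, so it reads like an ERA."""
    return fip_core(home_runs, walks, strikeouts, innings_pitched) + constant


# Hypothetical pitching line, invented for illustration only.
print(f"{fip_on_era_scale(home_runs=20, walks=50, strikeouts=180, innings_pitched=200):.2f}")
```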

For the second pot of events, I'm experimenting with several different metrics. On some graphs, I use Defensive Efficiency Ratio (DER) as a proxy. DER, created by Bill James, is essentially the proportion of Balls in Play that are subsequently turned into outs by the fielders. DER is a function of fielders, pitchers, park and probably a few other things, but it's a decent indicator of fielding prowess. You can view the complete, up-to-date DER calculations by team at The Hardball Times.
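
In its simplest form, the idea behind DER can be sketched as below. This is a simplified illustration, not the exact published formula: it treats every ball in play that isn't a hit as an out, which ignores errors and other details. The function name and the sample totals are hypothetical.

```python
def defensive_efficiency(balls_in_play: int, hits_on_balls_in_play: int) -> float:
    """Share of balls in play that the fielders turned into outs.

    Simplifying assumption: any ball in play that was not a hit counts as an out.
    """
    outs_on_balls_in_play = balls_in_play - hits_on_balls_in_play
    return outs_on_balls_in_play / balls_in_play


# Hypothetical team totals, invented for illustration only.
print(f"DER: {defensive_efficiency(balls_in_play=4400, hits_on_balls_in_play=1300):.3f}")
```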

Then why Win Shares?

Win Shares are the creation of Bill James, as articulated in his book of the same name. The basic idea of Win Shares is to credit individual players with the number of wins they contributed to the team, based on virtually everything they did while on the field: batting, pitching and fielding, even a little baserunning. Win Shares are the perfect complement to Baseball Graphs, because they calculate each of the sabermetric "truths" described above and attribute them to individual players on each team in one simple-to-understand number.

Each team's total wins is multiplied by three, and then distributed to individual players, based on their batting, fielding and pitching. There's no magic to the 3x multiplier, by the way. It's just done to create enough meaningful variance between players.

It took Bill James about 100 pages to describe the entire methodology. And while there are certainly some flaws that will be corrected in the future with better data (such as play-by-play data), or new methodologies, it's a pretty intriguing system. The most thorough critique I have found of the Win Shares methodology was that conducted by Tangotiger and Rob Wood. Warning: this link is a 40-page, very theoretical PDF document.

How are Win Shares calculated?

Okay, here's how Bill James calculates Win Shares. Ready?

First, you divide responsibility for a team's wins between the offense (batting and baserunning) and defense (pitching and fielding). You do this by calculating the team run differential through a method James calls Marginal Runs. You first calculate the average number of runs scored per team in the league. You next adjust your team's runs scored and runs allowed for the ballpark in which they played half their games (i.e. home games). Then you add together two figures: all runs scored over 52% of the league average (credited to the offense), and all runs allowed less than 152% of the league average (credited to the defense). This is total marginal runs.
Next, you take the percent of marginal runs contributed by the offense and multiply it by the number of wins times three. This is the total number of offensive Win Shares. You do the same thing for defensive Win Shares.
Next, you attribute offensive Win Shares to individual players. This is done through two key metrics: Runs Created and Outs Made. Runs Created is a formula built by James and refined over the years. It starts with the basic equation of OBP times total bases and then adds player credit for other factors, including stolen bases, caught stealing, grounding into double plays, batting average and home runs with runners in scoring position and the kitchen sink. Runs Created is calculated for every single batter, including pitchers (if they're in the National League).
Next, you subtract the league "background" Runs Created (52% of the league average) from each player's Runs Created based on the number of Outs Made by that batter, adjust it for ballpark, and credit each player with the result; essentially individual marginal runs created. Add these up for all players and use each player's percentage of the whole to allocate offensive Win Shares to each. Note that any player whose Runs Created are less than 52% of the league average runs created per out is credited with no Win Shares. This doesn't happen very often (except for pitchers).
That was the easy part. Now you've got to deal with the defense. The first step is to divide defensive Win Shares between pitching and fielding. This is done through a complicated formula that accounts for FIP elements that can be attributed only to pitchers (home runs, walks and strikeouts) as well as a team's DER (Defensive Efficiency Ratio, adjusted for the ballpark) and other fielding statistics such as passed balls, errors and double plays. Typically, about 70% of defensive Win Shares are credited to pitching, and 30% to fielding. The Win Shares system is bounded so that pitching is never credited with less than 60%, or more than 75%, of defensive Win Shares.
Next, you allocate pitching Win Shares to individual pitchers. This is accomplished through an even more complicated formula that starts with each pitcher's marginal runs not allowed (same approach as team marginal runs not allowed), wins, losses and saves. Special consideration is given to relievers by estimating the number of high-leverage innings they pitched (ninth innings with one-run leads are more important than first innings with no score) and something called "Component ERA" which is essentially ERA re-calculated according to the actual underlying run elements.
Finally, pitchers are deducted Win Shares if they are absolutely lousy hitters. Call this the "Dean Chance" factor. All these elements are then mixed together in a complicated formula to allocate pitching Win Shares to individual pitchers. As in offensive Win Shares, any pitcher who gives up more than 152% of league-average Runs Scored (adjusted for ballpark) does not receive any credit for pitching Win Shares.
One note: responsibility for unearned runs is split 50/50 between pitching and fielding.
Which leads us to the next, most complicated step: allocating fielding Win Shares to fielding positions, and then to individual fielders. The calculations differ for each position. Essentially, James has selected four defensive statistics to evaluate positions. Here they are by position, listed in order of importance:
- Catchers: Caught Stealing, Errors, Passed Balls and Sacrifice Hits Allowed
- First Basemen: Plays Made, Errors, Arm Rating and Errors by third basemen and shortstops
- Second Basemen: Double Plays, Assists, Errors and Putouts
- Shortstops: Assists, Double Plays, Errors and Putouts
- Third Basemen: Assists, Errors, Sacrifice Hits Allowed and Double Plays
- Outfielders: Putouts, Team DER, Arm Elements and Assists and Errors
Lots of things to note about the fielding calculations.
- First, the statistics are adjusted based on the number of innings a lefthander pitches for the team, which has an impact on which side of the field batters hit the ball to.
- Second, these stats are calculated as a proportion of the team's total, divided by the league-average proportions of the total. In other words, if a shortstop has 50 assists and his team has 100 assists in total, he receives just as much credit as the shortstop who has 100 assists and plays on a team with 200 assists in total. This is important, because it adjusts the fielding stats for the fact that fielders may be playing behind pitchers with certain tendencies such as giving up more ground balls vs. fly balls.
- Third, double plays are only factored in as a proportion of potential double plays. If teams don't have a lot of runners on first, they have less of a chance to turn double plays, and Win Shares takes this into account.
- Fourth, team DER is used to credit outfielders with fielding Win Shares because it is James' observation that outfielders have a much larger impact on DER than infielders. James acknowledges that there is some "circular logic" here.
- Fifth, there is a final element included in the formula to allocate fielding Win Shares to individual fielders. This element is called "Range Bonus Play." It particularly impacts outfielders in the following manner: if one outfielder handles more opportunities per inning played than the other outfielders on the team, he will be credited with more fielding Win Shares. This especially impacts centerfielders, who typically handle more chances per inning played than the corner outfielders.
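
To make the team-level steps above a little more concrete, here is a minimal sketch of the marginal runs split, the 60%-75% bound on pitching's share, and the "proportion of team total" idea used for fielding stats. It is only an illustration of the steps as described here, not the full Win Shares system: it assumes the run totals have already been park-adjusted, it does not handle teams with negative marginal runs, and all the names and sample numbers are hypothetical.

```python
from dataclasses import dataclass

OFFENSE_BASELINE = 0.52    # offense is credited with runs scored above 52% of league average
DEFENSE_BASELINE = 1.52    # defense is credited with runs allowed below 152% of league average
WIN_SHARES_PER_WIN = 3


@dataclass
class TeamWinShares:
    offense: float
    defense: float


def split_team_win_shares(wins: int, runs_scored: float, runs_allowed: float,
                          league_avg_runs: float) -> TeamWinShares:
    """Divide a team's Win Shares between offense and defense via marginal runs.

    runs_scored and runs_allowed are assumed to be park-adjusted already.
    """
    marginal_scored = runs_scored - OFFENSE_BASELINE * league_avg_runs
    marginal_saved = DEFENSE_BASELINE * league_avg_runs - runs_allowed
    offense_share = marginal_scored / (marginal_scored + marginal_saved)
    total_win_shares = wins * WIN_SHARES_PER_WIN
    return TeamWinShares(offense=offense_share * total_win_shares,
                         defense=(1 - offense_share) * total_win_shares)


def clamp_pitching_share(raw_share: float) -> float:
    """Keep pitching's share of defensive Win Shares between 60% and 75%."""
    return min(max(raw_share, 0.60), 0.75)


def share_of_team_total(player_total: float, team_total: float) -> float:
    """A fielder's stat expressed as a proportion of his team's total."""
    return player_total / team_total


# Hypothetical team, invented for illustration only.
team = split_team_win_shares(wins=90, runs_scored=800, runs_allowed=650,
                             league_avg_runs=700)
print(f"Offense: {team.offense:.1f}, Defense: {team.defense:.1f}")

# The shortstop example above: 50 of 100 assists earns the same credit as 100 of 200.
print(share_of_team_total(50, 100) == share_of_team_total(100, 200))
```

The individual player allocations (Runs Created, pitcher formulas, position-by-position fielding scales) are where most of the 100 pages go, and they are not attempted here.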

Now, what was your question?

Well, you may have a question about how we compute Win Shares today. In fact, we have made a few changes to the basic James formula, which are outlined in this article at the Hardball Times.

Interleague play makes it unlikely that a league's runs scored will equal its runs allowed (ditto hits, walks, etc.). In cases where Win Shares is unclear about using a league's runs scored or allowed, we have generally used the allowed figure.

Win Shares explicitly states that the fielding points on certain scales for certain positions are bounded. For example, catchers' points on the 50-point scale are bounded between 0 and 50. However, as far as I can tell, other points are not bounded (example: catchers' points on the 30-point scale). This isn't really much of a problem, except in the early parts of the season.

Is it legitimate to calculate Win Shares inseason?

There are a number of minor calculations in Win Shares that are based on an entire season. We've adjusted those calculations to handle inseason stats.

Some people seem to object to Win Shares being calculated in midseason. As a response, let me offer you this quote:

"Despite the 'Book of Values' given with this volume, you may wish to figure Win Shares for some other team, such as next year's San Diego Padres, as of the All-Star break, or your son's little-league team."

That's Bill James himself, on page 14 of Win Shares, giving implicit permission for inseason calculations.

Where can I find other baseball graphs?

There are several other resources for baseball graphs on the web. At the Hardball Times, I've got graphs that are updated daily for the current season. This site has a wonderful set of historical baseball graphs, including trend rates over time.

Another I've found was created by a math professor in Indiana, and it includes some neat historical graphs (such as total wins over the century by the original eight teams in each league).

If you're a Rangers fan, you also might enjoy these graphs.

Who created your cool artwork?

The banner and general look of this site were created by the talented Kasia. So far, Kasia's only exposure to baseball has been one viewing of Field of Dreams, and I haven't had the heart to tell her that catchers aren't lefthanded.

Some of the graphics are courtesy of Baseball History Info. I have also added colorized pictures of great ballplayers, courtesy of Portrait Matt.

What's your privacy policy?

I promise I will not divulge your identity to anyone or any other entity, even if they offer to pay me for it. Actually, I collect no information from you through this site but, if I did, I would keep my promise. The same goes if you e-mail me.

Please note that I have used only publicly available data to compute the graphs and Win Shares. I have not violated any copyrights or trademarks. Please don't sue me.

That's it for now. I am constantly looking for ways to improve the usability of this site, so please send any comments or suggestions my way at dave@baseballgraphs.com.