Monte Carlo Simulations

Brief Intro

Oct 14, 2024

In the last few posts, I’ve talked about using Runs Created and Linear Weights to evaluate a player’s offensive contribution to their team. Today, we’re going to talk about how to use a Monte Carlo simulator, which opens the door to a more nuanced, data-driven understanding of how hitters contribute to team success than Runs Created or Linear Weights can offer, especially when individual performances deviate from the norm.

Both Runs Created and Linear Weights are designed to estimate a hitter’s contribution based on a statistical relationship between a team’s overall performance (walks, singles, doubles, etc.) and the runs they score over a season. However, these metrics often break down when applied to individual players whose event frequencies are far from the typical averages.

Consider the example from the book Mathletics, which uses a hypothetical player named “Joe Hardy.” Hardy hits a home run in 50% of his plate appearances and makes an out the other 50% of the time. This extreme performance pattern significantly challenges the accuracy of traditional metrics. For Joe, Runs Created predicts 54 runs per game, while Linear Weights predicts 36.77 runs per game. Yet the value derived from simulations is around 27 runs per game. This stark difference highlights the inadequacy of the traditional methods for players with such unique profiles.
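To see where a figure like 54 comes from, here is a quick sketch of the arithmetic. It assumes the basic version of the Runs Created formula, RC = (H + BB) × TB / (AB + BB), and a 27-out game; the post does not say which variant of the formula was used, and the function name is just illustrative.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)

# Joe Hardy homers in half of his at-bats, so a 27-out game
# takes 54 at-bats: 27 home runs and 27 outs.
outs_per_game = 27
at_bats = 2 * outs_per_game
home_runs = at_bats // 2
total_bases = 4 * home_runs  # every hit is a home run

print(runs_created(home_runs, 0, total_bases, at_bats))  # 54.0
```

Under these assumptions the formula lands on 54 runs per game, double what the simulations suggest.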

What is a Monte Carlo Simulation?

A Monte Carlo Simulation is a mathematical/computational technique that uses repeated random sampling to obtain numerical results and make predictions about complex systems or processes. Its purpose is to model the probability of different outcomes in situations where there’s uncertainty or randomness involved.

How do they work?

A Monte Carlo simulation works by modeling the probability of different outcomes in a process or system that cannot easily be predicted because random variables are involved. It relies on something called “random sampling”: generating many possible outcomes at random and calculating the average result. Take, for example, the probability of rolling a particular total, say a seven, with two standard dice. If you wanted to estimate this probability the brute force way, you would have to roll the dice a whole bunch of times yourself, say 36,000 times: each die has six sides, so two dice give 36 combinations, and we want roughly a thousand rolls per combination to get a good sample size.

However, with a Monte Carlo simulation, we can skip the physical rolls: a computer randomly samples from the 36 possible combinations of the two dice and calculates the percentage of times that we get, say, a seven.
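Here is a minimal Python sketch of that dice experiment. The function name and the seed are illustrative choices; the exact answer, 6 of the 36 combinations or about 16.7%, is printed next to the estimate for comparison.

```python
import random

def estimate_seven(num_rolls, seed=None):
    """Estimate P(two dice sum to 7) by simulating num_rolls rolls."""
    rng = random.Random(seed)
    sevens = 0
    for _ in range(num_rolls):
        # Each die is an independent draw from 1..6.
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            sevens += 1
    return sevens / num_rolls

estimate = estimate_seven(36_000, seed=1)
print(f"simulated: {estimate:.4f}  exact: {6 / 36:.4f}")
```

With tens of thousands of simulated rolls, the estimate should sit very close to the exact value, and it takes a fraction of a second instead of an afternoon of rolling.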

Who uses them?

There are a number of common applications for Monte Carlo simulations, and perhaps the most well-known is in the area of portfolio management. By running thousands or even millions of simulations, investors can get a better idea of how their portfolio might perform under different market conditions. There are also other common applications, such as risk analysis, option pricing, and planning for spare capacity. Beyond finance, Monte Carlo simulations are applied in all sorts of fields, including physics (particle interactions), engineering (project management), and climate science (weather predictions).

How do we run one?

Monte Carlo techniques involve three basic steps. First, you set up the predictive model, which identifies both the dependent variable to be predicted and the independent variables (also known as the input, risk, or predictor variables) that will drive the predictions. Second, you specify the probability distributions of the independent variables. Here, you can use historical data or an analyst’s subjective judgment to define a range of likely values and assign probability weights for each. Third, you run simulations repeatedly, generating random values of the independent variables.

We then do this until enough results are gathered to make up a representative sample of the near-infinite number of possible combinations. You can run as many Monte Carlo simulations as you wish by modifying the underlying parameters you use to simulate the data. However, you’ll also want to compute the range of variation within a sample by calculating the variance and the standard deviation, which are commonly used measures of spread. The more you sample, the more precisely you can pin down that range, and the better your estimate.
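As a rough illustration of those measures of spread, the sketch below simulates the sum of two dice at a few sample sizes and reports the mean, variance, standard deviation, and standard error of the mean (standard deviation divided by the square root of the sample size). The choice of dice as the thing being simulated, the helper names, and the seed are all assumptions made for the example.

```python
import random
import statistics

def simulate_dice_sums(num_trials, seed=None):
    """Return a list of simulated two-dice totals."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(num_trials)]

def summarize(samples):
    """Return (mean, variance, standard deviation, standard error of the mean)."""
    mean = statistics.fmean(samples)
    variance = statistics.variance(samples)  # sample variance
    std_dev = variance ** 0.5
    std_error = std_dev / len(samples) ** 0.5
    return mean, variance, std_dev, std_error

for n in (100, 1_000, 10_000):
    mean, variance, std_dev, std_error = summarize(simulate_dice_sums(n, seed=42))
    print(f"n={n:>6}  mean={mean:.3f}  var={variance:.3f}  "
          f"sd={std_dev:.3f}  se={std_error:.4f}")
```

The standard deviation settles near a fixed value as the sample grows, while the standard error keeps shrinking, which is one way to see why more samples give a tighter estimate of the average.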

Going back to our Joe Hardy example, a Monte Carlo simulation can model an inning by assigning a random 50% chance of hitting a home run and a 50% chance of making an out. Each inning is played until three outs are recorded, and the total number of runs scored is tallied. By repeating this process thousands of times and averaging the results, the simulation produces a much more accurate estimate of Joe Hardy’s contribution: around three runs per inning, or 27 runs per game.
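The Joe Hardy inning is simple enough to sketch in a few lines of Python. This is a minimal version under the assumptions stated above (every plate appearance is either a home run or an out); the function names, the number of innings, and the seed are illustrative.

```python
import random

def simulate_inning(p_home_run, rng):
    """Play one inning: each plate appearance is a home run or an out."""
    outs = 0
    runs = 0
    while outs < 3:
        if rng.random() < p_home_run:
            runs += 1  # bases are always empty, so each homer scores one run
        else:
            outs += 1
    return runs

def average_runs_per_inning(p_home_run, num_innings, seed=None):
    """Average runs per inning over many simulated innings."""
    rng = random.Random(seed)
    total = sum(simulate_inning(p_home_run, rng) for _ in range(num_innings))
    return total / num_innings

per_inning = average_runs_per_inning(0.5, 100_000, seed=2024)
print(f"{per_inning:.2f} runs per inning, {9 * per_inning:.1f} runs per game")
```

For this toy case there is also an exact answer to check against: the expected number of home runs before the third out is 3 × p / (1 − p), which equals 3 when p = 0.5.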

The beauty of the Monte Carlo simulator lies in its ability to handle complex, uncertain scenarios with ease. Whether you’re modeling a single player’s performance or an entire team’s, the method remains robust across a wide variety of conditions, providing accurate predictions where traditional metrics might falter.

Using Excel

Monte Carlo simulations can be implemented in software like Microsoft Excel, which offers a built-in random number generation function - RAND( ). This function generates a number between 0 and 1 with equal probability. By associating a range of outcomes (like a hit or an out) with different random number intervals, you can simulate various events in a baseball game.

For example, if RAND( ) generates a number below 0.5, we might record a home run; if the number is 0.5 or above, we would record an out. This process is repeated until three outs are recorded, and the number of runs is tallied. Excel allows for the automation of these simulations, repeating them thousands of times to create a reliable average estimate of the player’s impact. In the Joe Hardy simulation, running the model 1,000 times yielded a consistent result of approximately 3 runs per inning.
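The same interval idea carries over to any tool with a uniform random number generator. Here is a minimal Python sketch, using random.random() as a stand-in for RAND( ); the pick_event helper and the event names are illustrative, not something from the book or from Excel.

```python
import random
from itertools import accumulate

def pick_event(u, events):
    """Map a uniform number u in [0, 1) to an event via cumulative intervals."""
    names = list(events)
    thresholds = list(accumulate(events.values()))  # upper edge of each interval
    for name, upper in zip(names, thresholds):
        if u < upper:
            return name
    return names[-1]  # guards against floating-point round-off in the last edge

joe_hardy = {"home_run": 0.5, "out": 0.5}
print(pick_event(0.25, joe_hardy))  # home_run
print(pick_event(0.75, joe_hardy))  # out
print(pick_event(random.random(), joe_hardy))
```

Because the intervals are built from cumulative probabilities, adding more outcomes (singles, walks, and so on) only means adding entries to the dictionary, as long as the probabilities sum to 1.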

Or, if you have the budget, there are some really cool Monte Carlo products out there that you can play with, such as Lumivero’s @RISK.

Team Application

Monte Carlo simulators are also incredibly useful for evaluating a team. Consider the case of a team composed entirely of nine Ichiro Suzukis from his incredible 2004 season. To simulate the team’s performance, we need to account for a broader range of events that can occur during each plate appearance. These events might include singles, doubles, errors, and ground ball double plays, each with its own probability based on historical data.

In a Monte Carlo simulation for a team, each plate appearance is modeled with probabilities for each possible outcome, such as hitting a single, grounding out, or hitting a home run. By assigning probabilities to all potential events - whether a hit by pitch, walk, or strikeout - the simulation can realistically play out each inning. The simulation tracks base runners, outs, and runs, producing a comprehensive estimate of the number of runs a team of nine Ichiros might score per game. The simulation for Ichiro 2004, for instance, produced an average of 6.92 runs per game.
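To make the bookkeeping concrete, here is a simplified Python sketch of a nine-identical-hitters simulation. Everything specific in it is an assumption for illustration: the probabilities are made up (they are not Ichiro’s 2004 numbers), errors and double plays are left out, every hit advances each runner by exactly the number of bases of the hit, and a walk moves only forced runners. It will not reproduce the 6.92 figure.

```python
import random

# Hypothetical per-plate-appearance probabilities (NOT Ichiro's real 2004 line).
EVENT_PROBS = {
    "out": 0.62, "walk": 0.06, "single": 0.25,
    "double": 0.04, "triple": 0.01, "home_run": 0.02,
}
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}

def advance(bases, event):
    """Return (new_bases, runs) for a walk or hit under simplified rules.

    bases is a tuple (first, second, third) of booleans.
    """
    first, second, third = bases
    if event == "walk":
        # Only forced runners move up.
        if first and second and third:
            return (True, True, True), 1
        if first and second:
            return (True, True, True), 0
        if first:
            return (True, True, third), 0
        return (True, second, third), 0
    n = HIT_BASES[event]
    # Batter starts at position 0; occupied bases are positions 1, 2, 3.
    positions = [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]
    moved = [p + n for p in positions]
    runs = sum(1 for p in moved if p >= 4)
    return tuple(b in moved for b in (1, 2, 3)), runs

def simulate_game(rng, innings=9):
    """Simulate one game and return the runs scored."""
    events = list(EVENT_PROBS)
    weights = list(EVENT_PROBS.values())
    total_runs = 0
    for _ in range(innings):
        outs, bases = 0, (False, False, False)
        while outs < 3:
            event = rng.choices(events, weights=weights)[0]
            if event == "out":
                outs += 1
            else:
                bases, runs = advance(bases, event)
                total_runs += runs
    return total_runs

def average_runs_per_game(num_games, seed=None):
    rng = random.Random(seed)
    return sum(simulate_game(rng) for _ in range(num_games)) / num_games

print(f"{average_runs_per_game(2_000, seed=7):.2f} runs per game")
```

Swapping in a real player’s event frequencies and more realistic runner advancement is where most of the modeling effort goes; the loop over plate appearances, outs, and innings stays the same.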

Outlier Examples: Bonds & Pujols

The value of Monte Carlo simulations becomes even more evident when applied to extreme cases like Barry Bonds' 2004 season, during which he was intentionally walked 120 times. A team of nine Bonds would not face intentional walks, leading to an inflated prediction from traditional metrics. Adjusting the simulation to remove intentional walks results in a more accurate estimate of Bonds' performance: 15.98 runs per game.

Similarly, we can use Monte Carlo simulations to estimate the contributions of an individual player, like Albert Pujols, to a real-world team. In 2006, the St. Louis Cardinals scored 781 runs, but a simulation without Pujols projects only 706 runs. This difference of 75 runs can be directly attributed to Pujols, translating to an estimated 8.12 additional wins for the Cardinals that season.
