Post 5 in the management series. In this post I will discuss the basics of financial modelling, and share rudimentary models for Monte Carlo simulations and sensitivity analyses. Finally, I will give a brief introduction to regression analysis.
Introduction to financial modelling
The purpose of a model is to help you make business decisions. Given the range of decisions managers are called to make and the different contexts in which these decisions are made, it is impossible to produce a model that can be adapted to any situation. However, there are some principles that I believe modellers should keep in mind across industries & domains:
- Only model when you have to: sounds obvious, but if a decision is a foregone conclusion, there is not much point in building a model to justify it. Yet outdated processes or useless bureaucracy often demand models be built when they are not needed. Push back — don’t waste time for the sake of ticking a box.
- Compare your model to the right base [more on this in a sec].
- Focus on the most important variables: typically, you will find that changes in one or two variables have a disproportionate effect on the model’s results. Focus on getting the assumptions for these variables right.
- Use common sense: many analysts get so much into the weeds of the models they are building that they forget to sense check the outputs. This is far more common than you might think — I’ve seen experienced managers take a glance at a model a more juniour analyst spent hours or days working on, only to dismiss it within seconds because it showed something like cannibalisation > 100%, market growth rates dropping from an average of 40% every year to -20% etc. Not that such results are not possible, but they are not likely; I’m the first to claim that our intuitions and expectations are very often wrong, and rigorous analysis is always welcome to challenge them, but if our intuitions are proven wrong, it’s worth checking why that was.
- Pay attention to design.
Points 2 and 5 can do with some elaboration:
Compare your model to the right base
Again: modelling is about making decisions. If you are trying to decide between action A and action B, then you ought to model the potential outcome if you choose A, the outcome if you choose B, and compare the two. The important thing to remember here is that doing nothing is a choice in itself.
Consider an example: suppose you work at Ford. Your consumer research team comes to you and says, ‘an increasing number of drivers care about their ecological footprint; we therefore want to consider launching a series of electric cars’. Of course, if you launch a new model, you almost definitely run the risk of cannibalising your own sales: in other words, there are some buyers who would have bought a car you already make, who will now buy the new car instead. This is fine if the two cars are equally profitable (remember this means same $/car, not same gross margin!), but if the new car is less profitable, then you might have a problem. Indeed, you run the financials and come up with this model (numbers in thousands):
Suppose that you’ve sold 100k cars over the past 12 months, making $5M in revenue, and $2M in profit. You compare these numbers to what would happen if you launched the electric car. According to this model, you can expect to sell 40k electric cars; however, 20k of those cannibalise the existing line-up; hence, you will only sell 20k cars more than you would if you did nothing. This means your revenue will go up, but your profit will go down, because the electric cars are more expensive to manufacture.
So, going by this model, you should not launch the electric car. But is the model right? No: it’s not necessarily reasonable to expect you will maintain your current financials if you do nothing; it is possible that if you do not launch an electric car, those 20k consumers who would buy it would buy a competitor’s vehicle instead. So, what you should do is adjust the ‘do nothing’ scenario to reflect that reality (if it is indeed the most probable result of doing nothing).
(That said, people who have incentives to push for a particular decision will always try to argue that the consequences of not taking that decision will be dire. In the example above, the head of the electric car division will certainly argue that if Ford does not launch an electric car, consumers will flock to competitors in the tens of thousands. Finance managers need to exercise judgement when estimating what will happen in the do nothing scenario.)
Pay attention to design
You want your model to be easy to use and update — not only by yourself, but by others (and your future self). I suggest adhering to the following principles:
- Colour-code dynamic-input cells (i.e. cells where the model’s user can make changes — e.g. expected volume uplift from a new product launch, or the new product’s price), hard-coded input cells (i.e. cells containing base data that are not subject to modelling — e.g. past 12 months’ data) and calculation cells.
- Cluster input cells as much as possible: you don’t want to have a bunch of input cells, followed by calculations, followed by more inputs.
- Break down the model into modules: for example, if you are evaluating the potential impact of an acquisition or restructuring, you may want to have separate tabs for the impacts on revenue, COGS, other costs etc.
- Incorporate checks — for example, ensure that the sum of the revenue for each division equals total revenue.
- Keep things simple: early in my career, I had a predilection for overly complex models —for example, ones that made use of VBA, or that relied on needlessly complex formulas. In retrospect, in the vast majority of cases, I could have reached the same conclusions using far simpler models (at the other extreme, one of my bosses tended to do his NPV calculations on napkins; find a good balance!).
- Ensure the model does what it’s meant to do: another mistake I often made early in my career was building models that did not answer the right question. For example, my first role at P&G was in global profit forecasting; one of my tasks was to consolidate inputs and produce the global profit forecast for a division, and built a ‘reconciliation’ showing the changes between the latest forecast and the previous one, to be shared with my finance director. Now, as anyone who has worked in a planning role knows, forecasts are messy: an in-market sales team has a forecast that includes revenue uplift from a product launch; their finance manager thinks this is inflated, so they adjust it down. As the year progresses, the sales team realises they were overly optimistic, so they lower their forecast; their finance manager reduces their buffer, so in fact, as far as the total forecast goes, there is no change. Now, when presenting this to my finance director, I should have just shown no change; but I was eager to go into the details, so I was showing changes in the base forecast, offset by changes in the buffer. Given the number of different markets and P&L elements, there were many such offsetting moves — and my reconciliation was impossible to understand. Luckily, I was given the right feedback and coaching, and I learnt to show what is useful to my audience.
OK, now onto some more exciting stuff.
Monte Carlo Simulations
The best thing about a Monte Carlo simulation is its name: it’s exotic, it evokes James Bond and high-stakes gambling, it sounds sophisticated; mention Monte Carlo simulations in a meeting, and you’ll see people perk up and pay attention.
A Monte Carlo simulation is nothing more than the aggregation of the results of hundreds or thousands of possible scenarios. To see it in practice, consider the Monty Hall riddle: you’re on a game show where the host asks you to choose one door out of three; behind one of the doors is a car, whereas the other two conceal goats. You get to keep whatever’s behind the door you pick. You choose door number one. Before revealing what’s behind that door, the host (who knows what’s behind each door) opens door number two, revealing a goat. “Now,” he says, “do you want to change your choice? Or stay with door number one?” What should you do?
The answer is that you should switch, because doing so doubles your odds of winning. You can prove this counter-intuitive result algebraically if you remember your Bayes, but you can also prove it using Monte Carlo: just simulate thousands of turns of the game where you switch, and thousands where you don’t, and see which one leads to more wins:
As you can see in this chart, when you switch, you end up winning twice as often as losing (which means your odds of winning are 2/3); when you do not switch, it’s the reverse. You can see the model I used for this simulation here.
(Note that the idea here is that the presenter does not open a door at random — he will never reveal the car; if he were to open a door at random, then switching does not make a difference. Again, you can prove this either algebraically, or with another simulation (which you can find in the link shared above).)
In a business context, you might want to model things such as the possible outcomes of a particular investment. There is no limit to the complexity of the Monte Carlo simulations you can run — you could incorporate variables beyond those directly relevant to the investment (such as the success of a product launch) to include things such as political instability, FX rate variation, or the impact of a pandemic.
Nor is the visualisation of the probability of each outcome the only advantage of Monte Carlo simulations: they can also help you understand under what conditions a particular outcome comes about. Let’s say you are modelling an investment decision, and the simulation shows that most outcomes are positive; however, one outcome — a very unlikely one — is not only negative, but catastrophically so (this could be the case with selling naked options, for example). Understanding under what circumstances this outcome can manifest is very important — first, there is a chance you can take active steps to prevent it, and second, you can do more work to make 100% sure that such an outcome is, indeed, unlikely.
The issue with Monte Carlo simulations is that they require you to specify the probability functions of the variables you take into consideration. If you get these wrong, then your simulation will be pretty much useless. To take a real-life example, suppose you were a banker trading sub-prime mortgage-backed securities; if you had assumed that the probability of (seemingly, as it turned out) unrelated mortgages all going bust at the same time was one in a quadrillion, then the outcome of your portfolio losing all its value might not even appear in a Monte Carlo simulation. This is where sensitivity analyses are more powerful.
A sensitivity analysis shows the impact of different combinations of two variables on an outcome of interest. For instance, suppose you are modelling the impact of a price change on profit for a particular product. This is what a sensitivity analysis could look like:
You may not know anything about this particular product, but a look at this table tells you that raising prices is probably not very risky: if you took a 20% price increase, you wouldn’t lose money even if you lost half your volume. On the other hand, if you decreased prices, even a 50% volume increase would not offset the profit loss. (Of course, it could be that this product is exceptionally price-elastic, and the smallest change in price leads to huge swings in volume; that’s why you need to consult people more knowledgeable than you).
I love sensitivity analyses for several reasons:
- They show impact, not probability: as mentioned above, sensitivity analyses show the impact of different values of input variables, regardless of the likelihood of each value. In the example of the subprime mortgage, a 7% default rate for mortgages would wipe out the entire value of some securities; because banks and funds holding these securities were leveraged to the hilt, such losses would in turn cause them to go bust. A 7% default rate was deemed extremely improbable — impossible even; but a sensitivity analysis would still show the impact of that level of defaults, and such a huge impact would attract more scrutiny.
- They avoid relying on point estimates: like many people, my career has involved many assignment changes; from one day to the next I’ve had to make decisions with no domain expertise. In such circumstances, making precise forecasts is impossible, and it is therefore much better to look at a range of possible outcomes. In one role, I had to evaluate the terms my multifunctional team was considering offering to a client; the profitability of the deal depended on all sorts of factors, and I had no idea what were the most likely values for each factor. By doing a sensitivity analysis, I showed to senior management that the deal would remain profitable in all but the most unfortunate combination of factors — which gave them confidence to approve the deal.
- They help plan when circumstances change: in the example I just gave, the sensitivity analysis did not just serve to understand under what conditions the terms we wanted to offer our client were profitable; they also served to help the sales team with their negotiation strategy. Suppose you are a manufacturer negotiating with a retailer: you are willing to offer better margins to the retailer, but in return you expect them to make changes that will help you boost volume (e.g. they will give you more or better shelf space). As you negotiate back and forth, you can refer to a sensitivity analysis to see how much volume increase is needed to justify margin dilution; the alternative is to keep updating a model live, which is more prone to errors (and less elegant).
One of the most common questions finance managers are asked is ‘what is the impact of A on B’ — for example, questions I’ve had to grapple with include whether in-store marketing activities (such as banners and other promotional materials) increase sales, whether sales effort has a meaningful impact on client spend, and whether there exists a psychological barrier that drastically lowers consumption once a certain price level has been exceeded (for example, it might be that consumer demand is inelastic for prices between $1-$2, but if the price of a product goes above $2, demand collapses).
Answering these questions is very difficult. These days, such analysis is undertaken by people far more qualified than me, who have training in statistics. So, here I will show how to run the most simple of regressions: a single-variable, linear regression.
A quick introduction: a linear regression looks at the relationship between two variables, the so-called dependent and independent variables. In simple terms, the dependent variable is one that has a causal effect on the independent variable. For example, if you are investigating the impact of temperature on ice cream sales, you’d label temperature the independent variable, and ice cream sales the dependent variable. This is because your hypothesis is that higher temperatures cause higher ice cream sales, and not the other way around.
A linear regression assumes that the two variables you are investigating are linked in a linear manner — i.e. that if the dependent variable changes, the independent variable will always change by the same proportion. For example, buildling on the previous example, linear regression assumes that if temperature goes up by 1 degree, ice cream sales will increase by $5k, regardless of the absolute temperature level. In reality, many variables are linked in non-linear functions; for example, it could be that for hotter temperatures, ice cream sales increase more rapidly — i.e. if the temperature is 20 C, an increase of one degree would lead to $5k more sales, but if the temperature is 30 C, a one degree increase would lead to $10k more sales. Or it could be that above a certain levels, higher temperatures lead to a decline in sales.
Now, here’s how linear regression works: you plot your two variables on a scatter plot:
You then draw a line going through all the dots on the plot; the line represents the relationship between the two variables. Of course, the number of lines one could pick from is infinite, so which is the right line? The answer is the line that minimises the sum of the squares of the differences between the actual dots on the chart and the line in question. In simple terms, we pick the line that is as close to as many points as possible:
Recall from your school days that lines are defined by an equation of the form
y = ax + b
In this case, y is the independent variable, x the dependent variable, b is the intercept, and a is the slope. In the context of regression, these terms carry the following meanings:
- x: the dependent variable — in our example, temperature.
- y: the independent variable — in our example, ice cream sales.
- a: the response of the independent variable (in our example, ice cream sales) to the change in the dependent variable. For instance, if a = 2, it means that a one-degree change in temperature results in $2k change in ice cream sales (the actual scale (whether we are talking $, $k, $m etc) depends on how we set up the data; here I assume we compared $k sales vs degree temperatures).
- b: the value of the independent variable when x is 0. For example, b = 10 means that we expect $10k in ice cream sales if the temperature is 0.
Let’s look at how to run a regression analysis in practice. You can find the example I’ll go through here.
Suppose you want to find out the impact of changing the price of a product on the number of units sold. Luckily, you have data on the number of units sold at different price levels — for example, it could be that you have experimented with different promotions for the product and recorded the sales at each price, or it could be that different retailers charge different prices for the product.
What you need to do is put the price levels on one column, with the corresponding sales on an adjacent column:
You can then use Excel’s or Google sheets’ functions to run a regression. Excel’s is much better, and you can read how to do it here. In Google sheets, you can use the LINEST formula, whose output looks like this (though, annoyingly, without the labels, which I added manually):
So what does this tell us? The important values to keep in mind are the following:
- The value of the slope: In this case, it’s -0.69. This means that when price increases by $1, we will sell 0.69 fewer units. Conversely, if the price decreases by $1, we will sell 0.69 more units.
- The value of the intercept: this shows what happens when the price is 0 — in this case, the regression suggests that if we do not charge money, we can expect people to want to receive 41 units.
- The R-Squared (Rsq in the table): this is where a lot of people become confused. You may have heard that R-Squared (whose value ranges between 0 and 1) denotes the strength of a relationship between two variables, and so, seeing an RSQ of 0.09 may suggest to you that these two variables have a very weak link. The intuitive way to understand RSQ is that it shows the variance in the independent variable that can be explained by the dependent variable. 0.09, or 9%, means that 9% of the differences in quantities sold can be explained by price (other factors can include things like marketing activity, number of stores stocking the product, supply shortages, seasonality, etc). But this does not mean that the link between price and demand is not strong! The correct way to interpret our result is that, everything else being equal, a $1 price change results in a 0.69 change in quantity sold (in the opposite direction).
But if we cannot use RSQ to check whether the relationship between the two variables is strong enough, how do we know whether the slope the model spat out is right? After all, the model will give a result no matter what we feed it.
To do this, we make a so-called null hypothesis: we say to ourselves, suppose the two variables were not in fact correlated. If that were the case, how likely would it be to obtain the results we did from our regression?
The LINEST equation gives us the F statistic, which can be used to that end; the F statistic can be used to check whether the entire equation (the slope and the intercept) were likely to appear by chance. Unfortunately, I am not familiar with this, so instead I will explain how to test whether the slope (which is, after all, the most important thing) could have appeared by chance.
First, calculate the t-value, by dividing the value of the slope by the standard error of the slope. Then, use Google sheets’ TDIST formula, using the absolute value of the t-value as the first argument, the degrees of freedom from the LINEST results table as the second argument, and 2 as the third argument. The result will be the probability that the output of the regression analysis was chance; the lower the number, the more confident you can be that the slope is actually a good predictor. In this example, the result is very close to 0, which means the regression is a good model for reality.
And as it turns out, the results we obtain here make perfect sense: the way I built the data used in the example was by using Google sheets’ random formula:
- I set prices using the formula 1 + randbetween(1, 10)/100 — this gives a random decimal value between 1 and 11.
- I set quantity sold as = randbetween(30, 50) — price * randbetween(1, 20)/20.
Evidently, by design, there is a relationship between price and quantity, though there is a lot of randomness injected. Still, the model identified the intercept as 41 (which makes sense, since quantity was set as a random number between 30 and 50, regardless of price), and the slope as -0.69, which also makes sense, as by design the slope was a random decimal between -0.05 and -1.
There are, of course, heaps of caveats when using regression: your variables may be both correlated to a third variable, that impacts both; the relationship between them may not be linear; there may be other factors that cannot all be constant; etc. So use this with caution.
As always, thoughts, feedback and questions welcome.