Calculate four measures for stock risk analysis in Excel (24:52)
Videos are available at one organized Quant 101 Playlist on YouTube (opens in a new browser window).
Welcome. Today's goal is to learn four measures for calculating historical stock risk using two absolute measures and two relative measures.
I'm Paul, and with so many measures of risk available, it is difficult to remember which one to use, and when.
So here we will walk through calculations of variance and standard deviation first for each stock, then covariance and correlation for pairs of stocks. These measures are instrumental in creating a covariance matrix. So this will lay the groundwork for the whole Chapter.
After this, we will leave stocks and focus on building portfolios with different weights allocated to each stock and finally create a chart of 11 different portfolios, explaining the concept of diversification mathematically.
This financial modeling tutorial sits within a series called Quant 101 on equity portfolio management. If you want a free data set to follow along with, head back to the System Setup tutorial. All tutorials use this one data set that sits on a tab called Returns.
Whether you prefer video or text, we have both, a video for the visuals and text for everything I say including cell formulas in Excel.
Okay, so what will we cover here today?
First, we take a subset 21 cells from that Returns tab which will be the focus throughout this whole chapter.
Second, we walk through the calculation of variance and standard deviation.
Third, we discuss the term demeaned returns and see how it simplifies these calculations of covariance and correlation.
Fourth, we perform stock risk analysis through interpretation and tack on a final measure R-Squared.
And in our next episode we move on to portfolios, starting with calculating portfolio return.
Okay, for Step 1 lets do a quick rundown of what were doing here and then get started.
So two tutorials ago we left off with a table of returns called a frequency distribution and visualized it with a histogram.
There we did so with all 60 monthly returns from our Returns data tab and used all four stocks and left off saying we'd return to calculate associated risk measures, so that's our objective today.
To simplify things here we will use a small subset because we need to go through the calculations manually, and later we can speed things up even further with Excel functions.
We will walk through the tables in a moment but basically, variance and standard deviation are generated using the same procedure. Variance is the raw output, and if you take its square-root, it becomes standard deviation. The benefit of standardizing it in this way is that the scale becomes easy to interpret.
The remaining two measures are also related, so after calculating covariance, we standardize it to the more interpretable measure correlation.
Covariance and correlation both measure the co-movements between pairs of stocks.
I should mention before we go, we are working with data that occurred in the past and here we call the timeframe historical.
There's the rundown.
In your spreadsheet, let's grab 21 cells from the Returns data tab.
Start with cell E9 and put a reference to cell
B6 from the Returns tab using
Copy that cell formula across two columns and down six rows, and this gives you the whole range required for this tutorial. So cells in the range from E9:G15 should be filled with references to the adjacent cells on the Returns tab from B6:D12.
Now to interpret, we have returns for two stocks over 6 months. We need not worry about whether there is a statistically significant relationship between the two stocks because our sample size is too small for that. Instead, we just want to focus on the calculations because later when we are interpreting the output using Excel functions, it will make more sense.
To visualize this we can use a scatter plot, or scatter chart, in Excel measuring the return for each month for both stocks. So during April 2003, Microsoft was up 5.62% and at the same time eBay was up 8.91%. A point on the chart shows these coordinates with Microsoft on the x-axis and eBay on the y-axis.
If this were a real study we might use all 60 data points, so there would be 60 points on the scatter plot and we would have much more reliable data.
The line, or line of best fit, is the mathematically calculated line that minimizes the distance from each observation and the line itself. We will interpret the slope of this line and the errors, or distances from the points and the line in later tutorials.
Okay, with that we are ready to move on to the calculations.
Let's start with the two related calculations for absolute risk, variance and standard deviation. Absolute in this context means that each stock has its own variance and standard deviation. Later, relative risk refers to risk or variability of one stock relative to another.
Variance and standard deviation are both created from the same procedure. One way to think of this is that there is one variance and one standard deviation per stock. Of course the time-frame measured matters, here our data is 6 months, but you could also have a 60-month variance too, which would be a different number.
It is up to the modeler to decide the length of the time period for measurement, and ideally you would like at least 30 periods.
The takeaway is that each stock has its own variance and standard deviation.
Okay, let's talk more specifically about variance.
Variance is a measure of dispersion, meaning it measures how far numbers in a distribution are spread out.
Let's get that visual nailed down again, from back in the tutorial on Stock Return Disributions where we created a frequency distribution. With the resulting histogram we could see dispersion, basically how far the observations were from the center.
We saw the eBay observations more in the tails and for Microsoft the observations were closer to the center. Well, variance basically quantifies all of this into a number, and a higher variance number implies higher risk.
Also, variance is bordered on the low end by zero. So you can't have negative risk, which makes sense. Variance of zero would occur when all returns were the same, say if Microsoft went up by 2.38% each of six months in a row.
On the other end of the spectrum, a large variance says that numbers are very different. The way this works is to think of a point near the middle and then measure all of the differences from this middle point. This goes into the calculation of variance as you will see.
In practice we use the mean as the middle point, and in statistics this is one of several measures of central tendency. Don't worry about that term now, I just wanted you to have heard it.
Let's walk through the calculation of variance first in the Math section. Start from the inside of the parentheses. Return minus average, then square it, then sum them together and divide by the observations.
Also, remember the PEMDAS mnemonic for the order of operation. It goes parenthesis, exponents, multiplication, division, addition and subtraction. Parentheses in Excel change this default order. If a formula gets confusing, start with the inner parentheses and move outward.
Now let's see it in practice.
Take the average return over a period so in cell F16
=AVERAGE() function and select
the range above from F10:F15, so the arithmetic
average for the period was 2.38%. Copy the cell formula for
F16 to G16 and it gives you the
average for eBay of 3.98%.
If you caught the tutorial on Return Calculations then you will recall why we are using arithmetic returns here instead of geometric returns.
Now that we have our averages, let's move over to columns H and I. Here we input the averages we just calculated so we can visualize the logical flow of the calculation.
In column H we want averages for Microsoft, so all
cells here should point to cell F16. There are two
ways to do this. You can either point to it using
=F16 manually or, as I did, type
=$F$16 which is a hard cell reference.
This tells Excel to keep the same reference to F16
so that when you copy cells down or across then Excel doesn't
automatically update these references for you, which is the default.
In column I we the same thing for the eBay averages, pointing to cell G16.
In the video, I put labels on top of each column to guide you, but they aren't required. Next, in column J we subtract the 6-month average from the monthly return and place it in Columns J and K.
Next, in column L we put the square of column
J using either one of two methods, the
^2 to raise it to the power of 2.
Excel also has the
which takes two inputs, the number and the
power. I typically use the first method because it
In column M we post the results for eBay, squaring column K.
Next, to get the variance we need to sum these squares, and then divide by the number of observations.
=SUM() function is fastest,
so in cell L16 we input
=SUM(L10:L15) and copy that over
to cell M16 for eBay.
Next we need the number of observations, often referred to as
n. Using the
function to select the range from L10:L15 in cell
L17 for Microsoft. Then copy cell
L17 to M17 for eBay.
I should note here that technicallly we are taking a sample and it would be more technicallly correct to use n - 1 for a sample calculation instead of a population calculation, but I don't want to make this any more confusing at this point.
Next, in cell L18, for our final variance figure,
let's divide that sum by the number of observations, so
=L16/L17. Copy that formula
over to K18 for eBay. We're all set, the variance for
Microsoft, using decimal place convention, is 0.0010 and for eBay it is
Before moving on, I'd like to tackle two common questions. First, when you square a decimal it gets smaller, right? So the number 0.0010 is not very appealing to the naked eye and frankly not that interpretable either. That we can resolve.
A second issue, if you take the time to think about it, the units of columns J and K are in percent, so when you square it, the units become percent squared. Messy huh? This is another issue that can be solved.
These issues explain why most people have an aversion to variance, and for good reason, but we still need to understand it. I suggest not trying to interpret variance at all. Instead, leave it in decimal format and ignore it until we standardize it. Standardizing it scales the units as well.
So how we do we standardize it? With standard deviation.
Standard deviation and variance are linked, in fact, think of them as permanently connected. If you have one you can easily derive the other. And standard deviation provides what variance doesn't, a more logical scale to interpret.
To calculate standard deviation you first calculate variance, then take its square root. That's it, simple. When you do this the units change from return-squared to return, making it interpretable because it is now in the same units as the original data.
You can start making statements about it and visually look at the number and get an image of the distribution in your head. Also, like variance, there is one standard deviation per stock after you decide on the measurement period.
Another point to note, once we square the numbers in column J and K, they become positive, always, algebraically, which explains why variance and standard deviation cannot be negative. Again, risk cannot be negative.
So, like variance, standard deviation is always zero or greater. With respect to periods, you are free to select the number of periods to calculate variance, but remember longer periods are better. Typically, in practice, using more than 30 periods is recommended, and 60 is even better.
Okay, that wraps up our absolute measures of stock risk for now, later we will interpret this data.
As with variance and standard deviation, covariance and correlation are inextricably linked, which means they cannot be disentangled.
Covariance is the concept that is necessary for us to move from the individual stock concept of risk measures, to portfolio risk measures.
As mentioned, covariance measures the co-movements between stocks. For each pair of stocks you calculate co-movements.
A simple analogy would be to think about a classroom. Each student has a relationship with each other student, but each relationship is unique, right? Same thing here.
So there isn't one covariance per company, there is one covariance per pair of companies.
Here's a quick example. Imagine, two stocks A and B, these could be students too, whatever helps you. Count the number of relationships between the two. A to B right, just one.
Now, what about for three stocks, A, B and C? There is A|B, A|C and B|C, so three. Very good. Remember B|C is the same thing as C|B, so there are three relationships for three stocks.
Take a moment to think about how many unique relationships there are between four companies, A, B, C, D.
If your answer is six, you've got it, great. This will be a lot easier after we visualize the covariance matrix in future tutorials. Pause here and write it down if it helps.
Okay, so we know both covariance and correlation measure the co-movements between stocks. Let's now pin down the calculation of covariance first.
The calculation of covariance is similar to that of variance in that you start out by taking the average return for each stock, then subtract that from each period's return. In fact, if you look at the first seven columns for Exercise 2 it matches Exercise 1 exactly. They are identical all the way to column K.
This is the concept called demeaned returns. Demeaned returns are simply the returns for each period with the arithmetic average subtracted out. So you only need to calculate demeaned returns once and then use it for all four risk measures. That's a convenient little fact not a lot of people know about.
From here forward, the calculation differs because instead of calculating one absolute figure for variance and standard deviation for each stock, instead here with covariance and correlation we are interested in the relationship between the pair of stocks.
So now, mechanically, instead of squaring each one individually, as we did above, we take their product, multiplying column J and column K, with the result in column L (see above).
This measures the co-movements on a period-by-period basis.
Next you sum up those products for each observation and divide by the total number of observations, just as we did earlier.
So in the end we have a Covariance between Microsoft and eBay for this short 6-period study of -0.0006.
Covariance, because you are multiplying, can produce negative products. And as a result covariance can be negative. Another point to note is that covariance, like variance earlier, is not standardized so the units are also in returns-squared, meaning it too isn't interpretable. So like I said earlier with variance, don't bother trying to interpret it. Focus on correlation because it is the standardized version of covariance. Let's see how this works.
Correlation also shows the relationship between the two stocks. The procedure for scaling covariance is a little sticky. Here's how it goes. We need to divide covariance by the product of the standard deviations of each of the two stocks.
So the formula in cell L34 looks like
Alternatively we could have taken the standard deviations from cells
L19 and M19.
The resulting measure creates a lower and upper bound of minus-1 and plus-1, meaning correlation will always fit within that range. The closer the number is to either end of the range, minus-1 or plus-1, the tighter the relationship between the two stocks. Or looking at our scatter chart, the tighter the points would be to the line.
Zero would imply that there is no relationship.
There is so much to learn about correlation, so we will revisit it later. Here I want you to remember that correlation measures co-movements like covariance, but it is easily interpretable.
And lastly, there is a correlation per pair, meaning you need to calculate correlation for each unique pair of stocks.
So that wraps up the calculations, let's interpret this and also tack on the R-Squared figure which is very common in stock risk analysis, including performance attribution.
Now that we've walked through the calculations, let's talk about risk for Microsoft and eBay.
First, we covered variance and said we don't care about interpreting it, so we left it in four-decimal format, as we will throughout the series. No need wasting our time with it.
Where you see standard deviation, on the other hand, I have changed the format to percent to signify that the units are the same as the original return measure.
Standard deviation is just the square root of variance and this gets
the units back to interpretable percent. Here we can use the caret
symbol again, using .5 as it is the same as 1/2 and as you will recall
from algebra, taking a number to the 1/2
power is the same taking the square root. You can also use the Excel
=SQRT() if you prefer.
Let's take a minute to interpret the standard deviation for Microsoft. Let's say you performed a long-term historical study of the variance of Microsoft and were comfortable with the length of the measurement period of say 60 observations, and you wanted to use the normal distribution to determine the range of possible return outcomes in the future.
Assuming a normal distribution of returns, which we discussed earlier, you could estimate that approximately 68% of observations fall within 1 standard deviation of the average, meaning the average monthly return 2.38%, minus 3.20% and plus 3.20%.
So here, you would expect 68% of monthly observations to fall within the range of -0.82% and 5.58%. You can go one step further and take two standard deviations and say that 95% of the observations should fall within two standard deviations and here the range is -4.02% and 8.78%, give or take some rounding.
Let's take a look at eBay. The average return at 3.98% was higher, but the standard deviation was also higher, at 4.26%.
Those standard deviations of 3.20% for Microsoft and 4.26% for eBay match what we had on the Cover tab, so our x-coordinates for standard deviation are correct. We can see that for this 6-month historical period, eBay had higher risk. In coming tutorials we will work on portfolios to complete the picture.
Now that we have a covariance and correlation between the pair of stocks, let's do a little interpretation. It's a good idea to look at the products here to see what is going on.
During the periods where the returns are in the same direction the product is positive, as in April 2003. When they are in opposite directions, as they were in September 2003, you can see how the product is negative.
What we are going after with covariance is the co-movements. The negative covariance here is indicating that the stocks moved in different directions for that period.
You wouldn't expect this for two stocks in the same sector, but again our measurement period of 6-months is very short and there are a lot of stock-specific factors which move stock prices over time.
Now moving on to correlation, which is the more interpretable statistic as you recall correlation is the covariance between a pair of stocks divided by the standard deviations of the pair of stocks multiplied together. This transitioned the uninterpretable covariance to the interpretable correlation.
To finish off, we see that the correlation between these two stocks over this short time frame was -0.45, meaning there was a negative relationship between the co-movements of the stocks.
Correlation has the nice characteristic in that the range of values is between -1 and +1. You could interpret -1 to mean the stocks are perfectly negatively correlated, meaning that when one moves up, the other moves down.
A correlation of +1 could be interpreted to mean that there was perfectly positive co-movements, so when one moved up, the other moved up. And zero is no relationship, or their movements are independent.
Last is R-Squared which is correlation squared, so in cell M34 we square cell L34. Here the sign goes away and the range goes from 0 to 1. This measure is often called the goodness of fit and mathematically shows how close the data points are to the line of best fit. Another term for R-Squared is the coefficient of determination.
Here because we had such a short period and only 6 observations we wouldn't expect a tight relationship, but this measure allow you to gain confidence in the relationship between the two stocks. We will pick up this conversation later, but I wanted to introduce it to you now.
As with earlier financial modeling tutorials, we saw that Excel offers different ways to make calculations and here we have functions for almost everything we did earlier.
So we could have calculated variance with
=VAR.P(), standard deviation with
=STDEV.P(), covariance with
=COVARIANCE.P(), correlation with
=CORREL() and R-Squared with
Because there is a distinction between samples and populations, the dot-P on the end refer to the population version of the calculation. Dot-S is for samples. Recall, when you know the whole set of data for your study, use the population version. And if you are using a sample to estimate a population, then use the dot-S version, which offers a more conservative estimate.
By way of summary, we tackled four measures of stock risk, and tacked on R-Squared as well. Walking through the manual calculations to help the concepts sink in, but will use functions from here forward. That way we don't have to memorize mathematical notation, as is taught in many courses.
In my view, people shortchange themselves of the ability to think critically about these measures and later, when things get more difficult, it comes back to bite them.
Please remember as a takeaway, the point about standardizing the raw measures and variance is to standard deviation as covariance is to correlation.
In the next episode we will shift our focus from stocks to portfolios and nail down the easier topic first, portfolio return, before moving on to portfolio risk, with the eventual goal of creating and explaining everything that goes into a chart of portfolios.
Here we're focused on portfolio management and our journey has just begun. Feel free to join the conversation on YouTube and forward a link to your friends who might benefit from this series, and our less than traditional approach.
Feel free to join us any time and have a nice day.
To learn faster make sure you are watching the videos visualizing how this works in Excel will certainly help.
See what else you can learn for free at our YouTube Channel. Subscribe here.