Time-series analysis of regressions in Excel (16:22)
Videos are available at one organized Quant 101 Playlist on YouTube (opens in a new browser window).
Welcome. Today's goal is to review time-series analysis and see changes in regression measures over time.
I'm Paul, and while Excel offers a convenient way to learn how to analyze stocks and portfolios for a single point in time, to go to the next level, we need to build a process that adapts to changes over time.
So here we will introduce two types of regressions common in Finance and focus on one, time-series analysis.
While you don't need to participate in the whole Quant 101 data analysis course to get value from this tutorial, it does open Chapter 5, where our end goal is to use linear algebra to perform portfolio return and risk calculations at scale and move beyond a handful of stocks to potentially thousands. This tutorial and Chapter therefore offers a bridge to statistical programming languages.
If you would like to read the video transcript and see Excel formulas we have an accompanying web page found at the first link in the Description.
Let's look at our plan for the day.
First, we will cover time-series and cross-sectional regressions at a high level.
Second, we focus on the details of a time-series regression over rolling periods.
Third, we make a case for visualizing regression output using a scatter plot.
Fourth, we tack on issues that often arise when using statistics to build financial models.
And in our next episode we will see how linear algebra can help us as the number of stocks grows.
First, as we touched on in our tutorial on linear regression there are two types of regressions commonly used in Finance.
The first, and most common, is called time-series analysis which will be our focus here, where a regression is performed for one security over many different time periods.
Almost everyone has heard of a stock's beta coefficient and it is derived from a time-series linear regression for one stock over multiple time periods, often 60 months.
In a cross-sectional analysis stocks are grouped into categories and we regress performance of those groups for one time period.
You can group stocks by sector, or rank them from cheap to expensive, or even by volatility from high to low. There are numerous ways modelers group and rank stocks. We won't cover cross-sectional regressions in Quant 101 but keep them in mind because they are very common in academia and often employ multi-variable linear regressions.
In fact, some of the most influential academic studies relating to stocks were derived from cross-sectional regressions. The Fama-French 3-factor model comes to mind.
Moving on the Step 2, let's add a couple of details about time-series regressions. These one-security regressions, again for many time periods can be for a stock, mutual fund, ETF, or whatever security you are including in your analysis.
Let's start by scoping out the size of time-series data sets, here and in the real and practical world.
Time-series regressions are typically large in that they extend back in time for many securities. Let me give you an example.
To estimate let's do some math. The data set size, or number of cells, is computed by multiplying the number of dates, multiplied by the number of instruments, multiplied by the number of variables.
Our sample time-series data set is for 60 time periods, multiplied by 6 instruments, times 1 variable and that equals 360 cells. Our variable here is the monthly rate of return.
Now let's do a similar calculation on a model created by a typical third-party risk model provider for a large US equity model, for example.
Here, there might be 60 time periods, times 5,000 stocks, times 30 factors, or 9 million cells.
Rolling periods are the rule when it comes to time-series regressions. In this tutorial we will look at a 15-month lookback period, which is far too short to be worthwhile statistically, but helps to visualize issues that arise.
To build the visuals, let's pull forward thirty returns from our Returns tab.
If you want to follow along with this free-to-download data set see the System Setup tutorial.
Let's bring forward some data. In cell J5 use
=Returns!B7. We are interested in the
stock Merck, so in cell K5 use
=Returns!F7 and in cell
=Returns!G7. Then copy that range of
cells down to row 34. (The table lists the
first 3 rows.)
What we will do here is show a window of time that rolls and monitor how the view changes. This is called a rolling regression. We will calculate the 15-month beta coefficient in column M for the period ending 6/30/04 and starting with the return from 4/30/03.
When we move on to the next month ending July 2004, the previous starting month of April 2003 drops off and our starting month becomes May.
In the middle of the table, so starting with the 15-month period ending
6/30/04 let's calculate beta. In cell M19 run a
making sure the stock Merck is the y-variable and the Market is the
Next, copy that down to cell M34. (The table lists rows 19 to 20)
Here we used the beta coefficient, but you could pull from the regression the alpha, correlation, even R-squared and track changes over time.
We covered how to generate a scatter plot in the tutorial on linear regression so head back there to see how we also added the line of best fit and the formula for the line and R-squared right on the chart.
The formula we typically prefer for Finance is in the form y = a + bx + e because it better aligns with the terms a for alpha and b for beta. The format shown by default in Excel's scatter plot is in a rearranged form y = mx + b where m is the slope and b is the intercept.
So the beta of 1.5088 in the chart lines up with the beta of 1.51 in column M rounded to two decimal places.
We also like to see the R-squared and it was 0.7554, a fairly high figure as we learned in our tutorial on interpreting correlation.
Again, keep in mind this is for one period in time, right? Let's now look another point in time, 15 months later.
So for Exercise 2, let's do the same thing except using the 15-month period ending 9/30/05.
Here we can make two observations, first the beta changed to 1.14 and the R-squared dropped materially to 0.3429.
To expand on that, let's create a line chart showing the change in the beta coefficient over this time-series.
There are several ways to do this and since we are only interested in the Date and Beta columns, I think the fastest way is to select the whole block from J19 to M34 first. Then click the Insert menu, followed by Line and select the first line chart.
Okay, now right-click the chart, then click Select data. In the Select Data Source dialog box click MRK and hit Remove and do the same for Market. From here you can edit this on your end to suit your style.
This chart shows the stock's beta coefficient for 16 time periods using 15-month rolling windows.
So what is the point here? We are attempting to illustrate how beta coefficients can change in time-series models.
See how Merck's beta starts at 1.51 then peaks at 1.73 and then begins its steady, inevitable decline to 1.14 at the end. Of course you never know when beta will peak. No alarm bell goes off, you just need the confidence to identify trends and good forecasting techniques will help.
I also wanted to return to the data point in September 2004 when Merck, underperformed by nearly 22 percent. This is important because resulting statistics can change materially in the presence of a material outlier like this. Notice how the beta changed from 1.40 to 1.62 just with the addition of that one data point.
Of course, estimation periods longer than 15 months are more common, so take these figures with a grain of salt.
The point was to illustrate how outliers can impact financial models if not taken into consideration. This explains why at times models don't perform as you would expect.
For example, a coming tutorial on CAPM will show how important beta is when setting expected returns for stocks to determine what is baked-in to market expectations. A poor estimate of beta can skew the estimate materially, negating a lot of quality work on the alpha side of the equation used as the objective function when optimizing portfolios.
This offers a logical transition to our next topic.
For Step 3 let me offer my case for always visualizing data before using summary statistics to make forecasts for the future. I have found visualizing the data as a first pass ensures that you are using the right test or making adjustments for distributions that don't fit linear models.
Keep in mind, an employer can find anyone to pull statistical measures but a differentiating skill for you is to build your analytical mindset and part of that comes from actually looking at the data.
In Step 4, we will walk through 11 issues associated with estimation of statistical tests. As we go through these consider how visualizing the patterns will help you identify the issues.
I won't dive too deeply into each one here and we covered the first few earlier, but keep these in mind and look up the terms on your free time to get a better feel for each one.
We saw the impact of an outlier earlier with the one month return on Merck's stock when investors were punished by the news of deaths due to the Vioxx drug in September 2004. We can see how the slope of the line of best fit, changed simply due to the inclusion of this one data point.
A curvilinear relationship occurs when there is a curve in the scatter plot. We saw one of these in the financial modeling tutorial on correlation.
The term 'linear' in linear regression means that we need data that can be estimated with a line. Data that takes the shape of a curve can be transformed at times to a linear distribution and visualizing this is often the first way you catch it.
Next, let's cover non-normal distributions. In a previous chapter, recall we created a histogram and it was helpful to see the distribution of returns on stocks.
In Finance, even though the occurence of returns in the tails is more common than the standard normal distribution would say, the assumption is that they are normal. Other factors you measure may not resemble a normal distribution at all and other tests will be more valid.
We talked about hidden variables and the example we used was the correlation between the sales of umbrellas and traffic accidents and how, while the two variables may be highly correlated, there is likely a missing variable, the level of rainfall.
For a related term, spurious correlation. Highly correlated variables may either be due to coincidence, like the hemline theory, or the presence of a third, unseen variable. We saw this in our example of the sale of cars and dishwashers that have high correlation, yet are more dependent on a third variable, economic growth.
Let's cover a few new examples starting with seasonality. Some company financial statements are impacted materially by seasonality.
In the United States, sales for retailers jump during the December holidays. Tax preparation software firms have higher sales in April and firms that sell seeds and fertilizers have higher sales in the springtime.
One method for adjusting for seasonality is to use annualized comparisions.
Next, log-linear patterns may be evident when growth rates are taken into consideration. Sales of a company may not exhibit a linear relationship, but one that is exponential.
The distribution of companies by market capitalization often takes this shape.
Next, autocorrelation, also known as serial-correlation and cross-correlation, is a difficult concept to grasp but very important in the modeling of stocks.
The easiest way to think of it is as a correlation coefficient. We're comfortable with that and we can see two visuals here for regressions over different periods of time of two variables, the return on Merck and the return on the Market, right?
Now, what if you calculated a correlation coefficient except you used Merck returns and regressed that against Merck returns lagged by one period? The result here would allow you to see if returns for one period are correlated with subsequent periods.
The academic literature on the topic is highly debated for returns as one of the assumptions of the Efficicient Market Hypothesis is that equity price returns exhibit no autocorrelation.
In another area autocorrelation is not debated, meaning it is statistically significant and that is in the area of forecasting risk. Higher periods of risk often cluster, so it is common to see significant autocorrelation and appropriate adjustments are made in equity risk models.
Multicollinearity occurs in a multi-variable regression when there is a high correlation between x-variables. We won't work with more than one independent variable in Quant 101 but imagine if we used the S&P 500, a benchmark of large US companies, as one x-variable, and another, like the return on the Russell 2000, a benchmark of small US companies.
In this case multicollinearity may exist because large-cap benchmarks typically have high correlation as rising and falling stock markets often affect all equities at the same time to some degree.
For heteroscedasticity it helps to look at a scatter plot. Imagine in the first chart if the data points further out to the right were further away from the line of best fit, as if it looked like a cone.
What this means is that the variance of the errors, increases or decreases at different points along the distribution. This is another pattern advanced modelers often review so they can make adjustments.
Okay, there you have it, a bunch of issues to look out for.
By way of summary, we went through the important aspects of time-series analysis and discussed how they differ from cross-sectional regressions. Time-series analysis is performed for each stock over multiple periods of time. Also, we covered the size of the data sets provided by third-party risk model providers.
We worked through examples of time-series regressions to see the impact of changes, especially when outliers are present. We also discussed some of the issues that can pop up with regression studies which makes the case for visualization pretty compelling.
For those who have been with us and for those new here what you will find in our tutorials is not only how to perform calculations but also how to analyze the inputs and outputs to make your financial modeling and time-series forecasting more predictive. Really, shouldn't this be the ultimate goal in the end?
In the next episode we will discuss how linear algebra helps financial modelers crunch millions of data points for the purpose of increasing the portfolio manager's confidence in return and risk estimates.
Stay tuned for more free opportunities to learn. Feel free to join us any time and have a nice day.
Examples provided above are best visualized in Excel, I encourage you to also watch the videos as they will speed up the learning process.
Please also check out our YouTube Channel and subscribe straight from here.