An introduction to Linear Algebra and Arrays in Excel (17:13)
Videos can also be accessed from our Quant 101 Playlist on YouTube (opens in a new browser window).
Welcome. Today's goal is to review the purpose and benefits of array formulas in Excel and then introduce concepts typically covered in Linear Algebra courses.
I'm Paul, and if you're like me then keeping track of all of the cell calculations for large data sets can be confusing and time-consuming.
So here we will see an underappreciated tool built into Excel that helps us eliminate intermediate calculations, thereby saving time and space.
You don't need to participate in the whole Quant 101 playlist to gain something from this tutorial. Our examples here relate to building fairly complex financial models capable of summarizing thousands of calculations into one cell and arrays offer a convenient solution.
If you would like to read along and see the formulas, a web page with the video transcript is found at the first link in the Description.
Here is our plan for the day.
First, we will introduce array formulas in Excel by covering what they help us do quickly.
Second, we cover tips associated with how to manipulate and use array calculations plus their advantages and disadvantages.
Third, we broaden the discussion to concepts in Linear Algebra used in the sciences like engineering, econometrics and even for video game programming by covering the structure of basic arrays.
Fourth, we demonstrate an Excel array function
=MMULT() and see how
built-in functions save us time.
And in our next episode we will review the mathematical operations addition, subtraction, multiplication and division with arrays.
Here we start with the underexplored skill within Excel many people associate with power users, but they're really fairly easy to master with a little practice.
We also make the distinction between array formulas in Excel and array functions, but will start with their purpose first.
Basically, an array is an arrangement of data. So any data stored in a spreadsheet is an array. Describing data as arrays, instead of by rows and columns takes us in the direction of programming structures.
Arrays allows us to lean on the computer to do much of repetitive work, all at once.
In a nutshell, array formulas allow you to eliminate intermediate steps by performing multiple calculations at once which will keep your spreadsheet more compact, speeding things up and thereby saving you time.
As we will see, array formulas can generate results across multiple cells or aggregate multiple calculations into one cell.
Arrays also offer a level of protection by warning you before editing cells within an array, which can be helpful for spreadsheets used by several users.
Let's start off with an example of an array formula in Excel with output in a single-cell and see how it eliminates the intermediate steps.
Here our focus is on financial modeling of stocks but these array calculations can apply to any data set really. You can grab our data without signing up for anything and our System Setup tutorial shows you how.
We'll walk through the calculation of geometric returns for one stock over a 4-month period.
The compounding of returns from one month to the next requires that we
add one to the monthly return and then multiply the returns together.
We can simplify this with the
=PRODUCT() function. We then
subtract one from the result which gives us the total geometric return
The slow way to do this and normally the first way people learn it in Excel is to use the laborious method of treating each cell independently and create two or three columns for the intermediate calculations.
I will show you the long way first then simplify the calculation in one cell using an array formula.
To start out, we pull forward dates and monthly returns for four months
=Returns!B7 in cell
E8 and copy that to F8 and down three
From there in cell G8 we add 1 and then copy it down 3 rows.
To show the accumulation of geometric returns for each period in column
H we copy over the first monthly return to cell
H8 by referring to it with
As we accumulate geometric returns we need to multiply the previous
monthly return by the current monthly return, so in cell
H9 we have
get the cumulative return with compounding for the first two-month
period ending in May at 1.0165, or 1.65%. We can copy that down two
The product in the last month, so July, is 1.0909 and if we subtract 1
in cell H12 with
the total in percent terms is 9.09% for the whole 4 months.
We could find the same figure using the Excel function
=PRODUCT(G8:G11)-1 in cell
So in the end, column G is used to calculate the return for the whole period and column H is used to show the growth-of-a-dollar for each month, ending with the final return of 9.09% that matches. Good.
This is helpful to learn how geometric calculations work but if we just wanted a total we could skip all of this and use an array calculation instead. Let me show you how.
In a second table let's pull forward dates and returns like we did earlier.
Now, let's shift our attention to the array formula in cell
K12. It says
We always start with inside the parentheses and here we use the
to specify the range from F8:F11 and with the
+1 Excel will add 1 to each element
of that range before proceeding with the multiplication. At the end, we
tell Excel to subtract 1.
Now to tell Excel this is an array calculation we finish it off with the
and this will surround the formula with curly braces, which we will
return to in a moment.
So there you have it, the same answer, 9.09%, but without all of the fuss of creating extra columns for intermediate calculations. Yes, you can't see everything that goes into it, and that makes some people uneasy, but with arrays you have to get comfortable that you did it properly and let go.
Arrays formulas in Excel also allow you to output a multi-cell array.
Let's say we wanted to duplicate the range of cells
G8:G11 in column M quickly, we could
select the range from M8:M11 and input
=1+F8:F11 and use that same
combination and Excel will populate the range for us, and also add the
curly braces to remind us that it was an array calculation.
Also notice how each cell in the output array has the same formula.
Now for Step 2 let's talk about tips for how to use array formulas including that point about curly braces and some of the advantages and disadvantages of array formulas in Excel.
Now that we know how to enter an array formula let's try to edit it.
Let's go to cell M8 and try to delete it. Excel sends the message that "You cannot change part of an array". This aspect of array formulas ensures that you keep a contiguous range intact. That way you're confident you don't have any rogue cells with incorrect formulas within the range.
So let's say we wanted to point the range in M8:M11 to
the range K8:K11 instead of F8:F11.
Here we need to delete the whole range and start over with
=1+K8:K11 followed by the keystroke
we get the same result, 9.09%.
Next, so now that we know what arrays do, what are their advantages? First of all, arrays speed up the time it takes to create a spreadsheet.
Second, arrays save space. Yes our spreadsheets here are small, but imagine if you were following five thousand companies and were trying to calculate geometric return the old way. With the new way you'd only need the range of returns, without the need for extra columns thus saving disk space.
Third, arrays improve accuracy. Of course, the less manual work you do the better. Plus, when you see an array formula across several cells, it all goes together, meaning you can't edit part of an array as we saw earlier.
Fourth, arrays offer a bridge to traditional programming languages for those who want to scale their models to handle bigger and bigger volumes of data. Arrays are the preferred means for storing data within a program as they aren't structured with cells like in a spreadsheet.
Keep in mind, arrays exist everywhere programming exists. So that smart phone you carry is packed with arrays, thousands or millions of them. The array is everywhere.
Now, for the drawbacks. First, creating and editing arrays is a bit cumbersome at first. We have to memorize that special keystroke combination and we have to delete the entire array and start over if we make a mistake.
Second, sharing the editing spreadsheets across teams requires that each user understands how arrays work.
So after taking all of this into consideration, plus the fact that most academic research on the modeling of stocks is performed using statistical programming languages that use arrays extensively, in the end, for us, the benefits outweigh the drawbacks.
That offers and nice bridge to our next topic.
For Step 3, let's shift the discussion from just arrays in Excel to the more broad-reaching topic and branch of Mathematics called Linear Algebra, which some refer to as matrix math or matrix algebra.
Linear Algebra is used in many fields like chemistry, cryptography, game development, geometry, graph theory, heat distribution and even Sociology. Also with data analysis gaining importance in the 21st Century, linear algebra is a must-learn feature in statistical programming languages for those who want to analyze big data.
Here specifically we focus on financial time-series studies leading to our exploration of risk models and Linear Algebra is broadly used to solve complex optimization problems and for generating covariance matrices. Large data sets are the norm for investments and arranging data in arrays here in Excel conveniently helps us learn how to scale investment practices down the road.
Let's go over common conventions and terminology. These may have different meanings across computer programming, Linear Algebra and Excel. On that let me introduce the first somewhat controversial point and that is on the term array itself. To many it is a generic term meaning scalars, vectors and matrices are all arrays.
Second, you commonly see arrays contained within square brackets and in Excel I used lines to identify arrays. In textbooks you may see them enclosed in parentheses.
Next, I should mention, always, I repeat, always use row-by-column convention. Row first, then column. An array with dimensions of 2-by-3 has 2 rows and 3 columns. Rows first, then columns.
This is consistent in the programming world as well.
A scalar has dimensions of 1-by-1 and is typically used to multiply, or scale, other arrays.
| 1 |
A vector is a one-dimensional row or column of data. In the first case we have a row vector with dimensions of 1-by-3, with 1 row and 3 columns.
| 1 2 3 |
A column vector is a one-dimensional column of data, here with 3-by-1 dimensions it has 3 rows and 1 column.
| 1 |
| 2 |
| 3 |
A multi-dimensional array is called a matrix. It can be a 2-row and 3-column matrix.
| 1 2 3 |
| 4 5 6 |
Or a 3-by-2 matrix.
| 1 4 |
| 2 5 |
| 3 6 |
In this Playlist we have been working with a covariance matrix for stock returns and it takes the shape of a square matrix with an equal number of rows and columns.
With that, let's move to Step 4 and use a built-in array function to measure the return on a stock portfolio.
The calculation is the sum of portfolio weights to each stock, in this case Microsoft and eBay, times the rate of return for each stock.
As covered earlier with our calculation of geometric returns we saw that Excel combined many calculation and returned a single-cell result. That is what we want here in cell K31.
Let me first make the distinction between array formulas and array functions.
When we made the calculation of geometric return earlier we used an array formula, building it ourselves.
Array functions on the other hand are those that are programmed to perform specialized calculations that pack a bunch of calculations together. Many also follow the rules of Linear Algebra. Examples include:
=FREQUENCY()- to create frequency distribution
=TRANSPOSE()- to transpose an array
=MINVERSE()- to invert an array
=MMULT()- to multiply arrays
=MDETERM()- to find the determinant of a matrix
Earlier in the Playlist, we used the
=FREQUENCY() function. There we
created a bins_array, which provided breakpoints to separate the
stream of 60 returns for each company into bins which we plotted using
Think about the manual effort it would require to create a table like
this. With the
it takes a split second.
Again, these are all submitted using the
While we are here, another time-saving technique in Excel is naming ranges of data.
After naming a range, you can perform calculations with the names and let Excel go find the data. Names are basically substitutes for a range and this is similar to programming.
Watch the Name Box in the top left, when I highlight the cells in this row vector we can see the name change to "weights_vector" because I already gave that range a name in my spreadsheet.
The quickest way to name a range is to highlight the range
E31:F31 and in the Name Box type
"weights_vector" followed by
To edit or delete named ranges go to Formulas, then Name Manager and here you see all named ranges for the whole spreadsheet file.
Okay, let's use the named ranges and calculate portfolio return in
cell K31 with
followed by the
keystroke combination and there you have it, 3.18%.
What Excel did was take the sum of the 50% weight of Microsoft times its return of 2.38% plus the 50% weight to eBay times its return of 3.98%. So it performed two multiplications and one summation in a one-cell output range. Again, the curly braces identify it is an array calculation.
If you haven't worked with named ranges before now, they come in handy and I think you will find many uses for them. They are not to be feared.
There you have it, portfolio return calculated using the rules of Linear Algebra and an array function in Excel.
So that's it for arrays, not bad huh? Let's wrap this up and head off to see more operations in the next video, including the rules for cell dimensions.
By way of summary, we saw how array formulas can streamline and make our spreadsheets smaller and faster with single-cell and multi-cell outputs.
We walked through the procedures, advantages and disadvantages of using arrays in Excel followed the branch of mathematics called Linear Algebra. We saw the general term array and other structures called scalars, vectors and matrices.
Of course we could keep going with Linear Algebra, but since our focus is on the financial modeling of stocks and measuring portfolio return and risk we only need a few operations. Excel offers all we need as a learning platform before replicating all of this in a statistical programming language like Python, MatLab or R.
In the next episode we will explore rules associated with addition, subtraction, multiplication and division before finishing Chapter 5 with Portfolio Algebra where we tack on the calculation of portfolio risk. Then we tackle CAPM in Chapter 6 and Portfolio Optimization in Chapter 7 using arrays and named ranges, almost exclusively.
A material reason for the whole founding of FactorPad was to disseminate knowledge about how stock portfolio risk is calculated and managed at Institutional investment shops around the world. So if this interests you then please connect with us.
Feel free to join us any time and have a nice day.
To learn faster make sure you are watching the videos because they supplement the text-based tutorials found here.
To be reminded of upcoming tutorials you can subscribe straight from here.