Learn to get your Excel system set up for data analysis
Watch the Video
Install Excel Data Analysis ToolPak and Solver Add-In (17:27)
Videos are available at one organized
Quant 101 Playlist on YouTube (opens in a new
Welcome. Today's goal is to set up our Excel system and cover other
helpful topics related to using a spreadsheet for data analysis.
I'm Paul, and I get frustrated by how long it takes to get systems
ready before you can get to the fun part, analyzing data.
So here we will walk through the considerations for different operating
systems and Excel versions, going step-by-step, to save you time.
If this is your first time with us, this tutorial sits within a
series of 30 financial modeling tutorials, called Quant 101. The way it
works is I ask you to duplicate all of these models on your system as
a way to learn fairly advanced Excel skills and to learn about modeling
stocks. So if you are sticking around for that, I will also cover how
to download and import a free data set used throughout the series.
For those watching the video on YouTube, if you want to slow all of
this down and see the instructions in print, everything I say here is
duplicated on a web page. The first link in the video's Description
section goes straight to it.
Let's start with an overview.
First, we will run through advantages and disadvantages
of spreadsheets for financial modeling.
Second, we review my setup and cover appropriate versions of
our spreadsheet of choice, Excel, for Windows and Mac users.
Third, we will install the Data Analysis ToolPak for statistical
analysis and the Solver Add-in for optimization.
Fourth, we cover the one dataset we use throughout the playlist
and how to freely download it so you can get started right away.
Fifth, I'll quickly discuss what institutional investors use instead of
spreadsheets, meaning statistical programming software and their
advantages and disadvantages.
And in our next episode we will review the three methods for
calculating returns on stocks.
Step 1 - Why a Spreadsheet?
Before we get started with the technical part, let's take a moment to
cover the advantages and disadvantages of spreadsheets which will help
us later in our discussion of comparisons with the programming world.
a. Advantages of Spreadsheets
First, for advantages, spreadsheets are:
They are cheap or free and many people in Finance have already
memorized many of the functions.
Easy to learn -
The grid and cell format, with columns named with letters, and rows
named with numbers, make it easy to draw relationships and make
quick calculations, even on modestly large data sets, without
having to learn programming.
Easy to visualize -
The ability to visualize and customize tables, charts, and fonts make
spreadsheets a viable tool for creating presentations. After all,
there is a comfort level with being able to see, touch and edit the
Built-in functionality -
Most beginner and intermediate-level financial and statistical
operations can be performed in a spreadsheet. In later tutorials you
will see how conditional logic can be written in cells. Also Visual
Basic for Applications, or VBA, can be programmed to automate common
tasks. It is also easy to update cells in a spreadsheet live, like
security prices with links to financial data providers.
b. Disadvantages of Spreadsheets
Now, for the disadvantages, spreadsheets are:
Error prone -
By many estimates, as many as
80% of all spreadsheets contain errors. Adding rows and columns can
invalidate cell references, requiring significant modifications. On
top of this, many firms do not train users of spreadsheets even when
mission-critical functions reside in their work.
Rarely Audited -
Auditing spreadsheets is difficult. The cell-by-cell structure
requires a review of each cell to be 100% certain of accuracy. Many
large financial organizations build elaborate spreadsheets
because the users do not have traditional programming expertise.
The programming concept of version control is difficult to maintain
with spreadsheets, especially in those used by several people.
Difficult for complex models -
While many tables can be stored in a spreadsheet, the relationships
and enforcing referential integrity as is done in a database is not as
robust here. So complex models can break down in a spreadsheet.
There have been several high-profile examples of public firms who
incurred large trading losses as a result. In addition, faulty
reports from academia have led to similarly embarrassing situations.
Formatting is slow -
As you will see, the added flexibility associated with making data
look pretty in a spreadsheet can get in the way of being able to
quickly generate a chart to visualize statisticial relationships, for
c. Why Excel?
So why are we using Excel?
First, a spreadsheet is helpful for organizing and visualizing our
analysis, and for learning the concepts before moving on to
programming, should that be the path you choose.
Microsoft Excel still leads in market share over Google Sheets and
Apple Numbers, however the lead is narrowing.
Any search for technical jobs in the field of Finance will generally
require some Excel experience, so this is our spreadsheet of choice.
Step 2 - Versions of Excel
Let's move on to the versions of Excel and operating systems.
a. My setup
On my end, I have a Windows 7 operating system with a local
installation of Excel 2010. This obviously isn't the latest version,
but any later version should suffice.
b. Versions of Excel for Windows and Mac
For the Mac, common Excel versions include 2011 and 2016. For Windows,
currently the versions are named 2010, 2013 and 2016.
Some of the functionality across versions may not be exactly the same,
but will be close. Other popular spreadsheet options, like Apple
Numbers and Google Sheets are available, but here we won't take the
time to translate functions from one to the other.
Step 3 - Excel Add-Ins
Okay, let's get these Add-Ins installed.
a. Set up your spreadsheet
Everyone should run through steps one to four, and if you don't have
the necessary Add-Ins installed already, then we will move to Step 3b.
Run Excel locally -
First off, run a locally-installed version of Excel, meaning one
installed on your hard drive. This will allow you to work while away
from the Internet. Also, some advanced functionality, like Solver, is
only available on the locally-installed version, as of the time of
this recording. So the version of Excel included with online Office
365 may not work for all of your data analysis needs.
Create a working directory - Next, create a working
directory for your files. The name doesn't matter, but short names
are preferred. Personally, I stay away from spaces and use the
underscore character instead.
Create a spreadsheet file for Chapter 2 -
Third, a quick comment for those looking to complete our series, I use
the convention Quant_101_01_Getting_Started for the
name of this Excel spreadsheet file for Chapter 1, but yours doesn't
have to be as descriptive. In future tutorials name the tabs the same
as mine as it will save time and will help to keep cell references to
other tabs straight. It will be much easier to mimic what I'm doing.
Also, it is a good idea to use the same columns and rows as me, if
you want to make this go a whole lot quicker. You don't need to
style your pages with colors and different backgrounds like mine,
but it is a good idea to spend some time making it look nice as some
day you may want to show off your project to a potential employer.
See which Add-Ins are already running -
Fourth, see which Add-Ins are already running. To do this, click on
the Data menu and if you see
Data Analysis and
Solver you are good to go.
b. Install Add-Ins
Now, let's move on to step 3b, and a quick note, the Add-In name for the
menu item is called Analysis Toolpak. It provides a
variety of statistical functions; including, analysis of variance,
correlation, random number generation, regression, t-tests and a
The menu item Solver is installed
using the Add-In named Solver Add-in. We will use it
to find a spot on a curve, with x-and-y coordinates, that maximizes the
ratio of return over risk, for a portfolio through a technique called
Now if you don't have these pre-installed on your version of Excel,
follow these steps.
See which Add-Ins are running -
Manage Add-Ins -
Click Manage: with the dropdown
Excel Add-Ins, then the
Select both Add-Ins -
Check boxes for Analysis ToolPak and
Solver Add-In, then hit the
If Solver doesn't show up -
Locate and navigate to a directory with the file
solver.xlam and click
Validate that Add-Ins are running -
Click on the Data menu item and you
should see them.
Step 4 - Data set
Let's move on to Step 4 to get that 61-row and 7-column data set.
a. We will use the Returns tab for every chapter
Here, we need to download the data file and create a data tab called
Returns. The goal will be to include this data
tab with each Chapter file. So basically, you will
carry it along with you, attaching the same tab to each spreadsheet,
which will prevent us from having long references to other files. Also,
creating links and troubleshooting will be much easier.
I should mention this is an available data set with four stocks already
selected. Later, in Chapter 2, I will show you how to generate returns
should you want to customize your own data set.
b. Download data and create the Returns tab
For now, to get the sample data set, follow these steps:
Navigate to factorpad.com -
First, navigate to factorpad.com and follow
any of the links to the
Quant 101 page which
includes descriptive information, some Q&A, download instructions
and a link to the outline of the videos.
Review instructions -
Where it says Download Data Here review the
instructions. You can examine the
returns.txt text file with a left-click, and you
can see it is a tab-delimited text file, which can be easily imported
Save the file -
Right-click on the file and all browsers give you the option to
Save File As..., which may be
worded differently depending on your browser.
Create a Returns tab -
Click on the last tab and double-left-click on the name and give it
the name Returns and now you have a blank tab.
Open returns.txt -
Click File, then
Open and then select the file
returns.txt to open the
Text Import Wizard.
Text Import Wizard -
The Text Import Wizard shows the first rows of data and if it is
jumbled we need to let Excel know it is delimited so select
Delimited and then the
Excel identified that the file was tab-delimited, so you can now
see how it properly aligned the columns for the import, seven in
total, the date and six columns of data for each date. If it doesn't
say so already, make sure to select
Tab-delimited and then click
Select date format -
For the final step, we can instruct Excel to import the first column
as a date by changing the toggle to
Date and the format
MDY identifies the month, day and year. Then
click Finish. This creates a new
file that you can save using File,
then Save As in case there is a
Copy data to Returns tab -
Select the whole range of data and hit the shortcut
Ctrl-c, then navigate to the
newly created tab, going to cell B6, then use the
shortcut Ctrl-v to paste the data.
Starting at B6 is important because as you follow
along with what I'm doing it will save you time to keep the same
cell numbers. Okay, we're all set.
Create a template -
As mentioned earlier, we will create a new spreadsheet for each
Chapter and carry over that Returns tab for
each spreadsheet file. One way to do this is to make a copy of this
whole file, calling it template.xlsx.
So each time we start a new Chapter, we can just make a copy and the
Returns tab is already there for us. This saves us
from having links across multiple spreadsheets, which I have found to
be more difficult to manage, plus we will run some unique
calculations on the tab depending on which Chapter file we are working
c. Use decimal place convention
One more note about this data. If you noticed, all returns are in
decimal notation. We have to be consistent, or down the road our math
will get messy. So we will always keep the data in decimal format, and
use cell formatting to present it. So a return that looks like 0.01,
is the same as 1%, but we'll always keep it as 0.01. In Excel, you can
always visualize it how you like.
Step 5 - Why not programming?
Next, I want answer a question that comes up. Why not teach the series
with programming instead of a spreadsheet?
Earlier we covered advantages and disadvantages of spreadsheets. This
series intentionally uses a small data set so we can benefit from
several of those advantages from earlier. This means we can visualize
and work with the data to learn the concepts, without having to write
In the real world this is done with a statistical analysis package like
MatLab, R and Stata, or straight in traditional programming languages
like Python, Java or C.
Our sample data set covers returns for four stocks, an index and
Treasury Bills over monthly periods for five years. Imagine how large
the data set would be if you covered 5,000 stocks and daily returns,
and this leads us to a discussion of the advantages and disadvantages
of programming, because if you want to take your knowledge further, to
the Institutional level, then understanding how this is done at scale
a. Advantages of programming
Programming has the follow advantages.
Higher pay -
Learning these same concepts in a programming language might double
your salary versus what you could do in a spreadsheet.
Institutional-level firms are willing to pay for this.
Version control -
Programming allows you to hold everything constant, roll-back updates
and maintain all of the code in one working state called a release.
Similar to when those Apps on your phone update. Rather than code
being in a constant state of change, version control allows for a
clean version to be published, then another round of work can be
started on another version, which helps to isolate bugs.
User input validation -
Programming also allows you to validate the data input by the user.
Think about how Excel was written to protect users from coding.
Everything you do in Excel – a drop down menu, a dialog
box, a formula bar – sits on code. Almost every communication
device you use – a phone, a tablet, a computer – is built
in code that prevents you from making mistakes. Your brokerage
accounts, bank accounts and social media Apps are all written to
ensure that everyone follows the rules.
Speed is a factor as well as statistical programming languages can
help you analyze big data quickly.
Through looping, like in a monte carlo simulation, you can run
hundreds, or thousands, of cases through automation.
b. Disadvantages of programming
Of course programming also has disadvantages.
Steep learning curve -
It can take years to learn and implement these concepts in a
Less standardization -
There are a variety of programs, versions and interfaces people use
in programming so there are fewer experts to focus on the
improvement of one program. This is especially the case with
The cost associated with an implementation using programming is much
higher when you consider the cost of retaining talent, planning and
rolling out releases, versus ad-hoc analysis performed in a
No GUI -
Giving up the graphical user interface is a major hurdle for most
people and focusing instead on learning the ins-and-outs of a
programming language by editing text files is a reason Data
Scientists and Financial Engineers are in such high demand.
So to summarize, we discussed spreadsheets and why they're well-suited
for our first exploration of quantitative equity portfolio management.
We discussed Excel, set up the required Add-Ins then imported a sample
data set and created a template for future chapters.
Then we closed with what our next steps could be, with programming,
should you have a passion for portfolio management and want to take
your career to the next level.
Step 6 - Next: System Setup
In the next episode we will start Chapter 2, with our Returns tab in
place and start walking through 3 methods for calculating return:
arithmetic, geometric and logarithmic.
Feel free to join us at any time, and have a nice day.
This text was written to be used in conjunction with freely-accessible
videos on YouTube.
I encourage you to check out our YouTube Channel. Subscribe straight