FactorPad
Faster Learning Tutorials

Financial Datasets & Software: Buy or Build?

While most institutional investors use Bloomberg, S&P, FactSet or Thomson Reuters, is the trend towards free or open source software (FOSS) too powerful to ignore at this point in your career?
  1. Data & Analysis - Introduce services offered by providers and why the topic is timely.
  2. Perspectives - Outline industry players and our scope here.
  3. Requirements - Describe what a firm needs to succeed in today's competitive marketplace.
  4. Buy or Build? - Detail pros and cons plus recent trends.
  5. Learn more - See how you can gain quantitative skills faster.
by Paul Alan Davis, CFA, October 27, 2018
Updated: October 27, 2018
Most open-source builders haven't had the benefit of looking behind the curtain at for-profit proprietary data vendors, so sharing this view is part of my mission. Keep reading.



Should you buy Bloomberg, S&P Capital IQ, FactSet, Thomson Reuters or build using Python or R, SQL Databases, Google Cloud or AWS with Web Scraping?

Intermediate

As with nearly all of our content, you can access this topic both in print and on video.

Career Talk in Video

Bloomberg, S&P CapIQ, FactSet, Thomson Reuters: Buy or Build? (25:11)

Videos can also be accessed from the Career Talk by FactorPad Playlist on YouTube.

Video Script

Introduction and Outline

Welcome, I'm Paul, this is Career Talk and today's topic is:

  • Financial Datasets & Software: Buy or Build?

You likely know that today most financially-trained professionals in the investment space employ services from the big four vendors: Bloomberg, S&P, FactSet and Thomson Reuters.

On the other end of the spectrum, those with scientific and technology training are increasingly building processes with the Python and R programming languages. They host SQL databases in the Cloud (Google Cloud and Amazon Web Services, or AWS) and use web scraping to collect free data from Twitter, the SEC's EDGAR database or the Federal Reserve's FRED website. Then, with inferential statistics, artificial intelligence and machine learning algorithms, they make data-driven decisions.

Here we discuss both angles with the goal of not only answering whether to buy or build, but also offering suggestions for how best to align your career in the future.

Here's the plan. First, I'll describe services offered by Financial Data Providers and why the topic is timely. Then Industry Perspectives and our scope here. Third, we see how institutional firms keep score as we look to the future. Fourth, we decide whether to buy or build and fifth, I'll show you where you can learn more.

This exercise is especially useful for those who haven't seen what is available from the big four data providers (Bloomberg, FactSet, S&P CapIQ and Thomson Reuters). I've had some experience with each one. I'll also point you to free datasets later.

(Note: this is not investment advice)

1. Data and Analysis Tools

Let's start with the opportunity today and then cover the main services people in Finance require, specifically those who manage other people's money.

The Opportunity

Since the 1980s, we've all witnessed Tech firms eliminate industries and jobs, like travel agents, bookstores, taxi cab drivers, bridge toll collectors, bank tellers and stock traders. The trend will continue, and in the investment space two major inroads have already taken place.

  1. Robo-advisors undercut traditional relationship-selling financial planners, brokers and wealth managers.
  2. Index funds and smart beta offerings grew rapidly due to their lower fees and targeted approaches.

What makes this disruption possible is free and open-source software and the Web 2.0 sharing economy (social media, mobile devices and Apps), plus inexpensive computing power distributed globally in the Cloud.

So not only do people share faces, pictures, opinions and every boring moment of their lives, they also share valuable data. Now, with transparency movements within government entities and publicly accessible APIs, a lot of valuable financial data is suddenly available to the public. All you need to know is how to harness it.

On top of that, anyone can go to the Cloud and, either from the Linux command line or programmatically, fire up a row of servers at Google Cloud or Amazon Web Services (AWS), process the data, then shut the servers down, thereby minimizing costs and eliminating the need to maintain hardware.

Don't get me wrong: what is available for free pales in comparison to the datasets provided by Bloomberg, FactSet, S&P and Thomson Reuters. My point is that the trend to share data has started and will continue. Active managers already struggle to outperform their benchmarks using currently available financial data, so proactive firms are seeking non-traditional data sources and new methods of analysis.

This is the reason I'm helping investment advisors and why I spend my spare time creating freely accessible tutorials, to spread the word.

So there is the opportunity and why I think this topic is so timely and important.

The five services

Before I break out the five main services, let's first see why all professional investors, regardless of philosophy, hook into these data providers.

Four Investment Philosophies

I use a depiction of the four approaches firms use to select investments in our area of focus: liquid public securities, and specifically stocks.

Whether a firm is passive, fundamental, technical or quantitative doesn't really matter here; I've expanded on this elsewhere. All have merits, no one philosophy is best, and each has its strengths and weaknesses.

Most firms I've come across employ several, if not all, of these approaches, so we can describe any firm as a combination of the four, using these questions.

  • To what degree does the firm use company fundamentals?
  • Does the firm use technical patterns?
  • Do investment decision-makers perform statistical analysis?
  • How closely do they hug a benchmark?

So all firms, in their attempt to outperform their benchmarks, need datasets and analytic software tools to select, allocate, trade and measure their effectiveness. Their investment committees and Boards hold them to it, right?

Now let's look at what services professional investors need.

Charting

First, depending on how much emphasis is given to technical patterns, charting is a service relied on by many investment professionals, particularly traders.

In one quick glance, investors can summarize what happened in the past. Now whether this is helpful in the future is a hotly debated topic in the practical and academic world.

In my experience chart-following traders have more influence on the market than academically-trained investors are willing to admit, especially in times of market stress and euphoria. (See the Career Talk video Day Trader Success: A myth or reality?)

News

Next, data vendors present News from the most granular company and industry levels to more top-down regional and country levels.

Much of what we used to call news is absurd these days. Isn't it? The recent circus surrounding media-darling Elon Musk's attempt to shake out short sellers of Tesla stock provides a great example.

"The media's interest in linking Elon Musk's $420 Tesla takeout offer to cannabis is entertaining in the short-run. In the long-run though, Tesla has been a pricey cult stock for years. What's more interesting to me is academic research concerning when company management takes on short sellers, how these types of studies are designed, risk-adjusted abnormal returns and whether it is possible to build a factor to profit from this anomaly. Otherwise, it's just noise." - Paul Alan Davis, CFA

My point here is: does following the constant flow of news lead to better performance, or simply to the feeling and appearance of being better informed?

It is difficult to know for certain without a way to measure it, so in a moment I'll show you what, in the end, matters most to institutional investors: what I call a Scoreboard.

We can all agree that the barrier to publishing news has dropped steadily since the late 1990s, bringing quality down with it. So with this glut of information, what is an investment decision-maker to do?

Technology-trained individuals are employing computers to process text, like sentiment models based on tweets, but as we know computer algorithms can be blunt and error-prone if not employed smartly.
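To see why simple text processing can be blunt, here is a minimal Python sketch of a word-list (lexicon) sentiment scorer. The lexicon, the function name and the headlines are hypothetical and exist only for illustration. Production sentiment models use much larger domain-specific vocabularies or trained machine learning models.

```python
import re
from typing import Dict

# Hypothetical toy lexicon for illustration only.
LEXICON: Dict[str, int] = {
    "beat": 1, "strong": 1, "growth": 1, "upgrade": 1,
    "miss": -1, "weak": -1, "lawsuit": -1, "downgrade": -1,
}


def lexicon_sentiment(text: str) -> int:
    """Sum word scores; ignores negation, sarcasm and context entirely."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


print(lexicon_sentiment("Strong quarter, earnings beat estimates"))  # 2
print(lexicon_sentiment("Not a strong quarter at all"))  # 1, though the meaning is negative
```

The second headline scores as positive because the scorer never sees the word "not". Handling negation, sarcasm and context is where much of the real modeling effort goes.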

In my experience, most professional investors process news informally. They may know what the President tweeted about protectionism and trade wars, and they may need that for small talk, but is their view incorporated into the selection and allocation of investments? If so, how is that measured?

Whether because there is no systematic process to measure it, the ways of collecting it are inconsistent, or biases creep into the process, to me most news is simply noise.

We'll return to this point later.

Financial Statements

Next, vendors provide tools to process and analyze financial statements.

To highlight the importance of financials, I think Benjamin Graham's quote is appropriate here: "In the short run, the market is a voting machine, but in the long run, it is a weighing machine."

Most long-standing firms do incorporate financial statements into their process, so data providers hire thousands of individuals in developing nations to pore over statements from some 30,000 public companies. They build custom databases so institutional investors can build their own weighing machines.

Imagine the man-hours involved in scouring 300-page 10-K annual reports for not only the income statement, balance sheet and statement of cash flows, but also the supplemental information. This is a monumental task, and one the data providers get paid handsomely for.

Some of this information is available on the SEC website through EDGAR, but building the hooks necessary to make that information useful would be too costly for all but the largest firms. This goes a long way toward explaining why there are four main competitors in the space.

We'll talk about cost later, but basically it boils down to: you get what you pay for when it comes to financial statements.

Portfolio Analysis

Portfolio Analysis refers to tools used for performance attribution. This helps to answer questions like:

  • Where were portfolio bets placed?
  • What were the return and risk measures for factors?
  • What caused outperformance or underperformance?

Most investors don't have the knowledge to build these systems from scratch so instead they rely on data providers.

Risk Management

Next, financial data providers also provide access to proprietary or third-party risk models and optimization programs. Users can access models from vendors such as BARRA, Axioma and Northfield and pipe them directly into Risk Management applications.

We'll have a look at important risk measures in a moment.

2. Industry Perspectives

Before that, let's get a visual of the industry players and zero in on our scope.

I use this graphic (see video) in Career Talk to depict the two industries we're talking about and where this topic fits.

Individuals, Professionals and Institutions

With a background in Finance and Technology, I categorize content along two scales and try to help Subscribers climb learning curves faster. So content is geared toward those with the educational backgrounds, job titles, skills and designations shown in the graphic.

The orange circle zeroes in on where I feel this topic fits: mostly those with financial schooling, at the Professional level, who are trying to build Institutional-level processes.

That said, the reason for this slide is to segment the market into three groups. Costs for the big four providers are prohibitive for Individuals. As firms grow, other considerations like budget, sophistication level, contingency planning and sharing resources all come into play, and this is when they begin to evaluate services provided by the big four.

3. The Institutional Scoreboard

Okay, so now that we know it is Professionals and Institutions who subscribe, let's talk about what that Scoreboard I mentioned earlier looks like.

Financial Data Providers  (the big four)
Bloomberg, S&P Market Intelligence, FactSet, Thomson Reuters

To apply this to our context, I'll give a two-minute backgrounder using visuals borrowed from another presentation, which I'll point you to at the end.

Benchmark

First, aspiring Institutional investors assign each investment product to a benchmark for comparison. They know its constituents, metrics and how it is rebalanced and weighted.

Peer Group

Second, they evaluate performance of investment products against similar managers in what is called a peer group.

Returns

Third, they know how to filter through misleading charts that look impressive to the beginner, but under further analysis, with shorter sub-periods or rolling-periods, may look less compelling.
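To illustrate the rolling-period check, here is a minimal Python sketch that compounds periodic returns over each trailing window. It assumes simple monthly returns stored in a plain list. The function name and the sample returns are hypothetical.

```python
from typing import List


def rolling_compound_returns(returns: List[float], window: int) -> List[float]:
    """Compound simple periodic returns over each trailing window.

    returns: periodic simple returns as decimals (0.02 means +2%).
    window: number of periods in each rolling window.
    Returns one compounded return per complete window, oldest first.
    """
    if window <= 0 or window > len(returns):
        raise ValueError("window must be between 1 and len(returns)")
    results = []
    for end in range(window, len(returns) + 1):
        growth = 1.0
        for r in returns[end - window:end]:
            growth *= 1.0 + r  # chain-link each period's return
        results.append(growth - 1.0)
    return results


# Hypothetical monthly returns: a strong start followed by weaker months.
monthly = [0.10, 0.08, 0.01, -0.02, 0.00, -0.01]
full_period = rolling_compound_returns(monthly, len(monthly))[0]
print(f"Full period: {full_period:.2%}")
for start, r in enumerate(rolling_compound_returns(monthly, 3), start=1):
    print(f"Months {start}-{start + 2}: {r:.2%}")
```

With these made-up numbers, the full six-month return is about 16.4%. Yet the two most recent three-month windows are both negative, which is exactly the kind of detail a single cumulative-return chart can hide.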

Risk

Fourth, they know how to analyze risk with measures from a linear regression or rolling regression that describe how the track record evolved over time.
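Here is a minimal sketch of the regression behind these risk measures. It assumes a single benchmark factor and equal-length lists of periodic returns, ideally measured in excess of a risk-free rate. It estimates alpha and beta by ordinary least squares, then re-estimates them over each trailing window to produce a rolling regression. The function names and the sample data are hypothetical, and commercial systems typically use multi-factor risk models instead.

```python
from typing import List, Tuple


def ols_alpha_beta(port: List[float], bench: List[float]) -> Tuple[float, float]:
    """Fit port = alpha + beta * bench + error by ordinary least squares."""
    n = len(port)
    if n != len(bench) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_p = sum(port) / n
    mean_b = sum(bench) / n
    # Beta is the covariance with the benchmark divided by benchmark variance.
    cov = sum((b - mean_b) * (p - mean_p) for b, p in zip(bench, port))
    var_b = sum((b - mean_b) ** 2 for b in bench)
    if var_b == 0:
        raise ValueError("benchmark returns have zero variance")
    beta = cov / var_b
    # Alpha is the average return not explained by the benchmark exposure.
    alpha = mean_p - beta * mean_b
    return alpha, beta


def rolling_alpha_beta(port: List[float], bench: List[float],
                       window: int) -> List[Tuple[float, float]]:
    """Re-estimate (alpha, beta) over each trailing window, oldest first."""
    return [ols_alpha_beta(port[end - window:end], bench[end - window:end])
            for end in range(window, len(port) + 1)]


# Hypothetical monthly returns; the portfolio is built as 0.001 + 1.2 * benchmark.
bench = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01]
port = [0.001 + 1.2 * b for b in bench]
alpha, beta = ols_alpha_beta(port, bench)
print(f"alpha={alpha:.4f}, beta={beta:.2f}")
for a, b in rolling_alpha_beta(port, bench, window=4):
    print(f"rolling alpha={a:.4f}, beta={b:.2f}")
```

The sample portfolio is constructed as 0.001 plus 1.2 times the benchmark, so every window recovers an alpha of 0.001 and a beta of 1.2. With real data, the rolling estimates drift, and that drift shows how the track record evolved.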

I call this the Scoreboard, because it is what separates winners from losers. It can't be manipulated, and in the end, it is all that matters for those who care about risk-adjusted performance.

An Active Process

This is all about building an active process, and as covered in a previous Career Talk, Is Passive Investing an Illusion?, all investors must make an active decision at some point.

In a well-thought-out active process, all bets are measured, whether they come from sources such as financial statements, news feeds or price patterns.

From there we can break current portfolio positioning into alpha and beta. Beta is the return and risk associated with bets to factors, like the overall market, or more granularly to country, size and style.

Alpha is what is left over, and it helps determine whether the manager benefited from luck or skill. It captures the result of active bets that differ from the benchmark.
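One common way to write this decomposition is as a standard linear factor model. The notation below is an assumption, not taken from the original presentation. $R_{p,t}$ is the portfolio return in period $t$, and $R_{f,t}$ is the risk-free rate. $F_{k,t}$ is the return to factor $k$, for example the market's excess return or country, size and style factors, and $\beta_k$ is the portfolio's exposure to that factor. $\varepsilon_t$ is the residual.

```latex
R_{p,t} - R_{f,t} = \alpha + \sum_{k=1}^{K} \beta_k F_{k,t} + \varepsilon_t
```

The $\beta_k F_{k,t}$ terms make up the beta portion of the return. $\alpha$ is the average return left over once those factor exposures are accounted for.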

Also, many investors don't realize that, with a risk model, bets can be measured in the future as well as the past, and we cover this in the Quant 101 series.

Okay, with that we're ready for the answer to our question.

4. Buy or Build?

Let's walk through the buy versus build decision. If you're familiar with how Career Talk segments work, you know I give my view, based on a 25-year career, to kick off a conversation. So whether you agree or disagree, make your case in the Comments section on YouTube.

Ease and speed to production

Okay, point number one, to me, by far the easiest way to get up and going quickly is to buy.

Built-in datasets

Ease means that a variety of proprietary and third-party datasets are already built-in. So indexes, benchmarks, analyst estimates, regulatory filings and security pricing are already connected. Some come with the subscription and others can be added, like risk models, optimizers, custom industry data and sell-side analyst reports.

Installed software and mobile Apps

Most applications come as installed software, the exception being S&P CapIQ, which has a web interface so you can take it on the road. Bloomberg is famous for its terminals, and in some cases mobile Apps provide limited but useful functionality, like minute-by-minute portfolio performance and company news.

Graphical user interface

The intuitive graphical user interface means that employees need not be programmers to be productive. Yes, there is up-front training, and the services are vast, meaning most people only scratch the surface at first and learn as they go.

Training

Of course, one knock on training I hear is that many of the support staff are green, meaning consultants are hired out of college without a lot of real-world experience.

Investment decision-makers must then supplement initial training by leaning on colleagues and poring over the documentation, which is generally pretty good. Overall, the learning curve isn't as steep as with programming, but building something worthwhile does take time.

Switching costs

Remember that switching costs are high, and so are data providers' profit margins. If you think about it, they want customers to intertwine their business as much as possible with highly complex software. Their incentive is to sell additional services, not to produce risk-adjusted returns, right? This is good to know before entering long-term contracts.

Is customization a good thing?

In the end, with so much customization available, each individual builds his or her own view of the world. This can be a good thing or a bad thing depending on your perspective.

Customization may be good for a fundamental or technical manager because their process is subjective, but maybe not so for a quantitative or passive manager whose process is more systematic and benchmark-focused.

Cost of datasets and flexibility

Next, let's talk about cost and flexibility.

Cost

What the big four data providers cost is a very common question punched into Google Search.

  • How much does a Bloomberg terminal cost?
  • What is the price of a FactSet subscription?
  • How much is an annual subscription to Thomson Reuters Eikon?
  • What does S&P Capital IQ cost?

Finding the price of a financial data service isn't like buying a plane ticket; it's more like buying a car. Data providers are reluctant to publish prices. Instead, they negotiate and try to squeeze as much out of the investment firm as possible, after considering factors like how much they expect the firm to grow, its present and future data needs, the team, and other currently installed software from competitors.

Remember, there are few players in the industry and switching costs are high, so data providers know that once a firm signs up, its investment decision-makers are very reluctant to leave. In the end, data providers are trying to maximize income, and negotiation is part of the process.

That said, it is easy to spend 10 to 20 thousand USD on a basic installation. If you add constituent-level holdings from benchmarks, risk models, optimizers, custom industry datasets and sell-side research, it is easy to spend 50 thousand USD per year for just one seat.

When I consult with advisors, I make sure they have this information so they don't overpay.

Flexibility

I mention flexibility here because when building out a system you have to take the good with the bad. Some vendors are more restrictive than others.

High costs make it prohibitive to subscribe to multiple services, so you're often locked in to one data provider, which limits flexibility.

For example, you could easily build an optimizer in Python or R, but hooking it up through the data provider can be very expensive, or not possible without a lot of workarounds.
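To show how small a basic optimizer can be, here is a minimal Python sketch of the textbook closed-form minimum-variance portfolio for two assets. The volatilities and correlation are made-up inputs, and the function names are hypothetical. A real optimizer would handle many assets, constraints, transaction costs and a full risk model, and feeding it that data is where the data-provider hookup becomes the hard part.

```python
import math
from typing import Tuple


def min_variance_two_assets(vol_a: float, vol_b: float, corr: float) -> Tuple[float, float]:
    """Weights of the fully invested two-asset minimum-variance portfolio.

    Uses w_a = (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab) and w_b = 1 - w_a.
    This unconstrained form can return a negative (short) weight.
    """
    var_a, var_b = vol_a ** 2, vol_b ** 2
    cov_ab = corr * vol_a * vol_b
    denom = var_a + var_b - 2 * cov_ab  # variance of the return difference A - B
    if denom == 0:
        raise ValueError("the two assets are indistinguishable in risk terms")
    w_a = (var_b - cov_ab) / denom
    return w_a, 1.0 - w_a


def portfolio_vol(w_a: float, vol_a: float, vol_b: float, corr: float) -> float:
    """Volatility of a two-asset portfolio with weight w_a in asset A."""
    w_b = 1.0 - w_a
    var = ((w_a * vol_a) ** 2 + (w_b * vol_b) ** 2
           + 2 * w_a * w_b * corr * vol_a * vol_b)
    return math.sqrt(var)


# Hypothetical inputs: 20% and 10% annual volatility, uncorrelated.
w_a, w_b = min_variance_two_assets(0.20, 0.10, 0.0)
print(f"weights: A={w_a:.0%}, B={w_b:.0%}")
print(f"portfolio volatility: {portfolio_vol(w_a, 0.20, 0.10, 0.0):.2%}")
```

With zero correlation and volatilities of 20% and 10%, the formula puts 20% in the first asset and 80% in the second. The resulting portfolio volatility is about 8.9%, lower than either asset alone.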

Human-validated financials and datasets

Now third, let's follow up on the topic from earlier on financials and datasets, plus industry classifications.

Financial statements

I will say there is a lot of value in the fact that a human has gone through financials and categorized them properly. Some firms even offer a reward for identifying incorrect data, which improves quality.

One observation about standardized financial statements is that there really isn't a standard, across service providers anyway. Bloomberg maps items from the balance sheet, income statement and statement of cash flows one way, and FactSet does it a different way.

One database has over 1,500 different classification items for the financial statements. That is great for the fundamental firm that requires granular, subjective data, but it can also require the researcher to spend weeks understanding and reconciling the schema.

Companies in different industries and sectors also structure their financial statements differently; the Banking industry is one example.

While I haven't done so myself, some firms scrape financials from the SEC website in the US. However, I've heard legitimate concerns about accuracy and development cost overruns.

Industry classifications

Also, an industry classification scheme, the most common being GICS (Global Industry Classification Standard), is an overlooked component that is vital when comparing companies across and within industries. Valuation ratios for Utilities and Tech companies vary, and without adjustments a portfolio might be filled with Utilities because they're typically cheaper. This is the reason for industry classifications.

According to a Wikipedia article, GICS has four levels. Starting from the top, there are 11 sectors, then 24 industry groups, 68 industries and 157 sub-industries. S&P categorizes all public companies using this taxonomy and sells it as a service. Some firms assign companies to multiple industries, which helps with conglomerates like General Electric. Competing schemes include SIC codes and NAICS, plus those provided by the data providers themselves and by index providers.
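Here is a minimal Python sketch of the kind of industry adjustment described above. It assumes one simple approach: standardize a valuation ratio within each industry, so Utilities are compared with Utilities and Tech with Tech. It uses earnings yield (E/P, where higher means cheaper), and the tickers, values and industry labels are hypothetical. Vendors and risk models use more elaborate methods.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List


def industry_zscores(values: Dict[str, float],
                     industry: Dict[str, str]) -> Dict[str, float]:
    """Z-score each stock's value against peers in the same industry."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for ticker, value in values.items():
        groups[industry[ticker]].append(value)
    # Mean and population standard deviation per industry.
    stats = {name: (mean(vals), pstdev(vals)) for name, vals in groups.items()}
    scores = {}
    for ticker, value in values.items():
        mu, sd = stats[industry[ticker]]
        # A one-stock or identical-value industry gets a neutral score.
        scores[ticker] = 0.0 if sd == 0 else (value - mu) / sd
    return scores


# Hypothetical earnings yields (E/P): Utilities are uniformly "cheaper".
earnings_yield = {"UTIL1": 0.07, "UTIL2": 0.06, "UTIL3": 0.05,
                  "TECH1": 0.04, "TECH2": 0.03, "TECH3": 0.02}
industry = {t: "Utilities" if t.startswith("UTIL") else "Technology"
            for t in earnings_yield}

for ticker, z in sorted(industry_zscores(earnings_yield, industry).items(),
                        key=lambda kv: -kv[1]):
    print(f"{ticker}: raw E/P={earnings_yield[ticker]:.2f}, industry z={z:+.2f}")
```

On raw earnings yield, all three hypothetical Utilities look cheaper than every Tech stock. After the within-industry adjustment, the cheapest Tech name (TECH1) scores the same as the cheapest Utility (UTIL1).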

To the quantitative manager, whose focus is on systematically evaluating hundreds or thousands of stocks, financial statement granularity and industry groupings may not be as important. That said, eventually industry-level exposures will need to be taken into consideration, as with other systematic bets, like size and style.

So to summarize, in my view buying human-validated data from the big four is the way to go, especially if financial statements are important.

Build repeatable and shared systems

Point number 4 is about building a repeatable and shared system.

This is a difficult one. While I've worked with firms that build extremely complex active investment processes with a data provider, I believe that, by their make-up, scientifically and technically trained individuals have a better chance of building a shared and repeatable system.

Repeatable systems

Through their training to earn PhDs in Finance and Econometrics, Masters in Financial Engineering or degrees from other STEM programs, these individuals are more inclined to build a process using the statistical programming language they learned in college. These may include MATLAB, C++, Java or, if you're following the latest trends, Python or the R programming language, with data stored in SQL databases.

In academia and at large institutional firms, researchers are cordoned off so they can focus on building processes instead of having to worry about the day-to-day profitability of the firm, gathering clients and chasing AUM (assets under management).

They understand the metrics: alphas, betas, ICs, IRs, optimization tools, multi-variable regressions and factor models.
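For readers new to two of these metrics, here is a minimal Python sketch that uses common textbook definitions, which are assumptions here rather than anything stated in the original. The information coefficient (IC) is taken as the rank correlation between forecast scores and subsequent returns. The information ratio (IR) is taken as mean active return divided by tracking error, annualized by the square root of 12 for monthly data. Function names and sample numbers are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List


def average_ranks(xs: List[float]) -> List[float]:
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def rank_ic(scores: List[float], next_returns: List[float]) -> float:
    """Spearman rank IC between forecast scores and next-period returns."""
    return pearson(average_ranks(scores), average_ranks(next_returns))


def information_ratio(port: List[float], bench: List[float],
                      periods_per_year: int = 12) -> float:
    """Annualized mean active return divided by annualized tracking error."""
    active = [p - b for p, b in zip(port, bench)]
    tracking_error = stdev(active)  # sample standard deviation
    if tracking_error == 0:
        raise ValueError("tracking error is zero")
    return mean(active) / tracking_error * math.sqrt(periods_per_year)


# Hypothetical data: five stocks' model scores and their next-month returns.
scores = [0.9, 0.4, -0.2, 0.1, -0.8]
next_returns = [0.030, 0.012, -0.005, 0.015, -0.020]
print(f"rank IC: {rank_ic(scores, next_returns):.2f}")

# Hypothetical monthly portfolio and benchmark returns.
port = [0.02, 0.01, 0.03, 0.00]
bench = [0.01, 0.01, 0.01, 0.01]
print(f"information ratio: {information_ratio(port, bench):.2f}")
```

The rank IC of 0.9 means the hypothetical scores ordered next month's returns almost perfectly. In practice, ICs are usually much closer to zero, which is why they are measured carefully over many periods.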

I don't have a STEM background, so I'm not biased in the builders' favor. I come from the other direction: financially trained, with an MBA, and as a CFA Charterholder (Chartered Financial Analyst), I learned the subjective investment process first.

The way I see it now however, especially for my audience at FactorPad, I have to vote for the builders on this point.

Shared systems

What about building shared systems? I think it goes without saying that firms now are more inclined to build systems rather than rely on the subjective view of a star manager, like a Peter Lynch or Bill Gross from the past.

Here again, the technology-trained professional with a background in software testing, building repositories and monitoring the difference between versions has a leg up in my view. So those who have experience sharing their code on GitHub and collaborating on projects are best positioned in the future.

That said, we've all seen academics blow up investment products due to an overreliance on statistics, and that is where the two types of minds, financially trained and scientifically trained, can make a compelling combination.

Career prospects and skill portability

And finally, let's cover career prospects and skill portability.

Career prospects

New techniques, typically in the realm of data science, like Machine Learning and AI, are fairly new in the investment space, so job postings are likely skewed as a result. From what I've seen in recent postings, more firms are reaching outside for people with these skills rather than promoting from within.

These positions are in the highest demand, with high pay and high levels of career satisfaction, according to surveys.

Saying you can build financial models of stocks in Excel with links to Bloomberg for example, just isn't as compelling as it used to be. Saying you can build a Scoreboard is much more convincing.

Skill portability

Next is the point about skill portability, and again, this comes from observing job postings.

Let's say, for example you spent thousands of hours, over several years to learn code-like syntax in FactSet's Universal Screening application to select stocks. Then a job comes up with a firm that uses Bloomberg? Other than the concepts that are portable, you'd have to throw away all of that domain knowledge and learn how to do the same thing in Bloomberg, right? This isn't ideal.

A more forward-looking approach is to use open-source tools like Python or R to process proprietary or third-party datasets so you can keep building skills throughout your career. To me this is an aspect overlooked by firms when deciding whether to buy or build.

So with that, in my view, with career advancement and my typical audience in mind, I vote for build. And that tips the scale to build over buy in a close call.

Of course there is no one answer for every firm and that's where I'd like you to chime in and let others know your opinion in the YouTube comments section.

Sources of free datasets in finance

Before we go, here are those free datasets I mentioned.

I find the last three interesting. Quandl offers third-party datasets for free and on a subscription basis. SimFin is focused on the automated collection and distribution of financial statements and improving their accuracy through crowdsourcing.

Finally, Quantopian offers a training ground for Quants who are already comfortable with data analysis in Python. I hope to have time for a Quantopian tutorial series, so let me know if you'd like to see that.

5. Where Can You Learn More?

Okay, so where can you learn more about the scientific approach and that Scoreboard from earlier?

Currently, about 1,000 times a day, visitors review some 330 web pages at factorpad.com and 250 videos on YouTube. Here are current measures and demographics (see video).

What underpins all of this is my view that more investors should, and eventually will, care more about risk-adjusted performance. The trends all point in that direction.

So my learning content on the Tech side includes web development and data analysis tools like Python and the Linux command line.

On the Finance side, the slides from earlier were from a tutorial titled The 10 Steps to Writing a Pitch Book for Institutional Investors.

If this scientific approach piques your interest, likely the best resource to try out is the freely-accessible course Quant 101, a series of 30 tutorials, with a run-time of 10 hours. I use it to teach a college course on financial modeling.

The point with Career Talk is to bring up thought-provoking topics so you can share your viewpoint. I'd love to hear what you think in the Comments section on YouTube.

Of course reach out to me if you'd like me to help your firm.

There you have it. If you have questions or feedback please leave a comment and subscribe for more of the scientific approach, in Career Talk.

Have a nice day.

