FactorPad
Build a Better Process

Which Operating System is Most Common in Data Science? Linux, Mac or Windows?

Here we look at the intersection of survey figures from a variety of sources.
  1. Our Stack - Introduce our choice of software.
  2. Terminal - Discuss PuTTY and the SSH Protocol.
  3. Clients & Servers - Review systems at a high level.
  4. Operating Systems - Analyze market share statistics.
  5. What's next? - See our next task.
by Paul Alan Davis, CFA
Updated: February 21, 2021
Now let's see what is most common among Data Scientists.


The Operating Systems Used for Data Science

Beginner

Video Tutorial

Videos can also be accessed from our Full Stack Playlist 1 on YouTube.

Which operating system is most common in Data Science? (5:00)

Code Examples and Video Script

Welcome. Today's question: Which operating system is most common in Data Science? Linux, Mac or Windows?

I'm Paul, and if you're like me, then you're frustrated by how long it takes until you actually analyze data, so I'm sharing a few things I learned about operating systems to quicken your pace.

I'm using a different approach here, a Full Stack approach, for those interested in working up to the Advanced level of Data Science.

So I'll start with a picture of the stack and describe how it simplifies things.

Then we'll move to the Linux command line and use a text editor that sits on the server instead of the client, which leads us to talk about just that: clients and servers.

Next, I'll point you to, and summarize, a third-party source for client and server operating system market share as we zero in on an answer.

Then we'll return to see what's up next in our Project on Server and OS Setup.

Step 1 - Our Stack

Okay, so as you can see, our stack has 4 layers and at the bottom, or closest to the metal, is the Operating System. Our distribution of Linux is called Debian, and on top of that, we'll add databases, Python with the appropriate statistical packages and finally the presentation layer.

"Client" here refers more to user and isn't the same as clients and servers, which I'll touch on in a minute.

The reason for a Full Stack approach is to hold operating system variables constant, so the process is simple, sharable and scalable.

Step 2 - The Terminal

Here is my login to the server through a program called PuTTY, which allows the client on my desk, running Windows, to access the server securely using a protocol called SSH, short for Secure Shell.

login as: paul
paul@192.168.0.15's password:

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Fri Jan 20 22:10:35 2017 from 192.168.0.5
paul@fullstack:~$ _
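If your client isn't Windows, you may not need PuTTY at all. Here's a minimal sketch that builds the equivalent login command, assuming an OpenSSH client (the `ssh` program) is installed on the client, which is my assumption and not something covered in the video. The helper name `ssh_command` is hypothetical; the user and address are the ones from the session above.

```python
import shlex


def ssh_command(user, host, port=22):
    """Return the argument list for an OpenSSH login (hypothetical helper)."""
    # "-p" selects the TCP port; 22 is the standard SSH port.
    return ["ssh", "-p", str(port), f"{user}@{host}"]


# User name and server address as shown in the PuTTY session.
command = ssh_command("paul", "192.168.0.15")

# Print the command in a form you could paste into a terminal.
print(shlex.join(command))
```

Keeping the command as a list, rather than one long string, means it could later be handed to `subprocess.run` without worrying about shell quoting.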

Step 3 - Clients & Servers

I input my login name and password on the server located in my office, at this IP address (192.168.0.15). Okay, let's look at a few points here. First is the operating system name, Debian GNU/Linux. Next is my last login from another address (192.168.0.5), which is my Windows client. Finally, the name of the server, or hostname, is fullstack.
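The points I just read off the screen can also be pulled out with a few lines of Python. This is only a sketch, assuming the banner keeps the layout shown in the login above; the function name `parse_login` and the two patterns are mine, for illustration.

```python
import re
from datetime import datetime

# The last two lines of the login banner shown earlier.
BANNER_TAIL = """Last login: Fri Jan 20 22:10:35 2017 from 192.168.0.5
paul@fullstack:~$ """

# "Last login: <timestamp> from <client address>"
LAST_LOGIN = re.compile(r"Last login: (?P<when>.+?) from (?P<client>\S+)")
# Shell prompt of the form "user@hostname:"
PROMPT = re.compile(r"^(?P<user>[^@\s]+)@(?P<host>[^:\s]+):", re.MULTILINE)


def parse_login(text):
    """Extract the last-login details and the hostname from banner text."""
    last = LAST_LOGIN.search(text)
    prompt = PROMPT.search(text)
    return {
        "last_login": datetime.strptime(last.group("when"), "%a %b %d %H:%M:%S %Y"),
        "client": last.group("client"),
        "user": prompt.group("user"),
        "hostname": prompt.group("host"),
    }


info = parse_login(BANNER_TAIL)
print(info["hostname"], "was last reached from", info["client"])
```

The client address comes from the "Last login" line and the hostname from the prompt, which matches how I read them in the terminal.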

So all of this runs on my client, and the PuTTY terminal is a window to the server. Right?

Step 4 - Operating Systems

Let's shift gears and talk about operating systems. In the YouTube Description is a link to a Wikipedia page that I'll be using for this part of the discussion. The figures are as of this date (January 20, 2017), and I'll try to supplement them later.

paul@fullstack:~$ cal
    January 2017
Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31
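Since Python sits higher up in our stack, here's a rough equivalent of that `cal` call using the standard library's `calendar` module. It's a sketch, assuming you want the same Sunday-first layout the server printed.

```python
import calendar
from datetime import date

# Start weeks on Sunday to match the cal output from the server.
sunday_first = calendar.TextCalendar(firstweekday=calendar.SUNDAY)

# Render January 2017 as plain text, one week per line.
january_2017 = sunday_first.formatmonth(2017, 1)
print(january_2017)

# The date of this recording; weekday() counts from Monday = 0.
recording_day = date(2017, 1, 20)
print("Weekday index:", recording_day.weekday())
```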

Now in a directory called notes, let's open a file I started in a text-editor called nano, and nano sits on the server.

paul@fullstack:~$ cd notes
paul@fullstack:~/notes$ nano video0004.txt

Now, I caution you, there is a lot of information here from a variety of sources like Gartner, StatCounter and W3Techs, plus self-reported information from Google on Android, Apple on iOS and macOS, and then also Microsoft on Windows. So take it with a grain of salt. I'll note some potential biases and collection methods as we go along.

  GNU nano 2.2.6              File: video0004.txt

Operating Systems
  - mobile (phones and tablets)
  - desktops (desktops and laptops)
  - servers (servers, supercomputers, mainframes)

1) Operating Systems - Gartner 2015
  - Android  54%
  - Apple    12%
  - Windows  12%
  - Other    22%
  * based on device shipments

2) Desktops - StatCounter 2016
  - Android   1%
  - Apple    11% <
  - Windows  86% <
  - Linux     2%
  * based on browser statistics

3) Servers - W3Techs 2015
  - Windows  32%
  - Linux    68% <
  * web, mail and DNS servers

4) Web Developers - Stack Exchange 2016
  - Apple    26% <
  - Windows  52% <
  - Linux    22% <
  * Web Developer Survey in English

^G Get Help  ^O WriteOut  ^R Read File  ^Y Prev Page  ^K Cut Text    ^C Cur Pos
^X Exit      ^J Justify   ^W Where Is   ^V Next Page  ^U UnCut Text  ^T To Spell

What I'm after is market share for Data Science, and that's nearly impossible to gather, so let's estimate it with a few tables.
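If you'd rather work with these tables in Python than in a notes file, here's a minimal sketch. The numbers are copied from the four tables in video0004.txt; the dictionary layout and the helper name `leader` are my own choices for illustration.

```python
# Market share in percent, copied from the four tables in the notes file.
TABLES = {
    "1) Operating Systems - Gartner 2015": {
        "Android": 54, "Apple": 12, "Windows": 12, "Other": 22,
    },
    "2) Desktops - StatCounter 2016": {
        "Android": 1, "Apple": 11, "Windows": 86, "Linux": 2,
    },
    "3) Servers - W3Techs 2015": {
        "Windows": 32, "Linux": 68,
    },
    "4) Web Developers - Stack Exchange 2016": {
        "Apple": 26, "Windows": 52, "Linux": 22,
    },
}


def leader(shares):
    """Return the operating system with the largest share in one table."""
    return max(shares, key=shares.get)


# Sanity check: every table should account for 100% of its market.
totals = {title: sum(shares.values()) for title, shares in TABLES.items()}

for title, shares in TABLES.items():
    top = leader(shares)
    print(f"{title}: {top} leads with {shares[top]}% (total {totals[title]}%)")
```

Checking that each table sums to 100 is a cheap way to catch a typo before drawing any conclusions from the numbers.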

As for operating systems, let's look at the three groups: mobile, desktops and servers. Assuming Data Science professionals like keyboards and don't code on a mobile device, we'll focus on the last two, desktops and servers.

Overall, operating systems are dominated by Android, but that's because of the phone market, so let's keep digging.

Next, let's focus on Desktops, the client of choice for our data scientists. The numbers here support the position that when people buy a desktop computer, most retain the operating system it came with.

Now, server data is difficult to measure. One method is to look at public-facing servers, which shows a two-party market split between Linux and Windows.

And finally, web developers, who likely overlap with data scientists, are shown in table 4 (Web Developers). Stats on FactorPad's YouTube Channel rhyme with tables 2 and 4.
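To see how far developers drift from the general desktop population, you can subtract table 2 from table 4. This is a sketch using only the figures from the notes file; the helper name `point_gap` is hypothetical, and treating a system missing from a table as 0% is my assumption (Android does not appear in the developer survey table).

```python
# Shares in percent, from table 2 (StatCounter 2016) and table 4 (Stack Exchange 2016).
DESKTOPS = {"Android": 1, "Apple": 11, "Windows": 86, "Linux": 2}
WEB_DEVELOPERS = {"Apple": 26, "Windows": 52, "Linux": 22}


def point_gap(group, baseline):
    """Percentage-point difference of group versus baseline, per system."""
    systems = sorted(set(group) | set(baseline))
    # Assumption: a system that is absent from a table counts as 0%.
    return {name: group.get(name, 0) - baseline.get(name, 0) for name in systems}


gap = point_gap(WEB_DEVELOPERS, DESKTOPS)
for name, points in gap.items():
    print(f"{name:8s} {points:+d} points")
```

Keep in mind the two sources measure different things, browser statistics versus a self-selected survey, so the gap is a rough indication and not a precise measurement.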

So overall, what is the takeaway? Well, we're in pretty good company here. Throughout, I will be developing on Apple and Windows desktop clients, and a Linux server.
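If you want to know which of these three camps a given machine falls into, Python can tell you. Here's a small sketch using the standard library's `platform` module; the mapping to the labels used in my tables, and the name `table_label`, are assumptions of mine for illustration.

```python
import platform

# platform.system() reports "Linux", "Darwin" (macOS) or "Windows" on the
# three systems discussed here; map those names to the labels in the tables.
LABELS = {"Linux": "Linux", "Darwin": "Apple", "Windows": "Windows"}


def table_label(system_name):
    """Map a platform.system() value to a label from the tables."""
    return LABELS.get(system_name, "Other")


# Run this on the client and on the server to compare the two.
print("This machine counts as:", table_label(platform.system()))
```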

Step 5 - What's Next?

Join us for the next question: "How do you prepare for a Debian Linux installation?" in video (tutorial) 5.

Have a nice day.


What's Next?

Our YouTube Channel is growing, and we would love for you to be a part of it. Follow us on Twitter at @factorpad and join our email list for reminders.
