What is Data Science?
So now that you're familiar with what machine learning is and we've walked through our first high-level case study, let's talk about what data science is.

Now, machine learning and data science are similar, and they have a number of overlapping characteristics. However, they are separate fields of study, which is why we refer to them as machine learning and data science. So in this guide I want to dissect exactly what a data scientist does, and we're going to start off by walking through another Venn diagram.

So I'm going to draw three circles, and we're going to see that data science sits at their intersection.

So right here we're going to have three different fields of study. The first one is going to be mathematics. A data scientist is typically very good with math, specifically with statistical analysis.

A data scientist is going to be able to understand the mathematical symbols and work with concepts such as standard deviation. Whereas a machine learning developer might simply have a high-level understanding of what an algorithm does, a data scientist is going to understand the math behind those algorithms. That's a very important characteristic of a data scientist.
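To make the standard deviation example concrete, here is a minimal Python sketch that computes it by hand and compares the result with the standard library. The `daily_sales` figures and the `population_std_dev` helper are hypothetical, invented only for illustration.

```python
import math
import statistics

# Hypothetical daily sales figures, invented purely for illustration.
daily_sales = [120, 135, 128, 150, 142, 138, 160]


def population_std_dev(values):
    """Return the population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    # Average the squared distances from the mean, then take the square root.
    squared_diffs = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_diffs) / len(values))


manual = population_std_dev(daily_sales)
library = statistics.pstdev(daily_sales)  # standard library equivalent

print(f"manual:  {manual:.3f}")
print(f"library: {library:.3f}")
```

Calling the library function is the "high-level understanding"; being able to write out the helper by hand is closer to the math-level understanding described above.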

Now the next category that a data scientist has to have is domain expertise.

So I'm just going to write "domain" right here. What domain expertise means is that a data scientist has to understand whatever domain or industry they're working in. We'll walk through a little bit of a case study after we've talked about these three categories.

And the third and final category is development, or coding. A data scientist is going to have to be able to perform at least some basic level of programming, because to implement pretty much any type of algorithm out there, you're going to have to understand a number of key programming techniques.

You're going to have to be fluent in some type of programming language, such as Python or R. You're also going to have to be able to build out preprocessing scripts, because as we go further into more advanced topics, you'll see that one of the most important things a data scientist works on is cleaning up data. Many times you're going to be dealing with raw data that you can't really trust.

And so you're going to have to be able to build out scripts that go through the raw data, scrape the data that you want, and ignore the data that you don't need, so that you can make sure your algorithms are working with the right information.
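As a rough sketch of what such a preprocessing script might look like, the Python below keeps only rows with a usable sales amount and normalizes the store name. The records, field names, and the `clean_rows` helper are all hypothetical. Which rows count as untrustworthy is an assumption here; in practice that is a domain decision.

```python
# Hypothetical raw sales records; the fields and values are invented.
raw_rows = [
    {"date": "2023-01-05", "store": "Downtown", "amount": "42.50"},
    {"date": "2023-01-05", "store": "Downtown", "amount": ""},       # missing amount
    {"date": "2023-01-06", "store": " uptown ", "amount": "18.00"},  # messy store name
    {"date": "2023-01-06", "store": "Uptown", "amount": "n/a"},      # not a number
    {"date": "2023-01-07", "store": "Downtown", "amount": "-5.00"},  # negative sale
]


def clean_rows(rows):
    """Keep rows with a positive numeric amount and tidy the store name."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue  # skip rows whose amount is missing or unparseable
        if amount <= 0:
            continue  # assumption: non-positive amounts are bad data
        cleaned.append(
            {
                "date": row["date"],
                "store": row["store"].strip().title(),
                "amount": amount,
            }
        )
    return cleaned


cleaned_rows = clean_rows(raw_rows)
for row in cleaned_rows:
    print(row)
```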

So now that we know these three categories, we can see that a data scientist has to fall in the middle of them. As a data scientist, you want to be right in the center, where you have the mathematical knowledge, the domain expertise, and the coding acumen.

Now, as an instructor, I can help give you the math and the coding skills, but the domain expertise is going to be up to you. That is going to be whatever industry you decide to work in. It's not enough to simply understand the math and the coding. Without knowledge of the domain, you're not going to know what data you're working with.

You're not going to understand what data is important versus what data should be ignored. So with this understanding in place, let's talk about a case study on how a data scientist works. We're going to take a friend of mine named James, who is working to become a data scientist and a machine learning developer.

He has worked for years for a coffee roasting company. What he wants to do is take this entire package here and help the roasting company improve its sales. He wants to look at elements such as historical sales trends and then use that data to understand when marketing campaigns should be run, where to place new stores, and all kinds of other decisions that machine learning and data science can help with.

So how would James accomplish this? Well, first and foremost, he's going to have to understand the mathematical concepts underlying all of the algorithms. He's going to have to understand concepts such as building out predictions and working with complex probabilities.
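As one small illustration of "building out predictions", the sketch below fits a straight line to past monthly sales with ordinary least squares and extends it one month forward. This is a swapped-in example, not a method taken from the guide. The sales numbers and the `fit_line` and `predict` helpers are hypothetical.

```python
# Hypothetical monthly sales totals for months 1 through 6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 108, 121, 129, 142, 150]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


slope, intercept = fit_line(months, sales)
print(f"Predicted sales for month 7: {predict(7, slope, intercept):.1f}")
```

A straight line is a deliberately simple model. Real sales data usually has seasonality and noise that a line like this will not capture.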

He is also going to have to be able to program in a language such as Python in order to implement all of those mathematical algorithms. He's going to have to take the data, clean it up, and apply those algorithms to it in order to build the recommendation.

Both of these skills are important. But James is also going to bring the third circle to the table, and this is going to make James an even better data scientist for the coffee roasting company than I would be, because I don't understand the coffee roasting business.

I personally don't have that domain expertise, but he does. So if you can combine all three of these categories, having knowledge of the industry, understanding the math needed to work with algorithms, and then actually implementing those algorithms, that is the full package for a data scientist.