Section Introduction for Supervised Learning Algorithms

Supervised learning essentially means that the algorithm is supervised: you, as the developer, are giving guidance to that algorithm, and we're going to go through in detail exactly how that works. But before we do that, let's take a look at artificial intelligence as a little bit of a tree. We've already talked about AI. This is the broad set of technologies that machine learning is nested inside, and AI is connected to data science. And then we also have machine learning right here.

Now, inside of machine learning there is a whole host of algorithms and technologies. In the next few sections of the course, we're going to talk about a few of those offshoots: there are supervised algorithms, and then there are unsupervised algorithms.

There is going to be a pretty clear distinction on when you're going to want to use one versus the other. That's part of the reason why I constructed this course the way I did: I want to lead with an example-first, case-study-first approach. That way, when you see some type of situation, or you're asked to build out some type of program that implements machine learning, you're going to have the mental framework for understanding where the algorithm fits into this entire ecosystem.

Now, extending our tree right here, inside of supervised algorithms we have a number of types. So we can branch this off further: here we have classification algorithms, and we also have what are called regression algorithms.

Hopefully you can read my writing. If not, you can check the show notes to have all of the correct names, but these are two of the major categories inside of supervised learning algorithms. So what exactly is the difference between these two? In actuality, there is a little bit of overlap between the two, and there's even quite a bit of debate inside the machine learning community on whether some algorithms fit into bucket A or bucket B. When we get into the examples, hopefully that part is going to become a little bit more clear.

I personally like to take a very base-case approach whenever I'm trying to understand a complex topic. So what I've done is create two questions to ask, and that is what helps me choose one set of algorithms over the other. If I ask the question "What is it?", that tells me I should be using a classification algorithm. However, if the question is "What should it be?", then I have a regression problem on my hands.

I know that part is probably not clear yet, but I wanted to give you two questions to ask yourself, and that's going to tell you if it's a classification issue or a regression issue. Let's take an even deeper dive and analyze a few examples. We're going to get into an entire slew of case studies, and I think that's going to give much more clarification. But before we do that, let's just look at a few base-case examples.

So for classification, let's ask "What is it?" A great example for this, and one of the most popular, is spam. Say that you're building out an e-mail management system like Gmail or Outlook. You want to know whether an incoming e-mail is spam or not. If it is, we want to throw it in the spam folder; if not, we want to place it in the inbox. We need an algorithm that can quickly scan through the message, pick up keywords, and then categorize it accordingly.
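To make the "What is it?" question a bit more concrete, here is a minimal sketch in Python. It is not how Gmail or Outlook actually work: the training messages, the labels, and the helper names `train` and `classify` are all made up for illustration, and the scoring is a simple word count rather than a real spam-filtering algorithm. The point is only that we supervise the algorithm by handing it labeled examples, and it answers with a category.

```python
from collections import Counter

# Made-up labeled examples; the labels are the "supervision" we provide.
TRAINING = [
    ("claim your free prize now", "spam"),
    ("urgent winner free money", "spam"),
    ("meeting moved to tomorrow", "inbox"),
    ("lunch tomorrow with the team", "inbox"),
]


def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "inbox": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def classify(counts, message):
    """Answer "What is it?" with a category: "spam" or "inbox"."""
    words = message.lower().split()
    # Score each label by how often it has seen the message's words.
    scores = {label: sum(seen[w] for w in words) for label, seen in counts.items()}
    # Ties (including messages with no known words) go to the inbox.
    return "spam" if scores["spam"] > scores["inbox"] else "inbox"


model = train(TRAINING)
print(classify(model, "free prize inside"))      # prints: spam
print(classify(model, "team meeting tomorrow"))  # prints: inbox
```

Notice that the output is always one of a fixed set of labels, which is what makes this a classification problem.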

Then, moving on to the second question: "What should it be?" One of the most popular examples of this is if you're building out some type of real estate listing system, such as Trulia or Zillow. You have all of these real estate listings, and you want the ability to predict how much a home should be listed for. So you can take in all these criteria, like square footage and neighborhood, all of those kinds of variables. Then, based on the historical trends, looking at the sale comps and all the other listings, you build out your system so that when a new home goes on the market, it can make a prediction as accurately as possible from all of the variables and all the historical information.
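Here is a minimal sketch of the "What should it be?" question in Python. The square footage and sale price numbers are invented, the helper names `fit_line` and `predict` are hypothetical, and a real listing system would use many more variables than square footage. I'm assuming the simplest possible regression technique, a straight line fitted with ordinary least squares, just to show that the answer comes back as a number rather than a category.

```python
# Invented historical sales: square footage and the price each home sold for.
SQUARE_FEET = [1000, 1500, 2000, 2500]
SALE_PRICES = [200_000, 300_000, 400_000, 500_000]


def fit_line(xs, ys):
    """Fit price = slope * square_feet + intercept with ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, square_feet):
    """Answer "What should it be?" with a number: the suggested listing price."""
    return slope * square_feet + intercept


slope, intercept = fit_line(SQUARE_FEET, SALE_PRICES)
print(predict(slope, intercept, 1800))
```

With this made-up data the fitted line works out to 200 per square foot, so an 1,800 square foot home comes back at about 360,000.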

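Notice how that answers "What should it be?" versus "What is it?" Now, those may sound like they are not that different, and they really aren't. There are some subtle differences that separate a classification algorithm from a regression one. The main one is the kind of answer you get back: a classification algorithm returns a category, such as spam or not spam, while a regression algorithm returns a number, such as a listing price. I really think the best way of understanding that is by going through the case studies.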

Now that you have a high-level view, from artificial intelligence down to machine learning, to supervised algorithms, and then all the way down to classification and regression algorithms, let's get into those examples.