Overview of Decision Trees
In this lesson, we're going to walk through the decision tree algorithm. This is one of my favorite machine learning algorithms to work with, and there are a number of reasons why.

First, it's one of the more logical approaches to machine learning that I've ever discovered, because it takes a condition-based approach to building a prediction.


So if you're familiar with programming and with if/else-style logic, and you like that type of logic and think that way, then you're going to like decision trees, because that's exactly how they operate: off of a tree of conditions.

It's also helpful because, out of all the algorithms out there, this is one of the only machine learning techniques that actually shows you the logic behind its prediction, and it does so in a visual manner, as you'll see shortly. It fits into both the regression category and the classification category.


So decision trees can be utilized in a number of scenarios, and they're one of the algorithms I'll usually pick whenever I'm presented with a machine learning problem.

So let's take a look at the definition of a decision tree. Scikit-Learn says that decision trees are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.


So what that essentially means is that it takes in a number of features, builds conditions (those if/else-type conditions), and then builds its model based on them. Now, this definition of decision trees is relatively straightforward, and the only element I really want to pick out is one we've discussed before: non-parametric. The reason is that this is a keyword that will trip up a number of students if they've never heard it before, so we'll discuss what it means again.
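To make that concrete, here is a minimal sketch of fitting a decision tree with scikit-learn. The tiny dataset (ages and incomes mapped to a yes/no label) is made up purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Each row is a set of features; each label is the target we want to predict.
# These values are hypothetical, just to show the shape of the data.
X = [[25, 40000], [47, 85000], [35, 60000], [52, 120000]]  # [age, income]
y = ["no", "yes", "no", "yes"]                             # purchased?

model = DecisionTreeClassifier()
model.fit(X, y)

# The fitted model is simply a tree of if/else conditions over the features.
print(model.predict([[40, 70000]]))
```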

Remember that non-parametric means it's not required to follow normal distribution rules.


Many algorithms are based on heavy mathematical concepts and formulas derived from statistics, where the entire focus is on finding things like the standard deviation between points or vectors.

Well, a decision tree is not like that: it doesn't rely on normal distribution rules; instead, it works with conditions. Earlier in this guide, I mentioned that I'll pick the decision tree in a number of scenarios, and it's one of the first ones I'll think of using in many different use cases.


The reason is that it has a number of potential applications. Mortgage systems are a very popular application for decision trees, because mortgage systems are required by law, at least in the US, to give a rationale for why someone did not get approved for a loan.

In other words, if you go and try to get a mortgage and you have poor credit or any issues like that, the government requires that the bank give a reason why you can't get a loan. A decision tree gives the full set of reasoning logic. So whenever you run a new mortgage applicant through the decision tree model, it will tell you exactly why that individual was not able to get the mortgage. That is incredibly helpful, and in some cases, like mortgage systems, it's actually required by law.

Decision trees are also helpful for recommendation engines. Whether you're building an Amazon-style e-commerce application where you want to know if a user is likely to buy certain products based on past purchases, or something similar, they can work well for that. I've implemented a decision tree in a content recommendation system for devcamp. So there are all kinds of different recommendation engines that a decision tree can fit into. Decision trees are also popular for image recognition.

Now, whenever you're piping data into a decision tree for image recognition, say you want it to recognize faces inside of a photo, it's going to be a much different type of implementation, because you're going to need some type of preprocessor that grabs all the pixels of that image so they can be stored in its knowledge base. So that's different from, say, the content recommendation engine or a mortgage system, but it's still possible; even though they are completely different use cases, a decision tree is very flexible and able to handle them.
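As a rough illustration of that preprocessing idea, here is a hedged sketch using scikit-learn's bundled digits dataset, where each 8x8 image is already flattened into a row of 64 pixel values the tree can split on. It's a stand-in for the face-recognition scenario, not the pipeline from this guide.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Each image arrives as a flat row of pixel intensities; the "preprocessing"
# step described above has already been done by the dataset loader.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```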

The next use case may sound a little different, but it's the one we're actually going to use for the case study in this guide: a fleet management system. We're going to walk through exactly how we can leverage a decision tree to decide how we should manage an entire fleet of vehicles. The reason I picked this for the case study is that it comes from a real-world application that I built out for a fleet company.

There was a company that wanted to use machine learning to see when trucks should be retired from the fleet, or whether it's worth it to keep investing in them. This was a perfect situation for a decision tree, and we're going to walk through exactly what I built for it. I think this is very helpful, because it's something you may be asked to build yourself; there are a lot of similarities between what I created for this case study and what you may be asked to do in your own job or at the company you're working for.

The last use case I'm going to give is a sales prediction engine. This is very popular for decision trees. Imagine that you have a large volume of data on who has purchased your product, and you want to be able to tell whether a new lead that comes in is likely to purchase or not. If you have historical data that captures things such as how old they were, their location or country, and other elements such as education level and income level, that is incredibly powerful, because you're able to pipe it into the decision tree and it can build a predictive model. That would be very difficult to do if you wrote it out by hand or tried to do it without a machine learning tool.

The decision tree can do that in a very short period of time. It can tell you whether a new lead looks like one of the people who purchased versus one of the people who didn't, and that can help your company target its advertising campaigns and decide where it should focus its marketing.
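Here is a rough sketch of that kind of lead-scoring model. The feature names and values (age, years of education, income) are hypothetical stand-ins for the historical sales data described above.

```python
from sklearn.tree import DecisionTreeClassifier

# Historical leads: [age, years_of_education, income] and whether they purchased.
X = [[34, 16, 72000], [51, 12, 48000], [29, 18, 95000], [45, 14, 39000]]
y = [1, 0, 1, 0]  # 1 = purchased, 0 = did not

model = DecisionTreeClassifier()
model.fit(X, y)

# Score a brand-new lead: predict_proba shows how similar the lead looks
# to past purchasers versus non-purchasers.
new_lead = [[38, 16, 81000]]
print(model.predict(new_lead), model.predict_proba(new_lead))
```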

So now that we've walked through the use cases, let's talk about the pros and the cons to working with decision trees.


The first pro is that decision trees are very transparent. I've mentioned this several times, but it's definitely at the top of the list: a decision tree is a white box that gives you visual logic, which is something you can't say about many other machine learning algorithms. Imagine trying to do this with a neural network or with logistic regression; it's very hard to understand what that machine learning algorithm is doing behind the scenes. Here, we can actually have some intuition about why it gave a particular prediction, and that can also help ensure we're making the right decision.

There are many times, especially early on as you're implementing a machine learning algorithm, that you don't quite know whether it's accurate or not, and it takes a lot of work and experience to perfect that skill. A decision tree is a great option, especially as you're learning about machine learning, because you're able to see why it made a certain prediction. That gives you a level of insight you couldn't have with other algorithms and will help you refine your skills a little faster, so it's very high on the pro list.
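As a quick illustration of that white-box quality, scikit-learn can print a fitted tree's conditions as plain text. This sketch uses the bundled iris sample data just to have something to fit; the point is the readable if/else output, not the dataset.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(iris.data, iris.target)

# Each line of the output is one of the if/else conditions the tree learned.
print(export_text(model, feature_names=list(iris.feature_names)))
```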

Next is that decision trees handle both numerical and categorical data. Many other algorithms do not work well with string-based data or anything that would be considered a category, so you need to implement preprocessing techniques. Say you're trying to build an algorithm for a social media application and you have 20 categories: you'd need to convert each of those categories to a numerical value, so category one would get the integer 1, category two the integer 2, and so on, in order for the algorithm to work, because many machine learning algorithms are really just statistical formulas translated into code. A decision tree works much differently; it's much closer to pure programming than it is to statistical analysis.

That's why you can use both numerical and categorical data, and it ties in perfectly with the third pro: limited preprocessing. You do not have to spend anywhere near as much time preparing the data for a decision tree compared with some of the other options.
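One caveat worth hedging: while the algorithm itself reasons naturally over categories, scikit-learn's tree implementation still expects numeric arrays, so a light one-step encoding is typically all the preprocessing a categorical column needs. The category values below are made up for illustration.

```python
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical categorical features: [favorite_topic, device_type]
X_raw = [["sports", "mobile"], ["news", "desktop"],
         ["sports", "desktop"], ["music", "mobile"]]
y = ["clicked", "ignored", "clicked", "ignored"]

# One light encoding step maps each string category to a number.
encoder = OrdinalEncoder()
X = encoder.fit_transform(X_raw)

model = DecisionTreeClassifier()
model.fit(X, y)
print(model.predict(encoder.transform([["news", "mobile"]])))
```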

And lastly, the performance of a decision tree is quite good. There have been many times when I was debating between using a decision tree or going with a full-on neural network, and in many cases the decision tree gave the same level of performance and accuracy but was much more cost-efficient and easier to implement. And, as I've mentioned, I was also able to see the logic behind the decision, so it made me more confident that the system was behaving the way it should.

So now that we've talked about the pros, let's look at the cons.


Both of these cons are connected in a sense. The top one is that decision trees can be subject to overfitting, a term you'll hear quite a bit in the machine learning space. Say you're trying to create some type of predictive model for the stock market and you have a large dataset; say that, theoretically, you could have all of the data in the world packed into your program.

Now, that's not something you could actually do; you wouldn't have time for the program to finish running before the stock reached its next level. But let's just imagine, theoretically, that you could. Well, that still would not result in a great prediction, because overfitting happens when you feed in all kinds of data that may not actually be associated with the prediction, and the decision tree doesn't know that; no machine learning algorithm would know that.

Imagine that this stock market predictor saw that every time the Phoenix Suns basketball team won, the stock market went up the next day. There is no connection between those things whatsoever, but if you took in the wrong data points and slid them into the algorithm, and the algorithm noticed those patterns, it would start to pick up on them and start telling you to purchase stock every single time the Phoenix Suns win, even though there is no correlation between those things; they are just coincidences. So whenever you're implementing a decision tree, you need to be cognizant that you're picking out the right data points, because if you give it too much data, that fits right in with the next con.
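In practice, scikit-learn also exposes a few pruning-style parameters that help keep a tree from memorizing coincidences like that. The values below are illustrative starting points, not recommendations for any particular dataset.

```python
from sklearn.tree import DecisionTreeClassifier

# Left alone, a tree can keep splitting until it fits every quirk in the data.
deep_tree = DecisionTreeClassifier()

# Constraining the tree forces it to keep only broadly supported conditions.
pruned_tree = DecisionTreeClassifier(
    max_depth=4,          # limit how many conditions get chained together
    min_samples_leaf=20,  # each final decision must be backed by at least 20 records
)
# You would fit both on training data and compare them on held-out data.
```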

The second con is that certain classes or data elements can mistakenly dominate. Just as your stock market analysis tool might start looking at all the basketball games and seeing trends in how they supposedly affected the stock, when stocks are really determined by all kinds of things such as buying patterns and how much money a company is making or losing, we don't want classes or data elements in our training set that could mistakenly dominate and produce an incorrect prediction. So now that we've walked through the definition, the use cases, and the pros and cons of decision trees, let's get into our case study.

In this case study, like I mentioned, we're going to walk through a fleet management system implementation. This is something I built for a medium-sized firm; there were about 1,000 trucks in their fleet, and they had the issue that they did not know when it was the right time to retire a truck versus keep investing in it. When you have a fleet of a thousand vehicles, that can be a very difficult decision to make, and a very expensive one if you make a mistake: say you put money into a vehicle that should be retired, and it keeps bleeding money the longer you keep it in the fleet.

And so what I implemented was a decision tree that went through all of their historical data. The data we saw, and this was just a small sample of the thousands upon thousands of records in their history, included different elements such as the year, the make, the mileage, the fuel type, the repairs, and the services. And then, lastly, there was a very important component.

That is the status: whether the vehicle is active or retired, and it's what the decision tree looks at to make its final prediction. The end goal is that if I provide a new vehicle, say one that is up for some pretty expensive repairs, the system gives a prediction of whether it's worth it to keep investing in that vehicle or whether it's time for it to be retired; in other words, whether we'd actually save money by simply purchasing a new truck as opposed to putting more money into the old one.

Now, another thing I'll mention about decision trees is that this is another machine learning algorithm where domain expertise is very important. The reason is that I'm not a truck aficionado; I'm not an expert on fleet management systems, or on trucks and how long they should last. If I were asked to build this by myself, I would pick out the wrong elements; I just haven't been in that industry long enough to understand what makes a truck a good investment versus a bad one. So when I was working on this, I worked specifically with the fleet management team. That team had several members with years of experience in the fleet space, and they were able to go through the data with me and tell me which elements were important and which ones should be completely ignored, so that I wouldn't make the mistake of overfitting and wouldn't allow certain classes to dominate the predictions. So we were able to get data like this; typically, I would start with it in a CSV file.

And then from there, I would pipe it into the decision tree, and it would give me a prediction on whether it's time for the vehicle to be retired or whether it's worth repairing.
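Here is a rough sketch of that pipeline under stated assumptions: the column names (year, mileage, repairs, services, status) and the sample rows are hypothetical stand-ins for the real fleet CSV, and a categorical column like fuel type would first be encoded as shown earlier.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Stand-in for pd.read_csv("fleet_history.csv") with the real historical data.
fleet = pd.DataFrame({
    "year":     [2009, 2015, 2011, 2017],
    "mileage":  [240000, 90000, 180000, 60000],
    "repairs":  [14, 2, 9, 1],
    "services": [30, 12, 20, 8],
    "status":   ["retired", "active", "retired", "active"],
})

features = ["year", "mileage", "repairs", "services"]
model = DecisionTreeClassifier(max_depth=3)
model.fit(fleet[features], fleet["status"])

# Ask about a truck that is up for an expensive repair.
new_truck = pd.DataFrame([[2012, 175000, 7, 15]], columns=features)
print(model.predict(new_truck))
```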


Inside that funnel, what it does is build the tree: it takes all of those conditions, walks through them, builds the model itself, and then generates its weights based on what it thinks are the most important parts of the data.
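A small sketch of that "weights" idea: after fitting, a scikit-learn tree exposes feature_importances_, which scores how much each feature contributed to the splits. The bundled iris sample is used here only so the example runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(random_state=42).fit(iris.data, iris.target)

# Higher scores mean the tree leaned on that feature more heavily when splitting.
for name, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.2f}")
```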


In many of the previous walkthroughs, whenever you ran the machine learning algorithm, you'd end up with a visual that looked something like a graph, where you had different data points and could find clusters and patterns in the data.

A decision tree does not work like that. Instead, what you're going to see is something that looks more like a tree of conditions.


The decision tree generated a set of conditions and then assigned weights to those conditions. I've simplified this quite a bit so that it doesn't look too complex, but whenever you do this in a real-life application, you're going to see a pretty large tree.

But these are the basic concepts. So imagine that, because it has all the training data, it takes in a new truck and tries to decide: should it be retired or not?

Well, the first thing it does is check the mileage. If it's under a hundred thousand miles, should we keep driving, or should we continue down the chain? It looked at the data and noticed that there were no trucks under a hundred thousand miles that had been retired. Because of that, it assigned a very high weight to that condition and put it at the top of the tree. So whenever a truck came in under that mileage marker, it simply said keep driving, because we don't have any data suggesting that a truck under this threshold should be retired.

If it's not, so if it's over 100,000 miles, then it checks whether consistent maintenance was performed on that truck, because that was one of the other elements considered very high in the priority chain. So, at whatever interval maintenance was supposed to be performed, every 3,000 or 5,000 miles or however many, was consistent maintenance performed over the history of this vehicle? If yes, then keep driving. If not, then maybe it's time to retire. This was a very small version of the decision tree that was generated for this particular application.
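Written out as code, that simplified tree boils down to the plain if/else logic below. The 100,000-mile threshold and the maintenance check come straight from the example above; a real generated tree would be larger, with thresholds learned from the data rather than hand-picked.

```python
def should_retire(mileage, consistent_maintenance):
    # Top of the tree: no retired trucks in the data were under 100,000 miles.
    if mileage < 100_000:
        return "keep driving"
    # Next condition down the chain: was maintenance performed consistently?
    if consistent_maintenance:
        return "keep driving"
    return "consider retiring"

print(should_retire(mileage=85_000, consistent_maintenance=False))   # keep driving
print(should_retire(mileage=160_000, consistent_maintenance=False))  # consider retiring
```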

So, in summary, a decision tree is a very powerful tool in your machine learning and data science arsenal. It gives you the ability to see behind the curtain: to see what the decision tree is actually doing, how it's making its predictions, and how it's generating its weights. Because of that, you have the transparency to understand what your program is doing, which allows you to refine it from there.

So imagine, and this did happen as I was building this out, that the decision tree was not giving the right predictions: it was saying that certain vehicles should be retired when they really shouldn't be, and vice versa. Because I was able to look into the data and see this visual of why it was making its decisions, I was able to go in and alter the weights by adding data to or removing data from the training set, until it was finally building a model that gave accurate predictions.