Overview of Neural Networks
So far in this section of the course we have focused on what are called shallow learning algorithms. In this guide we're going to transition and discuss one of the most powerful tools you can have in your machine learning arsenal: the neural network.

Now a neural network's specialty and primary usage is complex classification.

And as we go through the case study and the definition, you're going to see that neural networks allow you to combine many of the tools that we've seen so far, along with a number of other algorithms and machine learning processes, to create some pretty impressive intelligent agents. So let's take a look at the definition.

And what Wikipedia says neural networks are is computing systems inspired by the biological neural networks that constitute animal brains. Such systems learn, progressively improving performance on tasks, by considering examples, generally without task-specific programming.

So to boil that down, what it means is that a neural network attempts to mimic the way that a human brain operates. Instead of operating like, say, naive Bayes, which simply takes a set of probabilities, multiplies them together, and then gives a recommendation, a neural network has the ability to actually learn. It is the closest thing we have so far to mimicking human intelligence. We are not at a stage yet where we can create truly intelligent agents, obviously, or else we would probably live in a very different world than we do today, because the computers would have taken all of our jobs.

However, what a neural network can do is provide a very robust system that can learn from historical data and then run through all kinds of simulations much faster than the human brain, and also much faster than the typical algorithm. So let's talk about a few of the common use cases.

Self-driving cars are one of the best ones because self-driving cars need a very large dataset of historical actions. A self-driving car needs to know what needs to happen when it sees a human, an animal, a stop sign, or a red light, and it has to be able to adjust. But even beyond that, self-driving cars also have to have a level of intelligence where they can adjust and exhibit dynamic behavior.

So take, for example, a self-driving car that sees a green light but then recognizes that a pedestrian is stepping out and jaywalking. Even though it's a green light, that self-driving car, if it is programmed properly and its neural network is operating the way it should, is not going to run over the pedestrian. It's going to realize that the pedestrian takes precedence over the green light. That's a pretty basic example, but you can add layer upon layer of complexity and the neural network can process it. That's one of the very powerful tools that it has at its disposal.

Now another great use case is complex recommendation engines, and that's the case study we're going to go through in this guide: being able to analyze a situation that has all kinds of different potential outcomes and more variables than we can really keep track of in even a traditional learning algorithm, combine all of those systems, and then give a set of recommendations based off of what it learns.

Another great use case is image recognition. Now, your very first question might be: why in the world do we need a neural network to perform image recognition when we've already talked about support vector machines? We talked about how an SVM can analyze images, handwriting, and different elements like that, give a recommendation, and classify them. And that is a fantastic question. In fact, that is the exact question that I want you to ask, because when we get to our pros and cons you're going to see that a neural network is not always the answer, and it shouldn't be the first algorithm or the first service that you go to whenever you need to build intelligent behavior into your systems.

So when would you want a neural network to implement an image recognition service? Some of the best examples are when you need even more complex behavior than simply recognizing handwriting to see if a 9 is a 9 and a 1 is a 1 and classifying those elements the same way we did in the SVM guide. A neural network can combine all kinds of various probabilities along with that classification. So, for example, imagine a scenario where you not only need to classify handwriting, you also need to be able to process handwriting in all kinds of different languages. A one may look one way if the writer is writing in English, but if they're writing in a completely different language they may write a number, or even a letter, that looks a lot like a one. A support vector machine is not going to be able to process that very well.

It's going to look at that letter or number, compare it to its historical data set, and say: this looks exactly like a one, so this is how it's going to be categorized. A neural network, on the other hand, will look at the entire system and say: OK, this looks like a one, but now let's compare all of the other letters and characters around it, see what the language is, and then build our probability based off of that. So a very common use case for neural networks is not just performing one advanced task, but layering on multiple systems of probabilities and multiple algorithms, and combining those together to create intelligent agents.

The last use case that I have on the list here is very similar to the example I just gave, and that is natural language processing, also referred to as NLP. Now, the reason why I wanted to separate it out is because technically NLP is a completely separate field from machine learning. However, you are going to see quite a bit of overlap in your work. So, for example, Amazon's Alexa, Google Home, and Apple's Siri are all forms of natural language processing in action. Whenever you ask Siri what the weather in your hometown is, it is leveraging natural language processing.

But while it does that, your request is also being routed through a very complex neural network that can process it: it understands your speech patterns, rips apart the words that you say, guesses your intention, and then creates a prediction and gives a recommendation. And so that wraps a number of different technologies all in one to bring tools like Siri to life. So that's a list of some of the common use cases.

Now let's talk about the pros for using a neural network.

First and foremost, it is incredibly powerful. Neural networks go quite a bit beyond the scope of the other shallow learning algorithms simply because they can handle so many different scenarios. I'm also going to add a second pro here and say that it is simple to implement. But as you may notice right here on the slide, I have an asterisk there. And the reason for that is because there are many neural networks that are simple to implement; however, they are not always simple to implement the right way.

And so we're going to talk about that when we get to the con list. However, I wanted to list it here because there are a number of services that are essentially neural networks as a service. A great example is TensorFlow. TensorFlow is one of the most popular neural network libraries out there, and it does allow you to perform incredibly complex tasks without very much work on your side. You can perform tasks such as wiring up your system to upload a picture directly to the server and having it return back what it thinks that picture is.

So if you upload a picture of an apple, it's going to say, "We're about 85 percent certain that this is a picture of an apple," and you can build those types of services directly into your own programs. Now, the reason tools like TensorFlow (and a number of the other big machine learning and neural network systems out there) can do that is because those companies spent a lot of time and a lot of resources to make the process straightforward.
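To make that concrete, here is a minimal sketch of what that kind of image classification can look like using TensorFlow's Keras API and a model that was pretrained on ImageNet. The file name apple.jpg is just a hypothetical example, and the exact labels and confidence scores you get back will depend on the image and the model.

```python
# A minimal sketch: classify a local image with a pretrained model via TensorFlow/Keras.
# "apple.jpg" is a hypothetical file name used for illustration.
import numpy as np
import tensorflow as tf

# Load a small model that was pretrained on the ImageNet dataset.
model = tf.keras.applications.MobileNetV2(weights="imagenet")

# Load the image at the size the model expects and convert it to an array.
image = tf.keras.utils.load_img("apple.jpg", target_size=(224, 224))
array = tf.keras.utils.img_to_array(image)
batch = tf.keras.applications.mobilenet_v2.preprocess_input(np.expand_dims(array, axis=0))

# Run the prediction and print the top guesses with their confidence scores.
predictions = model.predict(batch)
for _, label, score in tf.keras.applications.mobilenet_v2.decode_predictions(predictions, top=3)[0]:
    print(f"{label}: {score:.0%}")   # e.g. "Granny_Smith: 85%"
```

Notice that this is only a handful of lines precisely because someone else already spent the time and hardware to train the model; all we're doing here is calling it.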

If you were to do this all by yourself, it would not be easy to implement whatsoever. You have to consider that you're essentially trying to recreate how the human brain operates, and that is not an easy task.

So now let's look at the cons.

First and foremost, a neural network is incredibly resource intensive. And when I say resource intensive, I mean that on a few different levels. One, it is resource intensive from a server or computer perspective. With a shallow learning algorithm such as naive Bayes, you can simply run it directly on your own system, and it is very easy. You're going to see when we get into real code implementations that you can simply run it and it will operate on your computer without any extra hardware or configuration, and that is great. However, a neural network cannot run like that; a true neural network needs to run on an outside system.

So usually you need to have an entire network of servers that are all interconnected, can work with each other, and can process large amounts of data, and we typically can't even run a neural network using the CPU on a computer or even a server. Neural networks require a very specific type of hardware: they require a GPU. And if you're not familiar with computer hardware, that's fine. The CPU is your central processing unit. That is what your computer pretty much runs on; it sits right in the middle of the motherboard and is essentially the brains of your computer. Now the GPU, though, is very different.

That is the graphics (or graphical) processing unit, which you wouldn't think would be what you would run a neural network on. However, what developers realized is that GPUs were built in a way where they have to handle massive amounts of data and let that data flow through the system very quickly. And so a GPU is what neural networks are run on. So it's resource intensive from a hardware and architecture perspective, and because of that it is also very expensive.
So if you want to build out your own neural network, you're going to have to write a pretty large check for the hardware.
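If you're curious whether your own machine has a GPU that TensorFlow can actually use, a quick sketch like the one below will tell you. It assumes TensorFlow is already installed; on a typical laptop without a dedicated, supported GPU you should expect the list to come back empty.

```python
# A quick sketch: check whether TensorFlow can see a GPU on this machine.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    print(f"Found {len(gpus)} GPU(s):")
    for gpu in gpus:
        print(" ", gpu.name)
else:
    # Training will still work, it will just fall back to the (much slower) CPU.
    print("No GPU detected; TensorFlow will run on the CPU.")
```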

Now, the next con on the list is that a neural network is essentially a black box. If you've never heard that term before, the way that I like to understand it is that with many other shallow learning algorithms, it's relatively straightforward to see what is happening in order to take our input and generate a recommendation. We haven't discussed it yet, but an algorithm called a decision tree will actually outline and describe why it gave a recommendation.

It will say: I took all these inputs in, I asked all of these kinds of questions, I built these probabilities, and that's the reason why I'm giving you this recommendation. A neural network is not like that. It is a black box, which means we pipe in our data, it performs all of its magic, and then it gives a recommendation. However, we don't have a lot of insight into why it gave that recommendation, and the reason is that a neural network performs so many tasks that explaining how it came up with a certain recommendation or probability would take nearly as much work as producing the recommendation itself.

And so what a neural network does is simply perform the task, leaving you a little blind to how something is occurring, and in many cases that's perfectly fine. Take our example of having your application upload a picture of an apple to TensorFlow and having TensorFlow say it's about 85 percent certain that this is most likely an apple. You as a developer don't really care how the neural network came up with that recommendation; you're just happy that it's accurate. And so in many cases that is perfectly fine.

However, there is one quote that I absolutely love, from an anonymous professor online, which says that a neural network is the second best way to solve any problem. The best way is to actually understand the problem.

That is a little bit humorous, at least from an academic perspective, but it is also very true, because many people simply try to pipe all of their data into a neural network and let the neural network do the thinking for them. That will work sometimes, but it is not a silver bullet. And so if it were me, and I'm giving you the recommendation, my personal belief is that you should try as hard as humanly possible to build out your own solutions without using a neural network first.

Not even because that will be the final solution; your final solution could very well be to simply pipe your data to one of the neural nets and have it give the recommendation. But during the process, when you're implementing algorithms such as naive Bayes or logistic regression or linear regression or decision trees, you're going to start to understand your data and your problem domain even better. And because of that, you'll be much better at knowing how you should form your data before you give it to the neural network.

So my personal recommendation would be to have a staged approach where you don't simply run to a neural net the second you get your data, or the second that your boss asks you to build out a recommendation, but instead try to dissect the data a little bit. Try to figure out how you could implement a set of shallow learning algorithms and implement those first. See how accurate you can get with that data.
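As a rough illustration of that staged approach, here is a sketch that fits a couple of shallow baselines before anyone reaches for a neural network. It uses scikit-learn as one common way to run the shallow algorithms we've discussed, and the dataset here is synthetic, standing in for whatever data your boss actually hands you.

```python
# A sketch of the staged approach: try shallow baselines before a neural network.
# The synthetic dataset below is a stand-in for your own features (X) and labels (y).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=12, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

baselines = {
    "naive Bayes": GaussianNB(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5),
}

for name, model in baselines.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.1%} accuracy")

# Only if none of these baselines is accurate enough (and you now understand the
# data better) does it make sense to move on to a neural network.
```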

You may discover that you don't even need a neural network, or you may discover that you have simply learned a lot more about your problem domain and will be able to leverage the power of the neural network even better after that. The last con that I will discuss is also related to the resource intensiveness of neural networks, and that is slow performance. Neural networks can perform very slowly depending on what kind of data you're giving them, especially if you're giving them millions, tens of millions, or even billions of data points. It is not outside the realm of possibility for a neural network to take days or even weeks to generate a recommendation.

Now, that is not common. Usually a neural network can have all of its data, take in an input, and then give its recommendation relatively quickly. Tools like TensorFlow, as long as the problem you're giving them is simple enough, are seemingly instant. But with that being said, I did want to add in slow performance as a potential con, depending on how much data you're giving it.

So now we've taken this high-level overview of neural networks and talked about the pros and the cons, and one of the biggest takeaways I want you to have is that a neural network should not be your first choice, because there are so many other algorithms out there, and those other algorithms might be better for helping you understand exactly the solution that you're trying to build out. But since we've covered all that, now let's get into our case study, and this is going to be a fun one: we are going to walk through what it looks like to build a baseball strategy engine with a neural network.

Now, I picked this as a case study for a number of reasons, and I'm going to describe them first because I want you to have a good mental framework for machine learning use cases, and in this specific case, for neural networks. I've performed a number of consulting jobs for Major League Baseball organizations, and neural networks are used extensively throughout all of baseball, so the example I'm giving you is not a contrived one. In fact, it is a question that I was specifically brought on by a team to help answer.

And so I want to give that to you so you understand that what I'm going to be telling you is not just an example I thought would be fun; it is actually something that you could potentially be hired to figure out in your own career. The scenario that I'm going to bring up is one where we need to figure out the best pitch to throw a batter.

So imagine that we are working for the baseball team, we are about to play another team, and we have the lineup of the players, so we know who our pitcher is going to face.

And as you can see here on the bottom left-hand side, these are the statistics for our top pitcher, whose name is Justin Verlander, and at the bottom is Mike Trout, one of the best hitters in baseball. So this poses a little bit of an interesting quandary, because if you look at these data points, this is actually minimal. This is not going to give us enough data to go on.

All we have is a historical matchup of 20 at-bats (that's what the AB stands for), with two hits (that's what the H stands for), two home runs, and a .100 average (hits divided by at-bats: 2 / 20 = .100), plus some of the other data points you can see there. Now, as helpful as that may seem, 20 at-bats doesn't really give us what we need. We don't know what pitch to throw the hitter; we simply have an idea based on the historical data.

So let's look at some of the other items that Major League Baseball teams look at when they're going through the data and the statistics.

They need to check the hitter's performance against a specific pitcher, broken down by pitch speed, pitch type, and pitch location. Then from there they need to look at the pitcher and see his performance against a specific hitter, again by pitch speed, pitch type, and pitch location. Then you also have to manage conflicts, and we're going to see what that looks like later on. Then you might also have to see how they perform in day versus night games, and then you also have to take into consideration what field they're playing on.

So if you're playing in Yankee Stadium, which has very short fences, that may also change the type of pitches that you're going to throw the hitter, so you have to analyze all of that data as well, along with the pitch count. Now, if you're not familiar with baseball whatsoever, I can simply give you a high-level overview: with a pitch count of three balls and one strike versus zero balls and two strikes, you're most likely going to throw a completely different pitch, in a different location, with a different pitch type.

So that is a very important set of elements; you need to take all of these in, in order to make a true and accurate recommendation. Since we've been hired by the Major League Baseball organization, we have all of this data, so we might have it in spreadsheet form.
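Just to make the shape of that data a bit more concrete, here is a hypothetical sketch of how a single pitch situation from such a spreadsheet might be encoded as numeric features before it's handed to any model. All of the field names and values are made up for illustration; a real team's data would be far richer.

```python
# A hypothetical sketch of encoding one pitch situation as numeric features.
# All field names and values here are illustrative, not real team data.
import numpy as np

situation = {
    "pitch_type": "fastball",      # what the pitcher is considering throwing
    "pitch_speed_mph": 97.0,
    "pitch_location": "inside_high",
    "balls": 3,
    "strikes": 1,
    "is_night_game": True,
    "ballpark": "yankee_stadium",  # short fences change the calculus
}

# Categorical values have to be turned into numbers, e.g. with simple lookup tables.
pitch_types = ["fastball", "curveball", "slider", "changeup"]
locations = ["inside_high", "inside_low", "outside_high", "outside_low", "middle"]
ballparks = ["yankee_stadium", "fenway_park", "dodger_stadium"]

features = np.array([
    pitch_types.index(situation["pitch_type"]),
    situation["pitch_speed_mph"],
    locations.index(situation["pitch_location"]),
    situation["balls"],
    situation["strikes"],
    int(situation["is_night_game"]),
    ballparks.index(situation["ballpark"]),
], dtype=np.float32)

print(features)  # one row of the input that a model would eventually consume
```

In practice you would probably one-hot encode the categorical columns rather than use raw indexes, but the point is simply that every row of the spreadsheet becomes a vector of numbers.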

There are all kinds of different ways that you can get this kind of data. Now, if I were to hand this to you and say, "I want you to go build this," and you looked at it, naive Bayes wouldn't really be a good fit, and a support vector machine really isn't going to be able to do it either. We talked about how support vector machines can manage a few dimensions, but look at this: we're dealing with all kinds of different dimensions, and managing conflicts is a very critical task. The algorithms that we've looked at so far don't do a great job of that because they don't look at the entire world.

They only look at a very limited sample size of data. Just to give you another visual, in addition to all the elements we just described, these are heat maps.

This is what baseball teams and statisticians look at. Right here we have heat maps that show how well a hitter does against certain types of pitches. RHB versus RHP stands for right-handed batter versus right-handed pitcher. So this is just taking the historical data of a hitter and seeing how well he does against certain pitch types, and the numbers you can see deal with velocity.

Now, I'm not giving you this case study to try to teach you about baseball. If you're feeling a little confused right now because of all of the different elements I've thrown at you, that's kind of my sneaky point. I do not want you to feel like you have to become a baseball statistician or anything like that. Instead, what I want to really stress is that this is a lot of data. There is no way that we as humans could take all of this in and create recommendations. But even beyond that, it would be very challenging for us to take all of this data and correlate it in a way where it could fit into some of the shallow learning algorithms we've talked about so far.

Now let's extend that to another example.

Let's say that our star pitcher's best pitch, the pitch he gets guys out on the most, is this inside fastball. Well, what happens when he faces a hitter whose very best pitch, the pitch that he hits for home runs and gets the most hits on, is in the exact same place?

So our neural network needs to be able to take all of those elements into account. And if you remember when I said managing conflicts, this is an exact example; this was, in fact, the exact reason I was hired, because they wanted to figure out the best way to implement this so that they could manage conflicts when a pitcher and a hitter happen to have the exact same hot zone. And this is a very common problem, not just in baseball but in general.

So imagine we go and try to create our normal system, where we take all of our data and input it into the funnel, and our goal output is the pitch to throw. Well, this is going to be a very challenging task for any of the shallow algorithms that we've talked about so far. We need something a little bit bigger, and so we are now going to be using a neural network instead. So instead of a little funnel, we're going to have a factory.

And the reason why I like this is because the factory analogy makes a lot of sense to me: a factory is not simply one process.

If you were to go into a factory you would see that you have all kinds of different processes happening at one time. Imagine a car factory like this.

You don't just have one line that everything goes through where everyone performs the same task. That's like a shallow algorithm such as naive Bayes, where it just runs a set of probabilities through the system with one statistical formula, and that's how it gives its recommendation.

Instead, a neural network is like a factory. A factory that's building a car has one machine that brings in the body of the car, another machine that puts the seats in, another that puts the engine in, another that puts the engine block on top of it, and together they assemble all of the different parts of that car. There are all kinds of different processes that come together in order to build a car.

A neural network is the same type of process. Instead of simply having one algorithm or one process that the data gets run through, like naive Bayes or a support vector machine, a neural network creates stages.

So as you can see right here, with these green dots on the left-hand side, you can imagine that these are the inputs, and based off of those inputs the data is going to get routed to different nodes. The way it's described in the machine learning community is that the circles inside are called hidden units, and they are organized into hidden layers. For the sake of simplicity (even though this may not look simplistic at all), we can imagine that there are only two hidden layers right here, but in reality there could be hundreds of layers. The way it works is that when the data gets input, it goes into one of those little green dots and is then routed to each one of the nodes in the next layer.

And as the data goes through these layers, the system is learning. It takes all of the historical data, then it takes in these new inputs and runs them through the model it has built from that historical data. So it's a model of the world that all of the new inputs are processed through, and then the data traverses all the way to the very end, where the network gives its recommendation. So it's going to go through hitter performance and pitcher performance.

It's going to manage conflicts, it's going to take in all of the parameters, and then out of that it is going to make its recommendation.
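If you want to picture the mechanics of how data gets routed through those layers before a recommendation pops out, here is a tiny sketch of a forward pass through two hidden layers written with plain NumPy. The weights here are random placeholders rather than anything a network has learned; in a real network, adjusting those weights is exactly what training does.

```python
# A tiny sketch of data flowing through two hidden layers (a "forward pass").
# The weights are random placeholders; training is what would set them to useful values.
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights, biases):
    # Each node combines all of its inputs, then applies a simple activation (ReLU).
    return np.maximum(0.0, inputs @ weights + biases)

inputs = rng.random(7)                          # the "green dots": e.g. 7 input features

w1, b1 = rng.normal(size=(7, 5)), np.zeros(5)   # first hidden layer: 5 nodes
w2, b2 = rng.normal(size=(5, 4)), np.zeros(4)   # second hidden layer: 4 nodes
w3, b3 = rng.normal(size=(4, 3)), np.zeros(3)   # output layer: 3 possible recommendations

hidden1 = layer(inputs, w1, b1)
hidden2 = layer(hidden1, w2, b2)

# Softmax on the output layer turns raw scores into probabilities that sum to 1.
scores = hidden2 @ w3 + b3
probabilities = np.exp(scores) / np.exp(scores).sum()
print(probabilities)                            # the network's "recommendation" over 3 options
```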

So, in other words, this is a very simplistic version of it.

This isn't exactly the way a neural network works (I'll give that as a caveat), but it is the way I would like you to think of it, at least at a high level: it's going to take in pitch types, pitch location, and field dimensions, and at each layer of the neural net it is going to build its probabilities off of that. So it's going to keep getting more and more certain about what the right answer is. Going back to our case study and trying to figure out the pitch that Justin Verlander should throw Mike Trout, it's going to take in all of the historical data, it's going to take in everything that we provide to the system, and then, based on how all the data gets merged together, it's going to put it in that black box and output what it thinks, based on its view of the world, is going to be the best option.
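The sketch above showed the raw mechanics; to tie the case study back to practical code, here is a hedged sketch of what a small pitch-recommendation network could look like in Keras. The layer sizes, the seven input features, and the five candidate pitch types are all made-up placeholders, and the "historical" training data below is random noise standing in for the team's real spreadsheets, so the recommendation it prints is meaningless. The point is just to see the shape: inputs in, hidden layers in the middle, a probability for each candidate pitch out.

```python
# A hedged sketch of a small pitch-recommendation network in Keras.
# Layer sizes, the 7 input features, and the 5 candidate pitches are placeholders,
# and the training data below is random noise standing in for real historical data.
import numpy as np
import tensorflow as tf

NUM_FEATURES = 7      # e.g. pitch speed, location, count, ballpark, day/night, ...
NUM_PITCHES = 5       # e.g. fastball, curveball, slider, changeup, sinker

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(32, activation="relu"),               # hidden layer 1
    tf.keras.layers.Dense(16, activation="relu"),               # hidden layer 2
    tf.keras.layers.Dense(NUM_PITCHES, activation="softmax"),   # probability per pitch
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Stand-in "historical" data: 1,000 past situations and the pitch that worked best.
rng = np.random.default_rng(42)
X_history = rng.random((1000, NUM_FEATURES)).astype("float32")
y_best_pitch = rng.integers(0, NUM_PITCHES, size=1000)

model.fit(X_history, y_best_pitch, epochs=5, verbose=0)

# Ask for a recommendation for one new situation (also a random placeholder here).
new_situation = rng.random((1, NUM_FEATURES)).astype("float32")
probabilities = model.predict(new_situation, verbose=0)[0]
print("Recommended pitch index:", int(np.argmax(probabilities)))
```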

And that, my machine learning friends, is a high-level overview of how neural networks work in the machine learning space.