What is Machine Learning?

We're going to talk about some of the components that make up machine learning, along with the various inputs and outputs. Hopefully, that gives you a good overview and a good mental framework for what machine learning developers do.

First and foremost, let's talk about the components that make up machine learning. The main component you're going to find is a set of algorithms. Now, these algorithms have all kinds of variations. We have algorithms that create clusters of data so that we can analyze and predict patterns. We also have other types of algorithms where we say we want to find a certain type of result.

So for example, going back to the sales example, if you have a lot of sales data and you want to predict which lead has the best chance of turning into a paying customer, you have algorithms that can perform those kinds of tasks. So the main component that makes up machine learning is a full set of algorithms.

Now when we go through the algorithms section we're going to see what those groupings look like. And we're going to talk about all of the different kinds of case studies and implementations that you can use to build those systems.

The next key component in machine learning is the input. The input is exactly what it sounds like: it is the data that gets fed into the system. You can get data from all kinds of different sources, and getting the data falls a little bit into the realm of data science, which we're going to talk about in the next guide.

The input can come from SQL databases, it can come from raw HTML, and it can come from tabular data such as Excel or CSV files. One thing you're going to discover as you become a more advanced machine learning developer is that you're going to have to deal with all kinds of different data inputs.
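To make that a little more concrete, here is a minimal sketch of pulling input rows from two of those sources, a CSV file and a SQL database, into the same shape. It only uses Python's standard library. The table name, column names, and sample rows are all hypothetical and only there for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: the column names and values are assumptions.
CSV_TEXT = "lead,visits,purchased\nalice,12,yes\nbob,3,no\n"


def rows_from_csv(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_sql(connection, query):
    """Run a query and return each result row as a dictionary."""
    connection.row_factory = sqlite3.Row
    return [dict(row) for row in connection.execute(query)]


def build_demo_database():
    """Create an in-memory SQLite database with a hypothetical leads table."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE leads (lead TEXT, visits INTEGER, purchased TEXT)"
    )
    connection.executemany(
        "INSERT INTO leads VALUES (?, ?, ?)",
        [("carol", 7, "yes"), ("dave", 1, "no")],
    )
    connection.commit()
    return connection


csv_rows = rows_from_csv(CSV_TEXT)
sql_rows = rows_from_sql(
    build_demo_database(), "SELECT lead, visits, purchased FROM leads"
)

# Both sources now yield dictionaries with the same keys. Note that the CSV
# values are all strings, while SQLite returns visits as an integer, so a
# real system would still need a clean-up step before combining them.
all_rows = csv_rows + sql_rows
```

Once every source is converted into the same kind of row, the rest of the system doesn't have to care where the data came from.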

So far we've covered two components. We've talked about the algorithms and we've talked about the input. Now the last component we're going to discuss is the output. The output is the final result. Any time you're building a machine learning algorithm, you are doing it to perform some type of task.

Now in regular web development or in mobile development, you're building systems where maybe you simply want to show something on a page or add a record to a database. But when it comes to machine learning algorithms and these types of programs, usually you're going to want to build a recommendation as your output. Returning to our sales example, say that you have that full set of historical records of everyone that has purchased your product, and you want to see if a potential lead is going to be a sale or someone who's not interested in the product.

If you can compare your current lead with all of your historical data, the output is going to be that recommendation.

Back when I was attending Texas Tech University, one of my favorite professors took a very simple approach to data science and machine learning.

Anytime I would approach him with a question, he would always be able to drill down to the simplest issue, and he would always ask me: what was your input, and what was your output? That's the reason why I wanted to wrap this entire guide into the set of components and really focus on the input and the output, so that you can build that framework in your mind and realize that when you're building these systems, that's what you want to drill down to.

You want to ask: What data do I have? How can I work with that? And what does the desired output look like?

Now you have a high-level understanding of the three components that make up machine learning. Let's walk through a true case study where we look at some details on how this could work in a real-world application.

As you can see right here we have a little diagram.


We have a process, and it includes all three of our components. We have a funnel, we have these three little dots, and then we have a recommended article. In terms of the components we just discussed, each one of these little dots (Article 1, 2, and 3) is the input, the funnel is the algorithm, and the output is that recommended article.

Now let's walk through how I was able to implement this type of functionality in a few products that I was building. Looking at the DevCamp website, each one of these guides has a lot of metadata associated with it. It has a title, it has tags, and it has a lot of content, such as the transcript from each one of these videos. Because I had all of that data, I was able to build a recommendation system that would look at an article like this one right here and tag it.

So I was able to scan through the document, use it as a source article, and pick out its three keywords.


I was able to say, OK, Python is used seven times in this article, the word list is used 28 times, and the word extend is used 11 times. So this is my source article that I want to build a recommendation for. The goal of this entire program is to give students a set of recommended articles if they want more information, but I want these articles to be dynamic, so the best article should be listed first. This is very similar to how the Google search engine and its PageRank system work.

Right here we have our source article and its keywords. I have it limited to 3 in this case study; the system I built out actually used 10 of the most popular keywords. Then I had an entire library of content, which was what the machine learning algorithm was actually learning from. And so we have all of these articles.

I only list 3 here, but the database actually has thousands of different articles. So we have Article 1, 2, and 3, and inside the metadata of each one of these articles I ran the exact same count, looking for all of these keywords. In Article 1, for example, Python was listed 13 times, list was used 12 times, and the word extend was used 5 times. Looking at Articles 2 and 3, we can see that we have different counts for each one of those keywords.
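The counting step described above could be sketched roughly like this. This is not the code from the actual system, just a minimal illustration of the idea using Python's standard library. The function names, the tiny stop-word list, and the sample sentences are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it", "that"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(text, limit=3):
    """Return the `limit` most frequent non-stop-words with their counts."""
    counts = Counter(word for word in tokenize(text) if word not in STOP_WORDS)
    return counts.most_common(limit)


def keyword_counts(text, keywords):
    """Count how often each of the given keywords appears in the text."""
    counts = Counter(tokenize(text))
    return {keyword: counts[keyword] for keyword in keywords}


# Made-up sample text standing in for a source article and a candidate.
source_text = (
    "Use extend to add items to a Python list. "
    "In Python a list can extend another list."
)
candidate_text = "A Python list has an extend method."

source_keywords = top_keywords(source_text, limit=3)
candidate_counts = keyword_counts(
    candidate_text, [word for word, _ in source_keywords]
)
```

The source article decides which keywords matter, and then every other article in the library gets counted against that same keyword list.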


If you're ever curious about how Google was originally built, this was one of the biggest elements that they integrated into the system: a way to analyze an entire page, count up the words, and build up a probability index that says what that article is most likely talking about. If we look at this system right here and at these three examples, you can see that Article 3 is the most likely match and should be the recommended article. The reason why is that, if you look at each one of those keywords, Python is not listed as much in Article 3 as it is in Article 2.

However, Article 3 uses the words list and extend dramatically more than either Article 1 or 2. The way the system worked is that it went through and dynamically calculated and read through all of that data much faster than a human could. Based on an algorithmic probability scoring matrix, it was able to say: in the case of this source article, I think this other article is the one that should be recommended.
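Here is one way that kind of scoring could be sketched. The guide doesn't spell out the exact scoring formula, so this example swaps in cosine similarity as an assumption; it rewards articles whose keyword mix looks like the source article's mix. The source counts and the Article 1 counts come from the walkthrough above, while the counts for Articles 2 and 3 are hypothetical numbers chosen only to match the description.

```python
import math

KEYWORDS = ("python", "list", "extend")

# Counts for the source article and Article 1 are the ones quoted above.
SOURCE = {"python": 7, "list": 28, "extend": 11}
CANDIDATES = {
    "Article 1": {"python": 13, "list": 12, "extend": 5},
    # Hypothetical counts: the real values were only shown in the diagram.
    "Article 2": {"python": 20, "list": 4, "extend": 2},
    "Article 3": {"python": 9, "list": 31, "extend": 14},
}


def cosine_similarity(first, second, keys):
    """Score how similar two keyword-count profiles are (0.0 to 1.0)."""
    dot = sum(first.get(k, 0) * second.get(k, 0) for k in keys)
    norm_first = math.sqrt(sum(first.get(k, 0) ** 2 for k in keys))
    norm_second = math.sqrt(sum(second.get(k, 0) ** 2 for k in keys))
    if norm_first == 0 or norm_second == 0:
        return 0.0
    return dot / (norm_first * norm_second)


def rank_articles(source, candidates, keys):
    """Return (name, score) pairs sorted from best match to worst."""
    scored = [
        (name, cosine_similarity(source, counts, keys))
        for name, counts in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_articles(SOURCE, CANDIDATES, KEYWORDS)
recommended = ranking[0][0]

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these numbers, Article 3 comes out on top even though Article 2 mentions Python more often, because its overall keyword profile is much closer to the source article's.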

If you go to a different spot on the website you'll see a different recommended article. And it is all happening dynamically and it's all leveraging machine learning.

So hopefully this gives you a little bit of a glimpse into the processes that can take place and the types of systems that you can build by leveraging machine learning in your own applications.