The Goal of Statistical Analysis
In this guide, we're going to discuss the goal of statistical analysis. I think we would be remiss if we went straight into the mathematical formulae without first understanding the long-range reason we use statistics at all.

One of the main goals of statistical analysis, and later of data science and machine learning, is to separate the signal from the noise. That phrase also echoes the title of Nate Silver's The Signal and the Noise, one of the most popular data science books of the past decade.

In essence, there is an enormous amount of data available, and statistical analysis lets us set aside our own personal feelings about a topic and instead let the data speak for itself. Let's walk through a case study to see what that goal looks like in practice.

Imagine that we're building a machine learning algorithm that analyzes whether a potential job candidate should join our company and, if so, which area of the company they should work in.

Picture a graph containing several different clusters of data. For example, one set of employees all have computer science degrees along with a particular set of specialized knowledge and skills, and they sit together in one cluster.


This is a real machine learning algorithm that we will eventually work through as one of our case studies; for now, it is a high-level example of why statistical analysis matters. Say we have these groupings, and a different part of the company is made up of employees with accounting degrees, each with their own knowledge, skill sets, and experience. They form another cluster.


When a candidate applies for a job, how do you decide which part of the company they will fit best? The seemingly logical approach is to say that computer scientists go to one group and people with accounting experience go to the other, and in many cases that personal judgment may be accurate.

In statistics, however, we care much less about our personal feelings and let the data do the talking. Suppose a candidate comes in and fills out their degree, their experience, and their full list of skills.

Statistics allows us to find out which camp they fall into. That may seem like common sense here, mainly because I've kept the example very high level, but in a real-world scenario it is a common and genuinely hard problem to decide where individuals should be placed in a company, or whether they should be hired at all.
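To make "which camp they fall into" concrete, here is a minimal sketch of one simple technique, nearest-centroid assignment, assuming each person can be reduced to a numeric feature vector. The two features (a programming score and an accounting score on a 0 to 10 scale), the cluster names, and every number are hypothetical, made up for illustration; a real system would use many more features and a method chosen to suit the data.

```python
from math import dist
from statistics import fmean

# Hypothetical historical data: each employee is a made-up feature vector
# of (programming skill score, accounting skill score), both on a 0-10 scale.
HISTORICAL = {
    "engineering": [(9.0, 2.0), (8.5, 1.5), (9.5, 3.0)],
    "accounting": [(2.0, 9.0), (1.5, 8.0), (3.0, 9.5)],
}


def centroid(points):
    """Average each feature across all points in one cluster."""
    return tuple(fmean(coords) for coords in zip(*points))


def nearest_cluster(candidate, clusters):
    """Return the name of the cluster whose centroid is closest (Euclidean distance)."""
    centroids = {name: centroid(points) for name, points in clusters.items()}
    return min(centroids, key=lambda name: dist(candidate, centroids[name]))


print(nearest_cluster((7.0, 4.0), HISTORICAL))  # engineering
```

The decision comes entirely from the candidate's distance to each group's average profile; nothing about how the candidate came across in person enters the calculation.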

Now imagine that a very confident, seemingly successful person applies for a job. You have all of this historical data, so you know what a successful employee looks like in different parts of the organization. But what happens if this candidate's statistics, and all of the data about him, actually place him somewhere else on the graph, far from those clusters?

Based on personal feelings alone, he may seem like the right person. Maybe you like him, but what if none of the data supports bringing him into the company? That is where data science comes into play: we can let the data help make the decision for us.
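The situation of a candidate whom the data does not place anywhere can be sketched as nearest-centroid assignment plus a rejection rule: the candidate is assigned to the closest cluster only if they fall within that cluster's typical spread, and otherwise to no cluster at all. This is a minimal sketch under stated assumptions; the data and features are hypothetical, and the cutoff (mean plus k standard deviations of the members' distances to their centroid, with k = 2) is an illustrative choice, not a standard rule or the course's algorithm.

```python
from math import dist
from statistics import fmean, stdev

# Hypothetical historical data: made-up (programming score, accounting score) pairs.
HISTORICAL = {
    "engineering": [(9.0, 2.0), (8.5, 1.5), (9.5, 3.0)],
    "accounting": [(2.0, 9.0), (1.5, 8.0), (3.0, 9.5)],
}


def centroid(points):
    """Average each feature across all points in one cluster."""
    return tuple(fmean(coords) for coords in zip(*points))


def best_fit(candidate, clusters, k=2.0):
    """Return the closest cluster that plausibly contains the candidate, or None.

    Assumed rule: a cluster accepts the candidate only if the candidate's
    distance to its centroid is at most mean + k * stdev of the distances
    of that cluster's own members to the same centroid.
    """
    best_name, best_distance = None, float("inf")
    for name, points in clusters.items():
        center = centroid(points)
        member_distances = [dist(p, center) for p in points]
        cutoff = fmean(member_distances) + k * stdev(member_distances)
        d = dist(candidate, center)
        if d <= cutoff and d < best_distance:
            best_name, best_distance = name, d
    return best_name


print(best_fit((9.0, 2.5), HISTORICAL))  # engineering
print(best_fit((5.0, 5.0), HISTORICAL))  # None
```

With this made-up data, the candidate at (5.0, 5.0) lies well outside both clusters, so the function returns None: no matter how impressive the candidate seems, the historical data does not support placing him in either group.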