The goal here is to supplement your understanding in a concise way if you are a new student in this field. I think it’s essential to have a gist of what to expect, so you can use that mindset to aid your learning.

We’re Looking For Patterns Most Of The Time

The core of data science is statistics, and a lot of it is about identifying patterns or repeating trends and understanding why they happen.


That’s why many tutorials suggest training a model on 70-80% of the data and holding back the rest as a test set, to see how accurate the predictions are on data the model has not seen. This tells us how well our hypothesis holds up and how much confidence we can place in it.
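To make the 70-80% idea concrete, here is a minimal sketch in plain Python. The petal lengths, the labels and the helper names (`train_test_split`, `fit_threshold`, `accuracy`) are all made up for illustration, and the "model" is just a single cut-off value, so treat it as a toy rather than how a real project would do it.

```python
import random

# Made-up (petal_length_cm, label) pairs, for illustration only.
data = [
    (1.3, "small"), (1.5, "small"), (1.4, "small"), (1.7, "small"), (1.6, "small"),
    (4.5, "large"), (4.7, "large"), (5.1, "large"), (4.9, "large"), (5.0, "large"),
]

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and split it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_threshold(train_rows):
    """'Train' by finding the midpoint between the two class averages."""
    small = [x for x, label in train_rows if label == "small"]
    large = [x for x, label in train_rows if label == "large"]
    return (sum(small) / len(small) + sum(large) / len(large)) / 2

def predict(threshold, x):
    """Anything longer than the threshold is predicted to be 'large'."""
    return "large" if x > threshold else "small"

def accuracy(threshold, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for x, label in rows if predict(threshold, x) == label)
    return correct / len(rows)

train, test = train_test_split(data)   # 80% to learn from, 20% held back
threshold = fit_threshold(train)        # the pattern we think we found
print(f"threshold={threshold:.2f}, test accuracy={accuracy(threshold, test):.2f}")
```

The important part is that `accuracy` is measured on `test`, which the threshold never saw while it was being fitted.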


This is basic but it needs to be said.

We Are Quantifying Things That Don’t Make Numerical Sense

Simply, we are attributing a number to an action or an event.


Take, for example, the Facebook ‘like’ button. In a social context, we click ‘like’ on things that resonate with us. To many users, it’s not a number.


But from a business perspective, how much does a ‘like’ weigh in terms of:

  • Visibility of our brand or assets
  • The sales funnel
  • Whatever other business goals you have

We take actions that seem arbitrary or insignificant and draw insights from them by turning them into numbers. We then set up tests and adjust the weights we give these numbers as we find correlation, and ideally causation, with the business goals we care about.
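Here is a rough sketch of what putting a number on an action can look like. The weights in `ACTION_WEIGHTS`, the weekly counts and the sales figures are all invented for illustration, and `engagement_score` is a hypothetical helper, not a standard metric.

```python
from math import sqrt

# Hypothetical weights: how much we assume each action counts toward engagement.
ACTION_WEIGHTS = {"like": 1.0, "comment": 2.0, "share": 3.0}

def engagement_score(actions):
    """Collapse a dict of action counts into a single weighted number."""
    return sum(ACTION_WEIGHTS[name] * count for name, count in actions.items())

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up weekly numbers, for illustration only.
weekly_actions = [
    {"like": 120, "comment": 10, "share": 4},
    {"like": 200, "comment": 25, "share": 9},
    {"like": 90, "comment": 5, "share": 2},
    {"like": 310, "comment": 40, "share": 15},
]
weekly_sales = [14, 22, 9, 35]

scores = [engagement_score(week) for week in weekly_actions]
print("scores:", scores)
print("correlation with sales:", round(pearson(scores, weekly_sales), 3))
```

A high correlation here only says the two numbers move together. Showing that likes actually cause sales would take a controlled test, which is why the weights keep getting revisited.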


When I took up data science tutorials years ago, I didn’t understand at all why we were classifying flowers by their petals’ length and width, or why I was putting a number on certain actions or events. Once I realized we’re supposed to quantify and make sense of actions and events, it became clearer how data science and machine learning could improve the world, and that realization helped me improve significantly.


Therefore, we keep iterating on testing and validation, and we improve our models’ predictions as we gather more data.

Why So Many Regression And Classification Techniques In Tutorials?

When you begin learning data science or machine learning, you’re going to be bombarded with a ton of new techniques. In short, different contexts call for different solutions.


I was at Amazon re:MARS 2019, and one of the best workshops I attended was about rethinking recommendation systems. Usually, recommendation systems are built on a technique called collaborative filtering, which recommends items based on what users with similar tastes liked.


In the workshop, the speaker discussed the limitations of collaborative filtering. Amazon tried using deep learning to determine features, and then used these features to drive decision-making. They were steering away from traditional methods, and it worked well for their recommendation systems.
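If you haven’t met collaborative filtering yet, the traditional idea can be sketched in a few lines. This is a simple user-based version with cosine similarity, written from the general textbook description and not from anything shown in the workshop; the users, items and ratings are made up.

```python
from math import sqrt

# Made-up ratings (user -> {item: rating}); all names are hypothetical.
ratings = {
    "ana":   {"book_a": 5, "book_b": 4, "book_c": 1},
    "ben":   {"book_a": 4, "book_b": 5, "book_d": 4},
    "chloe": {"book_c": 5, "book_d": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity between two rating dicts; unrated items count as 0."""
    shared = set(a) & set(b)
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, ratings):
    """Predict ratings for items the user has not rated yet.

    Each prediction is the average of other users' ratings,
    weighted by how similar those users are to this one.
    """
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        if similarity <= 0:
            continue  # nothing in common, so this user tells us nothing
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                totals[item] = totals.get(item, 0.0) + similarity * rating
                weights[item] = weights.get(item, 0.0) + similarity
    predicted = {item: totals[item] / weights[item] for item in totals}
    return sorted(predicted.items(), key=lambda pair: pair[1], reverse=True)

print(recommend("ana", ratings))
```

Notice that a user with no ratings in common with anyone gets no recommendations at all, which is one commonly cited limitation of this approach.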


But in a data science or machine learning course, it’s difficult to tell you which solutions will improve your models and the accuracy of your results, because context is king. Thus, one way to help new students is to give them an arsenal of tools and let them figure out what works through experimentation and simulation.


That’s why it’s important to know what to use, and when. You’re also learning how to break away from the status quo and come up with solutions that others have not tried. This is what makes data science, AI and machine learning shine, and it’s why this is a fun career to be in if you like to tinker with things.


In a nutshell, if you’re a new student, it’s a lot to get used to. But once you ride out the initial rough waters, you’re going to be fine.