Machine Learning - Basic Starting Notes

Machine Learning Problem Framing

Define an ML problem and propose a solution

  1. Articulate a problem
  2. See if any labeled data exists
  3. Design your data for the model
  4. Determine where the data comes from
  5. Determine easily obtained inputs
  6. Determine quantifiable inputs


There are three major types of models:

  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning: No labeled data is required; the model acts as an agent that learns from a reward function. The main challenge is defining a good reward function. RL models are also less stable and predictable than supervised approaches. In addition, you must give the agent a way to interact with its environment (the "game") to produce data, which means building either a physical agent that acts in the real world or a virtual agent in a virtual world. Either one is a big challenge.
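To make the reward-function idea concrete, here is a minimal sketch of an agent that learns only from rewards, with no labeled data. It uses an epsilon-greedy strategy on a toy multi-armed bandit, which is a swapped-in illustration and not something these notes prescribe. The function name `run_bandit`, the payout probabilities, and the values of `steps`, `epsilon`, and `seed` are all assumptions made up for the example.

```python
import random


def run_bandit(payout_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit (hypothetical virtual environment)."""
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    counts = [0] * n_arms      # how many times each action was tried
    values = [0.0] * n_arms    # running average reward per action

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])

        # The environment answers with a reward; no labeled example is shown.
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0

        # Incremental mean update of the estimate for the chosen action.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return counts, values


# Assumed payout probabilities for three actions; the agent never sees them.
counts, values = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda a: values[a])
print("pulls per arm:", counts)
print("estimated values:", [round(v, 2) for v in values])
print("best arm:", best_arm)
```

Note that the "virtual world" here is a single line that samples a reward. In a real problem, building that environment is usually the hard part, as described above.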


Type of ML Problem        | Description                               | Example
--------------------------+-------------------------------------------+------------------------------------------------------------
Classification            | Pick one of N labels                      | Cat, dog, horse, or bear
Regression                | Predict numerical values                  | Click-through rate
Clustering                | Group similar examples                    | Most relevant documents (unsupervised)
Association rule learning | Infer likely association patterns in data | If you buy hamburger buns, you're likely to buy hamburgers (unsupervised)
Structured output         | Create complex output                     | Natural language parse trees, image recognition bounding boxes
Ranking                   | Identify position on a scale or status    | Search result ranking
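As one concrete illustration of the problem types listed above, the hamburger-bun example for association rule learning can be sketched by counting how often items appear together. The baskets below are made-up data, and `rule_stats` is a hypothetical helper written for this note. It computes the two standard measures for a rule A -> B: support (the share of baskets containing both) and confidence (the share of baskets with A that also contain B).

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    if not baskets:
        return 0.0, 0.0
    n_antecedent = sum(1 for b in baskets if antecedent in b)
    n_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = n_both / len(baskets)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Made-up shopping baskets, for illustration only.
baskets = [
    {"hamburger buns", "hamburgers", "ketchup"},
    {"hamburger buns", "hamburgers"},
    {"hamburger buns", "lettuce"},
    {"milk", "eggs"},
    {"hamburgers", "cheese"},
]

support, confidence = rule_stats(baskets, "hamburger buns", "hamburgers")
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

No labels are involved: the pattern is inferred from co-occurrence alone, which is why this problem type is marked as unsupervised.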


In traditional software engineering, you can reason from requirements to a workable design, but with machine learning, it will be necessary to experiment to find a workable model.

Models will make mistakes that are difficult to debug, due to anything from skewed training data to unexpected interpretations of data during training. Furthermore, when machine-learned models are incorporated into products, the interactions can be complicated, making it difficult to predict and test all possible situations. These challenges require product teams to spend a lot of time figuring out what their machine learning systems are doing and how to improve them.


Know the Problem Before Focusing on the Data

If you understand the problem clearly, you should be able to list some potential solutions to test in order to generate the best model. Understand that you will likely have to try out a few solutions before you land on a good working model.

Exploratory data analysis can help you understand your data, but you can't claim that the patterns you find generalize until you check them against previously unseen data. Failure to check could lead you in the wrong direction or reinforce stereotypes or bias.
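One common way to check a pattern against unseen data is to set part of the data aside before exploring. The sketch below assumes a simple holdout split. The data is synthetic, and the helper names (`split_holdout`, `accuracy`, `best_threshold`), the 25% holdout fraction, and the threshold rule are all assumptions made for illustration.

```python
import random


def split_holdout(rows, holdout_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (explore, holdout)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def accuracy(rows, threshold):
    """Share of rows where the rule 'x >= threshold means label 1' is right."""
    correct = sum(1 for x, y in rows if (x >= threshold) == (y == 1))
    return correct / len(rows)


def best_threshold(rows):
    """Pick the threshold that looks best on the rows we explored."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: accuracy(rows, t))


# Synthetic data: the label depends on x, plus noise.
data_rng = random.Random(42)
rows = []
for _ in range(200):
    x = data_rng.uniform(0, 10)
    y = 1 if x + data_rng.gauss(0, 2) > 5 else 0
    rows.append((x, y))

explore, holdout = split_holdout(rows)
threshold = best_threshold(explore)        # pattern found during exploration
explore_acc = accuracy(explore, threshold)
holdout_acc = accuracy(holdout, threshold)  # check on data not used to find it
print(f"threshold={threshold:.2f}")
print(f"accuracy on explored data: {explore_acc:.2f}")
print(f"accuracy on held-out data: {holdout_acc:.2f}")
```

The number to trust is the held-out accuracy. The threshold was tuned on the explored rows, so its score there tends to be optimistic.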