Supervised Learning Flashcards
What is Supervised Learning?
A type of ML where the model is trained on labeled data, learning from known answers.
Supervised Learning relies on a dataset that includes input-output pairs.
What are examples of features used in Supervised Learning (e.g., for evaluating a car)?
- Buying Price
- Maintenance Cost
- Number of Doors
- Seating Capacity
- Luggage Boot Size
- Safety Rating
Features are the inputs from which the model learns to predict the labeled output.
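A minimal sketch of one labeled example (an input-output pair) built from the six features above. The key names, the feature values, and the label "acceptable" are all hypothetical and chosen only for illustration.

```python
# One labeled training example: six input features plus the known answer.
# All values and the label below are hypothetical illustrations.
example = {
    "features": {
        "buying_price": "low",
        "maintenance_cost": "medium",
        "number_of_doors": 4,
        "seating_capacity": 5,
        "luggage_boot_size": "big",
        "safety_rating": "high",
    },
    "label": "acceptable",  # the output the model should learn to predict
}

# A supervised dataset is a collection of such (features, label) pairs.
dataset = [example]
inputs = [row["features"] for row in dataset]
outputs = [row["label"] for row in dataset]
```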
What is Predictive Modeling?
Using patterns a model has learned from existing data to predict outcomes for new data.
Predictive modeling is a core aspect of Supervised Learning.
What is the difference between Regression and Classification?
- Regression → Predicts continuous values (e.g., house prices)
- Classification → Assigns data into categories (e.g., spam or not spam)
Understanding the difference is crucial for choosing the right model.
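A small sketch of the contrast: the regression function returns a number, the classification function returns a category. Both rules (the price formula and the spam threshold) are made-up assumptions, not learned models.

```python
def predict_price(size_sqm: float) -> float:
    """Regression-style output: a continuous value.

    Hypothetical rule: a base of 50000 plus 2000 per square metre.
    """
    return 50000.0 + 2000.0 * size_sqm


def classify_email(num_suspicious_words: int) -> str:
    """Classification-style output: one label from a fixed set.

    Hypothetical rule: three or more suspicious words means spam.
    """
    return "spam" if num_suspicious_words >= 3 else "not spam"


price = predict_price(100.0)   # 250000.0, a point on a continuous scale
label = classify_email(5)      # "spam", one of two categories
```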
What is Linear Regression?
A method to find the best-fit line y = mx + c, where m is the slope and c is the intercept.
Simple linear regression models the relationship between one input variable (x) and one continuous output (y).
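A minimal sketch of fitting the line, assuming "best fit" means ordinary least squares (the usual choice, though the card does not name it). The helper name fit_line and the sample data are illustrative.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the line must pass through the point of means.
    c = mean_y - m * mean_x
    return m, c


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]      # lies exactly on y = 2x + 1
m, c = fit_line(xs, ys)        # m = 2.0, c = 1.0
prediction = m * 5.0 + c       # predicted y for a new x = 5.0
```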
What are the limitations of Linear Regression?
- Fits non-linear relationships poorly
- Sensitive to outliers, which can pull the line away from most of the data
These limitations can affect the accuracy of predictions.
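To illustrate the outlier problem, this sketch fits a least-squares line twice: once on clean data and once with a single outlier. The least-squares criterion, the helper name fit_line, and the data are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
clean_ys = [2.0, 4.0, 6.0, 8.0, 10.0]     # lies exactly on y = 2x
outlier_ys = [2.0, 4.0, 6.0, 8.0, 30.0]   # same data, last point is an outlier

clean_m, clean_c = fit_line(xs, clean_ys)       # slope 2.0, intercept 0.0
skewed_m, skewed_c = fit_line(xs, outlier_ys)   # slope 6.0, intercept -8.0
```

One bad point out of five triples the slope, so the fitted line no longer describes the other four points well.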
What is a Decision Tree?
A flowchart-like structure where each decision leads to an outcome.
Decision Trees are intuitive and easy to interpret.
What is the process of creating a Decision Tree?
- Pick the best feature
- Split the data into groups
- Keep splitting until groups are pure (or a stopping rule, such as a maximum depth, is reached)
Each split is chosen so that the resulting groups are as uniform in their labels as possible.
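The first two steps can be sketched as a search for the split that leaves the purest groups. This example assumes Gini impurity as the measure of "best" (one common choice; the card does not name a criterion) and numeric features; the function names and data are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 means the group is pure (only one class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) of the best split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must put data on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


rows = [[2, 1], [3, 5], [10, 2], [11, 6]]
labels = ["no", "no", "yes", "yes"]
split = best_split(rows, labels)   # (0, 3, 0.0): feature 0 at <= 3 gives pure groups
```

A full tree would apply best_split again to each resulting group until the groups are pure or a stopping rule is reached.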
What is Random Forest?
A collection of decision trees whose predictions are combined to improve accuracy and reduce overfitting.
Random Forest is an ensemble method that enhances model performance.
How does Random Forest work?
- Train many Decision Trees, each on a random subset of the data
- Consider a random subset of features at each split
- Combine all tree predictions (majority vote for classification, average for regression)
This method helps in averaging out errors from individual trees.
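A deliberately simplified sketch of the three steps. To stay short it assumes each "tree" is a one-split stump, draws the random data subsets with replacement (bootstrap sampling), and picks the split by counting misclassifications. All function names and the data are illustrative, not a reference implementation.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature_indices):
    """Fit a one-split tree using only the allowed features."""
    best = None
    for f in feature_indices:
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            left_label, right_label = majority(left), majority(right)
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No usable split in this sample: always predict the majority label.
        label = majority(labels)
        return (feature_indices[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features considered per split
    forest = []
    for _ in range(n_trees):
        # Step 1: random data subset, drawn with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Step 2: random feature subset for this stump's single split.
        feats = rng.sample(range(n_features), k)
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    # Step 3: combine all tree predictions by majority vote.
    return majority([predict_stump(stump, row) for stump in forest])


rows = [[1, 1], [2, 2], [3, 3], [8, 8], [9, 9], [10, 10]]
labels = ["a", "a", "a", "b", "b", "b"]
forest = train_forest(rows, labels)
low = predict_forest(forest, [1, 1])
high = predict_forest(forest, [10, 10])
```

Because a stump has only one split, choosing the feature subset once per stump is the same as choosing it per split; deeper trees would redraw the subset at every split.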
What is k-Nearest Neighbors (k-NN)?
A method that classifies new data points based on the ‘k’ closest points in the dataset.
k-NN is a simple yet effective classification algorithm.
What is the process for k-NN classification?
- Store the data
- Choose k
- Measure the distance from the new point to each stored point
- Pick the k nearest
- Count votes & classify based on majority
The choice of ‘k’ can significantly impact the classification result.
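The steps above can be sketched in a few lines, assuming Euclidean distance (the card does not specify a distance measure). The function name knn_classify and the data are illustrative; the example also shows the same query changing class when k changes.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest stored points."""
    # Measure the distance from the query to every stored point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Pick the k nearest.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Count votes and classify by majority.
    return Counter(nearest_labels).most_common(1)[0][0]


# Store the data: one "blue" point sits among the "red" ones.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["blue", "red", "red", "red", "blue", "blue"]
query = (0.1, 0.1)

with_k1 = knn_classify(points, labels, query, k=1)   # "blue": nearest single point
with_k3 = knn_classify(points, labels, query, k=3)   # "red": two of three neighbors
```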
What is a limitation of k-NN?
Prediction is slow for large datasets.
In its basic form, each prediction compares the new point with every stored point, so the cost grows with the size of the dataset.
List the main concepts of Supervised Learning.
- Uses labeled data
- Regression vs. Classification
- Linear Regression
- Decision Trees
- Random Forest
- k-NN
These concepts form the foundation of supervised learning techniques.
What is the goal of Linear Regression?
To find the best-fit line that represents the relationship between variables.
The fitted line can then be used to predict the output for new input values.
True or False: Decision Trees can overfit.
True
Overfitting occurs when the model learns noise in the training data instead of the general pattern, so it performs worse on new data.
Fill in the blank: Random Forest is an army of _______.
[Decision Trees]
This metaphor highlights the ensemble nature of the Random Forest algorithm.