ML Part 1 Flashcards
(25 cards)
What is supervised learning?
A machine learning task where the model learns from labeled data.
What is unsupervised learning?
A task where the model finds patterns or structure in unlabeled data.
What is the difference between regression and classification?
Regression predicts continuous values; classification predicts discrete labels.
What is reinforcement learning?
A type of learning where an agent learns by interacting with an environment and receiving rewards or penalties, with the goal of maximizing cumulative reward.
What is a model in machine learning?
A mathematical function, with parameters learned from data, that maps inputs to outputs.
What is overfitting?
When a model fits noise or quirks of the training data, so it performs well on the training set but poorly on unseen data.
What is underfitting?
When a model is too simple to capture the underlying patterns in the data, so it performs poorly on both training and unseen data.
What is the bias-variance tradeoff?
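The tension between error from overly simple assumptions (high bias, underfitting) and error from sensitivity to the particular training sample (high variance, overfitting); reducing one tends to increase the other.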
How can you reduce overfitting?
Use regularization, more training data, or a simpler model; cross-validation does not reduce overfitting by itself, but it helps detect it and choose among models.
How can you reduce underfitting?
Use more complex models or add relevant features.
What is accuracy?
The proportion of correct predictions out of all predictions made.
What is precision?
The proportion of true positives among all predicted positives.
What is recall?
The proportion of true positives among all actual positives.
What is the F1 score?
The harmonic mean of precision and recall.
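Note: the four metric definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function name `classification_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was actually positive?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much did we find?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = classification_metrics(tp=8, fp=2, fn=4, tn=6)
print(metrics)
```

With these made-up counts, accuracy is 14/20, precision is 8/10, and recall is 8/12, which shows how the metrics can diverge on the same predictions.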
What is a confusion matrix?
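A table of counts comparing actual and predicted classes; for binary classification its cells are true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).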
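Note: a rough sketch of how the four cells could be tallied from actual and predicted labels. The function name `confusion_counts`, the `positive` parameter, and the example labels are all made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally TP, FP, FN, TN for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```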
What is a train-test split?
Dividing data into a training set and a test set to evaluate generalization.
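Note: one possible way to sketch a train-test split with only the standard library. The helper name `train_test_split`, the 20% test fraction, and the seed are assumptions chosen for the example; shuffling before splitting assumes the rows have no time ordering that must be preserved.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction as the test set."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))
train, test = train_test_split(data, test_fraction=0.2, seed=42)
print(len(train), len(test))  # 8 2
```

The model is fit only on `train`; `test` is touched once, at the end, to estimate how well it generalizes.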
What is k-fold cross-validation?
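Splitting the data into k folds, then training on k-1 folds and evaluating on the held-out fold, repeated k times so that each fold serves as the test fold exactly once; the k scores are usually averaged.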
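Note: a minimal sketch of how the k train/test index splits could be generated. The generator name `k_fold_indices` is hypothetical, and the folds here are contiguous and unshuffled for readability; in practice the indices are often shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Every sample lands in exactly one test fold, so each data point is used for evaluation once and for training k-1 times.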
Why use cross-validation?
To get a more reliable estimate of model performance on unseen data.
What is the purpose of a validation set?
To tune hyperparameters and choose among candidate models without touching the test set, which is reserved for the final evaluation.
What is data leakage?
When information that would not be available at prediction time, such as test-set data or the target itself, influences model training, producing overly optimistic performance estimates.
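Note: a common source of leakage is computing preprocessing statistics on the full dataset. The sketch below contrasts the leaky and the safe way to standardize a feature; the helper names `fit_scaler` and `apply_scaler` and the numbers are invented for illustration.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn the mean and standard deviation used for standardization."""
    return mean(values), pstdev(values)


def apply_scaler(values, mu, sigma):
    """Standardize values with previously learned statistics."""
    return [(v - mu) / sigma for v in values]


train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Leaky: the statistics see the test rows, so test information reaches training.
leaky_mu, leaky_sigma = fit_scaler(train + test)

# Safe: learn statistics from the training rows only, then reuse them on the test rows.
mu, sigma = fit_scaler(train)
train_scaled = apply_scaler(train, mu, sigma)
test_scaled = apply_scaler(test, mu, sigma)

print(leaky_mu, mu)
```

The two means differ because the leaky version has already seen the test values.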
What is a machine learning pipeline?
A sequence of data preprocessing and modeling steps applied consistently.
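Note: a toy sketch of the pipeline idea, assuming a single numeric feature. All class names (`MeanCenterer`, `MidpointClassifier`, `SimplePipeline`) are hypothetical, and the classifier assumes class 1 tends to have larger feature values than class 0.

```python
class MeanCenterer:
    """Preprocessing step: subtract the training mean."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class MidpointClassifier:
    """Toy model: threshold halfway between the two class means."""

    def fit(self, X, y):
        ones = [x for x, label in zip(X, y) if label == 1]
        zeros = [x for x, label in zip(X, y) if label == 0]
        self.threshold_ = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
        return self

    def predict(self, X):
        return [1 if x > self.threshold_ else 0 for x in X]


class SimplePipeline:
    """Apply each preprocessing step in order, then the final model."""

    def __init__(self, transformers, model):
        self.transformers = transformers
        self.model = model

    def fit(self, X, y):
        for step in self.transformers:
            X = step.fit(X).transform(X)  # fit on training data only
        self.model.fit(X, y)
        return self

    def predict(self, X):
        for step in self.transformers:
            X = step.transform(X)  # reuse statistics learned during fit
        return self.model.predict(X)


pipeline = SimplePipeline([MeanCenterer()], MidpointClassifier())
pipeline.fit([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])
preds = pipeline.predict([3.0, 7.0])
print(preds)  # [0, 1]
```

Because `predict` reuses the statistics learned in `fit`, new data goes through exactly the same preprocessing as the training data.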
What are the stages in a basic ML workflow?
Preprocessing → training → validation → testing → deployment.
What is model deployment?
Making a trained model available for use in production environments.
What is model inference?
Using a trained model to make predictions on new data.