Quiz 4 Flashcards
(32 cards)
What is Predictive Data Mining?
A form of supervised machine learning that uses input data with known outcomes to build a model that predicts the outcome for new observations.
What are the steps in the CRISP-DM process?
- Business Understanding
- Data Understanding
- Data Sampling
- Data Preparation
- Data Partitioning
- Model Construction
- Model Evaluation
- Deployment
What does a Confusion Matrix represent?
A table of actual versus predicted classes that shows how well the model is classifying outcomes.
What are the components of a Confusion Matrix?
- True Positive (TP)
- False Negative (FN)
- False Positive (FP)
- True Negative (TN)
What is the formula for Accuracy?
Accuracy = (TP + TN) / Total, where Total = TP + TN + FP + FN
What does Precision measure?
The proportion of predicted positives that are actually positive: Precision = TP / (TP + FP)
What does Recall / Sensitivity measure?
The proportion of actual positives that the model correctly identifies: Recall = TP / (TP + FN)
What is Specificity?
The proportion of actual negatives that the model correctly identifies: Specificity = TN / (TN + FP)
What is the F1 Score?
The harmonic mean of Precision and Recall: F1 = 2 × Precision × Recall / (Precision + Recall)
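The metric cards above can be checked with a small worked example. This is a minimal sketch in plain Python; the function name `classification_metrics` and the counts (40, 10, 20, 30) are made up for illustration and do not come from the quiz.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the standard metrics from the four confusion matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,                      # also called sensitivity
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.7 while precision is about 0.667 and recall is 0.8, which shows why a single metric rarely tells the whole story.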
What is the default cutoff value in classification?
0.5 (observations with a predicted probability above the cutoff are classified as positive)
What happens when the threshold is lowered?
More ‘yes’ predictions, so recall rises and precision usually falls.
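The threshold effect can be seen on a toy data set. This is a sketch only: the probabilities and labels are invented, and the helper names `classify` and `precision_recall` are hypothetical. It assumes an observation is classed as ‘yes’ when its probability is at or above the cutoff.

```python
def classify(probabilities, cutoff=0.5):
    """Label an observation 1 ('yes') when its probability reaches the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def precision_recall(actual, predicted):
    """Return (precision, recall) from actual and predicted 0/1 labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented predicted probabilities and true labels.
probs = [0.95, 0.80, 0.65, 0.55, 0.45, 0.40, 0.30, 0.20]
actual = [1, 1, 0, 1, 1, 0, 0, 0]

default_preds = classify(probs, cutoff=0.5)   # 4 'yes' predictions
lowered_preds = classify(probs, cutoff=0.3)   # 7 'yes' predictions

p_default, r_default = precision_recall(actual, default_preds)
p_lowered, r_lowered = precision_recall(actual, lowered_preds)
print(f"cutoff 0.5: precision={p_default:.2f}, recall={r_default:.2f}")
print(f"cutoff 0.3: precision={p_lowered:.2f}, recall={r_lowered:.2f}")
```

On this data, lowering the cutoff from 0.5 to 0.3 lifts recall from 0.75 to 1.0 and drops precision from 0.75 to about 0.57.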
What is a Lift Chart?
A visual tool that compares a model’s ability to rank positives vs. a random guess.
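One common way to get the numbers behind a lift chart is cumulative lift: rank observations by predicted score, then compare the positives captured so far with what random selection would be expected to capture. The sketch below assumes that definition; the function name `cumulative_lift` and the data are hypothetical.

```python
def cumulative_lift(actual, scores):
    """Cumulative lift after each observation, ranked by descending score.

    Lift = positives captured so far / positives expected from random picks.
    Assumes `actual` holds 0/1 labels with at least one positive.
    """
    ranked = [a for _, a in sorted(zip(scores, actual),
                                   key=lambda pair: pair[0], reverse=True)]
    base_rate = sum(actual) / len(actual)
    lifts = []
    captured = 0
    for n, label in enumerate(ranked, start=1):
        captured += label
        lifts.append(captured / (n * base_rate))
    return lifts


# Invented labels and scores.
lifts = cumulative_lift(actual=[1, 0, 1, 0], scores=[0.9, 0.8, 0.7, 0.6])
print(lifts)
```

A lift above 1 early in the ranking means the model finds positives faster than a random guess; the last value is always 1 because every observation has been selected by then.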
What does the ROC curve plot?
True Positive Rate (Recall) against False Positive Rate (1 – Specificity) across all cutoff values.
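The points on a ROC curve can be generated by sweeping the cutoff over the predicted scores. This is a minimal sketch with invented data; `roc_points` is a hypothetical helper, and it assumes both classes are present in `actual`.

```python
def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    Each distinct score is used as a cutoff, from highest to lowest.
    Assumes `actual` holds 0/1 labels and contains both classes.
    """
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # cutoff above every score: nothing predicted positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= cutoff and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= cutoff and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Invented labels and scores.
points = roc_points(actual=[1, 0, 1, 0], scores=[0.9, 0.8, 0.7, 0.6])
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always starts at (0, 0) and ends at (1, 1); a model no better than a random guess tracks the diagonal between them.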
Why not use linear regression for classification?
Linear regression produces continuous outputs that are not bounded between 0 and 1, so its predictions cannot be read as probabilities.
What transformation does logistic regression use?
The logit transformation.
What is the logistic model equation?
ln(p / (1 - p)) = β0 + β1x1 + β2x2 + …
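Solving the equation above for p gives p = 1 / (1 + e^(-z)), where z = β0 + β1x1 + β2x2 + …, which is how a fitted model turns inputs into a probability. A short sketch, with made-up coefficients and hypothetical function names:

```python
import math


def logit(p):
    """Log-odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1 - p))


def predict_probability(intercept, coefficients, x):
    """Invert the logit: p = 1 / (1 + exp(-z)), z = b0 + b1*x1 + b2*x2 + ..."""
    z = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1 / (1 + math.exp(-z))


# Made-up coefficients and inputs: z = -1.0 + 0.5*2.0 + 2.0*0.25 = 0.5
p = predict_probability(intercept=-1.0, coefficients=[0.5, 2.0], x=[2.0, 0.25])
print(f"p = {p:.3f}, log-odds = {logit(p):.3f}")
```

Whatever value z takes, p stays strictly between 0 and 1, which is the property linear regression lacks.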
What is k-Nearest Neighbours (k-NN)?
A method that classifies new observations based on the closest training points.
What distance metric is commonly used in k-NN?
Euclidean distance.
What is the effect of choosing k = 1 in k-NN?
Highly sensitive to noise in the training data; may overfit.
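The k-NN cards can be illustrated with a small from-scratch classifier. This is a sketch, not a reference implementation: the data are invented, `knn_predict` is a hypothetical name, and majority voting is assumed for combining the neighbours' labels.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k closest training points."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Invented data: (3, 3) is a lone "yes" sitting close to the "no" cluster.
points = [(1, 1), (1, 2), (2, 1), (3, 3), (5, 5), (6, 5)]
labels = ["no", "no", "no", "yes", "yes", "yes"]
query = (2.6, 2.6)

pred_k1 = knn_predict(points, labels, query, k=1)  # follows the single nearest point
pred_k3 = knn_predict(points, labels, query, k=3)  # outvoted by two "no" neighbours
print(pred_k1, pred_k3)
```

With k = 1 the prediction copies the single nearest point, so one unusual training observation flips the result; k = 3 smooths that out.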
What are key features of Classification and Regression Trees (CART)?
- Non-parametric model
- Tree-based structure
- Splits based on informative features
What is overfitting in the context of CART?
Growing a tree so complex that it fits noise in the training data, resulting in high variance and poor performance on new data.
What is the purpose of pruning in CART?
To improve generalisability by cutting back branches that do not reduce validation error.
What is an Ensemble Method?
Combines multiple weak models to create a stronger model.
What is Monte Carlo Simulation?
A technique to model uncertainty using random values from probability distributions.
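A small simulation makes the idea concrete. The profit model below is entirely hypothetical (normally distributed demand, uniformly distributed unit cost, fixed price and fixed cost); it is a sketch of the technique, not a model from the quiz.

```python
import random


def simulate_profit(n_trials=10_000, seed=42):
    """Estimate mean profit and probability of a loss by random sampling.

    All distributions and figures are illustrative assumptions.
    """
    rng = random.Random(seed)       # seeded so the run is repeatable
    price = 10.0
    fixed_cost = 3000.0
    profits = []
    for _ in range(n_trials):
        demand = max(rng.gauss(1000, 200), 0.0)   # units sold, never negative
        unit_cost = rng.uniform(4.0, 6.0)         # cost per unit
        profits.append(demand * (price - unit_cost) - fixed_cost)
    mean_profit = sum(profits) / n_trials
    prob_loss = sum(1 for p in profits if p < 0) / n_trials
    return mean_profit, prob_loss


mean_profit, prob_loss = simulate_profit()
print(f"mean profit: {mean_profit:.0f}, P(loss): {prob_loss:.3f}")
```

Each trial draws one random value per uncertain input; repeating this many times turns the input distributions into a distribution of outcomes, from which the mean and the chance of a loss can be read off.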