#10 - Machine Learning Flashcards by Justyna Neblik

What is Lift in model evaluation?

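Lift measures how much better a model performs than random selection. It is the ratio of the positive rate among the cases the model targets to the positive rate in the whole population, so a lift of 1 means no improvement over random choice.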
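
A minimal sketch of lift for the top-scored cases, assuming binary labels (1 = positive) and made-up model scores. The helper `lift_at_k` is a hypothetical name used for illustration, not a library function.

```python
def lift_at_k(y_true, scores, k):
    """Positive rate among the k highest-scored items divided by the overall positive rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_labels = [label for _, label in ranked[:k]]
    top_rate = sum(top_labels) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate

# Illustrative data: 3 positives out of 10, and the model ranks them highest.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.3, 0.5, 0.6, 0.05]

lift = lift_at_k(y_true, scores, k=3)
print(round(lift, 2))
```

Here the top three cases are all positive while the base rate is 0.3, so the model is roughly 3.3 times better than picking three cases at random.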

What is model fitting?

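Model fitting is the process of adjusting a model's parameters to the given observations. Goodness of fit indicates how well the fitted model matches those observations.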
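
As a rough illustration of fitting a model to observations and checking how well it fits, the sketch below fits a straight line by ordinary least squares and reports R². The data and the helper name `fit_line` are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R^2 on the same data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# Illustrative observations that are close to, but not exactly on, a line.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
slope, intercept, r_squared = fit_line(xs, ys)
print(slope, intercept, r_squared)
```

An R² close to 1 suggests the line matches these observations well. It says nothing yet about unseen data.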

What is sampling and its main advantage?

Sampling involves selecting a smaller, representative subset of a dataset to perform analysis. It saves time and resources while still providing meaningful insights, especially with large datasets.

What is probability sampling?

A method where each member of the population has a known, non-zero chance of being selected. Use when statistical accuracy is important. Avoid if you have limited access to the full population.

What is simple random sampling?

Each item has an equal chance of being selected, like drawing names from a hat. Use when the population is homogeneous. Avoid with very large populations, where listing and drawing from every member becomes impractical.
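
A small sketch of simple random sampling without replacement, using Python's standard `random.sample`. The population of 100 IDs and the seed are arbitrary choices for illustration.

```python
import random

# A seeded generator keeps the draw reproducible for this example.
rng = random.Random(42)

population = list(range(1, 101))        # hypothetical member IDs 1..100
sample = rng.sample(population, k=10)   # every member is equally likely to be drawn

print(sample)
```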

What is stratified sampling?

The population is divided into subgroups (strata) and samples are taken from each. Use when key groups must be represented. Avoid if strata are not well defined.
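
One way to sketch this is proportional stratified sampling: group the records by stratum, then draw the same fraction from each group. The function `stratified_sample` and the two-group data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction (at least one item) from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

# Illustrative population: 80 records in group "A" and 20 in group "B".
records = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]
sample = stratified_sample(records, key=lambda r: r[0], fraction=0.1)
counts = {g: sum(1 for r in sample if r[0] == g) for g in ("A", "B")}
print(counts)
```

Both groups appear in the sample in the same 80/20 proportion as in the population, which a small simple random sample does not guarantee.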

What is clustered sampling?

The population is divided into clusters, then a few clusters are randomly chosen and their members are sampled. Use for geographically spread data or when a full list of individuals is hard to obtain. Avoid if clusters are not internally diverse.
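
A minimal sketch of one-stage cluster sampling, assuming every member of a chosen cluster is included. The region names and the helper `cluster_sample` are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly pick whole clusters, then keep every member of the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical clusters, for example households grouped by region.
clusters = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1", "w2"],
}
chosen, members = cluster_sample(clusters, n_clusters=2)
print(chosen, members)
```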

What is non-probability sampling?

Members are chosen without random selection, so not every member has a known or equal chance of being selected. Use in exploratory research. Avoid when statistical generalization is needed.

What is convenience sampling?

Samples are taken from an easily accessible group. Use for quick insights or pilot testing. Avoid if you want unbiased, generalizable results.

What is quota sampling?

Samples are selected to match specific proportions of characteristics. Use when you need a balanced sample but can’t do random selection. Avoid if sampling bias is a concern.

What is snowball sampling?

Existing subjects recruit future subjects from among their contacts. Use in social research on hidden, rare, or hard-to-reach populations. Avoid if your study needs a representative, unbiased sample.

What is supervised learning?

A type of ML where the model learns from labeled data. Use when you have input-output pairs. Avoid when labels are not available.

What is unsupervised learning?

The model learns from unlabeled data to find patterns or structure. Use for clustering or dimensionality reduction. Avoid if the task requires specific output predictions.

What is semi-supervised learning?

Combines a small amount of labeled data with a large amount of unlabeled data. Use when labeling data is expensive. Avoid if you have plenty of labeled data.

What is reinforcement learning?

The model learns by interacting with an environment and receiving rewards or penalties. Use in robotics, gaming, real-time decision making. Avoid for static datasets without feedback loops.

What is classification in ML?

A supervised learning task where the output is a category or label. Use for tasks like spam detection or image labeling. Avoid if output is numeric or continuous.

What is regression in ML?

A supervised learning task where the output is a continuous value. Use for predicting prices or trends. Avoid if outputs are discrete classes.

What is clustering in ML?

An unsupervised learning task to group similar items together. Use when exploring structure in data. Avoid if specific labels or targets are needed.

What is dimensionality reduction?

Reduces the number of input features while preserving key information. Use to speed up models or for visualization. Avoid if interpretability of original features is critical.

What is data cleaning?

Removing or correcting wrong, incomplete, or inconsistent data. Use before training. Avoid skipping, as dirty data leads to poor models.

What is normalization?

Scaling values to a specific range, usually [0, 1] (min-max scaling). Use with distance-based algorithms. Avoid if the algorithm is scale-invariant.
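
A minimal sketch of min-max scaling to [0, 1], assuming a single numeric feature. The helper name `min_max_scale` is hypothetical.

```python
def min_max_scale(values):
    """Map values linearly so the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no range; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 30, 50])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```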

What is standardization (Z-score scaling)?

Centers each feature at mean 0 and rescales it to unit variance. Use for algorithms that are sensitive to feature scale or assume roughly Gaussian inputs. Avoid if interpretability is needed with original units.
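
A small sketch of Z-score scaling for one feature, using the population standard deviation from Python's `statistics` module. The helper `standardize` and the data are illustrative assumptions.

```python
import statistics

def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Illustrative data with mean 5 and population standard deviation 2.
z_scores = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z_scores)
```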

What is encoding categorical variables?

Converting categories into numeric form using label or one-hot encoding. Use when the algorithm requires numeric inputs. Avoid label encoding when categories have no natural order, and avoid one-hot encoding on high-cardinality features.
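
The two encodings can be sketched in plain Python as follows. The helpers `label_encode` and `one_hot_encode` are hypothetical names, and categories are ordered alphabetically here purely as a convention for the example.

```python
def label_encode(values):
    """Assign each distinct category an integer code (alphabetical order)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], categories

def one_hot_encode(values):
    """Represent each value as a 0/1 vector with a single 1 at its category's position."""
    codes, categories = label_encode(values)
    rows = [[1 if i == code else 0 for i in range(len(categories))] for code in codes]
    return rows, categories

colors = ["red", "green", "blue", "green"]
codes, categories = label_encode(colors)
one_hot, _ = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
print(codes)       # [2, 1, 0, 1]
print(one_hot)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

One-hot encoding adds one column per category, which is why it becomes costly when a feature has many distinct values.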

What is handling missing data?

Techniques include removing rows, filling with mean/median, or model-based imputation. Use with care depending on how much is missing. Avoid deleting data blindly.
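
A minimal sketch of mean and median imputation, assuming missing entries are represented as `None`. The helper `impute` and the ages are made up for illustration.

```python
import statistics

def impute(values, strategy="median"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "median":
        fill = statistics.median(observed)
    else:
        fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

ages = [25, None, 31, 40, None, 120]
median_filled = impute(ages, strategy="median")
mean_filled = impute(ages, strategy="mean")
print(median_filled)  # [25, 35.5, 31, 40, 35.5, 120]
print(mean_filled)    # [25, 54, 31, 40, 54, 120]
```

The extreme value 120 pulls the mean fill up to 54, while the median fill stays at 35.5, which is one reason the median is often preferred for skewed data.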

What is feature engineering?

Creating new input features to improve model performance. Use when domain knowledge can add value. Avoid adding too many features, which can lead to overfitting.

What is feature selection?

Choosing the most relevant features for the model. Use to improve accuracy and reduce overfitting. Avoid removing features blindly.

What is outlier detection and treatment?

Identifying values that are significantly different from the others, then removing, capping, or transforming them. Use to prevent skewed models. Avoid removing outliers that are meaningful, such as rare but genuine events.
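
One common detection rule, sketched here as an assumption rather than the only option, flags values more than 1.5 interquartile ranges outside the quartiles. The helper `iqr_outliers` and the readings are illustrative.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
print(outliers)  # [95]
```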

What is overfitting in machine learning?

Overfitting occurs when a model performs well on training data but poorly on new, unseen data. It is associated with low bias and high variance: the model fits noise in the training set. Example: deep, unpruned decision trees are prone to overfitting.
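
A toy illustration of this gap between training and test accuracy, under made-up data where the true rule is "label 1 when x >= 5" and two training labels are flipped as noise. A 1-nearest-neighbor model that memorizes the training set is compared with a simple threshold rule. All names and values are assumptions for this sketch.

```python
def nearest_neighbor_predict(train, x):
    """Predict the label of the closest training point (memorizes the training set)."""
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label

def threshold_predict(x, threshold=5):
    """A simple rule that ignores the noise in individual training labels."""
    return 1 if x >= threshold else 0

def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Training data with two noisy labels (x=3 and x=7 are flipped).
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
# Test data follows the true rule without noise.
test = [(1.1, 0), (2.9, 0), (3.2, 0), (6.8, 1), (7.1, 1), (8.9, 1)]

train_labels = [y for _, y in train]
test_labels = [y for _, y in test]

memorizer_train = accuracy([nearest_neighbor_predict(train, x) for x, _ in train], train_labels)
memorizer_test = accuracy([nearest_neighbor_predict(train, x) for x, _ in test], test_labels)
simple_train = accuracy([threshold_predict(x) for x, _ in train], train_labels)
simple_test = accuracy([threshold_predict(x) for x, _ in test], test_labels)

print(memorizer_train, memorizer_test)
print(simple_train, simple_test)
```

The memorizing model scores perfectly on the training set but worse on the test set, while the simpler rule scores lower on the noisy training set and higher on the test set.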

What is underfitting in machine learning?

Underfitting happens when a model is too simple to capture the data patterns and performs poorly even on training data. It is associated with high bias and low variance. Example: linear regression is prone to underfitting when the underlying relationship is nonlinear.

(29 cards)