Questions Flashcards
multiple choice (42 cards)
What is the difference between supervised and unsupervised learning?
Supervised learning uses labeled data; unsupervised learning uses unlabeled data.
Explain the concept of overfitting in machine learning. How can it be mitigated?
Overfitting is when the model learns the training data too closely, including its noise, so it performs well on training data but poorly on new data; it can be mitigated by techniques like cross-validation and regularization.
What is a confusion matrix and what are its components?
A table used to evaluate classification models; it includes true positives, true negatives, false positives, and false negatives.
Explain the difference between precision and recall.
Precision is the ratio of true positives to predicted positives; recall is the ratio of true positives to actual positives.
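To make the two ratios concrete, here is a minimal sketch (illustrative labels, not from any dataset) that derives the confusion-matrix counts from paired true/predicted labels and then computes precision and recall:

```python
# Illustrative sketch: confusion-matrix counts, then precision and recall,
# from paired true/predicted binary labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives

precision = tp / (tp + fp)  # TP / predicted positives
recall = tp / (tp + fn)     # TP / actual positives
```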
What is cross-validation and why is it important?
A technique for assessing model performance by training and evaluating on different data splits; it helps detect overfitting and gives a more reliable estimate of generalization.
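The mechanics of k-fold splitting can be sketched in a few lines (a simple round-robin fold assignment, not how any particular library implements it): every sample is held out exactly once, and each fold takes a turn as the test set.

```python
# Sketch of k-fold splitting: each sample is held out exactly once,
# so every fold serves as the test set while the rest is used for training.
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for n samples and k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(10, 5))
```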
What is the purpose of regularization in machine learning models?
To reduce the model complexity and prevent overfitting.
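A small numpy sketch of L2 (ridge) regularization shows the shrinking effect directly: the closed form adds a penalty term to the normal equations, and a larger penalty pulls the weight vector toward zero (synthetic data, illustrative only).

```python
import numpy as np

# Sketch of L2 (ridge) regularization via the closed form
# w = (X^T X + lam * I)^-1 X^T y; a larger lam shrinks the weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_small = ridge(X, y, lam=0.01)   # light regularization
w_large = ridge(X, y, lam=100.0)  # heavy regularization, smaller weights
```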
How does the k-means clustering algorithm work?
It partitions data into k clusters by alternating between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, minimizing the within-cluster sum of squared distances.
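The two alternating steps can be sketched on 1-D toy data (a bare-bones Lloyd's algorithm, not a production implementation):

```python
# Toy sketch of Lloyd's algorithm for k-means on scalar data:
# alternate an assignment step (nearest centroid) and an update step
# (centroid moves to the mean of its assigned points).
def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for x in points:                                   # assignment step
            nearest = min(range(len(centroids)),
                          key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        centroids = [sum(c) / len(c) if c else centroids[j]  # update step
                     for j, c in enumerate(clusters)]
    return centroids

centers = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], centroids=[0.0, 6.0])
```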
What is gradient descent and how is it used in machine learning?
An optimization algorithm used to minimize the loss function by iteratively updating the model’s parameters.
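The iterative update is easiest to see on a toy loss with a known minimum (a one-parameter sketch, not a real training loop):

```python
# Sketch of gradient descent on the quadratic loss L(w) = (w - 3)^2,
# whose gradient is 2 * (w - 3); the minimum is at w = 3.
def gradient_descent(w, lr=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (w - 3)   # dL/dw at the current w
        w = w - lr * grad    # step against the gradient
    return w

w_final = gradient_descent(w=0.0)  # converges toward 3
```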
Which of the following is an example of an ordinal feature?
Education Level (e.g., High School, Bachelor’s, Master’s)
What is the main difference between categorical and ordinal features?
Ordinal features have a meaningful order; nominal categorical features do not
Which of the following is an embedded method for feature selection?
LASSO regression
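The reason LASSO works as an embedded selector is that its L1 penalty sets some coefficients exactly to zero. The core mechanism, used in coordinate-descent solvers, is the soft-thresholding operator, sketched here on illustrative coefficient values:

```python
# Sketch of the soft-thresholding operator behind LASSO's sparsity:
# coefficients whose magnitude falls below the penalty become exactly
# zero, which is why LASSO doubles as an embedded feature selector.
def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

coefs = [2.5, -0.3, 0.1, -1.7]
shrunk = [soft_threshold(c, lam=0.5) for c in coefs]  # middle two -> 0.0
```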
Which of the following methods uses a model to evaluate subsets of features?
Wrapper
Why is overfitting a problem in machine learning?
The model does not generalize well to new data
What is the main objective of a Support Vector Machine?
To maximize the margin between classes
Which term is optimized in the SVM cost function?
Hinge loss with L2 regularization
What does the SVM hyperparameter C control?
The tradeoff between margin width and classification error
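The cost function and the role of C can be written out directly (a sketch of the primal soft-margin objective with hand-picked weights and toy points, not a trained model):

```python
# Sketch of the soft-margin SVM objective in primal form:
#   0.5 * ||w||^2  +  C * sum(max(0, 1 - y_i * (w . x_i + b)))
# Larger C penalizes margin violations more heavily (fewer errors,
# narrower margin); smaller C tolerates violations for a wider margin.
def svm_cost(w, b, X, y, C):
    l2 = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1 - yi * (sum(wi * xi for wi, xi in zip(w, x)) + b))
                for x, yi in zip(X, y))
    return l2 + C * hinge

X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, -1]   # the third point violates the margin
cost_small_C = svm_cost([1.0, 0.0], 0.0, X, y, C=0.1)
cost_large_C = svm_cost([1.0, 0.0], 0.0, X, y, C=10.0)
```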
Why does SVM not output class probabilities?
It only provides a separating hyperplane and a signed distance to it, which is not a calibrated probability
What is a known disadvantage of decision trees?
They tend to overfit the data
What happens when a decision tree is trained on noisy data?
It fits the noise, which increases the model's variance
Why are ensemble methods like Random Forest used with decision trees?
To reduce variance and improve generalization
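The variance-reduction effect can be illustrated with a small simulation (noisy stand-in "trees" rather than real decision trees): averaging many noisy predictors of the same target varies far less than any single one.

```python
import random

# Simulation sketch of why averaging reduces variance: each "tree" is a
# noisy predictor of the same target; the ensemble mean fluctuates much
# less than an individual tree (the core idea behind Random Forest).
random.seed(0)
target = 10.0
n_trees, n_trials = 50, 200

single_preds, ensemble_preds = [], []
for _ in range(n_trials):
    trees = [target + random.gauss(0, 2.0) for _ in range(n_trees)]
    single_preds.append(trees[0])                 # one high-variance tree
    ensemble_preds.append(sum(trees) / n_trees)   # averaged ensemble

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)
```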
Why can’t decision trees extrapolate?
Their predictions are piecewise constant, limited to leaf values learned from the training data
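A one-split "stump" makes the limitation obvious (a hand-built toy, not a fitted tree): any input beyond the training range still lands in a boundary leaf and gets that leaf's constant value.

```python
# Sketch of why trees cannot extrapolate: a single split predicts the
# mean target of each leaf, so inputs far outside the training range
# still receive the nearest leaf's constant value.
def stump_predict(x, threshold=5.0, left_mean=1.0, right_mean=9.0):
    """Piecewise-constant prediction from a single split."""
    return left_mean if x < threshold else right_mean

in_range = stump_predict(6.0)     # falls in the right leaf
far_out = stump_predict(1000.0)   # same right-leaf constant, no trend
```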
Which parameter in Gradient Boosting controls learning rate?
`eta` or `learning_rate` in most APIs
What is the purpose of using subsamples in Stochastic Gradient Boosting?
To reduce variance and increase robustness
What happens when you decrease the learning rate in Gradient Boosting?
You need more trees to maintain accuracy
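The tradeoff can be sketched with a heavily simplified boosting loop (each "tree" just fits the current residual on a scalar target): a smaller learning rate shrinks each correction, so more rounds are needed to reach the same error.

```python
# Toy sketch of shrinkage in gradient boosting: each round a "tree" fits
# the current residual and is added scaled by the learning rate. A lower
# learning rate needs more rounds (trees) to reach the same accuracy.
def rounds_to_fit(target, lr, tol=1e-3):
    pred, rounds = 0.0, 0
    while abs(target - pred) > tol:
        residual = target - pred   # what the next "tree" must explain
        pred += lr * residual      # shrunken additive update
        rounds += 1
    return rounds

fast = rounds_to_fit(1.0, lr=0.3)   # fewer rounds
slow = rounds_to_fit(1.0, lr=0.05)  # many more rounds
```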