Questions Flashcards

multiple choice (42 cards)

1
Q

What is the difference between supervised and unsupervised learning?

A

Supervised learning trains on labeled data (inputs paired with known outputs); unsupervised learning finds structure, such as clusters, in unlabeled data.

2
Q

Explain the concept of overfitting in machine learning. How can it be mitigated?

A

Overfitting is when a model fits the training data too closely, including its noise, and therefore performs poorly on new data; it can be mitigated with techniques like cross-validation and regularization.

3
Q

What is a confusion matrix and what are its components?

A

A table used to evaluate classification models; it includes true positives, true negatives, false positives, and false negatives.
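A minimal sketch of how the four counts appear in practice, assuming scikit-learn is available; the labels and predictions below are made up for illustration.

```python
# Hypothetical illustration using scikit-learn's confusion_matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual labels (made up)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (made up)

# For binary labels {0, 1} the matrix is laid out as:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```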

4
Q

Explain the difference between precision and recall.

A

Precision is the ratio of true positives to predicted positives; recall is the ratio of true positives to actual positives.
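As a quick worked example, both metrics follow directly from the confusion-matrix counts; the numbers below reuse the illustrative counts from the sketch above.

```python
# Precision and recall from confusion-matrix counts (values are illustrative).
tp, fp, fn = 3, 1, 1

precision = tp / (tp + fp)   # of everything predicted positive, how much was correct
recall = tp / (tp + fn)      # of everything actually positive, how much was found

print(f"precision={precision:.2f}, recall={recall:.2f}")  # 0.75 and 0.75
```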

5
Q

What is cross-validation and why is it important?

A

A technique that assesses model performance by repeatedly training on part of the data and validating on the held-out part (e.g., k folds); it helps detect overfitting and gives a more reliable estimate of how well the model generalizes.
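A minimal sketch of k-fold cross-validation, assuming scikit-learn and its bundled iris dataset; the model choice is arbitrary.

```python
# 5-fold cross-validation sketch with scikit-learn (toy data, illustrative only).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each fold is held out once for validation while the rest is used for training.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```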

6
Q

What is the purpose of regularization in machine learning models?

A

To constrain model complexity (for example, by penalizing large weights) and prevent overfitting.
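An illustrative sketch, assuming scikit-learn: L2 (ridge) regularization shrinks the coefficients relative to plain least squares, which is one way of limiting model complexity. The data and alpha value are made up.

```python
# Illustrative sketch: L2 regularization (Ridge) shrinks coefficients compared
# with ordinary least squares, limiting model complexity.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                 # small noisy dataset (made up)
y = X[:, 0] * 3.0 + rng.normal(size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)           # alpha = regularization strength

# The ridge coefficients are typically smaller in total magnitude.
print(np.abs(ols.coef_).sum(), np.abs(ridge.coef_).sum())
```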

7
Q

How does the k-means clustering algorithm work?

A

It partitions data into k clusters by repeatedly assigning each point to the nearest centroid and recomputing the centroids, minimizing the within-cluster sum of squared distances.
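A compact NumPy sketch of the two-step loop (Lloyd's algorithm); it is a simplification that does not handle empty clusters.

```python
# Minimal NumPy sketch of the k-means (Lloyd's) loop: assign points to the
# nearest centroid, then recompute centroids, until the centroids stop moving.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initialization
    for _ in range(n_iter):
        # Assignment step: index of the closest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        # (assumes no cluster ends up empty; a real implementation handles that).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```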

8
Q

What is gradient descent and how is it used in machine learning?

A

An optimization algorithm used to minimize the loss function by iteratively updating the model’s parameters in the direction of the negative gradient.
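A tiny worked example on a one-parameter loss, showing the update rule without any libraries.

```python
# Gradient descent sketch on a one-parameter quadratic loss L(w) = (w - 3)^2.
# The gradient is dL/dw = 2 * (w - 3); each step moves against the gradient.
w = 0.0
learning_rate = 0.1

for step in range(100):
    grad = 2 * (w - 3)
    w -= learning_rate * grad

print(w)  # converges toward the minimum at w = 3
```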

9
Q

Which of the following is an example of an ordinal feature?

A

Education Level (e.g., High School, Bachelor’s, Master’s)

10
Q

What is the main difference between categorical and ordinal features?

A

Ordinal features have a meaningful order; categorical (nominal) features do not.

11
Q

Which of the following is an embedded method for feature selection?

A

LASSO regression
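An illustrative sketch, assuming scikit-learn: the L1 penalty drives most coefficients exactly to zero, which is why LASSO doubles as an embedded feature selector. The data and alpha value are made up.

```python
# Illustrative sketch: LASSO (L1) drives some coefficients exactly to zero,
# so the non-zero coefficients act as the selected features.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] * 5.0 + X[:, 1] * 3.0 + rng.normal(size=100)  # only 2 informative features

lasso = Lasso(alpha=0.5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)   # indices of features kept by the model
print(selected)                          # typically close to [0, 1]
```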

12
Q

Which of the following methods uses a model to evaluate subsets of features?

A

Wrapper

13
Q

Why is overfitting a problem in machine learning?

A

The model does not generalize well to new data

14
Q

What is the main objective of a Support Vector Machine?

A

To maximize the margin between classes

15
Q

Which term is optimized in the SVM cost function?

A

Hinge loss with L2 regularization

16
Q

What does the SVM hyperparameter C control?

A

The tradeoff between margin width and classification error
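A sketch of the C tradeoff with scikit-learn's linear SVC on synthetic data; the C values are arbitrary and only meant to show the contrast.

```python
# Sketch of the C tradeoff in a linear SVM: small C tolerates margin violations
# (wider margin, stronger regularization), large C penalizes them heavily.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

soft = SVC(kernel="linear", C=0.01).fit(X, y)   # wide margin, more errors tolerated
hard = SVC(kernel="linear", C=100.0).fit(X, y)  # narrow margin, errors penalized

# The soft-margin model usually keeps more support vectors.
print(len(soft.support_), len(hard.support_))
```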

17
Q

Why does SVM not output class probabilities?

A

It only provides a separating hyperplane and signed distances to it, not calibrated class probabilities

18
Q

What is a known disadvantage of decision trees?

A

They tend to overfit the data

19
Q

What happens when a decision tree is trained on noisy data?

A

It increases variance

20
Q

Why are ensemble methods like Random Forest used with decision trees?

A

To reduce variance and improve generalization

21
Q

Why can’t decision trees extrapolate?

A

Their predictions are piecewise-constant values learned from the training data, so they cannot produce outputs beyond the range seen during training

22
Q

Which parameter in Gradient Boosting controls learning rate?

A

Eta (exposed as learning_rate in most APIs)

23
Q

What is the purpose of using subsamples in Stochastic Gradient Boosting?

A

To reduce variance and increase robustness

24
Q

What happens when you decrease the learning rate in Gradient Boosting?

A

You need more trees to maintain accuracy
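A sketch of this tradeoff with scikit-learn's GradientBoostingClassifier on synthetic data; the exact values are arbitrary. Setting the same estimator's subsample parameter below 1.0 gives the stochastic variant from the previous card.

```python
# Sketch of the learning-rate / number-of-trees tradeoff in gradient boosting:
# a smaller learning_rate usually needs more estimators for the same accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

fast = GradientBoostingClassifier(learning_rate=0.3, n_estimators=50, random_state=0)
slow = GradientBoostingClassifier(learning_rate=0.03, n_estimators=500, random_state=0)

print(cross_val_score(fast, X, y, cv=3).mean())
print(cross_val_score(slow, X, y, cv=3).mean())  # similar score, many more trees
```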

25
Q

Which of the following is a core idea behind boosting?

A

Training models sequentially, with each new model correcting the errors of the previous ones

26
Q

Which clustering method works better with non-spherical cluster shapes?

A

DBSCAN

27
Q

What is the key limitation of K-Means clustering?

A

Requires specifying number of clusters in advance

28
Q

What are the two main hyperparameters in DBSCAN?

A

eps and min_samples

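A minimal sketch with scikit-learn's DBSCAN on the classic two-moons dataset, which also illustrates the non-spherical-clusters point from card 26; the eps and min_samples values are made up.

```python
# DBSCAN sketch: eps sets the neighborhood radius, min_samples the number of
# neighbors required for a core point; noise points are labeled -1.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)  # non-spherical clusters

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print(set(labels))   # e.g. {0, 1} for the two moons, plus -1 if any noise
```
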
29
Q

Which method is best if we need a hierarchy between clusters?

A

Hierarchical Clustering

30
Q

Which rule is used to compute gradients in deep networks?

A

Chain rule

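A tiny worked example of the chain rule on a one-weight "network", with the backward pass written out by hand.

```python
# Chain-rule sketch for a tiny two-step computation: y = w * x, loss = (y - t)^2.
# Backpropagation multiplies local derivatives along the computation graph.
x, w, t = 2.0, 0.5, 3.0

# Forward pass.
y = w * x                 # y = 1.0
loss = (y - t) ** 2       # loss = 4.0

# Backward pass (chain rule): dloss/dw = dloss/dy * dy/dw.
dloss_dy = 2 * (y - t)    # -4.0
dy_dw = x                 # 2.0
dloss_dw = dloss_dy * dy_dw
print(dloss_dw)           # -8.0
```
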
31
Q

What is the main cost of the backward pass in neural networks?

A

Extra computation time and memory, since intermediate activations from the forward pass must be stored

32
Q

Which library is most commonly used for deep learning in Python?

A

TensorFlow

33
Q

What is the common strategy when applying deep learning to vision tasks?

A

Use pre-trained weights and fine-tune

34
Q

Which network is most commonly used in Image2Image translation tasks?

A

U-Net

35
Q

Why are multi-resolution features important in dense prediction tasks?

A

To capture both local and global information

36
Q

Which of the following is often true in computer vision tasks?

A

Careful metric selection is critical

37
Q

What is tokenization in NLP?

A

A process of splitting text into smaller units such as words or subwords

38
Q

What is the purpose of word embeddings like Word2Vec or GloVe?

A

To represent words as dense vectors capturing semantic relationships

39
Q

Which architecture is the foundation of models like BERT and GPT?

A

Transformer

40
Q

What is the attention mechanism used for in NLP models?

A

To dynamically weigh the importance of different words in a sequence

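A minimal NumPy sketch of scaled dot-product attention, the core operation behind this mechanism; the shapes and values are made up for illustration.

```python
# Scaled dot-product attention: each query is compared with all keys, the scores
# are softmax-normalized, and the values are averaged with those weights, so
# every position can attend to every other position.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over keys
    return weights @ V                                     # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 tokens, dimension 8 (made-up shapes)
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```
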
41
Q

What is the primary difference between GPT and BERT?

A

BERT uses a bidirectional transformer; GPT uses a unidirectional one

42
Q

Which of the following is a common task in NLP?

A

Named Entity Recognition (NER)