Interviews Flashcards

1
Q

What’s the trade-off between bias and variance?

A

Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm you’re using. High bias tends to cause underfitting.

Variance is error due to the model’s sensitivity to small fluctuations in the training data, which usually comes from too much complexity in the learning algorithm you’re using. High variance tends to cause overfitting.

If you make the model more complex and add more variables, you’ll lose bias but gain some variance. To get the lowest total error, you have to trade off bias against variance.
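A rough way to see both kinds of error is to compare a deliberately too-simple model with a deliberately too-flexible one. The sketch below is only an illustration: the ground truth y = x^2, the noise level, and the two models (a constant predictor and a 1-nearest-neighbor lookup) are all assumptions chosen for the demo.

```python
import random

def sample(n, rng):
    """Draw n noisy points from a hypothetical ground truth y = x^2."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [x * x + rng.gauss(0, 0.1) for x in xs]
    return xs, ys

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
train_x, train_y = sample(50, rng)
test_x, test_y = sample(50, rng)

# High-bias model: ignores x and always predicts the training mean.
mean_y = sum(train_y) / len(train_y)

def constant_model(x):
    return mean_y

# High-variance model: 1-nearest-neighbor memorizes every noisy training point.
def one_nn_model(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

for name, model in [("constant", constant_model), ("1-NN", one_nn_model)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 4),
          "test MSE:", round(mse(model, test_x, test_y), 4))
```

The 1-NN model gets zero error on the training set because it memorizes it, yet it still makes errors on the test set. The constant model's error comes from its assumption that x does not matter at all.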

2
Q

What is the difference between supervised and unsupervised machine learning?

A

Supervised learning requires labeled outcomes to train on; unsupervised learning does not, and instead looks for structure in unlabeled data.

3
Q

How is KNN different from k-means clustering?

A

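K-Nearest Neighbors is a supervised classification algorithm, while k-means clustering is an unsupervised clustering algorithm. KNN needs labeled points so it can classify a new point by the labels of its k nearest neighbors; k-means needs only unlabeled points and a number of clusters k, and groups the points around k centroids.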
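The contrast is easiest to see side by side. Here is a minimal sketch on one-dimensional data; the function names, the toy points, and the starting centroids are all made up for illustration.

```python
from collections import Counter

def knn_predict(points, labels, query, k=3):
    """Supervised: vote among the labels of the k training points closest to query."""
    nearest = sorted(range(len(points)), key=lambda i: abs(points[i] - query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def kmeans_1d(points, centroids, iterations=10):
    """Unsupervised: no labels, just alternate assignment and centroid updates."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            closest = min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            clusters[closest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["a", "a", "a", "b", "b", "b"]

predicted = knn_predict(points, labels, query=1.1)    # uses the labels
centers = kmeans_1d(points, centroids=[0.0, 6.0])     # never sees the labels
print(predicted, centers)
```

Note that the "k" means something different in each: the number of neighbors that vote in KNN, and the number of clusters in k-means.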

4
Q

Explain how a ROC curve works.

A

The ROC curve plots the true positive rate against the false positive rate at various classification thresholds. It’s often used as a proxy for the trade-off between the sensitivity of the model (true positive rate) and the fall-out, or the probability that it will trigger a false alarm (false positive rate).
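To make the "various thresholds" part concrete, here is a small sketch that sweeps a threshold over some model scores and records one (false positive rate, true positive rate) point per threshold. The scores and labels are invented for the example.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) pairs, one per distinct threshold, from strict to lenient."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

scores = [0.9, 0.8, 0.6, 0.4, 0.2]   # hypothetical model outputs
labels = [1, 1, 0, 1, 0]             # 1 = actual positive, 0 = actual negative

curve = roc_points(scores, labels)
for fpr, tpr in curve:
    print(round(fpr, 2), round(tpr, 2))
```

Lowering the threshold can only add flagged examples, so both rates move from 0 towards 1 as you walk along the curve.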

5
Q

Define precision and recall.

A

Recall is also known as the true positive rate: the number of positives your model correctly identifies compared to the actual number of positives there are throughout the data. Precision is also known as the positive predictive value: the number of positives your model correctly identifies compared to the total number of positives it claims.
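Both definitions come down to counting true positives, false positives, and false negatives. A minimal sketch, with made-up labels and predictions:

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of claimed positives, how many are real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real positives, how many were found
    return precision, recall

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Here the model claims three positives and two of them are right (precision 2/3), while it finds only two of the four actual positives (recall 1/2).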

6
Q

What is Bayes’ Theorem? How is it useful in a machine learning context?

A

Bayes’ Theorem gives you the posterior probability of an event given prior knowledge: P(A|B) = P(B|A) P(A) / P(B).

Bayes’ Theorem is the basis behind a branch of machine learning that most notably includes the Naive Bayes classifier.
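A quick worked example helps here. The numbers below (a 1% base rate, a test that catches 90% of true cases and raises a false alarm 5% of the time) are hypothetical, as is the function name.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes' Theorem."""
    # Total probability of seeing a positive test, from both groups.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 4))
```

Even with a fairly accurate test, the posterior stays low (around 15%) because the prior is so small. That is the kind of update a Bayesian classifier performs for every class.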

7
Q

Why is “Naive” Bayes naive?

A

Naive Bayes is considered “Naive” because it makes an assumption that is virtually impossible to see in real-life data: the conditional probability is calculated as the pure product of the individual probabilities of the components. This implies that the features are completely independent of one another given the class, a condition probably never met in real life.
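The "pure product" is literal, as this small sketch shows. The classes, words, and probabilities are invented for illustration and are not learned from any real data.

```python
from math import prod

def naive_bayes_posterior(priors, likelihoods, features):
    """Score each class as prior * product of per-feature likelihoods, then normalize."""
    scores = {
        cls: prior * prod(likelihoods[cls][f] for f in features)
        for cls, prior in priors.items()
    }
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}

# Hypothetical P(class) and P(word appears | class) values.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

result = naive_bayes_posterior(priors, likelihoods, ["free", "meeting"])
print(result)
```

Multiplying the per-word likelihoods is only valid if the words are independent given the class, which is exactly the naive assumption.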

8
Q

Explain the difference between L1 and L2 regularization.

A

L2 regularization tends to spread error among all the terms, shrinking every weight a little without eliminating any, while L1 is more sparse, with many weights driven to exactly 0 and only a few variables kept. L1 corresponds to setting a Laplace prior on the terms, while L2 corresponds to a Gaussian prior.
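One simplified way to see the difference is the single-weight case: minimizing 0.5*(b - w)^2 plus a penalty on b has a closed-form answer for each penalty. The sketch below uses those closed forms; the weights and the penalty strength `lam` are arbitrary example values, and real models with correlated features will not decouple this cleanly.

```python
def l1_shrink(w, lam):
    """Minimizer of 0.5*(b - w)**2 + lam*abs(b): soft-thresholding."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # small weights are set exactly to zero

def l2_shrink(w, lam):
    """Minimizer of 0.5*(b - w)**2 + 0.5*lam*b**2: proportional shrinkage."""
    return w / (1.0 + lam)

weights = [3.0, -0.4, 0.2, -2.5]
lam = 0.5

l1_weights = [l1_shrink(w, lam) for w in weights]
l2_weights = [l2_shrink(w, lam) for w in weights]
print(l1_weights)
print(l2_weights)
```

The L1 version zeroes out the two small weights entirely, while the L2 version keeps all four but makes each one smaller.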

9
Q

What’s the difference between Type I and Type II error?

A

Type I error is a false positive, while Type II error is a false negative. Briefly stated, Type I error means claiming something has happened when it hasn’t, while Type II error means that you claim nothing is happening when in fact something is.

10
Q

What’s a Fourier transform?

A

A Fourier transform converts a signal from time to frequency domain — it’s a very common way to extract features from audio signals or other time series such as sensor data.
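As a small illustration, the sketch below runs a naive discrete Fourier transform on a synthetic signal and reads off the dominant frequency. The 5 Hz sine wave and the 64 Hz sample rate are assumptions for the demo; in practice you would use a fast FFT routine from a numerical library rather than this O(n^2) loop.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform: one complex coefficient per frequency bin."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

sample_rate = 64  # samples per second (hypothetical)
signal = [math.sin(2 * math.pi * 5 * t / sample_rate) for t in range(sample_rate)]

spectrum = dft(signal)
half = spectrum[: len(signal) // 2]  # upper half mirrors the lower half for real input
dominant_bin = max(range(len(half)), key=lambda k: abs(half[k]))
dominant_hz = dominant_bin * sample_rate / len(signal)
print(dominant_hz)
```

The time-domain signal is 64 numbers that wiggle; the frequency-domain view is a single large peak at 5 Hz, which is a far more compact feature.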

11
Q

What is deep learning, and how does it contrast with other machine learning algorithms?

A

Deep learning is a subset of machine learning that is concerned with neural networks: how to use backpropagation and certain principles from neuroscience to more accurately model large sets of unlabelled or semi-structured data. It contrasts with many other machine learning algorithms by learning representations of the data through the layers of a neural net rather than relying on hand-engineered features. It can be used in supervised as well as unsupervised settings.

12
Q

What’s the difference between a generative and discriminative model?

A

A generative model learns how the data in each category is distributed, while a discriminative model simply learns the distinction between different categories of data. Discriminative models will generally outperform generative models on classification tasks.

13
Q

What cross-validation technique would you use on a time series dataset?

A

Instead of using standard k-fold cross-validation, you have to pay attention to the fact that a time series is not randomly distributed data: it is inherently ordered chronologically. If a pattern emerges in later time periods, for example, a model trained on randomly shuffled folds may still pick up on it even if that effect doesn’t hold in earlier years. A common alternative is forward chaining, where each fold trains on past data and tests only on the period that follows it.
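One common way to respect that ordering is forward chaining: every fold trains on an initial stretch of the series and tests on the block right after it. A minimal sketch of the index bookkeeping, where the function name and fold sizing are illustrative choices:

```python
def forward_chaining_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) where training always precedes testing."""
    fold_size = n_samples // (n_folds + 1)
    for fold in range(1, n_folds + 1):
        train_end = fold * fold_size
        # The last fold absorbs any leftover samples.
        test_end = n_samples if fold == n_folds else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))

splits = list(forward_chaining_splits(n_samples=10, n_folds=4))
for train, test in splits:
    print("train:", train, "test:", test)
```

The training window grows with each fold, and no fold ever sees observations from after its test period.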

14
Q

How is a decision tree pruned?

A

Pruning is what happens in decision trees when branches that have weak predictive power are removed in order to reduce the complexity of the model and increase the predictive accuracy of a decision tree model. Pruning can happen bottom-up and top-down, with approaches such as reduced error pruning and cost complexity pruning.

15
Q

Which is more important to you: model accuracy or model performance?

A

This question tests your grasp of the nuances of machine learning model performance. Machine learning interview questions often look towards the details. A model with higher accuracy can still have worse predictive power: on a heavily imbalanced dataset, a model that always predicts the majority class is highly accurate but never detects the cases you care about.
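A tiny example shows how accuracy and usefulness can come apart. The dataset below is hypothetical: 1,000 cases with only 10 positives, scored against a "model" that always predicts negative.

```python
y_true = [1] * 10 + [0] * 990   # 1 = the rare case you actually care about
y_pred = [0] * 1000             # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / sum(y_true)

print("accuracy:", accuracy)
print("recall:", recall)
```

The accuracy looks excellent, but the recall shows that the model never finds a single positive case.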

16
Q

What’s the F1 score? How would you use it?

A

The F1 score is a measure of a model’s performance. It is the harmonic mean of the precision and recall of a model, with results tending to 1 being the best and those tending to 0 being the worst. You would use it in classification tests where true negatives don’t matter much.
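In code, the score is a one-liner once you have precision and recall. The example values below are made up to show how the score reacts when the two are far apart.

```python
def f1_score(precision, recall):
    """F1 = 2 * precision * recall / (precision + recall), or 0 if both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

balanced = f1_score(0.5, 0.5)
lopsided = f1_score(0.9, 0.1)
print(balanced, lopsided)
```

Both pairs have the same simple average (0.5), but the lopsided pair scores far lower, so a model cannot get a good F1 by doing well on only one of the two.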

17
Q

How would you handle an imbalanced dataset?

A

1- Collect more data to even the imbalances in the dataset.

2- Resample the dataset to correct for imbalances.

3- Try a different algorithm altogether on your dataset.
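The resampling option can be sketched as random oversampling: duplicate rows from the smaller classes until every class is as large as the biggest one. The function name and the toy data are assumptions for illustration, and undersampling the majority class would be the mirror-image approach.

```python
import random
from collections import Counter

def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until all classes have equal counts."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

rows = list(range(8))                    # stand-ins for feature rows
labels = ["ok"] * 6 + ["fraud"] * 2      # hypothetical 6-to-2 imbalance

balanced_rows, balanced_labels = oversample(rows, labels)
print(Counter(balanced_labels))
```

Resample only the training split, otherwise duplicated rows can leak into the test set and inflate your scores.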

18
Q

Name an example where ensemble techniques might be useful.

A

Ensemble techniques use a combination of learning algorithms to achieve better predictive performance than any single one. They typically reduce overfitting and make the model more robust (less likely to be influenced by small changes in the training data).

19
Q

How do you ensure you’re not overfitting with a model?

A

There are three main methods to avoid overfitting:

1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters, thereby removing some of the noise in the training data.

2- Use cross-validation techniques such as k-fold cross-validation.

3- Use regularization techniques such as LASSO that penalize certain model parameters if they’re likely to cause overfitting.
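For the cross-validation step, the core of k-fold is just index bookkeeping: each sample lands in the held-out fold exactly once. A minimal sketch without shuffling, where the function name is an illustrative choice:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k roughly equal contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first few folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

You would fit the model on each training part, score it on the matching held-out part, and compare the two: a large gap between training and held-out scores is the usual sign of overfitting.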

20
Q

What evaluation approaches would you use to gauge the effectiveness of a machine learning model?

A

You would first split the dataset into training and test sets, or perhaps use cross-validation techniques to further segment the dataset into composite sets of training and test sets within the data.

21
Q

What’s the “kernel trick” and how is it useful?

A

The kernel trick involves kernel functions that let you operate in a higher-dimensional space without explicitly calculating the coordinates of points within that space: instead, kernel functions compute the inner products between the images of all pairs of data in a feature space. This gives you the benefit of working in higher dimensions while being computationally cheaper than explicitly calculating those coordinates. Many algorithms can be expressed in terms of inner products. Using the kernel trick enables us to effectively run such algorithms in a high-dimensional space with lower-dimensional data.
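A small check makes this concrete. For 2-D inputs, the degree-2 polynomial kernel k(x, y) = (x . y)^2 gives the same number as mapping both points into a 3-D feature space and taking the inner product there. The kernel choice and the sample points are just an example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def poly2_kernel(x, y):
    """Kernel computed directly in the original 2-D space."""
    return dot(x, y) ** 2

def explicit_feature_map(x):
    """The 3-D feature map that the degree-2 kernel implicitly corresponds to."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 4.0)

kernel_value = poly2_kernel(x, y)
explicit_value = dot(explicit_feature_map(x), explicit_feature_map(y))
print(kernel_value, explicit_value)
```

Both routes give 121 (up to floating-point rounding), but the kernel never builds the higher-dimensional vectors. For kernels whose feature space is very large or infinite, that shortcut is what makes the computation feasible.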

22
Q

How do you handle missing or corrupted data in a dataset?

A

You could find missing/corrupted data in a dataset and either drop those rows or columns, or decide to replace them with another value.
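Both options fit in a few lines. The sketch below assumes missing values are represented as None in a list of numeric rows; the function names are illustrative, and mean imputation is only one of several possible replacement strategies.

```python
def drop_incomplete_rows(rows):
    """Option 1: discard any row that contains a missing value."""
    return [row for row in rows if None not in row]

def impute_column_means(rows):
    """Option 2: replace each missing value with the mean of its column."""
    n_cols = len(rows[0])
    means = []
    for col in range(n_cols):
        present = [row[col] for row in rows if row[col] is not None]
        means.append(sum(present) / len(present))
    return [
        [means[col] if value is None else value for col, value in enumerate(row)]
        for row in rows
    ]

rows = [[1.0, 10.0], [None, 20.0], [3.0, None]]

dropped = drop_incomplete_rows(rows)
imputed = impute_column_means(rows)
print(dropped)
print(imputed)
```

Dropping is simple but throws away two of the three rows here, while imputing keeps every row at the cost of inventing plausible values.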

23
Q

Do you have experience with Spark or big data tools for machine learning?

A

Spark is the big data tool most in demand now, able to handle immense datasets with speed.

24
Q

Pick an algorithm. Write the pseudo-code for a parallel implementation.

A
25
Q

What are some differences between a linked list and an array?

A

An array is an ordered collection of objects. A linked list is a series of objects with pointers that direct how to process them sequentially. An array assumes that every element has the same size, unlike the linked list. A linked list can more easily grow organically: an array has to be pre-defined or re-defined for organic growth. Shuffling a linked list involves changing which pointers point where; shuffling an array, meanwhile, is more complex and takes more memory.

26
Q

Describe a hash table.

A

A hash table is a data structure that produces an associative array. A key is mapped to certain values through the use of a hash function. They are often used for tasks such as database indexing.
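To show the moving parts, here is a minimal sketch of a hash table that resolves collisions by chaining. The class name, the fixed bucket count, and the lack of resizing are simplifications for illustration; Python's built-in dict is the production version of this idea.

```python
class HashTable:
    """Toy associative array: hash the key to pick a bucket, then scan that bucket."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The hash function maps any key to one of the bucket slots.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

table = HashTable()
table.put("alice", 1)
table.put("bob", 2)
table.put("alice", 3)   # same key again, so the value is replaced
print(table.get("alice"), table.get("bob"), table.get("carol"))
```

Lookups stay fast as long as the buckets stay short, which is why real implementations grow the bucket array as the table fills.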

27
Q

Define mean, mode, median. Explain these concepts to a layman. When is each preferred over the others?

A
28
Q

Explain different types of distributions. Why is the normal distribution so important to data scientists? What is the central limit theorem? Give a real-life example.

A
29
Q

What is Skewness?

A

Skewness measures the lack of symmetry in a data distribution.

A symmetrical distribution will have a skewness of 0.
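One common definition is the third standardized moment, sketched below using the population (not sample-corrected) formula. The two small datasets are invented to show a symmetric case and a right-skewed case.

```python
def skewness(values):
    """Third standardized moment: mean cubed deviation divided by std cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 10]   # one large value drags the tail to the right

print(skewness(symmetric), skewness(right_skewed))
```

The symmetric data comes out at 0, and the long right tail gives a positive value. A long left tail would give a negative one.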

30
Q

What is Kurtosis?

A

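Kurtosis describes how heavy the tails of a distribution are, that is, how much of the data lies in extreme values relative to a normal distribution.

In practice it is used as a measure of the outliers present in the distribution.

High kurtosis in a data set is an indicator that the data has heavy tails or outliers.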
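A common way to compute it is the fourth standardized moment minus 3, so that a normal distribution scores 0 (this is usually called excess kurtosis). The sketch uses the population formula, and the two datasets are made up to contrast data with and without an outlier.

```python
def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (a normal distribution gives 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

no_outliers = [-1, -1, 1, 1]
one_outlier = [0] * 9 + [10]   # nine ordinary values and a single extreme one

print(excess_kurtosis(no_outliers), excess_kurtosis(one_outlier))
```

The data with no extreme values scores below 0, while the single outlier pushes the second dataset well above 0.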

31
Q

What is heteroscedasticity?

A

Heteroscedasticity means the variance of the errors is not constant across the values of an independent predicting variable.

A typical case is errors that spread out more as the X value increases.
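A quick simulation can make the idea visible. In this sketch the residuals are generated with a spread proportional to x, which is an assumption built into the demo, and then the variance in the low-x half is compared with the variance in the high-x half.

```python
import random

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

rng = random.Random(0)
xs = list(range(1, 101))
# Hypothetical errors whose standard deviation grows with x.
residuals = [rng.gauss(0, 0.1 * x) for x in xs]

low_x_variance = variance(residuals[:50])    # x from 1 to 50
high_x_variance = variance(residuals[50:])   # x from 51 to 100
print(low_x_variance, high_x_variance)
```

With homoscedastic errors the two numbers would be roughly equal; here the high-x half is clearly noisier. Formal checks such as the Breusch-Pagan test build on the same comparison.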