Educative Grokking the ML Interview Flashcards

1
Q

How should a candidate approach machine learning system design questions?

A
  1. Setting up the problem
  2. Understanding the scale and latency requirements
  3. Defining metrics
  4. Architecture discussions
  5. Offline model building & execution
  6. Online model execution & evaluation
  7. Iterative model improvement
2
Q

How do you close the gap between your understanding of the question and the interviewer’s expectations for your answer?

A

Ask questions

You will be able to narrow down your problem space, chalk out the requirements of the system, and finally arrive at a precise machine learning problem statement.

3
Q

What should you do after arriving at a precise machine learning problem statement?

A

Discuss the performance and capacity considerations of the system

Latency requirements and scale of the data

4
Q

Why is defining metrics important?

A

Metrics help you see whether your system is performing well.

5
Q

What are some metrics used for offline testing?

A

You may have generic metrics; for example, if you are performing binary classification, you will use AUC, log loss, precision, recall, and F1-score. In other cases, you might have to come up with specific metrics for a certain problem. For instance, for the search ranking problem, you would use NDCG as a metric.
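
As a quick illustration, here is a minimal sketch of computing these generic binary-classification metrics, assuming scikit-learn is available; the labels and scores are toy placeholders, not data from the course.

  from sklearn.metrics import (roc_auc_score, log_loss,
                               precision_score, recall_score, f1_score)

  y_true = [0, 1, 1, 0, 1, 0, 1, 1]                    # ground-truth labels (toy data)
  y_score = [0.1, 0.8, 0.6, 0.3, 0.9, 0.4, 0.2, 0.7]   # predicted probabilities
  y_pred = [1 if s >= 0.5 else 0 for s in y_score]     # thresholded class predictions

  print("AUC:      ", roc_auc_score(y_true, y_score))
  print("Log loss: ", log_loss(y_true, y_score))
  print("Precision:", precision_score(y_true, y_pred))
  print("Recall:   ", recall_score(y_true, y_pred))
  print("F1-score: ", f1_score(y_true, y_pred))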

6
Q

When are offline metrics used?

A

You will use offline metrics to quickly test the models’ performance during the development phase.

7
Q

When are online metrics used?

A

Once you have selected the best performing models offline, you will use online metrics to test them in the production environment. The decision to deploy the newly created model depends on its performance in an online test.

8
Q

Which types of metrics may you need when coming up with online metrics?

A

While coming up with online metrics, you may need both component-wise and end-to-end metrics.

Consider that you are making a search ranking model to display relevant results for search queries. You may use a component-wise metric such as NDCG to measure the performance of your model online. However, you also need to look at how the system (search engine) is performing with your new model plugged in, for which you can use end-to-end metrics. A commonly used end-to-end metric for this scenario is the users’ engagement and retention rate.
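
For illustration, here is a minimal sketch of the component-wise NDCG computation for a single query, using one common formulation and made-up relevance grades; it is not an implementation prescribed by the course.

  import math

  def dcg(relevances):
      # Discounted cumulative gain of a ranked list of relevance grades.
      return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))

  def ndcg(ranked_relevances):
      # DCG of the model's ranking divided by the DCG of the ideal ranking.
      ideal = dcg(sorted(ranked_relevances, reverse=True))
      return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

  # Relevance grades of the results in the order the model ranked them.
  print(ndcg([3, 2, 3, 0, 1, 2]))   # ~0.96 for this toy query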

9
Q

When should the architecture discussion come up?

A

The architecture discussion comes up after defining metrics, which comes after setting up the problem and understanding requirements.

10
Q

What do you need to think about when figuring out the architecture of the system?

A

You need to think about the components of the system and how the data will flow through those components.

11
Q

Which phase/step in the ML system setup helps in chalking out the architecture?

A

The requirements gathered during problem setup help you in chalking out the architecture.

12
Q

What are the different ways one can generate training data?

A
  • Human labeled data
  • Data collection through a user’s interaction with the pre-existing system
13
Q

What should our goal be when working on a machine learning-based system?

A

As we work on a machine learning-based system, our goal is generally to improve our metrics (engagement rate, etc.) while ensuring that we meet the capacity and performance requirements.

14
Q

When do major performance and capacity discussions come up again after defining the requirements?

A

Major performance and capacity discussions come up during the following two phases of building a machine learning system:

  1. Training time: How much training data and capacity are needed to build our predictor?
  2. Evaluation time: What are the service level agreements (SLAs) that we have to meet while serving the model, and what are its capacity needs?
15
Q

What are the three different types of complexities that ML algorithms have?

A
  • Training complexity
  • Evaluation complexity
  • Sample complexity
16
Q

What is the training complexity of an ML algorithm?

A

The training complexity of a machine learning algorithm is the time taken by it to train the model for a given task.

17
Q

What is the evaluation complexity of an ML algorithm?

A

The evaluation complexity of a machine learning algorithm is the time taken by it to evaluate the input at testing time.
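
As a rough sketch of how training complexity and evaluation complexity can be compared in practice (assuming scikit-learn; the dataset and model are placeholders, not part of the original card):

  import time
  from sklearn.datasets import make_classification
  from sklearn.ensemble import RandomForestClassifier

  X, y = make_classification(n_samples=20_000, n_features=50, random_state=0)
  model = RandomForestClassifier(n_estimators=200, random_state=0)

  start = time.perf_counter()
  model.fit(X, y)                    # training complexity: time to train the model
  print("training time (s):  ", time.perf_counter() - start)

  start = time.perf_counter()
  model.predict(X[:1000])            # evaluation complexity: time to score inputs
  print("evaluation time (s):", time.perf_counter() - start)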

18
Q

What is the sample complexity of an ML algorithm?

A

The sample complexity of a machine learning algorithm is the total number of training samples required to learn a target function successfully.
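
A minimal sketch of how sample complexity can be explored empirically with a learning curve, assuming scikit-learn and a synthetic placeholder dataset:

  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import learning_curve

  X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

  # Cross-validated accuracy as the number of training samples grows.
  sizes, _, test_scores = learning_curve(
      LogisticRegression(max_iter=1000), X, y,
      train_sizes=[0.1, 0.25, 0.5, 1.0], cv=5)

  for n, scores in zip(sizes, test_scores):
      print(f"{n:>5} samples -> mean CV accuracy {scores.mean():.3f}")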

19
Q

What are the three components that ML systems consist of?

A

A machine-learning system consists of three main components. They are the training algorithm (e.g., neural network, decision trees, etc.), training data, and features. The training data is of paramount importance.

20
Q

What are the hypotheses formulated for an A/B test?

A

The null hypothesis and the alternative hypothesis

21
Q

What is the null hypothesis?

A

H0 states that the design change will not have an effect on the variation. If we fail to reject the null hypothesis, we should not launch the new feature.

22
Q

What is the alternative hypothesis?

A

H1 is the alternative to the null hypothesis, whereby the design change will have an effect on the variation. If the null hypothesis is rejected, we accept the alternative hypothesis and should launch the new feature. Simply put, the variation will go into production.

23
Q

What is P-value?

A

The p-value is used to help determine the statistical significance of the results. In interpreting the p-value of a significance test, a significance level (alpha) must be specified.

The significance level is a boundary for specifying a statistically significant finding when interpreting the p-value. A commonly used value for the significance level is 5%, written as 0.05.
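
For illustration, a minimal sketch of applying a significance level to an A/B test’s p-value, assuming SciPy; the conversion counts are made-up placeholders.

  from scipy.stats import chi2_contingency

  alpha = 0.05                     # significance level
  control   = [480, 9520]          # [conversions, non-conversions] for the control
  variation = [560, 9440]          # [conversions, non-conversions] for the variation

  _, p_value, _, _ = chi2_contingency([control, variation])
  print("p-value:", p_value)

  if p_value < alpha:
      print("Statistically significant: reject H0 and consider launching the variation.")
  else:
      print("Not significant: fail to reject H0, do not launch the new feature.")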

24
Q

What is a back test?

A

Backtesting is a term used in modeling to refer to testing a predictive model on historical data.
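
As a sketch of what such a backtest might look like in code (assuming pandas and scikit-learn; the file name and column names are hypothetical):

  import pandas as pd
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import roc_auc_score

  # Hypothetical log of past events with a timestamp, features, and a label.
  df = pd.read_csv("historical_events.csv").sort_values("timestamp")

  cutoff = int(len(df) * 0.8)                 # chronological split, no shuffling
  train, test = df.iloc[:cutoff], df.iloc[cutoff:]

  features = ["feature_1", "feature_2"]       # placeholder feature columns
  model = LogisticRegression().fit(train[features], train["label"])

  scores = model.predict_proba(test[features])[:, 1]
  print("backtest AUC:", roc_auc_score(test["label"], scores))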

25
Q

Why is transfer learning useful?

A

Because the model doesn’t have to learn from scratch, it can achieve higher accuracy in less time compared to models that don’t use transfer learning.
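
A minimal sketch of the idea, assuming PyTorch and a recent torchvision: reuse a pretrained backbone and train only a new task-specific head.

  import torch.nn as nn
  from torchvision import models

  backbone = models.resnet18(weights="IMAGENET1K_V1")   # pretrained feature extractor

  for param in backbone.parameters():                   # freeze pretrained weights so the
      param.requires_grad = False                       # model doesn't learn from scratch

  num_classes = 5                                       # illustrative target task
  backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)  # new trainable head

  # Only the new head's parameters are updated during fine-tuning, which is why
  # the model reaches good accuracy in less time than training from scratch.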