Final Review Flashcards

1
Q

(T/F) Supervised learning and unsupervised clustering both require at least one input attribute

A

True (both need at least one input attribute; it is an output attribute that only supervised learning requires)

2
Q

(T/F) Grouping people in a social network is an example of unsupervised machine learning

A

True

3
Q

What is topic modelling in natural language processing (NLP)?

A

Topic modelling is an unsupervised machine learning approach that can scan a series of documents, find word and phrase patterns within them, and automatically cluster word groupings and related expressions that best represent the set
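
A minimal sketch with scikit-learn (an assumed library choice; the corpus, topic count, and names are illustrative only):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["cats and dogs are pets", "dogs chase cats",
        "stocks and bonds are assets", "investors buy stocks"]
counts = CountVectorizer(stop_words="english").fit(docs)  # bag-of-words counts
X = counts.transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = counts.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    # the word groupings that best represent each discovered topic
    print("topic", k, [terms[i] for i in weights.argsort()[-3:]])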

4
Q

What is a recurrent neural network (RNN)?

A

Recurrent neural networks are a class of neural networks that are helpful in modelling sequence data.
Derived from feedforward networks, RNNs exhibit behaviour similar to how the human brain functions. Simply put: recurrent neural networks can produce predictive results on sequential data that other algorithms can't
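
A minimal sketch of the recurrence in NumPy (the shapes, names, and tanh activation are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
Wx, Wh, b = rng.normal(size=(4, 8)), rng.normal(size=(8, 8)), np.zeros(8)
h = np.zeros(8)                       # hidden state carried across time steps
for x_t in rng.normal(size=(5, 4)):   # a sequence of 5 inputs, each 4-dimensional
    h = np.tanh(x_t @ Wx + h @ Wh + b)  # new state depends on the input AND the previous state
print(h.shape)  # (8,)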

5
Q

Explain the bias-variance tradeoff

A

Bias is the degree to which a model's predictions deviate from the true values. High bias implies a simple model that cannot capture the complexity of the data and underfits.
Variance is the degree to which a model's predictions vary across different training sets. High variance implies a complex model that overfits the training data.
The bias-variance tradeoff is the balance of model complexity that keeps both bias and variance low, so the model neither overfits nor underfits and makes more accurate predictions on new, unseen data.

6
Q

What is lexicon normalization in text preprocessing?

A

One type of textual noise is the multiple representations exhibited by a single word. For example, "play", "player", "played", and "plays" are different variations of the word "play". Though they mean different things contextually, they are all similar.

Lexicon normalization converts all the variations of a word into their normalized form (also known as the lemma). Normalization is pivotal for feature engineering with text, as it converts high-dimensional features into a lower-dimensional space, which is ideal for any machine learning model. The most common lexicon normalization practices are stemming and lemmatization.
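
A minimal sketch of both practices, assuming NLTK (with its WordNet data downloaded) is available; the words come from the example above:

from nltk.stem import PorterStemmer, WordNetLemmatizer  # requires: nltk.download("wordnet")

stem, lem = PorterStemmer(), WordNetLemmatizer()
for w in ["play", "player", "played", "plays"]:
    # stemming chops suffixes; lemmatization maps each form to its lemma
    print(w, "->", stem.stem(w), "/", lem.lemmatize(w, pos="v"))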

7
Q

Define confusion matrix, accuracy, precision, and recall

A

A confusion matrix is an NxN matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
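
A quick sketch of the three formulas in Python; the counts are made up for illustration:

tp, tn, fp, fn = 40, 45, 5, 10  # illustrative counts from a 2x2 confusion matrix

accuracy  = (tp + tn) / (tp + tn + fp + fn)  # 0.85
precision = tp / (tp + fp)                   # 0.888...
recall    = tp / (tp + fn)                   # 0.8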

8
Q

What are the regularization techniques that can be used for a convolutional neural network?

A
  1. L2 & L1 regularization
  2. Dropout
  3. Data augmentation
  4. Early stopping
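
A hedged sketch wiring the four techniques together, assuming TensorFlow 2.x / Keras; the layer sizes and hyperparameters are illustrative, not prescriptive:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1),
                           kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # 1. L2 penalty on weights
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.5),                          # 2. dropout
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)  # 4. early stopping
# 3. data augmentation would perturb the training images, e.g. tf.keras.layers.RandomFlip("horizontal")
# model.compile(...) then model.fit(..., validation_data=..., callbacks=[early_stop])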
9
Q

Explain the steps to create a bag of words

A
  1. Tokenization: First, the input text is tokenized. A sentence is represented as a list of its constituent words, and this is done for all the input sentences
  2. Vocabulary creation: Of all the obtained tokenized words, only the unique words are selected to create the vocabulary, which is then sorted alphabetically
  3. Vector creation: Finally, a sparse matrix is created for the input out of the frequencies of the vocabulary words. In this sparse matrix, each row is a sentence vector whose length (the number of columns in the matrix) equals the size of the vocabulary
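
A minimal pure-Python sketch of the three steps (the sentences are illustrative):

sentences = ["the cat sat", "the dog sat", "the cat ran"]

tokens = [s.split() for s in sentences]               # 1. tokenization
vocab = sorted({w for toks in tokens for w in toks})  # 2. vocabulary creation
vectors = [[toks.count(w) for w in vocab] for toks in tokens]  # 3. vector creation
print(vocab)    # ['cat', 'dog', 'ran', 'sat', 'the']
print(vectors)  # [[1, 0, 0, 1, 1], [0, 1, 0, 1, 1], [1, 0, 1, 0, 1]]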
10
Q

(T/F) You have classification data with classes Y ∈ {+1, -1} and features F_i ∈ {+1, -1} for i ∈ {1, …, K}. In an attempt to turbocharge your classifier, you duplicate each feature, so now each example has 2K features, with F_{K+i} = F_i for i ∈ {1, …, K}. The following questions compare the original feature set with the doubled one. You may assume that in the case of ties, class +1 is always chosen. Assume there are equal numbers of training examples in each class.
For a Naive Bayes model, which of the following are true:
1. The test accuracy could be higher with the doubled feature set
2. The test accuracy will be the same with either feature set
3. The test accuracy could be higher with the original features

A
  1. False
  2. True
  3. False
With equal class priors, duplicating every feature squares each class's likelihood product; since squaring is monotonic, every prediction (and therefore the test accuracy) is unchanged.
11
Q

You are training a model and find that training loss is near 0 but test loss is very high. Which of the following are expected to reduce test loss? (multi)

  1. Increase training data size
  2. Decrease training data size
  3. Increase model complexity
  4. Decrease model complexity
  5. Train on a combination of the training and test data, but test only on the test data
  6. Conclude that ML doesn’t work
A
  1. Increase training data size
  4. Decrease model complexity
  5. Train on a combination of the training and test data, but test only on the test data (this would reduce test loss but is not good practice)
12
Q

You train a linear classifier on 1000 training points and discover that accuracy is only 50%. Which of the following, if done in isolation, has a good chance of improving training accuracy? (multi)
1. Add new features
2. Train on more data
3. Train on less data

A
  1. Add new features
  3. Train on less data
13
Q

In supervised learning, training data includes:
1. Output
2. Input
3. Both
4. None

A

Both

14
Q

You are given reviews of a few Netflix series marked as positive, negative, or neutral. Classifying reviews of a new Netflix series is an example of:
1. Supervised Learning
2. Unsupervised Learning
3. Semisupervised Learning
4. Reinforcement Learning

A

Supervised learning

15
Q

Which of the following is the second stage in NLP?
1. Discourse analysis
2. Syntactic analysis
3. Semantic analysis
4. Pragmatic analysis

A

Syntactic analysis

16
Q

Text summarization finds the most informative sentences in which of the following:
1. Video
2. Sound
3. Image
4. Document

A

Document

17
Q

Why is the XOR problem exceptionally interesting to researchers?

A

Because it is the simplest linearly inseparable problem that exists

18
Q

Which of the following gives non-linearity to a neural network?
1. Convolution
2. Stochastic gradient descent
3. Sigmoid activation function
4. Non-zero bias

A

Sigmoid activation function

19
Q

A matches the start of the string and B matches the end:
1. A = ^, B = $
2. A = $, B = ^
3. A = $, B = ?
4. A = ?, B = ^

A

A = ^, B = $
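
A quick illustration with Python's re module:

import re

print(bool(re.search(r"^Hello", "Hello world")))  # True:  ^ matches at the start
print(bool(re.search(r"world$", "Hello world")))  # True:  $ matches at the end
print(bool(re.search(r"^world", "Hello world")))  # False: "world" is not at the start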

20
Q

If we use K-means on a finite set of examples, which of the following is true:
1. K-Means is not guaranteed to terminate
2. K-Means is guaranteed to terminate but is not guaranteed to find the optimal clustering
3. K-Means is guaranteed to terminate and find the optimal clustering
4. None of the above

A

K-Means is guaranteed to terminate but is not guaranteed to find the optimal clustering

21
Q

Given a sound clip of a person speaking, the textual representation of the speech can be determined by what?

A

Speech-to-text

22
Q

Naive Bayes requires:
1. Categorical Values
2. Numerical Values
3. Either 1 or 2
4. Both 1 and 2

A

Categorical values

23
Q

Which of the following are the most widely used metrics and tools to assess a classification model?
1. Confusion matrix
2. Precision
3. Area under the ROC curve
4. All of the above

A

All of the above

24
Q

In a classification problem, if according to the hypothesis the output should be positive but it is in fact negative, it is said to be:
1. False positive
2. False negative
3. Consistent hypothesis
4. None of the above

A

False positive

25
Q

In a simple MLP with 8 neurons in the input layer, 5 neurons in the hidden layer, and 1 neuron in the output layer, what are the sizes of the weight matrices between the hidden and output layers and between the input and hidden layers?
1. 1x5, 5x8
2. 5x1, 8x5
3. 8x5, 5x1
4. 8x5, 1x5

A

5x1, 8x5
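
A quick NumPy shape check of that answer (array names are illustrative):

import numpy as np

X = np.ones((1, 8))                            # one example, 8 input features
W_ih, W_ho = np.ones((8, 5)), np.ones((5, 1))  # input-hidden 8x5, hidden-output 5x1
hidden = X @ W_ih                              # (1, 8) @ (8, 5) -> (1, 5)
output = hidden @ W_ho                         # (1, 5) @ (5, 1) -> (1, 1)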

26
Q

_____ is a high-level API built on TensorFlow
1. PyBrain
2. Keras
3. PyTorch
4. Theano

A

Keras

27
Q

The classification boundary realized by the perceptron is:
1. Parabola
2. Straight line
3. Circle
4. Ellipse

A

Straight line

28
Q

How do we calculate the hidden layer input for a multi-layer perceptron, which is then fed into the activation function?

A

hidden_layer_input = matrix_dot_product(X, wh) + bh
Where X is the input matrix, wh is the weight matrix, and bh is the bias matrix
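
A NumPy sketch of the same formula (the shapes are illustrative):

import numpy as np

X = np.ones((4, 3))    # 4 examples, 3 input features
wh = np.ones((3, 5))   # weights from 3 inputs to 5 hidden neurons
bh = np.zeros((1, 5))  # one bias per hidden neuron, broadcast across examples
hidden_layer_input = X @ wh + bh  # i.e. matrix_dot_product(X, wh) + bh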

29
Q

What is the sigmoid activation function?

A

1 / (1 + e^(-x))
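
In NumPy, for example:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))  # squashes any real input into (0, 1)

print(sigmoid(0.0))  # 0.5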

30
Q

How do we calculate the slope or gradient of hidden and output layer neurons?

A

By calculating the derivatives of the non-linear activation functions at each layer for each neuron, i.e., computing the derivative of the activation function at the neuron's output value

31
Q

How do we calculate the error gradient?

A

Eg = dEt / dw
Where Eg is the error gradient and dEt/dw is the partial derivative of the total error with respect to weight

32
Q

How do we calculate the change factor (delta) at the output layer?

A

d_output = Eg * slope_output
Where Eg is the error gradient
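
A small NumPy sketch tying the last three cards together, assuming a sigmoid output layer and a squared-error loss (the activations and targets are illustrative, and here the error term is taken with respect to the output):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

output = sigmoid(np.array([0.2, -0.4]))  # activations at the output layer
target = np.array([1.0, 0.0])
slope_output = output * (1 - output)     # derivative of sigmoid evaluated at the output value
Eg = output - target                     # error gradient of a squared-error loss w.r.t. the output
d_output = Eg * slope_output             # change factor (delta) at the output layer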

33
Q

What are the steps in one epoch of training a multi-layer perceptron?

A
  1. Forward propagation
  2. Compute the loss
  3. Backward propagation
  4. Update the weights
34
Q

What happens in forward propagation?

A

Input data is passed through the input layer. Each neuron computes the weighted sum of its inputs and passes it through its activation function to get an output. This output is passed to the next layer of neurons, and the process is repeated until the output layer, whose output is the network's prediction

35
Q

What happens in backward propagation?

A

Once the loss is computed, the error is propagated back through the network. We take the derivative of the loss function with respect to the output of each neuron in the layer, and multiply it by the derivative of the activation function to get the delta value. This delta serves as the error signal for the previous layer, and we repeat until we're back at the input layer

36
Q

What happens when we update the weights of the MLP after backward propagation?

A

We update the weights using an optimization algorithm like stochastic gradient descent to minimize the loss. The amount by which the weights are updated depends on the learning rate, which is a hyperparameter
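
A compact NumPy sketch of one such epoch for a tiny one-hidden-layer network, assuming sigmoid activations and a squared-error loss (the sizes and learning rate are illustrative):

import numpy as np

rng = np.random.default_rng(0)
X, y = rng.random((4, 3)), rng.random((4, 1))  # toy data: 4 examples, 3 features
wh, bh = rng.random((3, 5)), np.zeros((1, 5))
wo, bo = rng.random((5, 1)), np.zeros((1, 1))
lr = 0.1                                       # learning rate (hyperparameter)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# 1. forward propagation
hidden = sigmoid(X @ wh + bh)
output = sigmoid(hidden @ wo + bo)
# 2. compute the loss
loss = np.mean((output - y) ** 2)
# 3. backward propagation
d_output = (output - y) * output * (1 - output)       # delta at the output layer
d_hidden = (d_output @ wo.T) * hidden * (1 - hidden)  # delta propagated to the hidden layer
# 4. update the weights
wo -= lr * hidden.T @ d_output
bo -= lr * d_output.sum(axis=0, keepdims=True)
wh -= lr * X.T @ d_hidden
bh -= lr * d_hidden.sum(axis=0, keepdims=True)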

37
Q

What is pooling in a convolutional neural network?

A

We take the input matrix and replace each non-overlapping 2x2 block with the maximum of that submatrix (2x2 max pooling). The purpose is to reduce the size of the output from the convolutional layer while retaining the most important information. It reduces the number of parameters and helps prevent overfitting
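
A minimal NumPy sketch of 2x2 max pooling (assumes even input dimensions):

import numpy as np

def max_pool_2x2(x):
    h, w = x.shape
    # view each non-overlapping 2x2 block, then take its maximum
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.arange(16).reshape(4, 4)
print(max_pool_2x2(a))  # [[ 5  7]
                        #  [13 15]]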

38
Q

What happens in a convolutional layer of a convolutional neural network

A

A filter is moved across the input image to detect patterns and features. The filter consists of weights that are adjusted during training. The convolutional layer reduces the size of the input data while extracting important features
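
A minimal NumPy sketch of sliding a filter across an input (a "valid" cross-correlation with stride 1, which is how most deep learning libraries implement convolution; the filter here is hand-made rather than learned):

import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):        # slide the filter across the input
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

edge = np.array([[1., -1.]])   # a tiny edge-detecting filter
print(conv2d(np.array([[0., 0., 1., 1.]]), edge))  # [[ 0. -1.  0.]]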

39
Q

What is stochastic gradient descent?

A

Stochastic gradient descent is used to adjust the parameters of a model to make more accurate predictions. It calculates the gradient of the error function (how far off the predictions are) on a small subset of the data and uses this to update the model's parameters. This is repeated until the error reaches a minimum
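
A minimal sketch of mini-batch SGD for a linear model with squared error (toy data; the learning rate and batch size are illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X @ np.array([1.0, -2.0, 0.5])    # targets generated from known weights
w, lr = np.zeros(3), 0.1

for step in range(500):
    idx = rng.choice(100, size=10)    # a small random subset (mini-batch)
    grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / 10  # gradient of squared error on the batch
    w -= lr * grad                    # step the parameters against the gradient
print(w.round(2))                     # approaches [ 1.  -2.   0.5]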

40
Q

What is data augmentation?

A

Adds Gaussian noise around each data point to the inputs while leaving the output unchanged. It improves the generalization ability of the model by exposing it to more variation in the input data, making it more robust to such variations
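
A minimal NumPy sketch of this noise-based augmentation (the noise scale is illustrative):

import numpy as np

rng = np.random.default_rng(0)
X, y = rng.random((100, 3)), rng.integers(0, 2, 100)  # original training data

noisy_X = X + rng.normal(scale=0.05, size=X.shape)    # Gaussian noise around each point
X_aug = np.vstack([X, noisy_X])                       # inputs get perturbed copies...
y_aug = np.concatenate([y, y])                        # ...while the outputs stay unchanged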

41
Q

What are the 3 steps to text preprocessing in natural language processing?

A

Noise removal, lexicon normalization, object standardization

42
Q

What is noise removal in NLP?

A

Removing pieces of text that are not relevant to the context of the data. A general approach is to maintain a dictionary of "noisy" entities and eliminate any tokens that appear in that dictionary
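
A minimal sketch of the dictionary approach (the noise list is illustrative):

noise_words = {"is", "a", "this", "the"}  # dictionary of "noisy" entities

def remove_noise(text):
    return " ".join(w for w in text.split() if w not in noise_words)

print(remove_noise("this is a sample text"))  # "sample text"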

43
Q

What are the most common lexicon normalization practices?

A

Stemming, lemmatization

44
Q

What is lemmatization?

A

An organized procedure for obtaining the root form of a word, reducing inflected and variant forms to the base form.
Ex: am, are, is -> be

45
Q

How do you calculate the inverse document frequency for a set of documents?

A

IDF = log(total # of docs / # of docs containing word W)

46
Q

What is the formula for the TF-IDF score?

A

w = tf * log(N / df)
Where tf is the term frequency of the word in the document, N is the total number of documents, and df is the number of documents containing the word
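
A small Python sketch of both cards' formulas (toy corpus; here tf is normalized by document length, and conventions for the log base and smoothing vary by library):

import math

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
N = len(docs)

def tf_idf(word, doc):
    tf = doc.count(word) / len(doc)    # term frequency in this document
    df = sum(word in d for d in docs)  # number of docs containing the word
    return tf * math.log(N / df)       # w = tf * log(N / df)

print(round(tf_idf("cat", docs[0]), 3))  # 0.135
print(round(tf_idf("the", docs[0]), 3))  # 0.0 (appears in every doc, so IDF is 0)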