Machine Learning Flashcards

(200 cards)

1
Q

Which of the following are advantages of transformers over a recurrent sequence model?
a) better at learning long-range dependencies
b) slower to train and run on modern hardware
c) require many fewer parameters to achieve similar results
d) none of the above

A

a) better at learning long-range dependencies

2
Q

Which of these parts of the self-attention operation are calculated by passing inputs through an MLP?
a) values
b) keys
c) queries
d) all the above

A

d) all the above
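
As an aside on this card: in standard self-attention, queries, keys, and values are all produced by passing the same inputs through learned linear projections (the minimal MLPs the question refers to). A toy numpy sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                    # toy sizes, chosen for illustration
X = rng.normal(size=(seq_len, d_model))    # input token representations

# learned projection matrices (random stand-ins here)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v        # queries, keys, values

# scaled dot-product attention
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
output = weights @ V
print(output.shape)                        # (4, 8)
```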

3
Q

Natural language processing (NLP) is a field of _____
a) computer science
b) artificial intelligence
c) linguistics
d) all of the mentioned

A

d) all of the mentioned

4
Q

What is the main challenge of NLP?
a) handling ambiguity of sentences
b) handling tokenization
c) handling POS tagging
d) All of the mentioned

A

a) handling ambiguity of sentences

5
Q

What is machine translation?
a) Converts one human language to another
b) Converts human language to machine language
c) Converts any human language to English
d) Converts machine language to human language

A

a) Converts one human language to another

6
Q

Choose the areas where NLP can be useful.
a) automatic text summarization
b) automatic question-answering systems
c) information retrieval
d) all the mentioned

A

d) all the mentioned

7
Q

Which of the following properties will a good position encoding ideally have?
a) unique for all positions
b) relative distances are independent of absolute sequence position
c) well-defined for arbitrary sequence lengths
d) all the above

A

d) all the above
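
One encoding that satisfies all three properties is the sinusoidal scheme from the original Transformer paper; a minimal sketch (dimensions are illustrative):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(50, 16)
print(pe.shape)   # unique per position, and defined for arbitrary lengths
```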

8
Q

Which of the following includes the major tasks of NLP?
a) automatic summarization
b) discourse analysis
c) machine translation
d) all the mentioned

A

d) all the mentioned

9
Q

Neural machine translation was based on encoder-decoder _____
a) RNNs
b) LSTMs
c) both a & b
d) neither a nor b

A

c) both a & b

10
Q

The encoder LSTM is used to process the _____ sentence.
a) input
b) output
c) function
d) All the above

A

a) input

11
Q

What type of neural network is an autoencoder?
a) Supervised neural network
b) unsupervised neural network
c) semi-supervised neural network
d) reinforcement neural network

A

b) unsupervised neural network

12
Q

What type of data can the autoencoder apply dimensionality reduction to?
a) linear data
b) nonlinear data
c) both a & b
d) none of the above

A

c) both a & b

13
Q

A module that compresses data into an encoded representation that is typically several orders of magnitude smaller than the input data.
a) The encoder
b) Bottleneck
c) The decoder
d) None of the above

A

a) The encoder

14
Q

A module that contains the compressed knowledge representation and is considered the most important part of the autoencoder network.
a) the encoder
b) bottleneck
c) the decoder
d) None of the above

A

b) bottleneck

15
Q

A module that helps the network “decompress” the knowledge representations and reconstructs the data back from its encoded form.
a) input layer
b) bottleneck
c) output layer
d) none of the above

A

c) output layer

16
Q

What type of autoencoder works by penalizing the activation of some neurons in hidden layers?
a) Sparse autoencoder
b) Variational autoencoder
c) Deep autoencoder
d) Convolution autoencoders

A

a) Sparse autoencoder
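
A hedged sketch of the idea behind this card: the sparsity penalty is added to the reconstruction loss. The classic formulation uses a KL-divergence penalty on average activations; the L1 variant below is a common simplification, and `lam` is an illustrative coefficient:

```python
import numpy as np

def sparse_autoencoder_loss(x, x_hat, hidden, lam=1e-3):
    """Reconstruction error plus an L1 penalty on hidden activations."""
    reconstruction = np.mean((x - x_hat) ** 2)
    sparsity = lam * np.sum(np.abs(hidden))  # penalizes neurons for being active
    return reconstruction + sparsity

x = np.array([1.0, 0.0, 1.0])
x_hat = np.array([0.9, 0.1, 0.8])
hidden = np.array([0.0, 0.7, 0.0, 0.1])  # mostly inactive, as the penalty encourages
print(sparse_autoencoder_loss(x, x_hat, hidden))
```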

17
Q

Which of the following is done by a deep autoencoder?
a) image reconstruction
b) image colorization
c) image search
d) image denoising

A

c) image search

18
Q

Which of the following is done by a convolution autoencoder?
a) data compression
b) image search
c) information retrieval
d) image colorization

A

d) image colorization

19
Q

Which of the following is an autoencoder application?
a) watermark removing
b) dimensionality reduction
c) image generation
d) all the above

A

d) all the above

20
Q

Which autoencoder doesn’t require reducing the number of bottleneck nodes?
a) sparse autoencoder
b) deep autoencoder
c) variational autoencoder
d) None of the above

A

a) sparse autoencoder

21
Q

In NLP, bidirectional context is supported by which of the following embeddings?
a) word2vec
b) BERT
c) GloVe
d) all the above

A

b) BERT

22
Q

For a given token, its input representation is the sum of the token, segment, and position embeddings in _____
a) ELMO
b) GPT
c) BERT
d) none of the above

A

c) BERT

23
Q

BERT Base contains _____ encoder layers
a) 12
b) 24
c) 36
d) 48

A

a) 12

24
Q

BERT Large contains _____ encoder layers
a) 12
b) 24
c) 36
d) 48

A

b) 24

25
BERT aims at tackling various NLP tasks such as _____ a) question answering b) language inference c) text summarization d) all of the mentioned
d) all of the mentioned
26
The BERT model is pre-trained on relatively generic tasks a) masked language modeling (MLM) b) next sentence prediction c) a and b d) none of the mentioned
c) a and b
27
_______ is to hide a word in a sentence and then have the program predict what word has been hidden (masked) based on the hidden word's context. a) Masked language modeling (MLM) b) Next sentence prediction c) Sequence classification d) Named entity recognition (NER)
a) Masked language modeling (MLM)
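
A toy illustration of the masking step itself (the predictive model is omitted, and the sentence is made up):

```python
import random

tokens = ["the", "cat", "sat", "on", "the", "mat"]
i = random.randrange(len(tokens))
masked = tokens.copy()
hidden_word, masked[i] = masked[i], "[MASK]"
print(masked)   # e.g. ['the', 'cat', '[MASK]', 'on', 'the', 'mat']
# the model is then trained to predict hidden_word from the surrounding context
```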
28
_______ is to have the program predict whether two given sentences have a logical, sequential connection or whether their relationship is simply random a) Masked language modeling (MLM) b) Next sentence prediction c) Sequence classification d) Named entity recognition (NER)
b) Next sentence prediction
29
BERT can process text ______ a) left-to-right b) right-to-left c) both d) none of the mentioned
c) both
30
BERT was created and published in 2018 by ______ a) Amazon b) Microsoft c) IBM d) Google
d) Google
31
What is the difference between CNN and ANN? a) CNN has one or more layers of convolution units, which receive their input from multiple units. b) CNN uses a simpler algorithm than ANN. c) They complete each other, so to use ANN, you need to start with CNN. d) CNN is the easiest way to use neural networks.
a) CNN has one or more layers of convolution units, which receive their input from multiple units.
32
The data is fed into the model and the output from each layer is obtained; this step is called _____. a) Feed forward b) Feed backward c) Input layer d) Output layer
a) Feed forward
33
How many common types of pooling layers are there? a) 5 b) 2 c) 3 d) 4
b) 2
34
_____ computes the output volume by computing the dot product between all filters and image patches. a) Input layer b) Convolution layer c) Activation function layer d) Pool layer
b) Convolution layer
35
What is back propagation? a) it is another name given to the curvy function in the perceptron b) it is the transmission of error back through the network to adjust the inputs c) it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn d) all of the mentioned
c) it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
36
Which of the following functions can be used as an activation function in the output layer if we wish to predict the probabilities of n classes (p1, p2, ..., pn) such that the sum of p over all n classes equals 1? a) ReLU b) Sigmoid c) Softmax d) Tanh
c) Softmax
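
A minimal softmax, written to show that the outputs are positive and sum to 1 over the n classes:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)        # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())            # probabilities over the classes, summing to 1
```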
37
Which of the following would have a constant input in each epoch of training a deep learning model? a) Weight between input and hidden layer b) Weight between hidden and output layer c) Biases of all hidden layer neurons d) Activation function of output layer
a) Weight between input and hidden layer
38
Which of the following neural network training challenges can be solved using batch normalization? a) overfitting b) underfitting c) training is too slow d) none of the mentioned
c) training is too slow
39
The number of nodes in the input layer is 10 and in the hidden layer is 5. The maximum number of connections from the input layer to the hidden layer is: a) 50 b) Less than 50 c) More than 50 d) None of the mentioned
a) 50
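
The count follows from the shape of the fully connected weight matrix; a quick check:

```python
n_input, n_hidden = 10, 5
connections = n_input * n_hidden   # one weight per input-hidden pair
print(connections)                 # 50
```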
40
Is Deep Learning a specialized subset of machine learning? a) true b) false
a) true
41
_____ are models used to generate data similar to the data on which they are trained, by destroying training data through the successive addition of Gaussian noise and then learning to recover the data by reversing this noising process. a) Federated learning. b) Attention learning. c) CNN. d) Diffusion models.
d) Diffusion models.
42
What is the goal of training a diffusion model? a) Learn the reverse process b) Learn to understand the image c) Extract the image features d) Classify the images
a) Learn the reverse process
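
A sketch of the forward (noising) process under the usual Gaussian formulation; the model is then trained to reverse it, commonly by predicting the added noise. The schedule values below are illustrative, not from any specific paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # illustrative variance schedule
alphas_bar = np.cumprod(1.0 - betas)    # cumulative product of (1 - beta_t)

def q_sample(x0, t):
    """Forward process: x_t = sqrt(a_bar_t)*x0 + sqrt(1 - a_bar_t)*noise."""
    noise = rng.normal(size=x0.shape)
    x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise
    return x_t, noise

x0 = rng.normal(size=(8, 8))            # stand-in for a training image
x_t, eps = q_sample(x0, t=500)
# training would fit a network eps_theta(x_t, t) to recover eps,
# which is what "learning the reverse process" amounts to in practice
```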
43
One of the benefits of the diffusion model is _____ a) scalability b) not requiring adversarial training c) parallelizability d) all of the above.
d) all of the above.
44
In general, a diffusion model consists of _____ main processes. a) 5 b) 4 c) 3 d) 2
d) 2
45
A diffusion model is trained by finding the reverse Markov transitions that _____ the likelihood of the training data. a) Maximize b) Minimize c) Increase d) Decrease
a) Maximize
46
For the reverse process in the diffusion model, we must choose _____ a) the Sobel filter b) the Laplacian operator c) the thresholding method d) the Gaussian distribution parameterization
d) the Gaussian distribution parameterization
47
The transition distributions in the Markov chain are Gaussian, where the forward process requires a ______, and the reverse process parameters are learned. a) variance schedule b) Laplacian operator c) Gaussian distribution parameterization d) none of the mentioned
a) variance schedule
48
Our diffusion model is parameterized as a Markov chain, meaning that our latent variables depend only on the _____ timestep. a) previous or following b) previous c) following d) none of the mentioned
a) previous or following
49
A _____ is used to obtain log-likelihoods across pixel values as the last step in the reverse diffusion process. a) KL divergence b) simplified training objective c) U-Net-like model d) discrete decoder
d) discrete decoder
50
Diffusion models can be applied to _____ a) image denoising b) super-resolution c) image generation d) all of the above.
d) all of the above.
51
What is the main goal of federated learning? a) to train a single machine learning model on a centralized dataset b) to train multiple machine learning models on decentralized datasets c) to train a single machine learning model on decentralized datasets d) to train multiple machine learning models on a centralized dataset
c) to train a single machine learning model on decentralized datasets
52
How does federated learning differ from traditional machine learning? a) federated learning requires less data b) federated learning requires more computational resources c) federated learning requires less communication bandwidth d) federated learning requires more data privacy concerns
d) federated learning requires more data privacy concerns
53
What is an advantage of federated learning compared to traditional centralized training? a) it is more accurate b) it is faster c) it requires less data d) it allows for decentralized data to be used
d) it allows for decentralized data to be used
54
How is data privacy protected in federated learning? a) data is encrypted before being sent to the centralized server b) data is never shared with any other parties c) data remains on the individual devices and is only used for model training d) data is aggregated and anonymized before being used for model training
c) data remains on the individual devices and is only used for model training
55
In federated learning, who is responsible for training the model? a) a centralized server b) a third-party organization c) individual clients d) the data owner
c) individual clients
56
Key benefits of federated learning are _____ a) it involves more diverse data b) it’s secure c) it yields real-time predictions d) all of the above
d) all of the above
57
What are the challenges of federated learning? a) efficient communication across the federated network. b) managing heterogeneous systems in the same networks. c) privacy concerns and privacy-preserving methods. d) all of the above
d) all of the above
58
How does federated learning work? a) Transfer of weights and biases to cloud server b) Transfer of data to cloud server c) Transfer of model to cloud server d) Transfer of user info to cloud
a) Transfer of weights and biases to cloud server
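
A toy sketch of the server-side step this card describes: clients send weights (not data), and the server aggregates them. This is FedAvg-style weighted averaging, with made-up client arrays and dataset sizes:

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average client model weights, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# three clients with locally trained weight vectors (stand-ins)
clients = [np.array([0.1, 0.2]), np.array([0.3, 0.1]), np.array([0.2, 0.4])]
sizes = [100, 50, 150]
global_weights = federated_average(clients, sizes)
print(global_weights)   # the raw data never leaves the clients
```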
59
Is federated learning more efficient than standard ML techniques for a large number of devices? a) True b) False c) Depends on use case d) Cannot say
a) True
60
Federated learning is ______ a) Supervised b) Unsupervised c) Reinforcement learning d) None of the above
b) Unsupervised
61
What is the basic concept of a recurrent neural network? a) use a loop between inputs and outputs in order to achieve a better prediction. b) use recurrent features from the dataset to find the best answers. c) use previous inputs to find the next output according to the training set. d) use loops between the most important features to predict the next output.
c) use previous inputs to find the next output according to the training set.
62
Another RNN issue is called 'vanishing gradients'. What is that? a) when the values of a gradient are too small and the model joins in a loop because of that. b) when the values of a gradient are too big and the model stops learning or takes way too long because of that. c) when the values of a gradient are too small and the model stops learning or takes way too long because of that. d) when the values of a gradient are too big and the model joins in a loop because of that.
c) when the values of a gradient are too small and the model stops learning or takes way too long because of that.
63
LSTM. What is that? a) LSTM networks are an extension of recurrent neural networks, which basically extends their memory; therefore, they are well suited to learn from important experiences that have very low time lags in between b) LSTM networks are an extension of recurrent neural networks, which basically extends their memory; therefore, it is not recommended to use them unless you are using a small dataset c) LSTM networks are an extension of recurrent neural networks, which basically extends their memory; therefore, they are well suited to learn from important experiences that have long time lags in between d) LSTM networks are an extension of recurrent neural networks, which basically shortens their memory; therefore, they are well suited to learn from important experiences that have very low time lags in between
c) LSTM networks are an extension of recurrent neural networks, which basically extends their memory; therefore, they are well suited to learn from important experiences that have long time lags in between
64
The network that involves backward links from output to the input and hidden layers is called _________ a) self-organizing maps b) perceptron c) recurrent neural network d) multi-layered perceptron
c) recurrent neural network
65
RNN stands for _____ a) Recurrent neural networks b) Report neural networks c) Receives neural networks d) Recording neural networks
a) Recurrent neural networks
66
What is the activation function used in the forget gate? a) Sigmoid b) Tanh c) RELU d) None of the above
a) Sigmoid
67
How many gates are there in an LSTM? a) 3 b) 5 c) 4 d) 2
a) 3
68
In _____, the points in the dataset are dependent on the other points in the dataset. a) continuous data b) discrete data c) sequential data d) ordinal data
c) sequential data
69
_____ helps to identify important elements that need to be added to the cell state. a) Forget gate b) Input gate c) Output gate d) None of the above
b) Input gate
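
A sketch of how the gates in the LSTM cards above fit together in one step (sigmoid for the gates, tanh for the candidate state); weights and sizes are illustrative stand-ins:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W, U, b hold stacked parameters for the f, i, o, g blocks."""
    z = W @ x + U @ h + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, output gates
    g = np.tanh(g)                                # candidate cell state
    c_new = f * c + i * g    # the input gate selects what is added to the cell state
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
d, n = 3, 4                                       # toy input and hidden sizes
W = rng.normal(size=(4 * n, d))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n), W, U, b)
print(h.shape, c.shape)                           # (4,) (4,)
```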
70
LSTM is used in _____ a) speech recognition b) music composition c) time series prediction d) all of the above
d) all of the above
71
What should be the aim of the training procedure in a Boltzmann machine of feedback networks? a) to capture inputs b) to feed back the captured outputs c) to capture the behavior of the system d) none of the mentioned
d) none of the mentioned
72
What does a Boltzmann machine consist of? a) a fully connected network with both hidden and visible units b) asynchronous operation c) stochastic updates d) all of the mentioned
d) all of the mentioned
73
By using which method does a Boltzmann machine reduce the effect of additional stable states? a) No such method exists b) Simulated annealing c) Hopfield reduction d) None of the mentioned
b) Simulated annealing
74
For which other task can a Boltzmann machine be used? a) pattern mapping b) feature mapping c) classification d) pattern association
d) pattern association
75
What effect will the presence of false minima have on the probability of error in recall? a) Direct b) Inverse c) No effect d) Direct or inverse
a) Direct
76
What happens when we use the mean field approximation with Boltzmann learning? a) It slows down b) It speeds up c) Nothing happens d) It may speed up or slow down
b) It speeds up
77
In Boltzmann learning, which algorithm can be used to arrive at equilibrium? a) Hopfield b) mean field c) Hebb d) none of the mentioned
d) none of the mentioned
78
The visible units in a restricted Boltzmann machine are not connected to each other. a) True b) False
a) True
79
What are the two layers of a restricted Boltzmann machine called? a) input and output layers b) recurrent and convolution layers c) activation and threshold layers d) hidden and visible layers
d) hidden and visible layers
80
A deep belief network is a stack of restricted Boltzmann machines. a) True b) False
a) True
81
The main and most important feature of an RNN is _________. a) visible state b) hidden state c) present state d) None of these
b) hidden state
81
An RNN remembers each and every piece of information through ________. a) Work b) Time c) Hours d) Memory
b) Time
82
To create a numerical representation of our text-based dataset, we generate two lookup tables. What are they? a) maps characters to numbers b) maps numbers back to characters c) identifies unique characters present in text d) both a & b
d) both a & b
83
_______ occurs when the gradients become very small and tend towards zero. a) Exploding gradients b) Vanishing gradients c) Long short-term memory networks d) Gated recurrent unit networks.
b) Vanishing gradients
84
On what parameters can the change in the weight vector depend? a) learning parameters b) input vector c) learning signal d) all of the mentioned
d) all of the mentioned
85
________ occurs when the gradients become too large due to back-propagation. a) Exploding gradients b) Vanishing gradients c) Long short-term memory networks d) Gated recurrent unit networks
a) Exploding gradients
86
If a competitive network can perform feature mapping, then what can that network be called? a) self-excitatory b) self-inhibitory c) self-organization d) none of the mentioned
c) self-organization
87
Why do we need biological neural networks? a) to solve tasks like machine vision & natural language processing b) to apply heuristic search methods to find solutions of problems c) to make smart, human-interactive & user-friendly systems d) all of the mentioned
d) all of the mentioned
88
What is the auto-association task in neural networks? a) find relation between 2 consecutive inputs b) related to storage & recall task c) predicting the future inputs d) None of the mentioned
b) related to storage & recall task
89
What is unsupervised learning? a) features of the group are explicitly stated b) the number of groups may be known c) neither features nor the number of groups is known d) none of the mentioned
c) neither features nor the number of groups is known
90
XLNet is an ________ language model which outputs the joint probability of a sequence of tokens based on the transformer architecture with recurrence. a) Auto-regressive b) Auto-Negressive c) Objective d) Bidirectional
a) Auto-regressive
91
XLNet is "generalized" because it captures bidirectional context by means of a mechanism called ____ a) PLM b) BERT c) Transformer-XL d) MLM
a) PLM
92
______ keeps track of the position of each token in a sequence. a) pretrain-finetune discrepancy b) transformer-xl c) positional encoding d) segment recurrence
c) positional encoding
93
______ caches the hidden state of the first segment in memory in each layer and updates attention accordingly; it allows reuse of memory for each segment. a) pretrain-finetune discrepancy b) transformer-xl c) positional encoding d) segment recurrence
d) segment recurrence
94
The attention weights determined by a simple feed-forward neural network are ____ a) query b) keys c) values d) all of the above
d) all of the above
95
_____ traditional methods predict the current token given the previous "n" tokens, or predict the current token given all tokens after it. a) Bidirectional b) Masked language modeling c) XLNet d) BERT
a) Bidirectional
96
______ is a neural network architecture that can model bidirectional contexts in text data using transformers. a) BERT b) XLNet c) MLM d) PLM
a) BERT
97
A disadvantage of BERT is that it corrupts the input with _______ and suffers from a pretrain-finetune discrepancy. a) Mask b) PLM c) MLM d) All of the above
a) Mask
98
XLNet is the latest and greatest model to emerge from the booming field of natural language processing (NLP). a) True b) False
a) True
99
XLNet is "generalized". a) True b) False
a) True
100
The attention learning mechanism has changed the way we work with deep learning algorithms. a) true b) false
a) true
101
The advantage of transformers over recurrent sequence models is that they are slower to train and run on modern hardware. a) true b) false
b) false
102
Fields like NLP and Computer Vision have been revolutionized by the attention mechanism a) true b) false
a) true
103
Attention learning is an interface connecting the encoder and decoder that provides the decoder with information. a) true b) false
a) true
104
The encoder LSTM or RNN units produce the words in a sentence one after another. a) true b) false
b) false
105
The encoder reads the input sentence and tries to make sense of it. a) true b) false
a) true
106
The LSTM is supposed to capture long-range dependencies better than the RNN. a) true b) false
a) true
107
RNNs can’t remember longer sentences and sequences. a) true b) false
a) true
108
If the encoder makes a bad summary, the translation will also be bad. a) true b) false
a) true
109
The decoder is used to process the entire input sentence and decode it into a context vector. a) true b) false
b) false
110
Autoencoders belong to supervised neural networks. a) true b) false
b) false
111
The bottleneck is the most important part of the network. a) true b) false
a) true
112
Convolutional autoencoders can do image reconstruction. a) true b) false
a) true
113
A deep autoencoder is composed of two symmetrical deep-belief networks. a) true b) false
a) true
114
Deep autoencoders can’t do image search. a) true b) false
b) false
115
Sparse autoencoders offer us an alternative method for introducing an information bottleneck without requiring a reduction in the number of nodes. a) true b) false
a) true
116
Sparse autoencoders work by penalizing the activation of neurons in the input layer. a) true b) false
b) false
117
Autoencoders can de-noise images. a) true b) false
a) true
118
Autoencoders can’t be used to reduce dimensionality a) true b) false
b) false
119
The encoder is the module that helps the network "decompress" the knowledge representations and reconstructs the data back from its encoded form. a) true b) false
b) false
120
BERT (Bidirectional Encoder Representations from Transformers) is a recent paper published by researchers at Amazon AI Language. a) true b) false
b) false
121
BERT doesn’t read the text input sequentially. a) true b) false
a) true
122
BERT allows transfer learning on existing pretrained models and hence can be custom trained for a specific subject. a) true b) false
a) true
123
In BERT, the relationship between all words in a sentence is modeled irrespective of their position. a) true b) false
a) true
124
BERT uses a unidirectional language model for producing word embeddings. a) true b) false
b) false
125
BERT is not an open-source machine learning framework for NLP. a) true b) false
b) false
126
BERT does not understand human language as it is spoken naturally. a) true b) false
b) false
127
BERT is expected to have large impact on voice search as well as text-based search. a) true b) false
a) true
128
The same word can have multiple word embeddings with BERT. a) True b) False
a) True
129
BERT is a deep bidirectional, supervised language representation a) true b) false
b) false
130
Pooling is an up-sampling operation that reduces the dimensionality of the feature map. a) true b) false
b) false
131
The ReLU operation is applied to each pixel and replaces all the negative pixel values in the feature map with zero. a) true b) false
a) true
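
The ReLU statement above, verified in one line of numpy:

```python
import numpy as np

feature_map = np.array([[-1.0, 2.0], [3.0, -4.0]])
rectified = np.maximum(feature_map, 0.0)   # negatives replaced with zero
print(rectified)                           # [[0. 2.] [3. 0.]]
```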
132
Pooling or spatial pooling layers are also called sub-sampling layers. a) true b) false
a) true
133
Pooling reduces the dimensionality of each feature map while retaining the most important information. a) true b) false
a) true
134
The aim of the fully connected layer is to use the low-level features of the input image produced by convolutional and pooling layers. a) true b) false
b) false
135
The hyperparameters for a pooling layer are filter size, stride, and max or average pooling. a) true b) false
a) true
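
A sketch of max pooling with filter size 2 and stride 2 on a toy feature map (assuming even dimensions for simplicity):

```python
import numpy as np

def max_pool2x2(fm):
    """Max pooling, filter size 2, stride 2 (assumes even dimensions)."""
    h, w = fm.shape
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2x2(fm))   # 2x2 output keeping the max of each 2x2 patch
```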
136
When we apply a filter of 1×1, then there is no reduction in the size of the image and hence there is no loss of information. a) true b) false
a) true
137
Flattening means that every neuron in the previous layer is connected to each neuron in the next layer. a) true b) false
b) false
138
ReLU introduces linearity to the network, and the generated output is a rectified feature map. a) true b) false
b) false
139
A convolutional layer receives a set of input feature maps (IFM) and generates a set of output feature maps (OFM). a) true b) false
a) true
140
Diffusion models work by destroying training data through the successive addition of Laplacian noise, and then learning to recover the data by reversing this noising process. a) true b) false
b) false
141
A discrete decoder is used to obtain log-likelihoods across pixel values as the last step in the reverse diffusion process. a) true b) false
a) true
142
A diffusion model is a latent variable model which maps to the latent space Sobel using a fixed chain. a) true b) false
b) false
143
The goal of training a diffusion model is to learn the reverse process. a) true b) false
a) true
144
The transition distributions in the Markov chain are Gaussian, which depends only on the forward process. a) true b) false
b) false
145
A diffusion model is parameterized as a Markov chain, meaning that our latent variables x1, ..., xt depend only on the previous (or following) timestep. a) true b) false
a) true
146
For the reverse process in the diffusion model, we must choose a variance schedule. a) true b) false
b) false
147
The transition distributions in the Markov chain are Gaussian, where the forward process requires a variance schedule, and the reverse process parameters are learned. a) true b) false
a) true
148
Cascade diffusion models (like Stable Diffusion) apply the diffusion process on a smaller latent space for computational efficiency, using a variational autoencoder for the up- and down-sampling. a) true b) false
b) false
149
Diffusion models can be applied to image de-noising, inpainting, super-resolution, and image generation. a) true b) false
a) true
150
Federated Learning is not used to improve the privacy and security of machine learning models. a) true b) false
b) false
151
Federated Learning requires the use of a centralized server. a) true b) false
b) false
152
Federated Learning can’t be used to train models on data that is distributed across multiple devices, such as Smartphones or IoT devices. a) true b) false
b) false
153
Federated Learning requires the use of a centralized database. a) true b) false
b) false
154
Federated Learning can’t be used to improve the privacy of machine learning models by keeping sensitive data on individual devices. a) true b) false
b) false
155
Federated Learning is a Type of machine learning that allows multiple parties to train a model without sharing their data. a) true b) false
a) true
156
Federated Learning requires participating devices to have high computational power. a) true b) false
b) false
157
Federated Learning enables participants to train local models cooperatively on local data without disclosing sensitive data to a central cloud server. a) true b) false
a) true
158
Federated Learning can’t be used to train deep learning models. a) true b) false
b) false
159
Federated Learning can be used to train models on data that is distributed across multiple devices in real-time. a) true b) false
a) true
160
In Sequential Data, the points in the dataset are dependent on the other points in the dataset. a) true b) false
a) true
161
A time series is a common example of sequential data, with each point reflecting an observation at a certain point in time. a) true b) false
a) true
162
The crucial element to remember about sequence models is that the data we’re working with are independently and identically distributed (I.I.D.) samples. a) true b) false
b) false
163
Sequence models are machine learning models that input or output sequences of data. a) true b) false
a) true
164
Structured data includes text streams, audio clips, video clips, and time-series data. a) true b) false
b) false
165
Conventional feedforward artificial neural networks can deal with sequential data and can be trained to hold knowledge about the past. a) true b) false
b) false
166
Traditional RNNs are excellent at capturing long-range dependencies. a) true b) false
b) false
167
LSTMs are explicitly designed to avoid the long-term dependency problem. a) true b) false
a) true
168
The input gate controls what information should be forgotten. a) true b) false
b) false
169
The input gate helps to identify important elements that need to be added to the cell state. a) true b) false
a) true
170
RBMs are a supervised learning technique a) true b) false
b) false
171
An RBM isn’t restricted to having only connections between the visible and the hidden units. a) true b) false
b) false
172
An RBM performs discriminative learning similar to what happens in a classification problem. a) true b) false
b) false
173
If the number of visible nodes is nV and the number of hidden nodes is nH, then the number of connections in an RBM is nV × nH. a) true b) false
a) true
174
Boltzmann machines are non-deterministic generative deep learning models with 3 types of nodes: visible, hidden and output nodes a) true b) false
b) false
175
Boltzmann machines fall into the class of unsupervised learning. a) true b) false
a) true
176
Sparse autoencoders introduce an information bottleneck by reducing the number of nodes in hidden layers. a) true b) false
b) false
177
The idea is to encourage the network to learn an encoding and decoding which rely on activating only a small number of neurons. a) true b) false
a) true
178
To implement an undercomplete autoencoder, constrain the number of nodes present in the hidden layer(s) of the neural network. a) true b) false
a) true
179
Autoencoders are not capable of learning nonlinear manifolds (a continuous, non-intersecting surface). a) true b) false
b) false
180
A neural network with multiple hidden layers and sigmoid nodes can form non-linear decision boundaries. a) true b) false
a) true
181
Neural networks compute non-convex functions of their parameters. a) true b) false
a) true
182
For logistic regression, with parameters optimized using a stochastic gradient method, setting parameters to 0 is an acceptable initialization. a) true b) false
a) true
183
For arbitrary neural networks, with weights optimized using a stochastic gradient method, setting weights to 0 is an acceptable initialization. a) true b) false
b) false
184
Given a design matrix X ∈ R^(n×d) where d << n, if we project our data onto a k-dimensional subspace using PCA where k equals the rank of X, we recreate a perfect representation of our data with no loss. a) true b) false
a) true
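
A quick numerical check of this statement using SVD-based PCA on a toy matrix (sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
Xc = X - X.mean(axis=0)                  # center the data, as PCA assumes
k = np.linalg.matrix_rank(Xc)            # keep k = rank(X) components

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                        # project onto the k PCA directions
X_rec = Z @ Vt[:k] + X.mean(axis=0)      # reconstruct from the projection

print(np.allclose(X_rec, X))             # True: no loss when k equals the rank
```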
185
Hierarchical clustering methods require a predefined number of clusters, much like k-means. a) true b) false
b) false
186
Given a predefined number of clusters k, globally minimizing the k-means objective function is NP-hard. a) true b) false
a) true
187
A random forest is an ensemble learning method that attempts to lower the bias error of decision trees. a) true b) false
b) false
188
Bagging algorithms attach weights w1...wn to a set of n weak learners; they re-weight the learners and convert them into strong ones. Boosting algorithms draw n sample distributions (usually with replacement) from an original data set for learners to train on. a) true b) false
b) false
189
Using cross-validation to select hyperparameters will guarantee that our model does not overfit. a) true b) false
b) false
190
Bidirectionality is achieved by a technique called "masked language modeling". a) true b) false
a) true
191
BERT overcomes this shortcoming in that it considers previous and next tokens to predict the current token. a) true b) false
a) true
192
XLNet is not the latest and greatest model to emerge from the booming field of natural language processing (NLP). a) true b) false
b) false
193
XLNet is not "generalized" because it captures bidirectional context by means of a mechanism called "permutation language modeling". a) true b) false
b) false
194
XLNet is not a generalized autoregressive model where the next token is dependent on all previous tokens. a) true b) false
b) false
195
XLNet is the idea of capturing bidirectional context by training an autoregressive model on all possible permutations of words in a sentence. a) true b) false
b) false
196
XLNet integrates the idea of auto-regressive models and bi-directional context modeling, yet overcomes the disadvantages of BERT. a) true b) false
a) true
197
Autoregressive (AR) Language Modeling and Autoencoding (AE) have been the two most successful pretraining objectives. a) true b) false
a) true
198
There are proposed methods used in XLNet, such as the permutation language modeling objective. a) true b) false
a) true
199
For both BERT and XLNet, partial prediction plays the role of reducing optimization difficulty by predicting only tokens with sufficient context. a) true b) false
a) true