10 Advanced Deep Learning Concepts: Flashcards
By which two factors is the performance of a model limited?
- architecture-driven limitations:
  - limited model capacity
  - improper model initialization
  - appropriateness of the architecture (inductive biases)
- data-driven limitations:
  - limited amount of data
  - data quality
What is meant by “data quality”?
1) appropriateness
(e.g. a highly pixelated image would not be suitable for an image classification task)
2) cleanliness
(how accurately was the labeling done? are there outliers in the dataset?)
3) generalizability
(are there domain shifts? e.g. greyscale training images are of little use for training models that will see RGB inputs)
How can we improve the data quality?
- only use appropriate data
- clean the data
- carefully check the data for domain shifts
Why do we need data augmentation?
to synthetically increase the size of the training dataset
What kinds of data augmentation are there?
- original (unchanged image)
- horizontal flip
- vertical flip
- contrast variations
- image blocking
–> these can also be combined, as in the sketch below
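A minimal sketch of how such augmentations could be combined, assuming PyTorch/torchvision; the transforms match the list above, and the parameter values are illustrative:

```python
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),   # horizontal flip
    T.RandomVerticalFlip(p=0.5),     # vertical flip
    T.ColorJitter(contrast=0.4),     # contrast variations
    T.ToTensor(),                    # convert PIL image to tensor
    T.RandomErasing(p=0.5),          # image blocking (random erasing)
])
# augmented = augment(pil_image)  # each call samples a new random combination
```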
How can models be pre-trained?
Through transfer learning
–> initialize the model parameters with those of a model of the same architecture that was previously trained on similar data
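A sketch of this in PyTorch/torchvision, assuming a recent torchvision (>= 0.13) for the weights API; `num_classes` is a hypothetical placeholder for the new task:

```python
import torch.nn as nn
import torchvision.models as models

# Initialize with parameters pre-trained on ImageNet instead of random values.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Replace the classification head for the new task.
num_classes = 10  # hypothetical target task
model.fc = nn.Linear(model.fc.in_features, num_classes)
```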
How can the capacity of the model be improved?
- make the model deeper: add more layers
- make the model wider: add more neurons per layer
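A toy PyTorch sketch contrasting the two options; all layer sizes are arbitrary:

```python
import torch.nn as nn

deeper = nn.Sequential(            # more layers (depth)
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
wider = nn.Sequential(             # more neurons per layer (width)
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
```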
What are issues when training large networks?
backpropagation becomes harder as the number of network layers grows:
- gradients can vanish (e.g. with the sigmoid function, gradients go to zero for inputs of large magnitude)
- gradients can explode
How can we avoid vanishing gradients?
batch normalization (BatchNorm) on every layer:
take the layer outputs, normalize them, and rescale them before they go through the activation function
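A minimal PyTorch sketch with BatchNorm placed before the activation, as described above (the feature size is illustrative):

```python
import torch.nn as nn

block = nn.Sequential(
    nn.Linear(128, 128),
    nn.BatchNorm1d(128),  # normalize, then rescale with learned parameters
    nn.ReLU(),            # activation sees well-scaled inputs
)
```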
How do we get rid of exploding gradients?
residual connections
–> the layer only has to learn the residual, i.e. the difference/delta between the layer input and the desired output; residuals typically have less extreme gradients, and the skip connection lets gradients flow through the network unchanged
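A minimal sketch of a residual block in PyTorch (the inner layers are illustrative):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The block only learns the residual F(x); the skip connection
    passes the input x through unchanged."""
    def __init__(self, dim: int):
        super().__init__()
        self.layer = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.layer(x)  # output = input + learned residual
```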
What are ResNets?
networks that take advantage of residual connections as well as BatchNorm
–> they can be very deep, e.g. up to 101 layers
How do most supervised tasks work?
discriminatively (the model discriminates between different choices)
How is the U-Net built?
as an encoder-decoder (autoencoder) architecture, with skip connections between corresponding encoder and decoder levels
What is the Code (= bottleneck) between encoder and decoder layers?
goal: a meaningful representation of the data
What is one way to perform representation learning?
autoencoder
What are autoencoders used for?
- representation learning
- data denoising
- anomaly detection
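A minimal autoencoder sketch in PyTorch; the dimensions (784 down to a 32-dimensional code) are illustrative, e.g. for flattened MNIST images:

```python
import torch.nn as nn

# Encoder compresses the input into a low-dimensional code (the bottleneck);
# the decoder reconstructs the input from that code.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))
autoencoder = nn.Sequential(encoder, decoder)
# Trained with a reconstruction loss, e.g. nn.MSELoss()(autoencoder(x), x)
```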
What can you do with decoders if they are trained successfully?
use them to generate data from noise
–> standalone decoders can be called generators
What are adversarial attacks?
overlay a barely visible RGB noise image on top of another image to make the model misclassify it
–> a robust model should not be confused by this
What does GAN stand for?
generative adversarial network
What is the general idea behind a GAN?
have generator G create fake samples and try to trick discriminator D into thinking they are real samples
–> two-player minimax game
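The corresponding two-player minimax objective, written in standard GAN notation (not from the flashcards themselves):

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)]
  + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```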
How do GANs work? (steps)
1) D tries to maximize the objective function, i.e. succeeds in identifying real samples (a classification task)
2) G tries to minimize the objective function, i.e. succeeds in generating seemingly real samples
Training: iterate between training D and G (with backprop) until D assigns a 50% chance of a sample being real or fake
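A condensed sketch of one such iteration, assuming PyTorch; `G`, `D`, their optimizers, and the batch `real` are hypothetical names, and D is assumed to end in a sigmoid so its outputs are probabilities:

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, real, z_dim=100):
    # 1) Train D: reward it for scoring real samples as 1 and fakes as 0.
    fake = G(torch.randn(real.size(0), z_dim)).detach()  # no backprop into G
    d_real, d_fake = D(real), D(fake)
    loss_D = (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # 2) Train G: reward it for fooling D into scoring fakes as real.
    d_fake = D(G(torch.randn(real.size(0), z_dim)))
    loss_G = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
```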
How do diffusion models work?
the generator is trained to denoise increasingly noisy data
–> it learns to turn highly noisy latent representations into realistic images, step by step
for text-to-image diffusion models, the conditioning latent representation is created by a large language model
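A minimal sketch of the forward (noising) step that produces the training data, using the common DDPM notation; `alpha_bar_t`, the cumulative noise-schedule value at step t, is an assumed input:

```python
import torch

def add_noise(x0: torch.Tensor, alpha_bar_t: float):
    # Blend the clean sample x0 with Gaussian noise according to the schedule.
    noise = torch.randn_like(x0)
    xt = alpha_bar_t ** 0.5 * x0 + (1 - alpha_bar_t) ** 0.5 * noise
    return xt, noise  # the denoising network is trained to predict `noise`
```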
What is meant by the concept “attention” for CNNs?
which parts of the input data are important?
What does attention in NLP enable?
it enables each element of the input sequence to attend to any other element of the sequence (and, in encoder-decoder models, to elements of the other sequence)
–> transformer models implement this attention mechanism
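A minimal sketch of the scaled dot-product attention that transformers implement; shapes are illustrative, with q, k, v of shape (sequence_length, d_model):

```python
import torch
import torch.nn.functional as F

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # similarity of every element to every other
    weights = F.softmax(scores, dim=-1)          # how much each element attends to the others
    return weights @ v                           # weighted combination of values
```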