Lecture 3 Flashcards

1
Q

Define: Occam’s razor

A

Prefer the simplest hypothesis that fits the data.

‘All things being equal, the simplest solution tends to be the best one.’

2
Q

x1 = 1
x2 = 2
x3 = 3

w0 = 0.7
w1 = 0.5
w2 = -0.5
w3 = 1

What will be the output of the perceptron?

A

w0 + w1·x1 + w2·x2 + w3·x3 = 0.7 + 0.5 - 1 + 3 = 3.2 > 0, so the output is 1.
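
A minimal sketch of the computation (assuming a step activation with threshold 0 and w0 as the bias, as the answer implies):

```python
# Minimal perceptron with a step activation; w0 acts as the bias.
def perceptron(x, w0, w):
    s = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > 0 else 0

# Values from the card: x = (1, 2, 3), w0 = 0.7, w = (0.5, -0.5, 1)
print(perceptron([1, 2, 3], 0.7, [0.5, -0.5, 1]))  # 3.2 > 0 -> 1
```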

3
Q

What are the main drivers of the breakthrough in deep learning?

A
  • data
  • computation
  • machine learning
4
Q

What are the flaws of deep learning?

A

Adding carefully crafted noise to a picture can create a new image that people would see as identical, but which a DNN sees as utterly different. In this way, any starting image can be tweaked so a DNN misclassifies it as any target image a researcher chooses.

5
Q

What are the strengths of deep learning?

A
  • ability to integrate information from huge heterogeneous sources
  • ability to predict
  • ability to detect / recognise
  • ability to discover patterns
6
Q

What are the weaknesses of deep learning?

A
  • DL algorithms lack ‘common sense’
  • DL cannot put events into their context
  • DL depends critically on the quality of the underlying statistics/data
  • DL is opaque
  • DL is ill-understood
7
Q

Define: Amara’s law

A

We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.

8
Q

How to deal with AI?

A
  • Ignore the hype; focus on the concrete impact for your domain.
  • Without humans, AI algorithms are of no value.
  • Many applications will see a performance boost.
  • Many domains/jobs will transform dramatically (also in science).
  • AI algorithms offer tremendously powerful tools but lack proper understanding.
9
Q

What does Tishby’s information plane say?

A

At initialisation, the first hidden layer holds a lot of information about both the input and the output, while the last hidden layer holds little information about either.
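
As a hedged formal reading (assuming X is the input, Y the label, and T a layer's representation), the information plane plots each layer at the point:

```latex
% Each hidden layer's representation T is one point in the plane:
%   horizontal axis: information kept about the input
%   vertical axis:   information kept about the label
\bigl( I(X;T),\; I(T;Y) \bigr)
```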

10
Q

What are the two phases in deep learning mechanics?

A
  1. mapping input to output (fast)
  2. compressing representations by getting rid of noise (slow)

11
Q

What is the relationship between deep learning and Gaussian processes?

A

Dropout, a technique that has long been in use in deep learning, can give us principled uncertainty estimates. These uncertainty estimates approximate those of a Gaussian process.
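
A minimal sketch of the idea (Monte Carlo dropout), assuming a toy NumPy network; the layer sizes, weights, and dropout rate are illustrative, not the exact construction from the literature:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer net with fixed weights (stand-in for a trained net).
W1 = rng.normal(size=(1, 50))
W2 = rng.normal(size=(50, 1))

def forward(x, p_drop=0.5):
    """One stochastic forward pass: dropout is kept ON at test time."""
    h = np.tanh(x @ W1)
    mask = rng.random(h.shape) > p_drop   # random dropout mask
    h = h * mask / (1.0 - p_drop)         # inverted-dropout scaling
    return h @ W2

x = np.array([[0.3]])
samples = np.stack([forward(x) for _ in range(1000)])

# Mean of the passes is the prediction; the spread is the
# (approximately GP-like) predictive uncertainty.
print(samples.mean(), samples.std())
```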

12
Q

Define: lottery ticket hypothesis

A

If you want to win the lottery, buy all the tickets.
There are many pathways from input to output; there is always one winning ticket, so try all the options.
This is an explanation of why DL is successful and why so many parameters are used.

13
Q

How does network pruning work (standard approach)?

A
  1. Train the network (weights randomly initialised as Wr)
  2. Remove the superfluous parts of the network (i.e., the small weights)
  3. Fine-tune the network

Optional: repeat steps 2 and 3 (a minimal sketch follows below).
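
A minimal sketch of step 2 (magnitude pruning), assuming NumPy weight matrices; the pruning fraction and the stand-in "training" step are illustrative:

```python
import numpy as np

def magnitude_prune(W, frac=0.2):
    """Step 2: zero out the `frac` smallest-magnitude weights of W."""
    threshold = np.quantile(np.abs(W), frac)
    mask = np.abs(W) >= threshold
    return W * mask, mask   # keep the mask so pruned weights stay zero

rng = np.random.default_rng(1)
Wr = rng.normal(size=(4, 4))                           # step 1: random init
W_trained = Wr + rng.normal(scale=0.1, size=Wr.shape)  # stand-in for training
W_pruned, mask = magnitude_prune(W_trained)

# Step 3 would fine-tune W_pruned while keeping `mask` fixed;
# the optional loop repeats pruning and fine-tuning.
```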

14
Q

What is step 3 in the alternative approach to network pruning?

  1. Train the network
  2. Remove the superfluous parts of the network (i.e., the small weights)
  3. ?
A

Retrain the network, re-initialising the remaining weights with Wr (the original random initialisation).
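
A minimal sketch of the difference, reusing the hypothetical magnitude-pruning setup from the previous card:

```python
import numpy as np

rng = np.random.default_rng(1)
Wr = rng.normal(size=(4, 4))                           # saved random init (step 1)
W_trained = Wr + rng.normal(scale=0.1, size=Wr.shape)  # stand-in for training
mask = np.abs(W_trained) >= np.quantile(np.abs(W_trained), 0.2)  # step 2

# Alternative step 3: rewind the surviving weights to Wr and retrain
# the sparse subnetwork from this original initialisation.
W_rewound = Wr * mask
```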

15
Q

After pruning, subnetworks can do the job. What are the advantages of using subnetworks?

A
  1. The subnetworks are typically between 1% and 15% of the original size.
  2. They require considerably less training time.
  3. They perform at or around the same level.
16
Q

Why is overparameterization useful?

A

You need to keep the original initialisation when pruning because random reinitialisation destroys the advantage of pruning.

17
Q

What does new work say about overparameterization?

A

Given a randomly initialised overparameterized network, one can find subnetworks that perform the task. Such subnetworks (composed of random weights) can even outperform trained ones.

18
Q

What are the conclusions?

A
  1. Deep learning networks are overparameterized.
  2. Their learning dynamics consist of two phases.
  3. Large depth may provide a computational, rather than a representational, advantage.
  4. Overparameterization may provide a sampling, rather than a representational, advantage.
  5. Random subnetworks may outperform trained dense ones.