{ "@context": "https://schema.org", "@type": "Organization", "name": "Brainscape", "url": "https://www.brainscape.com/", "logo": "https://www.brainscape.com/pks/images/cms/public-views/shared/Brainscape-logo-c4e172b280b4616f7fda.svg", "sameAs": [ "https://www.facebook.com/Brainscape", "https://x.com/brainscape", "https://www.linkedin.com/company/brainscape", "https://www.instagram.com/brainscape/", "https://www.tiktok.com/@brainscapeu", "https://www.pinterest.com/brainscape/", "https://www.youtube.com/@BrainscapeNY" ], "contactPoint": { "@type": "ContactPoint", "telephone": "(929) 334-4005", "contactType": "customer service", "availableLanguage": ["English"] }, "founder": { "@type": "Person", "name": "Andrew Cohen" }, "description": "Brainscape’s spaced repetition system is proven to DOUBLE learning results! Find, make, and study flashcards online or in our mobile app. Serious learners only.", "address": { "@type": "PostalAddress", "streetAddress": "159 W 25th St, Ste 517", "addressLocality": "New York", "addressRegion": "NY", "postalCode": "10001", "addressCountry": "USA" } }

Week 2 Flashcards

(29 cards)

1
Q

Defining characteristic of DNN

A

More than 1 hidden layer

2
Q

Advantages of SGD

A

Efficient for large samples
Implementable numerically
Can be ‘controlled’
(see the update sketch below)
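A minimal sketch of the plain SGD update these points refer to, assuming a differentiable per-sample loss; the names (grad_loss, eta) and the toy mean-fitting example are illustrative, not from the lecture.

import random

def sgd(theta, data, grad_loss, eta=0.01, epochs=10):
    # Stochastic gradient descent: one update per randomly drawn sample
    for _ in range(epochs):
        random.shuffle(data)
        for sample in data:
            g = grad_loss(theta, sample)                       # gradient of the per-sample loss
            theta = [t - eta * gi for t, gi in zip(theta, g)]  # step against the gradient
    return theta

# Toy usage: fit the mean of a sample by minimising the squared loss (theta - x)^2
data = [1.0, 2.0, 3.0]
grad = lambda theta, x: [2.0 * (theta[0] - x)]
print(sgd([0.0], data, grad, eta=0.1, epochs=200))             # fluctuates around the sample mean 2.0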

3
Q

RNN

A

Recurrent neural networks sequentially feed their output back into the network

4
Q

Connection from FNN to RNN

A

RNNs can be reduced to FNNs by UNFOLDING (unrolling) them in time; see the sketch below
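One way to see the unfolding, in assumed notation (the matrices W, U, the bias b and the activation g are not fixed by the card): a single recurrent step applied T times becomes an ordinary feedforward composition whose layers all share the same weights.

$$h_t = g(W h_{t-1} + U x_t + b), \qquad t = 1, \dots, T$$
$$h_T = g\bigl(W\, g\bigl(W\, g(\cdots) + U x_{T-1} + b\bigr) + U x_T + b\bigr)$$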

5
Q

First NN

A

McCulloch and Pitts (1943)

6
Q

Perceptron & problems of

A

Rosenblatt ‘58; problem: as a single linear threshold unit it can only represent linearly separable functions (e.g. not XOR; see card 7)

7
Q

What started AI winter

A

‘69: Minsky (and Papert) showed that XOR cannot be represented by a single-layer perceptron

8
Q

FULLY CONNECTED

A

If all entries of each layer matrix L_i in the NN are non-zero

9
Q

Universal approximation property

A

Let g:R -> R be a measurable function such that:
a) g is not a polynomial function
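The answer is cut off here; for reference, a commonly cited version of the theorem (after Leshno, Lin, Pinkus and Schocken, 1993), which may differ in technical detail from the lecture's statement: for a continuous g, the set of one-hidden-layer networks

$$\mathcal{N}_g = \Bigl\{\, x \mapsto \sum_{i=1}^{N} c_i\, g(w_i^{\top} x + b_i) \;:\; N \in \mathbb{N},\ c_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^d \,\Bigr\}$$

is dense in $C(K)$ in the sup norm for every compact $K \subset \mathbb{R}^d$ if and only if $g$ is not a polynomial.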

10
Q

Define FNN

A
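The answer is blank in the source; a standard definition in assumed notation (not confirmed by this deck): a feedforward NN with L hidden layers is the composition

$$f = A_{L+1} \circ g_L \circ A_L \circ \cdots \circ g_1 \circ A_1,$$

where each $A_i(x) = W_i x + b_i$ is an affine map with weight matrix $W_i$ and bias $b_i$, and each $g_i$ is an activation function applied componentwise.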
11
Q

Differences between (hyper)params

A

Hyper:
Set by hand
Features

Params:
Chosen by machine (weights and biases)
Optimised by SGD

12
Q

Architecture of network

A

Hyperparameters and activation functions (the things chosen by you)

13
Q

Dense layer

A

The entire layer is connected (all weight entries non-zero)

14
Q

Number of parameters that characterise N

A
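The answer is blank in the source; a standard count for a fully connected network with layer widths $d_0, d_1, \dots, d_{L+1}$, assuming weights and biases in every layer:

$$\#\{\text{parameters}\} = \sum_{i=1}^{L+1} \bigl(d_{i-1}\, d_i + d_i\bigr) = \sum_{i=1}^{L+1} (d_{i-1} + 1)\, d_i$$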
15
Q

Adding units

A
16
Q

Continuity and differentiability of NN

A

Bottom line:
If every activation function is continuous, then so too is the NN

Overall, the NN is only as many times continuously differentiable as its LEAST differentiable activation function

17
Q

One dimensional activation functions (entire table)

18
Q

Dead ReLU problem & solution

A

A layer of ReLU activations receives only negative values -> it produces constant (zero) output and zero gradients

This can freeze gradient-based algorithms

Therefore use leaky ReLU or parametric ReLU (or ELU); see the sketch below

Usually 0 < α < 1
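A minimal sketch of why the leaky variant avoids dead units, using the usual elementwise definitions; the value alpha = 0.01 is an illustrative choice within the card's range 0 < α < 1.

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x

def grad_relu(x):
    return 1.0 if x > 0 else 0.0      # zero gradient for every negative input: the unit is "dead"

def grad_leaky_relu(x, alpha=0.01):
    return 1.0 if x > 0 else alpha    # small but non-zero gradient, so learning can continue

xs = [-2.0, -0.5, 0.5]
print([grad_relu(x) for x in xs])        # [0.0, 0.0, 1.0]
print([grad_leaky_relu(x) for x in xs])  # [0.01, 0.01, 1.0]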

19
Q

Multi dimensional activation

20
Q

When is identity function useful

21
Q

Limitation of Heaviside

A

As it is discontinuous, it can’t be used in gradient-based algorithms

22
Q

Saturating activation functions

A

Output is bounded

Sigmoid, tanh
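For reference, the two bounded examples (standard definitions):

$$\sigma(x) = \frac{1}{1 + e^{-x}} \in (0, 1), \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \in (-1, 1)$$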

23
Q

Continuously differentiable counterpart of ReLU

24
Q

Boltzmann dist

A

From statistical physics

Analogous to multinomial logistic regression (the softmax), which is the standard output activation in image recognition
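For reference, the standard forms (the identification of logits with negative energies is the usual analogy, not spelled out on the card):

$$p_i = \frac{e^{-E_i/(kT)}}{\sum_j e^{-E_j/(kT)}} \quad \text{(Boltzmann)}, \qquad \operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$$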

25
Q

Motivation for maxout

A

Several simple, convex, non-linear functions can be expressed as maxima of affine functions, e.g.
ReLU(x) = max{0, x}
|x| = max{-x, x}
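A standard definition of a maxout unit (Goodfellow et al., 2013), assuming k learned affine pieces $(w_j, b_j)$; the card's two examples correspond to fixed choices of these pieces:

$$\operatorname{maxout}(x) = \max_{1 \le j \le k} \bigl( w_j^{\top} x + b_j \bigr)$$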
26
Q

Sup Norm
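The answer is blank in the source; the standard definition for functions on a domain $K$ (assumed notation):

$$\|f\|_{\infty} = \sup_{x \in K} |f(x)|$$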
27
Q

Lp norm
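Likewise blank in the source; the standard definition for $1 \le p < \infty$ (assumed notation):

$$\|f\|_{L^p(K)} = \Bigl( \int_K |f(x)|^p \, dx \Bigr)^{1/p}$$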
28
Q

Def universal approximation property
29
Q

Limitation of UAP

A

The result is non-constructive: it does not tell us what the approximating NNs f and h look like, just that they exist.
It is also non-quantitative: it does not tell us how many hidden units are required to create these networks.