Module 7 Flashcards

1
Q

Overview of logistic regression

A
  • Used to estimate the probability that an event will occur as a function of other variables
  • Can be considered a classifier as well
2
Q

Describe the inputs and outputs of logistic regression

A

Input - variables can be continuous or discrete
Output - a set of coefficients that indicate the relative impact of each driver, plus a linear expression for predicting the log-odds of the outcome as a function of the drivers

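The linear log-odds expression above can be sketched in Python; the coefficients `b0` and `b1` here are hypothetical stand-ins for values a fitting routine would produce:

```python
import math

# Hypothetical fitted coefficients: intercept b0 and one driver's weight b1
b0, b1 = -1.5, 0.8

def predict_probability(x):
    """Convert the linear log-odds expression into a probability."""
    log_odds = b0 + b1 * x                # linear in the driver x
    return 1 / (1 + math.exp(-log_odds))  # inverse logit
```

For a driver value of 2.0 the log-odds are -1.5 + 1.6 = 0.1, giving a probability just above 0.5.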
3
Q

List logistic regression use cases

A
  1. Estimating the probability of an event
  2. Binary classification
  3. Multi-class classification
4
Q

What is the goal of logistic regression?

A
  • Predict the true proportion of successes, π, at any value of the predictor
  • π = (# of successes) / (# of trials)
5
Q

Describe Y, X, and π in the binary logistic regression model

A
Y = binary response
X = quantitative predictor
π = probability of success
6
Q

Logistic regression Pros

A
  • Explanatory value
  • Robust
  • Concise
  • Easy to score data
  • Returns good probability estimates
  • Preserves summary stats of the training data
7
Q

Logistic Regression Cons

A
  • Does not handle missing values well
  • Does not work well with discrete drivers that have many distinct values
  • Cannot handle variables that affect the outcome in a discontinuous way (step functions)
  • Assumes each variable affects the log-odds linearly
8
Q

Describe Neural Network Concept

A
  • Constructed and implemented to model the human brain
  • Performs pattern matching, classification, and similar tasks that are difficult for traditional computers

9
Q

Describe an artificial neural network

A
  • Possesses a large number of processing elements called nodes/neurons, operating in parallel
  • Neurons are connected by links
  • Each link has a weight carrying information about the input signal
  • Each neuron has an internal state called its activation level
10
Q

What are the components of a single-layer neural network?

A

Input layer, hidden layer, and output layer; the parameters are the weights, and the intercepts are called biases

11
Q

What are A_k and g(z) in a neural network?

A

A_k are the activations in the hidden layer
g(z) is the activation function - popular choices are the sigmoid and rectified linear (ReLU) functions
The activations A_k are typically non-linear derived features

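A minimal sketch of the two activation functions named above:

```python
import math

def sigmoid(z):
    """Sigmoid activation: squashes z into (0, 1)."""
    return 1 / (1 + math.exp(-z))

def relu(z):
    """Rectified linear activation: zero for negative inputs, identity otherwise."""
    return max(0.0, z)
```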
12
Q

Describe the output layer in an ANN and how the model is fit

A
  • The output activation function encodes the softmax function
  • Fit the model by minimizing the cross-entropy (negative multinomial log-likelihood)

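A sketch of the softmax output activation and the cross-entropy it is fit against, for a single observation:

```python
import math

def softmax(scores):
    """Turn raw output scores into class probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Negative log-likelihood of the true class under the predicted probabilities."""
    return -math.log(probs[true_class])
```

Minimizing the cross-entropy over all observations pushes the predicted probability of each true class toward 1.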
13
Q

Describe how a CNN works

A
  • Builds up an image in a hierarchical fashion
  • The hierarchy is constructed through convolution and pooling layers
  • Edges and shapes are recognized and pieced together to form the target image
14
Q

Describe the convolution filter (how it is learned, and its score)

A
  • Filters are learned during training
  • The input image and filter are combined using the dot product to get a score
  • The score is high if a sub-image of the input image is similar to the filter
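The dot-product score described above can be sketched for one filter position; the toy image and filter here are illustrative, not from the source:

```python
def convolve_score(image, filt, top, left):
    """Dot product of the filter with the sub-image at (top, left)."""
    k = len(filt)  # assume a square k x k filter
    return sum(
        image[top + i][left + j] * filt[i][j]
        for i in range(k)
        for j in range(k)
    )

# Toy example: a 2x2 "bright patch" filter scores highest where the image is bright
image = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 0]]
filt = [[1, 1],
        [1, 1]]
```

Scanning the filter over every position and collecting the scores yields the new feature map.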
15
Q

What is the idea of convolution, its result, and the weight in the filters?

A
  • The idea is to find common patterns that occur in different parts of the image
  • The result is a new feature map
  • The weights are learned by the network
16
Q

What is pooling and what are its advantages?

A
  • Each nonoverlapping 2 x 2 block is replaced by its maximum
  • Sharpens feature identification
  • Allows for locational invariance
  • Reduces dimensions by a factor of 4
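A sketch of 2 x 2 max pooling over a feature map (assumes even dimensions):

```python
def max_pool_2x2(feature_map):
    """Replace each nonoverlapping 2x2 block with its maximum value."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]
```

A 4 x 4 map becomes 2 x 2, reducing the dimensions by a factor of 4.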
17
Q

Describe the architecture of a CNN

A
  • Many convolve + pool layers
  • Filters are typically small (e.g., 3 x 3)
  • Each filter creates a new channel in the convolution layer
  • As pooling reduces size, the number of filters/channels increases
18
Q

How do you create features X to characterize a document?

A

Use a bag of words

19
Q

What is a bag of words?

A
  • A bag of words is a set of unigrams
  • Identify the 10K most frequently occurring words
  • Create a binary vector of length 10K for each document and score a 1 in every position where the corresponding word occurs
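A sketch of the binary scoring step; the four-word vocabulary here stands in for the 10K most frequent words:

```python
def bag_of_words_vector(document, vocabulary):
    """Binary vector with a 1 wherever the vocabulary word occurs in the document."""
    words = set(document.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

vocab = ["good", "bad", "movie", "plot"]   # hypothetical tiny vocabulary
vec = bag_of_words_vector("A good movie with a good plot", vocab)
```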
20
Q

What is a recurrent neural network?

A
  • Builds a model that takes into account the sequential nature of the data and builds up a memory of the past
21
Q

What is each observation in an RNN, and what is the target Y?

A
  • The feature for each observation is a sequence of vectors
  • The target Y is a single variable such as sentiment, or a one-hot vector for multi-class classification
  • Y can also be a sequence
22
Q

Describe the architecture of an RNN. What does it represent?

A
  • The hidden layer is a sequence of vectors A_l that receive input X_l and A_(l-1) and produce output O_l
  • The same weights W, U, B are used at each step
  • Represents an evolving model, updated as each element is processed
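The update above — the same weights reused at every step — can be sketched for a single hidden unit; tanh is an assumed activation choice:

```python
import math

def rnn_step(x_t, a_prev, W, U, b):
    """One RNN update: new activation from the current input and previous state."""
    z = sum(w * x for w, x in zip(W, x_t)) + U * a_prev + b
    return math.tanh(z)  # assumed activation function

def run_rnn(sequence, W, U, b):
    a = 0.0  # initial state
    for x_t in sequence:
        a = rnn_step(x_t, a, W, U, b)  # same W, U, b reused at each step
    return a
```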
23
Q

How to increase accuracy for an RNN

A

Add an LSTM (long short-term memory) layer

24
Q

What is autocorrelation?

A

The correlation of a series with lagged copies of itself (all pairs of values separated by a given lag)
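A sketch of autocorrelation at a given lag, pairing each value with the value `lag` steps later:

```python
def autocorrelation(series, lag):
    """Correlation between the series and a copy of itself shifted by `lag`."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum((series[t] - mean) * (series[t + lag] - mean)
                     for t in range(n - lag))
    return covariance / variance
```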

25
Q

What is the RNN forecaster similar to?

A

An autoregression procedure
26
Q

When to use deep learning

A
  • Image classification, modeling, medical applications
  • Speech modeling, language, forecasting
  • When the signal-to-noise ratio is high
  • Use simpler models like AR(5) or glmnet if you can
27
Q

When does fitting a neural network become difficult?

A

When the objective is nonconvex; the solution is to apply gradient descent to the nonconvex objective
28
Q

How to implement gradient descent for nonconvex functions

A
  • Start with a guess for all parameters and set t = 0
  • Iterate until the objective fails to decrease
29
Q

How to find a direction that points downhill in gradient descent?

A
  • Use the gradient vector (the vector of partial derivatives), scaled by the learning rate ρ
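The iteration described above can be sketched as follows; the quadratic objective and the learning-rate value are illustrative:

```python
def gradient_descent(grad, theta, rho=0.1, max_iters=1000, tol=1e-8):
    """Step downhill along the negative gradient until progress stalls."""
    for t in range(max_iters):
        step = rho * grad(theta)  # rho is the learning rate
        theta = theta - step
        if abs(step) < tol:       # objective has effectively stopped decreasing
            break
    return theta

# Illustrative objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
minimum = gradient_descent(lambda th: 2 * (th - 3), theta=0.0)
```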
30
Q

What does backpropagation use?

A
  • R is a sum, so the gradient is a sum of gradients
  • Backpropagation uses the chain rule for differentiation
31
Q

What is slow learning?

A
  • Gradient descent is slow, with a low learning rate
  • Use early stopping as regularization
32
Q

What is stochastic gradient descent?

A
  • Rather than using all the data, use minibatches drawn at random
33
Q

What is an epoch?

A

A full pass through the training data; it amounts to a number of minibatch updates
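A sketch of drawing random minibatches; one full pass over the shuffled data constitutes one epoch:

```python
import random

def minibatches(data, batch_size):
    """Shuffle the data and yield it in random minibatches."""
    indices = list(range(len(data)))
    random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

data = list(range(10))
batches = list(minibatches(data, batch_size=4))  # one epoch = 3 minibatch updates here
```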
34
Q

What is regularization?

A

Shrinks the weights at each layer; two forms are dropout and augmentation
35
Q

What is dropout learning?

A
  • At each update, remove units at random with some probability and scale up the weights of those retained
  • The other units stand in for those removed
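A sketch of one dropout update on a layer's activations; `phi` is the removal probability, and scaling the retained units by 1/(1 - phi) (the "inverted" convention, an implementation assumption) keeps the expected activation unchanged:

```python
import random

def dropout(activations, phi):
    """Randomly zero units with probability phi; scale up those retained."""
    return [0.0 if random.random() < phi else a / (1 - phi)
            for a in activations]
```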
36
Q

What are ridge regularization and data augmentation, and when are they effective?

A
  • Make copies of each (x, y) and add small Gaussian noise to x, leaving the labels of the copies unchanged
  • This makes the fit robust
  • Equivalent to ridge regularization in OLS
  • Effective with SGD
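A sketch of the augmentation step: noisy copies of the features with the label left unchanged (the copy count and noise level are illustrative):

```python
import random

def augment(x, y, copies=3, noise_sd=0.1):
    """Make noisy copies of the features x, leaving the label y unchanged."""
    return [([xi + random.gauss(0, noise_sd) for xi in x], y)
            for _ in range(copies)]

augmented = augment([1.0, 2.0], y=1)
```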
37
Q

What is double descent?

A
  • With neural networks, it is better to have too many hidden units than too few
  • Running stochastic gradient descent to zero training error often gives a good out-of-sample error
  • Increasing the layers and training to zero error gives better out-of-sample error
38
Q

In a wide linear model (p > n), what does SGD with a small step size lead to?

A

The minimum-norm solution
39
Q

What is the minimum-norm solution?

A

The zero-residual solution with the smallest norm
40
Q

What is similar to the ridge path?

A

Stochastic gradient flow
41
Q

Which signal-to-noise ratio is less prone to overfitting?

A

A high signal-to-noise ratio