Linear regression Flashcards

(70 cards)

1
Q

What is simple linear regression used for?

A

Supervised learning: predicting a quantitative response 𝑌 (dependent variable) on the basis of a single predictor variable 𝑋 (independent variable).

2
Q

If 𝑓 is to be approximated by a linear function, then it becomes:
$Y = \beta_0 + \beta_1 X + \epsilon$

What does 𝛽0 mean?

A

𝛽0 is the intercept term: the expected value of 𝑌 when 𝑋 = 0.

3
Q

If 𝑓 is to be approximated by a linear function, then it becomes:
$Y = \beta_0 + \beta_1 X + \epsilon$

What does 𝛽1 mean?

A

𝛽1 is the slope: the average change in 𝑌 associated with a one-unit increase in 𝑋.
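As a rough illustration of the intercept and slope, the sketch below estimates both by ordinary least squares on a tiny made-up data set. The function name and the data are assumptions for illustration only, not part of the deck.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) that minimise the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: forces the line through the point (x_mean, y_mean).
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Made-up data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(beta0, beta1)  # prints 1.0 2.0
```

Here the intercept 1.0 is the fitted value at x = 0, and the slope 2.0 is the change in y per one-unit change in x.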

4
Q

What is the process of linear regression?

A
  1. assess the significance of the coefficients
  2. quantify the extent to which the model fits the data

(e.g. judge the line of best fit using R squared)

5
Q

How is the quality of a linear regression fit assessed?

A

Using the residual standard error (RSE).

E.g. if RSE = 3.26 (with sales measured in thousands of units): actual sales in each market deviate from the true regression line by approximately 3,260 units on average.
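A minimal sketch of how the residual standard error and R squared could be computed from observed and fitted values. The data are made up, and the fitted values are assumed to come from a least-squares line (here y = 1.4 + 0.8x for x = 0..4).

```python
import math


def rse_and_r_squared(ys, fitted, n_predictors=1):
    """Return (RSE, R^2) for observed values ys and model predictions fitted."""
    n = len(ys)
    y_mean = sum(ys) / n
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual sum of squares
    tss = sum((y - y_mean) ** 2 for y in ys)             # total sum of squares
    # RSE divides by the residual degrees of freedom, n - p - 1.
    rse = math.sqrt(rss / (n - n_predictors - 1))
    r_squared = 1 - rss / tss
    return rse, r_squared


ys = [1.0, 3.0, 2.0, 5.0, 4.0]
fitted = [1.4, 2.2, 3.0, 3.8, 4.6]
rse, r_squared = rse_and_r_squared(ys, fitted)
print(round(rse, 3), round(r_squared, 3))
```

RSE is in the units of the response, while R squared is a unitless proportion of variance explained.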

6
Q

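What happens when more variables are added to a linear regression model?

A

R² will increase (or at least never decrease) on the training data, even if the added variables are only weakly related to the response.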

7
Q

What are the uncertainties when predicting using a MULTIPLE linear regression model

A

Reducible error: the coefficients are only estimates of the true population regression plane.

Model bias: a linear model (or any other model) for 𝑓(𝑋) is almost always an approximation of reality.

Irreducible error: the response cannot be predicted perfectly because of the random error 𝜖 in the model.

8
Q

Assumptions of the linear model

A

Additivity: the effect of changes in a predictor 𝑋𝑗 on the response 𝑌 is independent of the values of the other predictors (no interaction between predictors).

Linearity: the change in the response 𝑌 due to a one-unit change in 𝑋𝑗 is constant, regardless of the value of 𝑋𝑗.

9
Q

When is linear regression not applicable

A
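  1. when the response is qualitative: coding the outcomes as numbers (e.g. 1 = stroke) imposes an ordering on them that may not exist
  2. when the response is a probability: linear regression can give estimates outside the range 0-1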
10
Q

what is logistic regression

A

Logistic regression estimates the probability of an event occurring, such as voted or didn't vote (a discrete outcome), based on a given dataset of independent variables.
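A small sketch of how logistic regression turns a linear score into a probability. The coefficients below are hypothetical values chosen for illustration, not estimates from any real data.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, beta0, beta1):
    """Probability of the event for a single predictor value x."""
    return logistic(beta0 + beta1 * x)


beta0, beta1 = -4.0, 0.8  # hypothetical coefficients
for x in (0.0, 5.0, 10.0):
    print(x, round(predict_probability(x, beta0, beta1), 3))
```

A common convention, assumed here rather than stated in the deck, is to predict the event when the probability exceeds 0.5.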

11
Q

What makes logistic regression different from linear regression?

A

It is used to make a prediction about a categorical variable instead of a continuous one.

Its output is a probability between 0 and 1.

The predicted outcome (the class label) is categorical.

12
Q

what is the drawback of logistic regression

A

it needs a large data set to have sufficient statistical power to detect a significant effect

13
Q

what is the dummy variable approach

A

A way to use qualitative predictors with the logistic regression model.

Dummy variables assign the numbers '0' and '1' to indicate membership in any mutually exclusive and exhaustive category.

Each dummy variable takes the value 0 or 1.
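One possible sketch of dummy coding: each non-baseline category gets its own 0/1 column. The function name, the column naming scheme and the example values are assumptions made for illustration.

```python
def dummy_encode(values, baseline):
    """Encode a qualitative variable as 0/1 dummy variables.

    The baseline category gets no column of its own: it is the case
    where every dummy variable equals 0.
    """
    categories = sorted(set(values) - {baseline})
    rows = [{f"is_{c}": int(v == c) for c in categories} for v in values]
    return categories, rows


# Hypothetical qualitative predictor with two levels.
categories, rows = dummy_encode(["yes", "no", "yes"], baseline="no")
print(categories)
print(rows)
```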

14
Q

What is linear discriminant analysis

A

In LDA, we model the distribution of the predictors 𝑋 separately in each of the response classes (i.e. given 𝑌), and then use Bayes' theorem to flip these around into estimates for $\Pr(Y = k \mid X = x)$.

15
Q

3 reasons to use LDA

A

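When the classes are well separated, the parameter estimates for logistic regression are surprisingly unstable; LDA does not suffer from this problem.

If 𝑛 (the size of the data set) is small and the distribution of the predictors 𝑋 is approximately normal in each class, LDA is more stable than logistic regression.

LDA is popular when we have more than two response classes.

It maximises the separability between classes.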

16
Q

what happens if alpha (the learning rate in gradient descent) is too small

A

the optimiser will take a long time to find the minimum

17
Q

What is an exploding gradient?

A

The gradient (slope) becomes extremely large, almost vertical, so the weight updates blow up and the system becomes completely unstable.

18
Q

What causes an exploding gradient?

A

Complex models with many parameters, such as large neural networks.

19
Q

what is a nested function

A

A function that embeds another function, e.g. 𝑓(𝑔(𝑥)). Nested functions arise in neural networks, where each layer is applied to the output of the previous layer.

20
Q

what is an activation function

A

A function that decides whether (and how strongly) information passes from one layer to the next.

An example is a step function.
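A short sketch of two activation functions. The step function is the example named on the card; ReLU is not mentioned in the deck and is included here only as a common alternative for comparison.

```python
def step(z):
    """Step activation: passes a signal (1) only if the input is non-negative."""
    return 1 if z >= 0 else 0


def relu(z):
    """ReLU activation (not from the card): passes positive inputs unchanged."""
    return max(0.0, z)


for z in (-2.0, 0.0, 3.5):
    print(z, step(z), relu(z))
```

Whether the step function fires at exactly zero is a convention; the choice of `>=` here is an assumption.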

21
Q

what is the difference between bagging and boosting

go over this one.

A

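Bagging: multiple models of the same algorithm, each trained on a different random (bootstrap) sample of the same training set; their predictions are combined.

Boosting: models are trained in sequence, selecting the data points which give wrong predictions.

Each time the data gives a wrong prediction, it is used to train the new model.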

22
Q

explain the tradeoff between accuracy and interpretability

A

Increasing model flexibility (complexity) may make results more accurate but harder to interpret.

23
Q

Between bagging, boosting and random forests, which has the highest chance of overfitting when adding more data?

A

Boosting, because you increase the likelihood of overtraining the model, so the model becomes less effective at predicting future data.

24
Q

what is a recommender system

A

A recommendation system is an AI algorithm that uses Big Data to suggest or recommend additional products to consumers.

The suggestions can be based on past purchases, search history and demographic information.

25
what does machine learning do
Finds a mathematical formula that, when applied to a collection of inputs ("training data"), produces the desired outputs.
26
what is machine learning
input + desired result → computation → program
26
what is traditional programming
input + program → computation → results
27
what are the different types of unsupervised learning
dimension reduction and clustering.
28
what is dimension reduction
a technique used to reduce the number of features in a dataset while retaining as much of the important information as possible.
29
supervised learning
The training set gives the computer example answers, e.g. pictures that are already labelled as cats or dogs.
30
what does x mean in an algorithm
Input, also known as features or exogenous variables
31
what does Y mean in an algorithm
Output, also known as label, response or endogenous variable: y
32
what does x1 or y1 mean in an algorithm
An observed (historical) data point: the first input and its output in the previously collected data.
33
what are the two types of supervised learning
classification and regression
34
what is classification (what type of learning is it)
Supervised learning: we are trying to predict results which have a discrete output (i.e. a category or class), e.g. identifying objects or languages.
35
types of classification
Logistic Regression; Linear (and Quadratic) Discriminant Analysis; K-Nearest Neighbors. R lab: Logistic, LDA, QDA, KNN.
36
what is regression
We are trying to predict results which have a continuous output, like finding the line of best fit. Examples: stock price forecasts, correlation analysis, medical diagnosis, demand and sales volume analysis, ...
37
semi supervised learning
It uses a small amount of labeled data and a large amount of unlabeled data, which provides the benefits of both UL and SL while avoiding the challenges of finding a large amount of labeled data.
38
how does reinforcement learning work
The goal is to learn a policy, which is a function (similar to the model in SL) that takes the feature vector of a state as input and outputs an optimal action to execute in that state. The action is optimal if it maximizes the expected average reward. The policy is constantly being updated.
39
what is model-based and what is model-free reinforcement learning?
Model-based means memorizing lots of information. Model-free means generalizing situations, e.g. the self-driving car doesn't memorize every movement but tries to generalize situations and act rationally while obtaining a maximum reward.
40
what is a scalar
a numerical value like 2
41
what is a vector
An ordered list of scalar values, called attributes, like 𝑎 = [−2, 5].
42
what is a matrix
A matrix is a rectangular array of numbers arranged in rows and columns. 2 6 −1 30 −6 −3
43
what is a function
A function is a relation that associates each element 𝑥 of a set 𝒳 (the domain of the function) to a single element 𝑦 of another set 𝒴 (the codomain of the function).
44
where is the local minimum found
We say that 𝑓(𝑥) has a local minimum at 𝑥 = 𝑐 if 𝑓(𝑥) ≥ 𝑓(𝑐) for every 𝑥 in some open interval around 𝑥 = 𝑐.
45
what is a derivative
The derivative 𝑓′ of a function 𝑓 is a function or a value that describes how fast 𝑓 grows (or decreases).
46
what is differentiation
Differentiation is the process of finding a derivative.
47
what is a discrete random variable
A random variable that takes values from a countable set of distinct values, e.g. a die roll can only be one of 1-6.
48
what is a continuous random variable
A random variable that can take infinitely many values within an interval, e.g. a height or a temperature.
49
what is bayes rule
Conditional probability = the probability of the random variable 𝑌 = 𝑗 given the observed predictor vector 𝑥0 of the random variable 𝑋: $\Pr(Y = j \mid X = x_0) = \frac{\Pr(X = x_0 \mid Y = j)\,\Pr(Y = j)}{\Pr(X = x_0)}$
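A numeric sketch of Bayes' rule, where the posterior equals the likelihood times the prior, divided by the total probability of the observation. All probabilities below are made-up numbers for a hypothetical two-class problem.

```python
def bayes_posterior(prior, likelihood, likelihood_other):
    """Pr(Y = j | X = x0) for a two-class problem.

    prior            -- Pr(Y = j)
    likelihood       -- Pr(X = x0 | Y = j)
    likelihood_other -- Pr(X = x0 | Y != j)
    """
    # Total probability of observing x0 across both classes.
    evidence = likelihood * prior + likelihood_other * (1.0 - prior)
    return likelihood * prior / evidence


# Made-up numbers: a rare class with a fairly informative observation.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, likelihood_other=0.05)
print(round(posterior, 4))
```

Even with a high likelihood, the posterior stays modest because the prior is small.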
50
what are parameters
variables that define the model learned by the learning algorithm (are directly modified by the algorithm based on the training data).
51
what does it mean if a model has low bias
the model predicts the training data well
52
what does it mean if a model has high bias
The model makes many mistakes on the training data. The line of best fit may underfit the data and only capture the general direction of the data.
53
what are the causes of and solutions for high bias (underfitting)
Main reasons:
- the model is too simple for the data (e.g. linear regression)
- the engineered features are not informative enough
Main solutions:
- try a more complex model
- engineer features with higher predictive power
54
describe a model with low variance
Low variance = low sensitivity = performs well on both train and test sets.
55
describe a model with high variance
High variance = high sensitivity = performs well on the training set but poorly on the test set (overfitting).
56
what is a training set
BEFORE the model is built, the analyst feeds the algorithm input data, which corresponds to an expected output. The model evaluates the data repeatedly to learn more about the data's behavior and then adjusts itself to serve its intended purpose.
57
what is a test set
AFTER the model is built, testing data once again validates that it can make accurate predictions. Test data provides a final, real-world check of an unseen dataset to confirm that the ML algorithm was trained effectively.
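A minimal sketch of splitting a data set into training and test sets. The 80/20 split, the fixed seed and the function name are illustrative assumptions, not something the deck prescribes.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them as the test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The test rows are never shown to the model during training.
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(list(range(10)))
print(len(train), len(test))  # prints 8 2
```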
58
what are the causes and solution for high variance (overfitting)
Problems:
- the model is too complex for the data (e.g. a deep NN)
- too many features but a small number of training examples
Solutions:
- try a simpler model
- add more training data if possible
- regularize the model (more widely used)
59
what is bagging (bootstrap aggregate) and why is it useful
Multiple models of the same algorithm are trained on different random samples of the training data, and their predictions are combined. It helps to avoid overfitting the data.
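A toy sketch of bagging. Each "model" here is just the mean of one bootstrap sample, a deliberately trivial stand-in for a real learner, and the final prediction is the average over all models. The function names and data are assumptions for illustration.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) points from data with replacement."""
    return [rng.choice(data) for _ in data]


def bagged_mean(data, n_models=25, seed=0):
    """Average the predictions of models fitted on bootstrap samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_models):
        sample = bootstrap_sample(data, rng)
        estimates.append(sum(sample) / len(sample))  # the trivial "model"
    # Aggregation step: combine the individual models by averaging.
    return sum(estimates) / len(estimates)


data = [2.0, 4.0, 6.0, 8.0]
print(bagged_mean(data))
```

Averaging over many resampled fits is what reduces the variance of the combined prediction.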
60
what is boosting
Selecting data points which give wrong predictions. Each time the data gives a wrong prediction, it is used to train the new model. Often causes overfitting.
61
what are the 4 types of classification methods
Logistic regression; linear discriminant analysis (maximises distance); QDA; K-nearest neighbours.
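Of the four methods, K-nearest neighbours is the simplest to sketch from scratch: classify a query point by a majority vote among its k closest training points. The data, labels and function name below are made up for illustration.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of query by majority vote of its k nearest points."""
    # Squared Euclidean distance is enough for ranking neighbours.
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))
print(knn_predict(points, labels, (5.5, 5.5)))
```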
62
if the boundaries are linear what model of classification shall I use
Linear discriminant analysis and logistic regression
63
How to tell if a model is multivariate
If it has more than one predictor x.
64
how to tell if a model is linear and additive
No X is raised to a power (other than 1): a linear equation will always be in the form of $y = mx + b$.
65
what does r1,r2 mean in an algorithm
they are different leaves on a decision tree
66
what happens if alpha is too large on gradient descent
Your optimizer will take big leaps and may never find the minimum.
67
what happens if alpha is too small on gradient descent
it will take forever to find the minimum
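The effect of the learning rate alpha can be sketched with gradient descent on f(x) = x², whose minimum is at x = 0. The function, starting point and alpha values are assumptions picked to make the three behaviours visible.

```python
def gradient_descent(alpha, steps=50, start=1.0):
    """Run gradient descent on f(x) = x**2 and return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2 * x          # derivative of x**2
        x = x - alpha * gradient  # step against the gradient
    return x


print(gradient_descent(alpha=0.1))    # reasonable: ends very close to 0
print(gradient_descent(alpha=0.001))  # too small: has barely moved from 1.0
print(gradient_descent(alpha=1.1))    # too large: overshoots and diverges
```

With alpha = 1.1 each step flips the sign of x and increases its size, which is the "jumping" behaviour described on the card.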
68
what is a perceptron
A perceptron takes several binary inputs 𝑥1, 𝑥2, 𝑥3, ... and produces a single binary output as follows:
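The card's formula is cut off here. A common textbook rule, assumed in this sketch, is that the output is 1 when the weighted sum of the inputs exceeds a threshold and 0 otherwise. The weights and threshold below are hypothetical.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


weights = [6, 2, 2]  # hypothetical: the first input matters most
threshold = 5        # hypothetical threshold
print(perceptron([1, 0, 0], weights, threshold))  # prints 1
print(perceptron([0, 1, 1], weights, threshold))  # prints 0
```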
69