review Flashcards

1
Q

We produce a prediction in terms of a probability, then use it to decide

A

whether the outcome is 0 or 1 (a categorical value)

2
Q

The transformation is nonlinear, so we use

A

odds, which is the ratio probability / (1 - probability)

3
Q

In a logistic regression, the outcome we are predicting is the

A

log odds
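
The relationship between probability, odds, and log odds on these cards can be sketched in a few lines. This is a minimal illustration, not part of the original deck: the function names and the value 0.8 are assumptions chosen for the example.

```python
import math

def odds(p):
    """Odds for a probability p, assuming 0 < p < 1."""
    return p / (1 - p)

def log_odds(p):
    """Log odds (logit) of p, the quantity the regression predicts."""
    return math.log(odds(p))

def probability_from_log_odds(z):
    """Invert a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-z))

p = 0.8  # illustrative probability, not from the deck
print(odds(p), log_odds(p), probability_from_log_odds(log_odds(p)))
```

Converting the log odds back to a probability is what allows the prediction to be compared against a threshold.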

4
Q

The goal of the baseline is to predict

A

whether an observation will be 0 or 1

5
Q

The baseline predicts

A

the most frequent outcome

6
Q

Which data set is used to find the outcome of the baseline model?

A

training
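
A baseline model of this kind might be sketched as follows. The outcome lists are made-up illustrative data, and scoring the baseline on a separate test set is an assumption added here; the cards only say that the baseline outcome comes from the training set.

```python
from collections import Counter

def baseline_outcome(train_outcomes):
    """Most frequent outcome in the training set."""
    return Counter(train_outcomes).most_common(1)[0][0]

def baseline_accuracy(outcomes, predicted):
    """Share of observations whose outcome equals the baseline prediction."""
    return sum(1 for y in outcomes if y == predicted) / len(outcomes)

train_y = [0, 0, 0, 1, 1]  # hypothetical training outcomes
test_y = [0, 1, 0, 0]      # hypothetical test outcomes

prediction = baseline_outcome(train_y)
print(prediction, baseline_accuracy(test_y, prediction))
```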

7
Q

How do we build a regression tree?

A

Split on the independent variables (IVs) and predict the most frequent outcome in each split

8
Q

How do we come up with a prediction?

A

Count the number of each outcome per split

9
Q

We choose how to define the splits, but must use that definition

A

consistently throughout the model

10
Q

How do we decide where to split?

A

First decide on the objective: minimize error (the number of misclassified points) or maximize accuracy. Then try different split points and select the one that minimizes error or maximizes accuracy.
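
The search described above can be sketched for a single numeric variable and a 0/1 outcome. This is a simplified illustration under stated assumptions: candidate split points are midpoints between sorted distinct values, and the objective is the count of misclassified points as on the card. Tree libraries commonly use other criteria such as Gini impurity. The function names and data are hypothetical.

```python
def misclassified(labels):
    """Errors made when predicting the most frequent 0/1 label in a group."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)

def best_split(x, y):
    """Try each midpoint between sorted distinct x values.

    Returns (threshold, errors) for the split with the fewest
    misclassified points, or None if x has fewer than two distinct values.
    """
    values = sorted(set(x))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        errors = misclassified(left) + misclassified(right)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best

x = [1, 2, 3, 4, 5, 6]  # hypothetical independent variable
y = [0, 0, 0, 1, 1, 1]  # hypothetical 0/1 outcome
print(best_split(x, y))
```

Because only one variable and one split are examined at a time, applying this step repeatedly is a greedy procedure, so the resulting tree is the best one found rather than a guaranteed optimum.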

11
Q

In most cases the algorithm isn't exact, so the result is

A

the best tree found, not necessarily the optimal one

12
Q

The ANOVA method is for predicting a variable that is

A

unbounded (continuous), whereas classification is for a probability between 0 and 1

13
Q

A continuous outcome is predicted without a threshold. How?

A

By using the most frequent value or averages

14
Q

A classification problem deals with the probability of an outcome that is either 0 or 1, and whenever we use probabilities we always have

A

a threshold, specificity, and sensitivity -> ROC curve
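
How a threshold produces sensitivity and specificity, and hence one point on the ROC curve, can be sketched as below. The data are invented for illustration, the function name is hypothetical, and the rule "predict 1 only when the probability is strictly above the threshold" is an assumption chosen for this sketch.

```python
def sensitivity_specificity(actual, probs, threshold):
    """Return (sensitivity, specificity) at a threshold.

    Assumes both classes (0 and 1) appear in `actual`.
    """
    predicted = [1 if p > threshold else 0 for p in probs]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, yhat in pairs if a == 1 and yhat == 1)
    fn = sum(1 for a, yhat in pairs if a == 1 and yhat == 0)
    tn = sum(1 for a, yhat in pairs if a == 0 and yhat == 0)
    fp = sum(1 for a, yhat in pairs if a == 0 and yhat == 1)
    return tp / (tp + fn), tn / (tn + fp)

actual = [1, 1, 1, 0, 0, 0]             # hypothetical true outcomes
probs = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]  # hypothetical predicted probabilities

for t in (0.3, 0.5, 0.8):
    sens, spec = sensitivity_specificity(actual, probs, t)
    # Each threshold gives one ROC point: (1 - specificity, sensitivity)
    print(t, 1 - spec, sens)
```

Raising the threshold tends to lower sensitivity and raise specificity; the ROC curve traces that trade-off across all thresholds.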

15
Q

A single regression tree has high variance, so its predictions will have

A

high variability

16
Q

How do we fix the high-variability issue?

A

Build multiple trees in a random forest, then make the prediction based on the most frequent outcome across the trees

17
Q

Build 5 trees; once we have predictions from these trees, predict the most frequent outcome. How can the idea of thresholds be used here too?

A

If 3 of the 5 trees predict 1, then P(y = 1) is 0.6; compare this number to the threshold.
If the threshold is 0.7, predict 0 because 0.6 <= 0.7
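
The arithmetic on this card can be sketched directly. The function names are hypothetical, and the list of votes simply encodes "3 of 5 trees predict 1".

```python
def forest_probability(tree_votes):
    """Share of trees voting 1, used as an estimate of P(y = 1)."""
    return sum(tree_votes) / len(tree_votes)

def forest_predict(tree_votes, threshold=0.5):
    """Predict 1 only if the share of 1-votes is strictly above the threshold."""
    return 1 if forest_probability(tree_votes) > threshold else 0

votes = [1, 1, 1, 0, 0]  # 3 of 5 trees predict 1
print(forest_probability(votes))
print(forest_predict(votes, threshold=0.7))
print(forest_predict(votes, threshold=0.5))
```

With a threshold of 0.5 the rule reduces to the majority vote, which is the "most frequent outcome" prediction.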

18
Q

What do we randomly select when building many trees?

A

A subset of variables. Random rows and columns are drawn to make a new, random data frame; a tree is built from it, and this is repeated for multiple trees, whose predictions are combined to predict the outcome
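
The random data frame described on this card might be sketched as follows. It follows the card's simplified description (one random set of rows and columns per tree); random forest implementations typically resample the candidate variables at every split rather than once per tree. All names and data here are hypothetical.

```python
import random

def random_subframe(rows, predictors, n_predictors, outcome, rng):
    """Draw a bootstrap sample of rows and a random subset of predictor columns.

    rows: list of dicts, one per observation.
    Returns (new_rows, chosen_predictors); the outcome column is always kept.
    """
    sampled_rows = [rng.choice(rows) for _ in range(len(rows))]  # with replacement
    chosen = rng.sample(predictors, n_predictors)                # without replacement
    keep = chosen + [outcome]
    frame = [{name: row[name] for name in keep} for row in sampled_rows]
    return frame, chosen

rows = [  # hypothetical data
    {"x1": 1, "x2": 5, "x3": 0, "y": 0},
    {"x1": 2, "x2": 4, "x3": 1, "y": 0},
    {"x1": 3, "x2": 3, "x3": 0, "y": 1},
    {"x1": 4, "x2": 2, "x3": 1, "y": 1},
]
rng = random.Random(0)  # seeded so the sketch is repeatable
frame, chosen = random_subframe(rows, ["x1", "x2", "x3"], 2, "y", rng)
print(chosen, frame)
```

One tree would be fitted to each such frame, and the per-tree predictions then combined by voting.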

19
Q

N tree (the number of trees) is

A

a parameter, usually between 200 and 500

20
Q

Adjusting the parameters doesn't really help because

A

the defaults are good, so just run the algorithm

21
Q

A random forest is a generalization of regression trees and usually performs better, but its disadvantage is that it is

A

less interpretable