Trees Flashcards

1
Q

What is a main issue with logistic regression?

A

The coefficients indicate the effect of each variable, not how a decision is made.

2
Q

Decisions we make in real life are:

A

Made in sequential order, like the splits in a regression tree.

3
Q

Main benefit of regression trees?

A

They are easy to understand.

4
Q

In trees we know the direction in which a variable affects the probability, but we can't tell

A

The size of its impact on the y variable.

5
Q

We split the data using the independent variables (IVs) into

A

Yes-or-no decisions.

6
Q

Trees do not assume the model is

A

Linear

7
Q

Adding more splits will

A

Increase accuracy on the training data.

8
Q

3 splits in the tree means

A

3 levels of decisions in the tree

9
Q

A terminal node is an

A

Output, not a condition. Example: it tells you the predicted color, red or grey.

10
Q

The root node is

A

The condition most strongly related to the y variable; the most important condition sits at the top.
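
The root node, condition, and terminal node vocabulary can be sketched as a hand-written tree. The variables (`weight`, `age`) and the thresholds are made up for illustration and are not from the course material.

```python
def predict_color(weight, age):
    """Hand-written tree: each `if` is a yes-or-no condition, each return is a terminal node."""
    # Root node: the first (most important) condition.
    if weight < 50:
        # Second-level condition, reached only when the root answer is "yes".
        if age < 10:
            return "red"   # terminal node: an output, not a condition
        return "grey"      # terminal node
    return "grey"          # terminal node


print(predict_color(weight=40, age=5))   # red
print(predict_color(weight=40, age=20))  # grey
print(predict_color(weight=70, age=5))   # grey
```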

11
Q

We can reach 100% accuracy (no errors) on the training data if we keep adding splits, but the issue is

A

The tree uses too many splits and variables, which leads to overfitting.

12
Q

What is the fix for the overfitting issue?

A

Set a lower bound on the number of points in each subset.

13
Q

Each split divides the points into

A

Buckets

14
Q

If we set a minimum bucket size (the lower bound), then we

A

Won't make a split if a resulting bucket would hold fewer points than the minimum bucket size.
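
A minimal sketch of the minimum bucket size rule, assuming a single numeric variable and a split of the form `x < threshold`. The function name `split_allowed` and the sample ages are hypothetical.

```python
def split_allowed(values, threshold, min_bucket):
    """True only if both buckets created by `x < threshold` hold at least `min_bucket` points."""
    left = [v for v in values if v < threshold]
    right = [v for v in values if v >= threshold]
    return len(left) >= min_bucket and len(right) >= min_bucket


ages = [3, 5, 8, 12, 20, 35, 41]
print(split_allowed(ages, threshold=10, min_bucket=3))  # True: buckets of 3 and 4 points
print(split_allowed(ages, threshold=5, min_bucket=3))   # False: left bucket has only 1 point
```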

15
Q

Buckets only tell you

A

The size of the split, not the outcome of the bucket.

16
Q

If the minimum bucket size is too large, then

A

The model is too simple and will have poor accuracy.
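
The trade-off between a small and a large minimum bucket size can be tried out with a toy tree on one numeric variable. This is only a sketch: the data are made up, the names `grow`, `predict`, and `training_accuracy` are hypothetical, and splits are chosen by counting misclassified points, which is a simplification (real tree libraries typically use an impurity measure such as Gini).

```python
from collections import Counter


def majority(labels):
    """Most frequent label in a bucket (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def errors(labels):
    """Number of points a majority-vote prediction gets wrong in this bucket."""
    return len(labels) - Counter(labels).most_common(1)[0][1]


def grow(points, min_bucket):
    """Grow a tree on (x, label) pairs. A tuple is a split; a bare label is a terminal node."""
    labels = [label for _, label in points]
    if len(set(labels)) == 1:
        return labels[0]  # pure bucket: nothing left to split
    points = sorted(points)
    best = None
    # i is the size of the left bucket; both buckets must hold >= min_bucket points.
    for i in range(min_bucket, len(points) - min_bucket + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # equal x values cannot be separated
        cost = errors([l for _, l in points[:i]]) + errors([l for _, l in points[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return majority(labels)  # no split satisfies the minimum bucket size
    i = best[1]
    return (points[i][0], grow(points[:i], min_bucket), grow(points[i:], min_bucket))


def predict(tree, x):
    """Follow the yes-or-no conditions until a terminal node is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


def training_accuracy(data, min_bucket):
    tree = grow(data, min_bucket)
    return sum(predict(tree, x) == label for x, label in data) / len(data)


data = [(1, "red"), (2, "grey"), (3, "red"), (4, "red"), (5, "grey"),
        (6, "grey"), (7, "grey"), (8, "red"), (9, "grey"), (10, "grey")]
for min_bucket in (1, 3, 10):
    print(min_bucket, training_accuracy(data, min_bucket))
```

With `min_bucket=1` the tree keeps splitting until every training point is classified correctly (the overfitting case). With `min_bucket=10` no split is allowed, so the tree predicts the most frequent color for everyone and scores 60% on this data.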

17
Q

A bucket size equal to the number of observations is equivalent to

A

Predicting the most frequent outcome for every observation, i.e. the baseline model.

18
Q

In a classification model, the prediction in each bucket uses

A

Majority vote

19
Q

Classification deals with categorical (discrete) outcomes. What do we do when the outcome is continuous?

A

Take the average of all values in the bucket instead of the most frequent one. Example: how many cars were sold.
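
The two kinds of bucket prediction can be put side by side in a short sketch. The function names and the sample values are hypothetical.

```python
from collections import Counter


def classification_leaf(labels):
    """Majority vote: predict the most frequent class in the bucket."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Average: predict the mean of the outcomes in the bucket."""
    return sum(values) / len(values)


print(classification_leaf(["red", "grey", "red"]))  # red
print(regression_leaf([12, 15, 9]))                 # 12.0 (e.g. cars sold)
```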

20
Q

Regression trees are easy to understand. The issue when using a single tree is

A

We will observe significantly higher errors.

21
Q

Single trees are unstable, meaning

A

A small change in the data gives a different tree and a different interpretation.

22
Q

How do we fix the instability of single trees?

A

Use a random forest.

23
Q

A random forest generates multiple trees instead of one, then

A

Takes the average of their predictions to come up with the predicted value. With 5 trees, it needs 5 different data sets.
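
The averaging step might look like the sketch below. The three "trees" are hand-written stand-ins with made-up thresholds and values, not trees fitted to real data.

```python
def forest_predict(trees, x):
    """Average the predictions of all trees for one observation."""
    predictions = [tree(x) for tree in trees]
    return sum(predictions) / len(predictions)


# Three hypothetical one-split regression trees, each imagined as fitted on a different data set.
trees = [
    lambda x: 10.0 if x < 5 else 20.0,
    lambda x: 12.0 if x < 4 else 19.0,
    lambda x: 11.0 if x < 6 else 21.0,
]

print(forest_predict(trees, 3))  # 11.0, the average of 10, 12 and 11
```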

24
Q

How does a forest get a different data set for each tree?

A

It randomly selects rows with replacement (the same row can be selected more than once).
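
Sampling rows with replacement can be sketched with the standard library. The row names and the helper `bootstrap_sample` are made up for illustration.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random with replacement, so a row can appear more than once."""
    return rng.choices(rows, k=len(rows))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = ["row1", "row2", "row3", "row4", "row5"]
for tree_number in range(1, 4):
    # Each tree gets its own slightly different version of the data.
    print(tree_number, bootstrap_sample(rows, rng))
```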

25
Q

Why do we select rows randomly with replacement?

A

Because trees are highly sensitive to small changes in the data. We change the data slightly for each regression tree, and averaging the trees reduces the variance of the predicted value.
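
A small simulation can illustrate the variance argument. It is only a sketch under stated assumptions: the data are made-up noisy points, each "tree" is a one-split regression tree (a stump), and all function names are hypothetical.

```python
import random


def avg(values):
    return sum(values) / len(values)


def variance(values):
    m = avg(values)
    return avg([(v - m) ** 2 for v in values])


def fit_stump(points):
    """Fit a one-split regression tree on (x, y) pairs; return a function x -> prediction."""
    xs = sorted({x for x, _ in points})
    best = None
    for t in xs[1:]:  # candidate thresholds that leave both buckets non-empty
        left = [y for x, y in points if x < t]
        right = [y for x, y in points if x >= t]
        left_mean, right_mean = avg(left), avg(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:  # every x is identical, so no split is possible
        overall = avg([y for _, y in points])
        return lambda x: overall
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x < t else right_mean


rng = random.Random(1)
# Made-up data: y jumps from about 10 to about 20 at x = 10, plus noise.
data = [(x, (10.0 if x < 10 else 20.0) + rng.gauss(0, 3)) for x in range(20)]


def bootstrap_tree():
    """Fit one stump on rows drawn with replacement."""
    return fit_stump(rng.choices(data, k=len(data)))


x_new = 9
single = [bootstrap_tree()(x_new) for _ in range(100)]
forest = [avg([bootstrap_tree()(x_new) for _ in range(20)]) for _ in range(100)]

print("variance of single-tree predictions:", variance(single))
print("variance of 20-tree averages:       ", variance(forest))
```

The second number should come out clearly smaller than the first: each tree reacts strongly to which rows it happened to receive, while the average of many trees moves much less.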

26
Q

The minimum bucket size in forests is called

A

Node size

27
Q

The other parameter in forests

A

Number of trees

28
Q

Default number of trees in R

A

500

29
Q

More trees are better because

A

The averaged prediction is less biased by any single tree.

30
Q

Why can more trees be an issue?

A

It is computationally expensive.

31
Q

What do we not have to worry about in forests?

A

Overfitting