Trees Flashcards

(31 cards)

1
Q

what is a main issue with logistic regression

A

the coefficients indicate the effect of each variable, not how the decision is made

2
Q

decisions we make in real life are:

A

in sequential order like regression trees

3
Q

main benefit of regression trees

A

they are easy to understand

4
Q

in trees we know the direction in which a variable affects the probability but can't tell

A

the size of its impact on the y variable

5
Q

we split the data using the independent variables (IVs) into

A

yes or no decisions

6
Q

trees don't assume the model is

A

linear

7
Q

adding more splits will

A

increase accuracy on the training data

8
Q

3 splits in the tree

A

3 decision levels if each split follows the previous one (and always 4 terminal nodes)

9
Q

a terminal node is an

A

output, not a condition. ex: it will tell you the color, red or grey
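
A small sketch of how a tree reads as a sequence of yes/no decisions that ends in a terminal node. The variable names (`age`, `income`) and the thresholds are made up for illustration; only the red/grey output comes from the card above.

```python
# Hypothetical two-split tree; variables and thresholds are assumptions.
def predict_color(age, income):
    # Root node: the first yes/no condition at the top of the tree.
    if age < 30:
        # Terminal node: an output, not another condition.
        return "red"
    # Second split, reached only when the first answer was "no".
    if income < 50_000:
        return "grey"
    return "red"
```

Each call follows one path from the root to a terminal node, so the decision can be read off step by step.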

10
Q

the root node is

A

the condition most strongly related to the y variable; the most important one goes at the top

11
Q

we can reach 100% accuracy with no errors on the training data if we keep adding splits, but the issue is

A

too many splits and variables, which leads to overfitting

12
Q

what is the fix for the overfitting issue

A

set a lower bound on the number of points in each subset

13
Q

each split divides points into

A

buckets

14
Q

if we set minimum bucket size = lower bound then we

A

won't split if the number of points in a resulting bucket is less than the minimum bucket size
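
The minimum bucket size rule can be sketched as a simple check before a split is accepted. This is an illustration only: the function name `allowed_split` and the single numeric threshold are assumptions, not how any particular tree library implements it.

```python
def allowed_split(values, threshold, min_bucket):
    """Return True only if both buckets keep at least min_bucket points."""
    left = [v for v in values if v < threshold]
    right = [v for v in values if v >= threshold]
    return len(left) >= min_bucket and len(right) >= min_bucket

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
balanced = allowed_split(values, threshold=6, min_bucket=3)  # buckets of 5 and 5
lopsided = allowed_split(values, threshold=2, min_bucket=3)  # buckets of 1 and 9
```

A larger `min_bucket` rejects more splits, which gives a smaller and simpler tree.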

15
Q

buckets only tell you

A

the size of the split, not the outcome of the bucket

16
Q

if the bucket size is too large then

A

the model is too simple and will have poor accuracy

17
Q

in a logistic regression setting (classification), a bucket size equal to the number of observations is equivalent to

A

the baseline model: with no splits, the tree predicts the most frequent outcome

18
Q

in a classification model, the prediction in each bucket uses

A

majority vote

19
Q

classification uses categorical (discrete) variables; what would the prediction be with a continuous variable

A

the average of all the numbers in the bucket instead of the most frequent value. ex: how many cars sold
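
A minimal sketch of the two ways a bucket turns its points into a prediction: majority vote for a categorical outcome and the average for a continuous one. The function names and sample data are hypothetical.

```python
from collections import Counter
from statistics import mean

def classification_leaf(labels):
    # Majority vote: predict the most frequent class in the bucket.
    return Counter(labels).most_common(1)[0][0]

def regression_leaf(values):
    # Continuous outcome: predict the average of the values in the bucket.
    return mean(values)

color = classification_leaf(["red", "grey", "red"])
cars_sold = regression_leaf([10, 20, 30])
```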

20
Q

regression trees are easy to understand; the issue when using a single tree is

A

we will observe significantly higher errors

21
Q

single trees are unstable, meaning

A

a small change in the data = a different tree / interpretation

22
Q

how to fix the instability issue of single trees

A

build many trees instead of one (a random forest)

23
Q

a random forest generates multiple trees instead of 1, then

A

takes the average of their predictions to come up with the predicted value -> if 5 trees then we need 5 different data sets
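
The averaging step might look like the sketch below. The three "trees" are hypothetical stand-ins written as small functions; in practice each would be fit on its own data set.

```python
from statistics import mean

def forest_predict(trees, x):
    # Each tree gives its own prediction; the forest reports the average.
    return mean(tree(x) for tree in trees)

# Hypothetical trees, each standing in for a tree fit on a different data set.
trees = [
    lambda x: 10.0 if x < 5 else 20.0,
    lambda x: 12.0 if x < 4 else 22.0,
    lambda x: 14.0 if x < 6 else 18.0,
]

low = forest_predict(trees, 3)   # averages 10.0, 12.0 and 14.0
high = forest_predict(trees, 7)  # averages 20.0, 22.0 and 18.0
```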

24
Q

how does the forest get the data for each tree

A

randomly selects rows with replacement (the same row can be selected more than once)

25
why do we select rows randomly with replacement
because trees are highly sensitive to small changes, so we change the data slightly to build each regression tree, and taking the average then reduces the variance in the predicted value
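
Selecting rows with replacement can be sketched with the standard library alone. The helper name `bootstrap_sample`, the toy row list and the seed are assumptions for illustration.

```python
import random

def bootstrap_sample(rows, rng):
    # Draw as many rows as the original data has, with replacement,
    # so the same row can appear more than once and some rows are left out.
    return rng.choices(rows, k=len(rows))

rows = list(range(10))   # stand-in for 10 rows of data
rng = random.Random(0)   # fixed seed so the sketch is repeatable
samples = [bootstrap_sample(rows, rng) for _ in range(5)]  # one data set per tree
```

Each of the 5 samples is a slightly different version of the data, which is what lets the 5 trees differ before their predictions are averaged.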
26
minimum bucket size in forests is called
node size
27
other parameter in forests
number of trees
28
default number of trees in R
500
29
more trees is better because
the averaged prediction is more stable, and adding trees does not add bias
30
why would more trees = issue
it is computationally expensive
31
what don't we have to worry about when adding trees to a forest
overfitting