Trees Flashcards by Ryan Riggs

What is a main issue with logistic regression?

The coefficients indicate the effect of each variable, not how a decision is made.
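
As a reminder of why this is so (the standard textbook form, not taken from these cards): logistic regression combines all variables at once in a single linear score. Each coefficient β_j says how the log-odds of y = 1 change per unit of x_j, with the other variables held fixed. Nothing in the model describes an order in which questions are asked.

```latex
\[
P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}},
\qquad
\log\frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\]
```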

Decisions we make in real life are:

made in sequential order, like regression trees.
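
A minimal Python sketch of a sequential decision, written as nested yes-or-no checks in the same order a tree would ask them. The function name, the variables, and the thresholds are hypothetical and chosen only for illustration.

```python
def approve_loan(income: float, has_default: bool, debt_ratio: float) -> str:
    """Hypothetical loan decision made as a sequence of yes/no questions."""
    if has_default:          # root question: asked first
        return "deny"
    if income < 30_000:      # asked only when the answer above was "no"
        return "deny"
    if debt_ratio > 0.4:     # third question in the sequence
        return "deny"
    return "approve"         # reached only if every check passed


print(approve_loan(50_000, False, 0.2))  # approve
print(approve_loan(50_000, True, 0.2))   # deny
```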

Main benefit of regression trees:

they are easy to understand.

In trees, we know the direction in which a variable affects the probability, but we can't tell

how large its impact on the y variable is.

We split the data using the independent variables (IVs) into

yes-or-no decisions.
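
A minimal sketch of one split, assuming the common rule form "x < threshold goes to the yes bucket" (the direction of the inequality is an assumption here). The data, the variable name age, and the threshold 35 are made up.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def split(rows: List[Row], var: str, threshold: float) -> Tuple[List[Row], List[Row]]:
    """Divide rows into a 'yes' bucket (row[var] < threshold) and a 'no' bucket."""
    yes = [row for row in rows if row[var] < threshold]
    no = [row for row in rows if row[var] >= threshold]
    return yes, no


data = [{"age": 25, "y": 0}, {"age": 40, "y": 1}, {"age": 33, "y": 0}, {"age": 58, "y": 1}]
yes, no = split(data, "age", 35)
print([r["age"] for r in yes], [r["age"] for r in no])  # [25, 33] [40, 58]
```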

Trees don't assume the model is

linear.

Adding more splits will

increase accuracy (on the training data).

3 splits in the tree

3 decision tree levels

A terminal node is an

output, not a condition; e.g., it tells you the color: red or grey.
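
A sketch of the difference between node types, under illustrative assumptions: internal nodes store a condition (variable < threshold), terminal nodes store only an output, and prediction walks from the root down until it reaches a terminal node. The class names Leaf and Node, the variables, and the colors are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node: holds an output, never a condition."""
    prediction: str


@dataclass
class Node:
    """Internal node: holds a condition (var < threshold) and two children."""
    var: str
    threshold: float
    yes: "Tree"
    no: "Tree"


Tree = Union[Leaf, Node]


def predict(tree: Tree, x: Dict[str, float]) -> str:
    # Follow the conditions down from the root until a terminal node is reached.
    while isinstance(tree, Node):
        tree = tree.yes if x[tree.var] < tree.threshold else tree.no
    return tree.prediction


# Hypothetical tree: variables and colors are made up for illustration.
tree = Node("size", 10.0,
            yes=Leaf("red"),
            no=Node("weight", 3.5, yes=Leaf("grey"), no=Leaf("red")))
print(predict(tree, {"size": 12.0, "weight": 2.0}))  # grey
```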

The root node is

the condition most strongly associated with the y variable; the most important split is at the top.
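
The cards do not say how "most strongly associated" is measured. One common choice for classification trees is Gini impurity (the default splitting criterion for classification in R's rpart). The sketch below assumes it and searches every variable and threshold for the single split with the lowest weighted impurity; that split would become the root. The data and the names gini and best_split are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity: 0 means the bucket is pure (all one class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows: List[Dict[str, float]], target: str) -> Tuple[str, float, float]:
    """Return (variable, threshold, weighted Gini) of the best 'x < threshold' split."""
    best = ("", 0.0, float("inf"))
    variables = [k for k in rows[0] if k != target]
    for var in variables:
        for t in sorted({r[var] for r in rows}):
            yes = [r[target] for r in rows if r[var] < t]
            no = [r[target] for r in rows if r[var] >= t]
            if not yes or not no:
                continue  # a split must produce two non-empty buckets
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
            if score < best[2]:
                best = (var, t, score)
    return best


rows = [
    {"x1": 1, "x2": 5, "y": 0}, {"x1": 2, "x2": 1, "y": 0}, {"x1": 3, "x2": 4, "y": 0},
    {"x1": 7, "x2": 2, "y": 1}, {"x1": 8, "x2": 6, "y": 1}, {"x1": 9, "x2": 3, "y": 1},
]
print(best_split(rows, "y"))  # ('x1', 7, 0.0)
```

A full tree would repeat this search inside each bucket.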

We can get 100% accuracy on the training data by adding splits until there are no errors, but the issue is

the tree uses too many variables, which leads to overfitting.
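
A sketch of the 100% accuracy trap, assuming a one-variable tree that keeps splitting until every bucket is pure. Splits are made at the median distinct x value, a simplification of how real trees choose splits. The training data are made up; the labels at x = 3 and x = 7 play the role of noise that the tree memorizes.

```python
from collections import Counter
from typing import List, Tuple, Union

# A tree is either a leaf label (int) or a tuple (threshold, yes_subtree, no_subtree).
Tree = Union[int, Tuple[float, "Tree", "Tree"]]
Point = Tuple[float, int]


def grow(points: List[Point]) -> Tree:
    """Split with no minimum bucket size until every bucket is pure (or cannot be split)."""
    labels = [y for _, y in points]
    xs = sorted({x for x, _ in points})
    if len(set(labels)) == 1 or len(xs) == 1:
        return Counter(labels).most_common(1)[0][0]  # terminal node: majority label
    t = xs[len(xs) // 2]  # simplification: split at the median distinct x value
    yes = [p for p in points if p[0] < t]
    no = [p for p in points if p[0] >= t]
    return (t, grow(yes), grow(no))


def predict(tree: Tree, x: float) -> int:
    while isinstance(tree, tuple):
        t, yes, no = tree
        tree = yes if x < t else no
    return tree


def accuracy(tree: Tree, points: List[Point]) -> float:
    return sum(predict(tree, x) == y for x, y in points) / len(points)


train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 1), (6, 1), (7, 0), (8, 1)]
tree = grow(train)
print(accuracy(tree, train))  # 1.0
```

The perfect training score comes from memorizing the noisy points, which is what overfitting means; accuracy on new data would not be expected to match it.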

What is the fix for the overfitting issue?

Set a lower bound on the number of points in each subset.

Each split divides the points into

buckets

If we set the minimum bucket size = lower bound, then we

won't make a split if the number of points in a resulting bucket would be less than the minimum bucket size.
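
A sketch of the minimum bucket rule for one numeric variable, assuming splits of the form x < t. It lists which thresholds remain allowed: a split is skipped unless both resulting buckets have at least min_bucket points. This follows the same idea as the minbucket setting in R's rpart, but it is not rpart's code; the function name and the data are hypothetical.

```python
from typing import List


def allowed_thresholds(xs: List[float], min_bucket: int) -> List[float]:
    """Thresholds t (rule: x < t goes to 'yes') that leave at least
    min_bucket points in BOTH buckets; any other split is not made."""
    allowed = []
    for t in sorted(set(xs))[1:]:
        n_yes = sum(1 for x in xs if x < t)
        n_no = len(xs) - n_yes
        if n_yes >= min_bucket and n_no >= min_bucket:
            allowed.append(t)
    return allowed


xs = [1, 2, 3, 4, 5, 6, 7, 8]
print(allowed_thresholds(xs, 1))  # [2, 3, 4, 5, 6, 7, 8]
print(allowed_thresholds(xs, 3))  # [4, 5, 6]
print(allowed_thresholds(xs, 5))  # []
```

With min_bucket = 5 no split is allowed at all, so every point stays in one bucket; this is the "bucket size too large" case.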

Buckets only tell you

the size of the split, not the outcome of the bucket.

If the bucket size is too large, then

the model is too simple and will have poor accuracy.

Bucket size equal to the number of

observations: the tree makes no splits and predicts the most frequent outcome, i.e., the baseline model (the same baseline used to evaluate logistic regression).
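
A sketch of the baseline model, using a made-up list of outcomes. With a single bucket holding every observation, the only possible prediction is the most frequent outcome, and its accuracy is that outcome's share of the data. The function name baseline is hypothetical.

```python
from collections import Counter
from collections.abc import Hashable
from typing import List, Tuple


def baseline(labels: List[Hashable]) -> Tuple[Hashable, float]:
    """One bucket holding every observation: predict its most frequent outcome.
    Returns (prediction, training accuracy)."""
    prediction, count = Counter(labels).most_common(1)[0]
    return prediction, count / len(labels)


# Made-up outcomes: five 0s and three 1s.
print(baseline([0, 0, 1, 0, 1, 1, 0, 0]))  # (0, 0.625)
```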

In a classification tree, each bucket predicts by

majority vote

Categorical variables (classification) are discrete; what does a bucket predict for a continuous variable?

The average of all the values instead of the most frequent one; e.g., how many cars are sold.
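
A sketch of what a bucket predicts for each outcome type: the most frequent class for a discrete outcome (majority vote) and the average for a continuous one. The function names and the numbers (read as, say, cars sold) are made up.

```python
from collections import Counter
from collections.abc import Hashable
from statistics import mean
from typing import List


def classification_leaf(labels: List[Hashable]) -> Hashable:
    """Discrete outcome: the bucket predicts its most frequent class (majority vote)."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values: List[float]) -> float:
    """Continuous outcome: the bucket predicts the average of its values."""
    return mean(values)


print(classification_leaf(["red", "grey", "red"]))  # red
print(regression_leaf([12, 15, 9, 21]))             # 14.25
```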

Regression trees are easy to understand; the issue when using a single tree is that

we will observe significantly higher errors.

Single trees are unstable, meaning

a small change in the data gives a different tree and interpretation.

How do we fix the issue with single trees?

Use a (random) forest.

A random forest generates multiple trees instead of one, then

takes the average of their predictions to get the predicted values; e.g., 5 trees need 5 different data sets.
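
A sketch of how a regression forest turns several trees into one prediction by averaging. Each tree is stood in for by a hypothetical one-split function; in a real forest each would be grown on its own resampled data set. For a classification forest, a majority vote across trees plays the same role.

```python
from statistics import mean
from typing import Callable, List

# Each "tree" is represented here as any function from an input x to a prediction.
Tree = Callable[[float], float]


def forest_predict(trees: List[Tree], x: float) -> float:
    """Regression forest: average the predictions of all trees."""
    return mean(tree(x) for tree in trees)


# Three hypothetical one-split trees with slightly different thresholds.
trees = [
    lambda x: 10.0 if x < 4 else 20.0,
    lambda x: 10.0 if x < 5 else 20.0,
    lambda x: 12.0 if x < 3 else 18.0,
]
print(forest_predict(trees, 4.5))  # 16.0
```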

How does a forest select the data for each tree?

It randomly selects rows with replacement (the same row can be selected more than once).
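
A sketch of sampling rows with replacement (a bootstrap sample) using Python's random.Random.choices. The row labels are placeholders, and drawing as many rows as the original data set has is assumed here as the usual convention.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_sample(rows: Sequence[T], rng: random.Random) -> List[T]:
    """Draw len(rows) rows uniformly at random WITH replacement."""
    return rng.choices(rows, k=len(rows))


rows = ["r1", "r2", "r3", "r4", "r5"]
rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(3):
    # Same size as rows; typically some rows repeat and others are left out.
    print(bootstrap_sample(rows, rng))
```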

Why do we select rows randomly with replacement?

Because trees are highly sensitive to small changes, we change the data slightly to build each regression tree; taking the average of the trees then reduces the variance in the predicted value.
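
A small simulation of why averaging helps, under a simplifying assumption: each tree's prediction is modeled as the true value plus independent Gaussian noise, so averaging n predictions divides the variance by roughly n. Real forest trees are correlated because they are grown from the same data, so the actual reduction is smaller. The function simulate and its parameters are hypothetical.

```python
import random
import statistics


def simulate(n_trees: int, n_runs: int = 2000, noise: float = 1.0, seed: int = 0) -> float:
    """Variance of a prediction that averages n_trees independent noisy 'tree' predictions.
    Each tree's prediction = true value + Gaussian noise (a simplifying assumption)."""
    rng = random.Random(seed)
    true_value = 5.0
    predictions = []
    for _ in range(n_runs):
        tree_preds = [true_value + rng.gauss(0.0, noise) for _ in range(n_trees)]
        predictions.append(statistics.mean(tree_preds))
    return statistics.variance(predictions)


print(round(simulate(1), 2))   # close to 1.0
print(round(simulate(25), 2))  # close to 1/25 = 0.04
```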

The minimum bucket in forests is called

node size (the nodesize argument in R's randomForest).

Other parameter in forests:

the number of trees.

Default number of trees in R (randomForest):

500

More trees are better because

unbiased

Why would more trees be an issue?

They are computationally demanding.

What don't we have to worry about in forests?

overfitting

(31 cards)