MODULE 2 S3.1 Flashcards

Decision Tree (46 cards)

1
Q

They are widely used for classification and regression tasks.

A

Decision Trees

2
Q

Decision trees learn a hierarchy of _________ questions, leading to a ____________.

A

if/else
decision
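
As a rough illustration of a hierarchy of if/else questions leading to a decision, the sketch below hard-codes a tiny tree by hand. The function name, features, and animal labels are made up for this example; a learned tree would choose its own questions from data.

```python
def classify_animal(has_feathers: bool, can_fly: bool, has_fins: bool) -> str:
    """Hand-written hierarchy of if/else questions ending in a decision."""
    if has_feathers:            # first question (the root of the tree)
        if can_fly:             # second question, asked only for feathered animals
            return "hawk"
        return "penguin"
    if has_fins:                # second question for animals without feathers
        return "dolphin"
    return "bear"


print(classify_animal(has_feathers=True, can_fly=False, has_fins=False))  # penguin
```

Each return statement plays the role of a leaf: once it is reached, no further questions are asked.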

3
Q

Learning a decision tree means learning the sequence of if/else questions that gets us to the __________ answer most quickly.

A

true

4
Q

In the machine learning setting, questions are called _________

A

tests

5
Q

T/F To build a tree, the algorithm searches over all possible tests and finds the one that is most informative about the target variable

A

True
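
The search over all possible tests can be sketched as a brute-force loop. This is a minimal sketch, not a library implementation: it assumes numeric features, tests of the form "feature <= threshold", and it scores a test by how many training points a majority vote on each side would get wrong (a simplifying stand-in for "most informative"). The names errors and best_test are hypothetical.

```python
from collections import Counter


def errors(labels):
    """Number of points a majority-vote leaf would misclassify."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def best_test(rows, labels):
    """Try every (feature, threshold) test and keep the one with the fewest errors."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a test that sends everything one way is not a split
            score = errors(left) + errors(right)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best  # (errors, feature index, threshold)


rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
labels = ["a", "a", "b", "b"]
print(best_test(rows, labels))  # (0, 0, 2.0)
```

On this toy data the test "feature 0 <= 2.0" separates the two classes with no errors, so it is the one selected.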

6
Q

The top node, which represents the whole dataset

A

root

7
Q

Decision tree classes

A

DecisionTreeRegressor
DecisionTreeClassifier

8
Q

We can visualize the tree using the ____________ function from the tree module.

A

export_graphviz

9
Q

It is a text file format for storing graphs.

A

.dot
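
A minimal sketch of producing the .dot text with scikit-learn, assuming the library is installed. The toy data, feature names, and class names are made up for illustration; passing out_file=None makes export_graphviz return the DOT source as a string instead of writing a file.

```python
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Toy data (hypothetical): the label simply follows the first feature.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# out_file=None returns the DOT text rather than writing it to disk.
dot_text = export_graphviz(
    clf,
    out_file=None,
    feature_names=["x0", "x1"],
    class_names=["no", "yes"],
)
print(dot_text[:40])
```

The returned text can be saved with a .dot extension and rendered with the Graphviz tools.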

10
Q

It is a diagram or chart that people use to determine a course of action or show a statistical probability.

A

Decision tree

11
Q

____________ : feature (attribute)
____________ : decision (rule) or reaction
____________ : outcome

A

node
branch
leaf

12
Q

Types of Decision Tree in Machine Learning

A

Classification trees
Regression trees

13
Q

Decision Variables

Classification : _____________
Regression : ______________

A

categorical
continuous

14
Q

The topmost node of a decision tree that represents the entire message or decision.

A

Root node

15
Q

The process of dividing a node into two or more nodes. It's the point at which the decision branches off into variables.

A

Splitting

16
Q

A node within a decision tree where the prior node branches into two or more variables.

A

Decision (internal) node

17
Q

Also called the external or terminal node, it is the last node in the tree and the furthest from the root node.

A

Leaf (terminal) node

18
Q

Paths that connect the nodes and represent the different possible outcomes of the test.

A

Branch
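
The vocabulary of root, internal node, leaf, and branch can be made concrete with a small data structure. This is a sketch under assumptions: the Node class, the depth helper, and the weather example are hypothetical and not part of any library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str  # the test at an internal node, the outcome at a leaf
    children: dict[str, Node] = field(default_factory=dict)  # branch name -> child node

    def is_leaf(self) -> bool:
        """A leaf (terminal) node has no outgoing branches."""
        return not self.children


def depth(node: Node) -> int:
    """Number of branches on the longest path from this node down to a leaf."""
    if node.is_leaf():
        return 0
    return 1 + max(depth(child) for child in node.children.values())


# The root node holds the first test; its dict keys are the branches.
root = Node("outlook?", {
    "sunny": Node("humidity high?", {"yes": Node("stay in"), "no": Node("play")}),
    "rainy": Node("stay in"),
})

print(depth(root))  # 2
```

Here "outlook?" is the root, "humidity high?" is a decision (internal) node, and "play" and "stay in" are leaves.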

19
Q

Nodes that precede other nodes in the tree hierarchy.

A

Parent nodes

20
Q

Nodes directly connected to the parent node, resulting from the split or decision made at the parent node.

A

Child nodes

21
Q

The opposite of splitting: the process of going through the tree and reducing it to only the most important nodes or outcomes.

A

Pruning

22
Q

Decision Tree Algorithms

A

ID3 (Iterative Dichotomiser 3)
C4.5
CART (Classification and Regression Tree)
CHAID (Chi-square Automatic Interaction Detection)
MARS (Multivariate Adaptive Regression Splines)

23
Q

This algorithm uses the information gain metric to determine the best feature to split on at each node.

A

ID3 (Iterative Dichotomiser 3)
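
The information gain metric can be worked through on a toy example. The sketch below assumes categorical features and uses the usual definition (entropy of the labels minus the weighted entropy after splitting); the function names and the tiny weather-style data are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


windy = ["yes", "yes", "no", "no"]
outlook = ["sun", "rain", "sun", "rain"]
play = ["no", "no", "yes", "yes"]

print(information_gain(windy, play))    # 1.0
print(information_gain(outlook, play))  # 0.0
```

Splitting on windy separates the classes completely (gain of 1 bit), while outlook tells us nothing, so an ID3-style learner would pick windy at this node.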

24
Q

T/F ID3 is prone to underfitting.

A

False (ID3 is prone to overfitting)

25
Q

ID3 is prone to _________ and can create __________.

A

overfitting
huge trees

26
Q

Algorithm that continues splitting until all instances are perfectly classified or no further useful features are available.

A

ID3

27
Q

ID3 was developed by _________________ in ______

A

Ross Quinlan
1986

28
Q

Successor of ID3

A

C4.5

29
Q

Who developed C4.5?

A

Ross Quinlan

30
Q

Algorithm that uses the gain ratio metric instead of information gain to account for the number of branches in a feature.

A

C4.5

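To see how the gain ratio accounts for the number of branches, the sketch below divides information gain by the split information (the entropy of the feature's own values). It assumes categorical features; the function names and the toy data, including the row_id column, are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature_values, labels):
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    return entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())


def gain_ratio(feature_values, labels):
    """Information gain divided by split information (entropy of the feature itself)."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # a feature with a single value cannot split anything
    return information_gain(feature_values, labels) / split_info


play = ["no", "no", "yes", "yes"]
windy = ["yes", "yes", "no", "no"]  # two branches
row_id = [1, 2, 3, 4]               # one branch per row

print(gain_ratio(windy, play))   # 1.0
print(gain_ratio(row_id, play))  # 0.5
```

Both features have the same information gain (1 bit), but the four-branch row_id feature is penalized, which is the behavior the card describes.
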
31
Q

It handles both categorical and continuous data and prunes trees to avoid overfitting, which makes it better at handling noisy data.

A

C4.5

32
Q

Algorithm that can be used for classification and regression problems and uses Gini impurity or mean squared error.

A

CART (Classification and Regression Tree)

33
Q

CART was developed by

A

Leo Breiman
Jerome Friedman
Richard Olshen
Charles Stone

34
Q

CART uses:

Classification : _____________
Regression : ______________

A

Gini impurity
mean squared error

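The two CART criteria can be computed by hand on small lists. This is a minimal sketch using the standard definitions; the function names and the split_impurity helper are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def mse(values):
    """Mean squared error of predicting the mean for every value in the node."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def split_impurity(left, right, impurity):
    """Size-weighted impurity of a binary split (hypothetical helper)."""
    total = len(left) + len(right)
    return len(left) / total * impurity(left) + len(right) / total * impurity(right)


print(gini(["a", "a", "b", "b"]))                   # 0.5
print(split_impurity(["a", "a"], ["b", "b"], gini))  # 0.0
print(mse([1.0, 3.0]))                               # 1.0
```

A split is better the more it lowers the weighted impurity relative to the parent node; here a perfect binary split takes the Gini impurity from 0.5 to 0.
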
35
Q

It provides clear and interpretable models, and trees are pruned to prevent overfitting.

A

CART

36
Q

Algorithm that uses chi-squared tests to find the best split.

A

CHAID (Chi-square Automatic Interaction Detection)

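The chi-squared statistic behind such a test can be computed from a contingency table of feature value against class. The sketch below shows only the Pearson statistic, not the full CHAID procedure (which also involves significance testing and merging of categories); it assumes every row and column total is positive, and the function name and tables are hypothetical.

```python
def chi_square(table):
    """Pearson chi-squared statistic for a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the feature and the class were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: feature categories; columns: class counts.
unrelated = [[10, 10], [10, 10]]
strongly_related = [[20, 0], [0, 20]]

print(chi_square(unrelated))         # 0.0
print(chi_square(strongly_related))  # 40.0
```

A larger statistic means the feature and the target are further from independence, so the feature is a stronger candidate for the split.
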
37
Q

Algorithm that is typically used for categorical variables and can handle multiway splits. It performs multi-level splits when computing classification trees.

A

CHAID

38
Q

CHAID was developed by ____________

A

Gordon Kass

39
Q

Algorithm that is primarily used for regression. It builds models by fitting piecewise linear regressions and combining them into a single model.

A

MARS (Multivariate Adaptive Regression Splines)

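The idea of combining piecewise linear pieces into a single model can be sketched with hinge functions, the basis functions usually associated with MARS. The coefficients and the knot at 3.0 below are invented for illustration; a real MARS fit would select knots and coefficients from data.

```python
def hinge(x, knot, direction=1):
    """Hinge function: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_like_model(x):
    """Sum of an intercept and two hinge terms sharing a (hypothetical) knot at 3.0."""
    return 1.0 + 2.0 * hinge(x, 3.0, +1) - 0.5 * hinge(x, 3.0, -1)


for x in (1.0, 3.0, 5.0):
    print(x, mars_like_model(x))
# 1.0 0.0
# 3.0 1.0
# 5.0 5.0
```

The model is linear on each side of the knot but changes slope at x = 3.0, which is how a sum of simple pieces can follow a nonlinear relationship.
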
40
Q

T/F MARS is capable of modeling complex, nonlinear relationships and interactions between features.

A

True

41
Q

MARS was developed by ______________

A

Jerome Friedman

42
Q

Full form of ID3

A

Iterative Dichotomiser 3

43
Q

Full form of CART

A

Classification and Regression Tree

44
Q

Full form of CHAID

A

Chi-square Automatic Interaction Detection

45
Q

Full form of MARS

A

Multivariate Adaptive Regression Splines

46
Q

T/F CART is an n-ary tree

A

False (CART builds a binary tree)