Decision Trees and Networks Flashcards
(21 cards)
What is a decision tree?
It is a decision support tool that uses a tree-like model of decisions and their possible consequences.
What is the goal of a decision tree?
It aims to predict the value of a target variable based on several input variables.
What are the two types of decision tree?
- Classification Tree
- Regression Tree
How is the impurity of leaves quantified?
Using the Gini impurity
How is the Gini impurity of a leaf calculated?
The Gini impurity of a leaf is 1 minus the sum, over every class i, of the squared probability of class i in that leaf: G = 1 - Σ p_i².
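A minimal Python sketch of the leaf formula, assuming a leaf is represented as a plain list of class labels and that each p_i is the fraction of the leaf's labels equal to class i. The function name `gini_impurity` is hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one leaf: 1 - sum over classes of p_i squared."""
    n = len(labels)
    if n == 0:
        return 0.0  # treat an empty leaf as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# A leaf holding 3 "yes" and 1 "no": 1 - (0.75^2 + 0.25^2)
print(gini_impurity(["yes", "yes", "yes", "no"]))  # 0.375
```

A pure leaf (one class only) scores 0; a leaf split evenly between two classes scores 0.5.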
How is the total Gini impurity calculated?
Total Gini impurity is the weighted average of the Gini impurities of all the leaves, where each leaf is weighted by the fraction of the observations it contains.
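A short Python sketch of the weighted average, assuming each leaf is a list of class labels and the weight of a leaf is its share of all observations. `gini_impurity` and `total_gini` are hypothetical helper names.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum of squared class proportions within one leaf."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def total_gini(leaves):
    """Weighted average of leaf impurities, weighted by leaf size."""
    total = sum(len(leaf) for leaf in leaves)
    return sum(len(leaf) / total * gini_impurity(leaf) for leaf in leaves)


left = ["yes", "yes", "yes", "no"]   # Gini 0.375
right = ["no", "no"]                 # Gini 0.0
# 4/6 * 0.375 + 2/6 * 0.0, approximately 0.25
print(total_gini([left, right]))
```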
How is Gini impurity calculated for numeric data?
First sort the observations by the numeric feature and take the average of each pair of adjacent values. Each average is used as a candidate threshold (values at or below it go one way, values above it the other), and the Gini impurity is calculated for each threshold as if the split were categorical. The threshold with the lowest impurity is kept.
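The threshold search above can be sketched in Python as follows. This assumes one numeric column and a "less than or equal goes left" rule; `best_numeric_threshold` is a hypothetical name, and it returns `None` when every value is identical.

```python
from collections import Counter


def gini_impurity(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_numeric_threshold(values, labels):
    """Try the midpoint of each adjacent pair of sorted values as a
    threshold; return (threshold, total_gini) with the lowest impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue  # identical values give no usable midpoint
        threshold = (v1 + v2) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) / n * gini_impurity(left)
                 + len(right) / n * gini_impurity(right))
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Unsorted input; candidate thresholds are 1.5, 2.5 and 3.5
print(best_numeric_threshold([3, 1, 4, 2], ["b", "a", "b", "a"]))  # (2.5, 0.0)
```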
Which feature is used as the root of the tree?
The feature whose split gives the lowest total Gini impurity.
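A minimal Python sketch of choosing the root, assuming every candidate category (feature) is categorical and each row is a dict mapping feature name to value. The feature names and the data are hypothetical.

```python
from collections import Counter, defaultdict


def gini_impurity(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(rows, labels, feature):
    """Total (weighted) Gini impurity after splitting on one feature."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    n = len(labels)
    return sum(len(g) / n * gini_impurity(g) for g in groups.values())


def choose_root(rows, labels):
    """Return the feature whose split has the lowest total Gini impurity."""
    return min(rows[0].keys(), key=lambda f: split_gini(rows, labels, f))


# Hypothetical data: "has_fur" separates the classes perfectly
rows = [
    {"has_fur": "yes", "is_large": "yes"},
    {"has_fur": "yes", "is_large": "no"},
    {"has_fur": "no", "is_large": "yes"},
    {"has_fur": "no", "is_large": "no"},
]
labels = ["mammal", "mammal", "other", "other"]
print(choose_root(rows, labels))  # has_fur
```

The same comparison is repeated at each child node to pick the next split.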
What are the two types of pruning for decision trees?
- Pre-pruning
- Post-pruning
What is pruning?
Pruning is a data compression technique that reduces the size of a tree by removing sections that are non-critical and redundant for classifying instances.
Why should pruning be used?
It reduces the complexity of the final classifier, which improves prediction accuracy on unseen data by reducing overfitting.
What is the aim of pre-pruning?
To make sure the tree does not contain too many layers.
How is pre-pruning implemented?
Pre-pruning sets a stopping condition, such as the minimum number of samples a node must contain before it can be split. Because of this, pruning takes place while the tree is being built.
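A minimal Python sketch of pre-pruning, assuming a single numeric feature and two hypothetical stopping parameters, `min_samples_split` and `max_depth`. (scikit-learn's `DecisionTreeClassifier` offers controls with these names, but this toy builder is not its implementation.) The checks run before a split is attempted, so a stopped node simply becomes a leaf.

```python
from collections import Counter


def gini_impurity(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(xs, ys, depth=0, max_depth=3, min_samples_split=4):
    """Grow a tree on one numeric feature, stopping early (pre-pruning)."""
    majority = Counter(ys).most_common(1)[0][0]
    # Pre-pruning: refuse to split small, deep, or already-pure nodes.
    if len(ys) < min_samples_split or depth >= max_depth or gini_impurity(ys) == 0.0:
        return {"leaf": majority}
    pairs = sorted(zip(xs, ys))
    best = None
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue
        t = (a + b) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:  # every value identical, so no split is possible
        return {"leaf": majority}
    t = best[1]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build_tree([x for x, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_samples_split),
        "right": build_tree([x for x, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_samples_split),
    }


def depth_of(tree):
    """Number of split layers below this node."""
    if "leaf" in tree:
        return 0
    return 1 + max(depth_of(tree["left"]), depth_of(tree["right"]))


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "b", "a", "b", "a", "b", "a", "b"]
print(depth_of(build_tree(xs, ys, max_depth=1)))  # 1
print(depth_of(build_tree(xs, ys, max_depth=10, min_samples_split=2)))  # deeper
```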
How is Post-pruning implemented?
Starting at the lowest branch, compare the error of the whole tree (e) with the error of the tree with that branch removed (e’); the errors are usually measured on a held-out validation set. If e’ < e, remove the branch. Repeat, working up the tree.
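A minimal Python sketch of this bottom-up procedure, assuming a hypothetical tree format: internal nodes are dicts with `threshold`, `left`, `right` and a `majority` class (the training majority stored when the node was built), and leaves are `{"leaf": label}`. "Removing a branch" is modelled as replacing it with a leaf predicting that majority class, and the source's strict e’ < e rule is kept.

```python
def predict(tree, x):
    """Follow thresholds down to a leaf and return its class."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]


def error(tree, xs, ys):
    """Fraction of (validation) samples the whole tree misclassifies."""
    wrong = sum(predict(tree, x) != y for x, y in zip(xs, ys))
    return wrong / len(ys)


def post_prune(root, node, xs, ys):
    """Visit branches bottom-up; replace a branch with a leaf when that
    lowers the error of the whole tree (e' < e)."""
    if "leaf" in node:
        return
    post_prune(root, node["left"], xs, ys)   # lowest branches first
    post_prune(root, node["right"], xs, ys)
    e = error(root, xs, ys)
    saved = dict(node)
    node.clear()
    node["leaf"] = saved["majority"]         # the tree minus this branch
    e_prime = error(root, xs, ys)
    if not e_prime < e:                      # no improvement: put it back
        node.clear()
        node.update(saved)


# Hypothetical overfitted tree: the split at 7 hurts on validation data
tree = {
    "threshold": 5, "majority": "a",
    "left": {"leaf": "a"},
    "right": {"threshold": 7, "majority": "b",
              "left": {"leaf": "b"}, "right": {"leaf": "a"}},
}
post_prune(tree, tree, [2, 6, 8], ["a", "b", "b"])
print(tree["right"])  # {'leaf': 'b'}
```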
What is centrality?
How important a node is in a network
What is degree?
The number of edges a node has
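A small Python sketch of degree, assuming an undirected graph given as a list of edges without self-loops or repeated edges. The node names are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(dict(degrees(edges)))  # {'A': 3, 'B': 2, 'C': 2, 'D': 1}
```

In a directed graph the same count would be split into in-degree and out-degree.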
What is eigenvector centrality?
A measure in which a vertex's importance depends on the relative importance of its neighbours: each vertex's score is proportional to the sum of its neighbours' scores.
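A minimal power-iteration sketch in Python, assuming an undirected graph given as an adjacency matrix that is connected and not bipartite (plain power iteration can oscillate on bipartite graphs). Rescaling so the largest score is 1 is an arbitrary choice; only the relative scores matter.

```python
def eigenvector_centrality(adj, iterations=100):
    """Repeatedly set each node's score to the sum of its neighbours'
    scores, then rescale so the largest score is 1."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        new = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        top = max(new)
        x = [v / top for v in new]
    return x


# Nodes A, B, C, D: triangle A-B-C plus D attached to A
adj = [
    [0, 1, 1, 1],  # A
    [1, 0, 1, 0],  # B
    [1, 1, 0, 0],  # C
    [1, 0, 0, 0],  # D
]
print(eigenvector_centrality(adj))  # A highest, B and C equal, D lowest
```

D and B both touch A, but B also touches C, so B's score is higher: being connected to important nodes is what counts.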
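What is PageRank?
A centrality measure in which the importance a node derives from a neighbour that links to it is scaled by that neighbour's out-degree: a neighbour with many out-links passes on less importance through each link.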
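A minimal Python sketch of PageRank by power iteration, assuming a directed graph stored as a dict of out-link lists. The damping factor 0.85 and spreading a dangling node's score evenly over all nodes are common conventions, assumed here rather than taken from the card.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Each node splits its score equally among its out-links, so a link
    from a node with out-degree k passes on 1/k of that node's score."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:
                # dangling node: spread its score evenly over all nodes
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # C
```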
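What is a node's closeness?
The mean geodesic (shortest-path) distance from the node to all other vertices. Closeness centrality is often defined as the reciprocal of this, so that more central nodes score higher.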
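A minimal Python sketch, assuming an unweighted, connected, undirected graph stored as an adjacency dict, so breadth-first search gives geodesic distances. The graph is hypothetical.

```python
from collections import deque


def mean_geodesic_distance(graph, source):
    """Mean shortest-path distance from source to every other vertex,
    found with breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others)


# Path graph A - B - C - D
path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(mean_geodesic_distance(path, "A"))  # 2.0
print(mean_geodesic_distance(path, "B"))  # (1 + 1 + 2) / 3
```

B has the smaller mean distance, so it is the more central of the two; taking the reciprocal turns this into a "higher is more central" score.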
What is a node's betweenness?
For every pair of other vertices, the fraction of geodesic (shortest) paths between them that pass through the node, summed over all pairs.
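A brute-force Python sketch, assuming an unweighted, undirected graph stored as an adjacency dict and counting each unordered pair once (some definitions count ordered pairs, which doubles the score). It relies on the fact that the number of shortest s-t paths through v is σ(s,v)·σ(v,t) whenever d(s,v) + d(v,t) = d(s,t).

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(graph, node):
    """Sum over pairs (s, t) of other vertices of the fraction of
    geodesic paths from s to t that pass through node."""
    bfs = {v: bfs_counts(graph, v) for v in graph}
    d_node, s_node = bfs[node]
    total = 0.0
    others = [v for v in graph if v != node]
    for s, t in combinations(others, 2):
        d_s, s_s = bfs[s]
        if t not in d_s or node not in d_s:
            continue  # unreachable pair
        if d_s[node] + d_node[t] == d_s[t]:
            total += s_s[node] * s_node[t] / s_s[t]
    return total


path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(betweenness(path, "B"))  # 2.0: on the A-C and A-D geodesics
```

In the four-node cycle A-B-C-D-A, vertex A lies on one of the two geodesics between B and D, so its betweenness is 0.5.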