Week 7 (Actual) Flashcards

1
Q

What is a Decision Tree?

A

A tree structure which consists of:

Root/internal node: a test on an independent variable (feature)
Leaf: a predicted value of the dependent variable
Branch: a decision, i.e. one outcome of the test

Can be used for classification or regression.

2
Q

How are decision trees constructed?

A

Given a data set, recursively split it using rules on the independent variables, choosing at each step the rule that best groups similar samples together and separates dissimilar ones.
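
As a rough illustration of "looking for the best rule", the sketch below searches one numeric feature for the threshold whose two groups are purest. It assumes weighted Gini impurity as the quality measure and rules of the form `value <= threshold`; the card itself does not fix either choice, and the names `gini` and `best_threshold` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value <= threshold`.

    Assumption: a lower weighted Gini impurity means a better split.
    """
    n = len(values)
    best = (None, float("inf"))
    # Skip the largest value: it would leave the right-hand group empty.
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Illustrative data only: two well-separated groups.
threshold, score = best_threshold([1, 2, 3, 10, 11, 12],
                                  ["a", "a", "a", "b", "b", "b"])
```

A full tree would apply the same search to every feature and then repeat it on each resulting group until the groups are pure or some stopping rule is met.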

3
Q

What is the Gini Index?

A

Given a training dataset of J classes:

I_G(p) = 1 - \sum_{i=1}^{J} p_i^2

where p_i is the fraction of items labelled with class i in the dataset.
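
A minimal sketch of the formula in Python, computing the class fractions from a list of labels. The function name `gini_index` and the sample labels are illustrative assumptions.

```python
from collections import Counter

def gini_index(labels):
    """Gini index: 1 minus the sum of squared class fractions in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

pure = gini_index(["a", "a", "a", "a"])    # one class only
mixed = gini_index(["a", "a", "b", "b"])   # two classes, evenly mixed
```

Here `pure` is 0.0 and `mixed` is 0.5: the index is zero when every item has the same class and grows as the classes become more evenly mixed.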

4
Q

What is Information Gain?

A

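The information we gain about the dependent variable Y after splitting the samples based on an independent variable X.

IG(Y, X) = H(Y) - H(Y | X)

where H(Y) is the entropy of Y and H(Y | X) is the conditional entropy of Y given X.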
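
The formula can be sketched for discrete variables as follows, assuming H is Shannon entropy in bits and that `y` and `x` are paired lists of observations. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) in bits of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(y, x):
    """IG(Y, X) = H(Y) - H(Y | X) for paired lists y (target) and x (feature)."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    # H(Y | X): entropy of each group, weighted by the group's share of samples.
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(y) - h_cond

y = ["yes", "yes", "no", "no"]
informative = information_gain(y, ["sunny", "sunny", "rain", "rain"])
uninformative = information_gain(y, ["a", "b", "a", "b"])
```

In this toy data the first feature determines Y completely, so the gain equals H(Y) = 1 bit; the second feature tells us nothing about Y, so the gain is 0.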

5
Q

What are the drawbacks of Decision Trees?

A

Unstable - A small change in the data can result in a large change in the tree.

Relatively inaccurate - Models such as support vector machines and neural networks often perform better.

6
Q

What are Probabilistic Graphical Models?

A

Graphs in which nodes represent random variables and edges (links/arcs) represent probabilistic dependencies between them; the absence of an edge encodes a conditional independence.

Undirected or directed (Bayesian).

7
Q

What are Bayesian Networks?

A

A kind of probabilistic graphical model that uses the direction of edges to represent cause-effect relationships and Bayes' theorem for probabilistic inference.

A compact representation of a probability distribution in terms of conditional distributions.
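
As a small worked example of inference with Bayes' theorem, consider a hypothetical two-node network H -> E (a hypothesis causing some evidence). The function name and all probabilities below are made-up illustrative values, not taken from the course material.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | E) via Bayes' theorem for a two-node network H -> E."""
    # P(E) by the law of total probability over H and not-H.
    evidence = prior * p_evidence_given_h + (1.0 - prior) * p_evidence_given_not_h
    return prior * p_evidence_given_h / evidence

# Assumed numbers: P(H) = 0.01, P(E | H) = 0.9, P(E | not H) = 0.05.
p = posterior(prior=0.01, p_evidence_given_h=0.9, p_evidence_given_not_h=0.05)
```

With these numbers the posterior is about 0.15: observing the evidence raises the probability of H from 1% to roughly 15%, because the evidence also occurs fairly often when H is false.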

8
Q

What are the advantages of Bayesian Networks?

A

Graphical representation: joint probability distributions of random variables are drawn as a graph, which makes them interpretable.

More powerful: can capture complex relationships.

Combine data and prior knowledge: better approximation.

Generative approach: generate new data similar to existing data.

9
Q

What are the disadvantages of Bayesian Networks?

A

Requires prior knowledge of many probabilities.

Sometimes computationally intractable.

10
Q

What are the main problems faced in Bayesian Networks?

A

Inference.

Training the models.

Determining the structure of the network.

11
Q

How do you represent the joint probability distributions of random variables?

A

A set of nodes: represent random variables.

A set of directed edges: represent “directed dependency”.

A conditional distribution for each node given its parents: P(Xi | Parents(Xi)).
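
Taken together, these three parts define the joint distribution as the product of each node's conditional distribution, P(X1, ..., Xn) = product over i of P(Xi | Parents(Xi)). The sketch below evaluates that product for a hypothetical three-node network of Boolean variables (Rain -> WetGrass <- Sprinkler); the network, the names and every number are assumptions chosen only for illustration.

```python
from itertools import product

# Hypothetical structure: each node maps to a tuple of its parents.
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}

# cpt[node][parent_values] = P(node = True | parents = parent_values).
# Illustrative numbers only.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.8, (False, False): 0.0},
}

def joint(assignment):
    """Joint probability of a full True/False assignment to every node."""
    p = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

example = joint({"Rain": True, "Sprinkler": False, "WetGrass": True})

# Sanity check: the joint sums to 1 over all 2^3 assignments.
total = sum(joint(dict(zip(parents, vals)))
            for vals in product([True, False], repeat=len(parents)))
```

Only 1 + 1 + 4 = 6 numbers are stored here instead of the 7 free parameters of the full joint table over three Boolean variables; the saving grows quickly as networks get larger and sparser.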

12
Q

What groups do random variables (nodes) fall into?

A

Observed: The nodes we have knowledge about.

Unobserved: Nodes we have to infer probability for.

13
Q

What is the Markov condition?

A

Each random variable X is conditionally independent of its non-descendants, given its parents.
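
A numeric check of the condition on a hypothetical chain A -> B -> C, where C has parent B and non-descendant A. The joint is built from the network's own conditional distributions, so the sketch only shows that this factorization implies P(C | A, B) = P(C | B); all names and probabilities are illustrative assumptions.

```python
from itertools import product

# Hypothetical chain A -> B -> C with Boolean variables.
p_a = 0.3                                # P(A = True)
p_b_given_a = {True: 0.9, False: 0.2}    # P(B = True | A)
p_c_given_b = {True: 0.7, False: 0.1}    # P(C = True | B)

def bern(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c):
    """P(A=a, B=b, C=c) as the product of the three conditional distributions."""
    return bern(p_a, a) * bern(p_b_given_a[a], b) * bern(p_c_given_b[b], c)

def p_c_given_ab(a, b):
    """P(C = True | A=a, B=b), computed from the joint distribution."""
    return joint(a, b, True) / (joint(a, b, True) + joint(a, b, False))

# Largest difference between P(C | A, B) and P(C | B) over all values of A, B.
max_gap = max(abs(p_c_given_ab(a, b) - p_c_given_b[b])
              for a, b in product([True, False], repeat=2))
```

`max_gap` is zero up to floating-point rounding: once B is known, the value of A gives no further information about C.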
