models Flashcards

1
Q

Provost’s 9 main model categories

A

(s = supervised, u = unsupervised)

clustering / segmentation (u)
classification (s)
regression (s)
similarity matching (s, u)
co-occurrence grouping (u)
profiling (u)
link prediction (s, u)
data reduction (s, u)
causal modeling

2
Q

linear discriminant

A

a hyperplanar discriminant for a binary target variable splits the attribute phase space into 2 regions

fitting (a sketch follows below):
* we can apply an entropy measure to the two resulting segments, to check for information gain (weighting each side by the number of instances in it)
* we can compare the class means projected onto the hyperplane normal, and seek maximum inter-mean separation
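
A minimal sketch of the entropy check above, assuming NumPy; the function names and the candidate hyperplane (w, b) are illustrative, not from the card:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def hyperplane_information_gain(X, y, w, b):
    """Information gain of splitting on the hyperplane w.x + b = 0,
    weighting each side by the number of instances it contains."""
    side = X @ w + b >= 0
    if side.all() or not side.any():
        return 0.0  # degenerate split: everything landed on one side
    n = len(y)
    weighted = (side.sum() * entropy(y[side]) +
                (~side).sum() * entropy(y[~side])) / n
    return entropy(y) - weighted
```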

3
Q

probability estimation tree

A

a classification tree that may be considered a hybrid between classification and regression models

leaves are annotated with a category value and a probability (typically the class frequencies of the training instances reaching that leaf)
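
For illustration, scikit-learn's classification trees behave this way out of the box (assuming scikit-learn is available): predict_proba reports the per-class frequency estimates stored at each leaf.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

print(tree.predict(X[:2]))        # hard category values
print(tree.predict_proba(X[:2]))  # per-leaf probability estimates
```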

4
Q

decision tree (general)

A

for regression or classification

tunable via

  • minimum leaf size
  • number of terminal leaves allowed
  • number of nodes allowed
  • tree depth
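
One possible mapping of these knobs onto scikit-learn's tree parameters (the parameter names below are scikit-learn's, not the card's; total node count has no direct parameter there and is bounded indirectly by the others):

```python
from sklearn.tree import DecisionTreeRegressor

tree = DecisionTreeRegressor(
    min_samples_leaf=5,   # minimum leaf size
    max_leaf_nodes=32,    # number of terminal leaves allowed
    max_depth=6,          # tree depth
)
```
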
5
Q

support vector machines (linear)

A

simplest case involves a hyperplanar fitting surface, in combination with L2 regularization and (typically) a hinge loss function

via the kernel trick, more sophisticated fitting surfaces can be used

the support vectors are the subset of the training instances that alone determine the fitted boundary
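
A quick sketch with scikit-learn (assumed available); C sets the regularization trade-off, and swapping the kernel invokes the kernel trick:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(len(clf.support_vectors_))  # only this subset of instances defines the boundary
# kernel="rbf" (or "poly") would fit a more sophisticated surface via the kernel trick
```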

6
Q

logistic regression

A

typically used for modeling binary classification probabilities

in simplest form, a simple linear regression model in a sigmoid wrapper: 1/(1 + exp(-M)), where M is the linear regression model (i.e. a linear hyperplane scalar field over the attribute phase space)

for the logistic (log) loss function, the loss surface is convex in the model parameters, so steepest descent reaches the global minimum
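
A minimal steepest-descent sketch of this, assuming NumPy and labels y in {0, 1} (function names are illustrative, not from the card):

```python
import numpy as np

def sigmoid(m):
    return 1.0 / (1.0 + np.exp(-m))

def fit_logistic(X, y, lr=0.1, steps=1000):
    """Steepest descent on the convex log loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)   # predicted probabilities, instance-wise
        grad = p - y             # gradient of the log loss w.r.t. the margin M
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b
```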

7
Q

hierarchical clustering

A

under some cluster-to-cluster metric, find the two closest clusters and merge them; iterate until all instances sit in one cluster (bottom-up, agglomerative)

the cluster metric is called the linkage function
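
An illustration with SciPy (assumed available); the method argument selects the linkage function ("single", "complete", "average", "ward", ...):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.random.rand(20, 2)
Z = linkage(X, method="ward")  # repeatedly merges the two closest clusters
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the merge tree into 3 clusters
```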

8
Q

centroid clustering

A

each cluster is represented by its cluster center, or centroid

k-means method

choose starting centers for k clusters in the predictor phase space, then iterate until assignments stabilize (can be tuned over different k); a sketch follows this list:
* assign each instance to the cluster whose centroid it is closest to
* recalculate the centroid of each of the resulting clusters
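
The sketch referenced above, assuming NumPy (names are illustrative; an empty cluster keeps its previous center):

```python
import numpy as np

def k_means(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # starting centers
    for _ in range(iters):
        # assign each instance to the cluster with the nearest centroid
        labels = np.argmin(np.linalg.norm(X[:, None] - centers, axis=2), axis=1)
        # recalculate the centroid of each resulting cluster
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```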

9
Q

naive Bayes

A

for classification

generative; each feature is treated as giving evidence for or against the target variable's values; each instance gets its own posterior probability distribution over the classes

allows incremental updating as new data arrives (Bayesian property)

relies on the class as the prior, with the instance as the evidence; by Bayes’ rule: p(C=c|E) = p(E|C=c)p(C=c) / p(E)

probability of class C=c, given instance E, where e_i are individual instance-predictor values or ranges:

  • p(C=c|E) = p(e_1|c)…p(e_k|c)p(C=c) / p(E)
  • this assumes conditional independence of the individual predictors, given the class value
  • without the independence assumption, p(E|C=c), and hence p(C=c|E), is very hard to estimate (“sparseness” of individual instances: any particular full instance is rare or unseen in the training data)

p(E)

  • can be difficult to compute accurately, so naive Bayes may leave it out; since p(E) is the same for every class, the ranking over classes is preserved, yielding a ranking classifier
  • however, a full formula does exist, which includes p(E)

further simplified (with p(E) decomposed), to put in terms of predictor lift: p(c|E) = p(e_1|c)…p(e_k|c)p(c) / p(e_1)…p(e_k)
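
A count-based sketch of the formula above for categorical predictors, assuming NumPy (names are illustrative; Laplace smoothing is added to avoid zero counts, and normalizing the per-class scores plays the role of dividing by p(E)):

```python
import numpy as np

def nb_posterior(X, y, instance, alpha=1.0):
    """Posterior over classes for one instance: p(e_1|c)...p(e_k|c) p(c), normalized."""
    classes = np.unique(y)
    scores = []
    for c in classes:
        Xc = X[y == c]
        score = (y == c).mean()  # class prior p(C=c)
        for j, e_j in enumerate(instance):
            n_vals = len(np.unique(X[:, j]))
            # smoothed conditional likelihood p(e_j | c)
            score *= ((Xc[:, j] == e_j).sum() + alpha) / (len(Xc) + alpha * n_vals)
        scores.append(score)
    scores = np.array(scores)
    return dict(zip(classes, scores / scores.sum()))
```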
