models Flashcards
Provost’s 9 main model categories
clustering / segmentation (u)
classification (s)
regression (s)
similarity matching (s, u)
co-occurrence grouping (u)
profiling (u)
link prediction (s, u)
data reduction (s, u)
causal modeling
linear discriminant
a hyperplanar discriminant for a binary target variable will split the attribute phase space into 2 regions
fitting:
* we can apply an entropy measure to the two resulting segments to check for information gain (weighting each segment by the number of instances in it)
* we can check the means of each of the classes along the hyperplane normal, and seek maximum inter-mean separation
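The entropy check in the first bullet can be sketched in plain Python; the function names here are illustrative, not from any particular library:

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(parent, left, right):
    """Entropy reduction from splitting `parent` into two segments,
    each segment weighted by its share of the instances."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# a perfectly separating hyperplane yields maximal gain:
print(information_gain(["a", "a", "b", "b"], ["a", "a"], ["b", "b"]))  # 1.0
```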
probability estimation tree
a classification tree that may be considered a hybrid between classification and regression models
leaves are annotated with a category value, and a probability
decision tree (general)
for regression or classification
tunable via
- minimum leaf size
- number of terminal leaves allowed
- number of nodes allowed
- tree depth
support vector machines (linear)
simplest case involves a hyperplanar fitting surface, in combination with L2 regularization, and possibly a hinge loss function
via the kernel trick, more sophisticated fitting surfaces can be used
support vectors consist of a subset of the training instances used to fit the model
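The pieces named above (hinge loss plus L2 regularization) can be sketched as a single objective to minimize; `svm_objective` and the toy data are illustrative, not a fitting routine:

```python
def hinge_loss(y, m):
    """Hinge loss for label y in {-1, +1} and score m = w.x + b;
    zero when the instance is on the correct side with margin >= 1."""
    return max(0.0, 1.0 - y * m)

def svm_objective(w, b, data, lam):
    """Average hinge loss plus the L2 penalty lam * ||w||^2."""
    total = 0.0
    for x, y in data:
        m = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += hinge_loss(y, m)
    return total / len(data) + lam * sum(wi * wi for wi in w)

data = [((2.0,), +1), ((-2.0,), -1)]
# both margins are satisfied, so only the regularization penalty remains:
print(svm_objective([1.0], 0.0, data, 0.1))  # 0.1
```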
logistic regression
typically used for modeling binary classification probabilities
in simplest form, a linear regression model in a sigmoid wrapper: 1/(1+exp(-M)), where M is the linear regression model output (i.e. a linear hyperplane scalar field over the attribute phase space)
with the logistic (log) loss function, the loss surface is convex, so steepest (gradient) descent finds the global minimum
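A minimal sketch of the sigmoid wrapper around a linear model M = w.x + b (function names are illustrative):

```python
import math

def sigmoid(m):
    """Squashes the linear model output M into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-m))

def logistic_predict(weights, bias, x):
    """Estimated P(class = 1 | x) for the linear model M = w.x + b."""
    m = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(m)

# M = 0 on the decision hyperplane gives probability 0.5:
print(logistic_predict([1.0, -1.0], 0.0, [2.0, 2.0]))  # 0.5
```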
hierarchical clustering
under some (cluster) metric, find the two closest clusters, and merge them; iterate
the cluster metric is called the linkage function
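The merge-and-iterate loop can be sketched for 1-D points with single linkage (distance between the closest members) as the linkage function; names are illustrative:

```python
def single_linkage(c1, c2):
    """Linkage function: distance between the closest members of two clusters."""
    return min(abs(a - b) for a in c1 for b in c2)

def agglomerate(points, k):
    """Repeatedly merge the two closest clusters until only k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # find the pair of clusters with minimal linkage
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)
    return clusters

print(agglomerate([1.0, 1.1, 5.0, 5.2], 2))  # [[1.0, 1.1], [5.0, 5.2]]
```

Stopping at different k slices the resulting dendrogram at different heights.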
centroid clustering
each cluster is represented by its cluster center, or centroid
k-means method
choose starting centers for k clusters in the predictor phase space, then iterate until the assignments stabilize (k itself can be tuned):
* assign each instance to the cluster it’s closest to
* calculate the centroid of each of the resulting clusters
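The two-step iteration above can be sketched for 1-D points (illustrative names; a fixed iteration count stands in for a convergence test):

```python
def kmeans_1d(points, centers, iters=10):
    """The assign/recompute iteration for 1-D points and given starting centers."""
    for _ in range(iters):
        # step 1: assign each instance to the center it's closest to
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # step 2: recompute each centroid as the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

print(kmeans_1d([1.0, 2.0, 9.0, 10.0], [0.0, 5.0]))  # [1.5, 9.5]
```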
naive Bayes
for classification
generative; features are treated as giving evidence for or against target-variable values; each instance gets its own pdf
allows incremental updating with new data (Bayesian property)
relies on the class as the prior, with the instance as the evidence (Bayes' rule): p(C=c|E) = p(E|C=c)p(C=c) / p(E)
probability of class C=c, given instance E, where e_i are individual instance-predictor values or ranges:
- p(C=c|E) = p(e_1|c)…p(e_k|c)p(C=c) / p(E)
- this assumes strong independence of effect of individual predictors on class values
- without the independence assumption, p(C=c|E) is very hard to compute (“sparseness” of individual instances)
p(E)
- can be difficult to compute accurately, so naive Bayes may leave it out, yielding a ranking classifier
- however, a full formula does exist, which includes p(E)
further simplified (with p(E) decomposed), to put in terms of predictor lift: p(c|E) = p(e_1|c)…p(e_k|c)p(c) / p(e_1)…p(e_k)
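The p(e_1|c)…p(e_k|c)p(c) scoring (leaving out p(E), as a ranking classifier would) can be sketched with simple counts; the toy training set and names are illustrative:

```python
from collections import Counter

def nb_score(train, query, c):
    """Unnormalized naive Bayes score p(e_1|c)...p(e_k|c) * p(c),
    leaving out p(E), so scores rank classes rather than sum to 1.
    `train` is a list of (feature_tuple, class_label) pairs."""
    class_counts = Counter(cls for _, cls in train)
    score = class_counts[c] / len(train)  # the class prior p(C=c)
    for i, e in enumerate(query):
        # p(e_i | c) estimated by counting matches within class c
        match = sum(1 for feats, cls in train if cls == c and feats[i] == e)
        score *= match / class_counts[c]
    return score

train = [(("sunny", "hot"), "play"), (("sunny", "mild"), "play"),
         (("rainy", "hot"), "stay"), (("rainy", "mild"), "stay")]
# rank the classes for a new instance:
print(nb_score(train, ("sunny", "hot"), "play"))  # 0.25
print(nb_score(train, ("sunny", "hot"), "stay"))  # 0.0
```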