graphs Flashcards

1
Q

receiver operating characteristic (ROC) graph

A

usually for ranking classifiers (usually binary); for each cutoff n, accept the n most likely instances as “positive” over a test set of size N and create the confusion matrix; plot the false positive rate on the x axis (false positives among the top n, divided by total negatives in the whole set) and the true positive rate on the y axis (true positives among the top n, divided by total positives in the whole set); plot over all acceptable n (a construction sketch follows the details below)

features:

  • ROC graphs remove class priors (eg class proportion imbalances)–they allow looking at the model’s predictive power (“if there are many negative examples, even a moderate false alarm rate can be unmanageable”)
  • do not factor in costs/benefits
  • for ranking classifiers, the area under the ROC curve (AUC) is a significant summary statistic; it is equivalent to the Mann-Whitney-Wilcoxon measure, and to the Gini coefficient “with a minor algebraic transformation” (Gini = 2·AUC − 1, i.e. twice the area between the curve and the main diagonal)

ROC space details:

  • classifiers near the lower-left corner (left side, near the x axis, above the main diagonal) are interpreted as “conservative”–they make positive predictions only on strong evidence, so they make few false positive errors (but sacrifice true positives in the process)
  • classifiers near the upper-right corner (above the main diagonal, on the right-hand side, with y close to 1) are interpreted as “permissive”–they make positive classifications even on weak evidence
  • the diagonal line from (0,0) to (1,1) is the policy of randomly guessing a class (in a Bernoulli sense); e.g. a classifier that guesses positive half the time (coin-flip-wise) converges to (0.5, 0.5); one that guesses positive 90% of the time converges to (0.9, 0.9)
  • any performance in the triangular half below and to the right of the (0,0) to (1,1) diagonal would be “worse than random guessing”
  • a ranking model (usually) starts with everything classified as negative (i.e. we select the top “zero” entries of the test set in ranking order)–this is the lower-left corner of ROC space, (0,0): nothing is classified positive, so both true and false positive rates are 0 (highly conservative)
  • at the other extreme, for high n the ranking model classifies everything as positive, pushing points toward the upper-right corner of ROC space, (1,1) (highly permissive)
  • for optimal ranking classifiers, we would expect the curve to approach the ideal–the upper-left corner of ROC space, (0,1), where all positives in the test set are captured with no false positives
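
A minimal sketch of the construction in Python (numpy only), tracing ROC points from a ranking classifier’s scores; the scores and labels here are made-up toy data:

```python
import numpy as np

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])
labels = np.array([1,   1,   0,   1,   1,    0,   0,   1,   0,   0])  # true classes

order = np.argsort(-scores)          # rank instances from most to least likely positive
ranked = labels[order]
P, N_neg = ranked.sum(), (1 - ranked).sum()

tpr = [0.0]; fpr = [0.0]             # n = 0: nothing accepted as positive -> (0, 0)
for n in range(1, len(ranked) + 1):  # accept the top n as positive
    tp = ranked[:n].sum()
    tpr.append(tp / P)               # true positives among top n / all positives
    fpr.append((n - tp) / N_neg)     # false positives among top n / all negatives

print(list(zip(fpr, tpr)))           # (fpr, tpr) pairs trace the curve (0,0) -> (1,1)
```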
2
Q

profit curve

A

with a ranking classifier, create the confusion matrix for accepting the n instances most likely to be positive; compute the profit/loss from the confusion matrix (weighting each cell by its cost or benefit); plot profit/loss as a function of n
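
A minimal sketch of the computation, assuming a hypothetical two-value payoff scheme (a benefit per true positive, a cost per false positive); all numbers are illustrative:

```python
import numpy as np

scores = np.array([0.95, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])
labels = np.array([1,    1,   0,   1,   0,   0,   1,   0])

B_TP, C_FP = 99.0, -1.0             # assumed payoffs, e.g. profit per responder, cost per contact
ranked = labels[np.argsort(-scores)]

profits = []
for n in range(len(ranked) + 1):    # accept the top n as positive
    tp = ranked[:n].sum()           # responders among the n targeted
    fp = n - tp                     # non-responders targeted
    profits.append(tp * B_TP + fp * C_FP)

best_n = int(np.argmax(profits))    # cutoff with the highest profit
print(profits, best_n)
```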

3
Q

2x2 classification table

A

frequency matrix for binary classification problems

usually,
predictions are on rows: (1) positive, (2) negative
true classes are on columns: (1) positive, (2) negative

rates are column-based:

  • sensitivity aka recall or true positive rate: the proportion of positive outcomes predicted positive
  • specificity aka true negative rate: the proportion of negative outcomes predicted negative (not the same as precision, which is row-based: the proportion of positive predictions that are actually positive); see the computation sketch below
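
A minimal sketch of these rates, assuming a hypothetical 2x2 table laid out as above (predictions on rows, true classes on columns):

```python
import numpy as np

table = np.array([[40, 10],    # predicted positive: 40 TP, 10 FP
                  [ 5, 45]])   # predicted negative:  5 FN, 45 TN

tp, fp = table[0]
fn, tn = table[1]

sensitivity = tp / (tp + fn)   # column-based: recall / true positive rate
specificity = tn / (tn + fp)   # column-based: true negative rate
precision   = tp / (tp + fp)   # row-based, for contrast -- not specificity
print(sensitivity, specificity, precision)
```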
4
Q

confusion matrix

A

a frequency matrix for classification problems; each row is a model (class) prediction and each column the actual class; the more the counts concentrate on the diagonal, the better the model; useful for imbalanced classes, giving more information than accuracy alone
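
A minimal sketch of building a multiclass confusion matrix by hand, on toy labels:

```python
import numpy as np

y_pred = np.array([0, 0, 1, 2, 1, 2, 2, 0, 1, 2])
y_true = np.array([0, 1, 1, 2, 1, 0, 2, 0, 2, 2])
k = 3                                     # number of classes

cm = np.zeros((k, k), dtype=int)
for p, t in zip(y_pred, y_true):
    cm[p, t] += 1                         # row = prediction, column = truth

accuracy = np.trace(cm) / cm.sum()        # diagonal mass = correct predictions
print(cm, accuracy)
```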

5
Q

learning curve

A

for a given model and a fixed holdout set, plot model accuracy as a function of training set size; typically plateaus as the marginal gain from more data goes to 0
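
A minimal sketch using sklearn on synthetic data (the dataset and model choices are illustrative, not prescribed by the card):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=500, random_state=0)

for m in [50, 100, 200, 400, 800, 1500]:      # growing training set sizes
    model = LogisticRegression(max_iter=1000).fit(X_tr[:m], y_tr[:m])
    acc = model.score(X_ho, y_ho)             # accuracy on the fixed holdout set
    print(m, round(acc, 3))                   # typically plateaus as m grows
```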

6
Q

gini coefficient

A

a general measure of dispersion: twice the area between the Lorenz curve and the diagonal line; e.g. plot the cumulative share of wealth held by the population, with the population ordered by increasing wealth–if everyone had the same wealth, the Lorenz curve coincides with the diagonal and the Gini coefficient is 0
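
A minimal sketch computing the coefficient from made-up wealth data via the Lorenz curve:

```python
import numpy as np

wealth = np.sort(np.array([1.0, 2.0, 3.0, 10.0, 30.0]))       # increasing order of wealth
lorenz = np.concatenate([[0.0], np.cumsum(wealth) / wealth.sum()])
x = np.linspace(0.0, 1.0, len(lorenz))                        # cumulative population share

# trapezoid rule for the area under the Lorenz curve
area_under_lorenz = ((lorenz[1:] + lorenz[:-1]) / 2 * np.diff(x)).sum()
gini = 1.0 - 2.0 * area_under_lorenz   # 0 = perfect equality, -> 1 = max inequality
print(round(gini, 3))
```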

7
Q

fitting graph

A

typically the x axis is “model complexity” and the y axis is model accuracy on (a) training data and (b) holdout data; the “sweet spot” is where the two plots are about to diverge from each other–where training accuracy keeps climbing (overfitting) while holdout accuracy starts to fall
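
A minimal sketch of a fitting graph’s raw numbers, using decision-tree depth as an (assumed) complexity axis on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, flip_y=0.2, random_state=0)
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, random_state=0)

for depth in range(1, 15):                    # "model complexity"
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    # training accuracy keeps climbing; holdout accuracy peaks, then falls (overfitting)
    print(depth, round(tree.score(X_tr, y_tr), 3), round(tree.score(X_ho, y_ho), 3))
```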

8
Q

cumulative response curve and lift curve

A

for a ranking classifier at cutoff n on a test set of size N, plots the true positive rate on the y axis (true positives among the top n, divided by the total positives in the test set) against the proportion of the population targeted as positive on the x axis (i.e. n/N)

features:

  • similar to the ROC curve, the greater the “lift” (rise above the main diagonal), the better the performance
  • in a true lift curve, the y value at any x is the ratio of the cumulative response curve’s value to the diagonal’s value at that x (see the sketch below)
  • cumulative response curves are not entirely independent of class priors–class priors determine the potential rate of increase of the curve (unlike with ROC)
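
A minimal sketch of cumulative response and lift values, reusing the top-n construction from the ROC card on toy data:

```python
import numpy as np

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])
labels = np.array([1,   1,   0,   1,   0,   1,   0,   0,   0,   0])

ranked = labels[np.argsort(-scores)]
P, N = ranked.sum(), len(ranked)

for n in range(1, N + 1):
    frac_targeted = n / N                    # x axis: share of population targeted
    tpr = ranked[:n].sum() / P               # y axis: share of positives captured
    lift = tpr / frac_targeted               # ratio of the curve to the diagonal
    print(round(frac_targeted, 2), round(tpr, 2), round(lift, 2))
```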
9
Q

dendrogram

A

a 2-D visualization of progressive (hierarchical) clustering; instances are on the x axis and the degree of clustering (low to high) is on the y axis; instances are ordered so that the earliest-merging clusters are immediate neighbors, recursing on this ordering as clustering increases (i.e. at a given height / level on the dendrogram, the ordering scheme applies to subgroups of instances)
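
A minimal sketch using scipy’s hierarchical clustering on toy 2-D points (the linkage method is an arbitrary choice here):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

points = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [9, 9]])
Z = linkage(points, method="ward")   # merge history: closest clusters join first

dendrogram(Z)                        # x: instances, reordered so merging clusters are neighbors
plt.ylabel("merge distance")         # y: degree of clustering, low to high
plt.show()
```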

10
Q

entropy graph

A

re segmentation and information gain–a visualization of the weighted sum of entropies resulting from a given segmentation scheme: each segment occupies a proportion of the x axis (0 to 1) equal to its share of the instances, and each segment’s height is its classification entropy (a kind of bar plot); low height means low entropy (so a “good” classification for that segment), and the total shaded area is the weighted sum
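
A minimal sketch of the weighted sum of entropies for a hypothetical two-segment split of a binary-class set (all numbers are illustrative):

```python
import numpy as np

def entropy(p_pos):
    """Binary classification entropy of a segment with positive rate p_pos."""
    p = np.array([p_pos, 1.0 - p_pos])
    p = p[p > 0]                         # treat 0 * log(0) as 0
    return -(p * np.log2(p)).sum()

# segments as (proportion of instances, positive rate within the segment)
segments = [(0.6, 0.9), (0.4, 0.2)]      # assumed segmentation, toy numbers

weighted = sum(w * entropy(p) for w, p in segments)
# each bar: width = w (its share of the x axis), height = entropy(p);
# the weighted sum is the total shaded area of the bar plot
print(round(weighted, 3))
```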

11
Q

scree plot

A

used (at least) in the context of PCA, showing the percent of total variance explained as a function of the number of (leading) PCA components retained (the classic form plots each successive component’s eigenvalue instead); it helps decide how many PCA components to retain for modeling purposes
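
A minimal sketch of such a plot with sklearn’s PCA on synthetic correlated data:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # correlated features

pca = PCA().fit(X)
var = pca.explained_variance_ratio_          # per-component share of total variance
plt.plot(range(1, 11), np.cumsum(var), marker="o")
plt.xlabel("number of leading components retained")
plt.ylabel("cumulative share of total variance")
plt.show()                                   # pick the elbow, or a variance threshold
```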
