Exam Flashcards
(162 cards)
Knowledge Discovery
about understanding a domain with interpretable models
Prediction
stream where the methods you use do not matter
Black box setting
you don’t care about how the model works
Classification
- predicts future cases of a binary class.
- models the dependency of the target on other attributes.
- Sometimes a black-box classifier.
- some attributes may not appear because of overshadowing in decision trees.
- Supervised learning
Regression
tries to predict a numeric target variable
Clustering
divides a dataset into groups of similar cases
Frequent Patterns/Association
finds dependencies between variables
Support Vector Machine
a single line (in general, a hyperplane) through the dataset that separates the positive from the negative cases with the widest possible margin
Neural Network
like the Support Vector Machine, but it can separate the classes with a curved (non-linear) boundary instead of a straight line
iid
independent and identically distributed
Nominal
categorical/discrete, can only test for equality
Numeric
can test for inequalities and can use arithmetic or distance measure
Ordinal
can compare inequalities as well, but not use arithmetic or distance measure
Binary
Nominal variable with only two values
Entropy
measure of the amount of information/chaos in a variable; highest when the probability is distributed equally over the m values (each value has probability 1/m); unsupervised
Entropy formula
binary case: –p lg(p) – (1–p) lg(1–p), where p is the probability of one of the two values; general case: –Σ p_i lg(p_i), where p_i is the probability of value i and lg is the base-2 logarithm
Max Entropy formula
–m · (1/m) · lg(1/m) = –lg(1/m) = lg m, for m equally likely values
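A minimal sketch of the entropy cards in Python, assuming lg means the base-2 logarithm so entropy is measured in bits. The function names `entropy` and `max_entropy` are made up for this illustration.

```python
import math

def entropy(probs):
    """Entropy in bits: -sum(p_i * lg(p_i)); values with p_i = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def max_entropy(m):
    """Entropy of m equally likely values: lg(m)."""
    return math.log2(m)

fair_coin = entropy([0.5, 0.5])      # binary case with p = 0.5
skewed_coin = entropy([0.9, 0.1])    # more predictable, so lower entropy
uniform_four = entropy([0.25] * 4)   # m = 4 equally likely values

print(fair_coin, skewed_coin, uniform_four, max_entropy(4))
```

The uniform case reaches the maximum lg m; any skew away from 1/m per value lowers the entropy.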
Cumulative distribution function(CDF)
gives, for each value x, the probability that a value is less than or equal to x (in a dataset: the fraction of data points ≤ x); it rises from 0 to 1
Probability density function(PDF)
the derivative of the CDF; gives the relative density of points around each value. A density is not a probability: it can exceed 1, and probabilities come from the area under the curve. The peak is where most values lie, so the density is highest there
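A small sketch of how the CDF and PDF cards relate, assuming a made-up sample and hypothetical helper names (`ecdf`, `density_estimate`). The density is approximated as the slope of the empirical CDF over a small window of half-width `h`.

```python
def ecdf(sample, x):
    """Empirical CDF: fraction of sample values less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def density_estimate(sample, x, h=0.5):
    """Rough density near x: slope of the empirical CDF between x - h and x + h."""
    return (ecdf(sample, x + h) - ecdf(sample, x - h)) / (2 * h)

# Illustrative data only: most values cluster around 3, one outlier at 9.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

print(ecdf(sample, 3))                      # fraction of values <= 3
print(density_estimate(sample, 3))          # high density near the peak
print(density_estimate(sample, 7))          # no points near 7
print(density_estimate(sample, 3, h=0.1))   # a density may exceed 1
```

With a narrow window the estimate at 3 goes above 1, which shows that a density value is not itself a probability.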
Histograms
estimate density in a discrete way by defining cut points and counting the occurrences in each resulting bin. Unsupervised method
Histograms Equal width
the bins are cut into intervals of equal size
Histograms Equal height
the bins are cut so that every bin contains about the same number of data points
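The two binning strategies can be sketched side by side. This is an illustration on made-up data, with hypothetical function names; the equal-height version simply takes cut points from the sorted data, so heavily tied values can still give uneven bins.

```python
def equal_width_edges(values, k):
    """k bins whose intervals all have the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]

def equal_height_edges(values, k):
    """k bins whose cut points are chosen so each holds about len(values)/k points."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[(i * n) // k] for i in range(k)]
    edges.append(ordered[-1])
    return edges

def bin_counts(values, edges):
    """Count values per bin; bins are [a, b), except the last bin, which is [a, b]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 10, 20]

width_counts = bin_counts(data, equal_width_edges(data, 4))
height_counts = bin_counts(data, equal_height_edges(data, 4))
print(width_counts)    # skewed data piles up in the first bin
print(height_counts)   # roughly the same count in every bin
```

On skewed data, equal width leaves some bins nearly empty, while equal height spreads the points evenly at the cost of bins with very different widths.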
Kernel(Gaussian) Density Estimation
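estimates the density of the population from a sample by placing a kernel (for example a Gaussian bump) on every data point and summing them into one smooth curve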
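A minimal sketch of Gaussian kernel density estimation in plain Python. The sample and the `bandwidth` parameter (the width of each Gaussian bump) are assumptions made for this illustration; the cards do not specify them.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the bump placed on each data point."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(sample, x, bandwidth):
    """Density estimate at x: average of the Gaussian bumps centred on the sample points."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample) / (n * bandwidth)

# Illustrative sample: a cluster around 2 and a single point at 6.
sample = [1.0, 1.5, 2.0, 2.2, 2.5, 6.0]

dense = kde(sample, 2.0, bandwidth=0.5)    # inside the cluster
sparse = kde(sample, 4.5, bandwidth=0.5)   # in the gap between cluster and outlier
print(dense, sparse)
```

Unlike a histogram, the result is a smooth curve with no cut points; a larger bandwidth smooths more, a smaller one follows the individual points more closely.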
Downside Entropy
entropy as defined here applies to nominal data only; it does not carry over well to numeric data