Model Selection: Summarizing and Visualizing Training Samples Flashcards

(19 cards)

1
Q

Why do we need to characterize the training samples?

A
  • for feature selection and construction
  • for choosing a proper model class and its complexity
  • for preprocessing the samples, e.g. standardization (each component has zero mean and unit variance) or whitening (the covariance structure is removed); see the sketch below
  • for detecting redundancy in the samples
  • to indicate how many prototypes or typical cases are present
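A minimal R sketch of both preprocessing steps, on a made-up data matrix X (ZCA is one of several possible whitening transforms):

    # made-up data: 100 observations, 2 correlated features
    X <- matrix(rnorm(200), ncol = 2)
    X[, 2] <- X[, 2] + X[, 1]

    # standardization: each column gets zero mean and unit variance
    Xs <- scale(X)

    # whitening (ZCA): remove the covariance structure entirely
    E  <- eigen(cov(Xs))
    Xw <- Xs %*% E$vectors %*% diag(1 / sqrt(E$values)) %*% t(E$vectors)
    round(cov(Xw), 10)   # approximately the identity matrix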
2
Q

What is univariate data?

A

Univariate data is simply a set of numbers, that is, a set of scalars. Each number corresponds to one observation.

3
Q

What are measures of center when it comes to univariate data?

A

Measures of the center (all four are computed in the sketch below):
mean: the arithmetic average; sum all values and divide by how many there are.
median: the middle value in the sorted list of numbers.
mode: the value that occurs most often.
mid-range: the midpoint between the minimum and the maximum, (max + min) / 2.

Mathematical characteristics:
the mean minimizes the average squared deviation (L2 norm),
the median minimizes the average absolute deviation (L1 norm),
the mid-range minimizes the maximum absolute deviation (L∞ norm).
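A quick base-R illustration on a made-up sample (R has no built-in mode function, so a table lookup stands in for it):

    x <- c(2, 3, 3, 5, 9, 14)
    mean(x)                                   # arithmetic average: 6
    median(x)                                 # middle of the sorted values: 4
    as.numeric(names(which.max(table(x))))    # mode: 3, the most frequent value
    (min(x) + max(x)) / 2                     # mid-range: 8, midpoint of min and max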

4
Q

What are the three types of distribution?

A

symmetric distributions: mean = median.
Gaussian distribution: this common value is best estimated by the empirical mean.
Laplace distribution: this common value is best estimated by the empirical median; see the simulation below.
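A small simulation (made-up sample sizes) that illustrates this: the estimator with the smaller variance over many repetitions is the better one.

    set.seed(1)
    est <- replicate(2000, {
      g <- rnorm(100)                                         # Gaussian sample
      l <- rexp(100) * sample(c(-1, 1), 100, replace = TRUE)  # Laplace sample
      c(g_mean = mean(g), g_median = median(g),
        l_mean = mean(l), l_median = median(l))
    })
    apply(est, 1, var)   # mean wins for Gaussian, median wins for Laplace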

5
Q

How can we measure variability?

A

range: largest sample minus smallest sample.
variance: the average squared deviation from the sample mean; its square root is the standard deviation.
quantiles: the value below (or above) which a given percentage of the samples falls. Quartiles use 25%, 50% (the median), and 75%; see the sketch below.
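All three measures in base R, on a made-up sample:

    x <- rnorm(200)
    diff(range(x))                   # range: largest minus smallest
    var(x)                           # variance (with the usual n - 1 denominator)
    sd(x)                            # standard deviation: square root of the variance
    quantile(x, c(0.25, 0.5, 0.75))  # the three quartiles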

6
Q

The default statistics shown for a boxplot are:

A
  • the median as a horizontal bar
  • the box ranging from the lower to the upper quartile
  • whiskers ranging to the most extreme observations that are not outliers
  • outliers as points; outliers are observations that deviate from the upper or lower quartile by more than fact times the interquartile range. In R the default is fact = 1.5 (see the sketch below).
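A minimal sketch in R; the boxplot argument 'range' plays the role of fact:

    x <- c(rnorm(100), 6)     # made-up sample plus an artificial outlier
    boxplot(x, range = 1.5)   # 1.5 is R's default, as stated above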
7
Q

Explain Histograms

A

Histogram: a graphical representation of the data distribution that shows tabulated frequencies as adjacent rectangles erected over discrete intervals (bins).
* area of a rectangle: equal to the frequency of the observations in its interval
* equidistant bins: heights of the rectangles are proportional to the frequency of the observations

Histograms help to assess:
* spread or variation
* general shape
* peaks
* low-density regions
* outliers

An informative overview of the observations is obtained with the R command hist(), as sketched below.
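A minimal hist() sketch on made-up data:

    x <- rnorm(500)
    hist(x, breaks = 20)   # 20 equidistant bins, heights show frequencies
    hist(x, freq = FALSE)  # density scale: total rectangle area is 1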

8
Q

What happens when smoothing the surface of a histogram?

A

Smoothing the surface of a histogram leads to a probability density function, in the limit where the number of samples goes to infinity and the bin widths go to zero. In general, probability density functions are obtained by kernel density estimation (KDE), a non-parametric method (except for the bandwidth) also called the Parzen-Rosenblatt window method; a sketch follows.
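A minimal sketch of this transition in R, overlaying a KDE (R's density()) on a histogram of made-up data:

    x <- rnorm(500)
    hist(x, freq = FALSE)        # histogram on the density scale
    lines(density(x), lwd = 2)   # KDE smooths it into a density curve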

9
Q

What is the trickiest part of KDE?

A

The trickiest part of KDE is bandwidth selection:
* too small: many peaks and a wiggly estimate (overfitting)
* too large: peaks vanish and details are lost (underfitting)

For Gaussian kernels there is a rule of thumb for the bandwidth (Silverman's rule). The closer the true density is to a Gaussian, the better the estimate; see the sketch below.
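A minimal sketch of the bandwidth effect on made-up bimodal data (R's default bandwidth bw.nrd0 implements Silverman's rule of thumb):

    x <- c(rnorm(100), rnorm(100, mean = 4))    # made-up bimodal sample
    plot(density(x))                            # default: rule-of-thumb bandwidth
    lines(density(x, bw = 0.05), col = "red")   # too small: wiggly (overfitting)
    lines(density(x, bw = 2), col = "blue")     # too large: peaks merge (underfitting)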

10
Q

What is a violin plot?

A

A violin plot combines a boxplot with a density estimate: a rotated kernel density is drawn on each side of the boxplot (see the sketch below).
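A minimal sketch, assuming the vioplot package is installed:

    # install.packages("vioplot") if needed
    library(vioplot)
    x <- c(rnorm(100), rnorm(50, mean = 3))   # made-up sample
    vioplot(x)   # boxplot with a mirrored kernel density on each side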

11
Q

What is bivariate data?

A

bivariate data: two scalar variables, i.e. pairs of data points
y: response, dependent variable, target, output, label
x: explanatory variable, independent variable, regressor, feature

response is caused by the explanatory variable -> causality;
however, statistical or machine learning methods cannot determine causality

12
Q

How does a scatter plot work?

A

A scatter plot shows each observation as a point, where the x-coordinate is the value of the first variable and the y-coordinate is the value of the second variable.
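A minimal sketch on made-up data:

    x <- rnorm(100)           # explanatory variable
    y <- 2 * x + rnorm(100)   # response
    plot(x, y)                # one point per observation (x_i, y_i)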

13
Q

How does linear dependence work in scatter plots?

A
  • two variables perfectly linearly dependent: the points lie on a line
  • two variables linearly dependent to some degree: the points lie near a line
  • the closer the points are to a line, the higher the linear dependence
14
Q

For bivariate data, what is the measure of the linear correlation between two variables?

A

For the bivariate data (x1,y1), (x2,y2), …, (xN,yN), Pearson's sample correlation coefficient r is a measure of the linear correlation (dependence) between the two variables x and y:

r = Σi (xi − mx)(yi − my) / √( Σi (xi − mx)² · Σi (yi − my)² ), where mx and my are the sample means.

For a perfect linear dependency the correlation coefficient is r = 1 or r = −1.
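A minimal sketch on made-up data, computing r both with cor() and directly from the definition:

    x <- rnorm(100)
    y <- 2 * x + rnorm(100)
    cor(x, y)   # Pearson's sample correlation coefficient r
    # the same value from the definition:
    sum((x - mean(x)) * (y - mean(y))) /
      sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))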

15
Q

For bivariate data, what is the test of independence between two variables?

A

A test for a correlation coefficient of ρ = 0: a t-test with the null hypothesis H0 that ρ = 0. The test is only valid if both variables are drawn from a normal distribution.
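In R this test is available as cor.test(); a minimal sketch on made-up data:

    x <- rnorm(100)
    y <- 2 * x + rnorm(100)
    cor.test(x, y)   # t-test of H0: rho = 0, with t, df, p-value, and CI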

16
Q

What does linear regression do?

A

Linear regression: fit a line to bivariate data.
It extracts information about the relation of the two variables y and x via the functional relationship y = a + b·x (see the sketch below).
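A minimal sketch with R's lm(), on made-up data with true intercept 1 and slope 2:

    x <- rnorm(100)
    y <- 1 + 2 * x + rnorm(100)
    fit <- lm(y ~ x)          # least-squares fit of y = a + b*x
    coef(fit)                 # estimated intercept a and slope b
    plot(x, y); abline(fit)   # data with the fitted line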

17
Q

What are some unsupervised methods to summarize multivariate samples?

A
  • principal component analysis
  • independent component analysis
  • factor analysis
  • projection pursuit
  • k-means clustering (see the sketch after this list)
  • hierarchical clustering
  • mixture models: Gaussian mixtures
  • self-organizing maps
  • kernel density estimation
  • hidden Markov models
  • Markov networks (Markov random fields)
  • restricted Boltzmann machines
  • neural networks: auto-associators (autoencoders), unsupervised deep nets
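One method from this list, k-means, in a minimal sketch on made-up two-cluster data:

    X <- rbind(matrix(rnorm(100), ncol = 2),
               matrix(rnorm(100, mean = 3), ncol = 2))
    km <- kmeans(X, centers = 2)           # cluster into k = 2 prototypes
    plot(X, col = km$cluster)              # points colored by cluster
    points(km$centers, pch = 8, cex = 2)   # cluster centers (prototypes)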
18
Q

What are some descriptive methods to summarize multivariate data?

A

Projection methods:
* new representation of the objects
* down-projection into a lower-dimensional space that keeps the neighborhoods
* finding structure in the data

Examples: principal component analysis (sketched below), multidimensional scaling

descriptive:
* map to a lower-dimensional space
* compact and non-redundant data storage or transmission
* data visualization
* feature selection
* preprocessing for subsequent data analysis
* descriptive model with a unique inverse -> generative framework which selects the inverse model, e.g. density estimation
* descriptive model without an inverse -> principal curves, multidimensional scaling
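A minimal PCA down-projection with R's prcomp(), on made-up 3-D data:

    X <- matrix(rnorm(300), ncol = 3)   # made-up 3-D data
    pca <- prcomp(X, scale. = TRUE)     # principal component analysis
    Z <- pca$x[, 1:2]                   # down-projection to 2-D
    plot(Z)                             # visualization in the projected space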

19
Q

What are some generative methods to summarize multivariate data?

A

Generative models:
* build a model of the observed data
* match the observed data density

Examples: density estimation, factor analysis, independent component analysis, generative topographic mapping

generative:
* model or simulate the real world
* the model samples from the same distribution as the real-world observations
* describe the data generation process

Advantages of generative models (a minimal sketch follows this list):
* determining model parameters, e.g. calcium concentration, reaction rate, distribution of channels
* generating new simulated observations
* simulating in unknown regimes, e.g. new parameters
* assessing the noise and the signal in the data
* supplying distributions and error bars for latent variables
* detecting outliers as very unlikely observations
* detecting and correcting noise in the observations
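A minimal generative sketch, assuming a single Gaussian as the model (all numbers are made up):

    x <- rnorm(200, mean = 5, sd = 2)   # observed data
    mu <- mean(x); s <- sd(x)           # determine the model parameters
    xnew <- rnorm(50, mu, s)            # generate new simulated observations
    lik <- dnorm(x, mu, s)              # likelihood of each observation
    x[lik < quantile(lik, 0.01)]        # flag very unlikely points as outliers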