Lecture 8 Flashcards

1
Q

What are Randomization tests?

A

Also called permutation tests or exact tests; a test that enumerates ALL of the possible outcomes that could occur in some reference set, in addition to the outcome that was actually observed.
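
A minimal sketch of such an exact test in Python (NumPy assumed; the toy numbers are made up for illustration): every possible reassignment of the observations to the two groups is enumerated and compared with the observed difference in means.

from itertools import combinations
import numpy as np

group_a = np.array([12.1, 14.3, 11.8])                  # hypothetical observations
group_b = np.array([15.2, 16.0, 14.9, 15.5])
pooled = np.concatenate([group_a, group_b])
observed = group_a.mean() - group_b.mean()

n_a = len(group_a)
splits = list(combinations(range(len(pooled)), n_a))    # ALL possible relabellings
count = 0
for idx in splits:
    a = pooled[list(idx)]
    b = np.delete(pooled, list(idx))
    if abs(a.mean() - b.mean()) >= abs(observed):
        count += 1

print(count / len(splits))                              # exact two-sided p-value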

2
Q

What are 2 characteristics of a Randomization test?

A
  1. Small data IS NOT poor data.
  2. If data sets are too large for exact tests, then typically Monte Carlo (MC) methods or a classical parametric test will be used.
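
A hedged sketch of the Monte Carlo alternative for data too large to enumerate (toy data assumed): a large number of random relabellings is sampled instead of all of them.

import numpy as np

rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, size=200)            # hypothetical "large" groups
group_b = rng.normal(loc=10.5, size=250)
pooled = np.concatenate([group_a, group_b])
observed = group_a.mean() - group_b.mean()
n_a = len(group_a)

n_mc = 10_000
hits = 0
for _ in range(n_mc):
    perm = rng.permutation(pooled)                  # one random relabelling
    diff = perm[:n_a].mean() - perm[n_a:].mean()
    if abs(diff) >= abs(observed):
        hits += 1
print(hits / n_mc)                                  # Monte Carlo estimate of the p-value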

3
Q

Name 2 characteristics of Fisher’s exact test.

A
  1. It has a one-sided alternative hypothesis.
  2. It does not require large samples.
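
A hedged sketch using SciPy's fisher_exact on a hypothetical 2x2 table (the counts are made up):

from scipy.stats import fisher_exact

table = [[8, 2],    # e.g. treated:  success / failure (made-up counts)
         [1, 5]]    #      control:  success / failure
odds_ratio, p_value = fisher_exact(table, alternative="greater")   # one-sided alternative
print(odds_ratio, p_value)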

4
Q

What is the Confusion Matrix?

A

A 2x2 table in which one dichotomous variable represents reality/truth and the other dichotomous variable represents a measurement/test/claim.
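
A small sketch with scikit-learn's confusion_matrix (labels and data are hypothetical); rows correspond to reality/truth and columns to the test's claim:

from sklearn.metrics import confusion_matrix

truth = [1, 1, 0, 1, 0, 0, 1, 0]    # reality (1 = condition present)
claim = [1, 0, 0, 1, 0, 1, 1, 0]    # what the measurement/test claimed
print(confusion_matrix(truth, claim))
# With labels ordered 0 then 1, the layout is:
# [[TN  FP]
#  [FN  TP]]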

5
Q

Under which 3 conditions will a Hypergeometric distribution exist?

A
  1. Total number of items (population) is fixed.
  2. Sample size (number of trials) is a portion of the population, drawn without replacement.
  3. Probability of success changes after each trial.
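
A hedged sketch with SciPy's hypergeom (numbers are made up): drawing n = 5 items without replacement from a population of M = 20 that contains K = 7 successes.

from scipy.stats import hypergeom

M, K, n = 20, 7, 5                      # population size, successes in it, number of draws
print(hypergeom.pmf(2, M, K, n))        # P(exactly 2 successes in the sample)
print(hypergeom.sf(1, M, K, n))         # P(2 or more successes), an upper-tail probability
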
6
Q

What is overfitting?

A

Overfitting occurs when we fit our model “too perfectly” to the data at hand (zero bias); the model will then perform poorly and unpredictably on “new” data, across different samples.

7
Q

What does overfitting lead to?

A

To large variance of the parameter estimates when the model is applied to new data/samples: there is zero/low bias and high variance.

8
Q

What can we introduce when the model fits the data “too well”?

A

A penalty term, which introduces bias into the traditional estimation.

9
Q

Which 2 approaches can help avoid overfitting?

A
  1. Implementing the Bias-Variance Trade-off.
  2. Implementing train-test paradigm principles.

10
Q

What is the Bias-Variance Trade-off?

A

To avoid overfitting, we introduce a little bias into the model built on the data at hand (no perfect fit anymore); the model will then perform better on “new” data, with less variance.

11
Q

Which 2 regularisation techniques are based on the Bias Variance Trade-off principle? What do they do?

A
  1. Ridge regression.
  2. Lasso regression.
    These techniques slightly shrink the slope of the regression line in order to introduce bias (see the sketch below).
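
A hedged scikit-learn sketch on toy data (the alpha values are illustrative, not recommendations); the penalty strength alpha is what introduces the bias:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=50)

ridge = Ridge(alpha=1.0).fit(X, y)    # L2 penalty: shrinks all slopes a little
lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty: can shrink some slopes exactly to 0
print(ridge.coef_)
print(lasso.coef_)
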
12
Q

What is the Train-Test paradigm?

A

We should use training data to build the model and test data to assess the model. Both training and testing data are created from the dataset/sample that we started with, by sophisticated partitioning and reuse of the data.
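
A minimal scikit-learn sketch (toy data assumed): the model is built on the training part only and assessed on the held-out test part.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
model = LinearRegression().fit(X_train, y_train)   # build on training data
print(model.score(X_test, y_test))                 # assess on test data (R^2)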

13
Q

What is Cross-Validation?

A

It is the most famous technique for implementing the train-test ideas, in particular through sophisticated partitioning of the data.
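
A hedged scikit-learn sketch of 5-fold cross-validation (toy data assumed): the data are repeatedly partitioned so that every fold serves once as the test set.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=100)

scores = cross_val_score(LinearRegression(), X, y, cv=5)   # one score per fold
print(scores.mean())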
