Categorical data analysis Flashcards

1
Q

When is categorical data analysis used?

A

When your outcome variable is on a nominal scale

The predictor variables can be anything; however, these lectures will use only a single predictor variable, and that predictor will be on a nominal scale

2
Q

What is the chi-square “goodness of fit” test?

A

A test to determine how well the observed data match the frequencies expected under a theory (the null hypothesis)

3
Q

What test statistic does a chi-square ‘goodness of fit’ test use?

A

The X² (chi-square) statistic

4
Q

When should you use a chi-square ‘goodness of fit’ test?

A

The chi-square goodness of fit test is used for categorical data when you want to compare observed frequencies against a hypothesis about the true probabilities
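
As a rough sketch of what the test computes, the snippet below works out X² by hand for a hypothetical set of observed counts (200 responses spread over four made-up categories) against a null hypothesis of equal probabilities, then checks the result against base R's `chisq.test()`. The counts and category names are illustrative assumptions, not data from the lectures.

```r
# Hypothetical observed frequencies for one nominal variable with k = 4 categories
observed <- c(clubs = 35, diamonds = 51, hearts = 64, spades = 50)

# Null hypothesis: every category is equally likely
p_null <- rep(1 / 4, 4)

# Expected frequencies under H0: sample size N times each null probability
expected <- sum(observed) * p_null

# Goodness of fit statistic: sum over categories of (O - E)^2 / E
x2 <- sum((observed - expected)^2 / expected)
print(x2)  # 8.44

# Base R's chisq.test() computes the same statistic for a vector plus p
res <- chisq.test(observed, p = p_null)
print(unname(res$statistic))
```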

5
Q

Explain the four components of a statistical test, as applied to the chi-square ‘goodness of fit’ test.

A

  • A diagnostic test statistic, T
  • The sampling distribution of T if the null hypothesis is true
  • The observed value of T in your data
  • A rule that maps every value of T onto a decision (reject or retain H0)
6
Q

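How do you get the chi-square distribution (χ²)?

A

The chi-square distribution is what you get when you take independent, standard normally distributed variables, square them and add them up. Summing the squares of k such variables gives a chi-square distribution with k degrees of freedom.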
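
A small simulation can make the "square and add" idea concrete. This sketch draws k = 3 standard normal values per row with base R's `rnorm()`, squares and sums them, and compares the simulated 95th percentile with the theoretical chi-square quantile from `qchisq()`. The choice of k, the seed and the number of simulations are arbitrary assumptions.

```r
set.seed(1)  # make the simulation reproducible

k <- 3            # number of squared normals summed = degrees of freedom
n_sims <- 100000  # number of simulated chi-square values

# Each row holds k independent standard normal draws
z <- matrix(rnorm(n_sims * k), ncol = k)

# Square and add within each row: one chi-square(k) draw per row
sums_of_squares <- rowSums(z^2)

# The mean should be close to k, and the 95th percentile close to qchisq(0.95, k)
print(mean(sums_of_squares))
print(quantile(sums_of_squares, 0.95))
print(qchisq(0.95, df = k))  # about 7.81
```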

7
Q

What are the features of the chi-square (χ²) distribution?

A

Continuous distribution

Has a noticeable positive skew to it

The shape of the distribution depends on the ‘degrees of freedom’
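
A quick numerical check of the positive skew and the dependence on degrees of freedom, using base R's `qchisq()`: for each df, the mean of a chi-square distribution (which equals its df) sits above its median, which is typical of a right-skewed distribution, and the median moves right as df grows. The particular df values below are arbitrary choices for illustration.

```r
dfs <- c(1, 2, 5, 10)

# The mean of a chi-square distribution equals its degrees of freedom
means <- dfs

# Medians from the chi-square quantile function
medians <- qchisq(0.5, df = dfs)

# Mean above median for every df points to positive (right) skew
print(data.frame(df = dfs, mean = means, median = round(medians, 2)))
```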

8
Q

What is ‘degrees of freedom’ (DOF)?

A

The total number of ‘things’ you’re interested in minus the number of known constraints on those ‘things’

  • For a goodness of fit test, the quantities of interest are the k observed frequencies, and there is one constraint on them: they must add up to the sample size N
  • So for a chi-square goodness of fit test involving k categories, the degrees of freedom is equal to k-1
9
Q

What is another name for the rejection region?

A

Critical region

10
Q

When can we reject the H0 in a chi-square test?

A

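If H0 is true, there is only a 5% chance of observing an X² value greater than the critical value (the 95th percentile of the chi-square distribution with the test's degrees of freedom)

Therefore we can ensure a Type I error rate of .05 if we reject H0 only when X² is greater than this critical value, i.e. when X² falls in the rejection (critical) region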
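
The decision rule can be sketched with base R's `qchisq()` (critical value) and `pchisq()` (upper-tail p-value). The degrees of freedom (3, i.e. k = 4 categories) and the observed statistic of 8.44 are hypothetical values chosen for illustration.

```r
dof <- 3       # e.g. k = 4 categories, so k - 1 = 3
alpha <- 0.05  # desired Type I error rate

# Critical value: the 95th percentile of the chi-square(3) distribution
critical <- qchisq(1 - alpha, df = dof)
print(critical)  # about 7.81

x2_observed <- 8.44  # hypothetical observed test statistic

# Reject H0 only if the observed statistic falls in the critical region
reject_h0 <- x2_observed > critical
print(reject_h0)

# Equivalent decision via the p-value: area of the upper tail beyond X2
p_value <- pchisq(x2_observed, df = dof, lower.tail = FALSE)
print(p_value)
```

Comparing X² with the critical value and comparing the p-value with alpha always lead to the same decision.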

11
Q

What are the 3 important outputs in a chi-square ‘goodness of fit’ test?

A

The test statistic (X²)

The p-value

The degrees of freedom for the test
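
In base R, all three outputs can be read off the object returned by `chisq.test()` (an `htest` list with `statistic`, `parameter` and `p.value` components). The observed counts below are hypothetical and the null hypothesis of equal probabilities is an assumption for the example.

```r
# Hypothetical observed counts, tested against equal probabilities
observed <- c(clubs = 35, diamonds = 51, hearts = 64, spades = 50)
res <- chisq.test(observed, p = rep(1 / 4, 4))

x2  <- unname(res$statistic)  # test statistic
dof <- unname(res$parameter)  # degrees of freedom (k - 1)
p   <- res$p.value            # p-value
print(c(X2 = x2, df = dof, p = p))
```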

12
Q

How should you write up a chi-square ‘goodness of fit’ test?

A

1) Report the relevant descriptive statistics (can also do this in a table or figure in your text)
2) Specify the null hypothesis and the statistical test run
3) Give the result of the test
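
For step 3, a stat block can be assembled from the `chisq.test()` result with `sprintf()`. This is only a sketch: the counts are hypothetical, and the exact layout (ASCII "X2", three decimal places for p, leading zero kept) is an assumption, so follow whatever reporting style your course requires.

```r
# Hypothetical observed counts, tested against equal probabilities
observed <- c(clubs = 35, diamonds = 51, hearts = 64, spades = 50)
res <- chisq.test(observed, p = rep(1 / 4, 4))

# Build a stat block of the form "X2(df) = statistic, p = p-value"
stat_block <- sprintf("X2(%d) = %.2f, p = %.3f",
                      as.integer(res$parameter),
                      unname(res$statistic),
                      res$p.value)
print(stat_block)  # "X2(3) = 8.44, p = 0.038"
```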

13
Q

Chi-square tests are used for 1)______ data: the outcome variable is 2)_____ scale

A

1) Categorical
2) Nominal

14
Q

For a chi-square test, describe:

Diagnostic test statistic

Distribution

Degrees of freedom

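A

Diagnostic test statistic: X² = Σ (O - E)² / E, summed over the k categories (O = observed frequency, E = expected frequency under H0)

Distribution: chi-square (χ²)

Degrees of freedom: k-1 (for a goodness of fit test with k categories)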
15
Q

What is the R code for a ‘goodness of fit’ test?

A

goodnessOfFitTest(x, p) (from the lsr package)

x = raw nominal data (a factor), p = the probabilities specified by the null hypothesis

e.g. polling data, election results
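
`goodnessOfFitTest()` comes from the `lsr` package. If that package is not available, a roughly equivalent result can be obtained with base R's `chisq.test()` by first tabulating the raw nominal data with `table()`. The raw responses and the equal-probability null hypothesis below are hypothetical.

```r
# Hypothetical raw nominal data: one response per participant
raw <- factor(rep(c("clubs", "diamonds", "hearts", "spades"),
                  times = c(35, 51, 64, 50)))

# Null hypothesis probabilities, one per factor level (in level order)
p_null <- c(clubs = 0.25, diamonds = 0.25, hearts = 0.25, spades = 0.25)

# chisq.test() expects frequencies, so tabulate the raw responses first
observed <- table(raw)
res <- chisq.test(observed, p = p_null)
print(res)
```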

16
Q

What is a chi-square test of association?

(also known as chi-square test of independence or chi-square test of homogeneity)

A

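Very similar to the chi-square goodness of fit test, but you use it when the expected frequencies are not given to you by the null hypothesis

Instead, you estimate them from the data

E.g. under the null hypothesis, the population probability for category j is estimated as the total number of observations in category j divided by the sample size (Pj = Cj / N), so the expected frequency for cell (i, j) is Eij = Ri × Cj / N, where Ri is the total for row i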
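
A minimal sketch of that estimation step, assuming a hypothetical 2 × 3 table of observed frequencies: each column probability is estimated as column total / N, each expected frequency as row total × that probability, and the result is compared with the `expected` component that base R's `chisq.test()` returns.

```r
# Hypothetical 2 x 3 contingency table of observed frequencies
observed <- matrix(c(20, 30, 10,
                     25, 15, 20),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("group_a", "group_b"),
                                   c("choice_1", "choice_2", "choice_3")))

n <- sum(observed)
row_totals <- rowSums(observed)
col_totals <- colSums(observed)

# Estimated null probability for column j: column total / sample size
p_col <- col_totals / n

# Expected frequency for cell (i, j): row total i times probability j
expected <- outer(row_totals, p_col)
print(expected)

# chisq.test() estimates the same expected frequencies from the data
res <- chisq.test(observed)
print(res$expected)
```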

17
Q

How is the sampling distribution obtained in the chi-square ‘test of association’?

A

The same way as for the goodness of fit test: the chi-square distribution is created by squaring and summing normally distributed variables

18
Q

χ²(3) = 11.303, p = 0.0102 is a stat block from what type of test?

Explain what the numbers mean.

A

Chi-square test

χ²(3) = 11.303, p = 0.0102

χ² = the sampling distribution used (chi-square)

(3) = degrees of freedom

11.303 = test statistic (the observed X² value)

p = 0.0102 = p-value
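
The numbers in the stat block are linked: the p-value is the upper-tail area of the chi-square distribution with the stated degrees of freedom beyond the test statistic. A quick check with base R's `pchisq()`, using only the values from the block above:

```r
x2 <- 11.303  # test statistic from the stat block
dof <- 3      # degrees of freedom from the stat block

# Upper-tail probability of chi-square(3) beyond the observed statistic
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)
print(round(p_value, 4))  # 0.0102
```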

19
Q

Chi-square ‘goodness of fit’ tests compare observed frequencies of one variable vs what?

A

A hypothesis about the true probabilities of that variable.

20
Q

What do chi-square ‘tests of association’ / ‘test of independence’ test?

A

Whether two nominal scale variables are related to each other

21
Q

What do chi-square ‘tests of association’ / ‘tests of independence’ use as their test statistic?

A

X²

22
Q

How is degrees of freedom calculated for a chi-square test of association?

A

(r-1)(c-1)

where r = number of categories of one variable (rows of the contingency table) and c = number of categories of the other (columns)
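
As a small illustration, the degrees of freedom can be computed from the dimensions of a contingency table and compared with the `parameter` component reported by base R's `chisq.test()`. The 2 × 3 table of counts is hypothetical.

```r
# Hypothetical 2 x 3 table of observed frequencies
observed <- matrix(c(20, 30, 10,
                     25, 15, 20), nrow = 2, byrow = TRUE)

n_rows <- nrow(observed)  # r: categories of the row variable
n_cols <- ncol(observed)  # c: categories of the column variable
dof <- (n_rows - 1) * (n_cols - 1)
print(dof)  # 2

# chisq.test() reports the same degrees of freedom
res <- chisq.test(observed)
print(unname(res$parameter))
```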

23
Q

How do you run a chi-square test of association in R?

A

chisq.test(x)

x = observed frequency contingency table of two nominal variables
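
When the data arrive as one row per participant rather than as a table, `table()` can build the observed frequency contingency table before calling `chisq.test()`. This sketch uses two hypothetical nominal variables and also recomputes X² by hand from the `expected` frequencies that `chisq.test()` returns.

```r
# Hypothetical raw data: two nominal variables, one value per participant
group  <- rep(c("a", "b"), each = 60)
choice <- c(rep(c("x", "y", "z"), times = c(20, 30, 10)),
            rep(c("x", "y", "z"), times = c(25, 15, 20)))

# Cross-tabulate to get the observed frequency contingency table
observed <- table(group, choice)
res <- chisq.test(observed)
print(res)

# The same statistic by hand: sum of (O - E)^2 / E over all cells
x2_manual <- sum((observed - res$expected)^2 / res$expected)
print(x2_manual)
```

For a 2 × 2 table, `chisq.test()` applies a continuity correction by default (`correct = TRUE`), so its statistic will differ from the plain sum above in that case.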

24
Q

Describe the

diagnostic test statistic

distribution

degrees of freedom

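for the chi-square test of association

A

Diagnostic test statistic: X² = Σ (O - E)² / E, summed over all cells of the contingency table, with the expected frequencies estimated from the data (E = row total × column total / N)

Distribution: chi-square (χ²)

Degrees of freedom: (r-1)(c-1)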