QDA Flashcards

(34 cards)

1
Q

3 ways to numerically summarise a categorical variable

A
  1. Frequencies or counts
  2. Relative frequencies
  3. Cumulative relative frequencies
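
The three summaries above can be sketched in a few lines of Python. The `colours` data is made up for illustration, and sorting the categories alphabetically is an arbitrary choice.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (illustrative data only)
colours = ["red", "blue", "red", "green", "blue", "red", "red", "green"]

counts = Counter(colours)  # 1. frequencies (counts) per category
n = sum(counts.values())   # total number of observations

summary = {}
cumulative = 0.0
for category in sorted(counts):
    relative = counts[category] / n  # 2. relative frequency
    cumulative += relative           # 3. cumulative relative frequency
    summary[category] = (counts[category], relative, cumulative)

for category, (count, relative, cum) in summary.items():
    print(f"{category:<6} {count:>3} {relative:6.3f} {cum:6.3f}")
```

The last cumulative relative frequency is always 1, which is a quick check that nothing was missed.
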
2
Q

How can categorical variables be summarised visually?

A

Bar and pie charts

3
Q

What are bar charts for, and what do the y and x axes show?

A

Representing the frequency of each category. The y axis shows the frequencies and the x axis shows the categories

4
Q

What are pie charts for?

A

Representing the relative frequency (share of the total) of each category as a slice of the pie

5
Q

When describing the contents of a numerical variable we can look at different aspects of its distribution such as:

A

Measures of location such as the mean
Measures of spread and variability
Extreme values
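
A minimal sketch of these three aspects using Python's `statistics` module. The `values` list is made up, and flagging extreme values with the 1.5 x IQR rule is an assumption added here, not something the card specifies.

```python
import statistics

# Hypothetical numerical variable (illustrative data only)
values = [2, 4, 4, 5, 7, 9, 25]

# Measures of location
mean = statistics.mean(values)
median = statistics.median(values)

# Measures of spread and variability
stdev = statistics.stdev(values)         # sample standard deviation
value_range = max(values) - min(values)
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Extreme values: assumed 1.5 x IQR rule (one common convention)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(mean, median, stdev, value_range, outliers)
```
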

6
Q

When is a t.test used?

A

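To compare means (one sample against a hypothesised value, or two groups against each other) when the observations are independent and the errors are normally distributed. The test statistic is calculated from the sample mean(s)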

7
Q

What is the Wilcoxon rank sum test?
How does it work?

A

Non-parametric alternative to a two-sample t.test. (Used when we cannot assume a normal distribution)

Pools the measurements from both groups into one column, assigns a rank to each value, and compares the sum of the ranks between the groups
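
The ranking step can be sketched as below. This only computes the rank sum for one group, with tied values sharing their average rank. It does not compute a p value, and the function name `rank_sum` and the data are hypothetical.

```python
def rank_sum(group_a, group_b):
    """Sum of the ranks of group_a within the pooled sample.

    Tied values share the average of the ranks they occupy.
    """
    pooled = sorted(group_a + group_b)  # all measurements in one column
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # 0-based positions i..j-1 hold ranks i+1..j; store their average
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    return sum(ranks[x] for x in group_a)


# Hypothetical measurements for two groups
a = [1.1, 2.3, 2.9, 3.0]
b = [2.5, 3.4, 4.8, 5.2]
w_a = rank_sum(a, b)
w_b = rank_sum(b, a)
print(w_a, w_b)
```

The two rank sums always add up to n(n + 1)/2 for n pooled values, so a very small or very large rank sum for one group suggests the groups differ in location.
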

8
Q

What does a scatter plot do?
What should you look for and how do you interpret it?

A

Displays two numerical variables of interest, with the independent variable on the x axis and the dependent variable on the y axis

Look at whether the relationship is positive or negative; linear, quadratic or exponential; strong and clear or weak; and whether there are outliers

9
Q

Two main types of analysis

A

Descriptive - Describing data using numerical or graphical summaries

Inferential - Using sample data to draw conclusions about the larger population

10
Q

What are the main data types?

A

Categorical - Attributes observed for each sampling unit, e.g. binary categories

Numerical - A numerical value on a discrete, ordinal or continuous scale

11
Q

What is a confidence interval?

A

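A range of plausible values for the population mean/proportion, calculated from a sample. If the exercise was repeated many times, about 95% of the 95% confidence intervals built this way would contain the true value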
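
A rough sketch of a confidence interval for a mean. It assumes a normal approximation (a z quantile from `statistics.NormalDist`). A t-based interval, as reported by R's t.test, would be somewhat wider for small samples. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for the mean using a normal (z) quantile."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    centre = mean(sample)
    half_width = z * stdev(sample) / sqrt(len(sample))  # z * standard error
    return centre - half_width, centre + half_width


# Hypothetical sample
sample = [10, 12, 11, 13, 14]
low, high = mean_confidence_interval(sample)
low99, high99 = mean_confidence_interval(sample, confidence=0.99)
print(low, high)
```

Raising the confidence level widens the interval, since more of the sampling distribution has to be covered.
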

12
Q

P value rule

A

P value ≤ α: Reject the null (significant)
P value > α: Fail to reject the null (not significant)

(With the usual significance level α = 0.05, the p value must be at most 0.05 for a difference to be significant)
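
The rule can be written as a tiny function. The name `decide` and the default significance level of 0.05 are assumptions for illustration.

```python
def decide(p_value, alpha=0.05):
    """Apply the p value rule at significance level alpha."""
    if p_value <= alpha:
        return "reject H0"       # significant
    return "fail to reject H0"   # not significant


print(decide(0.03))              # p value below the default alpha
print(decide(0.03, alpha=0.01))  # same p value, stricter alpha
```
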

13
Q

What does it mean to test a null hypothesis?

A

The null hypothesis is the default claim (the given facts) that you are trying to disprove

For example, testing that the mean has a specific value against an alternative hypothesis that it does not:

H0: μ = μ0
H1: μ ≠ μ0

14
Q

What are the type 1 and 2 error probabilities

A

α = P(type 1 error) = P(reject H0 | H0 is true)

β = P(type 2 error) = P(fail to reject H0 | H0 is false)

15
Q

How to test for normality

A

Quantile-quantile (Q-Q) plot

16
Q

Numerical tests for normality

A

Kolmogorov-Smirnov (K-S) test
Shapiro-Wilk test

17
Q

How do you test for variance?
Give the definition of each test

A

ANOVA (analysis of variance) is used to determine whether there is a statistically significant difference between the means of two or more categorical groups, by comparing the variance between the groups with the variance within the groups

Fisher's F test compares the variances of two samples by dividing the larger variance by the smaller variance
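
A minimal sketch of the F ratio described above (larger sample variance divided by the smaller). It only computes the statistic, not a p value. The function name `f_ratio` and the data are hypothetical.

```python
from statistics import variance


def f_ratio(sample_a, sample_b):
    """Larger sample variance divided by the smaller one."""
    var_a = variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical samples
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
ratio = f_ratio(a, b)
print(ratio)
```

A ratio close to 1 suggests similar variances. Because the larger variance is always on top, the ratio is never below 1.
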

18
Q

What is a prop test?

A

A test of a population proportion based on a sample, which also gives a confidence interval for the proportion

It can also test whether the proportions in several groups are the same

19
Q

What can we use for hypothesis testing?

A

t.test, which tests the mean of a sample from a population

20
Q

What is a correlation test used for?
Provide an example

A

Used to measure the strength and direction of the relationship between two numerical variables, often as a pre-step to linear regression

E.g. speed vs distance
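
A sketch of the Pearson correlation coefficient, computed by hand. The speed and distance figures are made up for illustration and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical speed vs distance data
speed = [4, 7, 8, 10, 12]
distance = [2, 4, 16, 18, 24]
r = pearson_r(speed, distance)
print(r)
```

Values near +1 or -1 indicate a strong linear relationship. Values near 0 indicate little or no linear relationship.
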

21
Q

What type of variables does linear regression use?

A

Two continuous numeric variables: one independent and one dependent
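
A minimal least-squares sketch for one independent and one dependent variable. The helper name `fit_line` and the data are hypothetical. The data is chosen to lie exactly on a line so the result is easy to check.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
intercept, slope = fit_line(x, y)
print(intercept, slope)
```
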

22
Q

What is Pearson's chi-squared test (χ²)?

A

Used to discover whether there is a relationship between two categorical variables

23
Q

What is the difference between a one way ANOVA and two way?

A

One-way ANOVA is a parametric test used to determine whether there are any significant differences between the means of two or more independent groups (the levels of one categorical independent variable)

Two-way ANOVA tests the effect of two categorical independent variables on a dependent variable
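
For the one-way case, the F statistic can be sketched as the between-group mean square divided by the within-group mean square. This computes the statistic only (no p value). The function name and data are hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA on a list of groups."""
    k = len(groups)                   # number of groups
    n = sum(len(g) for g in groups)   # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements for three groups
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat = one_way_anova_f(groups)
print(f_stat)
```
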

24
Q

How to graphically show the variance in a categorical and continuous variable?

25
Q

Name 4 diagnostic plots to test lm models

A

Residuals vs fitted
Q-Q plot
Scale-location
Residuals vs leverage

26
Q

What to look for in a residuals vs fitted diagnostic plot

A

The residuals should look randomly scattered with no pattern; otherwise it suggests issues with the model assumptions

27
Q

What to look for in a Q-Q plot

A

The points should lie close to a straight line across all plotted values

28
Q

How to analyse models in lm

A

Discuss coefficients
Linear relations
Significant SSR and SSE ratio
R² value
Outliers
Unwanted patterns in residuals requiring transformation
Check whether the residuals fit the assumption of homoscedasticity

29
Q

What different types of models are there?

A

Linear regression
Multiple regression
ANOVA and ANCOVA
Logistic regression

30
Q

When do you use ANOVA?
When do you use one-way and multi-way?

A

When all explanatory variables are categorical

One-way is used when there is one factor (one categorical independent variable)

Multi-way is used when there is more than one categorical independent variable

31
Q

Different types of transformation techniques for models

A

Take the log of the dependent variable
Square the independent variable
Take the reciprocal (1/x) of the independent variable
Join categories

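
The four transformations can be sketched on toy data as below. All values, and the `merge` mapping used to join categories, are made up for illustration.

```python
import math

# Hypothetical dependent (y) and independent (x) variables
y = [1.0, 10.0, 100.0]
x = [1.0, 2.0, 4.0]

log_y = [math.log(v) for v in y]     # log of the dependent variable
x_squared = [v ** 2 for v in x]      # square the independent variable
x_reciprocal = [1 / v for v in x]    # 1 / the independent variable

# Joining categories: map sparse levels onto a combined level
sizes = ["small", "medium", "large", "medium"]
merge = {"small": "small", "medium": "large", "large": "large"}
joined = [merge[s] for s in sizes]

print(log_y, x_squared, x_reciprocal, joined)
```
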
32
Q

How to graphically represent a fully numeric dataset
What does it do?

A

Through plot()

It plots all the numeric variables against each other at once

33
Q

When do you use a logistic regression?

A

When the dependent variable is binary (the explanatory variables can be categorical or numerical)

34
Q

What do you use a chi-squared test for?

A

When you want to find out whether two categorical variables are independent

If any of the expected frequencies are less than 5, use Fisher's exact test instead
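
A sketch of the chi-squared statistic for a contingency table, including the check on expected frequencies mentioned above. It computes the statistic only, with no continuity correction and no p value. The function name and the tables are hypothetical.

```python
def chi_squared(table):
    """Chi-squared statistic, expected counts and a small-count flag.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    # Expected count under independence: row total * column total / total
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # True when any expected frequency is below 5
    small_expected = any(e < 5 for row in expected for e in row)
    return statistic, expected, small_expected


# Hypothetical 2 x 2 table of observed counts
stat, expected, small = chi_squared([[10, 20], [20, 10]])
print(stat, expected, small)
```

When the flag is `True`, the chi-squared approximation is unreliable and Fisher's exact test is the safer choice.
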