STA2300 Flashcards Preview

Flashcards in STA2300 Deck (75)
1

M1-4: What are quantitative, categorical and ordinal variables?

Quantitative: Take on numerical values. Can find the average (e.g. height, heart rate, etc.)

Categorical: Fall into distinct categories (e.g. male or female). Doesn't make sense to average. May be coded as numbers in SPSS.

Ordinal: Categorical data with a natural order (e.g. survey responses: disagree, neutral, agree, etc.).

2

M1-4: What graphs should we use for quantitative variables?

Stem and leaf plot & histogram

3

M1-4: What graphs should we use for Categorical variables?

Bar chart & pie chart

4

M1-4: What 3 features do we look at in graphs of quantitative variables (stem and leaf, boxplot & histogram)?

i) Shape - number of modes / peaks, symmetry, deviations, etc.
ii) Centre - a typical approximate value
iii) Spread - the range of values the data can take.

5

M1-4: What is the 5 number summary?

Minimum, Quartile 1, Median, Quartile 3, Maximum
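
As a rough illustration, the five values can be computed as below. This is a minimal sketch that assumes the "median of each half" rule for the quartiles; software such as SPSS may use a slightly different quartile convention, and the function name `five_number_summary` is made up for this example.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two observations.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # skips the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))  # (1, 2.5, 4.5, 6.5, 8)
```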

6

M1-4: What characterises the Normal model?

Its mean (mu) and SD (sigma), together with its symmetric, bell-shaped curve.

7

M1-4: A z-score is the number of standard deviations an observation lies above (positive z) or below (negative z) the mean. Converting to a z-score is a process called ________. What is the formula for this?

standardising
z = (y-μ) / σ

8

M1-4: Converting z-scores to y is a process called _______? What is the formula for this? ** (not on formula sheet) **

unstandardising
y = μ + z σ
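
The two formulas z = (y-μ) / σ and y = μ + z σ undo each other. A minimal sketch, with made-up values (μ = 100, σ = 15) chosen only for illustration:

```python
def standardise(y, mu, sigma):
    """Convert an observation y to a z-score."""
    return (y - mu) / sigma

def unstandardise(z, mu, sigma):
    """Convert a z-score back to the original scale."""
    return mu + z * sigma

z = standardise(130, mu=100, sigma=15)
print(z)                                    # 2.0
print(unstandardise(z, mu=100, sigma=15))   # 130.0
```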

9

M1-4: What is correlation?

Measures the direction and strength of the linear relationship between two quantitative variables. It is measured using the coefficient r, which is only meaningful if the relationship is linear.

10

M1-4: R^2 measures what?

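Strength only (not direction) of the linear relationship between two quantitative variables: the proportion of the variation in y that is explained by the regression on x. Normally expressed as a percentage.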
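
To see how r and R^2 relate, here is a small sketch that computes r from its definition and then squares it. The data are invented for illustration, and the helper name `correlation` is not from the course materials.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]   # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]   # hypothetical response values
r = correlation(xs, ys)
print(round(r, 3))        # about 0.775: positive, moderately strong
print(round(r ** 2, 3))   # about 0.6, i.e. roughly 60% of variation explained
```

Squaring removes the sign, which is why R^2 carries strength but not direction.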

11

M1-4: What is the general form of a regression line? What do the components represent?

ŷ = b0 + b1x

ŷ denotes predicted value of y
b0 is the intercept
b1 is the slope
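
A minimal sketch of fitting ŷ = b0 + b1x by least squares, assuming the usual formulas b1 = Sxy / Sxx and b0 = ȳ - b1 x̄. The data and the function name `fit_line` are illustrative only.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b0, 2), round(b1, 2))   # 2.2 0.6
print(round(b0 + b1 * 6, 2))        # predicted y at x = 6: 5.8
```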

12

M?? - What are the 5 guidelines for interpreting P-values when writing conclusions?

> 10%: Insufficient evidence to support Ha (re-state Ha)
5-10%: Slight evidence to support Ha (re-state Ha)
1-5%: Moderate evidence to support Ha (re-state Ha)
0.1 - 1%: Strong evidence to support Ha (re-state Ha)
< 0.1%: Very strong evidence to support Ha (re-state Ha)
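
The guideline bands above can be written as a simple lookup. This is only a sketch: the card does not say which band a P-value exactly on a boundary (e.g. exactly 5%) belongs to, so the choice made here (boundary values fall in the stronger-evidence band) is an assumption, as is the function name.

```python
def evidence_for_ha(p_value):
    """Map a P-value (as a proportion, e.g. 0.03) to the wording guideline."""
    if not 0 <= p_value <= 1:
        raise ValueError("P-value must be between 0 and 1")
    if p_value > 0.10:
        return "insufficient evidence"
    if p_value > 0.05:
        return "slight evidence"
    if p_value > 0.01:
        return "moderate evidence"
    if p_value > 0.001:
        return "strong evidence"
    return "very strong evidence"

print(evidence_for_ha(0.03))     # moderate evidence
print(evidence_for_ha(0.0004))   # very strong evidence
```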

13

M??? - There are four rows on the formula sheet main page. For both hypothesis testing and Confidence Intervals, what does each row provide the formulas and symbols for?

- The first row is for proportions
- The second row is for one-sample mean
- The third row is for two-sample means
- The last row is for paired means

14

M1-4: What are response and explanatory variables? What axis do they go on?

A response (dependent) variable is a particular quantity that we ask a question about in our study. We put it on the Y-AXIS.

An explanatory (independent) variable is any factor that can influence the response variable. We put it on the X-AXIS.

15

M1-4: What are formulas for mean and standard deviations of a binomial?

The mean µ of a binomial is np.

The SD σ of a binomial is √(npq), where q = 1 - p.
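
A quick check of the two binomial formulas, using made-up values n = 100 and p = 0.2:

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Return (mean, SD) of a binomial with n trials and success probability p."""
    q = 1 - p
    return n * p, sqrt(n * p * q)

mean, sd = binomial_mean_sd(100, 0.2)
print(round(mean, 2), round(sd, 2))   # 20.0 4.0
```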

16

M7: What is p-hat?

p̂ is the sample proportion, a statistic. It varies from sample to sample, so it has a distribution. With larger sample sizes, the mean of that distribution stays similar, the spread gets smaller and the distribution looks more Normal.

It is calculated by X / n, where n is the sample size and X is the number of occurrences of the desired event in the sample.

17

M7: What is SD(y bar)?

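SD(y bar) = sigma / square root of n.

It is the standard deviation of the sampling distribution of the sample mean (not the standard deviation of the individual values in one sample).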

Used in questions like: "The annual household income in Brisbane is known to be $72000 with a standard deviation of $12000. If we randomly select 80 incomes from this population, what is the probability that the average income in the sample is more than $75000?"
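
One way to work the example question, as a sketch: it treats $72000 as the population mean and assumes the sample mean is approximately Normal (reasonable for n = 80 by the Central Limit Theorem). It uses `statistics.NormalDist` from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 72000, 12000, 80

sd_ybar = sigma / sqrt(n)               # SD(y bar) = sigma / sqrt(n)
z = (75000 - mu) / sd_ybar              # z-score of a sample mean of $75000
prob = 1 - NormalDist(mu, sd_ybar).cdf(75000)

print(round(sd_ybar, 1))   # about 1341.6
print(round(z, 2))         # about 2.24
print(round(prob, 4))      # about 0.0127
```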

18

M7: Law of large numbers states that as sample size increases from a population with mean µ, what happens to sample mean y¯ of observed values?

It gets closer and closer to the population mean μ.
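
A small simulation can illustrate this. The sketch below rolls a fair six-sided die (population mean 3.5) and prints the sample mean for increasing sample sizes; the seed and sample sizes are arbitrary choices, and the exact printed values depend on them.

```python
import random

def sample_mean_of_rolls(n, seed=1):
    """Mean of n simulated fair-die rolls (population mean is 3.5)."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n

for n in (10, 1000, 100000):
    # The sample mean tends to sit closer to 3.5 as n grows
    print(n, sample_mean_of_rolls(n))
```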

19

M7: What is a standard error?

The estimated SD of a statistic's sampling distribution. For a sample proportion, it is found by the square root of (p hat x q hat / n), where q hat = 1 - p hat.

So, where question is:
Suppose that 20% of a random sample of n = 64 Data Analysis students receive an A for the subject. What is the standard error of the sample proportion?

We get square root of ((0.2 x 0.8) / 64) = 0.05
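
The same calculation as a short sketch (the function name `se_proportion` is just for illustration):

```python
from math import sqrt

def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    return sqrt(p_hat * q_hat / n)

print(round(se_proportion(0.2, 64), 4))   # 0.05
```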

20

M7: How would you describe the distribution of sample proportions?

The distribution of sample proportions is approximately normal with mean=p and standard error = square root of (pq / n).

21

M7: What is a sample proportion and how can it be identified?

It is the proportion (p-hat) of a sample that has some characteristic. A question is about proportions when it gives a proportion or percentage. p and p-hat are not used for means (μ and y-bar are).

22

M7: What is x bar in statistics?

x-bar is used to represent the sample mean, a statistic, which is used to estimate the true population parameter, μ.

23

M8: The statement "we are 95% confident that the population mean is between 350 and 400" may also mean what?

The 95% confidence interval for the population mean is (350, 400).

24

M8: Does increasing the sample size increase or decrease the confidence interval width, and why?

It decreases it. The STANDARD ERROR has the square root of n in its denominator, so a larger n gives a smaller standard error and therefore a narrower interval.

25

* M8: What does statistical inference refer to?

Drawing conclusions about parameters.

26

M8: What is the Standard Error of the sampling distribution of a proportions question?

SE(p-hat) = square root of ((p-hat x q-hat) / n)

27

* M8: To halve the margin of error at the same level of confidence, what do you need to do?

Quadruple the sample size. ME = critical value x SE(statistic), and SE(statistic) is proportional to 1 / square root of n, so multiplying n by 4 halves the SE and hence the ME.

28

M8: How do you find the ME?

ME can be found from critical value x SE(statistic).
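
A minimal sketch of this formula for a sample proportion, assuming a z critical value (about 1.96 for 95% confidence) obtained from `statistics.NormalDist`. The values p-hat = 0.2 and n = 64 are illustrative. It also shows the effect of multiplying n by 4.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """ME = critical value x SE(p-hat), using a Normal (z) critical value."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = sqrt(p_hat * (1 - p_hat) / n)
    return z_star * se

me_small = margin_of_error(0.2, 64)
me_large = margin_of_error(0.2, 256)   # four times the sample size

print(round(me_small, 3))   # about 0.098
print(round(me_large, 3))   # about 0.049, half of the ME above
```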

29

M9: As the sample size increases, the Margin of Error ______ ?

Decreases.

The more observations / information you have, the more precise your estimate is going to be, hence a smaller ME.

As the sample size grows very large, the ME nears zero.

30

M??: How do you find the Centre of a distribution?

i) Look at a graph, or a list of the numbers, and see if the centre is obvious.
ii) Find the mean, the “average” of the data set.
iii) Find the median, the middle number.