STA2300 Flashcards

Question

* M8: What does statistical inference refer to?

Answer 1

Drawing conclusions about population parameters from statistics calculated on a sample.

Answer 2

SE(p-hat) = square root of ((p-hat x q-hat) / n)
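A minimal Python sketch of the SE(p-hat) formula. The function name `se_proportion` and the example counts (120 "yes" out of 400) are made up for illustration.

```python
import math

def se_proportion(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat  # q-hat is the complement of p-hat
    return math.sqrt(p_hat * q_hat / n)

# Hypothetical example: 120 "yes" answers out of 400 respondents
p_hat = 120 / 400  # 0.3
print(round(se_proportion(p_hat, 400), 4))
```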

Answer 3

Set ME = critical value x SE(statistic), then solve for n (the sample size inside SE(statistic)) to find the sample size that gives that ME.
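A sketch of solving for n, assuming the statistic is a sample proportion, so that ME = z* x sqrt(p q / n). Using p = 0.5 when there is no prior estimate is a common conservative choice and is an assumption here, as are the function name and the 3% / 95% target.

```python
import math

def sample_size_for_me(me: float, z_star: float, p_guess: float = 0.5) -> int:
    """Solve ME = z* * sqrt(p * q / n) for n, rounding up to a whole number."""
    q_guess = 1 - p_guess
    n = (z_star ** 2) * p_guess * q_guess / me ** 2
    return math.ceil(n)  # always round up so the ME is no larger than required

# Hypothetical target: ME of 3% at 95% confidence (z* = 1.96), no prior estimate of p
print(sample_size_for_me(0.03, 1.96))
```

Rounding up rather than to the nearest whole number guarantees the margin of error does not exceed the target.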

Answer 4

ME can be found from critical value x SE(statistic).

Answer 5

Decreases. The larger the sample (the more information you have), the more precise the estimate, hence a smaller ME. Because ME shrinks in proportion to 1/√n, quadrupling n halves the ME, and for very large samples the ME approaches zero.

Answer 6

i) Look at a graph, or a list of the numbers, and see if the center is obvious. ii) Find the mean, the “average” of the data set. iii) Find the median, the middle number.
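A quick Python sketch of steps ii) and iii) using the standard `statistics` module. The data are made up, with one large value included to show how it pulls the mean away from the median.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 40]  # made-up data with one unusually large value
mean = statistics.mean(data)      # the "average": sum / count
median = statistics.median(data)  # the middle value once sorted
print(mean, median)
```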

Answer 7

Use Table T with df = n - 1 and the confidence level (eg 80, 90, 95, 98, 99%).

Answer 8

It is a confidence interval problem and a one-sample mean problem. We use the 2nd row of the sheet. The statistic is y-bar (sample mean) and the parameter is µ (population mean).
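A minimal sketch of the one-sample interval y-bar ± t* x s/√n. The critical value is passed in rather than computed, on the assumption that it comes from Table T; the sample and the function name are hypothetical.

```python
import math
import statistics

def t_interval(sample, t_star):
    """Confidence interval for a population mean: y-bar +/- t* x s / sqrt(n).

    t_star is looked up separately (e.g. from Table T with df = n - 1).
    """
    n = len(sample)
    y_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 divisor (sample SD)
    me = t_star * se
    return y_bar - me, y_bar + me

# Made-up sample of 5 values; 2.776 is the 95% critical value for df = 4
sample = [12.0, 15.0, 11.0, 14.0, 13.0]
print(t_interval(sample, 2.776))
```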

Answer 9

The same centre and a narrower spread. The centre (the statistic) does not change, and the interval cannot get wider. This is because the critical value determines the width of the confidence interval, and lowering the confidence level decreases the critical value.

Answer 10

If many such confidence intervals were calculated by repeated sampling, about X% of those intervals would contain the true population mean.
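This interpretation can be checked by simulation. The sketch below is a simplification: it assumes σ is known so it can use z* from `statistics.NormalDist` instead of Table T, and the population values (µ = 50, σ = 10, n = 25) are made up.

```python
import random
import statistics

def coverage(mu=50.0, sigma=10.0, n=25, level=0.95, reps=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean mu.

    Simplifying assumption: sigma is treated as known, so the critical value is z*.
    """
    rng = random.Random(seed)
    z_star = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # one repeated sample
        y_bar = statistics.fmean(sample)
        if y_bar - half_width <= mu <= y_bar + half_width:
            hits += 1
    return hits / reps

print(coverage())  # close to 0.95, but not exactly 0.95
```

Any single interval either contains µ or it does not; the 95% describes the long-run proportion of intervals.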

Answer 11

We assume H0 is true and set μ equal to its hypothesised value, here μ = 0.

Answer 12

A matched pair is when an observation in the first sample is matched to an observation in the second. The scores are paired from each formulation: there are two scores for each subject. We must compute the differences and treat them like a single sample.

Answer 13

Examples of this include:
- Comparing on-campus and off-campus students
- Comparing yields after using two fertilisers

t = ((ȳA - ȳB) - (µA - µB)) / SE(ȳA - ȳB)
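A sketch of the two-sample t statistic. The card does not give SE(ȳA - ȳB); this assumes the unpooled form √(sA²/nA + sB²/nB). The fertiliser yields and the function name are hypothetical.

```python
import math
import statistics

def two_sample_t(sample_a, sample_b, mu_diff=0.0):
    """t = ((y-bar_A - y-bar_B) - (mu_A - mu_B)) / SE(y-bar_A - y-bar_B).

    SE uses the unpooled form sqrt(s_A^2 / n_A + s_B^2 / n_B).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    s_a, s_b = statistics.stdev(sample_a), statistics.stdev(sample_b)
    se = math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return (diff - mu_diff) / se

# Made-up yields for two fertilisers; H0: mu_A - mu_B = 0
fert_a = [20.1, 22.3, 21.8, 23.0, 22.5]
fert_b = [19.0, 20.2, 19.8, 21.1, 20.4]
print(round(two_sample_t(fert_a, fert_b), 3))
```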

Answer 14

Because the P-value is the probability, assuming H0 is true, of getting a result at least as extreme as the one observed. A small P-value means the result would be unlikely if only chance were at work.

Answer 15

15 - 1 = 14. Remember we always use the smaller sample size: df = min(n1, n2) - 1.

Answer 16

For a test with one mean, SD(ȳ) = σ/√n. However, σ is almost always unknown, so we use SE(ȳ) = s/√n. We then have: t = (ȳ - µ)/SE(ȳ), where SE(ȳ) is shown above.
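A minimal Python sketch of the one-sample t statistic, with s standing in for the unknown σ. The sample and the hypothesised mean of 10 are made up.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (y-bar - mu0) / SE(y-bar), with SE(y-bar) = s / sqrt(n)."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # s replaces the unknown sigma
    return (statistics.mean(sample) - mu0) / se

# Made-up sample; H0: mu = 10
print(one_sample_t([12.0, 15.0, 11.0, 14.0, 13.0], 10))
```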

Answer 17

b) Define the difference using μ in the hypothesis.

Answer 18

H0 is that A and B are not associated. Ha is that A and B are associated. Always describe the variables A and B though.

Answer 19

i) State the hypotheses ii) Compute the test statistic iii) Compute the P-value iv) Make a conclusion

Answer 20

We first compute the expected count for each cell. This is found by: Expected count = (Row total x Column total) / Table total. We then compute the test statistic: X^2 = Σ (Observed count - Expected count)^2 / Expected count. The Σ tells us to calculate this for EACH CELL and then add the results together.
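A sketch of both steps in plain Python, with the table stored as a list of rows. The 2 x 2 counts are made up so the arithmetic is easy to follow by hand.

```python
def expected_counts(table):
    """Expected count for each cell = (row total x column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square(table):
    """X^2 = sum over every cell of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

# Made-up 2 x 2 table of observed counts (rows: groups, columns: yes / no)
observed = [[30, 20],
            [20, 30]]
print(expected_counts(observed))  # [[25.0, 25.0], [25.0, 25.0]]
print(chi_square(observed))       # 4.0
```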

Answer 21

A chi-square (X^2) statistic is used to investigate whether distributions of categorical variables differ from one another.

Answer 22

They are ALL two-tailed.

Answer 23

No, there isn't at the 1% level, because the P-value is higher than alpha (LoS). However, at the 5% and 10% levels there is, because the P-value is lower. Alpha (the level of significance, LoS) is the evidential cut-off: if the P-value is below alpha, we reject H0 in favour of Ha. This never proves Ha is correct.
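The decision rule can be written as a one-line comparison. The P-value of 0.03 below is hypothetical, chosen so that it falls between the 1% and 5% levels as in this card.

```python
def decide(p_value, alpha):
    """Reject H0 when the P-value is below the significance level alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

# Hypothetical P-value of 0.03 checked against the three usual levels
for alpha in (0.01, 0.05, 0.10):
    print(alpha, decide(0.03, alpha))
```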

Answer 24

A discrete variable is able to be counted in a finite amount of time (eg money in pocket, grains of sand on a beach (although that may take a while), etc.)

Answer 25

A continuous variable can take any value in a range, so it can only be measured, never counted. You can't count age or time, for example, because they vary continuously. You can, however, turn age into a discrete variable by recording it in whole units (eg a human's age in years).

Answer 26

You need to count the females who support (usually with a yes) the question outlined. This is a trick question, made to look easy.

Answer 27

In a contingency table, the column totals produce the marginal distribution of the column variable, and the row totals produce the marginal distribution of the row variable. Each individual column produces a conditional distribution, and so does each individual row.
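A sketch computing both kinds of distribution from a small table. The counts and category names are made up; conditional distributions within each column would be computed the same way, dividing by the column total instead.

```python
# Made-up counts: rows = campus status, columns = answer
table = {
    "on-campus":  {"yes": 30, "no": 10},
    "off-campus": {"yes": 45, "no": 15},
}
columns = ["yes", "no"]
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the row variable: each row total / grand total
row_marginal = {r: sum(row.values()) / grand_total for r, row in table.items()}

# Marginal distribution of the column variable: each column total / grand total
col_marginal = {c: sum(row[c] for row in table.values()) / grand_total for c in columns}

# Conditional distribution within each row: each cell / that row's total
row_conditional = {
    r: {c: count / sum(row.values()) for c, count in row.items()}
    for r, row in table.items()
}
print(row_marginal, col_marginal, row_conditional, sep="\n")
```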

Answer 28

Standardised residual = (observed - expected)/√expected. Residuals tell us how much each observed count misses the expected count by.

Answer 29

New mean: $431. New SD: $115.6. Adding a constant increases the mean by that amount, while the standard deviation stays the same.

Answer 30

They BOTH increase by 20%.
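The two rules above (adding a constant vs increasing every value by 20%) can be checked numerically. The dollar values and the $50 shift are made up.

```python
import statistics

data = [100.0, 200.0, 300.0, 400.0]  # made-up values in dollars

shifted = [x + 50 for x in data]  # add a constant to every value
scaled = [x * 1.2 for x in data]  # increase every value by 20%

for label, values in [("original", data), ("+50", shifted), ("x1.2", scaled)]:
    print(label, statistics.mean(values), round(statistics.stdev(values), 2))
```

Shifting moves the whole distribution without changing its spread; scaling stretches the distances between values, so the SD scales too.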

Answer 31

We observe fewer Greens voters in the 50+ age category.

Answer 32

Mu is the population mean, whilst y bar is the sample mean. If given a dataset of a sample and asked to calculate the mean, then it would be y bar. Same with s and sigma - if asked to calculate the standard deviation from a sample, then it is s.

Answer 33

i) Symmetric: Mean for measuring centre Standard Deviation for measuring spread ii) Asymmetric: Median for measuring centre IQR for measuring spread

Answer 34

Continuous

Answer 35

i) Stem and leaf plot ii) Histogram iii) Side-by-side boxplot iv) Scatterplot v) Bar Graph vi) Contingency tables Remember: Quantitative: Take on numerical values. Can find the average (ie height, heart rate, etc.) Categorical: Definite categories (ie male or female). Doesn't make sense to average. May be coded on SPSS.

Answer 36

When given the mean and SD of the population and asked to work backwards. This is different from standardising which is when you're given the mean and SD of the population and asked to find a probability. These come under the Normal Curves topic.
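A sketch contrasting the two directions with Python's `statistics.NormalDist`. The population (mean 170, SD 10) and the values 185 and 90% are hypothetical.

```python
from statistics import NormalDist

# Hypothetical population: mean 170, SD 10 (e.g. heights in cm)
heights = NormalDist(mu=170, sigma=10)

# Standardising: value -> z -> probability
z = (185 - heights.mean) / heights.stdev
p_below = heights.cdf(185)         # P(Y < 185)

# Working backwards: probability -> z -> value
cutoff = heights.inv_cdf(0.90)     # value with 90% of the population below it
z_90 = NormalDist().inv_cdf(0.90)  # same thing via z: value = mean + z x SD
print(z, round(p_below, 4), round(cutoff, 2), round(170 + z_90 * 10, 2))
```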

Answer 37

Stratified sampling is where we split the population into known strata (groups of similar cases, ie males or females, grades, on-campus / off-campus students) and then take a random sample within each stratum. If there are 100 students in a class and 70% are off-campus, then when we choose a stratified sample of 10 we'd want to ensure 7 of the 10 are off-campus students too.
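A sketch of proportional allocation for this example: each stratum's share of the sample matches its share of the population, then a simple random sample is drawn inside each stratum. The student IDs and function name are hypothetical; with other numbers, rounding may make the per-stratum counts not add up exactly to the target size.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Proportional allocation: each stratum gets sample_size x (its population share),
    then an SRS is drawn within that stratum."""
    rng = random.Random(seed)
    population = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(sample_size * len(members) / population)  # e.g. 10 x 70/100 = 7
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical class of 100 students: 70 off-campus, 30 on-campus
strata = {
    "off-campus": [f"off{i}" for i in range(70)],
    "on-campus": [f"on{i}" for i in range(30)],
}
chosen = stratified_sample(strata, 10, seed=42)
print({name: len(ids) for name, ids in chosen.items()})  # {'off-campus': 7, 'on-campus': 3}
```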

Answer 38

Cluster sampling is where we first randomly select groups (clusters) of cases. Each cluster is considered representative of the whole population, ie similar to the other clusters. We then perform a census within each chosen cluster (ie a school, suburb, etc.). For example, we might randomly select 10 schools within Brisbane and perform a census within each school.
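A sketch of the two stages: a random choice of whole clusters, then every case in each chosen cluster. The 30 schools and their student IDs are made up.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then take a census of every case in each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)       # SRS of cluster names
    return {name: list(clusters[name]) for name in chosen}  # census: keep every member

# Hypothetical: 30 schools, each with 5 made-up student IDs
schools = {f"school{j}": [f"s{j}_{i}" for i in range(5)] for j in range(30)}
sample = cluster_sample(schools, 10, seed=7)
print(len(sample), sum(len(m) for m in sample.values()))  # 10 50
```

Unlike stratified sampling, randomness is applied only when picking clusters; within a chosen cluster nobody is left out.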

Answer 39

An SRS is where every possible sample of the same size has the same chance of being selected. Therefore, each case has an equal chance of being selected.

Answer 40

i) Observational studies - researcher observes cases; no intervention. An example is observing where someone might give birth. ii) Experimental studies - impose treatments to observe changes. An example is plant growth trials. ONLY EXPERIMENTS CAN ESTABLISH CAUSE AND EFFECT

Answer 41

A blind experiment is when only the person receiving the treatment doesn't know what treatment they're receiving. A double blind experiment is where neither the subject nor the experimenter know who is receiving which treatment.

Answer 42

```
Mean = SIGMA(x times probability)
SD = square root of SIGMA[(x - mean)^2 times probability]
```
Remember, SIGMA refers to "the sum of". This means we need to add up the values for every row of the table. Eg. for SD, for each value x (usually a quantitative title), subtract the mean, square the result, and multiply by that value's probability p (or whatever is in the cells). Adding these up gives the variance; taking the square root gives the SD.
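A minimal Python version of both formulas for a discrete random variable. The probability table is made up; note the final square root, which turns the variance into the SD.

```python
import math

def discrete_mean_sd(values, probs):
    """Mean = sum(x * P(x)); SD = sqrt(sum((x - mean)^2 * P(x)))."""
    mean = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, math.sqrt(variance)  # square root of the variance gives the SD

# Made-up probability table
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]
print(discrete_mean_sd(values, probs))
```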

Answer 43

A statistic is a value calculated from sample, while parameters are from population.

Answer 44

Shape = Approximately normal Mean = µ Standard deviation = σ/√n

Answer 45

Shape = Approximately normal Mean = p Standard deviation = √(pq/n)

Answer 46

Single individual: z = (y - µ) / σ. Sample of size n: z = (y-bar - µ) / (σ/√n). The difference is the denominator: a sample mean varies less than a single individual, so we divide by the SD of the sampling distribution, σ/√n. For one individual n plays no part, so the denominator is just σ.
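A sketch comparing the two z-scores for the same value. The population (µ = 100, σ = 15), the value 110 and n = 25 are hypothetical.

```python
import math
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical population mean and SD
y = 110.0                # value of interest

z_individual = (y - mu) / sigma             # one person: SD is sigma
n = 25
z_mean = (y - mu) / (sigma / math.sqrt(n))  # mean of n people: SD is sigma / sqrt(n)

std_normal = NormalDist()
print(round(z_individual, 3), round(z_mean, 3))
# Upper-tail probabilities: a mean of 110 is far less likely than one person scoring 110
print(round(1 - std_normal.cdf(z_individual), 4), round(1 - std_normal.cdf(z_mean), 4))
```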

Answer 47

i) Form - Whether it is approximately linear or curved. ii) Direction - Positive? (high values of one accompany high values of another) or negative? (high values of one accompany low values of another). iii) Scatter - Small, moderate or large, depending on how close the points are. iv) Outliers - Any point that doesn't fit the pattern of the scatterplot.

Answer 48

y hat = b0 + b1x, where y hat indicates a prediction, b0 is the intercept and b1 is the slope.
```
The slope (b1) tells us how much y hat changes when x changes by 1 unit
The intercept (b0) tells us the value of y hat when x is zero
```
For SPSS output, b0 is the first row in Coefficients and b1 is the 2nd row in Coefficients. ALWAYS DEFINE y (dependent / response) and x (independent / explanatory).
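A sketch of where b0 and b1 come from, using the standard least-squares formulas (b1 = Sxy / Sxx, b0 = ȳ - b1 x̄). SPSS reports the same two numbers; the data here are made up to lie exactly on a line so the result is easy to check.

```python
def least_squares(x, y):
    """Return (b0, b1) for y-hat = b0 + b1 * x, fitted by least squares."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope: change in y-hat per 1-unit change in x
    b0 = y_bar - b1 * x_bar  # intercept: y-hat when x = 0
    return b0, b1

# Made-up data lying exactly on y = 1 + 2x
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
b0, b1 = least_squares(x, y)
print(b0, b1)  # 1.0 2.0
```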

Answer 49

The statistic y bar 1 - y bar 2 is the difference between the two sample means. Each sample keeps its own standard deviation, s1 and s2, and these combine in SE(y bar 1 - y bar 2) = √(s1²/n1 + s2²/n2).

Answer 50

d bar is the mean difference, and sd is the SD of the differences (found by entering all the differences into the calculator and finding their SD). The rest of the equation is the same as for one-sample data: t = (d bar - µd) / (sd/√n), where n is the number of pairs.
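A sketch of the paired procedure: compute one difference per subject, then run a one-sample t on those differences. The scores are made up, and taking the differences as after minus before is an arbitrary choice here; the hypotheses just need to use the same direction.

```python
import math
import statistics

def paired_t(before, after, mu_d=0.0):
    """Paired t: compute differences, then treat them as one sample.

    t = (d-bar - mu_d) / (s_d / sqrt(n)), where n is the number of pairs.
    """
    diffs = [a - b for a, b in zip(after, before)]  # one difference per subject
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)                   # SD of the differences
    return (d_bar - mu_d) / (s_d / math.sqrt(n))

# Made-up scores for 4 subjects under two formulations
before = [10, 12, 9, 11]
after = [12, 15, 10, 13]
print(paired_t(before, after))
```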


(75 cards)