STA2300 Flashcards
(75 cards)
M1-4: What are quantitative, categorical and ordinal variables?
Quantitative: Take numerical values for which an average makes sense (e.g. height, heart rate).
Categorical: Values fall into distinct categories (e.g. male or female), so averaging does not make sense. Categories may be coded as numbers in SPSS.
Ordinal: Categorical data whose categories have a natural order (e.g. survey responses: disagree, neutral, agree).
M1-4: What graphs should we use for quantitative variables?
Stem-and-leaf plot, histogram & boxplot
M1-4: What graphs should we use for Categorical variables?
Bar chart & pie chart
M1-4: What 3 features do we look at in graphs of quantitative variables (stem and leaf, boxplot & histogram)?
i) Shape - number of modes (peaks), symmetry or skewness, and any outliers or gaps.
ii) Centre - a typical value (e.g. the median or mean).
iii) Spread - how much the values vary (e.g. range, IQR or SD).
M1-4: What is the 5 number summary?
Minimum, Quartile 1 (Q1), Median, Quartile 3 (Q3), Maximum
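A quick way to check a five-number summary by hand is the sketch below. It assumes the "median of each half" rule for quartiles; SPSS and other software may use a different quartile rule, so Q1 and Q3 can differ slightly. The data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-each-half rule.
    ASSUMPTION: other software may compute quartiles slightly differently."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]            # values below the median
    upper = values[half + n % 2:]    # values above the median (skips the middle value when n is odd)
    return (values[0], median(lower), median(values), median(upper), values[-1])

# Hypothetical data for illustration only
print(five_number_summary([2, 4, 4, 5, 7, 8, 9, 10, 12]))  # → (2, 4.0, 7, 9.5, 12)
```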
M1-4: What characterises the Normal model?
A symmetric, unimodal, bell-shaped curve that is fully specified by its mean μ and standard deviation σ, written N(μ, σ).
M1-4: A z-score is the number of standard deviations an observation lies above (or, if negative, below) the mean. Converting an observation to a z-score is a process called ________. What is the formula for this?
standardising
z = (y-μ) / σ
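Standardising as a one-line function (the score, mean and SD in the example are hypothetical):

```python
def standardise(y, mu, sigma):
    """z-score: how many SDs y lies above (positive) or below (negative) the mean."""
    return (y - mu) / sigma

# Hypothetical example: a score of 85 when mu = 70 and sigma = 10
print(standardise(85, 70, 10))  # → 1.5
```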
M1-4: Converting a z-score back to y is a process called _______. What is the formula for this? ** (not on formula sheet) **
unstandardising
y = μ + z σ
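Unstandardising is the reverse step: rearranging z = (y − μ) / σ for y gives the formula above. A minimal sketch with hypothetical values:

```python
def unstandardise(z, mu, sigma):
    """Convert a z-score back to the original units: y = mu + z * sigma."""
    return mu + z * sigma

# Hypothetical example: z = -2 when mu = 70 and sigma = 10
print(unstandardise(-2, 70, 10))  # → 50
```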
M1-4: What is correlation?
Correlation measures the direction and strength of the linear relationship between two quantitative variables. It is measured by the correlation coefficient r, which lies between −1 and +1, and should only be used when the relationship is linear.
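To see where r comes from, here is a hand-rolled sketch of the Pearson correlation coefficient (SPSS reports the same quantity). The study-hours and score data are hypothetical.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation coefficient r for paired quantitative data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # co-variation
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied vs exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
print(round(correlation(hours, score), 3))  # strong positive linear relationship
```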
M1-4: R^2 measures what?
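The proportion of the variation in the response variable that is explained by the regression model (in simple linear regression, R² = r²). Because it is a square, it shows strength but not direction. Usually expressed as a percentage.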
M1-4: What is the general form of a regression line? What do the components represent?
ŷ = b0 + b1x
ŷ denotes predicted value of y
b0 is the intercept (the predicted value of y when x = 0)
b1 is the slope (the change in predicted y for a one-unit increase in x)
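The least-squares estimates of b0 and b1 can be computed directly from the data, as in this sketch (hypothetical data; SPSS gives the same estimates in its coefficients table). It also returns R², which equals r² when there is a single explanatory variable.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares line y-hat = b0 + b1*x, plus R^2 (= r^2 for one explanatory variable)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    r = sxy / sqrt(sxx * syy)
    return b0, b1, r ** 2

# Hypothetical data: hours studied (x) vs exam score (y)
b0, b1, r_squared = fit_line([1, 2, 3, 4, 5], [52, 55, 61, 64, 70])
print(f"y-hat = {b0:.1f} + {b1:.1f}x, R^2 = {r_squared:.1%}")
```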
M??: What are the 5 guidelines for interpreting P-values and writing conclusions?
> 10%: Insufficient evidence to support Ha (re-state Ha)
5-10%: Slight evidence to support Ha (re-state Ha)
1-5%: Moderate evidence to support Ha (re-state Ha)
0.1 - 1%: Strong evidence to support Ha (re-state Ha)
< 0.1%: Very strong evidence to support Ha (re-state Ha)
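The guidelines above can be written as a lookup. The ranges share their endpoints (e.g. 5% appears in two rows), so this sketch assumes a P-value exactly on a boundary gets the weaker wording.

```python
def evidence_statement(p_value):
    """Map a P-value to the evidence wording for Ha.
    ASSUMPTION: a value exactly on a boundary (e.g. 0.05) gets the weaker category."""
    if p_value < 0.001:
        return "Very strong evidence"
    if p_value < 0.01:
        return "Strong evidence"
    if p_value < 0.05:
        return "Moderate evidence"
    if p_value < 0.10:
        return "Slight evidence"
    return "Insufficient evidence"

print(evidence_statement(0.03), "to support Ha")  # → Moderate evidence to support Ha
```

Remember to finish the conclusion by re-stating Ha in context.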
M???: The main page of the formula sheet has four rows. For both hypothesis tests and confidence intervals, which situation does each row give the formulas and symbols for?
- The first row is for proportions
- The second row is for a one-sample mean
- The third row is for two-sample means
- The fourth (last) row is for paired means
M1-4: What are response and explanatory variables? What axis do they go on?
A response (dependent) variable is a particular quantity that we ask a question about in our study. We put it on the Y-AXIS.
An explanatory (independent) variable is any factor that can influence the response variable. We put it on the X-AXIS.
M1-4: What are the formulas for the mean and standard deviation of a binomial distribution?
The mean µ of a binomial is np.
The SD σ of a binomial is √(npq), where q = 1 − p.
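Plugging numbers into np and √(npq); the trial count and success probability are hypothetical.

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Mean np and SD sqrt(npq) of a Binomial(n, p) count, with q = 1 - p."""
    q = 1 - p
    return n * p, sqrt(n * p * q)

# Hypothetical example: 100 trials with success probability 0.2
mean, sd = binomial_mean_sd(100, 0.2)
print(mean, round(sd, 2))  # mean 20, SD 4
```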
M7: What is p-hat?
p̂ is the sample proportion, a statistic. Because it varies from sample to sample, it is a random variable with its own distribution. As the sample size increases, the mean of p̂ stays at p, its spread gets smaller, and its distribution looks more Normal.
It is calculated as X / n, where X is the number of times the event of interest occurs in the sample and n is the sample size.
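A small simulation makes the sample-size effect visible: draw many samples, compute p̂ = X / n for each, and compare the mean and SD of the p̂ values for several n. The population proportion p = 0.3, the number of repetitions and the seed are hypothetical choices.

```python
import random
from statistics import mean, stdev

def simulate_p_hat(p, n, reps=2000, seed=1):
    """Draw `reps` samples of size n and return their sample proportions p-hat = X / n."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # X = number of successes in this sample
        p_hats.append(x / n)
    return p_hats

# Hypothetical population proportion p = 0.3
for n in (25, 100, 400):
    sims = simulate_p_hat(0.3, n)
    print(n, round(mean(sims), 3), round(stdev(sims), 3))  # mean stays near 0.3, SD shrinks
```

The SDs should land close to √(pq / n): roughly 0.09, 0.046 and 0.023.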
M7: What is SD(ȳ)?
SD(ȳ) = σ / √n.
It is the standard deviation of the sampling distribution of the sample mean ȳ (not the sample standard deviation s).
Used in questions like: “The mean annual household income in Brisbane is known to be $72000, with a standard deviation of $12000. If we randomly select 80 incomes from this population, what is the probability that the average income in the sample is more than $75000?”
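Working the Brisbane question, assuming $72000 is the population mean and that n = 80 is large enough for ȳ to be approximately Normal (Central Limit Theorem):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 72000, 12000, 80
sd_ybar = sigma / sqrt(n)          # SD of the sample mean: sigma / sqrt(n)
z = (75000 - mu) / sd_ybar         # standardise the sample mean
prob = 1 - NormalDist().cdf(z)     # P(y-bar > 75000) = upper-tail area
print(round(sd_ybar, 2), round(z, 2), round(prob, 4))
```

This gives SD(ȳ) ≈ 1341.64, z ≈ 2.24 and a probability of about 0.0127.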
M7: The Law of Large Numbers: as the sample size increases from a population with mean μ, what happens to the sample mean ȳ of the observed values?
It gets closer and closer to the population mean μ.
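A sketch of the Law of Large Numbers using rolls of a fair die, whose population mean is 3.5. The seed and checkpoints are arbitrary choices.

```python
import random

rng = random.Random(2300)          # fixed seed so the run is reproducible
total = 0
for n in range(1, 100_001):
    total += rng.randint(1, 6)     # one roll of a fair die (population mean 3.5)
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(n, round(total / n, 4))   # running sample mean drifts towards 3.5
```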
M7: What is a standard error?
The estimated standard deviation of a statistic's sampling distribution. For a sample proportion, SE(p̂) = √(p̂q̂ / n), where q̂ = 1 − p̂.
Example:
Suppose that 20% of a random sample of n = 64 Data Analysis students receive an A for the subject. What is the standard error of the sample proportion?
We get square root of ((0.2 x 0.8) / 64) = 0.05
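The same calculation as a function:

```python
from math import sqrt

def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    return sqrt(p_hat * q_hat / n)

print(round(se_proportion(0.2, 64), 4))  # → 0.05
```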
M7: How would you describe the distribution of sample proportions?
The distribution of sample proportions is approximately Normal (provided n is large enough), with mean p and standard deviation √(pq / n).
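M7: What is a sample proportion and how can you tell that a question involves one?
A sample proportion p̂ is the fraction of a sample that has some characteristic. A question involves proportions when it gives a proportion or percentage for a categorical outcome (a value of p, not to be confused with a P-value). p and p̂ are not used for means (μ and ȳ are).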
M7: What is x bar in statistics?
x-bar is used to represent the sample mean, a statistic, which is used to estimate the true population parameter, μ.
M8: The statement “there is a 95% probability that the population mean is between 350 and 400” may also mean what?
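It is a loose description of the 95% confidence interval for the population mean, (350, 400). Strictly, the 95% describes the method: about 95% of intervals constructed this way capture μ, while any single computed interval either contains μ or does not.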
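A simulation shows what "95% confidence" refers to: build many intervals from repeated samples and count how often they capture the true mean. This sketch assumes a Normal population with known σ (so it uses a z-interval); the values μ = 375, σ = 50 and n = 30 are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def z_interval_coverage(mu=375, sigma=50, n=30, reps=5000, seed=7):
    """Fraction of 95% z-intervals y-bar ± z* sigma/sqrt(n) that contain the true mu.
    ASSUMPTION: Normal population with known sigma (hypothetical values)."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.975)       # about 1.96
    moe = z_star * sigma / sqrt(n)             # margin of error (same for every sample here)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        y_bar = sum(sample) / n
        if y_bar - moe <= mu <= y_bar + moe:
            hits += 1
    return hits / reps

print(z_interval_coverage())  # close to 0.95
```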
M8: Does increasing the sample size increase or decrease the confidence interval width, and why?
It decreases the width. The standard error (e.g. σ/√n or s/√n) has √n in the denominator, so a larger n gives a smaller standard error and therefore a smaller margin of error.
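A sketch of how the width of a 95% interval for a mean shrinks with n, assuming a known population SD (hypothetical σ = 50) so that the z critical value applies; with unknown σ a t critical value would be used instead.

```python
from math import sqrt
from statistics import NormalDist

z_star = NormalDist().inv_cdf(0.975)   # 95% critical value, about 1.96
sigma = 50                             # hypothetical population SD

widths = []
for n in (25, 100, 400):
    se = sigma / sqrt(n)               # standard error shrinks like 1/sqrt(n)
    width = 2 * z_star * se            # full width of the 95% interval
    widths.append(width)
    print(n, round(se, 2), round(width, 2))
```

Because of the √n, quadrupling the sample size only halves the width.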