Exam 3 Flashcards
(50 cards)
- EDA For categorical variables - 2 charts - 1st:
- BAR CHARTS
- -represent categories by ARBITRARY positions on horizontal line
- -construct bar over each category such that HEIGHT is proportional to #/% in category
- -shape, center, and spread DO NOT APPLY TO BAR CHARTS
- EDA For categorical variables - 2 charts - 2nd:
- PIE CHART
- -represent categories by ARBITRARY positions in pie
- -construct pie section such that AREA of section is proportional to #/% in category
which graph is better??
BAR CHART always better than pie
- -bc comparing bar’s heights is easier than comparing pie slice areas
- -bar charts are easier to label than pie charts
- -pie charts req. lots of colors, textures
Pictogram
picture enhanced bar chart
- -can be misleading
- -intended visual element is HEIGHT…but perceived visual element is area
For categorical variables:
p = population proportion (parameter)
phat = sample proportion (statistic)
phat = # of indiv. in category of interest / # of indiv. in sample
ex. p = proportion of all BYU students who are married
phat = proportion of students in a random sample of 300 BYU students who are married
proportion sampling variability
- parameters typically UNKNOWN
- -bc usually impossible to know exactly what values a var. takes for every member of pop.
- statistics are computed from the sample
- -vary from sample to sample due to sampling variability
we want to understand how statistics behave relative to the parameter
sampling distribution of phat
–theoretical probability distribution
describes distribution of: ALL sample proportions from ALL possible random samples of the same size taken from a population
CENTER: Mean (phat) = p
SPREAD = st. dev. of sampling distribution of phat
= SD(phat) = radical ((p)*(1-p) / n)
SHAPE: approx. normal if n is large, but "large" depends on how close p is to .5
check: np > 10, n(1-p) > 10
- -need larger n for normality when p is close to zero or one
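A minimal Python sketch of the center, spread, and shape check for the sampling distribution of phat. The function name and the example values of p and n are made up for illustration; only the formulas come from the card.

```python
import math

def phat_sampling_distribution(p, n):
    """Return (center, spread, approx_normal) for the sampling distribution of phat."""
    center = p                           # Mean(phat) = p
    spread = math.sqrt(p * (1 - p) / n)  # SD(phat) = radical(p(1-p)/n)
    # Shape check: both expected counts must exceed 10
    approx_normal = n * p > 10 and n * (1 - p) > 10
    return center, spread, approx_normal

# Illustrative values only: same n, different p
print(phat_sampling_distribution(0.5, 100))   # p near .5: check passes
print(phat_sampling_distribution(0.02, 100))  # p near 0: check fails, need larger n
```

With p = .02 the check fails at n = 100 because np is only 2, which is the "need larger n when p is close to zero or one" point.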
- one sample z confidence interval for proportions
- -C.I. estimate for the pop. proportion “p”
1. investigate sampling distribution of phat for SRS from pop. of interest
2. use sampling distribution to develop CI for p
SPREAD = radical (phat(1 - phat) / n)
SHAPE = np > 10, n(1 - p) > 10 (use phat in place of p, since p is unknown)
C.I. formula for proportion
phat +/- z* radical (phat(1 - phat) / n)
phat = point estimate of p (pop. proportion)
z* = multiplier
st. dev. part = standard error of phat = estimate using sample data, of st. dev. of sampling distribution of phat
everything after +/- = m (margin of error) - measures max. diff. that could exist btw phat and p at a specified level of confidence
= table value multiplier * standard error
4 steps for C.I. proportions
- STATE - specific parameter of interest
- PLAN - choose procedure, level of confidence
- SOLVE - collect data, check conditions, and calc. interval
- CONCLUDE - interpret C.I.
CI proportions example
US senators voted 54-46 against plan to expand background checks for gun buyers - NYT news poll taken 2013 asked 965 randomly selected adults whether they favor/oppose federal law req. background checks on all potential gun buyers
–87% favored
STATE: what % of U.S. adults favor a federal law req. background checks for all potential gun buyers?
PLAN: Construct a 95% large-sample z confidence interval for p, proportion of all U.S. adults who favor background checks for potential gun buyers
phat = 87%, sample size = 965, confidence level = 95%
SOLVE: conditions:
1. SRS = yes! 965 randomly selected adults
2. sampling distribution approx. normal?
965(.87) = 839.55 > 10 YES, 965(.13) = 125.45 > 10 YES!
CI = phat +/- z* radical (phat(1 - phat) / n)
= .87 +/- 1.96 radical ((.87)(.13) / 965) = (0.849, 0.891)
CONCLUDE: we are 95% confident that the true proportion of US adults who favored background checks for buyers in April 2013 is btw. .849 and .891
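The SOLVE step above can be checked with a short Python sketch. The function name is made up for illustration; the inputs (phat = .87, n = 965, z* = 1.96) are the ones from the card.

```python
import math

def proportion_ci(phat, n, z_star):
    """One-sample z confidence interval for a population proportion."""
    se = math.sqrt(phat * (1 - phat) / n)  # standard error of phat
    m = z_star * se                        # margin of error
    return phat - m, phat + m

low, high = proportion_ci(0.87, 965, 1.96)
print(round(low, 3), round(high, 3))  # 0.849 0.891
```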
sample size determination in proportions
margin of error:
m = z* radical (p*(1 - p*) / n)
->
n = (z* / m)^2 * p*(1 - p*)
p* = best guess for p (not phat bc haven't taken sample yet, and not p bc don't know pop. parameter)
setting p* = .5 always produces sample size that, if anything, is a little too large (so no harm)
ex. with finding sample size with margin of error
want to estimate p with 95% confidence and margin of error of 3% - what size sample do you need?
n = (1.96 / .03)^2 * .5(1 - .5) = 1067.11 -> n = 1068 (ALWAYS round UP)
p* look at prior info. if possible, otherwise use p* = .5 and 95% CI
if n INC. then m DEC. (n is in the denominator of m)
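A small Python sketch of the sample-size formula, assuming the defaults from the card (z* = 1.96 for 95% confidence, p* = .5 when there is no prior info). The function name is hypothetical.

```python
import math

def sample_size(m, z_star=1.96, p_star=0.5):
    """Smallest n giving margin of error m: n = (z*/m)^2 * p*(1 - p*), rounded UP."""
    n = (z_star / m) ** 2 * p_star * (1 - p_star)
    return math.ceil(n)  # always round up, never to nearest

print(sample_size(0.03))  # 1068
```

Passing a prior guess such as p_star=0.87 gives a smaller n, which is why p* = .5 is the safe choice.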
One sample z test for pop. proportion
- beg. with claim about parameter value
- -take SRS and compute value of statistic(s)
- -use sampling distribution of stat —> compute prob. of getting stat. value if claim about parameter value is TRUE
- -if prob. unlikely, conclude that claim about parameter value is incorrect —> reject H0
STATE - specify claim about parameter of interest
PLAN - choose procedure, specify H0, Ha, alpha
SOLVE - check conditions, test stat. and p-value
CONCLUDE - compare p-value to alpha, interpret test results
conditions and test stat. formula in one sample z test for pop. proportion
conditions:
- SRS?
- Normality? np0 > 10, n(1 - p0) > 10 (use p0, the value claimed in H0)
test stat.
z = (phat - p0) / radical (p0(1 - p0) / n)
pval < alpha = reject H0 = statistically significant
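A hedged Python sketch of the test statistic and p-value. The function name, the `alternative` argument, and the example numbers (phat = .60, p0 = .50, n = 100) are made up for illustration; they are not from the cards. The normal probability comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def one_sample_z_test(phat, p0, n, alternative="two-sided"):
    """One-sample z test for a proportion. Returns (z, p_value)."""
    se0 = math.sqrt(p0 * (1 - p0) / n)  # st. dev. of phat IF H0 is true
    z = (phat - p0) / se0
    cdf = NormalDist().cdf(z)           # P(Z <= z) for standard normal
    if alternative == "greater":        # Ha: p > p0
        p_value = 1 - cdf
    elif alternative == "less":         # Ha: p < p0
        p_value = cdf
    else:                               # Ha: p not equal to p0
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value

# Hypothetical example: H0: p = .50, Ha: p > .50, alpha = .05
z, p_value = one_sample_z_test(0.60, 0.50, 100, "greater")
print(round(z, 2), round(p_value, 4))
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```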
- Role-type classifications; EDA for C to Q data
# of variables = 1: pattern of interest = distribution
# of variables = 2 (for each indiv.): pattern of interest = relationship (want to study relationship btw variables using visual displays and numerical summaries)
relationships
goals: characterize relationship
- -predict one from other
- -investigate cause-effect relationship
if prediction or cause-effect analysis is the goal, one variable is the RESPONSE and one is the EXPLANATORY
Y - response = outcome of the study
X - explanatory = used to predict or explain changes in response variable
response and explanatory variables chart
                      RESPONSE
                      categorical    quantitative
EXPLANATORY   cat.    C - C          C - Q
              quant.  Q - C          Q - Q
C-Q and Q - Q important in this class
ex. are women more talkative than men?
–explanatory = gender (categorical) and response = level of talkativeness (quantitative)
= C - Q
C - Q
categorical explanatory variable and quantitative response variable
–visual display tool: side by side box plots
–numerical summary tool: 5 # summary or 2 # summary (mean and SD) for each category
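A minimal Python sketch of the numerical summaries for C - Q data, one summary per category. The group names and values are made up purely for illustration, and the quartiles use the standard library's "inclusive" method, which is only one of several textbook conventions.

```python
import statistics

def summarize(values):
    """5 # summary plus 2 # summary (mean and SD) for one category."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "five_number": (min(values), q1, median, q3, max(values)),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample st. dev. (divides by n - 1)
    }

# Hypothetical quantitative responses for two categories
data = {"group_a": [2, 4, 6, 8, 10], "group_b": [1, 3, 5, 7, 12]}
for category, values in data.items():
    print(category, summarize(values))
```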
- Matched Pairs t-procedures for means
observational data:
- -Individuals grouped in sets of 2
- -1 indiv. in each set has 1 of 2 conditions to be compared
experimental data
- -units come in sets of 2 (twins, pairs of arms)
- -1 unit in each set randomly assigned to each of 2 treatments
one sample t-procedures for MU (in matched pairs t-procedures)
C.I.
xbar +/- t* (s / radical n)
test of significance
Ho: Mu = Mo
Ha: Mu > Mo (or < or not equal to)
randomized block design with 2 treatments or 2 measurements
blocks (pairs)
- -2 matched individuals
- -one individual and 2 treatments
- -one individual: pre and post measurements
randomization
- -randomly assign treatments to individuals within each pair
- -randomly assign order of treatments
- -randomly select individuals
matched pairs: 2 subjects
or matched pairs: one subject, 2 treatments
mean and st. dev. are computed from the differences
procedures for mean difference: (Md)
C.I
dbar +/- t* (Sd / radical n)
test
Ho: Md = 0
Ha: Md > 0 (or < or not equal to)
t = dbar / (Sd / radical n)
state - plan - solve - conclude
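A short Python sketch of the matched pairs t statistic, computed from the differences as the card describes. The function name and the paired measurements are hypothetical; a t table or software is still needed to turn t and df into a p-value.

```python
import math
import statistics

def paired_t_statistic(first, second):
    """t statistic for H0: Md = 0, using differences second - first."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    dbar = statistics.mean(diffs)    # mean of the differences
    sd = statistics.stdev(diffs)     # st. dev. of the differences
    t = dbar / (sd / math.sqrt(n))
    return dbar, sd, t, n - 1        # df = n - 1

# Hypothetical pre and post measurements on 4 individuals
pre = [10, 12, 9, 11]
post = [12, 15, 10, 15]
dbar, sd, t, df = paired_t_statistic(pre, post)
print(dbar, round(sd, 3), round(t, 3), df)
```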
C.I. example for Md (left vs. right)
have two identical knobs - one right (clockwise) turn and one left
–25 right handed students turn knob specified distance with right hand
(order of knobs random)
–time for each turn is response variable
–diff. of left-right computed and analyzed
- STATE:
- -what is the mean difference in time required for right handed students to turn a knob to the left vs. to the right?
- PLAN:
- -estimate the Md with a 95% confidence interval
- SOLVE:
- -data collected: dbar = 13.32 seconds, Sd = 22.94 seconds, n = 25, level = 95%
- -plot data with dot plot
- -conditions? SRS? YES! Normal? YES! - dot plot had no OUTLIERS
- -interval: dbar +/- t* (Sd / radical n), df = 25 - 1 = 24
= 13.32 +/- (2.064)*(22.94 / radical 25) = 13.32 +/- 9.47
- CONCLUDE:
- -We are 95% confident that the true mean difference btw left and right times is btwn 3.85 and 22.79 seconds
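The knob interval can be reproduced with a few lines of Python. The function name is made up; the inputs are the summary values from the card, and t* = 2.064 is the table value for df = 24 given there (it is typed in, not computed).

```python
import math

def mean_diff_ci(dbar, sd, n, t_star):
    """Matched pairs t confidence interval for the mean difference Md."""
    m = t_star * sd / math.sqrt(n)  # margin of error
    return dbar - m, dbar + m

low, high = mean_diff_ci(13.32, 22.94, 25, 2.064)
print(round(low, 2), round(high, 2))  # 3.85 22.79
```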