Exam II Material Flashcards

1
Q

distributions can be

A

skewed

2
Q

skewed data can be

A

negative or positive

3
Q

how peaky or heavy-tailed a distribution is, is described by its

A

kurtosis

4
Q

distributions that are very flat with thin (light) tails are called

A

platykurtic

5
Q

distributions that are very pointy/peaky are called

A

leptokurtic

6
Q

distributions that are just right (peaked about like a normal curve) are called

A

mesokurtic

7
Q

3 points about determining if data is normal

A
  1. whether data is normal enough depends on what you will do with the data
  2. there aren't as many hard rules
  3. the key is to justify what you are doing
8
Q

what are tests of normality

A

Kolmogorov-Smirnov and Shapiro-Wilk

9
Q

what are Kolmogorov-Smirnov and Shapiro-Wilk very sensitive to

A

n (the sample size)

10
Q

between Kolmogorov-Smirnov and Shapiro-Wilk, which is considered better

A

Shapiro-Wilk (S-W)

11
Q

what are Q-Q plots

A

a good visual method for double-checking data especially for large n
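
A minimal sketch of what a normal Q-Q plot computes, using only Python's standard library. The function names `qq_points` and `straightness`, the example data, and the plotting positions (i - 0.5)/n are assumptions chosen for illustration; statistics packages may use slightly different positions. If the data are roughly normal, the points fall close to a straight line.

```python
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    ordered = sorted(data)
    # Plotting positions (i - 0.5) / n are one common convention (an assumption here).
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 mean a nearly straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: one bell-shaped set and one strongly right-skewed set.
bell = [NormalDist(50, 10).inv_cdf((i - 0.5) / 30) for i in range(1, 31)]
skewed = [2 ** i for i in range(30)]

print(round(straightness(qq_points(bell)), 3))
print(round(straightness(qq_points(skewed)), 3))
```

Plotting the pairs from `qq_points` as a scatterplot gives the usual Q-Q picture; the single `straightness` number is only a rough summary of how line-like that picture is.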

12
Q

when do we not consider our data normal for skewness and kurtosis

A

if skewness or kurtosis is more than 2x its standard error

13
Q

when would we consider alternate tests for skewness and kurtosis

A

when skewness or kurtosis is more than 3x its standard error
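
The 2x and 3x rules of thumb can be sketched in plain Python. The formulas below are the bias-adjusted skewness and excess kurtosis, with standard errors that depend only on n, of the kind reported by common statistics packages; whether the course uses exactly these formulas is an assumption, and the names `skew_kurtosis_check` and `verdict` are hypothetical.

```python
from math import sqrt


def skew_kurtosis_check(data):
    """Return sample skewness, excess kurtosis, their standard errors, and ratios."""
    n = len(data)  # the formulas below need n > 3
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    # Bias-adjusted sample skewness and excess kurtosis.
    skew = (m3 / m2 ** 1.5) * sqrt(n * (n - 1)) / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2 ** 2 - 3) + 6)
    # Standard errors depend only on the sample size.
    se_skew = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return {
        "skew": skew, "se_skew": se_skew, "skew_ratio": abs(skew) / se_skew,
        "kurtosis": kurt, "se_kurtosis": se_kurt, "kurtosis_ratio": abs(kurt) / se_kurt,
    }


def verdict(ratio):
    """Apply the 2x / 3x rule of thumb from the cards to one statistic-to-SE ratio."""
    if ratio > 3:
        return "consider alternate tests"
    if ratio > 2:
        return "not normal"
    return "normal enough"


result = skew_kurtosis_check([1, 2, 3, 4, 5])  # hypothetical, perfectly symmetric data
print(verdict(result["skew_ratio"]), verdict(result["kurtosis_ratio"]))
```

Because the standard errors shrink as n grows, the same amount of skew is more likely to cross the 2x or 3x line in a large sample.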

14
Q

what is correlation

A

describes a relationship between two variables

15
Q

how should we do correlation by hand

A

arrange data in order of one of the quantitative variables

16
Q

correlations are a descriptor….

A

of how reliably a change in one variable predicts change in another variable

17
Q

what are positive relationships

A

ones where an increase in one variable predicts an increase on the other

18
Q

what are negative relationships

A

ones where an increase in one variable predicts a decrease in the other

19
Q

is there always a relationship in correlation

A

no

20
Q

correlation alone cannot be used to make a

A

definitive statement about causation

21
Q

correlation can be found in almost

A

everything

22
Q

what is the most effective way of presenting relationship data

A

scatterplots

23
Q

what are relationships best described by lines

A

linear relationships

24
Q

what relationships are best described with curves

A

curvilinear relationships

25
how can we quantify a correlation
by the Pearson product-moment correlation
26
the Pearson correlation varies from
-1 to 1
27
what is the number in Pearson with the weakest/no correlation
0
28
what's the number for the strongest correlation
1 in magnitude (that is, +1 or -1)
29
what indicates the direction of the correlation in Pearson
the sign
30
positive sign means
positive correlation
31
negative sign means
negative correlation
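
A minimal sketch of the Pearson product-moment correlation computed by hand, to show where the -1 to 1 range and the sign come from. The function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r: co-variation of x and y divided by the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, with at least 2 cases")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]          # hypothetical raw scores for the same five cases
score_up = [2, 4, 6, 8, 10]      # rises with hours
score_down = [10, 8, 6, 4, 2]    # falls as hours rise

print(pearson_r(hours, score_up))    # positive sign: positive correlation
print(pearson_r(hours, score_down))  # negative sign: negative correlation
```

The magnitude says how reliably one variable tracks the other along a line; the sign only gives the direction.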
32
assumptions of the pearson correlation (5)
  1. uses two variables
  2. variables are both quantitative (ratio/interval)
  3. variable relationships are linear
  4. minimal skew/no large outliers
  5. must observe the whole range for each variable
33
what should you not do when working with correlation
do not bin data; use the raw scores/values
34
in a correlation setup we will be comparing two different variables for the...
same set of cases
35
in Pearson correlation output, p <= 0.05 means
significant
36
parametric analysis includes
Pearson
37
both variables are ratio/interval and normal
Pearson
38
nonparametric analysis includes (3)
  1. Spearman's rank
  2. Kendall's tau-b
  3. eta
39
appropriate for ordinal and skewed data
Spearman's rank
40
appropriate for ordinal and skewed data, generally considered superior to Spearman (especially for small groups) and less affected by error
Kendall's tau-b
41
a special coefficient used for curvilinear relationships, particularly good for nominal by interval analyses
eta
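
A rough sketch of the two rank-based coefficients in plain Python, to show why they suit ordinal or skewed data: both look only at order, not at the raw distances between values. The helper names and data are hypothetical, and the tau-b tie handling follows the usual textbook definition rather than any particular package's output.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank from 1..n, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, adjusted for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: counts toward neither term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + tied_x_only) * (pairs + tied_y_only))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # hypothetical: perfectly ordered, but not linear
print(pearson_r(x, y), spearman_rho(x, y), kendall_tau_b(x, y))
```

On this data the two rank coefficients report a perfect relationship while Pearson does not, because the relationship is monotonic but curved.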
42
an entire, comprehensive group
population
43
a subset of the population, used to infer things about the population
sample
44
random samples are not casual or haphazard, getting truly random samples requires care
sampling
45
for characteristics of populations in regression, we might know the true
n
46
do you know the population mean in regression
you might be able to estimate it
47
do you know the population standard deviation in regression
we probably don't
48
for samples in regression we always know
n, mean, and standard deviation
49
what do regression and correlation have in common
both are about relationships between variables and work best with quantitative variables
50
regression differs from correlation in that we have explicit "........." variables used to estimate the value of some target variable
predictor
51
do you need stronger evidence of causality with regression than with correlation
yes
52
regression is primarily calculated by analyzing
error
53
a key to calculating regression is to look at the
predictive error for the y-axis variable
54
the what of what is the key to calculating the regression line
sum of squares of the error
55
the goal of regression is to find a best fit line that minimizes the
sum of the squares of the error
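
A small sketch of the idea in the last few cards: the best-fit line is the one with the smallest sum of squared prediction errors on the y-axis variable. The function names and data are hypothetical; the closed-form slope and intercept are the standard least-squares result for one predictor.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(x, y, slope, intercept):
    """Sum of squared differences between observed y and the line's prediction."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


x = [1, 2, 3]  # hypothetical predictor values
y = [2, 2, 4]  # hypothetical target values

slope, intercept = fit_line(x, y)
best = sum_squared_error(x, y, slope, intercept)

# Nudging the slope in either direction makes the squared error larger.
worse_up = sum_squared_error(x, y, slope + 0.1, intercept)
worse_down = sum_squared_error(x, y, slope - 0.1, intercept)
print(slope, intercept, best, worse_up, worse_down)
```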
56
assumptions of linear regression
  1. requires 2 or more scalar variables
  2. there is one dependent variable and one or more independent variables
  3. the relationships between the independent variables and the dependent variable must be linear
  4. the data must be homoscedastic
57
property of a dataset having variability that is similar across its whole range
homoscedasticity
58
opposite of homoscedastic is
heteroscedastic
59
symbol for number of observations for a sample and a population
sample: n; population: n or N
60
symbol for a datum for a sample and a population
sample: x; population: X
61
symbol for mean of a sample and a population
sample: x̄ (x-bar); population: μ (mu)
62
symbol for variance for a sample and a population
sample: s^2 (SD^2); population: σ^2 (sigma squared)
63
symbol for standard deviation for a sample and a population
sample: s (SD); population: σ (sigma)
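
As a quick illustration of the sample versus population symbols, Python's standard `statistics` module has separate functions for each: `pvariance`/`pstdev` divide by N, while `variance`/`stdev` divide by n - 1. The data below are hypothetical.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

n = len(scores)
x_bar = mean(scores)                # mean (x-bar if a sample, mu if a population)

sigma_squared = pvariance(scores)   # population variance: divide by N
sigma = pstdev(scores)              # population standard deviation

s_squared = variance(scores)        # sample variance: divide by n - 1
s = stdev(scores)                   # sample standard deviation

print(n, x_bar, sigma_squared, sigma, s_squared, s)
```

For these eight values the population variance is 4 and the sample variance is 32/7 (about 4.57); the sample version is always a little larger because it divides by n - 1.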
64
what does R mean in a linear regression?
correlation between the observed values, and the ones the model predicts
65
what does R^2 mean in a linear regression?
the proportion of variability in the dependent variable that is accounted for by changes in ALL the independent variables
66
what does unstandardized B represent in a linear regression?
tells you the unit change in the dependent per unit change in the independent
67
what does std err represent in a linear regression?
the standard error of B; used in calculating the t
68
what does beta tell you in a linear regression?
how strongly this variable predicts the dependent
69
t and sig in a linear regression
tells you if the variable was a significant predictor of the dependent
70
adjusted R^2 in a linear regression
If you have a lot of independent variables, you’ll get some relationships due to chance. This tries to correct for that
71
std error of the regression in a linear regression
A measure of how accurately the model predicts the dependent variable
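
A minimal sketch of how the model-summary numbers on the preceding cards relate to each other, given observed values and a model's predictions. The function name `fit_summary` and the data are hypothetical; the formulas are the standard ones, with k predictors and n - k - 1 error degrees of freedom.

```python
from math import sqrt


def fit_summary(observed, predicted, n_predictors):
    """R, R^2, adjusted R^2, and standard error of the regression."""
    n = len(observed)
    mean_y = sum(observed) / n
    mean_p = sum(predicted) / n
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in observed)                # total
    spp = sum((p - mean_p) ** 2 for p in predicted)
    spy = sum((y - mean_y) * (p - mean_p) for y, p in zip(observed, predicted))
    df_error = n - n_predictors - 1
    r_squared = 1 - sse / sst
    return {
        "R": spy / sqrt(sst * spp),  # correlation of observed with predicted
        "R2": r_squared,
        "adjusted_R2": 1 - (1 - r_squared) * (n - 1) / df_error,
        "std_error": sqrt(sse / df_error),
    }


observed = [2, 2, 4]               # hypothetical dependent-variable values
predicted = [5 / 3, 8 / 3, 11 / 3]  # predictions from a least-squares line
summary = fit_summary(observed, predicted, n_predictors=1)
print(summary)
```

Adjusted R^2 is never larger than R^2 for a model with at least one predictor, and the gap widens as predictors are added relative to n.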
72
population -
an entire comprehensive group
73
sample-
a subset of the population used to infer things about the population
74
sampling-
random samples are not casual or haphazard, getting truly random samples requires care
75
random sampling-
used when surveying; obtains a "snapshot" of the population. Just because sampling is random doesn't mean that your sample is perfectly representative
76
random assignment-
a process used in an experiment to minimize bias in your experiment groups
77
in both random sampling and random assignment, what does increasing n do?
it will decrease the likelihood of seeing a non-representative or biased sample
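
The difference between random sampling and random assignment can be sketched with Python's standard `random` module. The population list, the sample size of 40, and the two group names are hypothetical, and the fixed seed is there only so the sketch is repeatable.

```python
import random

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

population = [f"person_{i}" for i in range(1, 1001)]  # hypothetical sampling frame

# Random sampling: every member of the population has the same chance of selection.
sample = rng.sample(population, k=40)

# Random assignment: shuffle the people already in the study, then split into groups.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:20], shuffled[20:]

print(len(sample), len(treatment), len(control))
```

Sampling decides who gets into the study; assignment decides which experiment group each person ends up in.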
78
what does probability tell us?
when events are common, vs when events are rare
79
are common outcomes statistically significant?
no
80
what outcomes are considered "statistically significant"?
rare outcomes
81
is probability arbitrary?
yes - 100%
82
the central limit theorem
“Regardless of the shape of the population, the shape of the sampling distribution of the mean approximates a normal curve if the sample size is large enough”
83
does the sample tell us everything about the population?
no
84
what is the criterion for the probability of obtaining any specific sample from a population to fit a normal curve?
if the sample is sufficiently large
85
is it likely to get a very extreme sample?
no, but it is possible; you will most likely get a mean somewhere near the actual population mean.
86
what does the sampling distribution of the mean refer to?
the probability distribution of means for all possible random samples of size n for a population
87
standard error of the mean (SEM)
describes the average amount of variability sample means have around the true population mean
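
A small simulation sketch of the central limit theorem, the sampling distribution of the mean, and the SEM together. The population here is an exponential distribution (strongly skewed, with mean 1 and standard deviation 1); that choice, the sample size of 40, the 2000 repetitions, and the seed are all assumptions for illustration.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(0)  # fixed seed only so the sketch is repeatable


def sample_means(sample_size, n_samples):
    """Means of many random samples drawn from a skewed (exponential) population."""
    return [
        fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


sample_size = 40
means = sample_means(sample_size, 2000)

mean_of_means = fmean(means)               # should sit near the population mean, 1.0
sd_of_means = stdev(means)                 # observed spread of the sample means
predicted_sem = 1.0 / sqrt(sample_size)    # SEM = population SD / sqrt(n)

# Share of sample means within 1.96 SEM of the true mean (about 95% if roughly normal).
within = sum(abs(m - 1.0) <= 1.96 * predicted_sem for m in means) / len(means)

print(mean_of_means, sd_of_means, predicted_sem, within)
```

Even though individual values from this population are far from normal, the means of samples of 40 pile up symmetrically around the population mean, with a spread close to the SEM.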
88
what does a z-test do?
converts a sample mean to a z-score, so we can tell whether it is sufficiently rare
89
what magnitude is considered sufficiently rare?
greater than ±1.96 (that is, |z| > 1.96)
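
A minimal sketch of a one-sample z-test using the standard library's `NormalDist`. The function name `z_test` and the numbers in the example are hypothetical, and the test assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, pop_mean, pop_sd, n):
    """Convert a sample mean to a z-score and a two-tailed p-value."""
    sem = pop_sd / sqrt(n)                   # standard error of the mean
    z = (sample_mean - pop_mean) / sem
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hypothetical example: a sample of 36 with mean 105, from a population
# assumed to have mean 100 and standard deviation 15.
z, p = z_test(sample_mean=105, pop_mean=100, pop_sd=15, n=36)
is_rare = abs(z) > 1.96
print(z, p, is_rare)
```

Here the SEM is 15 / 6 = 2.5, so the sample mean sits 2 standard errors above the population mean, just past the 1.96 cutoff.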
90
alternative hypothesis / research hypothesis (H1)
states there is something special about the population being observed
91
null hypothesis (H0)
states there is nothing special about the population being observed
92
do we ever accept H1?
NO, we can only reject H0
93
what do decision rules define
precisely when you reject H0 or not
94
what do the decision rules depend on?
type of study, the variables, the tests performed, your field
95
what is the significance level (alpha)?
the proportion of area under the curve considered "rare" for the purposes of your decision rule; originally set as α = 0.05
96
what do we say when we do not have a significant result?
"we fail to reject" H0; this is a weak result
97
what do we say when we do have a significant result?
we definitely "reject H0"; this is a strong result
98
what do we say when we keep or reject the null
keep: H0 could be true; reject: H0 is most likely false
99
one-tailed vs two-tailed tests
one-tailed: not used very often; retain H0 for all except one side of the curve. two-tailed: used more frequently; retain H0 for only the middle of the curve, reject H0 for both ends
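
A short sketch of how the rejection region differs between the two kinds of test, using z cutoffs from the standard normal curve. The function names are hypothetical, and the one-tailed case here assumes the predicted direction is the upper end of the curve.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, tails=2):
    """z cutoff that leaves alpha in one end (tails=1) or split across both ends (tails=2)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


def rejects_h0(z, alpha=0.05, tails=2):
    """Two-tailed: reject in either end. One-tailed: reject only in the upper end."""
    cutoff = critical_z(alpha, tails)
    return abs(z) > cutoff if tails == 2 else z > cutoff


print(critical_z(tails=2))  # about 1.96
print(critical_z(tails=1))  # about 1.645
print(rejects_h0(1.8, tails=2), rejects_h0(1.8, tails=1))
```

The same z of 1.8 is retained by the two-tailed test but rejected by the one-tailed test, which is why a one-tailed choice needs to be justified before any analysis occurs.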
100
when should you choose a one tail test?
- if you are positive that your hypothesis could only possibly result in a change in one direction
- if you are only interested in a change in one direction
- must be established as an experimental and analytical protocol before any analysis occurs
- if the consequences of being different in one
101
why shouldn't you choose a one-tailed test?
- if you do not have very strong justification, reviewers will be critical of your choice
- sometimes seen as a sketchy way of making something look significant
102
in general, what is alpha ?
a trade-off between two types of mistake; the choice of what alpha is is mostly arbitrary
103
what is a type 1 error
a false positive; its probability is equal to alpha, so it decreases as alpha decreases
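
A simulation sketch of why the type 1 error rate equals alpha: when H0 really is true, a two-tailed z-test still rejects it in roughly alpha of all samples. The function name, sample size, number of trials, and seed are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def false_positive_rate(alpha, n=25, trials=4000, seed=1):
    """Share of samples drawn under a true H0 that a two-tailed z-test rejects."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # H0 is true here: the population really has mean 0 and SD 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / sqrt(n))
        if abs(z) > cutoff:
            rejections += 1
    return rejections / trials


rate_05 = false_positive_rate(0.05)
rate_01 = false_positive_rate(0.01)
print(rate_05, rate_01)
```

Lowering alpha lowers the false positive rate, at the cost of making real effects harder to detect.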
104
what is a type 2 error
a miss (false negative)
105
what does it mean for the null if p is greater than alpha?
we retain the null; it is not significant
106
what does it mean when p is less than or equal to alpha?
we reject the null; the data is significant
107