Flashcards in Research methods - Review Deck (88):
What is Cronbach's alpha and what does it assess? What score on Cronbach's alpha is acceptable?
An index of internal reliability based on the average correlation among all possible pairs of items (easily calculated [by a computer] and reported). Cronbach's alpha can range from 0 to 1. SPSS will calculate it and can also indicate the change in alpha if any particular item were omitted. Good (i.e., acceptable, though not outstanding) reliability = .80; the higher the better.
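Worked example: a minimal sketch of how alpha can be computed by hand, assuming the usual variance-based formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the item scores are hypothetical, and this is not a reproduction of SPSS output.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, with respondents in the same order."""
    k = len(items)
    # Total score for each respondent (sum across items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical data: 3 items answered by 5 respondents
item1 = [1, 2, 3, 4, 5]
item2 = [2, 3, 4, 5, 6]
item3 = [1, 3, 3, 5, 5]
alpha = cronbach_alpha([item1, item2, item3])
print(f"Cronbach's alpha = {alpha:.2f}")
```

Because the formula depends on the number of items as well as how strongly they correlate, adding more items that measure the same construct tends to raise alpha.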
What do 95% confidence intervals show/mean/represent?
They define the range of values within which the true (population) mean is likely to lie on 95% of occasions. 95% CIs indicate (a) the error of measurement in estimating the population mean and (b) whether means are likely to differ significantly.
If the confidence interval bars for two means overlap by more than 1/4, is the difference between the means likely to be significant or non-significant?
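Worked example: a rough sketch of a 95% CI for a mean, assuming the normal approximation (mean plus or minus about 1.96 standard errors). With small samples a t critical value would give a somewhat wider interval. The function name and scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci95(scores):
    """Return (lower, upper) bounds of an approximate 95% CI for the mean."""
    m = mean(scores)
    se = stdev(scores) / sqrt(len(scores))  # standard error of the mean
    z = NormalDist().inv_cdf(0.975)         # about 1.96 for a two-sided 95% interval
    return m - z * se, m + z * se

# Hypothetical scores for one condition
scores = [4, 5, 6, 5, 5, 4, 6, 5, 5, 5]
lower, upper = ci95(scores)
print(f"M = {mean(scores):.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```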
What is significance?
The significance level or 'p' is the probability that an observed effect arose by chance if there really is no effect.
What are the three types of descriptive statistics and when are they used?
* Mean (M), median & mode - tell us the typical score on a continuous measure; used for comparing typical scores (e.g., means) between conditions.
* Percentages or proportions of cases that fall into particular categories (or frequencies) - used to compare between categories with different numbers of cases.
* Correlation coefficients (r) and regression coefficients (B and beta) - describe the nature (direction and strength) of a linear relationship between continuous variables. Bs tell us about the effect in the original scales of measurement. Betas can indicate the relative strength of different predictors.
Parametric tests assume that...?
* scores on continuous variables (that are used in the analysis) are normally distributed
* the variability (or variance) in scores is similar for different conditions (there are tests available to assess this, including in SPSS). If these assumptions are violated, then the outcome of a parametric test can be misleading. If the data do deviate markedly from these assumptions, we could use a nonparametric test.
What are the broad classes/types of parametric tests?
* Tests that assess the significance of the difference between means (t-tests and ANOVAs)
* Tests that assess the significance of the linear relationship between continuous variables (correlations and regressions)
What are the assumptions of an ANOVA/t-test?
* Scores on the DV are (roughly) normally distributed (i.e., a bell curve)
* Variance or variability is (roughly) similar for the different conditions (the assumption of 'homogeneity of variance')
In order to meet the homogeneity of variance assumption (e.g., in ANOVA), Levene's test should be _____________?
In what situations does heterogeneity of variance (different levels of variance between conditions) pose a greater threat to the validity of the statistical outcome?
* the n per condition is small
* the ns per condition are markedly different
* the direction of skewness varies between conditions
* Heterogeneity of variance and non-normality coexist.
What is the appropriate parametric test to assess the difference between means where the IV is varied between subjects and there is one IV with 2 levels?
Independent groups t-test (or one-way ANOVA)
What is the appropriate parametric test to assess the difference between means where the IV is varied between subjects and there is one IV with 3 levels?
What is the appropriate parametric test to assess the difference between means where the IV is varied within subjects and there is one IV with 2 levels?
Paired samples t-test
What is the appropriate parametric test to assess the difference between means where the IV is varied within subjects and there is one IV with 3+ levels?
One-way repeated measures ANOVA
What is the appropriate parametric test to assess the difference between means where the IV is varied between subjects and there are 2 IVs with 2+ levels on each?
Factorial ANOVA (2-way, 3-way, etc.)
What is the appropriate parametric test to assess the difference between means where the IV is varied within subjects and there are 2 IVs with 2+ levels on each?
Repeated measures ANOVA
What is the appropriate parametric test to assess the difference between means where the IV is varied both between and within subjects (i.e., mixed design) and there are 2 IVs with 2+ levels on each?
Mixed ANOVA (split-plot ANOVA)
What is a moderator?
The moderator is the variable (another IV) that affects the primary IV-DV relationship in which we're interested. In other words, the moderator affects the relationship between the key IV and the DV. We say that two IVs interact. We could just as correctly say that the IV and the moderator interact. (Moderation is closely related to interaction; however, it's a more theoretical idea in that it reflects our focus in posing and answering a research question.)
What is an Interaction?
The term interaction indicates that the effect of one IV varies, depending on another IV. It applies when we expect that the size and/or direction of the effect of one IV will differ depending on the other IV. A difference in size means that an IV has a larger effect at one level of the 2nd IV than at the other level. A difference in direction means that an IV has opposite effects at the 2 levels of a 2nd IV. An interaction can be a conceptual proposition, but we also test the statistical significance of interactions.
The following hypothesis is suggestive of what kind of relationship between IVs? It was predicted that information type and coping style would interact in their effects on pre-operative anxiety. Specifically, it was expected that, for people with a problem-focused style, detailed information would result in lower anxiety than would routine information. In contrast, for those who use avoidant coping, it was expected that routine information would result in lower anxiety than would detailed information.
What two types of effects do factorial ANOVAs (2+ IVs) test?
(1) The main effect of each IV. (2) The interaction between the IVs, e.g., whether the difference in mean anxiety between detailed and routine information itself differs between problem-focused people and avoidant people. That is, does the difference between detailed and routine vary between the coping styles? Or, in other words, is detailed minus routine different for problem-focused people than it is for avoidant people?
What is a main effect?
A test of one IV at a time. A main effect refers to the overall effect of one IV, completely ignoring the other IV. For example, the test of the main effect of coping style indicates the significance of the overall difference between problem-focused and avoidant participants, collapsed or averaged across levels of information type. That is, all of the problem-focused participants are compared with all of the avoidant participants (regardless of the type of information that they received). Participants are distinguished only according to their coping style. When an interaction is significant, main effects are sometimes of limited interest because they don't tell the whole story. We report them, but the interaction would be a central result.
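Worked example: a small sketch of how main effects and an interaction differ, using the coping style by information type design. The cell means are invented for illustration, and equal ns per cell are assumed so that marginal means are simple averages. The interaction is expressed as a difference of differences (detailed minus routine, compared across coping styles).

```python
# Hypothetical cell means for pre-operative anxiety (lower = less anxious)
cell_means = {
    ("problem-focused", "detailed"): 20.0,
    ("problem-focused", "routine"): 30.0,
    ("avoidant", "detailed"): 30.0,
    ("avoidant", "routine"): 20.0,
}

def detailed_minus_routine(style):
    """Effect of information type within one coping style."""
    return cell_means[(style, "detailed")] - cell_means[(style, "routine")]

pf_diff = detailed_minus_routine("problem-focused")
av_diff = detailed_minus_routine("avoidant")

# Interaction: does the detailed-minus-routine difference itself differ between styles?
interaction_contrast = pf_diff - av_diff

# Main effect of information type: collapse (average) across coping styles
detailed_mean = (cell_means[("problem-focused", "detailed")] + cell_means[("avoidant", "detailed")]) / 2
routine_mean = (cell_means[("problem-focused", "routine")] + cell_means[("avoidant", "routine")]) / 2
main_effect_information = detailed_mean - routine_mean

print(pf_diff, av_diff, interaction_contrast, main_effect_information)
```

With these invented means the main effect of information type is zero even though information type clearly matters within each coping style, which is why main effects "don't tell the whole story" when there is an interaction.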
What is moderation?
Moderation is closely related to interaction (moderation is only demonstrated if an interaction exists). However, it's a more theoretical idea in that it reflects our focus in posing and answering a research question. An interaction indicates that the relationship between an IV and the DV varies depending on another variable. That other variable is often referred to as the moderator. So the MODERATOR is the variable (another IV) that affects the primary IV-DV relationship in which we're interested. In other words, the moderator affects the relationship between the key IV and the DV. We say that two IVs interact. We could just as correctly say that the IV and the moderator interact.
There are two advertisements for mobile phones: one is 'transformers' and one is 'connecting people'. The phone company wants to know which one is the better ad. You might think that the preferred advertisement will vary depending on the gender of the teenager. If you do think this, then you are predicting an interaction between type of advertisement and sex of teenager or, in other words, that... Sex of teenager _______ (fill in the blank) the effect of type of advertisement.
Which of the following is not a potential threat to the internal validity of a quasi-experiment? A. Instrumentation B. Generalisability C. History D. Selection E. Maturation
B. Generalisability (because this concerns external validity!)
A threat to internal validity of a static group comparison (non-equivalent control group) design is: A. Selection B. Instrumentation C. Testing D. Regression toward the mean E. Response bias. Also, describe a static group comparison design.
A. Selection. Design: Group 1: X O; Group 2: O (control). Why not the others? B. Instrumentation - no reason to think this. C. Testing - no repeated measures. D. Regression toward the mean - repeated measures only. E. Response bias - we know nothing about this, and there is no reason it would differ between groups.
Research has ________ validity when the measures provide a very good index of the phenomena that are being studied. A. Internal B. External C. Construct D. Discriminant E. None of the above
A correlation of .3 between anger and aggressive behaviour: A. Indicates that anger causes aggression B. Indicates that anger explains 9% of the variance in aggressive behaviour C. Indicates that reducing anger will reduce aggressive behaviour D. Shows that anger and aggression are strongly related, although we can't be sure of the causal direction E. Shows that another variable is responsible for the anger-aggression relationship
B. Indicates that anger explains 9% of the variance in aggressive behaviour
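Worked example: the 9% figure comes from squaring the correlation, since r squared is the proportion of variance shared by the two variables. A minimal sketch (variable names are illustrative):

```python
r = 0.3                     # correlation between anger and aggressive behaviour
variance_explained = r ** 2  # proportion of variance explained
print(f"r = {r}, r squared = {variance_explained:.2f} ({variance_explained:.0%} of variance)")
```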
For samples of a given size, external validity is promoted by: A. Non-probability samples B. Volunteer samples C. Random samples D. Quota samples E. Samples of convenience
C. Random samples
Consider these questions. Q1: “How close is your relationship with your mother?” - Very close, quite close, not close, quite distant, very distant. Q2: “What was the $ value of your last telephone bill?” _______ Q3: “Australia should remain a monarchy because this is the only way to ensure that the political power of the government is held in check to safeguard the rights of citizens” - strongly disagree, disagree, unsure, agree, strongly agree. How would we evaluate the quality of the questions? A. Q1 contains an unwarranted assumption; Q2 is ambiguous; Q3 would be vulnerable to social desirability effects B. Q1 is satisfactory; Q2 is too sensitive - the amount of money spent on phone bills is none of our business; Q3 is satisfactory C. Q1 is too sensitive - we should not ask people questions about their mothers; Q2 is good because it asks for specific information; Q3 is leading D. Q1 contains an unwarranted assumption; Q2 requires information that the participant might not remember; Q3 is double-barrelled E. All of the questions are satisfactory
D. Q1 contains an unwarranted assumption; Q2 requires information that the participant might not remember; Q3 is double-barrelled.
First consider this 2x2 between-groups design study: Researchers investigated the effects of ‘author’s sex’ and ‘errors’ on judgements that people make about the quality of a journalist’s writing. In the study, participants were randomly allocated to one of four conditions: (i) female author and grammatical errors, (ii) female author and no grammatical errors, (iii) male author and errors, (iv) male author and no errors. Author and errors were manipulated using bogus journalist names and 4 similar articles. The participants read the newspaper article and were asked: “What is the quality of the newspaper article?” (rated on a standard rating scale). Q1. What is the purpose of the question “What is the quality of the newspaper article?” A. It provides a measure of the independent variable B. It provides a measure of the dependent variable C. It provides an assessment of extraneous variables D. It is a manipulation check E. Both (C) and (D) Q2. The above design is: A. a true experimental, posttest-only two-group design with one between-subjects factor and one within-subjects factor B. a true experimental, posttest-only two-group design with two between-subjects factors C. a quasi-experimental, static group comparison with two between-subjects factors D. a true experimental, four-group design with one between-subjects factor and one within-subjects factor E. a 2 x 2 between-subjects factorial design
Q1: B. It provides a measure of the dependent variable Q2. E. a 2 x 2 between-subjects factorial design
The correlation coefficient for the scatter plot is most likely to be (and why): A. 1.00 B. .64 C. -.80 D. -1.00 E. Impossible to say without knowing the degrees of freedom
C. -.80. The points trend downward, so the relationship is negative.
In a study of the number of crimes per year and the number of churches in towns & cities... CRIMES and CHURCHES were found to be highly correlated, r = .69. However, when the population size (POP) of the towns and cities was taken into account, the partial correlation was .11 (nonsig). This gives us good reason to conclude that: A. There are likely to be other unstudied factors which predict CRIMES more strongly than POP or CHURCHES (??) B. CRIMES is highly related to CHURCHES but not to POP (??) C. There is not a strong linear relationship between CRIMES and CHURCHES (the correlation is strong) D. The relationship b/t CRIMES & CHURCHES is a spurious one E. At least three of these (A, B, C, D) are true
D. The relationship between CRIMES and CHURCHES is a spurious one.
Which of the following is true of the figures above? A. Figure 1 shows heteroscedasticity; Figure 2 shows a strong negative correlation, Figure 3 shows an interaction B. Figure 1 shows homoscedasticity; Figure 2 shows a strong negative correlation, Figure 3 shows main effects but no interaction C. Figure 1 shows heteroscedasticity; Figure 2 shows a weak negative correlation, Figure 3 shows an interaction D. Figure 1 shows heteroscedasticity; Figure 2 shows a moderate positive correlation, Figure 3 shows an interaction E. Figure 1 shows heteroscedasticity; Figure 2 shows a moderate positive correlation, Figure 3 shows main effects but no interaction
A. Figure 1 shows heteroscedasticity; Figure 2 shows a strong negative correlation, Figure 3 shows an interaction.
What DOES and DOESN'T the significance value (p) tell us?
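DOES: tell us whether an effect or relationship is statistically significant (i.e., unlikely to have arisen by chance if there really is no effect). DOESN'T: tell us the effect size, give any conceptual understanding, or indicate how compelling the evidence is.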
What three things can tell us something about the meaning of our results?
Effect size; clinical significance; error bars
p (sig. level) depends on the number of _________?
Variance explained and the size of difference between means are examples of?
Formal effect size indices.
What is variance explained?
Proportion of total variance in the DV that is explained by the effect (IV)
What are the indices for total variance explained for A. ANOVA, B. regression and C. correlation?
A. Eta2 (η2) B. R2 C. r2 (the correlation coefficient squared)
Eta2 and R2 are both indices of the proportion of variance explained and are the SAME THING - but which tests are they reported for by convention?
Eta2 - for ANOVA; R2 - for regression
SPSS does not provide eta2, so it is hand calculated... how?
SS (i.e., sum of squares) effect / SS corrected total (NOT TOTAL!!)
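Worked example: a sketch of the hand calculation for a one-way between-subjects design, computing both sums of squares from raw scores. The function name and the data are hypothetical. The "corrected total" is based on deviations from the grand mean; the uncorrected total (as this sketch understands the SPSS output) is the sum of squared raw scores, which is much larger and would understate eta2.

```python
def eta_squared(groups):
    """groups: dict mapping condition name -> list of DV scores."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    # SS corrected total: squared deviations of every score from the grand mean
    ss_corrected_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # SS effect: n times the squared deviation of each group mean from the grand mean
    ss_effect = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return ss_effect / ss_corrected_total

groups = {"control": [1, 2, 3], "treatment": [4, 5, 6]}  # hypothetical scores
eta2 = eta_squared(groups)

# For contrast: the uncorrected total is the sum of squared raw scores
ss_uncorrected_total = sum(x ** 2 for scores in groups.values() for x in scores)
print(f"eta2 = {eta2:.3f}; uncorrected total SS = {ss_uncorrected_total}")
```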
What is the difference between R2 and R2change ?
Both are indices of proportion of variance explained... but R2 is the total variance explained and R2change is the change for a particular step (i.e., additional variance explained) in hierarchical regression.
What is the difference between partial eta2 and eta2 (factorial ANOVA)?
Partial eta2 ignores any other factors or effects in the analysis and only considers SSerror (unexplained variability) and the variability from the effect of the IV (SSeffect). Thus it can't be used to compare effect sizes (it is not standardised). This only MATTERS when there is more than one factor (if there is only one factor, eta2 = partial eta2). Eta2 is standardised and can be used to compare and judge effect sizes.
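Worked example: a sketch contrasting the two indices for a two-factor ANOVA, assuming the usual definitions eta2 = SSeffect / SScorrected total and partial eta2 = SSeffect / (SSeffect + SSerror). The sums of squares are invented for illustration.

```python
# Hypothetical sums of squares from a 2 x 2 between-subjects ANOVA
ss_effects = {"A": 20.0, "B": 30.0, "AxB": 10.0}
ss_error = 40.0
ss_corrected_total = sum(ss_effects.values()) + ss_error  # 100.0 in a balanced design

def eta_squared(effect):
    """Share of ALL variability in the DV that the effect explains."""
    return ss_effects[effect] / ss_corrected_total

def partial_eta_squared(effect):
    """Share of (effect + error) variability only; other effects are ignored."""
    return ss_effects[effect] / (ss_effects[effect] + ss_error)

for name in ss_effects:
    print(f"{name}: eta2 = {eta_squared(name):.3f}, partial eta2 = {partial_eta_squared(name):.3f}")
```

Partial eta2 is never smaller than eta2 for the same effect, because its denominator leaves out the variability explained by the other effects.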
Cohen's d is a ___________ measure of the difference between ______ and is expressed in ________ units
effect size; means; standard deviation
How is Cohen's d calculated?
d = (M1 - M2) / [(SD1 + SD2) / 2], i.e., the difference between the means divided by the average SD for the two means
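Worked example: a minimal sketch of the calculation on this card, dividing the mean difference by the simple average of the two SDs. (A pooled SD is a common alternative denominator; it is not what this card describes.) The function name and scores are hypothetical.

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Mean difference expressed in units of the average SD of the two groups."""
    average_sd = (stdev(group1) + stdev(group2)) / 2
    return (mean(group1) - mean(group2)) / average_sd

# Hypothetical scores for two conditions
treatment = [6, 7, 8, 9, 10]
control = [4, 5, 6, 7, 8]
d = cohens_d(treatment, control)
print(f"d = {d:.2f}")
```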
What are the benchmarks for eta2 effect size measures? (small/medium/large)
Small: .01 to .04 (1%); medium: .06 to .13 (6%); large: .14 and above (14%)
What are the benchmarks for R2 effect size measures? (small/medium/large)
Small: .02 (2%); medium: .13 (13%); large: .26 (26%)
What are the benchmarks for r effect size measures? (small/medium/large)
Small: .1 (1%); medium: .3 (9%); large: .5 (25%)
What are the benchmarks for d (Cohen's d) effect size measures? (small/medium/large)
Small: .20; medium: .50; large: .80
What are the problems with relying on a table/conventional benchmarks for effect size measures?
* They were never intended to be indiscriminately applied
* They were mainly intended for new areas of research and for deciding on the necessary sample size (power analysis) when a study is being planned
* They were based on 'typically' observed effect sizes for various analyses across numerous behavioural science areas, not on what is important/useful
* They are non-equivalent for different indices (e.g., eta2 and R2 - even though they are effectively the SAME!)
small variance explained by interactions can still be _______
What are the three types of error bars?
* standard deviations (indicate variability in scores - least common)
* standard errors (allow statistical inference)
* 95% confidence intervals (allow statistical inference)
Briefly outline the ethics of research with humans.
Participants' rights outweigh other considerations. Participation must be voluntary. Consent must be informed. Participants may withdraw at any time. Participants' responses and performance must remain confidential, and participants must remain anonymous. Harm to participants should be minimised, and provisions should be made to address any stress etc. The research must have the approval of an ethics committee. Participants should be debriefed.
What are closed-ended questions?
Questions that provide respondents with a fixed set of alternatives from which to choose
What are rating scales (for questionnaires)?
Rating scales include Likert scales. There are usually 4-7 response alternatives, e.g., a 5-point rating scale from 1 "strongly disagree" to 5 "strongly agree". Strictly speaking, Likert scales assess the extent of agreement with a statement and so have anchors such as "Agree" and "Disagree". However, there are many variations on this theme, such as "approve-disapprove" or "satisfied-dissatisfied", which are simply referred to as 'rating scales' (or Likert-type scales).
What is the difference between unipolar, bipolar and semantic differential scales?
Unipolar - a rating (e.g., 1-6) between the two ends of one construct, e.g., satisfaction (1 = not at all satisfied to 6 = very satisfied). Bipolar - a rating (e.g., 1-6) between the ends of two opposing constructs, e.g., 1 = very unsatisfied to 6 = very satisfied. Semantic differential scales - the anchors are polar opposite adjectives, e.g., healthy-sickly, generous-selfish, etc.
What are some qualities that make questionnaires "good"?
Ensure that all possible response alternatives are available. Use clear verbal anchors for rating scales. Use more, rather than fewer, questions to assess a construct (such as self-esteem). (Reliability tends to be greater with more questions because, if there's a poor question, it has less effect.) NOTE: response alternatives must match the question. Use existing, validated questionnaires where possible. If designing your own questions, pilot or pre-test them (begin by asking friends to read the questions: what appears to be a model of clarity to the author can seem exceedingly muddy to the naive respondent). Determining and reporting reliabilities for questionnaire data is a standard procedure (where relevant or appropriate, e.g., it usually makes no sense to determine reliabilities for basic demographic information).
What are some potential threats/problems with measurement validity and/or reliability?
Response biases: social desirability, response sets (acquiescent response set, deviation response set). Other problems: unclear question or instructions (instructions are important); appropriate response not available; imprecision of many ordinal scales (differences in interpretation); question is double-barrelled (or, much worse, quadruple-barrelled!); respondent doesn't know the answer (e.g., can't remember); suggestibility (the respondent did not really hold a view on an issue until the questionnaire suggested it); a question based on inappropriate and unwarranted assumptions about respondents (e.g., not applicable); leading questions; very general questions; sensitivity of an issue.
What is the problem with this questionnaire question: I am not against Australia remaining a monarchy: strongly agree agree don't know disagree
Double negative; unbalanced scale
What is the problem with this questionnaire question: Experts agree that compulsory vehicle inspection would save lives. To what extent do you agree that compulsory vehicle inspection should be instituted in this state? (leading) 1 strongly agree 2 agree 3 neither agree nor disagree 4 disagree 5 strongly disagree
What is the problem with this questionnaire question?: Financial assistance for the arts should be reduced so that the funds could be used to attract business to the state 1 strongly agree 2 agree 3 neither agree nor disagree 4 disagree 5 strongly disagree
What is the problem with this questionnaire question?: Have you ever left a small child unattended in a car for 30 minutes or more? Yes No
What is the problem with this questionnaire question?: Are you Australian? Yes No
What does 'Australian' mean?
What is the problem with this questionnaire question?: Is your occupation....___blue collar or ___white collar
No other categories? Vague categories. Respondent might not work.
What is the problem with this questionnaire question?: How many times have you shopped in Rundle Mall in the past six months? _____times
Will participants remember?
What is the problem with this questionnaire question?: Are you currently employed? ___Employed ____Unemployed
Response options don't match the question. Also, is this an exhaustive list? Student? Retired? Parent?
What is the problem with these questionnaire questions?: What do you usually have for breakfast? Do you eat a healthy diet? ______
'usually' is too vague, what does 'healthy' mean?
What is Standard Deviation (SD)?
The SD can be thought of as the average amount by which any score (in a set of scores) differs from the mean. (It's actually a bit bigger than the strict average). It tells us how typical or atypical the mean is of scores within a group, and how much, on average, scores deviate from the mean (above and below). It uses the same unit of measurement as that which was used to measure the variable. SD is the square root of the variance
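Worked example: a small sketch comparing the SD with the "strict average" deviation (the mean absolute deviation) for the same scores, and checking that the SD is the square root of the variance. The sample formulas (n - 1 in the denominator) are assumed; the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
m = mean(scores)

# "Strict average" amount by which scores differ from the mean
mean_abs_dev = sum(abs(x - m) for x in scores) / len(scores)

sd = stdev(scores)  # sample SD, in the same units as the scores
print(f"mean = {m}, mean absolute deviation = {mean_abs_dev}, SD = {sd:.2f}")
print(f"sqrt(variance) = {sqrt(variance(scores)):.2f}")
```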
What is effect size?
Effect size simply refers to the strength (often called a magnitude) of an effect (or of a relationship or difference), i.e., how big the effect is. It usually tells us something about the importance of a relationship (or of a difference between groups, which is also a relationship)
Can the difference between means be interpreted as an effect size?
Yes, if we are comparing means for different conditions (or times), we can assess effect size just by looking at the degree of difference between means (we can also use Cohen's d). For example, if we reduce the number of times that autistic children engage in head-banging from a baseline mean of 26 times per day to a post-treatment mean of 1 time per day, this is a much more impressive (larger effect of treatment) and useful outcome than if we'd reduced it from 26 to 20 times a day.
Does the size of a correlation coefficient indicate the strength of the linear relationship between two variables?
Yes, the size of the correlation indicates the strength of the relationship but its significance level does not. (because very small correlations can attain significance with sufficient participants!)
What is reliability?
Consistency or stability of a measure. Logically, if a measure is valid (a good/accurate measure), it should be reliable.
What is internal reliability?
The degree of correspondence/agreement between individual items that make up a measure of a single characteristic. Logically, if a number of items are meant to be measuring the same construct, there should be a high level of agreement between them. Cronbach's alpha: an index based on the average correlation among the items of a questionnaire (easily calculated [by a computer] and reported). Cronbach's alpha can range from 0 to 1. SPSS will calculate it and can also indicate the change in alpha if any particular item were omitted. Good (i.e., acceptable, though not outstanding) reliability = .80; the higher the better.
What is item-total correlation (reliability)?
The correlations between each item and the total score (of a measure).
What is split-half reliability?
The correlation between total scores for the two halves of a test or measure.
What is test-retest reliability?
The correlation between scores for 2 administrations of the measure.
What is inter-scorer or inter-rater reliability?
The correspondence or agreement between the scores of two independent observers or judges. Used when judgement is exercised by the scorer or assessor, or where assessment requires skill or training.
What is a Factorial design?
Where 2 or more categorical IVs are manipulated (or measured, if it is a quasi-experiment) AND all combinations of the IVs are tested. It allows us to see both the separate and combined effects of the IVs.
What is the 'nominal' level of measurement?
Categorical information (e.g., employment status, sex)
What is the 'ordinal' level of measurement?
When the 'quantification' of the construct is imprecise, but we do assume that the scores indicate something about the relative amounts of a particular characteristic, that is, that we could rank scores in a meaningful order (e.g., 0-5 stars for a movie)
What is the 'interval' level of measurement?
A more precise measurement, where the difference (or distance or interval) between any two adjacent scores is the same, and the unit of measurement has a universal meaning and indicates the same amount of a property across different participants or situations. However, the scale does not have an absolute zero, e.g., IQ score, temperature
What is the 'ratio' level of measurement?
Precise measurement/quantification whereby 0 means NONE/absence of the characteristic, e.g., heart rate, number of children, number of words recalled
What are nonparametric tests used for?
Continuous DVs measured at the ORDINAL level (i.e., scores can be put in rank order, but there is not an equal distance between each value and no true 0), plus data that do not meet the assumptions for parametric tests (e.g., skewed data)
Chi-square tests are used for what type of dependent variable?
Phi is an effect size measure for what test and how is it interpreted?
Chi-square; it is interpreted like a correlation.
Nonparametric tests for continuous DVs use _____ rather than ______
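Worked example: a sketch of how phi relates to chi-square, assuming a 2 x 2 table, no continuity correction, and the usual formula phi = sqrt(chi-square / N). The function names and frequencies are hypothetical. Computed this way, phi gives the magnitude of the association only.

```python
from math import sqrt

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 frequency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def phi_coefficient(table):
    chi2, n = chi_square_2x2(table)
    return sqrt(chi2 / n)

# Hypothetical frequencies: rows = group, columns = outcome category
table = [[30, 10], [10, 30]]
chi2, n = chi_square_2x2(table)
phi = phi_coefficient(table)
print(f"chi-square = {chi2:.1f}, N = {n}, phi = {phi:.2f}")
```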
ranks; raw scores
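Worked example: a sketch of converting raw scores to ranks, assuming the common convention that tied scores share the average of the ranks they would otherwise occupy. The function name and scores are hypothetical. Note how the extreme score (100) simply becomes the highest rank, which is why rank-based tests are less affected by skew and outliers.

```python
def average_ranks(scores):
    """Return the rank of each score (1 = lowest); ties get the average rank."""
    ordered = sorted(scores)
    rank_of = {}
    for value in set(ordered):
        first_position = ordered.index(value) + 1  # 1-based position of first occurrence
        count = ordered.count(value)               # how many times the value is tied
        rank_of[value] = first_position + (count - 1) / 2
    return [rank_of[s] for s in scores]

raw_scores = [12, 45, 45, 3, 100]  # hypothetical, with one tie and one extreme score
ranks = average_ranks(raw_scores)
print(ranks)
```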