Week 6 – From crisis to crisis: Reliability and Validity, part 2 – Flashcards

1
Q

What is the replication crisis (the inability to replicate studies) a product of?

A

– Actual fraud
– Questionable research practices (p-hacking, tweaking the hypothesis slightly)
– Mistakes/lack of understanding of research methods and statistics

– Could it also be a problem with how we measure things, i.e. uncertainty over how to do it? If so, it would be a measurement crisis!

2
Q

What is 'Measurement Schmeasurement' (Flake & Fried, 2020) about?

A

-Looked at questionable measurement practices

■ Education/behaviour research – between 40% and 93% of measures reported in studies were not accompanied by validity evidence (Barry et al., 2014).

■ Emotion research – of 356 measurement instances coded, 69% made no mention of the development process, e.g. whether the measure was appropriate for the specific population studied (Weidman, Steckler, & Tracy, 2017).

■ This lack of consideration of measures is hugely problematic as the reader simply does not know if the scales are valid!

■ The ultimate conclusions made from a statistical analysis are highly dependent on the extent to which the measures are valid

■ If the validity of the tools is not demonstrated, how can any conclusions from the study be valid?

■ This flexibility in how measures are chosen and used is known as the 'garden of forking paths' – does it create an opportunity for p-hacking?

3
Q

What is the Garden of Forking Paths?

A

■ Flexibility in how measures are used

■ A ten-item questionnaire can technically be summarised in 1,023 different ways (this means there are lots of paths to explore to find one that leads you to the answer you want!)
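A quick way to see where the 1,023 figure comes from – a minimal sketch in Python, assuming 'summarised' means forming a composite score from any non-empty subset of the ten items:

    n_items = 10
    # each item can be included in or left out of a composite score,
    # giving 2**10 combinations; removing the empty set leaves 1,023
    n_possible_summaries = 2 ** n_items - 1
    print(n_possible_summaries)  # 1023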

4
Q

Measurement Schmeasurement: How do variations in questionnaires arise?

A

■ The Hamilton Rating Scale for Depression has multiple versions (all with different psychometric properties):
– 6-item
– 17-item
– 21-item
– 24-item
– 29-item
■ It is important to be clear which version was used, so people can replicate the research and assess the validity of your measure (i.e., be transparent!)

5
Q

How is the Garden of forking paths relevant to cognitive tasks?

A

■ Analytic flexibility still present
■ The addiction Stroop

BEER WINE GIN CIDER
BRIDGE TREE ROAD POND
– Participants name the ink colour of each word while ignoring its meaning; slower colour-naming for the addiction-related words (top row) than for the matched neutral words (bottom row) indicates an attentional bias.

6
Q

Garden of forking paths: What did Jones et al. (2021) find with analytic flexibility with the computerised alcohol Stroop?

A

Method decisions:
* response (key press vs. voice)
* number of drug-related stimuli used
* number of stimulus repetitions
* design (block vs. mixed)

Analysis decisions:
* upper- and lower-bound reaction time cut-offs
* removal of individual reaction times based on standard-error cut-offs
* removal of participants based on overall performance
* type of outcome used
* removal of errors

■ 1,451,520 different possible designs of the computerised alcohol Stroop
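That total comes from multiplying together the number of options at every decision point. A minimal sketch of the multiplication in Python – the option counts below are invented for illustration and are not the actual breakdown from Jones et al. (2021), whose counts multiply out to 1,451,520:

    from math import prod

    # hypothetical option counts per decision point (illustrative only)
    options_per_decision = {
        "response mode": 2,                  # key press vs. voice
        "number of drug-related stimuli": 4,
        "stimulus repetitions": 3,
        "design": 2,                         # block vs. mixed
        "reaction time cut-offs": 5,
        "outcome type": 3,
    }

    total_designs = prod(options_per_decision.values())
    print(total_designs)  # 720 with these invented counts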

7
Q

How does the Garden of forking paths affect us?

A

■ When selecting a scale, you need to consider the psychometric properties of the scale and whether it is valid in your population of interest.

When writing your method section you should be explicit about (i.e. make it clear to the reader):
■ the version used
■ the number of items
■ the response scale (e.g. a 1-7 Likert scale)
■ its reliability, from past research or even in your own sample!

8
Q

How do you choose a scale?

A

■ Find validation papers.
– Check whether the scale has been validated for use with your population of interest.

■ Has the scale got good reliability?
– Check whether the scale is reliable in your population (check the Cronbach’s alpha)

■ Look at the number of citations.
– Is the scale still being cited today?

9
Q

Why does low reliability matter?

A

■ Low reliability limits your ability to find significant associations

■ Even with two measures with ‘excellent’ reliability of .9, a true correlation of .5 is reduced to .45.
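These figures follow from the classical attenuation formula: observed r = true r × √(reliability of X × reliability of Y). A minimal sketch in Python (the function name is just illustrative):

    from math import sqrt

    def observed_correlation(true_r, reliability_x, reliability_y):
        # attenuation: unreliable measures shrink the correlation you can observe
        return true_r * sqrt(reliability_x * reliability_y)

    print(observed_correlation(0.5, 0.9, 0.9))  # ~0.45 -- the example on this card
    print(observed_correlation(1.0, 0.7, 0.7))  # ~0.70 -- the example in the lecture summary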

10
Q

True or false: Internal consistency relates to just questionnaires

A

FALSE. You can also report internal consistency for cognitive tasks such as the Visual Probe Task and the Stroop task.

11
Q

Internal consistency of cognitive tasks: why is this important?

A

■ Spanakis, Jones, Field, and Christiansen (2018) contrasted the psychometric properties of a basic (general alcohol words) and an upgraded (personalised pictures of beer) Stroop task, administered either on a standard computer in a neutral university room or on a smartphone in participants' homes.

■ The Stroop task had acceptable reliability only when administered on a smartphone in a naturalistic environment, not when completed on a computer in a neutral university room, regardless of whether participants saw general alcohol words (basic type) or personalised pictures of beer (upgraded type).

■ This research therefore illustrates the importance of naturalistic settings for cognitive tasks: the internal reliability of the alcohol Stroop was acceptable only when administered on smartphones, not on computers in a university setting (Spanakis, Jones, Field, & Christiansen, 2018).

12
Q

How do you report the internal consistency for subscales? (example)

A

■ Internal reliability for the Fear of Negative Evaluation (α = .93) and Social Avoidance and Distress – New (α = .91) subscales was excellent, and internal reliability for the Social Avoidance and Distress – General subscale (α = .83) was good.

13
Q

What can Cronbach's alpha be useful for?

A

■ Cronbach’s alpha can be useful in determining errors made in scoring.
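For example, if a negatively worded item has not been reverse-scored, alpha comes out far lower than expected (it can even be negative), which flags the scoring error. A minimal, hypothetical sketch in Python – the data and function name are invented for illustration:

    import numpy as np

    def cronbach_alpha(items):
        # items: rows = participants, columns = scale items
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1).sum()
        total_variance = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances / total_variance)

    # three-item scale scored 1-4; item 3 is negatively worded and left un-reversed
    raw = np.array([[4, 4, 1],
                    [3, 3, 2],
                    [2, 2, 3],
                    [1, 1, 4],
                    [3, 4, 2]])
    print(cronbach_alpha(raw))     # very low (negative here) -> suggests a scoring error

    fixed = raw.copy()
    fixed[:, 2] = 5 - fixed[:, 2]  # reverse-score item 3 (1-4 scale: subtract from 5)
    print(cronbach_alpha(fixed))   # high once the item is scored correctly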

14
Q

What is the consequence of mixing positively and negatively worded items (which require reverse scoring)?

A

■ Developing scales with a mix of positively and negatively worded items is a very bad idea!

■ “But it ensures people read the question!”

■ Perhaps, but it adds systematic error, i.e. variance not caused by the construct we want to measure.
– This variance is caused by the slightly different ways people interpret negatively vs. positively worded items.

15
Q

What are smarter ways to ensure people are paying attention?

A

■ Add an attention check question instead

■ “Respond with Strongly agree to this question”

■ Delete anyone who doesn’t do this!

■ Avoids adding systematic error.
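A minimal sketch of that exclusion step in Python/pandas – the column name and data are invented for illustration:

    import pandas as pd

    df = pd.DataFrame({
        "attention_check": ["Strongly agree", "Agree", "Strongly agree"],
        "item1": [4, 2, 3],
    })

    # keep only participants who answered the attention check as instructed
    df = df[df["attention_check"] == "Strongly agree"]
    print(df)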

16
Q

Internal consistency: What can happen if there is unacceptable reliability?

A

■ If a measure does not have acceptable reliability, a researcher may stop using it, but there are other options.

■ As previously noted, we want the alpha coefficient to be greater than .70. If it is lower, we can look at 'Cronbach's alpha if item deleted'. This is an important diagnostic tool – the column indicates whether deleting an individual item would improve the reliability of the scale.

■ We may therefore consider removing a particular item if Cronbach's alpha increases once that item is removed.
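A minimal sketch of the 'Cronbach's alpha if item deleted' diagnostic in Python – the function names and data are invented for illustration:

    import numpy as np

    def cronbach_alpha(items):
        k = items.shape[1]
        return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                                / items.sum(axis=1).var(ddof=1))

    def alpha_if_item_deleted(items):
        # recompute alpha with each item (column) removed in turn
        return {i + 1: round(cronbach_alpha(np.delete(items, i, axis=1)), 3)
                for i in range(items.shape[1])}

    # hypothetical responses: rows = participants, columns = four items scored 1-4
    data = np.array([[4, 4, 3, 1],
                     [3, 3, 3, 4],
                     [2, 2, 1, 2],
                     [1, 1, 2, 3],
                     [4, 3, 4, 1]], dtype=float)

    print(cronbach_alpha(data))
    print(alpha_if_item_deleted(data))  # look for an item whose removal raises alpha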

17
Q

What is Syntax?

A

– A set of written commands that calculates total scores for you.

■ Once you have collected data using a psychometrically sound instrument, you often need to compute the total score (or if there are subscales, total scores).

■ People often think syntax looks hard but it is quite easy to write.
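The lecture presumably has statistics-package syntax (e.g. SPSS) in mind; purely to illustrate the idea, here is a minimal sketch in Python/pandas with invented item names:

    import pandas as pd

    # hypothetical responses to a five-item scale scored 1-4
    df = pd.DataFrame({
        "item1": [4, 3, 2],
        "item2": [4, 3, 1],
        "item3": [3, 3, 2],
        "item4": [4, 2, 1],
        "item5": [2, 3, 4],
    })

    # total score = sum of the items (use .mean(axis=1) for a mean score instead)
    df["total"] = df[["item1", "item2", "item3", "item4", "item5"]].sum(axis=1)
    print(df)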

18
Q

Syntax: What is Reverse Scoring?

A

■ To reverse-score an item, you subtract the score given for that item from one more than the highest possible score.
■ For example, if you have a scale from 1 to 4, to reverse an item you would subtract a score that an individual gave for that item from 5.
– A score of 1 on a 1-4 scale is the lowest score - when we reverse that score (subtract it from 5), we get a score of 4.
– A score of 4 on a 1-4 scale is the highest score – when we reverse that score (subtract it from 5), we get a score of 1.

■ Or if you have a 1 to 7 scale, to reverse an item you would subtract a score that an individual gave for that item from 8.
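The same rule as a single line of code – a minimal sketch in Python (the function name is illustrative):

    def reverse_score(score, scale_max, scale_min=1):
        # reversed item = (highest + lowest possible score) - original score;
        # for a scale starting at 1 this equals subtracting from (max + 1)
        return (scale_max + scale_min) - score

    print(reverse_score(1, 4))  # -> 4 (1-4 scale: subtract from 5)
    print(reverse_score(4, 4))  # -> 1
    print(reverse_score(3, 7))  # -> 5 (1-7 scale: subtract from 8)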

19
Q

Give a Summary of this lecture

A

■ Selecting a scale with good psychometric properties is important.

■ Sometimes there are issues with a selected measure. Researchers may consider removing a particular item on the scale if the Cronbach’s alpha
increases once that item is removed.

■ Reliability is very important: as measurement reliability decreases, the range of observable correlations decreases.
– For example, two perfectly correlated measures will only show an association of .7 if both measures sit at the level of reliability considered acceptable (.7).