Module 6 Flashcards
(17 cards)
Measurement (also known as operationalization)
: means turning abstract conceptual variables into measurable observations. The measurement instruments of variables can take many forms, and these forms depend to a large extent on the research strategy used (survey research, archival research, or experimental research).
Importantly, regardless of the research strategy, the quality of measurement instruments should be guaranteed. Without high-quality measurement instruments, we cannot draw solid conclusions from a research study.
So how do we assess, and ensure, the quality of a study’s measurement instruments?
Two important criteria are:
Measurement reliability and Measurement validity.
In evaluating a measurement instrument, you always need to consider both. When a measure satisfies both criteria, we say it has construct validity.
Measurement reliability
In a research setting, the term measurement reliability refers to the degree to which multiple measurements give the same result.
Measurement validity
In a research setting, validity refers to the degree to which the scores on a measure represent the variable they are intended to measure.
Demonstrating measurement reliability
A highly reliable measure produces similar results under similar conditions.
All things being equal, repeating a measure should produce similar findings. So, how can we assess or demonstrate the reliability of measures?
We should think about the different ways in which we can “repeat” our measure to see if the results are similar.
Three ways to do so are:
(i) Test-retest reliability:
(ii) Inter-rater reliability
(iii) Internal consistency
(i) Test-retest reliability (hint: retaking a test)
Test-retest reliability is the degree of agreement between the results when the same measure is repeated sometime later (under the same conditions).
For example: Suppose you take an IQ test and then take it again two weeks later. To what extent do those two measurements correspond? High test-retest reliability can be demonstrated through a high correlation coefficient between the same measure at two points in time (in this case, the two IQ scores).
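As an illustration (not part of the original card), here is a minimal Python sketch of computing a test-retest correlation; the IQ scores are invented:

```python
# Minimal sketch: test-retest reliability as the Pearson correlation between
# the same measure taken at two points in time. Data below are invented.
import numpy as np

iq_time1 = np.array([98, 105, 112, 95, 120, 101, 108])  # first administration
iq_time2 = np.array([100, 103, 115, 97, 118, 99, 110])  # same people, two weeks later

r = np.corrcoef(iq_time1, iq_time2)[0, 1]  # off-diagonal element of the 2x2 matrix
print(f"Test-retest reliability (Pearson r): {r:.2f}")  # close to 1 = highly reliable
```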
(ii) Inter-rater reliability (hint: asking someone else to judge)
Inter-rater reliability is the degree of agreement between the results when (at least) two people (“raters”) administer the measure to the same subject (under the same conditions).
For example: Suppose we study the effect of user-friendliness of online shops on their sales. We measure the user-friendliness of online shops by having a “judge” rate each online shop in terms of whether it is user-friendly or not. If we let two people judge the user-friendliness of an online shop (that is, we “repeat” the measure), to what extent do their judgments correspond? High inter-rater reliability can, for example, be demonstrated through a high percentage of agreement between the “judges.”
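A minimal sketch of the percentage-of-agreement calculation, with invented ratings:

```python
# Minimal sketch: inter-rater reliability as the percentage of shops on which
# two judges agree. The ratings are invented.
rater_a = ["friendly", "friendly", "not friendly", "friendly", "not friendly"]
rater_b = ["friendly", "friendly", "not friendly", "not friendly", "not friendly"]

agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)
print(f"Inter-rater agreement: {percent_agreement:.0f}%")  # here: 80%
```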
(iii) Internal consistency (hint: asking multiple questions within one test)
Internal consistency is the degree of agreement between a single measurement instrument’s different questions (also referred to as items).
Example: We could measure someone’s trust in online banking via three questions rather than just one. Internal consistency measures to what extent respondents’ answers to these three questions correspond.
A high internal consistency can be demonstrated by calculating Cronbach’s alpha.
Demonstrating internal consistency via Cronbach’s alpha
- Cronbach’s alpha is a statistic derived from pairwise correlations between items that are supposed to measure the same construct.
The intuition behind the formula for Cronbach’s alpha:
Why does a high Cronbach’s alpha demonstrate that a measurement instrument consisting of multiple items is reliable?
Because alpha rises and falls with the average inter-item correlation: if the items correlate weakly with one another on average, alpha is low; if they correlate strongly, alpha is high.
α = (k × r̄) / (1 + (k − 1) × r̄)
r̄ = average inter-item correlation; k = number of items
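For example, with k = 3 items and r̄ = 0.5, α = (3 × 0.5) / (1 + 2 × 0.5) = 0.75. Below is a minimal Python sketch of this formula (the standardised alpha), using invented item responses:

```python
# Minimal sketch: Cronbach's alpha via alpha = k*r_bar / (1 + (k-1)*r_bar),
# where k is the number of items and r_bar the average inter-item correlation.
import numpy as np

items = np.array([
    [5, 4, 5],
    [3, 3, 2],
    [4, 4, 4],
    [2, 1, 2],
    [5, 5, 4],
])  # rows = respondents, columns = three invented "trust in online banking" items

k = items.shape[1]
corr = np.corrcoef(items, rowvar=False)     # k x k inter-item correlation matrix
r_bar = corr[np.triu_indices(k, 1)].mean()  # average of the pairwise correlations
alpha = (k * r_bar) / (1 + (k - 1) * r_bar)
print(f"Cronbach's alpha: {alpha:.2f}")
```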
Cronbach’s alpha scale
Cronbach’s alpha ranges from zero to one. An alpha of 0 means the items are uncorrelated; an alpha of 1 means they are perfectly correlated, so knowing a respondent’s answer to one item provides complete information about their answers to the other items.
> 0.9 = Excellent
0.8 – 0.9 = Good
0.7 – 0.8 = Acceptable
0.6 – 0.7 = Questionable
0.5 – 0.6 = Poor
< 0.5 = Unacceptable
A negative alpha often occurs when one or more items are worded in the opposite direction of the others; such items need to be appropriately reverse-coded.
A very high alpha is not always good: it can mean that the items are redundant (they essentially ask the same question).
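A minimal sketch of reverse-coding a negatively worded item on a 1–5 scale (invented data):

```python
# Minimal sketch: reverse-code a negatively worded 1-5 item via
# reversed = scale_min + scale_max - original. Data invented.
scale_min, scale_max = 1, 5
item = [1, 2, 5, 4, 1]  # e.g., "I distrust online banking"
item_reversed = [scale_min + scale_max - x for x in item]
print(item_reversed)  # [5, 4, 1, 2, 5]
```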
INTERPRETING CRONBACH’S ALPHA OUTPUT FROM SPSS:
The reliability of a measurement instrument is acceptable if Cronbach’s alpha is at least acceptable (as a rule of thumb, larger than 0.70) and every item’s item-total correlation is > 0.3. If not:
We can check in the output whether Cronbach’s alpha would increase by dropping an item from the measurement instrument.
- However, do not delete an item merely to marginally increase Cronbach’s alpha.
- The drawback of dropping items is that we lose content.
- Never drop more than one item at once; drop items sequentially.
The first alpha column (Cronbach’s Alpha) is used when all items are measured on the same scale; the second column (Cronbach’s Alpha Based on Standardized Items) is used when the items are not measured on the same scale (e.g., one item on a 5-point scale and another on a 7-point scale).
Any item with an item-total correlation lower than 0.3 is problematic. The last column of the second table (Cronbach’s Alpha if Item Deleted) tells us what Cronbach’s alpha would become if that item were dropped.
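SPSS reports these diagnostics directly; the minimal sketch below (invented data) shows what the two statistics correspond to computationally:

```python
# Minimal sketch: item-total correlation and "Cronbach's alpha if item
# deleted", mirroring SPSS's Item-Total Statistics table. Data invented.
import numpy as np

def cronbach_alpha(items):
    """Standardised alpha: k*r_bar / (1 + (k-1)*r_bar)."""
    k = items.shape[1]
    corr = np.corrcoef(items, rowvar=False)
    r_bar = corr[np.triu_indices(k, 1)].mean()
    return (k * r_bar) / (1 + (k - 1) * r_bar)

items = np.array([[5, 4, 5], [3, 3, 2], [4, 4, 4], [2, 1, 2], [5, 5, 4]], dtype=float)

for i in range(items.shape[1]):
    rest = np.delete(items, i, axis=1)
    # Corrected item-total correlation: the item vs. the sum of the other items.
    r_item_total = np.corrcoef(items[:, i], rest.sum(axis=1))[0, 1]
    print(f"Item {i + 1}: item-total r = {r_item_total:.2f}, "
          f"alpha if deleted = {cronbach_alpha(rest):.2f}")
```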
Combining the retained items into a single scaled value.
If items were all on the same scale: (find the average)
x = (x1 + x2 + … + xn) / n
If items were on different scales: (standardise the item values, then average the standardised items)
xi,std = (xi − mean(xi)) / sd(xi), then x = (x1,std + x2,std + … + xn,std) / n
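A minimal sketch (invented data) of both ways of combining retained items into one scale value per respondent:

```python
# Minimal sketch: combine items into a single scale value. Average directly
# when items share a scale; z-standardise each item first when they do not.
import numpy as np

items = np.array([[5, 4, 5], [3, 3, 2], [4, 4, 4], [2, 1, 2], [5, 5, 4]], dtype=float)

scale_same = items.mean(axis=1)  # same scale: simple average per respondent

# Different scales: standardise each item column, then average.
z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
scale_mixed = z.mean(axis=1)

print(scale_same)
print(scale_mixed)
```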
Demonstrating measurement validity
Measurement validity is the extent to which a measure represents the variable it is intended to measure. A measure can be highly reliable yet have little validity.
An absurd example may make this clear:
Imagine someone who believes that the length of people’s index fingers reflects their self-esteem. He therefore tries to measure self-esteem by holding a ruler up to people’s index fingers. Although this measure would be highly reliable, it would have no measurement validity. The fact that one person’s index finger is a centimetre longer than someone else’s index finger provides no information about who has more self-esteem.
Another example:
A balance scale is not a valid measurement instrument to measure people’s height, even though the same number would appear every time the same person steps on the balance scale (thus indicating high measurement reliability). However, a balance scale is a valid measurement instrument to measure people’s weight.
To evaluate (or demonstrate) the validity of measurement instruments, assess (or argue) to what extent the items in the measurement instrument adequately represent the construct they are intended to measure.
- One way to do so is by providing precedence. This means reviewing the literature and referring to other (high-quality) studies that have used the same measurement instrument to measure this particular construct.
- If precedence does not exist (e.g., because an entirely new construct is studied), measurement validity can be assessed through expert judgement. A number of experts in the field can be provided with the construct definition, and asked to evaluate the appropriateness of the measurement items.
Why do single-item measures for abstract constructs have lower measurement validity?
Single-item measures for abstract constructs tend to have low measurement validity because they often fail to capture the full breadth and depth of the construct. Abstract constructs are typically multifaceted, with various dimensions or aspects. Using only one item limits the scope of measurement and may overlook important nuances within the construct.
Internal validity
Internal validity refers to the extent to which a study can eliminate alternative explanations for the association(s) it reports.
- The less chance there is for “confounding” in a study, the higher the internal validity and the more confident we can be in the findings.
- Confounding: a situation in which third factors influence the outcome of a study. In short, we can only be confident that a study is internally valid if we can rule out alternative explanations for the findings.
External validity
External validity refers to how well the findings of a study generalise to a broader population.
In particular, do the findings generalise:
- To other subjects (e.g., from the consumers, firms, etc. in the sample to a population)
- To other settings (e.g., from the lab to the real world; from pre-covid to post-covid times, etc.)?
Balancing internal vs. external validity
Internal and external validity can be like two sides of the same coin. Sometimes, one must make a trade-off between external and internal validity: the more applicable a study is to a broader context (external validity), the more difficult it is to control for all extraneous factors (internal validity).
The optimal study design, of course, has both internal and external validity. However, increasing one without decreasing the other is not always possible.