Test 2 Flashcards

(79 cards)

1
Q

Define Reliability

A

the degree to which test scores for an individual test taker or group of test takers are consistent over repeated applications

2
Q

reliability coefficient

A

the results obtained from the statistical evaluation of reliability

3
Q

define systematic error

A

when a single source of error always increases or decreases the true score by the same amount

4
Q

define true score

A

the amount of the observed score that truly represents what you are intending to measure

5
Q

define error component

A

the part of the observed score that reflects the impact of other variables (sources of error) rather than what you intend to measure

6
Q

what is internal consistency

A

an estimate of reliability based on the number of items on the test and the intercorrelations among the items; it therefore compares each item to every other item
- How related the items (or groups of items) on the test are to one another, i.e., whether knowing how a person answered one item on the test would give you information that would help you correctly predict how he or she answered another item on the test

7
Q

what is the benchmark number for internal consistency

A

.70 and above (a reliability of .70 implies roughly 70% true score and 30% error)

8
Q

what are item-total correlations

A

the correlation of each item with the remainder of the items on the test (i.e., with the total score from the other items)

9
Q

define average intercorrelation

A

the extent to which each item represents an observation of the same thing (the connection between the items)

10
Q

what is the split-half method

A

refers to determining a correlation between the first half of the measurement and the second half of the measurement
- divide the test into two halves and then compare the set of individual test scores on the first half with the set of individual test scores on the second half
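
A minimal sketch of both ways of splitting, assuming a small made-up matrix of item scores (rows = test takers, columns = items); the data and variable names are illustrative only:

```python
import numpy as np

# Hypothetical scores for 6 test takers on 8 items (1 = correct, 0 = incorrect)
scores = np.array([
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
])

# First-half vs. second-half split
first_half = scores[:, :4].sum(axis=1)
second_half = scores[:, 4:].sum(axis=1)
r_split = np.corrcoef(first_half, second_half)[0, 1]

# Odd-even split (alternating items) -- see the next card
odd = scores[:, ::2].sum(axis=1)
even = scores[:, 1::2].sum(axis=1)
r_odd_even = np.corrcoef(odd, even)[0, 1]

print(r_split, r_odd_even)  # each way of splitting gives a somewhat different estimate
```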

11
Q

what is the odd-even method

A

refers to the correlation between even items and odd items of a measurement tool

12
Q

advantages and disadvantages of the split half/odd-even method

A

Advantages:

  • simplest method: easy to perform
  • time- and cost-effective, because only one administration is needed

Disadvantages

  • many ways of splitting (odd-even, 1st vs 2nd half, random)
  • each split yields a somewhat different reliability estimate
  • so which one is the real reliability of the test?
13
Q

what is test-retest reliability

A

measured by computing the correlation coefficient between the scores from two administrations
the same test is administered to the same group of people, with a certain amount of time between the two administrations

14
Q

what is the benchmark number for test-retest reliability

A

.50 and above

15
Q

define practice effects

A

occurs when test takers benefit from taking the test the first time (practice) which enables them to solve problems more quickly and correctly the second time they do the test

16
Q

define memory effects

A

a respondent may recall the answers from the original test, thereby inflating the reliability estimate

17
Q

what is interrater reliability

A
  • Interrater reliability means that if two different raters scored the scale using the scoring rules, they should attain the same result
18
Q

how is interrater reliability measured?

A

measured by the % of agreement between raters or by computing the correlation coefficient between the scores of two raters for the same set of respondents (the raters’ scoring is the source of error)
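
A minimal sketch of both approaches, assuming made-up scores from two hypothetical raters:

```python
import numpy as np

# Hypothetical scores assigned by two raters to the same 8 respondents
rater1 = np.array([3, 4, 2, 5, 3, 4, 1, 2])
rater2 = np.array([3, 4, 3, 5, 3, 4, 1, 3])

percent_agreement = np.mean(rater1 == rater2) * 100   # % of identical scores
r = np.corrcoef(rater1, rater2)[0, 1]                 # correlation between the raters

print(percent_agreement, r)  # 75.0 and a correlation close to 1
```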

19
Q

intrascorer reliability

A

whether each clinician was consistent in the way he or she assigned scores from test to test

20
Q

what is the benchmark score for interrater reliability

A
.80 and above
  • the criterion of acceptability is pretty high here (e.g., a correlation of at least .80 or agreement above 75%), but what is considered acceptable will vary from situation to situation

21
Q

define parallel/alternative forms method

A

refers to the administration of two alternate forms of the same measurement device and then comparing the scores.
- Both forms of the tests are given to the same person and then you compare the scores

22
Q

advantages and disadvantages of parallel/alternative forms method

A

Advantages
- eliminates the problem of memory effects
- reactivity effects (i.e., the experience of taking the test) are also partially controlled
- can address a wider sampling of the entire domain than the test-retest method
Possible disadvantages
- are the two forms of the test actually measuring the same thing (the same construct)?
- more expensive, because more manpower is required to make two tests
- requires additional work, because two measurement tools have to be developed

23
Q

what is generalizability theory

A
  • theory of measurement that attempts to determine the multiple sources of consistency and inconsistency- known as factors or facets
  • Identifies both systematic and random sources of inconsistency, allowing for the evaluation of interactions among different types of error sources
  • Looks at all possible sources of errors and then separates each source of error and evaluates its impact on reliability
24
Q

what are the limitations of generalizability theory

A
  • you cannot measure every single source of error
  • tougher to carry out because a lot of the work has to be done up front: deciding what data to collect, how much data to collect, and which measures to use. All of these sources of error have to be thought about up front, whereas with CTT you can administer the test first and then look at the factors affecting reliability.
25
what is standard error of measurement (SEM)
an estimate of how much the observed test score might differ from the true test score; a statistic used to obtain confidence intervals for obtained scores. It represents the spread of the hypothetical distribution of scores we would have if someone took the test an infinite number of times
26
how to calculate SEM
SEM = SD × √(1 − reliability), i.e., the standard deviation of the test scores multiplied by the square root of one minus the reliability coefficient
27
define confidence interval
Gives an estimate of how much error is likely to exist in an individual’s observed score, that is, how big the difference between the individual’s observed score and his or her true score is likely to be
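
A small sketch tying cards 26 and 27 together, assuming made-up numbers (SD = 15, reliability = .90, observed score = 110):

```python
import math

sd = 15.0           # standard deviation of the test scores (assumed)
reliability = 0.90  # reliability coefficient (assumed)
observed = 110.0    # an individual's observed score (assumed)

sem = sd * math.sqrt(1 - reliability)   # SEM = SD * sqrt(1 - reliability)
lower = observed - 1.96 * sem           # 95% confidence interval
upper = observed + 1.96 * sem           # around the observed score

print(round(sem, 2), (round(lower, 1), round(upper, 1)))  # 4.74 (100.7, 119.3)
```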
28
what is Cronbach's alpha
a commonly used coefficient of internal consistency. Works with interval-scale items and determines which questions on the scale are interrelated. Used for test questions, such as rating scales, that have more than one possible answer
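
A minimal sketch of the usual alpha formula, alpha = k/(k−1) × (1 − sum of item variances / total-score variance), on made-up rating-scale responses:

```python
import numpy as np

# Hypothetical ratings: 5 respondents x 4 rating-scale items
X = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
])

k = X.shape[1]
item_vars = X.var(axis=0, ddof=1)       # variance of each item
total_var = X.sum(axis=1).var(ddof=1)   # variance of the total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(alpha)
```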
29
what is Kuder Richardson (KR-20)
used for dichotomous items (e.g., 0 or 1, true or false); the scale is dichotomous and ordinal in nature. Used when there is either a right or a wrong answer, with only one correct answer
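
A matching sketch for KR-20, which replaces the item variances in alpha with p × q for dichotomous items; the data are made up, and conventions differ on whether the total-score variance uses n or n − 1:

```python
import numpy as np

# Hypothetical right/wrong answers: 5 test takers x 4 items (1 = correct)
X = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
])

k = X.shape[1]
p = X.mean(axis=0)                      # proportion answering each item correctly
q = 1 - p
total_var = X.sum(axis=1).var(ddof=1)   # variance of the total scores

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(kr20)
```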
30
what is Spearman-Brown
used in split-half analysis to adjust the reliability coefficient; it is designed to estimate what the reliability would be if the test had not been cut in half
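
A one-function sketch of the split-half correction, r_full = 2 × r_half / (1 + r_half); the example correlation of .60 is made up:

```python
def spearman_brown(r_half: float) -> float:
    """Estimate full-test reliability from the correlation between two half-tests."""
    return 2 * r_half / (1 + r_half)

print(spearman_brown(0.60))  # 0.75
```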
31
what is Cohen's kappa
a measure of interrater reliability (agreement between raters, corrected for chance agreement)
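
A minimal sketch of kappa, κ = (observed agreement − chance agreement) / (1 − chance agreement), with made-up categorical ratings from two raters:

```python
import numpy as np

# Hypothetical pass/fail ratings from two raters for 8 respondents
rater1 = np.array(["pass", "pass", "fail", "pass", "fail", "pass", "fail", "fail"])
rater2 = np.array(["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail"])

p_o = np.mean(rater1 == rater2)                        # observed agreement
categories = np.union1d(rater1, rater2)
p_e = sum(np.mean(rater1 == c) * np.mean(rater2 == c)  # agreement expected
          for c in categories)                         # by chance alone

kappa = (p_o - p_e) / (1 - p_e)
print(kappa)  # 0.5 for these made-up ratings
```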
32
what is the benchmark for split-half
.70 and above
33
what is the benchmark for parallel or alternative form
.70 and above
34
define heterogeneity of items
the greater the heterogeneity (differences in the kind of questions or difficulty of the questions) of the items, the greater the chance for low reliability correlation coefficients, e.g., a test that contains multiple choice, true and false, fill in the blank, etc.
35
define homogeneity of items
the greater the homogeneity (similarity in the kind of questions or difficulty of the questions) of the items, the greater the chance for high reliability correlation coefficients, e.g., a test that contains only multiple choice questions
36
define validity
refers to whether we are measuring what we intended to measure, and whether we can do so accurately
37
what does the validity coefficient represent
the amount or strength of evidence of validity based on the relationship of the test and criterion
38
define construct validity
the gradual accumulation of evidence that the scores on the test relate to observable behaviours in the way predicted by the underlying theory. It involves comparing a new measure to an existing, valid measure; however, existing valid measures usually don’t exist, which is often why the new scale is being created in the first place
39
what is evidence based on test content
Involves logically examining and evaluating the content of a test (including the test questions, format, wording, and tasks required of test takers) to determine the extent to which the content is representative of the concepts that the test is designed to measure
40
what is evidence based on relations to other variables
Involves correlating test scores with other measures to determine whether the scores are related to measures to which we would expect them to relate. We would also like to know that the test scores are not related to measures to which we would not expect them to relate
41
what is evidence based on internal structure
Focuses on whether the conceptual framework used in test development could be demonstrated using appropriate analytical techniques
42
what is evidence based on response processes
Involves observing test takers as they respond to the test or interviewing them when they complete the test
43
what is evidence based on consequences of testing
Differentiating between intended and unintended consequences of testing
44
define content validity
is when we evaluate the test and we look at things such as test questions, the format, the scoring and the wording
45
define psychological construct
traits or characteristics that tests are designed to measure (usually not observable)
46
define concrete construct
an attribute or characteristic that is easier to define and create items for; these are easily observable compared to abstract characteristics or traits, e.g., playing a piano
47
define abstract construct
characteristics or attributes that are harder to observe, for instance intelligence
48
what is construct explication
process of providing a detailed description of the relationship between specific behaviours and abstract constructs. the process of trying to figure out what items are inside or outside the test construct/content
49
the 3 steps of construct explication
1. identify behaviours related to the construct
2. identify other constructs and decide whether they are related or unrelated to the construct being measured
3. identify behaviours that are related to these additional constructs and determine whether they are related or unrelated to the construct being measured
50
define nomological network
a method for defining a construct by illustrating its relation to as many other constructs and behaviours as possible
51
define content validity ratio (CVR)
provides a measure of agreement among the judges/experts
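
The card gives only the idea; Lawshe's usual formula is CVR = (n_e − N/2) / (N/2), where n_e is the number of judges rating an item essential and N is the total number of judges. A tiny sketch with made-up counts:

```python
def content_validity_ratio(n_essential: int, n_judges: int) -> float:
    """Lawshe's CVR: ranges from -1 (no agreement) to +1 (all judges agree)."""
    return (n_essential - n_judges / 2) / (n_judges / 2)

print(content_validity_ratio(n_essential=8, n_judges=10))  # 0.6
```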
52
define face validity
Face validity answers the question “does it appear to the test taker that the questions on the test are related to the purpose for which the test is given?” Face validity is only concerned with how test takers perceive the appropriateness of the test
53
advantages of face validity
- If the respondent knows what information we are looking for, they can use “context” to help interpret the questions and provide more useful, accurate answers - The respondent can make an educated decision
54
disadvantages of face validity
- If the respondent knows what information we are looking for, they might try to bend and shape their answers to what they think we want, i.e., faking good or faking bad
55
define convergent validity
the extent to which the scale correlates with measures of the same or related concepts
56
define divergent/discriminant validity
the extent to which the measure does not correlate with measures of unrelated or distinct concepts
57
what is the multitrait-multimethod (MTMM) matrix method
The researcher chooses two or more constructs that are unrelated in theory and two or more types of tests to measure each of the constructs; the resulting matrix of correlations is used to assess a test’s construct validity
58
define heterotrait heteromethod
different traits measured by different methods of assessment
59
define heterotrait monomethod
more than one trait measured by the same method of assessment
60
define monotrait-heteromethod correlations
same trait measured by two different methods
61
define monotrait-monomethod correlation
same trait using the same method
62
list the multitrait-multimethod matrix pairs from highest to lowest correlation
```
Highest:  monotrait-monomethod
          monotrait-heteromethod
          heterotrait-monomethod
Lowest:   heterotrait-heteromethod
```
63
define factor
a combination of variables that are intercorrelated and thus measure the same characteristics
64
define factor analysis
statistical techniques used to analyze patterns of correlations among different variables and measures. Factor analysis looks at the relationships among all the variables and creates groups of variables (factors) based on those relationships
65
what is the goal of factor analysis
to reduce the number of dimensions needed to describe data derived from a large number of variables
66
how is factor analysis done
a series of mathematical calculations designed to extract patterns of intercorrelations among a set of variables (e.g., division questions correlate with other division questions and multiplication questions with other multiplication questions)
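
A minimal sketch using scikit-learn's FactorAnalysis on simulated data built to contain two clusters of items; the data generation is purely illustrative:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulate 10 people answering 6 items: 3 "division" and 3 "multiplication" items,
# each cluster driven by its own underlying ability plus a little noise
rng = np.random.default_rng(1)
division = rng.normal(size=(10, 1)) + rng.normal(scale=0.3, size=(10, 3))
multiplication = rng.normal(size=(10, 1)) + rng.normal(scale=0.3, size=(10, 3))
X = np.hstack([division, multiplication])

fa = FactorAnalysis(n_components=2)
fa.fit(X)
print(fa.components_)  # loadings: which items group together on which factor
```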
67
what is the subjective element to factor analysis
There is a subjective element to factor analysis because, once the statistical results have been computed, the researcher must review the groupings to see whether they make sense based on the construct the test items were designed to measure
68
define exploratory factor analysis
Researchers do not propose a formal hypothesis about the factors that underlie a set of test scores, but instead use the procedure broadly to help identify underlying components
69
define confirmatory factor analysis
The researcher specifies in advance what they believe the factor structure of the data should look like and then statistically tests how well that model actually fits the data. The researcher relies on existing theoretical or empirical knowledge to design the model being tested. Evidence for construct validity is provided if the results from the factor analysis fit the model created by the researcher; if not, the model should be revised and retested
70
define the Kaiser-Guttman criterion
retains factors with eigenvalues greater than 1.0; to be considered a factor, a component must have an eigenvalue greater than 1.0
71
define eigenvalue
a value computed for each factor that reflects how much of the variance in the variables the factor accounts for
72
define scree plot
plot factors on the horizontal axis and eigenvalues on the vertical axis. look for an elbow
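
A short sketch that computes eigenvalues from an item correlation matrix, applies the Kaiser-Guttman rule, and draws a scree plot; the score matrix here is random placeholder data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder respondents x items score matrix
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 6))

corr = np.corrcoef(X, rowvar=False)           # item intercorrelation matrix
eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted from largest to smallest

n_factors = int(np.sum(eigenvalues > 1.0))    # Kaiser-Guttman: keep eigenvalues > 1.0
print(n_factors)

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.xlabel("Factor")                          # scree plot: look for the elbow
plt.ylabel("Eigenvalue")
plt.show()
```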
73
advantages of factor analysis
- Simplifies interpretation - Can learn more about the composition of variables
74
disadvantages of factor analysis
- Does the combining of factors capture the essential aspects of what is being measured? - Are the factors generalizable to other populations (e.g., different cultures, genders, individuals with disabilities)?
75
define criterion related validity
measures the relationship between the predictor and the criterion, and the accuracy with which the predictor is able to predict performance on the criterion
76
define concurrent criterion related validity
criterion data are collected before or at the same time that the predictor is administered
77
define predictive criterion related validity
criterion data are collected after the predictor is administered
78
define subjective criteria
based upon an individual’s judgement, e.g., peer ratings
79
define objective criteria
based upon specific measurements (e.g., how fast someone is, how many absences from class)