First Exam Flashcards

(103 cards)

1
Reliability
How consistent the entire instrument is; the closer the reliability coefficient is to 1, the more reliable the instrument.
2
Psychometric theory looks at 2 things
The entire test (reliability) and item quality (non-dichotomous & dichotomous items).
3
How do you construct an instrument?
By looking at the entire test (reliability) and at item quality (non-dichotomous & dichotomous items).
4
The entire test has 4 different types of reliability
Inter-rater, test-retest, internal consistency, and parallel forms.
5
Non-dichotomous items and how they relate to variance
You want higher variance for a better normal curve; the more items you add, the more you increase the variance.
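A quick numerical illustration of that idea (simulated data, assuming numpy; the numbers are invented, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

# 500 simulated examinees answer k items that all tap one common trait;
# the variance of the total score grows as items are added.
trait = rng.normal(0, 1, 500)
for k in (5, 10, 20):
    items = trait[:, None] + rng.normal(0, 1, (500, k))  # k noisy indicators
    totals = items.sum(axis=1)
    print(f"{k:2d} items -> total-score variance = {totals.var():.1f}")
```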
6
Validity
Accuracy; all of probability is based on infinity.
7
Reliability and error
Error can affect the consistency of scores.
8
2 types of error
Systematic error & random error.
9
Systematic error
Errors that occur consistently because of a particular characteristic of the person being tested (ex: reading proficiency).
10
Random error
Errors that occur by chance (ex: a blackout or a distraction); the more common type.
11
Different types of random error
Content differences, subjective scoring, and temporal instability.
12
Content differences (content based)
Non-standardized administrations (the administrator may inadvertently speak differently when giving the test); ex: court-ordered testing, or a child using the restroom during the test.
13
Subjective scoring
Rater differences; each rater's subjective view of the client may differ.
14
Temporal instability
Things change from day to day; ex: the first day of testing went well, but on the second day there was an earthquake and performance went down, or one day the test taker had the flu.
15
What are some ways to decrease measurement error
Writing clear items, making test instructions easily understood, adhering closely to the prescribed conditions for administering an instrument, training raters, and making subjective scoring rules as explicit as possible.
16
Where does most measurement error come from?
The person administering the test, but this diminishes as the administrator becomes more experienced.
17
Test-retest reliability (coefficient of stability)
Taking a single group of subjects and repeatedly testing them on the same instrument at different times.
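A minimal sketch of how the coefficient of stability is computed (hypothetical scores, assuming numpy): test-retest reliability is the Pearson correlation between the two administrations.

```python
import numpy as np

# Hypothetical scores for the same 8 examinees tested twice, two weeks apart.
time1 = np.array([12, 15, 9, 20, 14, 17, 11, 18])
time2 = np.array([13, 14, 10, 19, 15, 18, 10, 17])

# The coefficient of stability is the correlation between administrations.
r = np.corrcoef(time1, time2)[0, 1]
print(f"coefficient of stability r = {r:.2f}")
```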
18
What is the gold standard for test-retest reliability
2 weeks between the first test and the second test; this is where you get optimal test-retest reliability.
19
In test-retest reliability, what is the difference between a shorter and a longer gap?
The longer the time gap, the lower the correlation; the shorter the gap, the more similar the factors that contribute to the error.
20
Artificial inflation
When researchers use the shorter gap to get a better correlation.
21
Parallel forms reliability
Assessing whether two forms of the same instrument produce similar results when testing the same person (sometimes hard to achieve).
22
What are form A & form B (parallel forms reliability)?
How reliable the two forms are with one another; having the two forms eliminates practice effects.
23
What is a key problem with parallel forms reliability?
Difficult to randomly divide the items, and hard to create a large number of items.
25
What is a key part of parallel forms reliability?
Developing a large number of items and then randomly dividing them into two tests.
26
Coefficient of equivalence
How correlated a person's scores are across two different forms of the same test.
27
When should the two forms for parallel forms reliability be administered?
They should be administered at least 2 weeks apart
28
What happens if the correlation between the two testings is lower than .2?
There is significant measurement error
29
What happens if you administer the forms on the same day for parallel forms reliability?
The test may reflect state rather than trait, and you will not have a statistically significant difference.
30
Internal consistency reliability
How related items are within the entire scale and within the subscales
31
What do we want with internal consistency reliability?
The content should be similar for reliability to be high; you need an adequate number of items, and each item should appropriately reflect the underlying construct.
32
Different types of internal consistency reliability
Split-half reliability, Kuder-Richardson #20 (KR-20), Cronbach's alpha.
33
Split half reliability
Split the examinees' scores into halves and then correlate the scores of the two halves.
34
What does split-half reliability look like in speeded tests?
An odd-even split may produce artificially high internal consistency if the examinee runs out of time.
35
How to get a good idea of split-half reliability
Split the test into odd-numbered and even-numbered items rather than a first half and a second half; this gives a better idea of split-half reliability.
36
What are some problems with split-half reliability?
The natural order of test taking (the content of the first half is not the same as the second half) & the issue of timed tests (some people don't get to the second half).
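A minimal sketch of an odd-even split on simulated right/wrong data (assuming numpy). The half-test correlation is stepped up with the Spearman-Brown formula, a standard correction the cards above don't name explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated dichotomous responses: 200 examinees x 20 items.
ability = rng.normal(0, 1, 200)
items = (ability[:, None] + rng.normal(0, 1, (200, 20)) > 0).astype(int)

odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown step-up
print(f"half-test r = {r_half:.2f}, split-half reliability = {r_full:.2f}")
```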
37
Kuder-Richardson #20 (KR-20)
A formula for split-half reliability computed under the assumption that the questions are scrambled.
38
How does Kuder-Richardson stop a confound in your test?
By removing the natural-order effect.
39
The drawbacks of KR-20
Only works with dichotomous scoring systems (only allows for right-or-wrong question responses).
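A minimal sketch of the KR-20 formula, KR-20 = k/(k-1) * (1 - sum(pq) / total-score variance), on toy right/wrong data (assuming numpy; the data are invented):

```python
import numpy as np

def kr20(items):
    """KR-20 for a 0/1 (dichotomous) item matrix, examinees x items."""
    k = items.shape[1]
    p = items.mean(axis=0)            # proportion correct per item
    q = 1 - p
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Toy data: 6 examinees x 4 right/wrong items.
data = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 0],
                 [1, 0, 0, 0],
                 [1, 1, 1, 1],
                 [0, 0, 0, 0],
                 [1, 1, 1, 0]])
print(f"KR-20 = {kr20(data):.2f}")
```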
40
Cronbach's alpha
Can be used to assess internal consistency for tests that have different scoring systems.
41
When and how can Cronbach's alpha be used?
It can be used on any scoring system and allows for scrambling of the questions; it is used more than any other measure of internal consistency and is equivalent to the average of all split-half correlations.
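A minimal sketch of coefficient alpha, alpha = k/(k-1) * (1 - sum of item variances / total-score variance), which works for any scoring system (toy Likert data, assuming numpy):

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an examinees x items score matrix (any scoring)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Toy data: 5 examinees x 3 Likert items (1-5).
likert = np.array([[4, 5, 4],
                   [3, 3, 2],
                   [5, 5, 5],
                   [2, 2, 3],
                   [4, 4, 4]])
print(f"alpha = {cronbach_alpha(likert):.2f}")
```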
42
Internal consistency & Cronbach's alpha
A high coefficient alpha does not always mean that you are measuring only one factor or latent construct (unidimensionality).
43
What do we assume in internal consistency?
We assume unidimensionality, but many tests are inadvertently multidimensional.
44
What do multidimensional tests look like?
More than one factor is being measured (ex: an AP History test measures knowledge, but also writing ability).
45
How will Cronbach's alpha be increased or artificially inflated?
If the test takers are a homogeneous group; you need heterogeneity in the group (alpha will be more accurate with a general group of people).
46
Interrater reliability
Assessing the degree of consistency between multiple raters
47
2 kinds of interrater reliability
Kendall's coefficient of concordance & Cohen's kappa
48
Kendall’s coefficient of concordance
Degree of consistency among raters who rank-order people/objects. Rank-order consistency, ex: Miss Universe judges each rank contestants 1, 2, 3, 4, 5, and we see whether the different judges' rankings correlate with one another.
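A minimal sketch of Kendall's W using the standard formula W = 12S / (m^2(n^3 - n)) for m raters and n objects with no tied ranks (invented rankings, assuming numpy):

```python
import numpy as np

def kendalls_w(ranks):
    """Kendall's W from a raters x objects matrix of ranks (no ties)."""
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()  # spread of rank sums
    return 12 * s / (m ** 2 * (n ** 3 - n))

# 3 judges each rank-order the same 5 contestants (1 = best).
ranks = np.array([[1, 2, 3, 4, 5],
                  [2, 1, 3, 5, 4],
                  [1, 3, 2, 4, 5]])
print(f"Kendall's W = {kendalls_w(ranks):.2f}")  # close to 1 = high concordance
```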
49
Cohen's kappa
Degree of consistency among raters who classify items into discrete categories.
50
Example of Cohen's kappa
Two different raters assess the same group of 30 people; Cohen's kappa quantifies their agreement on which patients are classified as depressed or not depressed.
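A minimal sketch of Cohen's kappa, kappa = (observed agreement - chance agreement) / (1 - chance agreement), on invented depressed/not-depressed ratings (assuming numpy):

```python
import numpy as np

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' category labels on the same cases."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    p_o = (r1 == r2).mean()                        # observed agreement
    p_e = sum((r1 == c).mean() * (r2 == c).mean()  # agreement expected by chance
              for c in np.union1d(r1, r2))
    return (p_o - p_e) / (1 - p_e)

# Two raters classify the same 10 patients as depressed (1) or not (0).
rater1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
rater2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(f"kappa = {cohens_kappa(rater1, rater2):.2f}")
```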
51
Normal Curve
The probability that an observation under the normal curve lies within 1 SD of the mean is approx. 0.68; within 2 SD, approx. 0.95; within 3 SD, approx. 0.99.
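A quick check of those three probabilities (assuming scipy is available):

```python
from scipy.stats import norm

# Probability that a normal observation lies within k SDs of the mean.
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {p:.4f}")  # 0.6827, 0.9545, 0.9973
```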
52
Why is SEM important for testing?
SEM is based on the idea that you cannot test an individual an infinite number of times; standard error is always present.
53
How is JND difficult to apply to psychological constructs?
It is used to determine a level of sensory difference (like hearing or sight), but there is variability in how disorders are expressed in humans.
54
What is item analysis and how is it related to test construction?
Examining item quality to map the construct we have defined. We then look at dichotomous and non-dichotomous measures to determine item quality (variance, covariance, etc.).
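A minimal sketch of two common item-analysis statistics, item difficulty (proportion correct) and item discrimination (item-rest correlation), on simulated right/wrong data (assuming numpy; one standard approach, not necessarily the course's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated 0/1 responses: 100 examinees x 5 items.
ability = rng.normal(0, 1, 100)
resp = (ability[:, None] + rng.normal(0, 1, (100, 5)) > 0).astype(int)

total = resp.sum(axis=1)
for i in range(resp.shape[1]):
    difficulty = resp[:, i].mean()                        # proportion correct
    rest = total - resp[:, i]                             # total minus this item
    discrimination = np.corrcoef(resp[:, i], rest)[0, 1]  # item-rest correlation
    print(f"item {i + 1}: difficulty = {difficulty:.2f}, "
          f"discrimination = {discrimination:.2f}")
```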
55
How does one construct a test?
You need to determine what area or domain you want to examine; content should be homogeneous; tests made for repeated use require validation.
56
Scaling models
Unidimensional scaling: subject-centered methods, stimulus-centered methods, and response-centered approaches.
57
Subject centered methods
The test developer's primary interest is locating individuals at different points on the continuum (ex: Likert scale).
58
Stimulus centered methods
Psychophysics & JND: present tones to determine the absolute threshold for experiencing a sensation. It is not always clear where the difference lies, and not all of us agree on what the difference is; subject competency is needed to tell the JND.
59
Response centered approaches
Each respondent is asked to rank-order his or her preference for a set of stimuli, or to rank-order a set of statements in terms of their proximity to his or her own personal beliefs. Allows scaling of the psychological distance between separated categories.
60
Heterogeneity
Difference in character or content
62
Homogeneous
Same character or content
63
Meta-analysis
Multiple studies with the same research questions
64
Bivariate
Involving 2 variables.
65
Inferential statistics
Take sample data and make inferences about the population.
66
Descriptive statistics
Looks at trends in the sample and understands them based on the sample itself.
67
Assessment
An overall testing score interpreted in the context of history (holistic).
68
Testing
A quantitative score with no larger context.
69
Niche building
Creating, seeking out, and ending up in environments that reinforce your traits; this happens both consciously and unconsciously.
70
Reliability and standard error of measurement
As the reliability of the instrument increases, the standard error of measurement goes down; if your test is producing consistent scores, then of course your error will go down.
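The standard formula behind this card is SEM = SD * sqrt(1 - reliability); a quick illustration (assuming numpy, with an SD of 15 chosen arbitrarily):

```python
import numpy as np

# SEM = SD * sqrt(1 - reliability): as reliability rises, SEM falls.
sd = 15  # e.g., an IQ-style scale with SD = 15
for rel in (0.70, 0.80, 0.90, 0.95):
    sem = sd * np.sqrt(1 - rel)
    print(f"reliability = {rel:.2f} -> SEM = {sem:.2f}")
```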
71
Achievement tests
They try to determine whether a specific skill set or knowledge base has been acquired.
72
Popham & Husek (1969)
Learned that you cannot use traditional reliability, since you are not interested in how someone does in comparison to a group of others; you are interested in how someone performs in regard to a specific criterion.
73
Criterion
Anything that has real-world implications. Ex: if a lawyer fails the bar exam, they cannot practice law; these tests affect your real life because they affect your moving forward in a profession.
74
2 objectives achievement tests scores can give you
The relative position of the examinee's score in a distribution of scores (z score) & the degree to which the person has attained the goal of a specific instruction (ex: comp exam).
75
Z score
Measured in terms of standard deviations from the mean; the relative position of the examinee's score in a distribution of scores.
76
Proportion correct score
Percentage of correct answers from a randomly determined number of test items (you don’t need to know how others performed if you know the percentage of correct answers obtained)
77
Criterion referenced tests
Looks at development (all of these tests are arbitrary); a test that measures a student's performance against a set of predetermined standards or criteria.
78
Domain score
The proportion of items in the domain that the examinee answers correctly
79
Mastery allocation
Cutoff score that classifies examinees into two categories master vs. non-master (ex: EPPP)
80
What does a z score allow for
Allows for comparison across variables that are calibrated or scaled differently; it is independent of scaling and calibration.
81
What do z scores do for the WAIS/WISC (IQ tests) & MMPI?
These have different scoring systems, which makes their raw scores not comparable, but z scores make them comparable.
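A minimal sketch of that comparison using z = (score - mean) / SD; WAIS IQs are scaled to mean 100, SD 15, and MMPI scales use T scores with mean 50, SD 10:

```python
# z = (score - mean) / SD puts differently scaled tests on one metric.
def z_score(score, mean, sd):
    return (score - mean) / sd

# WAIS: mean 100, SD 15.  MMPI T scores: mean 50, SD 10.
print(z_score(115, 100, 15))  # IQ 115  -> z = 1.0
print(z_score(60, 50, 10))    # T of 60 -> z = 1.0 (same relative position)
```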
82
Absolute error
Using an examinee's mean score as a representation of his true universe score.
83
How is absolute error calculated
By summing all the error variance
84
How criterion referenced reliability is examined
The lower the error, the better the examinee's score represents his domain-referenced true knowledge.
85
Reliability of classification
Does the observed match what we predict? We want to know who passes and who fails, as well as how they are classified.
86
Predicted
What we expect to happen.
87
Observed
What has actually happened.
88
The percentage of items people get correct will affect
The reliability of achievement tests
89
Where is the true reliability
It is in the middle, not the tails, of the scoring distribution; we want the middle to be lower to show better reliability.
90
Should reliability be high?
Yes, and they should all be similar.
91
Homogenous samples and reliability
They have less reliability compared to heterogeneous samples.
92
Self-report and reliability problems (2 major components)
Literal meaning and pragmatic meaning
93
Literal meaning
Semantic understanding of sentence structure
94
Pragmatic meaning
Inferences about the question's intent.
95
Issues with reliability & self-report
Ex: "How are you doing?" leads to interpretation by the participant in the conversation; this can cause issues with reliability because the client may interpret the question differently.
96
Self-reports and reference periods
When asked to respond to something that occurred last week vs. last year, we find differential responding.
97
Differential responding
The interpretation is that a shorter reference period implies frequency and a longer one implies intensity of the event.
98
Self-reports and question context
Respondents change their answers based on the researcher's affiliation, or the response categories themselves can change the way a patient responds.
99
Self-report and context
Preceding questions in a survey or questionnaire influence the ways in which respondents evaluate items.
100
Internet & psychological testing
The internet provides a cheaper and faster way to update tests, translate tests, and interpret scores quickly; it can reach more respondents quickly, provide access to test materials quite cheaply, and allow those in rural areas to be tested.
101
Internet and ethical considerations
Test security (keeping the testing items secure), tests that may discriminate, language barriers, minors taking tests, not obtaining informed consent accurately, how to give feedback to the individual, and how to deal with emotional trauma from results.
102
Psychologists should use what type of tests?
Tests whose validity and reliability have been established for the population being tested
103
How do you evaluate whether an item or test question is good?
Through statistical analysis of the test questions.
104
Intrinsic traits
Qualities that are inherent to something or someone and are not dependent on external circumstances.
105
Difference between multidimensional and unidimensional tests in Cronbach's alpha
Multidimensional tests have a lower Cronbach's alpha; unidimensional tests have a higher one.