Midterm Practice Quiz Flashcards

1
Q

Inferential statistics are used to:
A) Summarize data
B) Make predictions about a population
C) Describe dataset characteristics
D) Organize data into tables

A

Make predictions about a population

2
Q

Which measure of central tendency is not affected by outliers?
A) Mean
B) Median
C) Mode
D) Range

A

Median

3
Q

Inferential statistics allow researchers to:
A) Make inferences about samples
B) Describe observed data
C) Predict population characteristics
D) Organize data in tables

A

Predict population characteristics

4
Q

Which scale of measurement has magnitude and equal intervals but no absolute zero?

Scale with the properties of magnitude and equal intervals but not absolute zero (e.g., temperature in degrees Fahrenheit).

Determination of equality of intervals or differences

Mean, mode, median, range, standard deviation

A

Interval

5
Q

Absolute zero is a characteristic of which scale of measurement?

Scale that has all three properties

Determination of equality of ratios; measurements have a true zero

Range, standard deviation, coefficient of variation

A

Ratio

6
Q

What is the purpose of percentiles in statistics?
A) Summarize data
B) Predict population characteristics
C) Organize data in tables
D) Describe distribution of scores

A

Describe distribution of scores

7
Q

What does a z-score represent?
A) Standardized units of data
B) Average deviation around the mean
C) Total frequency for a set of observations
D) Difference between mode and median

A

Standardized units of data

8
Q

A distribution with fewer scores at the positive tail is:
A) Positively skewed
B) Negatively skewed
C) Mesokurtic
D) Leptokurtic

A

Positively skewed

9
Q

What is the purpose of correlation analysis?
A) Making inferences about populations
B) Describing dataset characteristics
C) Measuring the relationship between two variables
D) Summarizing data

A

Measuring the relationship between two variables

10
Q

Which type of sampling prevents sampling bias?
A) Incidental sampling
B) Stratified sampling
C) Purposive sampling
D) Convenience sampling

A

Stratified sampling

11
Q

What is the main purpose of norms in psychological testing?
A) Indicate individual’s relative standing
B) Provide raw scores for interpretation
C) Determine test reliability
D) Describe population characteristics

A

Indicate individual’s relative standing

12
Q

Age norms indicate:
A) Average performance of test-takers at different ages
B) The percentage of people with a particular score
C) Performance in different tests
D) Relative standing in a normative sample

A

Average performance of test-takers at different ages

13
Q

National norms are derived from:
A) Local population
B) A normative sample representative of the nation
C) A fixed reference group
D) Subgroup norms

A

A normative sample representative of the nation

14
Q

Fixed reference group scoring systems are used in tests like:
A) SAT
B) GRE
C) ANOVA
D) T-test

A

SAT

15
Q

Which type of testing evaluates performance against set standards?
A) Norm-referenced
B) Criterion-referenced
C) Interval testing
D) Ratio testing

A

Criterion-referenced

16
Q

What do standard scores represent?
A) The average deviation around the mean
B) Raw scores converted to a different scale
C) The difference between mode and median
D) The percentage of scores below a particular score

A

Raw scores converted to a different scale

17
Q

Inferential statistics are based on:
A) Sample data
B) Population data
C) Descriptive statistics
D) Standard deviation

A

Sample data

18
Q

Which statistic measures the arithmetic average score?
A) Mean
B) Standard deviation
C) Z-score
D) Variance

A

Mean

19
Q

What is the purpose of correlation analysis?
A) Making inferences about populations
B) Describing dataset characteristics
C) Measuring the relationship between two variables
D) Summarizing data

A

Measuring the relationship between two variables

20
Q

Which type of sampling prevents sampling bias?
A) Incidental sampling
B) Stratified sampling
C) Purposive sampling
D) Convenience sampling

A

Stratified sampling

21
Q

Age norms indicate:
A) Average performance of test-takers at different ages
B) The percentage of people with a particular score
C) Performance in different tests
D) Relative standing in a normative sample

A

Average performance of test-takers at different ages

22
Q

National norms are derived from:
A) Local population
B) A normative sample representative of the nation
C) A fixed reference group
D) Subgroup norms

A

A normative sample representative of the nation

23
Q

Fixed reference group scoring systems are used in tests like:
A) SAT
B) GRE
C) ANOVA
D) T-test

A

SAT

24
Q

What type of statistics summarize characteristics of a dataset?
A) Descriptive
B) Inferential
C) Nominal
D) Ordinal

A

Descriptive

25
Q

In inferential statistics, what do researchers make predictions about?
A) Sample data
B) Population data
C) Standard deviation
D) Mean

A

Population data

26
Q

Which scale of measurement has magnitude, equal intervals, and an absolute zero?
A) Nominal
B) Ordinal
C) Interval
D) Ratio

A

Ratio

27
Q

What does a z-score indicate?
A) Standardized units of data
B) Total frequency for a set of observations
C) Average deviation around the mean
D) Difference between mode and median

A

Standardized units of data

28
Q

Which statistic approximates the average deviation around the mean?
A) Mean
B) Median
C) Mode
D) Standard deviation

A

Standard deviation

29
Q

Which assumption suggests that psychological traits and states can vary in endurance?

A

Assumption 1

30
Q

Which type of norm expresses the percentage of people whose score falls on a particular raw score?


A

Percentile norms

31
Q

What type of validity relates more to what a test appears to measure?

A

Face validity

32
Q

Which type of reliability is an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test?

A

Test-retest reliability

33
Q

Which measure of reliability is the preferred measure for personality tests?

A

Coefficient Alpha

34
Q

Which type of validity involves evaluating the relationship of scores obtained on the test to scores on other tests or measures?

A

Criterion-related validity

35
Q

To facilitate the interpretation of test scores

A

norming in psychological testing

36
Q

What is the ratio of true score variance to observed score variance?

A

Test reliability

37
Q

Which type of sampling method helps prevent sampling bias?

A

Stratified sampling

38
Q

To select a representative group for testing

A

sampling in psychological testing

39
Q

Which measure of reliability estimates the consistency between two or more scorers?

A

Inter-rater reliability

40
Q

Which type of test validity involves evaluating the relationship between test scores and an external criterion measured at the same time?

A

Concurrent validity

41
Q

What is the term for the tendency of a rater to give higher ratings than deserved due to failing to discriminate among different aspects of a ratee’s behavior?

A

Halo effect

42
Q

To measure different aspects of a trait or ability

A

Multiple assessment measures in testing

43
Q

Which type of validity refers to a test’s accuracy in predicting future performance?

A

Predictive validity

44
Q

What is the term for a judgment resulting from the intentional or unintentional misuse of a rating scale?

A

Leniency/Generosity error

45
Q

Which type of norm represents the average performance of different samples of test-takers who were at various ages?

A

Age norms

46
Q

What is the component of a test score attributable to sources other than the trait or ability being measured?

A

Error variance

47
Q

What is the primary purpose of norming in psychological testing?

A

To ensure fairness in testing

48
Q

Which type of sampling method helps prevent sampling bias?

A

Stratified sampling

49
Q

What is the primary purpose of sampling in psychological testing?

A

To select a representative group for testing

50
Q

Which measure of reliability estimates the consistency between two or more scorers?

A

Inter-rater reliability

51
Q

Which measure of validity is based on an evaluation of the subjects, topics, or content covered by the items in the test?

A

Content validity

52
Q

To measure different aspects of a trait or ability

A

multiple assessment measures in testing

53
Q

Which type of norm represents the average performance of different samples of test-takers who were at various ages?

A

Age norms

54
Q

Which type of test validity involves a comprehensive analysis of how scores on the test relate to other test scores and measures?

Which validity measures the extent to which a test uniformly measures a single concept?

A

Construct validity

55
Q

Which measure of reliability estimates the consistency between two or more scorers?

A

Inter-rater reliability

56
Q

Which type of reliability involves correlating two pairs of scores obtained from equivalent halves of a single test administered once?

A

Split-half reliability

57
Q

Has two purposes: description, and making inferences, which are logical deductions about events that cannot be observed directly.

A

Statistics

58
Q

Used to provide a concise description of a collection of quantitative information.

Summarize and describe the main features of a dataset. Do not allow for making inferences or predictions beyond the observed data; they simply describe what is already present.

A

Descriptive Statistics.

59
Q

Measures of (mean, median, mode).

A

Central Tendency

60
Q

Measures of (standard deviation, range).

A

Measures of Variability

61
Q

Used to make inferences from observation of a small group of people, known as a sample, to a larger group of individuals, known as the population.
Allow researchers to make inferences and predictions about populations based on sample data.

A

Inferential Statistics

62
Q

Falls under scales of measurement.

The application of rules for assigning numbers to objects.

A

Measurement

63
Q

One of the Properties of Scales.
Scales possess magnitude if they allow for comparison of “more,” “less,” or “equal” between attributes.

A

Magnitude

64
Q

One of the Properties of Scales.
A scale with equal intervals maintains consistent differences between measurement points, ensuring that the difference between two points has the same meaning regardless of their position on the scale.

A

Equal Intervals.

65
Q

What are the types of Scales

A

Nominal Scale.

Ordinal Scale.

Interval Scale.

Ratio Scale.

66
Q

Scale with the property of magnitude but not equal intervals or an absolute 0.

Determination of more or less; data can be ranked

Mode, median and range

A

Ordinal Scale.

67
Q
  • Name objects
  • Determination of equality; data can be placed into classes
  • central tendency: Mode
A

Nominal scale

68
Q

Shows how many times each of the values on the horizontal axis occurred, from the lowest to the highest value.

A

Vertical Axis

69
Q

Answers the question, “what percent of the scores fall below a particular score?”

A

Percentile Ranks.
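Not part of the original deck: a minimal Python sketch of the definition above, using made-up scores, to make it concrete.

```python
def percentile_rank(score, scores):
    # Percent of the scores in the distribution that fall below a particular score
    below = sum(1 for s in scores if s < score)
    return 100.0 * below / len(scores)

data = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical test scores
print(percentile_rank(75, data))  # 40.0 (4 of the 10 scores fall below 75)
```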

70
Q

It divides the total frequency for a set of observations into hundredths.

A

Percentiles.

71
Q

What are the ways of describing a distribution?

A

Mean

Standard Deviation.
Variance.
Z-score.
Norms.
Age-related Norms.
Tracking.
Criterion-referenced Tests
Scale.

72
Q

Arithmetic average score. Total the scores and divide the sum by the number of cases, or N.

A

Mean

73
Q

Approximates the average deviation around the mean.

A

Standard Deviation.

74
Q

Average squared deviation around the mean.

A

Variance
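Illustrative only (not from the deck): the three definitions on the preceding cards (mean, standard deviation, variance) translate directly into Python; the data is made up.

```python
def mean(scores):
    # Total the scores and divide the sum by the number of cases, N
    return sum(scores) / len(scores)

def variance(scores):
    # Average squared deviation around the mean (population formula)
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

def standard_deviation(scores):
    # Square root of the variance; approximates the average deviation around the mean
    return variance(scores) ** 0.5

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores
print(mean(data))                # 5.0
print(variance(data))            # 4.0
print(standard_deviation(data))  # 2.0
```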

75
Q

Data to standardized unit. Difference between a score and the mean, divided by the standard deviation.

A

Z-score.
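A quick sketch of the z-score definition above; illustrative only, with hypothetical numbers.

```python
def z_score(x, mean, sd):
    # Difference between a score and the mean, divided by the standard deviation
    return (x - mean) / sd

# Hypothetical: a score of 70 on a test with mean 60 and SD 5
print(z_score(70, 60, 5))  # 2.0 (two standard deviations above the mean)
```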

76
Q

Performances by defined groups on tests.

A

Norms.

77
Q

Certain tests have different normative groups for age groups.

A

Age-related Norms.

78
Q

Tendency to stay at about the same level relative to one’s peers.

A

Tracking

79
Q

Evaluates each person against a set criterion rather than comparing with a norm.

A

Criterion-referenced Tests

80
Q

Set of numbers whose properties model empirical properties of the objects to which the numbers are assigned.

A

Scale.

81
Q

Used to measure a discrete variable.

A

Discrete Scale.

82
Q

Used to measure a continuous variable.

A

Continuous Scale.

83
Q

Set of test scores arrayed for recording or study.

A

Distribution.

84
Q

Straightforward, unmodified accounting of performance that is usually numerical.

A

Raw Score

85
Q

All scores are listed alongside the number of times each score occurred.

A

Frequency Distribution.

86
Q

Test-score intervals, also called class intervals, replace the actual test scores.

A

Grouped Frequency Distribution

87
Q

Chart composed of lines, points, bars.

A

Graph.

88
Q

Graph with vertical lines drawn at the true limits of each test score (series of contiguous rectangles).

A

Histogram

89
Q

Here the rectangular bars typically are not contiguous.

A

Bar Graph

90
Q

Expressed by a continuous line connecting the points where test scores meet frequencies

A

Frequency Polygon.

91
Q

fewer scores at the positive tail; more test takers scored low in the test

A

Positively Skewed

92
Q

fewer scores at the negative tail; more test takers scored high in the test

A

Negatively Skewed

93
Q

steepness of a distribution in its center

A

Kurtosis

94
Q

What are the types of kurtosis (steepness of a distribution in its center)?

A

• Platykurtic (-)
• Leptokurtic (+)
• Mesokurtic (0)

95
Q

What are the Standard Scores

A

● z-scores
● T scores
● Deviation IQ scores
● GRE/SAT/CEEB scores
● Stens
● Stanines

96
Q

is a measure of the magnitude and direction of relationships between two variables.

an important tool for making inferences.

A

Correlation
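To make the card concrete, here is a small sketch of the Pearson correlation coefficient, one common index of the magnitude and direction of a relationship; the data is made up and the function is illustrative only.

```python
def pearson_r(xs, ys):
    # Magnitude and direction of the linear relationship between two variables
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0, perfect positive relationship
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 6))  # -1.0, perfect negative relationship
```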

97
Q

A correlation procedure is applied when you want to know the relationship between performance (scores) on two different tests.

A

Assessment language

98
Q

What are the decision and conclusion if the p-value is less than or equal to 5% (p ≤ 0.05)?

A

Reject Ho; the result is significant

99
Q

What is the decision and conclusion if the p-value is more than 5% (p > 0.05)?

A

Fail to reject Ho; the result is not significant
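The decision rule from the two cards above can be sketched as a one-line check (alpha = 0.05 as in the cards; the function name is my own):

```python
def decide(p, alpha=0.05):
    # Decision rule from the cards: reject Ho when p <= alpha
    if p <= alpha:
        return "Reject Ho: significant"
    return "Fail to reject Ho: not significant"

print(decide(0.03))  # Reject Ho: significant
print(decide(0.20))  # Fail to reject Ho: not significant
```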

100
Q

What are the measures of Prediction?

A

Logistic Regression

Multinomial Regression

Simple Linear Regression

Multiple Regression

Ordinal Regression

101
Q

Criterion: Nominal (Categorical)
Predictor: 1 Continuous

A

Logistic Regression

102
Q

Criterion: Nominal
Predictor: 2/More Continuous

A

Multinomial Regression

103
Q

Criterion: Continuous
Predictor: 1 Continuous

A

Simple Linear Regression

104
Q

Criterion: Continuous
Predictor: 2/More Continuous

A

Multiple Regression

105
Q

Criterion: Ordinal (Ranking)
Predictor: 2/More Continuous

A

Ordinal Regression
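Illustrative only: the criterion/predictor table on the cards above can be expressed as a small lookup. The function name and the encoding (a string for the criterion type, an integer count of continuous predictors) are my own.

```python
# Hypothetical lookup reflecting the cards' criterion/predictor table
REGRESSION_CHOICE = {
    ("nominal", 1): "Logistic Regression",
    ("nominal", 2): "Multinomial Regression",
    ("continuous", 1): "Simple Linear Regression",
    ("continuous", 2): "Multiple Regression",
    ("ordinal", 2): "Ordinal Regression",
}

def choose_regression(criterion, n_predictors):
    # In the table, "2" stands in for "2 or more" continuous predictors
    key = (criterion, min(n_predictors, 2))
    return REGRESSION_CHOICE.get(key, "No match in the table")

print(choose_regression("continuous", 3))  # Multiple Regression
```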

106
Q

Trait.
State.
Psychological trait.
• A trait is not expected to be manifested in behavior 100% of the time.
• Trait and state are both used to refer to a way in which one individual varies from another.
• The reference group can influence one’s conclusions or judgments.

A

Assumption 1: Psychological Traits and States Exist

107
Q

Any distinguishable, relatively enduring way in which one individual varies from another.

is not expected to be manifested in behavior 100% of the time.

A

Trait

108
Q

Distinguish one person from another but are relatively less enduring.

A

State

109
Q

Covers a wide range of possible characteristics. Exists only as a construct: an informed, scientific concept constructed to describe or explain behavior.

A

Psychological trait

110
Q

• Specific traits and states to be measured and quantified need to be carefully defined.
• People, in general, have many different ways of looking at and defining the same phenomenon.
• The test developer considers the types of item content that would provide insight into the targeted trait or state.
• Weighing the comparative value of a test’s items comes about as the result of a complex interplay among many factors (technical considerations, the way a construct has been defined, the value society attaches to the behaviors involved).
• A test score is presumed to represent the strength of the targeted ability, trait, or state, and is frequently based on cumulative scoring: the more the test-taker responds in the direction keyed by the test manual, the stronger the targeted ability or trait is presumed to be.

A

Assumption 2: Psychological Trait and States Can Be Quantified and Measured

111
Q

• Patterns of answers to true-false questions on one widely used test of personality are used in decision making regarding mental disorders.
• Tasks in some tests mimic the actual behaviors.
• The obtained sample of behavior is typically used to make predictions about future behavior.

A

Assumption 3: Test-Related Behavior Predicts Non- Test-Related Behavior

112
Q

Competent test users understand a great deal about the tests they use + understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources.

A

Assumption 4: Tests and Other Measurement Techniques Have Strengths and Weaknesses

113
Q

• Error: something that is more than expected; it is actually a component of the measurement process. It refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test.
• Error variance: the component of a test score attributable to sources other than the trait or ability measured.
• Assessees, assessors, and measuring instruments themselves are sources of error variance.
• Classical or true score theory of measurement: the assumption is made that each test-taker has a true score on a test that would be obtained but for the random action of measurement error.

A

Assumption 5: Various Sources of Error Are Part of the Assessment Process

114
Q

Something that is more than expected; it is actually a component of the measurement process. It refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test.

A

Error

115
Q

component of a test score attributable to sources other than the trait or ability measured.

A

Error variance

116
Q

assumption is made that each test-taker has a true score on a test that would be obtained but for the random action of measurement error.

A

Classical or true score theory of measurement

117
Q

• Major test publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual.
• One source of fairness-related problems is the test user who attempts to use a particular test with people whose background and experience differ from those of the people for whom the test was intended.
• Tests are tools; just like other tools, they can be used properly or improperly.

A

Assumption 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner

118
Q
  • Considering the many critical decisions that are based on testing and assessment procedures, we can readily appreciate the need for tests, especially good tests
A

Assumption 7: Testing and Assessment Benefit Society

119
Q

Used to estimate the extent to which an observed score deviates from a true score

A

Standard error of measurement
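Not from the deck itself: a common formula for the standard error of measurement is SEM = SD * sqrt(1 - r), where r is the test's reliability coefficient; a sketch with hypothetical numbers follows.

```python
import math

def sem(sd, reliability):
    # Standard error of measurement: expected spread of observed scores
    # around the true score, given the test's SD and reliability coefficient
    return sd * math.sqrt(1 - reliability)

# Hypothetical: an IQ-style scale with SD 15 and reliability .91
print(round(sem(15, 0.91), 2))  # 4.5
```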

120
Q

In regression, estimate the degree of error involved in predicting the value of one variable from another.

A

Standard error of estimate

121
Q

Measure of sampling error

A

Standard error of the mean

122
Q

Used to estimate how large the difference between two scores should be before the difference is considered statistically significant

A

Standard error of the difference

123
Q

• Norming is basically a procedure that facilitates the test user’s interpretation of test scores.

A

The Nature of Norms

124
Q
  1. They indicate the individual’s relative standing in the normative sample, and thus permit an evaluation of his performance in reference to other persons.
  2. They provide comparable measures that permit a direct comparison of that individual’s performance in different tests.
A

Purposes of Norms

125
Q

considers certain characteristics that must be proportionately represented in the sample (helps prevent sampling bias and ultimately aids in the interpretation of the findings).

A

Stratified sampling

126
Q

When members from the identified strata are obtained randomly.

A

Stratified random sampling

127
Q

If we arbitrarily select some sample because we believe it to be representative of the population.

A

Purposive sampling

128
Q

Often used for practical reasons; utilizes the most available individuals. Generalization of findings from incidental samples must be made with caution.

A

Incidental / Convenience sampling

129
Q
  • an expression of the percentage of people whose score on a test or measure falls on a particular raw score.
A

Percentile norms

130
Q

number of items that were answered correctly divided by the total number of items and multiplied by 100.

A

Percentage correct
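The definition above translates directly into code (illustrative, made-up numbers):

```python
def percentage_correct(num_correct, num_items):
    # Items answered correctly, divided by the total number of items, multiplied by 100
    return num_correct / num_items * 100

print(percentage_correct(45, 60))  # 75.0
```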

131
Q
  • also known as age-equivalent scores
  • They indicate the average performance of different samples of test-takers who were at various ages at the time the test was administered.
  • It can be problematic when we talk about age and its relationship with psychological characteristics (i.e., intelligence).
A

Age Norms

132
Q
  • derived from a normative sample that was nationally representative of the population at the time the norming study was conducted (age, background, location, etc).
A

National Norms

133
Q
  • creating norm groups based on how the sample was segmented at the beginning (male/female, upper/middle/lower classes, etc).
A

Subgroup Norms

134
Q
  • typically developed by test users themselves.
A

Local Norms

135
Q

The initial group of test-takers, referred to as the fixed reference group, is used as the basis for the calculation of test scores for future administrations of the test.

A

Fixed Reference Group Scoring Systems.

136
Q
  • a method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard.

Example: To pass the board exam, you need to get at least 75% items correctly.

A

Criterion-referenced

137
Q
  • evaluating test scores in relation to other scores on the same test.
    Example: how does your performance compare with others?
A

Norm-referenced

138
Q
  • a raw score that has been converted from one scale to another.
  • Standard scores are more easily interpretable than raw scores.
A

Standard Score

139
Q

➢ The consistency of test scores obtained by the same persons when they are re-examined with the same test on different occasions.
➢ Underlies the computation of the ERROR OF MEASUREMENT of a single score.

Can be estimated from the correlation of the observed test score with the true score.

A

Reliability

140
Q

➢ Is the ratio of true score variance to observed score variance (Kaplan and Saccuzzo, 2011).

A

Reliability (Kaplan & Saccuzzo, 2011)
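A sketch of that ratio, using the classical-theory identity that observed score variance is true score variance plus error variance; the numbers are hypothetical.

```python
def reliability(true_var, error_var):
    # Ratio of true score variance to observed score variance,
    # where observed variance = true variance + error variance
    return true_var / (true_var + error_var)

print(reliability(9, 1))  # 0.9: 90% of observed variance is true score variance
```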

141
Q

➢ Measurements differ from occasion to occasion as a function of measurement error

A

Reliability (Cohen & Swerdlik, 2018)

142
Q

Error does not imply that a mistake has been made; instead, it implies that there will always be some inaccuracy in our measurements.

A

Psychological Testing

143
Q

make it possible to estimate what proportion of the total variance of the test scores is ERROR VARIANCE.

A

Measures of test reliability

144
Q

represents any condition that is irrelevant to the purpose of the test (irrelevant, random sources of variance).
It is reduced by controlling the test environment, instructions, time limit, rapport, etc.

A

Error Variance

145
Q

a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.

A

Random error

146
Q

a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.

A

Systematic error

147
Q

❖ Item sampling or content sampling: the variation among items within a test, as well as variation among items between tests.
❖ Higher scores may be obtained when the test takers are familiar with the items that were sampled (or made part of the test taken).
❖ Other items that could have been asked in the test were unfamiliar to test takers, and this would have lowered the test taker’s score.

A

Test Construction

148
Q

❖ Test environment, such as room temperature, level of lighting.
❖ Test-taker variables, such as emotional problems, physical discomfort
❖ Examiner-related variables, such as appearance and demeanor.

A

Test Administration

149
Q

❖ Hand-scoring versus machine scoring.
❖ Objective versus subjective scoring.

A

Scoring & Interpretation

150
Q

A major assumption in classical test theory is that errors in measurement are random.

A

Basics of Test Score Theory (CLASSICAL TEST THEORY)

151
Q

Considers the problems created by using a limited number of items to represent a larger and more complicated construct.
▪ to evaluate one’s spelling ability, instead of using the entire number of words in the dictionary to comprise the items of the test, we decide to use a SAMPLE of words.

A

The Domain Sampling Model

152
Q

is to estimate HOW MUCH ERROR we would make by using the score from the shorter test as an estimate of the test-taker’s true ability

A

Reliability analysis

153
Q

An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
Retest reliability shows the extent to which scores on a test can be generalized over different occasions.
The higher the reliability, the less susceptible the scores are to random daily changes in the condition of the test- takers.

A

Test-Retest Reliability

154
Q

Occurs when the first testing session influences scores from the second session.
For example, test takers sometimes remember their answers from the first time they took the test.

A

Carryover Effect

155
Q

Some skills improve with practice. When a test is given a second time, test takers score better because they have sharpened their skills by having taken the test the first time.

A

Practice Effects

156
Q

Reflects the extent to which sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
Also called “Equivalent Forms” or “Parallel Forms” Reliability.
Correlation between the scores obtained on the two forms represents the reliability coefficient of the test.

Measures both temporal stability and consistency of responses to different item samples.
Must always be accompanied by a statement of the length of the interval between test administrations, as well as the relevant intervening experiences.
Can be quite burdensome, considering that you have to develop two forms of the same test.

A

Alternate/Parallel Form Reliability

157
Q
  • It addresses the issue of consistency of the implementation of a rating system. Inter-rater reliability can be evaluated by using several different statistics.
A

Inter-rater reliability

158
Q

obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once

If, after dividing the items in half, we find that both halves have equal variances, use this

A

Split-half Reliability
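
As a rough sketch (with an invented response matrix of dichotomous items, 1 = correct, 0 = incorrect), an odd–even split can be scored and the two half-test totals correlated:

```python
# Sketch: split-half reliability for a small set of dichotomous items.
# The response matrix is invented illustration data.

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

responses = [            # rows = people, columns = items
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
]

# Odd-even split: one half scores the odd-numbered items, the other the even.
half_a = [sum(row[0::2]) for row in responses]
half_b = [sum(row[1::2]) for row in responses]

r_half = pearson_r(half_a, half_b)
print(round(r_half, 2))
```

Because this correlation is between two half-length tests, it understates the reliability of the full-length test; the Spearman-Brown formula corrects for this.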

159
Q

allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test

A

Spearman-Brown Formula
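
For two equal halves the correction is r_full = 2r / (1 + r), where r is the half-test correlation. A minimal sketch (the .70 half-test correlation is an invented value):

```python
# Spearman-Brown correction: estimate the reliability of the full-length
# test from the correlation between its two halves.

def spearman_brown(r_half):
    return (2 * r_half) / (1 + r_half)

r_half = 0.70                      # hypothetical half-test correlation
r_full = spearman_brown(r_half)
print(round(r_full, 2))            # 1.4 / 1.7, about 0.82
```

Note that the corrected estimate is always at least as large as the half-test correlation.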

160
Q

• Involves two test administrations
• used when the test measures a variable that is expected to be stable
• Examples: Test-Retest Reliability; Alternate Form Reliability

A

Measures of Stability

161
Q

• An estimate of reliability of a test that is obtained from a measure of inter-item consistency
• Inter-item consistency refers to the degree of correlation among all the items of a test, which is calculated from the single administration of a single form of a test
• An index of inter-item consistency helps to assess test homogeneity (i.e., items in the test measure the same trait)
• Examples: Split-Half Reliability, KR20, Coefficient Alpha

A

Measures of Internal Consistency

162
Q

Developed by Cronbach
• By definition, it is the mean of all possible split-half correlations, corrected by the Spearman-Brown formula
• Useful for tests with non-dichotomous items (i.e., Likert format)
• It is the preferred measure of internal consistency, best used for Personality tests
• Values range from 0 to 1.

A

Coefficient Alpha
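
One common computational form is α = k/(k−1) · (1 − Σ item variances / total-score variance). A minimal sketch, using an invented person × item matrix of Likert-style ratings:

```python
# Sketch: Cronbach's coefficient alpha from a small person x item matrix.
# All numbers are invented illustration data.

def pvariance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

scores = [            # rows = people, columns = items
    [4, 5, 4],
    [2, 3, 2],
    [5, 4, 5],
    [3, 3, 4],
    [1, 2, 1],
]

k = len(scores[0])                                   # number of items
item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
total_var = pvariance([sum(row) for row in scores])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))
```

When the items covary strongly, as here, total-score variance far exceeds the sum of item variances and alpha approaches 1.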

163
Q
  • Refers to the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
  • This is applied to tests whose scoring system is based on ratings given by judges.
A

Measures of Inter-Scorer Reliability
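
One of the "several different statistics" for two raters assigning categories is Cohen's kappa, which corrects raw percent agreement for agreement expected by chance. A sketch with invented ratings of ten cases:

```python
# Sketch: Cohen's kappa for two raters who each classified the same ten
# cases as "yes" or "no". The ratings are invented illustration data.
from collections import Counter

rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater2 = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]

n = len(rater1)
p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n

# Chance agreement: product of each rater's marginal proportions,
# summed over categories.
c1, c2 = Counter(rater1), Counter(rater2)
p_chance = sum((c1[cat] / n) * (c2[cat] / n) for cat in c1)

kappa = (p_observed - p_chance) / (1 - p_chance)
print(round(kappa, 2))
```

Here the raters agree on 8 of 10 cases (.80 raw agreement), but because .52 agreement is expected by chance alone, kappa is a more modest .58.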

164
Q

If the particular test measures a trait that is stable, use

A

Test-retest.

165
Q

If we are still concerned with trait stability, but we would like to control carryover and practice effects, and if a particular test has more than one form, use

A

Alternate-Form.

166
Q

If we would like to determine the homogeneity of a measure, use

A

Measures of Internal Consistency.

167
Q

If we are assessing interview scores (i.e., more than one interviewer) or behavioral observation scores,

A

use Inter-Scorer Reliability.

168
Q

Measures what it claims to measure. It defines the meaning of test scores

A

Validity (Gregory, 2011)

169
Q

The validity of a test concerns what the test measures and how well it does so

A

Anastasi, 1996

170
Q

Validity can be defined as the agreement between a test score or measure and the quality it is believed to measure

A

Kaplan & Saccuzzo, 2011

171
Q

Judgment based on evidence about the appropriateness of inferences drawn from test scores

A

Validity (Cohen and Swerdlik, 2018).

172
Q

is always a matter of degree. Tests may be useful or defensible for some purposes and populations, but less so for others.

A

Validity

173
Q

● Performance on the test (test score), and
● Other independently observable facts about the behavior characteristics under consideration
● Validation is the process of gathering and evaluating evidence about validity.
● Both the test developer and the test user may play a role in the validation of a test for a specific purpose.

A

How to determine validity?

174
Q

relates more to what a test appears to measure to the person being tested than to what the test actually measures.
- a test’s lack of face validity could contribute to a lack of confidence in the perceived effectiveness of the test with a consequential decrease in the test-taker’s cooperation or motivation to do his or her best.

A

Face validity

175
Q

is a measure of validity obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures.

A

Criterion-related Validity

176
Q

is a measure of validity that is arrived at by executing a comprehensive analysis of how scores on the test relate to other test scores and measures.

A

Construct Validity

177
Q
  • is a measure of validity based on an evaluation of the subjects, topics, or content covered by the items in the test.
A

Content Validity

178
Q
  1. Relevant (pertinent/applicable to the matter at hand; appropriate for the test under investigation).
  2. Valid for the purpose for which it is being used.
A

Characteristics of a Good Criterion

179
Q
  1. Relevant (pertinent/applicable to the matter at hand; appropriate for the test under investigation).
  2. Valid for the purpose for which it is being used.
  3. Uncontaminated
A

Characteristics of a Good Criterion

180
Q

relationship between test scores and an external criterion that is measured at approximately the same time.
Example: An arithmetic achievement test would possess concurrent validity if its scores could be used to predict, with reasonable accuracy, the current standing of students in a mathematics course.

A

Concurrent Validity

181
Q

relationship between test scores and an external criterion that is measured somewhat later.
Example: An employment test can be validated against supervisor ratings after six months on the job

A

Predictive Validity

182
Q

is the extent to which a particular trait, behavior, characteristic, or attribute exists in the population (expressed as a proportion)

A

Base rate

183
Q

refers to the proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute.

A

Hit rate

184
Q

is the proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute.

A

Miss rate

185
Q

is a miss wherein the test predicted that the test-taker did not possess the particular characteristic or attribute being measured.

A

False negative
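
The four quantities above can be tied together in a small worked example. The counts below are invented, and "hit" is taken here in the broad decision-theory sense of any correct classification (true positives plus true negatives):

```python
# Sketch: base rate, hit rate, and miss rate from hypothetical screening
# results for 200 people. All counts are invented illustration data.

true_positive  = 40    # test says "has trait"; person truly has it
false_positive = 10    # test says "has trait"; person does not
true_negative  = 130   # test says "no trait"; person truly does not
false_negative = 20    # test says "no trait"; person has it (a miss)

total = true_positive + false_positive + true_negative + false_negative

# Proportion of the population that truly has the trait.
base_rate = (true_positive + false_negative) / total

# Proportion of people the test classifies correctly.
hit_rate = (true_positive + true_negative) / total

# Proportion the test misclassifies; note miss_rate == 1 - hit_rate.
miss_rate = (false_positive + false_negative) / total

print(base_rate, hit_rate, miss_rate)
```

With these counts the base rate is .30, the hit rate .85, and the miss rate .15.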

186
Q

● is a judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct

A

Construct Validity(Cohen and Swerdlik, 2018).

187
Q

It is the extent to which the test may be said to measure a theoretical construct or trait

A

Construct validity (Anastasi, 1996).

188
Q

is an informed, scientific idea developed or hypothesized to describe or explain behavior.

A

construct

189
Q

● The researcher investigating a test’s construct validity must formulate hypotheses about the expected behavior of high scorers and low scorers on the test.
● These hypotheses give rise to a tentative theory about the nature of the construct the test was designed to measure.

A

Performing Construct Validation

190
Q

What are the types of Evidence of Construct Validity?

A
  1. Evidence of Homogeneity
  2. Evidence of Changes with Age
  3. Evidence of pre-test post-test changes
  4. Evidence from Distinct Groups
  5. Convergent and Divergent/Discriminant Evidence
191
Q

● Homogeneity refers to the extent to which a test uniformly measures a single concept.

A

Evidence of Homogeneity

192
Q

● Some constructs are expected to change over time.

A

Evidence of Changes with Age

193
Q

● Evidence that test scores change as a result of some experience between a pretest and a posttest can be evidence of construct validity.

A

Evidence of pre-test post-test changes

194
Q
  • Also referred to as the method of contrasted groups.
A

Evidence from Distinct Groups

195
Q
  • Correlation values computed should be only moderate. High or very high correlations can suggest that the new test is a needless duplication of an already existing valid test
A

Convergent and Divergent/Discriminant Evidence

196
Q

● Choice of appropriate items
● Drawing of test specifications
● It should show the content areas or topics to be covered, the instructional processes or objectives to be tested, and the relative importance of individual topics and processes
● It should indicate the number of items of each kind to be prepared for each topic.

A

Doing Content Validation

197
Q

a plan regarding the types of information to be covered by the items.

A

Test Blueprint

198
Q

For psychometricians, bias is a factor inherent in a test that systematically prevents accurate, impartial measurement (Cohen and Swerdlik, 2018)

A

Test Bias

199
Q

When do we say that test bias exists?
- When test items are ______ than for another.

A

Easier for one group of people

200
Q

refers to whether a test accurately measures what it was designed to measure.

A

Construct Validity Bias

201
Q

occurs when the content of a test is comparatively more difficult for one group than for others. It can occur when questions are worded in ways that are unfamiliar to certain groups because of linguistic or cultural differences.

A

Content Validity Bias

202
Q

(or bias in criterion-related validity) refers to a test’s accuracy in predicting how well a certain student group will perform in the future.

A

Predictive validity bias

203
Q

is not demographically or culturally representative of the intended test takers, test items may reflect inadvertent bias.

A

test developer

204
Q

may be biased if the “norming process” does not include representative samples of all the tested subgroups.

A

Norm-referenced tests

205
Q

Certain test formats may have an inherent bias toward some groups, at the expense of others.
The choice of language in test questions can introduce bias.
Tests may be considered biased if they include references to cultural details that are not familiar to particular groups.

A

Other factors that can give rise to test bias

206
Q

is a numerical or verbal judgment (or both) that places a person or an attribute along a continuum identified by a scale of numerical or word descriptors known as a rating scale.

A

Rating

207
Q

is a judgment resulting from the intentional or unintentional misuse of a rating scale.

A

Rating error

208
Q

What are the Examples of Rating Errors? (Cohen and Swerdlik, 2018)

A
  1. Leniency/Generosity Error
  2. Severity Error
  3. Central Tendency Error
  4. Halo Effect
209
Q

an error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading.

A

Leniency/Generosity Error

210
Q

an error in rating wherein the rater becomes overly strict and gives low ratings.

A

Severity Error

211
Q

rater is reluctant to give extremely high or low ratings; ratings cluster at the middle of the continuum

A

Central Tendency Error

212
Q

the tendency for a rater to give a particular ratee a higher rating than he or she objectively deserves because the rater fails to discriminate among conceptually distinct and potentially independent aspects of a ratee’s behavior.

A

Halo Effect

213
Q

For them, a fair test is one that is used in an impartial, just, and equitable way

A

Test Fairness (Cohen and Swerdlik, 2018).

214
Q

It is a social, philosophical, or perhaps legal term that represents one’s value judgment

A

Test Fairness (Furr and Bacharach, 2014)

215
Q

Given the fact that test results continue to be widely used when making important decisions, test developers and experts have identified a number of strategies that can reduce, if not eliminate, test bias and unfairness

A

Can Test Bias and (lack of) Test Fairness be Avoided?