Psychometric Testing Flashcards (Statistics 2018)

Flashcards in Psychometric Testing Deck (16):

What is a Psychometric Test?

A test is a standardized procedure for sampling behavior and describing it using scores or categories.
– Most tests are predictive of some non-test behavior of interest

– Most tests are norm-referenced: they describe the behavior in terms of norms, i.e. test results gathered from a large group of subjects (the standardization sample).

– Some tests are criterion-referenced: the objective is to see whether the subject can attain some pre-specified criterion.
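A minimal Python sketch of the two interpretations, assuming a raw score is either compared with a standardization sample (expressed as a z-score) or checked against a fixed pass mark. The sample values and the cutoff of 20 are made up for illustration.

```python
from statistics import mean, stdev

def norm_referenced(score: float, norm_sample: list[float]) -> float:
    """z-score: distance from the standardization-sample mean in SD units."""
    return (score - mean(norm_sample)) / stdev(norm_sample)

def criterion_referenced(score: float, cutoff: float) -> bool:
    """True if the score reaches the pre-specified criterion."""
    return score >= cutoff

# Made-up standardization sample and criterion, for illustration only.
norms = [12, 15, 18, 20, 22, 25, 28]
print(round(norm_referenced(24, norms), 2))   # relative standing
print(criterion_referenced(24, cutoff=20))    # pass/fail against the criterion
```

The same raw score of 24 is read two ways: relative to other people (norm-referenced) or relative to a fixed standard (criterion-referenced).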


What are issues of importance in Testing?

• Categories of tests: different classification methods
–Content vs. non-content
• Uses and users of tests
• Assumptions and questions behind the use of tests
• Creating a test from scratch


What are two categories of Psychometric Testing?

1. Content tests: these include tests of mental abilities, personality traits, achievement, and interests/attitudes.

2. Non-content Tests:

Paper and pencil vs. performance
–Respondent selects between predefined answers
–Examinee performs some action and is judged on it.

Speed vs. power
–Speed tests are purely interested in speed
–Power tests probe the limits of knowledge or ability; no time limit is imposed
–Usually both are tested at the same time.

Individual vs. group testing

Maximum vs. typical performance
–Ability tests usually want to know about best (maximum) performance
–Personality tests ask about typical behavior, e.g. how extroverted are you typically?

Norm-referenced vs. criterion-referenced performance
–Norm-referenced: only relative performance is considered
–Criterion-referenced: how well did you do relative to predefined criteria?

Implicit vs. explicit tests (implicit measures include association tests and the Stroop test)


What 3 issues should you be concerned about with Psychometric Testing?

• How the test was developed
• Reliability
• Validity


How do you ensure a test is reliable when constructing it?

–All aspects of the construct need to be covered: for anxiety, for example, all the different facets of the construct should be considered

–The test needs to be long enough to be reliable: start with around 30 items and reduce to about 20

–It should only assess one trait
–It should be culturally neutral
–Items should not simply be the same item rephrased (mentioned during factor analysis, FA)


How do you establish item suitability? 5 answers.

1: There should not be too many items that are either very easy or very hard: having more than 10% of items with mean item scores below .2 or above .8 is questionable.

2: Items should have an acceptable standard deviation. If it is too low, the item is not tapping into individual differences.

3: If the test covers different constructs, it is important that an equal number of items refers to each construct.

4: Criterion keying: choosing items based on their ability to differentiate groups, e.g. successful surgeons, pilots, academics, etc.

–Atheoretical
–Groups must be well defined
–Interpret liberally, since there will be overlap in response distributions

5: By FA –items that have a low loading (
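The first two checks (item difficulty and item standard deviation) can be sketched in Python as follows, assuming items are scored 0/1 so that an item's mean is the proportion of respondents who got it right. The response matrix is made up; the .2/.8 and 10% cut-offs are the rules of thumb from this card.

```python
from statistics import mean, pstdev

# Made-up 0/1 responses: one row per respondent, one column per item.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]

def item_columns(data):
    """Transpose respondent rows into per-item columns."""
    return [list(col) for col in zip(*data)]

def difficulty(data):
    """Mean score per item (proportion correct for 0/1 items)."""
    return [mean(col) for col in item_columns(data)]

def item_sds(data):
    """Population standard deviation of each item."""
    return [pstdev(col) for col in item_columns(data)]

def flag_extreme(data, low=0.2, high=0.8):
    """Indices of items that are very hard (< low) or very easy (> high)."""
    return [i for i, p in enumerate(difficulty(data)) if p < low or p > high]

extreme = flag_extreme(responses)
share_extreme = len(extreme) / len(responses[0])
print("difficulty:", difficulty(responses))
print("extreme items:", extreme, "questionable set:", share_extreme > 0.10)
print("item SDs:", item_sds(responses))
```

In this toy data everyone answers item 0 correctly, so it is flagged as too easy and its standard deviation is 0: it says nothing about individual differences.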


What is Cronbach's Alpha?

Cronbach's alpha is a measure of internal consistency. It is influenced by the average correlation between the items and the number of items in the test.

• Boosted by asking the ‘same’ question twice

• Test should not be used if alpha is below .7
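A minimal Python sketch of the usual formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), using made-up Likert responses. It also duplicates one item to illustrate how asking the "same" question twice boosts alpha.

```python
from statistics import variance

def cronbach_alpha(data):
    """data: one list of item scores per respondent."""
    k = len(data[0])                                  # number of items
    items = list(zip(*data))                          # per-item columns
    item_var_sum = sum(variance(col) for col in items)
    totals = [sum(row) for row in data]               # each respondent's total score
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Made-up responses: 5 respondents x 3 items on a 1-5 scale.
scores = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 4, 5],
    [1, 2, 1],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3), "usable" if alpha >= 0.7 else "below .7")

# Repeat item 0 as an extra item: alpha goes up without any new information.
scores_dup = [row + [row[0]] for row in scores]
print(round(cronbach_alpha(scores_dup), 3))
```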


What causes bias in Psychometric Testing?

• Response bias (acquiescence)
–Individuals are more likely to agree than disagree (Cronbach, 1946)
–This does not cause a problem if everyone behaves in the same manner: standard scores will be unaffected
–But there are considerable individual differences in acquiescence, so it can cause a major problem. Changing the polarity of some items (reverse keying) removes this difficulty.

• Social desirability
–Counteracted by lie scales and consistency measures

• Expectation
• Anxiety
• Test-specific practice
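Changing polarity can be sketched in Python as reverse scoring the negatively worded items before summing. This assumes a hypothetical 1-5 agreement scale; which items are reverse keyed is made up for illustration.

```python
# Hypothetical 5-point agreement scale: 1 = strongly disagree ... 5 = strongly agree.
SCALE_MIN, SCALE_MAX = 1, 5

def reverse(score):
    """Flip a response on a reverse-keyed item (5 -> 1, 4 -> 2, ...)."""
    return SCALE_MIN + SCALE_MAX - score

def total_score(responses, reverse_keyed):
    """Sum item scores, reversing the items whose indices are in reverse_keyed."""
    return sum(reverse(s) if i in reverse_keyed else s
               for i, s in enumerate(responses))

# A respondent who agrees with every statement (pure acquiescence).
yea_sayer = [5, 5, 5, 5]
print(total_score(yea_sayer, reverse_keyed=set()))   # all items positively worded
print(total_score(yea_sayer, reverse_keyed={1, 3}))  # half the items reverse keyed
```

With all items worded the same way, agreeing with everything produces the maximum score (20); with half the items reverse keyed, the same response style lands at the scale midpoint (12) instead of inflating the trait score.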


What is Validity?

Validity simply means that a test or instrument is accurately measuring what it’s supposed to.

• Validity with respect to the IV (independent variable):

–Are we truly manipulating what we think we are manipulating?
–This often relies on the construct of interest being adequately described
–E.g., how do you manipulate something like the unconscious?

• Validity with respect to the DV (dependent variable):
–The extent to which you are measuring what you claim to measure.


What is Reliability?

Reliability is a measure of the stability or consistency of test scores. You can also think of it as the extent to which a test or research finding is repeatable.


What are the different types of validity? (6 types)

1: Content validity.

2: Criterion-related validity (convergent/predictive and divergent/concurrent are its two types).

3: Construct Validity.

4: Internal Validity

5: External Validity.

6: Face Validity.


What is Content Validity?

Does the scale measure the full breadth of the concept? There is a certain amount of subjective and expert opinion involved in judging this.


What is criterion validity? What are its two subtypes?

How does the scale measure up to some already validated measure?

Convergent/Predictive validity: Does the scale give results similar to those of other measures of the same concept?

Divergent/Concurrent validity: Does the scale give results different from questions that are supposed to measure different concepts?


What is Construct Validity?

The extent to which the test measures the theoretical construct that it is supposed to measure.


What is Face Validity?

Does the scale appear, on the surface, to measure what it is supposed to measure?


What does Cronbach Alpha measure?

Reliability. It measures scale reliability/internal consistency.

You want it to be higher than .7. Ideally it should be above .8, around .90 to .95.

But it is easily influenced by the number of questions: redundant questions may increase it.
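The dependence on test length can be seen from the standardized form of alpha, k * r / (1 + (k - 1) * r), where k is the number of items and r the average inter-item correlation. A minimal Python sketch, assuming an average inter-item correlation of 0.2 for illustration:

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha for k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Same modest average inter-item correlation, different test lengths.
for k in (5, 10, 20, 30):
    print(k, round(standardized_alpha(k, 0.2), 2))  # 0.56, 0.71, 0.83, 0.88
```

With the items no more strongly related to each other, alpha climbs from about .56 at 5 items to about .88 at 30 items, so a longer test can pass the .7 threshold on length alone.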