ECPY 767 (Final) -- Class Notes Flashcards

if one level of IV is assigned via randomization

considered experimental -- as a consumer, be keenly aware of which level is randomized -- e.g., in a 2 x 2 factorial design, randomized on tx but not on gender


any time you administer the same measure to the same participants (to collect data on the DV) at a later date (at any point -- could be 5 minutes apart)

Repeated measures design


subsumed within repeated measures -- generally speaking, typically more than 20 observations (e.g., clinical trials); statistical methods no longer require 20 to hold (at least 10 may suffice) -- observations tend to occur at fixed intervals, depending on what you're collecting data on -- logarithmic data collection: the time period between observations keeps getting bigger (exponentially)

Time-series design


Difference between repeated measures and time series design

number of observations (collected data on DVs)


In non-equivalent group design one should

control with pre-test


Threats that concern ______ relate to statistical conclusion validity

integrity of treatment itself


Threats that concern ______ relate to internal validity

making comparisons between tx groups


What does randomization do?

"Ensure" that participant groups are equal prior to treatment


What does randomization not do?

Ensure anything that happens after treatment -- possibility of history effects


Important issues to look for in time series designs

* change in intercept (level)
* slope
* stability of effect (continuous or discontinuous effect)
* delayed vs immediate effect (instantaneous vs delayed -- when you see effects taking place)
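
The change-in-level and change-in-slope ideas above can be sketched as a segmented (interrupted time-series) regression. The data here are simulated and purely illustrative:

```python
import numpy as np

# Segmented (interrupted time-series) regression on simulated data:
# 10 baseline observations, then 10 post-intervention observations.
t = np.arange(20).astype(float)              # time index 0..19
phase = (t >= 10).astype(float)              # 0 = baseline, 1 = post-intervention
time_since = np.where(t >= 10, t - 10, 0.0)  # time elapsed since intervention

rng = np.random.default_rng(0)
y = 5 + 0.1 * t + 3.0 * phase + 0.5 * time_since + rng.normal(0, 0.5, 20)

# y = b0 + b1*t + b2*phase + b3*time_since + e
X = np.column_stack([np.ones(20), t, phase, time_since])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b)  # b[2]: change in level (intercept); b[3]: change in slope
```

A delayed effect would show up as a near-zero level change (b[2]) with a nonzero slope change (b[3]) only later in the series.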


Standard error is based on

a sample of samples (i.e., the sampling distribution)


What type of research is meta-analysis?

Ex-post facto


Assessment is defined as

* Overarching, sampling behavior
* In contrast to research when we assess people


Measurement is defined as

* Establishing quantitative rules for assigning numbers to represent attributes of persons
* Attributes of people, not to people
* Distinction between observations and inferences
* Must think: How representative is this of behaviors outside of this context?


Test, scales, and measures are defined as

Objective, quantitative measurement using standardized procedures; psychometric properties of scores essential


What are Rating protocols?

Taxonomies, classification and rating systems done by an observer (usually)


What is Evaluation?

Assessing the congruency between what is expected and what actually occurs (ranges from formal to informal; may be quantitative) (Chen, 1990)


What is Clinical assessment?

Less formal, typically not fully standardized or quantitative


What is a Scale?

* often used interchangeably (not always) with measure, questionnaire or test
* Some say questionnaire is less formal
* Assumed to be assessing a single construct or domain


What is a construct?

Trait, domain, ability, latent variable, theta (θ)


What is theta?

Item Response Theory (IRT) uses this to talk about the construct itself -- latent variable


What are the differing types of item responses?

1. Dichotomous
2. Polytomous
3. Graded responses


What are dichotomous item responses?

Two levels (true or false, yes or no, etc.)


What are polytomous item responses?

Three or more levels, often ordered but not always


What are graded item responses?

More than 2 ordered response options
All graded responses are polytomous but not all polytomous items are graded items


What is Classical Test Theory (CTT)?

* Total observed-score variance partitioned into true score variance vs error score variance
* Partitioning variance


What is Modern Test Theory = Item Response Theory (IRT)?

Has to do with probability
* What's the probability that someone will respond in a certain way?
* Not only assigning where the individual is on the construct


What is Standard Error of Measurement (SEM or SEm) ?

* Estimate of extent to which an observed score deviates from true score
* Create confidence interval
* Probability that an individual's true score lies within a range
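
A quick illustration of the SEM formula (SEM = SD × √(1 − reliability)) and the confidence interval it creates, using hypothetical values (SD = 15, reliability = .90, observed score = 110):

```python
import math

# SEM = SD * sqrt(1 - reliability).  SD and reliability below are hypothetical.
sd = 15.0      # standard deviation of observed scores
r_xx = 0.90    # reliability estimate of the scores

sem = sd * math.sqrt(1 - r_xx)

# 95% confidence interval around a hypothetical observed score of 110
observed = 110.0
lo, hi = observed - 1.96 * sem, observed + 1.96 * sem
print(round(sem, 2), round(lo, 1), round(hi, 1))  # 4.74 100.7 119.3
```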


What is reliability?

* How consistently does a scale measure what it is designed to measure?


What are types of reliability?

1. Test-retest
2. Parallel forms
3. Split halves
4. Internal consistency
5. Inter-rater


Parallel forms

* Administer two different but equivalent forms and see how they correlate



Split halves

* See how one half of the test correlates with the other half within each participant
* But need twice as many items (think statistical power and N)
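
A sketch of the split-half procedure on hypothetical item data; the Spearman–Brown correction (not named in the notes, but the standard fix) steps the half-length correlation back up to full test length, which is what the "twice as many items" point is about:

```python
import numpy as np

# Split-half reliability on a hypothetical 6-item test, 5 respondents,
# items scored 0/1.  Rows = people, columns = items.
scores = np.array([
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
])

# Odd-even split: total the odd-numbered vs even-numbered items per person.
odd = scores[:, 0::2].sum(axis=1)
even = scores[:, 1::2].sum(axis=1)

r_half = np.corrcoef(odd, even)[0, 1]   # correlation between the two halves

# Spearman-Brown correction: project the half-length estimate to full length.
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))  # 0.943 0.971
```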


Three types of validity

1. Content
2. Criterion
3. Construct


What is content validity?

* Extent to which items are representative of or sampled from content domain being measured
* Most important for classroom tests


What are limitations of content validity?

* Often non-statistical approach is used -- ask panel of experts
* Often relies upon face validity - does this look like it tests what it's supposed to test


What are two types of criterion validity?

1. Predictive
2. Concurrent


What is predictive validity?

Before and after -- look at outcomes


What is concurrent validity?

Measure all items at same time


What are limitations of concurrent validity?

* Often use 1-item scales
* Lack validity and reliability
* Extremely limited variance


What is construct validity?

* How well scores measure a specific trait or construct
* Requires a priori specification and operationalization of the construct


Steps in developing CTT test construction

1. Qualitative research
2. Explicate theorizing
3. Define constructs
4. Generate item pool
5. Content validity study
6. Derivation sample
7. Cross-validation study
8. Subsequent studies testing evidence of construct validity


What are limitations of CTT?

1. Cannot change anything about the scale
2. Standard error of measurement is presumed constant across levels of the construct, items, and scores
3. Statistics are sample dependent


What are main characteristics of IRT?

1. Probabilistic
2. Statistical mathematical theory
3. Scaling items as well as people on latent trait
4. Wide variety of IRT models (dichotomous--more common-- and polytomous)
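
A minimal sketch of the probability framing for a dichotomous item, using the common two-parameter logistic (2PL) model; the discrimination and difficulty values here are hypothetical:

```python
import math

# Two-parameter logistic (2PL) IRT model:
#   P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))
# a = item discrimination, b = item difficulty (values here are hypothetical).
def p_correct(theta, a=1.5, b=0.0):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A person located exactly at the item's difficulty has a 50% chance of a
# correct response; the probability rises as theta moves above the difficulty.
print(p_correct(0.0))              # 0.5
print(round(p_correct(1.0), 3))    # 0.818
print(round(p_correct(-1.0), 3))   # 0.182
```

This is the sense in which IRT scales items (via a and b) as well as people (via theta) on the same latent trait.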


What are basic assumptions of IRT?

1. Well-known constructs and well-established tests
a. know dimensionality
b. know validity
c. know what's correct and incorrect
2. extensive item banks and data banks
3. large samples (>3000)


In contrast to assumptions of IRT, what is typical of CPY instrumentation?

1. New scales
2. Constructs are not well known or defined
a. confounds
b. unknown dimensionality
c. ordered rating scale data (Likert)
3. non-existent item pool; no data banks
4. small sample sizes (in best cases n = ~300-400)


What is measurement invariance?

Cross-cultural applications of tests, scales, and measures


4 questions to ask regarding measurement invariance

To what extent...
1. Can a construct be conceptualized equivalently across cultures?
2. is the same construct being measured equivalently across cultures?
3. can mean scores be compared equivalently across cultures?
4. can measures of association (correlation) be compared equivalently across cultures?)


Multiple regression

* Find the plane (hyperplane) that best fits the data, rather than a single line
* Linear composite of IVs to best explain the DV
* The combination of weights on the IVs constitutes the linear composite


Confirmatory Factor Analysis (CFA)

* forces the data into this model then assess how well the model fits the data
* Also assess how the scales (factors) correlate with one another (orthogonal = uncorrelated; oblique = allows correlation)


Exploratory Factor Analysis (EFA)

* Misapplication of statistical procedure for measurement development
* Analogous to doing "atheoretical" research
* No constraints on data
* See how many factors come out ("semi-magical")
* Interpret the factors that come out based on the data they are derived from
* Do a CFA because we have implicit theorizing


Things to look for in articles relating to measurement

1. Look for ceiling/floor effects (determine the possible range for the scale or subscale: how many items, and what rating scale? If the mean ± 1 SD extends past the scale limits, scores are highly skewed -- you no longer have adequate prediction of the probability of error with skewed scores)
2. Type 1 and Type 2 error rates are not protected
* "I adapted this test"
* Changed the rating scale, items, instructions, order, etc. -- changed anything
* Computing Cronbach's alpha is insufficient to protect against these threats
* Different rating scales for same scale/sub-scale
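
Cronbach's alpha, mentioned above, can be computed directly from a respondents-by-items matrix; the data here are hypothetical:

```python
import numpy as np

# Cronbach's alpha from a respondents-by-items matrix:
#   alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total score))
def cronbach_alpha(items):
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 4-item rating-scale responses for 6 respondents
data = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 4, 4],
    [1, 2, 1, 2],
]
alpha_val = cronbach_alpha(data)
print(round(alpha_val, 3))  # 0.964
```

Alpha alone says nothing about dimensionality, score skew, or whether an adapted scale still measures the same construct -- which is the point of the card above.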


Analysis of variance and multiple regression

are identical!


Multiple regression =

1 DV, always univariate


Multivariate =

multiple IVs and multiple DVs


Multiple regression in linear model

y = a + b1x1 + b2x2 + ... + bkxk + e
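
The model can be fit by ordinary least squares; a minimal sketch with simulated data and two predictors:

```python
import numpy as np

# Ordinary least squares fit of y = a + b1*x1 + b2*x2 + e on simulated data.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])    # design matrix with intercept
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # close to the generating weights [2.0, 1.5, -0.5]
```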



cannot interpret the overlap of explained variance between variables in multiple regression


Part correlation

unique variance explained by one predictor in the model, controlling for other predictors in the model
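
A sketch of the part (semipartial) correlation: regress the focal predictor on the other predictor, keep the residual, and correlate that residual with y. The data are simulated and illustrative only:

```python
import numpy as np

# Part (semipartial) correlation of x1 with y: remove x2 from x1 only,
# then correlate y with the residualized x1.
def part_correlation(y, x1, x2):
    X = np.column_stack([np.ones_like(x2), x2])
    b, *_ = np.linalg.lstsq(X, x1, rcond=None)
    resid = x1 - X @ b          # the part of x1 unrelated to x2
    return np.corrcoef(y, resid)[0, 1]

# Simulated data in which x1 and x2 overlap
rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(0, 0.8, n)
y = x1 + x2 + rng.normal(0, 1, n)

r_part = part_correlation(y, x1, x2)
r_zero = np.corrcoef(y, x1)[0, 1]   # zero-order correlation for comparison
print(round(r_part, 3), round(r_zero, 3))
```

The part correlation is smaller than the zero-order correlation here because x1's overlap with x2 has been stripped out -- that overlap is exactly the uninterpretable shared variance.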


If summing all zero order correlations and the value exceeds 1.0...

by definition you have profound multicollinearity (x1 and x2 are highly correlated)



Moderation

* Answers: for whom or when does this relation apply?
* Affects magnitude and/or strength of the relation
* Easier to think about as high/low, but better to use continuous


Interactions refer to moderation or mediation effects?

Moderation



If interaction term is significant...

main effects cannot be interpreted in isolation (each becomes a conditional effect)
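
One way to see why a significant interaction changes how "main effects" read: with an x·m term in the model, the coefficient on x is the effect of x when m = 0, not an average effect. The data below are simulated and illustrative only:

```python
import numpy as np

# With an x*m interaction in the model, the coefficient on x is the effect of
# x when m = 0 -- not a stand-alone "main effect".
rng = np.random.default_rng(7)
n = 1000
x = rng.normal(size=n)
m = rng.normal(loc=2.0, size=n)     # moderator whose mean is 2, not 0
y = 1.0 + 0.5 * x + 0.3 * m + 0.8 * x * m + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x, m, x * m])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b[1] recovers ~0.5: the effect of x at m = 0.

# Mean-centering m moves the x coefficient to the effect of x at the mean of m
# (~0.5 + 0.8 * 2 = 2.1), showing that the "main effect" is conditional.
mc = m - m.mean()
Xc = np.column_stack([np.ones(n), x, mc, x * mc])
bc, *_ = np.linalg.lstsq(Xc, y, rcond=None)
print(round(b[1], 2), round(bc[1], 2))
```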



Mediation

* How or why a relation exists


Full mediation

Mediator variable completely accounts for relation between IV and DV


Partial mediation

Explains much, but not all, of the relation between IV and DV (the direct IV-to-DV path still exists but is closer to non-significance)



Three ways to control variance in a design

1. Maximize experimental variance
2. Minimize error variance
3. Control extraneous variance



MST = SST/dft

Mean square variance (want to increase)


MSE = SSE/dfe

Mean square error (want to decrease)


Maximize experimental variance by

1. Ensure maximum variability in Y due to X
2. Make treatments as different as possible, but realistic


Control extraneous variance a priori by

1. Homogenize on the confounding variable (restriction of range)
2. Match participants on all relevant conditions
3. Randomly assign participants to treatment conditions


Control extraneous variance a priori or post hoc by

1. Build a blocking variable into the design to control the confound
2. Covary any confounding variable (analysis of covariance)


Minimize error variance by

1. Block on any variable that is related to the DV but not related to the IV
2. Covary on any variable that is related to the DV but not related to the IV
3. Maximize the reliability of the measures used (r_tt)
4. Increase the sample size (error and statistical power are a function of N)
5. Use repeated measures designs instead of between groups designs