Chapter 1 Flashcards
Deductive
Have a theory of the world and how the world works. Theory First
Inductive
Observe the world first. Data First.
Cause
The producer of an effect, result, or consequence. Often called INUS conditions [an insufficient but non-redundant part of an unnecessary but sufficient condition]
We cannot directly observe causal effects. Causation can only be inferred never known.
Causation requires correlation and counterfactual dependence.
Effect
The consequence of a cause.
Best understood through a counterfactual model. A counterfactual is something contrary to a fact. In an experiment we observe what did happen when people receive the treatment. A counterfactual is what would have happened if they did not receive the treatment. The effect is the difference between the two.
Experiment
Is a test under controlled conditions, randomly assigned, made to demonstrate a known truth, examine the validity of a hypothesis or determine the efficacy of something - to discover the effects of presumed causes. Experiments require that we have manipulatable variables.
Manipulate levels of an IV (treatments) to observe its effects.
Randomized Experiment: Assign cases to levels of the treatment by some random process such as using a random number table or generator.
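One such random process can be sketched in Python; the function name, participant labels, and condition labels below are illustrative, not part of any particular study.

```python
import random

def randomize(units, conditions, seed=None):
    """Randomly assign each unit to a condition by shuffling a
    balanced list of condition labels (equal group sizes when
    len(units) is a multiple of len(conditions))."""
    rng = random.Random(seed)
    labels = [conditions[i % len(conditions)] for i in range(len(units))]
    rng.shuffle(labels)
    return dict(zip(units, labels))

# Hypothetical example: four participants, two conditions.
assignment = randomize(["p1", "p2", "p3", "p4"], ["treatment", "control"], seed=42)
```

Because assignment depends only on the random shuffle, no characteristic of the units can systematically influence which condition they end up in.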
Control Variables
Control variables are independent variables that are not part of the research study but whose influence cannot be ignored. Hence, in SPSS, control variables constitute the first block of a hierarchical regression, followed by the regular independent variables in the other block.
Moderator Variables
A moderator variable is one that influences the strength of a relationship between two other variables
Moderator variables are those variables which act like a catalyst in a regression relationship. They interact with the independent variables either to bring down or enhance the relationship between the independent and dependent variables.
One which determines the conditions under which a described causal relationship holds (increasing the frequency of broadcast of car commercials in which the dealer himself appears increases his sales among low-income prospective buyers but not among high-income prospective buyers). Effect of the IV depends on the value of the ModV
Mediator Variables
A mediator variable is one that explains the relationship between the two other variables.
A link in the causal chain between IV and DV. For example, educational attainment might be called a mediator variable between gender and job category. A variable is a strong mediator if it has a strong association with both IV and DV, but the relationship between the IV and the DV reduces to zero when the MedV is entered into the relationship model
Causal Description
Flicking the light switch on and illuminating the room.
Causal description is identifying that a causal relationship exists between A and B
Molar Causation
Molar causal conditions are characterized in terms of large and often complex objects.
Enables descriptive causation
Thus, for example, a researcher who conceptualizes causal conditions at a relatively molar level might propose that wives married to husbands who are high in negative affectivity (a personality trait) will become increasingly dissatisfied with their marriage. This researcher might argue that husbands who are high in negative affectivity are more likely (than other men) to be irritable and critical, and less likely to be affectionate, and that these behavioral tendencies might, over time, create conflict, ultimately decreasing their wives’ satisfaction.
Causal Explanation
Why did the light go on?
Causal explanation is explaining how A causes B
Molecular Causation
Molecular causation is knowing which parts of a treatment are responsible for which parts of an effect.
Enables explanatory causation
Randomized Experiments
- Units are assigned to conditions randomly
- Randomly assigned units are probabilistically equivalent in expectation (if certain conditions are met)
- Under the appropriate conditions, randomized experiments provide unbiased estimates of an effect
Quasi-Experiment
- Shares all features of randomized experiments except assignment
- Assignment to conditions occurs by self-selection
- Greater emphasis on enumerating and ruling out alternative explanations
- Through logic and reasoning, design, and measurement
- HAS a comparison or control group.
Involve comparisons between naturally occurring treatment groups (by self-selection or administrative selection).
Researcher does not control group assignment or treatment, but has control over when/what to observe (DV).
Example might be people who work regular daytime hours vs. the night shift;
Researcher must rely on statistical controls to rule out extraneous variables such as other ways in which the treatment groups differ than the IV of interest.
Search for counterexamples and competing explanations is inherently falsificationist, as is searching for moderators and limiting conditions
Natural Experiment
- Naturally-occurring contrast between a treatment and comparison condition
- Typically concern non-manipulable causes
- Requires constructing a counterfactual rather than manipulating one
Might typically involve before and after designs where you look at a DV of interest before and after some phenomenon that has occurred, for example, tying Presidential approval ratings to revelations about bailed-out bank excesses
Non Experimental Designs
- Often called correlational, descriptive or passive designs (i.e., cross-sectional)
- Statistical controls often used in place of structural design elements
- Generally do not support strong causal inferences
- Does not have a comparison or control group.
Non-experimental designs (correlational studies) are basically cross-sectional studies, correlational in nature, in which the researcher attempts to establish causal influence through measurement and statistical control of competing explanations.
UTOS
Units of analysis [people, schools, molecules…]
Treatments [the variable you are interested in - e.g., gender, learning in higher ed]
Observations [outcomes, dependent and IV variables] made on units
Settings [context] in which the study is conducted
Causal Inference
[1. Well specified theory, causal framework or model for relations among constructs/concepts/variables, 2. Close mapping between constructs as theorized and operationalized, 3. Temporal order, precedence, 4. Demonstrated directional, interactive relations, 5. Ruling out confounding, selection.]
Prediction
Covariation between indicators of relevant constructs. Change in one indicator is associated with change in another; observed change in one reliably precedes change in the other [non-experimental - correlation]; manipulated change in one reliably leads to change in the other [experimental, quasi-experimental, natural].
Confounding Variables
Can relate to selection - e.g., males versus females may gravitate to different activities. Selection as a threat to validity. Location is another example - e.g., students in a university.
Ecological Fallacy
The drawing of inferences about individuals directly from evidence gathered about groups, societies, or nations.
Occurs when you make conclusions about individuals based only on analyses of group data. For instance, assume that you measured the math scores of a particular classroom and found that they had the highest average score in the district. Later (probably at the mall) you run into one of the kids from that class and you think to yourself “she must be a math whiz.” Aha! Fallacy! Just because she comes from the class with the highest average doesn’t mean that she is automatically a high-scorer in math. She could be the lowest math scorer in a class that otherwise consists of math geniuses!
Individualistic Fallacy
The drawing of inferences about groups, societies, or nations directly from evidence gathered about individuals.
It occurs when you reach a group conclusion on the basis of exceptional cases. This is the kind of fallacious reasoning that is at the core of a lot of sexism and racism. The stereotype is of the guy who sees a woman make a driving error and concludes that “women are terrible drivers.” Wrong! Fallacy!
Dependent Variables
The dependent variable is what is affected by the independent variable – your effects or outcomes.
[Observed, outcome, criterion]
Independent Variable
The independent variable is what you (or nature) manipulate - a treatment, program, or cause.
For example, if you are studying the effects of a new educational program on student achievement, the program is the independent variable and your measures of achievement are the dependent ones.
[Causal, True, Latent]
Y [dependent variable - scores] = f(X) [independent variable - treatment]
X explains or predicts Y
Control Variables
Control variables are introduced to reduce the risk of wrongly attributing explanatory power to the independent variable(s) the researcher has selected. Are the relations between the independent and dependent variables spurious? A control variable is the variable used to test the possibility that the relation between the dependent and independent variables is spurious - in other words, that it can be explained by the effect of another variable.
Internal Validity
A variable threatens internal validity if it threatens interpretation of results.
Did in fact the experimental stimulus make some significant difference in this specific instance? Did the independent and dependent variables covary in a causal relationship?
“Could there be an alternative cause, or causes, that explain my observations and results?”
Internal validity only shows that you have evidence to suggest that a program or study had some effect on the observations and results.
Internal validity is possibly the single most important reason for conducting a strong and thorough literature review.
“Is there a causal relationship between variable X and variable Y, regardless of what X and Y are theoretically supposed to represent?”
If a variable is a true independent variable and the statistical conclusion is valid, then internal validity is largely assured.
The concern of internal validity is causal in that we are asking what is responsible for the change in the dependent variable.
Researchers must show that the IV caused the change in behavior and not something else. Results due to the independent variable and not other variables
In an experiment, the researcher tries to eliminate the effects (or control for) extraneous variables - other variables in the study.
If there are extraneous variables, you cannot tell if those variables or the IV (or both) influenced behavior.
Extraneous/Confounding variables that may have influenced results are threats to internal validity
Construct Validity
Construct validity defines how well a test or experiment measures up to its claims. It refers to whether the operational definition of a variable actually reflects the true theoretical meaning of the concept.
The simple way of thinking about it is as a test of generalization, like external validity, but it assesses whether the variable that you are testing for is addressed by the experiment.
For example, you might study whether an educational program increases artistic ability amongst pre-school children. Construct validity is a measure of whether your research actually measures artistic ability, a slightly abstract label.
Construct validity is an assessment of how well you translated your ideas or theories into actual programs or measures.
A test designed to measure depression must measure only that particular construct, not closely related constructs such as anxiety or stress.
Construct validity determines whether the program measured the intended attribute.
“Given there is a valid causal relationship, is the interpretation of the constructs involved in that relationship correct?”
Suppose I am doing a study on the impact of font size and face on usability of Web pages by the elderly. If I conduct a study in which I vary Web page default font size (10 pt, 12 pt, 14 pt, 16 pt) and face (serif, sans-serif) and then measure the time of first page pull to 1 minute after last page pull by a group of people in an assisted living facility, I have two sorts of generalizability concerns.
One, called construct validity, is how do I get from the particular units, treatments, and observations of my study to the more general constructs they represent. That is, is this study useful in answering the question I really want to get at, which is, if we make adaptations to Web pages that take into account the physical limitations associated with aging, will people spend more time on a Web site? Do these specific operationalizations tap the actual constructs (page design, time spent on the site) whose causal relationship we are seeking to understand?
External Validity
A variable threatens external validity if it threatens generalizability of results
“To what populations, settings, and variables can this effect be generalized?”
External validity is related to generalizing.
Do the results of the experiment apply to other people [Population Validity] or to other situations [Ecological Validity]
Threats to external validity are any characteristics of the study that limits the generality of the results.
Statistical Conclusion Validity
Was the use of statistics appropriate for inferring whether the presumed independent and dependent variables covary?
Was the original statistical inference correct?
Not concerned with the causal relationship between variables, but with whether there is any relationship at all, causal or not.
Did the investigators arrive at the correct conclusion regarding whether a relationship between the variables exists, or about the extent of the relationship?
The proper use of statistics to make inferences about:
The nature of the covariation between variables
The strength of that relationship.
Threats to statistical conclusion validity: improper use of statistics to make inferences about the nature of the covariation between variables (e.g., making a type I or type II error) and the strength of that relationship (mistakenly estimating the magnitude of covariation or the degree of confidence we can have in it)
It is recommended that statistical hypothesis test reporting be supplemented with reporting of effect sizes (r2 or partial eta2), power, and confidence intervals around the effect sizes.
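As a sketch of one such effect size, eta2 for a one-way design is the ratio of between-group to total sum of squares; the group scores below are made up purely for illustration.

```python
def eta_squared(groups):
    """Proportion of total variance in the outcome explained by
    group membership: SS_between / SS_total."""
    all_scores = [s for g in groups for s in g]
    grand = sum(all_scores) / len(all_scores)
    ss_total = sum((s - grand) ** 2 for s in all_scores)
    ss_between = sum(
        len(g) * ((sum(g) / len(g)) - grand) ** 2 for g in groups
    )
    return ss_between / ss_total

# Two hypothetical treatment groups of three scores each.
e2 = eta_squared([[4, 5, 6], [7, 8, 9]])
```

A value near 1 means group membership accounts for most of the outcome variance; a statistically significant test can still come with a tiny eta2 in a large sample.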
Threats to Statistical Conclusion Validity: Low Statistical Power
An insufficiently powered experiment may incorrectly conclude that the relationship between treatment and outcome is not significant.
In particular, a small sample size may have insufficient power to detect a real effect even if it is there. As a result, the researcher claims the manipulation had no effect when in fact it does; he just couldn’t pick it up. As well, different statistical tests have varying sensitivity to detect differences.
Power analysis has the purposes of deciding how large a sample size you need to get reliable results, and how much power you have to detect a significant covariation among variables if it in fact exists.
Beyond a certain sample size the law of diminishing returns applies and in fact if a sample is large it can “detect” an effect that is of little real-world significance (i.e., you will obtain statistical significance but the amount of variation in DV explained by IV will be very small).
Example of a low-power problem: failing to reject the null hypothesis when it is false because your sample size is too small. Suppose there is in fact a significant increase in side effects associated with higher doses of a drug, but you did not detect it in your sample of size 40 because your power was too low; doctors will then go ahead and prescribe the higher dose without warning their patients that they could experience an increase in side effects. You could deal with this problem by increasing the sample size and/or setting your alpha level higher than .05, for example .10 or .20.
The power to detect an effect is a complicated product of several interacting factors such as measurement error, size of the predicted effect, sample size, and Type 1 error rate.
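The interplay of effect size, sample size, and alpha can be made concrete with a rough power calculation. This is a simplified sketch for a one-sided two-sample z-test (known variance), not a full power analysis; the effect size and sample sizes are illustrative.

```python
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-sample z-test for a
    standardized effect size d with n_per_group subjects per group."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)            # critical value
    # Expected value of the test statistic under the alternative.
    ncp = d * (n_per_group / 2) ** 0.5
    return 1 - z.cdf(z_alpha - ncp)

# Same smallish effect (d = 0.3), two very different sample sizes.
small = power_two_sample(d=0.3, n_per_group=20)
large = power_two_sample(d=0.3, n_per_group=200)
```

With 20 per group the study would miss this effect most of the time; with 200 per group it would detect it almost always, which is exactly the trade-off the card describes.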
Threats to Statistical Conclusion Validity: Violated Assumptions of Statistical Tests
Violations of statistical test assumptions can lead to either overestimating or underestimating the size and significance of an effect.
Failing to meet the assumptions of the test statistic - for example, that observations within a sample are independent in a t-test - might result in getting significant differences between two samples where the real difference is more attributable to other factors the subjects had in common, such as being from the same neighborhood or SES, rather than the treatment they were exposed to. Other assumptions that can be violated include equality of population variances, interval-level data, normality of the populations with respect to the variable of interest, etc.
Threats to Statistical Conclusion Validity: Fishing and the Error Rate Problem
Repeated tests for significant relationships, if uncorrected for the number of tests, can artificially inflate statistical significance.
Type I Error rate when there are multiple statistical tests. What starts out as .05 with one test becomes a very large probability of rejecting the null hypothesis when it is in fact true with repeated consultations of the table of the underlying distribution (normal table, t, etc.). It’s not the done thing to correlate 20 variables with each other (or to do multiple post-hoc comparisons after an ANOVA) and see what turns up significant, then go back and write your paper about that “relationship”
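The inflation of the Type I error rate across repeated tests follows directly from the probability of at least one false rejection among independent tests; a minimal sketch, with the Bonferroni correction as one standard fix.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error across n_tests
    independent tests, each run at level alpha, with no correction."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha needed to hold the familywise rate at alpha."""
    return alpha / n_tests

# Correlating 20 variable pairs at .05 each, uncorrected:
fwer_20 = familywise_error(0.05, 20)      # well over one-in-two
per_test = bonferroni_alpha(0.05, 20)     # .0025 per test instead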
Threats to Statistical Conclusion Validity: Unreliability of Measurement
Unreliability of Measures: Measurement error weakens the relationship between two variables and strengthens or weakens the relationships among three or more variables.
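The weakening effect of measurement error on a two-variable relationship is captured by Spearman's classical attenuation formula; the reliabilities and true correlation below are made-up values for illustration.

```python
import math

def attenuated_r(r_true, rel_x, rel_y):
    """Expected observed correlation when X and Y are measured with
    reliabilities rel_x and rel_y (Spearman's attenuation formula):
    r_observed = r_true * sqrt(rel_x * rel_y)."""
    return r_true * math.sqrt(rel_x * rel_y)

# A true correlation of .50 measured with two .70-reliability scales:
r_obs = attenuated_r(r_true=0.50, rel_x=0.70, rel_y=0.70)
```

Even moderately noisy measures shrink a .50 true correlation to .35, making a real relationship harder to detect at a given sample size.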
Threats to Statistical Conclusion Validity: Extraneous Variance in the Experimental Setting
Some features of an experimental setting may inflate error, making detection of an effect more difficult.
Threats to Statistical Conclusion Validity: Restriction of Range
Reduced range on a variable usually weakens the relationship between it and another variable. Avoid dichotomizing continuous measures (for example, substituting “tall” and “short” for actual height), and avoid dependent variables whose distribution is highly skewed, with only a few cases at one or the other end of the scale.
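The cost of dichotomizing can be shown with a small simulation: the same outcome correlated once with actual height and once with a median-split "tall/short" indicator. All numbers here are simulated for illustration.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
# Simulated heights and a correlated outcome.
height = [rng.gauss(170, 10) for _ in range(2000)]
outcome = [0.5 * h + rng.gauss(0, 8) for h in height]

r_full = pearson(height, outcome)
# Median-split into "tall" vs "short", discarding within-group variance.
cut = sorted(height)[len(height) // 2]
tall = [1.0 if h >= cut else 0.0 for h in height]
r_dich = pearson(tall, outcome)
```

The correlation with the dichotomized predictor comes out reliably smaller than with the continuous one, even though the underlying relationship is identical.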
Threats to Statistical Conclusion Validity: Unreliability of Treatment Implementation
If a treatment that is intended to be implemented in a standardized manner is implemented only partially for some respondents, effects may be underestimated compared with full implementation. Lack of standardized implementation of the treatment or level of the independent variable (we talked about this before in terms of things like instructions being memorized over time, experimenter effects, etc.) is a threat, unless adaptive application of the treatment is a more valid instantiation of how the treatment would occur in the real world.
Threats to Statistical Conclusion Validity: Heterogeneity of Units
Increased variability on the outcome variable within conditions increases error variance, making detection of a relationship more difficult.
Within-subjects variability: In most analyses that look at effects of treatments, you want your between-treatment variability to be large, in accordance with your research hypothesis; if there is a lot of variability among the subjects within a treatment, that may make it more difficult to detect the predicted effect. There is a trade-off between ensuring subject homogeneity within treatments, which increases power to detect the effect, and a possible loss of external validity.
Threats to Statistical Conclusion Validity: Inaccurate Effect Size Estimation
Some statistics systematically overestimate or underestimate the size of an effect.
Inaccurate effect-size estimation; recall how we talked about how the mean is affected by outliers. Sometimes there are some extreme cases or outliers that can adversely affect and perhaps inflate the estimates of effect sizes (differences on the DV attributable to the treatment or levels of IV)
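The outlier point can be demonstrated in two lines of arithmetic: one extreme case drags the mean far more than the median. The scores below are invented for illustration.

```python
from statistics import mean, median

scores = [10, 11, 9, 10, 12, 10, 11]
with_outlier = scores + [60]   # one extreme case added

# The mean shifts sharply; the median barely moves.
m1, m2 = mean(scores), mean(with_outlier)
md1, md2 = median(scores), median(with_outlier)
```

If a treatment-group mean is inflated this way, the estimated effect size (the mean difference attributable to the treatment) is inflated with it, which is why screening for outliers before estimating effects matters.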
Threats to Internal Validity: Extraneous Effects & History
Are participants exposed to events, other than the treatments, whose effects on their behavior could obscure the effects of the independent variable?
Events that happen to participants during the research which affect results but are not linked to the IV. In an extended study comparing relaxation to no relaxation on headache occurrence, those in the no relaxation condition sought out other means of reducing their headache occurrence (e.g. took more pills).
Any events which intervene between the treatment and the outcome measure. Example; subjects are presented with anti-smoking messages but are allowed a break before completing the post-test and various events happen during their break such as seeing smokers who are/are not attractive role models, etc. More of a problem in studies which assess effects over long periods of time
Environmental variables –
Features of the environment that may influence results.
E.g., Room condition – bright, cheery vs. dark, small.
Threats to Internal Validity: Statistical regression effects (regression to the mean)
Regression toward the mean: the tendency of extreme (very high or very low) scores to fall closer to the mean on re-testing. Could changes in participants’ responses to the measures be caused by this? Regression to the mean occurs because measures are not perfectly correlated with each other. For example: In any given sample the tallest person is not always the heaviest, nor is the lightest person always the shortest.
Regression towards the mean
A group of people who are moderately depressed start a new type of therapy and, when tested later, report fewer depressive symptoms - this could be regression towards the mean.
Likely to be a problem in quasi-experiments when members of the group were selected (self- or administratively-) based on having high or low scores on the DV of interest. Testing on a subsequent occasion may exhibit “regression to the mean” where the once-high scorers score lower, or the once-low scorers score higher, and a treatment effect might appear when there really isn’t one. Having a really high score on something (like weight, cholesterol, blood sugar) etc might be sufficient to motivate a person to self-select into a treatment but the score might fall back to a lower level just naturally or through simply deciding to “get help,” although it could be attributed to the effects of the treatment.
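Because the two test occasions are imperfectly correlated, selection on extreme scores guarantees some fallback on retest even with no treatment at all; a small simulation makes this visible (all scores are simulated, and the cutoff of 65 is arbitrary).

```python
import random

rng = random.Random(1)
# A stable true score plus independent measurement error on each occasion.
true_score = [rng.gauss(50, 10) for _ in range(5000)]
test1 = [t + rng.gauss(0, 8) for t in true_score]
test2 = [t + rng.gauss(0, 8) for t in true_score]

# Select the group that scored extremely high on the first test
# (e.g., the high-depression-score self-selectors).
extreme = [i for i, s in enumerate(test1) if s > 65]
mean1 = sum(test1[i] for i in extreme) / len(extreme)
mean2 = sum(test2[i] for i in extreme) / len(extreme)
# mean2 sits noticeably closer to the population mean of 50,
# despite no treatment having been applied between the tests.
```

A naive before-after comparison in this selected group would show an apparent "improvement" that is pure regression to the mean.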
Threats to Internal Validity: Attrition
Do participants drop out of the groups during the study in a systematic or selective way? This could create differences among groups that would obscure the effects of the independent variable.
More of one type of person may drop out of one of the groups. For example, those less committed, less achievement-oriented, less intelligent.
Selective dropping out of a particular condition or level of the independent variable by people who had the most extreme pre-test scores on the DV, so when they drop out it makes the post-test mean for that condition “look better” and as if that treatment had a stronger effect since its mean would be lower without the extreme people.
Threats to Internal Validity: Interaction of temporal and group composition effects
Could changes in the participants’ behavior over time that are related to pre-existing differences among groups obscure the effects of the independent variable?
(more of a problem in correlational studies than in experiments in which you expose respondents to the treatment and then measure the outcome)
Threats to Internal Validity: Group composition effects (selection)
If different groups are used to compare the effects of treatments, could pre-existing differences among the groups obscure the effects of the independent variable?
Occurs when more of one type of person gets into one group for a study. For example, the people who return your questionnaire may be different, in some important way, to the people who did not return your questionnaire. The students who volunteer for your project might be different to the ones who do not volunteer (for example, more altruistic, more achievement oriented, more intelligent). Do these variables have an effect on the thing you are trying to measure? We usually do not know.
Threats to Internal Validity: maturation; fatigue
Do the participants change with the passage of time in ways unrelated to the effects of the independent variable?
Passage of time may have affected results
Study effects of drug A on learning in rats
Test one drug at 3 months age, another when rats are 1 yr old. See different effects of drugs – could be due to age differences.
Threats to External Validity: Nonrepresentative sampling
Are the participants in the research study so unrepresentative of the people who need to be understood as to preclude generalization of the research results from the former to the latter?
Threats to External Validity: Nonrepresentative research context
Is the context in which the research study was carried out so unrepresentative of contexts where the behavior in question takes place as to preclude generalization of the research results from the former to the latter?
Validity
Validity: Am I measuring what I intend to measure?
Validity = Accuracy of Results determined by quality of research.
Content Validity
Content validity, sometimes called logical or rational validity, is the estimate of how much a measure represents every single element of a construct.
For example, an educational test with strong content validity will represent the subjects actually taught to students, rather than asking unrelated questions.
The relevance of an instrument to the characteristics of the variable it is meant to measure is assessed by face validity - the researcher’s subjective assessment of the instrument’s appropriateness - and sampling validity - the degree to which the statements, questions, or indicators constituting the instrument adequately represent the qualities measured.
Content validity addresses the match between test questions and the content or subject area they are intended to assess. This concept of match is sometimes referred to as alignment, while the content or subject area of the test may be referred to as a performance domain.