final Flashcards

1
Q

Psychometrics

A

How can we measure constructs like depression, anxiety, loneliness, etc.?
Psychology is the science of thoughts, feelings, and behaviors
Problem– thoughts and feelings are not directly observable
Solution to this is psychometrics
A test serves as a proxy (an indirect indicator) for what we cannot see

2
Q

psychometrics measurement

A

Scoring individuals on characteristics that can't be easily observed
Measure development– writing items, scoring procedures
Measure evaluation– determining whether the measure is reliable and valid
Measurement is not a linear process; it is typically ongoing
Items and scales are revised over and over again to get the most accurate results possible

3
Q

psychometrics examples

A

Cognitive ability via tasks/tests requiring cognition
Knowledge of t-tests and ANOVA via exam 2 scores
Conscientiousness via your answers to 10 questions
Stress via your salivary cortisol levels
Looking at biological measures

4
Q

Goals of psychometrics–
classify/group people into categories

A

nominal/ordinal variables
Ex– questions about educational attainment, attachment style
Essentially just grouping people

5
Q

Goals of psychometrics–
quantify people along a continuum

A

interval/ratio variables
Ex– questions about extent of conscientiousness, severity of depression symptoms

6
Q

Measure reliability

A

Consistency/precision of scores across
Time– want to see similarities between time one and time two
Items– responding to items in a consistent manner
Raters– don't want two raters to observe the same thing and come to two completely different conclusions

7
Q

Measure validity

A

Accuracy of scores
Are the scores measuring what they are supposed to measure?
“Construct validity”– are you measuring what you are intending to measure?

8
Q

Reliability versus validity

A

They do relate, but the relationship is imperfect
Dots-on-target examples
Dots close to the target– accurate
Dots close together– consistent/precise
Can an unreliable measure be valid– no, it has to have consistency to be valid
Can an invalid measure be reliable– yes, something can be reliable but consistently wrong

9
Q

Test-retest reliability

A

Are scores similar when measured at different time points?
The official name for consistency across time
Always relevant for trait-like constructs
Personality
Intelligence

10
Q

Test-retest reliability– Less relevant for state-like constructs– things that vary from day to day

A

Stress
Positive affect (emotion)
Negative affect

11
Q

test-retest reliability– method

A

Relate time one scores to time two scores
Want the scores to be highly consistent with one another
Usually use correlation
Looking for an effect size of .70 or higher, generally
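
A minimal sketch of this in R, with hypothetical vectors t1 and t2 holding each person's scores at time one and time two:
t1 <- c(10, 14, 9, 22, 17, 12)
t2 <- c(11, 13, 10, 20, 18, 12)
cor(t1, t2)   # test-retest reliability; want roughly .70 or higher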

12
Q

retest w/ paired samples t test

A

A bit controversial
Want to see a non-significant result (no change)
Controversial to look for a null result because it runs the risk of type one or type two errors
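
A minimal sketch in R, reusing the hypothetical t1 and t2 vectors from the previous card:
t.test(t1, t2, paired = TRUE)   # looking for a non-significant difference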

13
Q

Consistency across items– internal consistency reliability

A

Most psychological scales contain multiple items that, together, create a score
Internal consistency reliability– do items in a scale positively relate to one another?
Not that people should answer every item exactly the same way, but responses should be close and show a pattern
Measurement error– differences in responses across the items
The attempt to correct this is aggregating the scores (adding them up) to cancel out the small amounts of error

14
Q

internal consistency reliability– method one (split-half correlation)

A

Split-half correlation– with 10 questions, create two random halves and compute a sum score for the 5 items in each set
Then look at the relationship between set 1 and set 2, with the expectation that they will be related
If you don't reach .70, you may have to start over
Problem with splitting up the correlation– the random halves you pick might just happen to be different
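
A minimal sketch in R, assuming a hypothetical data frame named items with 10 columns (one per item) and one row per person:
set.seed(1)
half <- sample(1:10, 5)            # randomly pick one half of the items
set1 <- rowSums(items[, half])     # sum score for the first random half
set2 <- rowSums(items[, -half])    # sum score for the remaining items
cor(set1, set2)                    # expect roughly .70 or higher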

15
Q

internal consistency reliability– method two (Cronbach’s alpha– variance framework)

A

Dividing the covariance (relationships) among all possible pairs of items by the total variance across all items
More reliable than a split-half correlation
Essentially taking the average across all possible splits
Increased covariation across items = higher alpha
Increased item variance = lower alpha
Will always range between 0-1
Want an alpha of .70 or higher
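
A minimal sketch of the variance framework in R, reusing the hypothetical items data frame from the split-half card:
k <- ncol(items)
item_var <- sum(apply(items, 2, var))   # total of the individual item variances
total_var <- var(rowSums(items))        # variance of the summed total score
alpha <- (k / (k - 1)) * (1 - item_var / total_var)
alpha                                   # ranges 0-1; want .70 or higher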

16
Q

Interrater reliability

A

How consistent are two separate investigators' scores for the same group of participants?
Two investigators observe the same behavior in the same person at the same time point and score it
Very important– not about two separate experiments
Simple calculation– percentage of agreement between scores
Most relevant for behavioral measures, but can sometimes be relevant for surveys too
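
A minimal sketch of the percentage-agreement calculation in R, with hypothetical scores from two raters for the same six participants:
rater1 <- c(1, 0, 1, 1, 0, 1)
rater2 <- c(1, 0, 1, 0, 0, 1)
mean(rater1 == rater2) * 100   # percent of participants the raters agree on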

17
Q

Face validity

A

Does this look like it's measuring what it is supposed to measure?
Arguably the weakest type
Ex– strong face validity to measure pandemic worry
Rate agreement from 1-5– “I am worried about the COVID-19 pandemic”
Use low face validity to measure social desirability
Ex– “I like to gossip at times”
Gives you an idea of how much a participant is willing to lie

18
Q

face validity problems

A

Completely subjective
Good measures can still have low face validity
Decisions get made based on the questions you ask

19
Q

Content validity

A

How does the operational definition match the conceptual definition of the construct?
Making sure each part of the construct you want to study is measured in some way
Looking at the match between your measure and the actual content itself
Still somewhat subjective
Not a formal test
Sometimes a good measure will have low content validity

20
Q

content validity example

A

stress
Conceptually, includes both psychological and physiological responses
In theory, good measures should include both

21
Q

Criterion validity

A

Are scores on the measure related to measures of other constructs (criteria) that they theoretically should be related to?
Arguably the strongest type
Looking for relationships through hypothesis tests
Concurrent criterion validity– criteria measured at the same time
Predictive criterion validity– criteria measured in the future
Ex– stress scale
Today's negative affect (concurrent)
Tomorrow's negative affect (predictive)

22
Q

Convergent validity

A

A new measure is correlated with other established measures of the same construct
When developing a new stress measure
Have to find its relationship with
The Perceived Stress Scale– an established self-report stress scale
Different from criterion validity because convergent looks at measures of the same construct
However, with both you are trying to show that they relate in some way

23
Q

Discriminant validity

A

Are scores on the measure unrelated to measures of distinct constructs?
A somewhat controversial form of validity
Hard to find constructs that don't relate to things like stress/depression, etc.
Don't want to find a relationship
Failing to reject the null
Don't want a negative relationship either, because it's still a relationship

24
Q

discriminant validity example

A

developing a new measure of stress
Find its relationship with
Social desirability scores
Demographic characteristics

25
Q

Testing validity

A

Face– no test, visual inspection
Content– no test, visual inspection/theoretical deep dive
Criterion, convergent, discriminant– hypothesis tests

26
Q

Testing validity– t-test and ANOVA

A

Wanting to establish relationships with grouping variables

27
Q

Testing validity– correlation or regression

A

Wanting to establish relationships with other numeric variables

28
Q

Sources of error in research studies

A

All research studies have error
Research design flaws– confounds, equipment failure, poor measurement/manipulation
Participants– lack of motivation/attention/understanding, human error
Data coding and entry– coding/entry errors

29
Q

Outliers

A

Extreme values, sometimes implausibly extreme
Two types
Error outliers– values that look extreme because of a mistake
Not real (ex– coding mistake, entering the wrong value)
Interesting outliers– values that are extreme but are not mistakes
Exceptions to general trends, usually worthy of follow-up

30
Q

Quantitative tools

A

If our variables are normally distributed, we have a sense of how unlikely it is to see extreme values
Can use z-scores
Flag scores beyond 2.24 in either the positive or negative direction
If any are found, that calls for further investigation

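A minimal sketch of the z-score screen in R, assuming a hypothetical numeric vector named scores:
z <- scale(scores)       # convert raw scores to z-scores
which(abs(z) > 2.24)     # cases extreme enough to call for further investigation
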
31
Q

How to handle outliers

A

Verify whether the outliers are meaningful or just errors
Determine if they are impossible
Check study logs and raw data
Impossible if it's outside the scale's range
Correct errors when you find them
If you can't find an error or are unsure, treat it as interesting
Influential outlier– an outlier that changes the results depending on whether it is present
If there's an influential outlier, run the analysis with and without it and report that that's what you are doing
Step one is still finding out if it's an error or interesting

32
Q

Inattentive responding

A

People who misunderstand or don't carefully respond to what they are being asked
Not fully engaged
Ex– answering strongly agree for every item
Typically seen in university research pools or online surveys, but can happen in lab tasks too
Their data is essentially noise

33
Q

“Infrequency” items

A

If people are paying attention, they should all give the same response
Subtle– “I was born on February 30th”
The date does not exist, so the answer should always be false
Usually better to go with something more subtle
Overt– “Please answer 2 for this question”

34
Q

End-of-survey items

A

Asking whether the participant answered all of the items thoroughly/what strategy they used to answer the items
Problem– people usually lie
Better– asking an open-ended question about their approach to answering the questions
Free-response answers can be very telling
Bots will usually report an answer that doesn't make sense

35
Q

Logic

A

Looking at how fast the participant took the survey
Need to pilot the study to figure out a reasonable response time
Then subtract a small value to account for someone exceptionally fast
Online survey programs will track time per page
In the lab, administrators can track time themselves

36
Q

Low variability

A

Someone who always answers the questions in the exact same way
Long strings of the same answer
Approach– calculate each individual's SD across items to assess variability
Choose a minimum SD cutoff in advance
Controversial because it only works if you have positively and negatively worded items
Poor example– life satisfaction scale
Good example
“People would describe me as someone willing to share my time with others”
“Maintaining close relationships is difficult for me”

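A minimal sketch in R, reusing the hypothetical items data frame from the internal consistency cards:
person_sd <- apply(items, 1, sd)   # each person's SD across all items
which(person_sd < 0.5)             # flag people below a pre-chosen cutoff (0.5 is hypothetical)
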
37
Q

How to handle inattentive responding

A

Always want to err on the side of inclusion
Plan your sample size to allow for some inattentive respondents
Can't ethically require people to cooperate or punish them for low effort (withholding compensation)
Conduct analyses once with inattentive responses and once without to see if anything changes
Choose a cutoff in advance and report how many people you dropped

38
Q

Missing data

A

Have to honor when people just don't answer the question– can't ask people to go back and finish
Some R functions won't work
Calculations that assume you have complete data may be incorrect
Biggest issue– reduces sample size/generalizability
Need to understand how much missing data we have and consider that in our analyses and conclusions

39
Q

Missing completely at random

A

Missingness that is unrelated to any study variable
Won't impact conclusions
No correlation between missingness and other variables in the data set

40
Q

Missing at random

A

Missingness that can be fully accounted for by other variables in the data set
Won't impact conclusions
A measurable reason exists for why they don't answer the question

41
Q

Planned missing data

A

Data you chose not to collect
The best kind of missing data, not a problem
Choosing to give some people some questions and other people other questions
Purpose is to shorten the survey

42
Q

Systematically missing data can impact your conclusions

A

Questions that are unclear, too sensitive, or inappropriate
Ex– asking gender identity and only having male/female options
Questions or measures at the end of a long study
Attrition– the survey is too long, so people stop answering
Participants with certain characteristics skip items about those characteristics
Ex– people with anxiety won't respond to items about anxiety because of their anxiety

43
Q

Listwise deletion

A

Completely delete or ignore any participant that is missing data on any variable in your analysis
Pros– all analyses now have the same sample size
Cons– you're losing data

44
Q

Pairwise deletion

A

Use all of the data you can; exclude participants only when you don't have enough information to complete an analysis
Pros– you can use all the data you have
Cons– different sample sizes for different analyses mean different levels of power and precision

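A minimal sketch in R contrasting the two deletion approaches, assuming a hypothetical data frame dat with several numeric columns:
cor(dat, use = "complete.obs")            # listwise– drops any row missing on any variable
cor(dat, use = "pairwise.complete.obs")   # pairwise– each correlation uses all available pairs
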
45
Q

Missing data– solution 2

A

Maximum likelihood estimation– imputation that occurs during the estimation process of a complex analysis
Uses all available data for each person
Determines what their most likely value would be based on the available data
Estimates model parameters based on these values
Works pretty well as long as
The proportion of missing data is not too large
You are confident the data are missing at random/planned
The model estimates what the person would look like and includes them in estimation if you call for it

46
Q

Missing data– solution 3

A

Imputation
Mean imputation– the mean is our best guess at any single value
Replace missing data with the average value
Problem– substituting the mean for missing values distorts the variance; because we're usually trying to explain variance, that's a big problem
May also not reflect what the person actually looks like
Multiple imputation solves this variance problem
Impute several plausible values
Run the analysis with all plausible values
Pool the results to obtain a stable estimate

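A minimal sketch of multiple imputation in R, assuming the mice package and a hypothetical data frame dat with columns y and x:
library(mice)
imp <- mice(dat, m = 5)        # impute 5 plausible versions of the data
fits <- with(imp, lm(y ~ x))   # run the same analysis on every version
pool(fits)                     # pool the results into one stable estimate
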
47
Q

Non-response bias

A

Response rate– the percentage of people who actually participated out of the total number invited
In research, we invite a lot of people who might not actually participate
We can't make conclusions about people who didn't participate
At minimum, we need to report our response rate when we can

48
Q

Grouping variables normally have a manageable number of levels

A

Usually nominal or ordinal variables
Typically we don't want to group/classify people
Instead, measure things on a more dimensional level
Ex– to what extent are you depressed?
Want to see if the rank on one scale relates to the rank on another scale
This is why we use correlation

49
Q

Correlation tests

A

Statistical methods to measure and describe the linear relationship between two continuous, numeric variables
Linear relationship– changes in one variable tend to be accompanied by consistent changes in the other variable
A predictable relationship
Have a rating for everybody in your data set
Almost never an experimental design; instead, an observational study
A survey correlating two variables/people's scores together
Naturally occurring, no manipulation

50
Q

Examples of observational, continuous variables

A

Individual difference measures– personality, intelligence
Key– how high is your intelligence level, not are you intelligent or not

51
Q

Typical uses for correlation

A

Prediction testing
Ex– does SES predict health?
Can't randomly assign people to have low SES
When ethically constrained, correlation testing is the next best option
Validity of a questionnaire
Validity– measures accuracy
Not talking about cause; just want to know if it relates
Reliability of a questionnaire
Reliability– measures consistency
Used for test-retest

52
Q

Depicting correlations

A

Scatterplots
Each person has to have data on two continuous variables
Each point is plotted where the person's x-axis and y-axis scores line up
A point on the scatterplot is defined by the scores on both variables

53
Q

Interpreting scatterplots

A

Form– linear, curved, clusters, no pattern
Direction– positive, negative, no direction
Positive– the line slopes up from left to right
Negative– the line slopes down from left to right
Strength– how closely the points fit the main form
If it's linear, how close to the line are they
Points close to the line indicate a strong relationship
A perfectly horizontal line– no relationship

54
Q

Effects of outliers, restriction of range, and rescaling

A

Correlations are not robust against restriction of range
Ex– the age variable runs from 40-68
The first 40 years are not represented, meaning you have a restriction of range
Rescaling variables does not change correlations
Outliers can make your correlation look stronger or weaker than it actually is
Could be errors or interesting

55
Q

Test statistic for correlation

A

Pearson's r
Absolute value– tells us the strength of the relationship
Sign– tells us the direction of the relationship
Looking at the extent to which the variables covary with one another
More shared variability– stronger relationship, higher correlation
Vice versa for lower shared variability

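A minimal sketch in R, assuming hypothetical numeric vectors x and y:
r <- cor(x, y)   # sign gives direction, absolute value gives strength
cor.test(x, y)   # significance test for the correlation
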
56
Q

Simple regression

A

One predictor and one outcome
Simple linear regression– assessing the relationship between one predictor and one outcome
It matters which variable is the predictor and which is the outcome
In a write-up you can't say x causes y; you have to say x predicts y
Results are scaled based on the outcome (y)
Enables us to predict y from x with a linear model

57
Q

Linear modeling

A

We represent the relationship between x and y with a straight line
A lot of the time the data are not perfectly linear, but the line can still be useful
Using a straight line
Keeps things simple and easy to see
Identifies the midpoint of the relationship between x and y
Passes through the actual means
Allows us to make predictions
How to draw this line
Use the formula y = bx + a
Same thing as y = mx + b
a– the intercept
The point where the line crosses the y-axis
When x = 0
b– the slope of the line
How steep it is
Larger absolute b values = steeper slope
Direction
Positive slope– slopes up from left to right
Negative slope– slopes down from left to right

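A minimal sketch in R, reusing the hypothetical x and y vectors:
fit <- lm(y ~ x)                   # least squares line y = bx + a
coef(fit)                          # the intercept a and slope b
predict(fit, data.frame(x = 10))   # predicted y when x = 10
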
58
Q

Regression analysis goal one– least squares solution

A

Least squares solution
Want to find the line that, on average, best represents the data set
Minimizes error
Take the line with the least (squared) vertical distance between the predicted and actual data points
Problem– just because it has the lowest error of all possible options does not mean there is no error
Just because it's the best does not mean it's automatically good

59
Q

Regression analysis goal 2– standard error of the estimate

A

Standard error of the estimate– the standard distance between the actual and predicted values of y
Take the vertical distances, square them, and find the average
Want a small standard error
The smaller the standard error/the closer to 0, the better the model will perform
Problems with only interpreting the standard error
Depends on the scale of the measure
How to get everyone to agree
Effect size/goodness of fit– we can find r^2 for our regression model
Just like eta squared, interpret it as the proportion of variance in the outcome that is explained by the predictor
How much variance are we explaining in the model
r^2 of .34– 34% of the variance in y is accounted for by the model

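A minimal sketch in R, reusing the hypothetical fit from the previous card:
summary(fit)$r.squared   # proportion of variance in y the model explains
sigma(fit)               # the standard error of the estimate (residual SD)
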
60
Q

Significance testing

A

Regression and ANOVA– same analysis with the same process
Both break down variance
Both use omnibus F-ratios to test the overall model before assessing the various components of the model
Only difference
Use regression when you have continuous predictors
Use ANOVA when you have discrete, grouping predictors

61
Q

Analysis of regression

A

Null– the slope of the regression is zero
Alternative– the slope of the regression is not zero
The overall significance of the regression equation can be evaluated by computing an F-ratio
To compute the F-ratio, first calculate a variance (MS) for the predicted variability and for the unpredicted variability

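A minimal sketch in R, reusing the hypothetical fit:
anova(fit)   # omnibus F-ratio– MS for predicted variability over MS for unpredicted variability
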
62
Q

If there is a positive correlation between x and y, then the regression equation y = bx + a will have

A

b > 0
A positive correlation means the slope will be positive as well

63
Q

Multiple regression

A

2+ predictors, 1 outcome
Def– regression analysis involving more than one predictor
Why psych needs it
Things are complex– any one predictor can only explain so much
Because things are complex, some people might run several simple regression models
Pointless, because the predictors are all related, so we need to test them in one model

64
Q

Predictor overlap

A

Many variables are correlated, at least to a small extent
Just adding variables to the model does not mean better predictive accuracy
Predictors with the least overlap possible are the most valuable
How much unique variability are you adding?
Too related means adding virtually nothing

65
Q

Multiple regression equations

A

Determined by a least squares solution
Minimize the squared distance between the actual y value and the predicted y value
Same as simple linear regression, but now with two or more predictors
y = b1x1 + b2x2 + a
Adding a second slope and a second variable
x1 and x2– two different predictor variables
b1 and b2– regression coefficients (slopes) for those variables
The intercept a is the predicted y value when both x1 and x2 are 0

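A minimal sketch in R, assuming hypothetical predictors x1 and x2 and outcome y:
fit2 <- lm(y ~ x1 + x2)   # y = b1x1 + b2x2 + a
coef(fit2)                # the intercept a plus slopes b1 and b2
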
66
Q

Comparing slopes of single predictors

A

We can calculate standardized regression coefficients by transforming all the raw scores to z-scores before we begin the analysis
Extremely important to do in multiple regression
Written as an italicized b (the standardized beta)
Unstandardized b coefficients are each on their own variable's scale, so they are not directly comparable
Standardized slopes are interpreted in terms of standard deviations
Slopes mean nothing, and you can't compare their sizes, if you don't standardize them

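A minimal sketch in R– z-score everything first, then refit, reusing the hypothetical x1, x2, and y:
fit_std <- lm(scale(y) ~ scale(x1) + scale(x2))
coef(fit_std)   # standardized slopes, now directly comparable
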
67
Q

ANCOVA

A

A mix between ANOVA and regression
Can have continuous and grouping predictors at the same time
Usually, the grouping variables are manipulated– independent
The continuous variables are measured
Start with a simple regression model predicting your DV from your covariate(s)
After fitting the regression model, use ANOVA to understand the residual variance

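A minimal sketch in R of the two-step logic this card describes, assuming hypothetical variables dv (outcome), covar (covariate), and group (a factor):
step1 <- lm(dv ~ covar)     # regress the DV on the covariate
res <- resid(step1)         # residual variance– what the covariate didn't explain
summary(aov(res ~ group))   # ANOVA on the residuals across groups
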
68
Q

Residual variance

A

Whatever variance wasn't explained by the initial regression
Error or noise

69
Q

Achieving constancy

A

Achieving equal impacts of confounds across levels/conditions of an independent variable
Extraneous– anything that differs across people
Confounding– when an extraneous variable systematically differs across experimental groups
When we anticipate a confounding variable, we should measure it and statistically control for it in our models

70
Q

Statistical adjustments

A

Statistically remove variance from extraneous/confounding variables by holding their effects constant across groups
Tells us what the effect of the IV is above and beyond the effects of the extraneous or confounding variables

71
Q

Problems with statistical controls (controlling for extraneous variables)

A

Need to know what the extraneous/confounding variable is beforehand
Fixing the problem after the fact
Systematic differences already exist– essentially just putting a band-aid over them
Better to eliminate them from the outset
More of a hail mary to fix the study
Suggests that there may be problems with the study

72
Q

What does internal consistency measure

A

Measures consistency across items

73
Q

What does test-retest measure

A

Measures consistency across time
Testing people once and then retesting at a later date to see if changes occurred
Want to do it on constructs you don't expect to change depending on context

74
Q

What does interrater reliability measure

A

Measures consistency across raters

75
Q

Convergent validity

A

Is the new measure related to other measures of the same construct?
Want to make sure it correlates with other established measures

76
Q

Criterion (predictive/concurrent) validity

A

Correlating the measure with a test of a different construct that it theoretically should relate to
Ex– stress and negative emotion are two different constructs, but should be related

77
Q

Why is multiple imputation usually the gold standard

A

Pick an algorithm and then pick the number of imputations you want to do
The algorithm comes up with multiple plausible values based on the data
Allows you to get multiple estimates
Take the pooled estimate across all of them instead of just picking one
Safer and better than just taking the mean

78
Q

Why do some researchers hate big data sets

A

Too much statistical power
Means everything will be statistically significant
Larger samples need smaller test statistics to be significant

79
Q

Difference between simple linear regression and multiple regression coefficients

A

“While holding x constant”– multiple regression slopes are interpreted while holding the other predictors constant

80
Q

What two pieces of info can we get from r^2

A

Tells us what % of variance is accounted for
Ex– r^2 = .36 means 36% of the variance is accounted for by the predictor variable
Also tells us what is not accounted for
The remaining 64% is not accounted for