Final POLI 399 Quantitative Research Methods Flashcards

1
Q

bivariate regression

A

a way of predicting scores on one variable from those on another, in which the link between them is represented by a trend line, typically a straight line.

2
Q

central limit theorem

A

a theorem showing that the sampling distribution of the mean approaches a normal distribution as sample size increases. it holds for any variable with a finite variance.

3
Q

concept

A

universal descriptive word that refers directly or indirectly to something that is observable. act as building blocks and data containers.

4
Q

deductive theory

A

starts with logical statements linking concepts to form relationships. theoretical relationship formulated in advance. abstract statements about general relationships to concrete statements about specific behavior.
axiom-proposition-hypothesis

5
Q

determinism

A

key assumption made by the scientific method: general patterns exist. an explanation (supported by observable evidence) describes a general pattern, which presupposes determinism. before you can explain a pattern you need to assume that it exists, and that assumption is justified by how much intersubjective work has been done.

6
Q

dummy variable

A

a dichotomous variable created from a categorical variable whose categories we wish to examine separately.

7
Q

explanation

A

the goal of the scientific method is explanation. explain phenomena using something else, which is achieved through variation. explain why some things are related to each other. generalize beyond specific time and place.

8
Q

gamma

A

a PRE measure of association for ordinal variables ranging from -1.00 to +1.00, based on concordant and discordant pairs. asks: leaving ties aside, how much more likely is it to draw a positive pair (ranked in the same order on both variables) than a negative pair (ranked in opposite order)? ignores ties (overstates relationship).

9
Q

falsifiability

A

scientific claims must be testable so that they CAN be demonstrated to be wrong. scientific claims are never proven or true, there is always an element of uncertainty.

10
Q

hypothesis

A

a conjectural statement of the relationship between two variables. logically implied by a proposition that has clear implications for testing. relationship between IV and DV and what we expect to find.

11
Q

inductive theory

A

begins with observations and searches for patterns. move from concrete statements about observations to abstract statements about general relationships. relies heavily on determinism as an assumption. observation to empirical generalization to hypothesis. may not use same data to test theory that prompted its inception.

12
Q

informed consent

A

procedure in which individuals choose whether they participate after being informed of facts that would likely influence their decision. whatever a person concerned about their own welfare would need to know prior to making a decision. must be competent, voluntary, have full information and comprehend full risks.

13
Q

interval

A

one of Stevens’s levels of measurement in which categories are ordered and we know the precise distances between them. no meaningful zero point so cannot multiply or divide.

14
Q

intersubjectivity

A

science as a way of knowing, agreement between individuals about how the work is done. shared standards for determining what is empirically acceptable guard against bias. transmissibility: steps must be clear and recorded so someone can repeat your research. replicability: when someone does repeat your work, they get the same results. replication tests for bias, so work must be replicable.

15
Q

intervening variable

A

control variable, held constant while examining IV-DV relationship. has to do with the assumed causal mechanism and when present mediates the relationship between IV and DV. explains why the IV has a causal impact on the DV.

16
Q

multivariate regression

A

a way of predicting scores on one variable from those on a set of others, in which the link between the dependent variable and each of the others is represented by a trend line, usually a straight line.

17
Q

ordinal

A

one of Stevens’s levels of measurement in which the categories are ordered but the distances between the categories and the zero point are not known.

18
Q

p-value

A

the probability of obtaining a result of the size we have, or a more extreme one, by chance when the relationship doesn’t exist in the population. 0.05 or less is conventionally treated as statistically significant. commonly read as how confident we can be that the result isn’t caused by chance.

19
Q

proposition

A

abstract statements that express the relationship between two or more concepts in a meaningful way. in the realm of abstract experience where concepts are the components and propositions are the relationship.

20
Q

random sample

A

sampling method that ensures every member of the population has a known and non-zero probability of being included in the sample. would need a population list. every member in population has an equal probability of inclusion. can produce extreme samples because every combination of people has an equal probability of inclusion.

21
Q

spurious relationship

A

one in which the observed correlation between two variables exists because each is affected by a common cause. the relationship disappears when the common cause is controlled for, because that variable causes both the IV and the DV.

22
Q

tau

A

a PRE measure of association that requires a parametric (linear) relationship between two ordinal or higher variables. ranges from -1 to +1; judging strength by absolute value, under 0.10 is trivially weak, 0.10-0.14 is weak, 0.15-0.19 is moderate, 0.20-0.29 is moderately strong and 0.30 and above is strong. use tau-b if the crosstab is square and tau-c if it is rectangular.

23
Q

theory

A

enable us to link one concept to another by stating the relationship between them. if arrived at through deduction they are called propositions but if arrived at through induction they are called empirical generalizations.

24
Q

type 1 error

A

abandoning the null hypothesis when it is true. reject null hypothesis when you should fail to reject it. incorrectly infer from the sample that there is a relationship when there actually is not.

25
Q

type 2 error

A

failing to abandon a null hypothesis when it is false. don’t reject null hypothesis when you should reject it. assume there isn’t a relationship when there is one.

26
Q

variation

A

used to describe the distribution of the data or how data is spread out. trying to explain variation is the goal of the scientific method. measured by interquartile range, variance and sum of squares.

27
Q

z-scores

A

individual values of a standardized variable. the exact number of standard deviation units any particular case lies above or below the mean. on a normal distribution, about 95% of cases lie within 2 z-scores (more precisely 1.96) of the mean.
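
A minimal sketch of standardizing a variable, assuming made-up scores and the population standard deviation (the function name `z_scores` is just for illustration):

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw values to z-scores: (value - mean) / standard deviation."""
    m = mean(values)
    sd = pstdev(values)  # population SD; use statistics.stdev for a sample SD
    return [(v - m) / sd for v in values]

# Hypothetical scores: mean is 5 and SD is 2
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(scores)
print(z)  # the case scoring 9 lies 2 SD units above the mean
```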

28
Q

SPSS

A

Statistical Package for the Social Sciences

29
Q

What are the hallmarks of the scientific method?

A

empiricism, intersubjectivity, explanation, determinism

30
Q

Empiricism

A

every knowledge claim must be verified by systematic observation.
assumes that our senses give us the most accurate info about the world.
explanations must be supported by observable evidence
assumes objective reality exists
guards against bias

31
Q

Bias

A

prejudiced for or against a particular idea or explanation

32
Q

Intersubjectivity

A

agreement between individuals about how we do work using the scientific method
views science as a way of knowing
shared standards for determining what is acceptable empirically. two parts:
transmissibility: steps followed in research so that someone can repeat your research.
replicability: when your work is repeated, the same results are obtained.
allows for other researchers to test for bias.
journals have replication policies, peer review.

33
Q

Scientific knowledge vs. Common sense

A

risk averse, observe systematically with criteria established in advance, avoid overgeneralizations, test alternative explanations, make more observations. guard against bias.
vs.
jump to conclusions, overlook contradictory evidence and explain it away.

34
Q

Nature of Scientific Claims

A
  1. never true or proven irrespective of how many times they have been subjected to testing. Still capable of making a knowledge claim.
  2. Must be testable and thus able to be proven wrong or in other words falsifiable.
  3. Disconfirming evidence must always be possible
35
Q

Traditional Critique of Scientific Method

A

push to become more scientific in the 50’s and 60’s in the US
1. human reaction problem
people perform or perceive differently when they know they’re being observed. problem with empiricism because objective reality cannot be observed.
however, reactivity is not an insurmountable methodological barrier.
2. can never be value free because value-laden individuals are studying value-laden phenomena. problem with intersubjectivity and bias. value commitments should be recognized and made explicit, as is the case with all scientific work (Gunnar Myrdal). testing alternative interpretations and forming definitional agreement across the spectrum of value commitments is powerful.
3. human behaviour is too complicated to explain in a generalizable way. problem with determinism. also not insurmountable. empirical laws do exist.
4. free will. causation is impossible because humans are free to behave as they wish. problem with determinism. response: people are confined by the context they are in and bound by law. there are not endless possibilities so a pattern will eventually emerge.
5. every person is unique and behaves differently. the idea that nothing is shared is not a compelling critique and not necessarily true.

36
Q

Feminist Critique of Scientific Method

A

women absent from the work as researchers and subjects. critique methodological norms.

  1. philosophical: previously, researchers asserted their work was value free but this is not the case anymore due to this critique. science cannot be value free.
  2. moral: research ethics. treat people as humans not subjects. foundational work in response to this.
  3. practical: most forceful at this point. the inclusion of women and people of colour should be mandatory if attempting to generalize to the population, otherwise the conclusions will misrepresent the population and be distorted.
37
Q

Chi-Square and Assumptions

A

an inferential statistic (used to generalize from sample to population) used to discern statistical significance, or the probability that the relationship occurred by chance. can have type 1 or type 2 error.
theoretical probability distribution that gives the likelihood of each degree of the relationship occurring in the sample if there was no relationship in the population from which the sample was drawn.
assumes no relationship and compares it to the relationship at hand.
p value of 0.05 or less is conventionally read as the researcher being 95% confident the relationship did not occur by chance.
assumptions:
1. hypothesized relationship in advance
2. random sample, know odds of inclusion for everyone in pop and no one has a 0 chance.
3. no more than 25% of cells have an expected frequency of less than 5.
4. a non-significant chi-square means the null hypothesis of no relationship cannot be rejected, but it does not indicate that the sample is unrepresentative of the population.

38
Q

Steps for Chi-Square

A

Step 1: state the null hypothesis, which asserts that there is no relationship between the IV and DV. No relationship would mean that in a cross tabulation there would be no gaps across the columns. The goal is to reject the null hypothesis.
Step 2: calculate the expected cell frequencies. if there is no relationship between the IV and DV then each cell’s column percentage should be the same as the row marginal percentage. expected frequency = (column marginal x row marginal) / total number of cases.
Step 3: compare expected cell frequencies with observed cell frequencies (observed minus expected, squared).
Step 4: adjust for sample size: divide each squared difference by the expected frequency, then sum across all cells to get chi-square.
Step 5: calculate degrees of freedom: (# of columns - 1) x (# of rows - 1).
Step 6: consult the theoretical chi-square distribution.
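
The steps above can be sketched in a few lines. This is only an illustration on a made-up 2x2 crosstab (rows = DV, columns = IV); the function name `chi_square` and the counts are assumptions, not course material.

```python
def chi_square(observed):
    """Return (chi-square, degrees of freedom) for a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Step 2: expected frequency = row marginal x column marginal / total cases
            expected = row_totals[i] * col_totals[j] / n
            # Steps 3-4: squared gap between observed and expected, scaled by expected
            chi2 += (obs - expected) ** 2 / expected
    # Step 5: degrees of freedom = (rows - 1) x (columns - 1)
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, df

# Hypothetical crosstab with 100 cases
table = [[30, 10],
         [20, 40]]
chi2, df = chi_square(table)
print(round(chi2, 2), df)
```

For Step 6, the critical value at p = 0.05 with 1 degree of freedom is 3.841, so a chi-square this large would lead to rejecting the null hypothesis.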

39
Q

PRE Measures of Associations

A

a measure of association is a descriptive statistic that tells the researcher how strong the observed relationship between the IV and DV is.
the appropriate measure of association is dependent on the level of measurement and must correspond to the lowest one.
PRE’s (proportionate reduction in error) are a specific type of measure of association as they can be interpreted in a certain way. They indicate how much the researcher’s predictive ability for values of the DV increases when values of the IV are known. Can be interpreted as the percentage increase in predictive ability, from 0 to 1.0. For example, 0.3 is a 30% increase in predictive ability, a strong relationship.
Should always try to use PRE measures of association because they offer the most precise measure of association. Can say with percentage precision how much predictive ability is increased. Non-PRE measures can only state the degree of the relationship.
Nominal: Lambda. asymmetric measure. best prediction is the modal category, so find the mode within every IV category. lambda = (total error without knowing the IV - total error when the IV is known) / total error without knowing the IV. Misleading because always 0 if modal category is the same for all categories of IV.
Ordinal or Higher: Gamma, symmetric. based on pairs of cases: a positive pair is ranked in the same order on both variables, a negative pair in opposite order. gamma = (positive pairs - negative pairs) / (positive pairs + negative pairs). ranges from -1 to +1, perfect inversion to perfect agreement. ignores every pair that is tied on the IV or DV, so overstates relationship.
Tau-b for square tables, tau-c for rectangular tables.
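
A rough sketch of how lambda and gamma could be computed by hand from a crosstab, assuming rows are DV categories and columns are IV categories (ordered low to high for gamma). The function names and the table are hypothetical.

```python
def goodman_kruskal_lambda(table):
    """Lambda: proportionate reduction in error when predicting the DV mode within each IV category."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    errors_without_iv = n - max(row_totals)  # always guess the overall modal DV category
    errors_with_iv = sum(sum(col) - max(col) for col in zip(*table))  # guess the mode per IV column
    return (errors_without_iv - errors_with_iv) / errors_without_iv

def gamma(table):
    """Gamma: (concordant - discordant) / (concordant + discordant), ties ignored."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # only cells in lower rows, so each pair counts once
                for l in range(n_cols):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical crosstab with 100 cases
table = [[30, 10],
         [20, 40]]
lam = goodman_kruskal_lambda(table)
gam = gamma(table)
print(lam, round(gam, 3))
```

On this table gamma comes out well above lambda, which fits the note that gamma overstates the relationship by ignoring ties.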

40
Q

Designing a Sample

A

a sample is data collected from a select group of people within a population; the population is every member of the group being studied.
two methodological categories: probability vs. non-probability
probability: every person in the population has a non-zero probability of being included in the sample and that probability is known. need a population list or a surrogate population list. Advantageous as it is a way to avoid bias and use inferential statistics.
non-probability: the probability of a person from the population being included is not known and it is not certain that everyone in the population has a non-zero chance of inclusion. cannot do inferential statistics. used for economic and convenience reasons, or when there is no access to a full population list.
to increase accuracy: increase the size of the sample or reduce variability by stratifying

41
Q

Random Sampling 1

A

Simple Random Sample: every member in the population has an equal non-zero probability of being included in the sample. can be created using a lottery method for a small population or a random number generator for a large population. can produce extreme samples because every combination of people has an equal probability of inclusion.

42
Q

Random Sampling 2

A

Systematic Random Samples: the size of the total population is divided by the researcher’s desired sample size. this creates a sampling interval (k). Using a population list, randomly select the first person from the first k people and then choose every kth person after that. reduces risk of extreme samples but can produce them if there is a cyclical pattern in the population list.
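
A small sketch of the procedure, assuming the population list is a Python list and that the random start is drawn from the first k entries (the function name is illustrative only):

```python
import random

def systematic_sample(population, sample_size, rng=random):
    """Pick a random start within the first k members, then every k-th member after it."""
    k = len(population) // sample_size  # sampling interval
    start = rng.randrange(k)            # random position 0..k-1
    return [population[start + i * k] for i in range(sample_size)]

# Hypothetical population list of 100 ID numbers, desired sample of 10 (k = 10)
population = list(range(100))
sample = systematic_sample(population, 10, random.Random(1))
print(sample)
```

If the list had a cycle of length k (say every 10th entry is a corner house), every selected case would share that trait, which is the extreme-sample risk mentioned above.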

43
Q

Random Sampling 3

A

Proportionate Stratified Random Samples: when the researcher is aware of certain characteristics that must be sampled from the population. adjustment of groups holding certain characteristics to correct for their population weight, or the proportion of that group within the population. three steps: 1. the population is divided into groups that hold the same characteristics. these groups are homogeneous, and the characteristics by which the groups are created are called stratification variables; they must be related to the phenomenon being studied. 2. use a simple random sample to select a sample from within each homogeneous group. 3. combine the samples to produce a sample that is representative and weighted to the population.
produces less extreme samples due to stratification, but it may be difficult to operationalize a theoretically important characteristic.
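
The three steps might look like this in code. A sketch only: the strata, the population and the helper name `proportionate_stratified_sample` are all made up, and rounding means the total can be off by a case or two when the shares are not whole numbers.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(population, stratum_of, sample_size, rng=random):
    """Divide into strata, draw a simple random sample from each in proportion to its size, combine."""
    strata = defaultdict(list)
    for member in population:  # step 1: divide the population into homogeneous groups
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        share = round(sample_size * len(members) / len(population))  # population weight
        sample.extend(rng.sample(members, share))                    # step 2: SRS within stratum
    return sample                                                    # step 3: combined sample

# Hypothetical population: 70 urban and 30 rural residents
population = [("urban", i) for i in range(70)] + [("rural", i) for i in range(30)]
sample = proportionate_stratified_sample(population, lambda m: m[0], 10, random.Random(0))
print(sum(1 for m in sample if m[0] == "urban"), sum(1 for m in sample if m[0] == "rural"))
```

A disproportionate version would replace the `share` line with researcher-chosen counts per stratum and then weight the results back to population proportions.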

44
Q

Random Sampling 4

A

Disproportionate Stratified Random Sample
CES
adjustment of groups holding certain characteristics to correct for their population weight, or the proportion of that group within the population. three steps: 1. the population is divided into groups that hold the same characteristics. these groups are homogeneous, and the characteristics by which the groups are created are called stratification variables; they must be related to the phenomenon being studied. 2. use a simple random sample to select a sample from within each homogeneous group. 3. combine the samples to produce a sample that is representative and weighted to the population. however, certain groups or strata are oversampled and others undersampled relative to their population weight.
allows strata that are small but important from a theoretical perspective to be analyzed statistically.
could be the territories or indigenous people.

45
Q

Random Sampling 5

A

Multi Stage Random Cluster Sample
When there is no population list available, the researcher randomly selects groupings of members in the population. This process is repeated within the selected groupings at each stage, but every additional stage compounds the risk of sampling error.
example: to study a geographic area, start with municipal districts, then move to cities or towns, then eventually households. this is a three stage random cluster sample.

46
Q

Non Probability Sampling

A

CANNOT USE INFERENTIAL STATISTICS
WORST TO BEST
1. convenience sampling. use whoever is the easiest to access, very unlikely to get a sample that is representative of the population. online polls and businesses like A & W who stop people on the street to try their food.
2. purposive. researcher uses own judgement by examining the population and applying own expertise to the sample. try to make sample as representative as possible and can yield representative results but can also be biased.
3. quota sampling. similar to stratifying the population in PSRS and DSRS. create a sample that mirrors the population, but it is ultimately up to the researcher’s discretion what that looks like.

47
Q

Sample Size

A

population size has nothing to do with sample size, population dispersion does.
to use inferential statistics, must be able to infer from sample to the population.
when formulating sample size must consider the amount of error that researcher is prepared to tolerate, estimation of the population dispersion for a given variable using population standard deviation and the z-value which corresponds to the level of confidence.
the less error the researcher will tolerate, the larger the sample must be. populations with lots of dispersion (widely scattered data) need bigger samples.
Central Limit Theorem: describes the sampling distribution of the sample means. take multiple samples and find the mean of each sample. as the sample size gets larger (and more samples are taken to fill in the distribution), the distribution of the means approaches a normal distribution (mean, median, mode all the same and all at the highest point on the curve. symmetrical. 2 SD 95%). regardless of the population distribution, the mean of all the sample means approximates the population mean.
If you have a normal distribution (according to the central limit theorem, the sampling distribution of sample means should be) you can convert data into z-scores to estimate the probability of any range of values occurring around the mean. z-scores are in SD units and indicate what percentage of the area under the normal distribution is covered (1 SD: 68%, 2 SD: 95%, 3 SD: 99.7%).
Confidence Intervals are the range around the sample estimate within which the researcher is confident the real population value lies. determine how well the sample reflects the population. 1. estimate the standard error of the sample’s mean, i.e. where it lies on the distribution of the means of all samples taken (SD of pop / square root of sample size; if the pop SD is unknown, substitute the sample SD). 2. choose the level of confidence (68, 95 or 99% according to how much error you are willing to tolerate) and find the matching z-value: 1.00 for 68%, 1.96 for 95%, 2.58 for 99% (1.64 corresponds to 90%). 3. multiply the standard error by that z-value, then add it to and subtract it from the sample mean.
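
The three inputs to sample size (tolerated error, population SD, z-value) combine in the standard formula n = (z x SD / error)^2. A quick sketch with hypothetical numbers; `required_sample_size` is an illustrative name, not something from the course.

```python
import math

def required_sample_size(z, population_sd, margin_of_error):
    """n = (z * SD / E)^2, rounded up to the next whole case."""
    return math.ceil((z * population_sd / margin_of_error) ** 2)

# Hypothetical: 95% confidence (z = 1.96), estimated population SD of 15
print(required_sample_size(1.96, 15, 2))  # tolerate +/- 2 points
print(required_sample_size(1.96, 15, 1))  # tolerate +/- 1 point: roughly four times the sample
print(required_sample_size(1.96, 30, 2))  # more dispersion: larger sample
```

Population size does not appear anywhere in the formula, which matches the first line of this card.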

48
Q

Sampling Error

A

the difference between the sample estimate and the true population value.
however, we are rarely certain enough about the population value to compare the sample against it.
tolerating less error requires a larger sample
a population with a lot of dispersion requires a larger sample

49
Q

Central Limit Theorem

A

Central Limit Theorem: N = 30+ (rule of thumb). describes the sampling distribution of the sample means. take multiple samples and find the mean of each sample. as the sample size gets larger (and more samples are taken to fill in the distribution), the distribution of the means approaches a normal distribution (mean, median, mode all the same and all at the highest point on the curve. symmetrical. 2 SD 95%). regardless of the population distribution, the mean of all the sample means approximates the population mean.
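
A simulation can make this concrete. The sketch below assumes a deliberately skewed, made-up population and repeatedly draws samples of 30 (with replacement); the seed, sizes and names are arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev

def sample_means(population, sample_size, n_samples, rng):
    """Draw n_samples random samples and return the mean of each one."""
    return [mean(rng.choices(population, k=sample_size)) for _ in range(n_samples)]

rng = random.Random(42)
# Hypothetical skewed population (exponential), clearly not normal
population = [rng.expovariate(1.0) for _ in range(10_000)]
means = sample_means(population, 30, 2_000, rng)

print(round(mean(population), 3), round(mean(means), 3))  # the two means should be close
print(round(pstdev(population) / 30 ** 0.5, 3), round(pstdev(means), 3))  # spread of means vs. SD / sqrt(n)
```

Plotting a histogram of `means` should show a roughly bell-shaped curve even though the population itself is heavily skewed.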

50
Q

Confidence Intervals

A

Confidence Intervals are the range around the sample estimate within which the researcher is confident the real population value lies. determine how well the sample reflects the population. 1. estimate the standard error of the sample’s mean, i.e. where it lies on the distribution of the means of all samples taken (SD of pop / square root of sample size; if the pop SD is unknown, substitute the sample SD). 2. choose the level of confidence (68, 95 or 99% according to how much error you are willing to tolerate) and find the matching z-value: 1.00 for 68% confident, 1.96 for 95% confident, 2.58 for 99% confident (1.64 corresponds to 90% confident). 3. multiply the standard error by that z-value, then add it to and subtract it from the sample mean.
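
A worked sketch of those steps, using invented numbers (sample mean 50, SD 10, n = 400) and an illustrative function name:

```python
import math

def confidence_interval(sample_mean, sd, n, z=1.96):
    """Return (low, high): sample mean +/- z * standard error."""
    standard_error = sd / math.sqrt(n)  # step 1: SD / square root of sample size
    margin = z * standard_error         # steps 2-3: scale by the z-value for the confidence level
    return sample_mean - margin, sample_mean + margin

low, high = confidence_interval(50, 10, 400)            # 95% confidence by default
low99, high99 = confidence_interval(50, 10, 400, z=2.58)
print(round(low, 2), round(high, 2))
print(round(low99, 2), round(high99, 2))  # more confidence means a wider interval
```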

51
Q

Interpreting Regression Tables

A
  1. look for stars. no stars = cannot reject the null hypothesis because there is no statistical significance
  2. look for sign. negative sign means negative relationship. as IV (x) goes up, DV (y) goes down. positive sign means positive relationship. as IV (x) goes up, DV (y) goes up.
52
Q

Bivariate Analysis Interval/Ratio Regression

A

If both IV and DV are interval/ratio, can recode into ordinal variables or dichotomies and run crosstabs, or:
calculate correlation coefficient (r)
run a regression

53
Q

Correlation Coefficient

A

Pearson’s r measures the strength of the relationship between x and y (IV and DV).
ranges from -1 to +1, where -1 is a perfect negative relationship and +1 is a perfect positive relationship.
measure of association NOT causality so spuriousness is still an issue. cannot definitively state that x causes y.
measures how close the observations are to the straight line of best fit. only works on linear relationships. the further away observations lie from the line of best fit, the weaker the relationship.

+/- 0.01-0.30 is a weak correlation. most polisci work falls here
+/- 0.31-0.70 moderate
+/- 0.71-1.00 strong.
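
For reference, a bare-bones computation of r from its definition, on made-up data (the function name and values are illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)  # sum of squares for x
    ss_y = sum((b - mean_y) ** 2 for b in y)  # sum of squares for y
    return co_variation / math.sqrt(ss_x * ss_y)

# Hypothetical IV and DV scores for five cases
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # falls in the "strong" band above
```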

54
Q

R^2

A

explains how good the entire model is at explaining variation in the DV. how well IV explains DV
r^2 x 100 is equal to a PRE measure and is the percentage of variation in Y explained by knowing X.
stronger the relationship the steeper the line
the regression line is the best guess for y for every value of x. it takes into account all variation in y at every x value and is the straight line that minimizes the squared distances between the observations and the line.
regression equation: Y = a + bX + e. Y is the DV. a is the constant: the value of Y when the IV = 0, where the regression line intersects the y axis. b is the slope of the regression line (how much Y changes for each one-unit increase in X). X is the independent variable. e is the error term.
error term: estimate of how much error there will be (there is no such thing as no error). the error for each case is called the residual, and it is the best measure of prediction error (should be small).
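
A minimal least-squares sketch tying the pieces together (constant, slope, residuals, r^2). The data are invented and `fit_line` is just an illustrative name.

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x; returns (a, b, r_squared, residuals)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum((xi - mean_x) ** 2 for xi in x)
    a = mean_y - b * mean_x                                # constant: predicted y when x = 0
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed minus predicted
    ss_residual = sum(e ** 2 for e in residuals)           # unexplained variation
    ss_total = sum((yi - mean_y) ** 2 for yi in y)         # total variation in y
    return a, b, 1 - ss_residual / ss_total, residuals

# Hypothetical IV and DV scores for five cases
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r2, residuals = fit_line(x, y)
print(round(a, 2), round(b, 2), round(r2, 2))
```

Here r^2 x 100 would be read as the percentage of variation in y explained by knowing x.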

55
Q

Multivariate Regression

A

several x’s and one y, or in other words several IVs and their effect on one DV. used to reduce spuriousness and find the real causal influence on the DV
more dominant in empirical polisci bc
must examine three statistics:

  1. R^2: proportion of variation in the DV that is explained by the combination of IVs in the model
  2. F test: test for statistical significance that can be interpreted like chi-square. does this relationship hold true in the population from which the sample was drawn, or did it happen by chance?
  3. T test: statistical significance of each coefficient, i.e. whether the slope of the linear relationship between an IV and the DV is significantly different from zero.
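
To show what "several IVs, one DV" means mechanically, here is a plain-Python sketch that fits coefficients by solving the normal equations and reports R^2. It does not compute the F or t tests. The data are fabricated so that y = 1 + 2*x1 + 3*x2 exactly, and both function names are illustrative.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """rows: IV values per case. Returns ([constant, b1, b2, ...], r_squared)."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the constant
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    predicted = [sum(c * v for c, v in zip(coefs, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1 - ss_residual / ss_total

# Fabricated cases: two IVs per case, DV built as 1 + 2*x1 + 3*x2 with no error
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
coefs, r2 = ols(rows, y)
print([round(c, 3) for c in coefs], round(r2, 3))
```

Each slope is the effect of one IV with the other held constant, which is how the model addresses spuriousness. Real data would have residuals and an R^2 below 1.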
56
Q

Concepts Short Answer

A

Concepts: abstractions with clear definitions. universal descriptive word that must refer to an observable phenomenon (indirectly or directly). act as building blocks for theories, as variation in concepts leads to theories. the relationships between concepts, the components at the abstract level of experience, form theories. data containers: direct what must be observed.

can be defined in three ways. the move from 2 to 3 is increasing in focus.

  1. real (essential nature or attributes. not definition used in empirical research)
  2. nominal: name the concept and the properties of the associated phenomena that the concept represents. usually some consensus.
    - clear and explicit with assumptions clearly stated. must be in accordance with transmissibility to guard against bias
    - precise. indicates what should be excluded and included.
    - non circular. must find a way to explain a phenomenon that does not use or repeat the same words or ideas
    - positive. must define the concept by what it is, not by what characteristics it lacks. no negative statements
  3. operational: after the nominal definition has been established, operational definitions indicate the observations that will be used to represent the concept in the realm of empiricism. related to measurement validity (did we measure what we said we were going to measure).
57
Q

Concepts provide the basis for…

A

Classification: sorting of phenomena by defining concepts and categorizing them in mutually exclusive groupings.
Comparison: discerning whether there is more or less of a given concept
Quantification: measuring how much of a concept there is. statistical analysis at this point. anything that is countable falls here.

58
Q

Operationalization of Concepts and Theory Building

A

in order to operationalize a concept, it must have a direct observable counterpart, an indirect observable counterpart through an operational definition, or must form relationships with other concepts to form theories. a concept’s empirical counterpart is a variable. not yet observation. when operationalizing a concept, can lose parts of the concept that do not transfer over into operational definitions or variables.

stating the relationship between concepts is the process of theory building. depending on the research method: inductive empirical generalizations or deductive propositions. important as concepts are abstractions, steps to move to the empirical realm. in order to build theories need concepts, and theories are important as they explain how and why there is linkage between concepts, organize knowledge and help generate hypotheses. all steps to take before moving from the realm of abstraction to the realm of

59
Q

Reading Crosstabs

A

IV is the column, DV is the row
cell frequency is the number of people in the box
column marginal is total for the column
row marginal is total for the row
ordinal (can’t remove categories but can collapse into 3 or fewer categories): only interpret the top and bottom row. curvilinear gaps across columns cannot be congruent with the hypothesis since it is a linear conjecture.
nominal: can remove categories; interpret the row that is named in the hypothesis.
percentages are calculated based on the number of cases in each category of the IV (column percentages).
compare percentage point gaps ACROSS columns. 1-4 trivial, 8-10 significant.
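
A tiny sketch of computing column percentages and the gap across columns, assuming a made-up crosstab with the IV in the columns and the DV in the rows:

```python
def column_percentages(table):
    """Convert cell frequencies to percentages of each column marginal (each IV category)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * cell / col_totals[j] for j, cell in enumerate(row)] for row in table]

# Hypothetical cell frequencies: two IV categories (columns), two DV categories (rows)
table = [[30, 10],
         [20, 40]]
pct = column_percentages(table)
gap_top_row = pct[0][0] - pct[0][1]  # percentage point gap across columns in the top row
print(pct, gap_top_row)
```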

60
Q

Guest Lecturer Name

A

John Santos

61
Q

Room Number

A

Science Theaters 147

62
Q

Threshold Effect

A

Ordinal level crosstabs
gaps across categories are large then become small. relates to where the IV has the most power to move values on the DV.
ensure that it does not become a curvilinear gap across columns.