Chapter 1 Flashcards by Oksana Alksnina

What is “big data”?

explosion in secondary data, typified by increases in the volume, variety, and velocity of the data made available from a myriad of sources

What is “bivariate partial correlation”?

simple (two-variable) correlation between two sets of residuals (unexplained variance) that remain after the association with the other independent variables is removed
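
A minimal sketch of the definition above, assuming a single control variable z and simple least-squares regression: x and y are each regressed on z, and the two sets of residuals are then correlated. The data and the helper names (`pearson`, `residuals`, `partial_corr`) are invented for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def pearson(xs, ys):
    """Simple (two-variable) Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, zs):
    """Part of y left unexplained after a least-squares regression on z."""
    mz, my = mean(zs), mean(ys)
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    intercept = my - slope * mz
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]

def partial_corr(xs, ys, zs):
    """Correlation of x and y once the association with z is removed."""
    return pearson(residuals(xs, zs), residuals(ys, zs))

# Hypothetical data, invented for illustration only.
zs = [1, 2, 3, 4, 5, 6]
xs = [2, 3, 5, 4, 6, 8]
ys = [1, 3, 2, 5, 4, 7]

print("simple r :", round(pearson(xs, ys), 3))
print("partial r:", round(partial_corr(xs, ys, zs), 3))
```

With one control variable, the result should match the textbook formula (r_xy - r_xz·r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)).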

What is “bootstrapping”?

approach to validating a multivariate model by drawing a large number of subsamples and estimating models for each subsample
● Doesn’t rely on statistical assumptions about the population to assess statistical significance; instead makes the assessment based solely on the sample data
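
A rough sketch of the idea, assuming the usual bootstrap scheme of resampling cases with replacement (the card only says "subsamples") and a simple one-predictor regression standing in for the multivariate model. The data, seed, and function names are invented for illustration.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slopes(xs, ys, n_boot=2000, seed=0):
    """Re-estimate the slope on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when every resampled x is identical
        by = [ys[i] for i in idx]
        slopes.append(ols_slope(bx, by))
    return sorted(slopes)

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.3, 13.8, 16.1, 18.2, 19.9]

slopes = bootstrap_slopes(xs, ys)
lower = slopes[int(0.025 * len(slopes))]       # 2.5th percentile
upper = slopes[int(0.975 * len(slopes)) - 1]   # 97.5th percentile
print(f"slope = {ols_slope(xs, ys):.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```

The interval comes from the spread of the re-estimated slopes themselves, so no normality assumption about the population is needed.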

What is “causal inference”?

methods that move beyond statistical inference to the stronger statement of “cause and effect” in non-experimental situations

What is “cross validation”?

original sample is divided into a number of smaller subsamples (validation samples); the validation fit is the “average” fit across all subsamples
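
One way to picture this, assuming the common k-fold variant: each subsample takes a turn as the validation sample while the model is estimated on the rest, and the fold-level fits are averaged. The data, the choice of mean squared error as the fit measure, and the function names are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def cross_validate(xs, ys, k=5):
    """Return the average validation MSE and the per-fold MSEs."""
    n = len(xs)
    fold_mse = []
    for fold in range(k):
        # Every k-th case (offset by the fold number) is held out.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        b0, b1 = fit_line([xs[i] for i in train_idx],
                          [ys[i] for i in train_idx])
        errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test_idx]
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / k, fold_mse

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.3, 13.8, 16.1, 18.2, 19.9]

avg_mse, per_fold = cross_validate(xs, ys, k=5)
print("per-fold MSE:", [round(m, 4) for m in per_fold])
print("average validation MSE:", round(avg_mse, 4))
```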

What are “data mining models”?

based on algorithms that are widely used in big data applications
● Emphasis on predictive accuracy rather than statistical inference and explanation as seen in statistical/data models such as multiple regression

What is “dependence technique”?

classification of statistical techniques distinguished by having a variable or set of variables identified as the dependent variable(s) and the remaining variables as independent
● Objective = prediction of the DV(s) by IV(s)
● Dependent variable → presumed effect of, or response to, a change in the IV(s)
● Independent variable → presumed cause of any change in the DV

What is “dimensional reduction”?

reduction of multicollinearity among variables by forming composite measures of multicollinear variables through such methods as exploratory factor analysis

What is “directed acyclic graph (DAG)”?

Graphical portrayal of causal relationships used in causal inference analysis to identify all “threats” to causal inference. Similar in some ways to path diagrams used in structural equation modeling.

What is a “dummy variable”?

nonmetrically measured variable transformed into a metric variable
○ Assigning a 1 or 0 to a subject
○ Always have one dummy variable less than the number of levels for the nonmetric variable
■ The omitted category is the reference category
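
The coding rule above can be sketched as follows. The function name `dummy_code` and the sample values are hypothetical; the occupation levels are borrowed from the non-metric data card.

```python
def dummy_code(values, reference):
    """Turn a nonmetric variable into 0/1 columns, omitting the reference level."""
    levels = sorted(set(values))
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows

# Hypothetical sample, invented for illustration only.
occupations = ["physician", "attorney", "professor", "attorney"]

columns, coded = dummy_code(occupations, reference="physician")
print(columns)            # three levels -> two dummy variables
for occupation, row in zip(occupations, coded):
    print(occupation, row)
```

A subject in the reference category scores 0 on every dummy variable, which is why one fewer column than the number of levels is enough.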

Effect size

estimate of the degree to which the phenomenon being studied (e.g. correlation or difference in means) exists in the population
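
For a difference in means, one widely used effect size is Cohen's d (the mean difference divided by the pooled standard deviation). The card does not name a specific measure, so treat the choice of d and the data below as assumptions made for illustration.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b))
                  / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical scores, invented for illustration only.
group_a = [5, 6, 7, 8, 9]
group_b = [3, 4, 5, 6, 7]

d = cohens_d(group_a, group_b)
print("Cohen's d:", round(d, 3))
```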

Estimation sample

portion of original sample used for model estimation in conjunction with validation sample

Validation sample

portion of the sample “held out” from estimation and then used for an independent assessment of model fit on data that wasn’t used in estimation (holdout sample)
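
A small sketch of splitting one sample into an estimation sample and a validation (holdout) sample. The 70/30 split, the seed, and the function name `split_sample` are arbitrary choices for illustration, not values given in the cards.

```python
import random

def split_sample(rows, validation_share=0.3, seed=42):
    """Randomly hold out a share of the cases for validation."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the original order is untouched
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * validation_share)
    validation = shuffled[:n_val]
    estimation = shuffled[n_val:]
    return estimation, validation

# Hypothetical case IDs, invented for illustration only.
cases = list(range(1, 21))

estimation, validation = split_sample(cases)
print("estimation sample:", sorted(estimation))
print("validation sample:", sorted(validation))
```

The model would be estimated on `estimation` only, and its fit then checked on `validation`.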

General linear model (GLM)

Fundamental linear dependence model which can be used to estimate many model types (e.g., multiple regression, ANOVA/MANOVA, discriminant analysis) with the assumption of a normally distributed dependent measure.

Generalized linear model (GLZ or GLIM)

similar in form to GLM, but able to accommodate non-normal dependent measures such as binary variables
● Logistic regression model
● Uses maximum likelihood estimation rather than ordinary least squares
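
To illustrate maximum likelihood estimation with a binary dependent measure, here is a bare-bones logistic regression fitted by plain gradient ascent on the log-likelihood. Statistical software typically uses faster routines such as iteratively reweighted least squares, so the optimizer, learning rate, step count, and data here are simplifying assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of the binary outcomes."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logistic(xs, ys, lr=0.01, steps=5000):
    """Climb the log-likelihood surface starting from b0 = b1 = 0."""
    b0 = b1 = 0.0
    for _ in range(steps):
        # Gradient of the log-likelihood: sum of (observed - predicted).
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Hypothetical data, invented for illustration only.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 1, 0, 1]

b0, b1 = fit_logistic(xs, ys)
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("log-likelihood:", round(log_likelihood(b0, b1, xs, ys), 3))
```

No sum of squared errors is minimized here, as ordinary least squares would do; the coefficients are the values that make the observed 0/1 outcomes most likely.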

Indicator

single variable used in conjunction with one or more other variables to form a composite measure
● Composite measure → combination of two or more indicators

Measurement error

inaccuracies of measuring the “true” variable values due to the fallibility of the measurement instrument, data entry errors, or respondent errors

Metric data

Also called quantitative data, interval data, or ratio data, these measurements identify or describe subjects (or objects) not only on the possession of an attribute but also by the amount or degree to which the subject may be characterized by the attribute. For example, a person’s age and weight are metric data.
● = Quantitative data, interval data, or ratio data

Non-metric Data

Also called qualitative data, these are attributes, characteristics, or categorical properties that identify or describe a subject or object. They differ from metric data by indicating the presence of an attribute, but not the amount. Examples are occupation (physician, attorney, professor) or buyer status (buyer, non-buyer). Also called nominal data or ordinal data.
● Difference from metric → these indicate the presence of an attribute, but not the amount

Multicollinearity

Extent to which a variable can be explained by the other variables in the analysis.
- As multicollinearity increases, it complicates the interpretation of the variate because it is more difficult to ascertain the effect of any single variable, owing to their interrelationships.
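
A small numeric sketch for the simplest case of two predictors, where the share of one variable explained by the other is just the squared correlation. Tolerance (1 - R²) and the variance inflation factor (1 / tolerance) are standard diagnostics that this card does not name, and the data are invented for illustration.

```python
def pearson(xs, ys):
    """Simple Pearson correlation between two variables."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical predictors, invented for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

r_squared = pearson(x1, x2) ** 2   # share of x1 explained by x2
tolerance = 1 - r_squared          # share of x1 NOT explained by x2
vif = 1 / tolerance                # grows as multicollinearity increases

print("R^2:", round(r_squared, 3))
print("tolerance:", round(tolerance, 3))
print("VIF:", round(vif, 3))
```

With more than two predictors, R² would come from regressing each variable on all of the others rather than from a single correlation.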

Multivariate analysis

Analysis of multiple variables in a single relationship or set of relationships.

Multivariate measurement

the use of two or more variables as indicators of a single composite measure
- For example, a personality test may provide the answers to a series of individual questions (indicators), which are then combined to form a single score (summated scale) representing the personality trait.

Overfitting

estimation of model parameters that over-represent the characteristics of the sample at the expense of generalizability to the population

Practical significance

assessing multivariate analysis results based on the substantive findings rather than their statistical significance
● E.g. assesses whether the result is useful in achieving research objectives vs just finding whether the result is attributable to chance

Reliability

extent to which a (set of) variable(s) is consistent in what it’s intended to measure
● If multiple measurements are taken, reliable measures will all be consistent in their values
● It differs from validity in that it relates not to what should be measured, but instead to how it is measured
● Consistency of the measure

Validity

extent to which a (set of) measure(s) correctly represents the concept of study
● Degree to which it’s free from any systematic or nonrandom error
● Concerned with how well the concept is defined by the measure(s) (vs the consistency of measures, as with reliability)

Specification error

omitting a key variable from the analysis, affecting the estimated effects of included variables

Statistical model

specific model is proposed, then estimated and a statistical inference is made as to its generalizability to the population through statistical tests

Summated scales

method of combining several variables that measure the same concept into a single variable in an attempt to increase the reliability of the measurement through multivariate measurement
- In most instances, the separate variables are summed and then their total or average score is used in the analysis.
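
As a quick sketch, a summated scale for one respondent is the total or the average of that respondent's item scores. The respondent labels, item scores, and function name are hypothetical.

```python
def summated_scale(item_scores, use_average=True):
    """Combine several indicators of one concept into a single score."""
    total = sum(item_scores)
    return total / len(item_scores) if use_average else total

# Hypothetical answers to four questions measuring the same trait.
responses = {
    "respondent_1": [4, 5, 4, 3],
    "respondent_2": [2, 1, 2, 2],
}

for name, items in responses.items():
    print(name,
          "average:", summated_scale(items),
          "total:", summated_scale(items, use_average=False))
```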

Treatment

Independent variable the researcher manipulates to see the effect (if any) on the dependent variable(s), such as in an experiment (e.g., testing the appeal of color versus black-and-white advertisements).

Type I error

probability of incorrectly rejecting H0
● Saying an effect exists when it actually doesn’t
● = Alpha (α)

Type II error

probability of incorrectly failing to reject H0
● Chance of not finding an effect when it does exist
● = Beta (β)
● 1 - β = power

Power

probability of correctly rejecting H0 (null hypothesis) when it’s false → correctly finding a hypothesized relationship when it exists
● Function of
1. Statistical significance level set by the researcher for a Type I error (α)
2. Sample size used
3. Effect size being examined
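
To see how α, sample size, and effect size drive power, here is a sketch for one simple case: a two-sided, one-sample z-test with a standardized effect size. The choice of test and the function name `z_test_power` are assumptions for illustration; other tests need other formulas.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a standardized effect."""
    z = NormalDist()                      # standard normal distribution
    crit = z.inv_cdf(1 - alpha / 2)       # critical value set by alpha
    shift = effect_size * n ** 0.5        # how far the true effect moves the test statistic
    # Probability that the statistic lands in either rejection region.
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)

power = z_test_power(effect_size=0.5, n=30, alpha=0.05)
beta = 1 - power                          # Type II error probability

print("power:", round(power, 3), "beta:", round(beta, 3))
print("larger sample :", round(z_test_power(0.5, 60), 3))
print("larger effect :", round(z_test_power(0.8, 30), 3))
print("larger alpha  :", round(z_test_power(0.5, 30, alpha=0.10), 3))
```

When the effect size is zero, the same function returns α, which is the Type I error rate.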

Univariate analysis of variance (ANOVA)

statistical technique used to determine, on the basis of one DV, whether samples are from populations with equal means
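
A sketch of the test statistic behind a one-way ANOVA: the F ratio compares variation between group means with variation within groups. The data are invented for illustration, and the p-value lookup is left out to keep the example within the standard library.

```python
def one_way_anova_f(groups):
    """F ratio = between-group mean square / within-group mean square."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((value - group_mean) ** 2 for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores on one DV for three samples.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_ratio = one_way_anova_f(groups)
print("F ratio:", round(f_ratio, 3))
```

A large F ratio suggests the samples do not all come from populations with equal means.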

Variate

linear combination of variables formed in the multivariate technique by deriving empirical weights applied to a set of variables specified by the researcher

(35 cards)