Tidy Modeling Flashcards

1
Q

Models are mathematical tools that can describe a system and capture

A

relationships in the data given to them

2
Q

Predicting future events, determining between-group differences, map-based visualizations, and pattern discovery are all

A

Purposes for which models can be used

3
Q

The utility of a model hinges on its ability to be

A

Reductive (reduce complex relationships to simpler terms)

4
Q

Purpose of a descriptive model

A

Describe or illustrate characteristics of some data

5
Q

Descriptive models need not have a purpose other than visually emphasizing an artifact in the data (T/F)

A

True

6
Q

Producing a decision for a research question or to explore a particular hypothesis is the goal of

A

Inferential models

7
Q

An inferential model starts with a predefined hypothesis about a population and produces a

A

Statistical conclusion (rejection of hypothesis, interval estimate, etc.)

8
Q

Inferential modeling techniques typically produce a __________ output

A

Probabilistic (p-value, CI, posterior probability)

9
Q

To compute probabilistic outputs, probabilistic assumptions must be made about the data and the underlying processes that generated the data because

A

The quality of statistical modeling is highly dependent on the pre-defined assumptions and how well the data fit them

10
Q

The primary goal of predictive models is that the predicted values have

A

The highest possible fidelity to the true value of the new data

11
Q

The type of problem addressed by predictive models is one of

A

Estimation

12
Q

In predictive models, more interest is vested in the predicted value than

A

Why the predicted value is what it is

13
Q

Predictive models can include measures of uncertainty (T/F)

A

True

14
Q

Most important factor affecting predictive models…

A

How the model was developed

15
Q

Predictive mechanistic models produce a model equation that

A

Depends on assumptions

16
Q

In predictive mechanistic models, data are used to estimate…

A

Unknown parameters of the model equation to generate predictions
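
As a rough illustration of estimating unknown parameters of a model equation from data, the sketch below assumes a first-order elimination model, C(t) = C0 * exp(-k * t), which is the solution of dC/dt = -k * C. The model form, the function name `fit_elimination_rate`, and the synthetic data are all assumptions made for this example; they do not come from the flashcards.

```python
import math

def fit_elimination_rate(times, concentrations):
    """Estimate (c0, k) for C(t) = c0 * exp(-k * t).

    Taking logs gives log C = log c0 - k * t, a straight line in t,
    so ordinary least squares on the log scale recovers both parameters.
    """
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -slope

# Synthetic, noise-free data generated from the assumed model itself
# (hypothetical values chosen only to show that the fit recovers them).
true_c0, true_k = 5.0, 0.3
times = [0.0, 1.0, 2.0, 4.0, 8.0]
concentrations = [true_c0 * math.exp(-true_k * t) for t in times]

c0_hat, k_hat = fit_elimination_rate(times, concentrations)

# With the parameters estimated, the model equation generates predictions.
predicted_at_6 = c0_hat * math.exp(-k_hat * 6.0)
```

With real, noisy measurements the estimates would only approximate the underlying values, and how well the fitted equation reproduces the existing data is one way to judge the model.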

17
Q

In predictive mechanistic models, differential equations are set based on

A

The model’s assumptions

18
Q

Unlike inferential models, predictive mechanistic models allow for data-driven statements on how well the model performs based on

A

How well it predicts the existing data

19
Q

Empirically-driven models are created with _____ assumptions

A

Vague

20
Q

Empirically-driven modeling is most associated with _______ learning

A

Machine

21
Q

KNN modeling is an

A

Empirically-driven predictive model

22
Q

How does KNN work?

A

Given a reference data set, a new sample is predicted using the outcome values of the K most similar data points in that set
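
A minimal sketch of the idea, assuming numeric predictors, Euclidean distance as the similarity measure, and the mean of the neighbors' outcomes as the prediction. The function name `knn_predict` and the toy reference set are hypothetical and exist only for illustration.

```python
import math

def knn_predict(reference, new_point, k=3):
    """Predict a numeric outcome as the mean outcome of the k nearest rows.

    reference: list of (features, outcome) pairs, features being a tuple of numbers.
    Similarity is taken to be Euclidean distance (an assumption of this sketch).
    """
    by_distance = sorted(reference, key=lambda row: math.dist(row[0], new_point))
    nearest = by_distance[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)

# Hypothetical reference set: (features, outcome)
reference = [
    ((1.0, 1.0), 10.0),
    ((2.0, 2.0), 20.0),
    ((3.0, 3.0), 30.0),
    ((10.0, 10.0), 100.0),
]

# The three closest rows to (2.1, 2.1) have outcomes 20, 30, and 10.
prediction = knn_predict(reference, (2.1, 2.1), k=3)
```

No equation is assumed here: the prediction is defined entirely by the reference data and the choice of K.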

23
Q

In predictive models, if the structure of the model is good, then

A

The predictions would be close to the actual values

24
Q

Three types of models

A

Descriptive, inferential, and predictive

25
Q

Two types of predictive models

A

Mechanistic and empirically-driven

26
Q

An ordinary linear regression (OLR) model is descriptive when

A

A descriptive smoother such as restricted smoothing splines (similar to LOESS) is used to describe trends in the data, fitted via OLR with specialized terms

27
Q

OLR is inferential when

A

Statistical results (p-values, for example) are used for inference

28
Q

OLR is predictive when

A

A simple linear regression produces accurate predictions

29
Q

KNN should not be used for inference because

A

Its nature makes the math required for inference intractable

30
Q

The predictive capacity of descriptive and inferential models should not be ignored because they model

A

How variables relate to the probability of outcomes

31
Q

Predictive performance relates to how close the model’s

A

Fitted values are to the observed data

32
Q

Whether a model is appropriate cannot be determined by ______ alone

A

Statistical significance

33
Q

Unsupervised models learn patterns, clusters, or other characteristics of data (understand relationships between variables) but lack

A

An outcome (dependent variable)

Examples: principal component analysis (PCA), clustering, and autoencoders

34
Q

Supervised models have an outcome variable. Examples are…

A

Linear regression, neural networks, etc.

35
Q

Two sub-categories of supervised models

A

Regression (predicts a numeric outcome)

Classification (predicts an outcome that is an ordered or unordered set of qualitative values)

36
Q

Outcomes (what is being predicted) are also known as…

A

Labels, endpoints, or dependent variables

37
Q

Independent variables (used to make predictions) also known as…

A

Predictors, features, or covariates

38
Q

Exploratory data analysis shows

A

How variables are related to one another, along with their distributions, typical ranges, and other attributes

39
Q

During EDA, two main questions should be answered, which are

A

How did I come by these data?

Are the data relevant to the problem?

40
Q

Performance metrics should be identified prior to

A

The analysis process

41
Q

Phases of Modeling

A

EDA (iterate between numerical analysis and visualization)

Feature engineering (use existing variables to create new variables)

Model tuning and selection (specifying or optimizing the structural parameters of models)

Model evaluation (assess model performance, examine residual plots)

42
Q

A main effect is a model term that contains a

A

Single predictor variable

43
Q

Root mean squared error (RMSE), a performance metric for regression models, is computed from the residuals, which are the differences between the

A

Observed and predicted values
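
The calculation can be sketched as follows: square each residual, average the squares, and take the square root. The function name `rmse` and the example numbers are made up for illustration.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between paired observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Hypothetical values: residuals are 1, 0, and -2.
observed = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
value = rmse(observed, predicted)  # sqrt((1 + 0 + 4) / 3)
```

Because the residuals are squared, large misses are penalized more heavily than small ones, and the result is in the same units as the outcome.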

44
Q

The primary approach for empirical model validation is to split the existing pool of data into two distinct sets, which are

A

Training set - majority of data; used to build model
Test set - determines whether model is successful (should only be looked at once, or it becomes part of the modeling process)

45
Q

Simple random sampling is the most common method used to

A

Split data into training and test sets
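
A minimal sketch of a simple random split, assuming an 80/20 allocation and a fixed seed for reproducibility. The function name `train_test_split`, the proportion, and the seed are choices made for this example, not values from the flashcards.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows at random, then cut them into training and test sets."""
    rng = random.Random(seed)   # local generator so the split is reproducible
    shuffled = list(rows)       # copy, leaving the caller's data untouched
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical data: ten row identifiers.
rows = list(range(10))
train, test = train_test_split(rows, train_fraction=0.8, seed=42)
```

Every row lands in exactly one of the two sets, and the test set is held back until the model is finished.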