Tidy Modeling Flashcards

(45 cards)

1
Q

Models are mathematical tools that can describe a system and capture

A

relationships in the data given to them

2
Q

Predicting future events, determining between-group differences, map-based visualizations, and pattern discovery are all

A

Purposes for which models can be used

3
Q

The utility of a model hinges on its ability to be

A

Reductive (reduce complex relationships to simpler terms)

4
Q

Purpose of a descriptive model

A

Describe or illustrate characteristics of some data

5
Q

Descriptive models need not have a purpose other than visually emphasizing an artifact in the data (T/F)

A

T

6
Q

Producing a decision for a research question or exploring a particular hypothesis is the goal of

A

Inferential models

7
Q

An inferential model starts with a predefined hypothesis about a population and produces a

A

Statistical conclusion (rejection of hypothesis, interval estimate, etc.)

8
Q

Inferential modeling techniques typically produce a __________ output

A

Probabilistic (p-value, CI, posterior probability)
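
One such probabilistic output, a confidence interval for a mean, can be sketched in Python. This is only an illustration under stated assumptions: the sample values are made up, the function name `mean_confidence_interval` is hypothetical, and a normal approximation is used where a t-distribution would be more usual for small samples.

```python
import math
import statistics

def mean_confidence_interval(sample, level=0.95):
    """Return an approximate (lower, upper) confidence interval for the mean.

    Assumption: the sampling distribution of the mean is treated as normal.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean from the sample standard deviation
    std_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_error, mean + z * std_error

# Hypothetical measurements
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 5.1, 4.9]
lower, upper = mean_confidence_interval(sample)
print(lower, upper)
```

The interval is only as trustworthy as the normality assumption behind it.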

9
Q

To compute probabilistic outputs, probabilistic assumptions must be made about the data and the underlying processes that generated the data because

A

The quality of statistical modeling is highly dependent on the pre-defined assumptions and how well the data fit them

10
Q

The primary goal of predictive models is that the predicted values have

A

The highest possible fidelity to the true value of the new data

11
Q

The type of problem addressed by predictive models is

A

Estimation

12
Q

In predictive models, more interest is vested in the predicted value than

A

Why the predicted value is what it is

13
Q

Predictive models can include measures of uncertainty (T/F)

A

True

14
Q

Most important factor affecting predictive models…

A

How the model was developed

15
Q

Predictive mechanistic models produce a model equation that

A

Depends on assumptions

16
Q

In predictive mechanistic models, data are used to estimate…

A

Unknown parameters of the model equation to generate predictions

17
Q

In predictive mechanistic models, differential equations are set based on

A

The model’s assumptions
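
A minimal Python sketch of a mechanistic workflow, assuming (purely for illustration) a first-order decay process: the assumption gives the differential equation dc/dt = -k * c, whose solution is c(t) = c0 * exp(-k * t), and the data are then used to estimate the unknown parameters c0 and k. The function name and the data are hypothetical.

```python
import math

def fit_first_order_decay(times, concentrations):
    """Estimate c0 and k in c(t) = c0 * exp(-k * t) by least squares
    on log(concentration) versus time."""
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    mean_t = sum(times) / n
    mean_log = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_log) for t, y in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    slope = sxy / sxx
    intercept = mean_log - slope * mean_t
    # log c(t) = log(c0) - k * t, so c0 = exp(intercept) and k = -slope
    return math.exp(intercept), -slope

# Hypothetical noise-free observations generated with c0 = 100, k = 0.5
times = [0.0, 1.0, 2.0, 3.0, 4.0]
concentrations = [100.0 * math.exp(-0.5 * t) for t in times]

c0_hat, k_hat = fit_first_order_decay(times, concentrations)
# Use the estimated equation to predict at a new time point
predicted_at_5 = c0_hat * math.exp(-k_hat * 5.0)
print(c0_hat, k_hat, predicted_at_5)
```

The form of the equation comes from the assumption; only its parameters come from the data.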

18
Q

Unlike inferential models, predictive mechanistic models allow for data-driven statements on how well the model performs based on

A

How well it predicts the existing data

19
Q

Empirically-driven models are created with _____ assumptions

20
Q

Empirically-driven modeling is most associated with _______ learning

21
Q

KNN modeling is an

A

Empirically-driven predictive model

22
Q

How does KNN work?

A

Given reference data, the outcome for a new sample is predicted from the outcome values of the K most similar samples in the reference set
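
A minimal Python sketch of this idea, assuming numeric predictors, Euclidean distance as the similarity measure, and a numeric outcome averaged over the neighbors. The function name `knn_predict` and the toy reference data are illustrative only.

```python
import math

def knn_predict(reference, new_point, k=3):
    """Predict a numeric outcome for new_point by averaging the outcomes
    of the k closest reference samples (Euclidean distance assumed)."""
    # reference is a list of (predictors, outcome) pairs
    by_distance = sorted(reference, key=lambda row: math.dist(row[0], new_point))
    neighbors = by_distance[:k]
    return sum(outcome for _, outcome in neighbors) / len(neighbors)

# Hypothetical reference set: (predictors, outcome)
reference = [
    ((1.0, 1.0), 10.0),
    ((1.5, 1.2), 12.0),
    ((0.8, 0.9), 11.0),
    ((8.0, 9.0), 50.0),
    ((9.0, 8.5), 55.0),
]

prediction = knn_predict(reference, (1.1, 1.0), k=3)
print(prediction)
```

No model equation is estimated here; the prediction is defined entirely by the reference data and the choice of K.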

23
Q

In predictive models, if the structure of the model is good, then

A

The predictions would be close to the actual values

24
Q

Three types of models

A

Descriptive, inferential, and predictive

25
Two types of predictive models
Mechanistic and empirically-driven
26
An ordinary linear regression (OLR) model is descriptive when
Restricted smoothing splines (similar to LOESS) are used to describe trends in data using OLR with specialized terms
27
OLR is inferential when
Statistical results (p-values, for example) are used for inference
28
OLR is predictive when
A simple linear regression produces accurate predictions
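
The predictive use of a simple linear regression can be sketched in Python as follows. This is an illustrative closed-form least-squares fit on made-up data; the function name is hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data that follow y = 1 + 2x exactly
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_linear_regression(x, y)
# Predictive use: apply the fitted line to a new predictor value
new_prediction = intercept + slope * 5.0
print(intercept, slope, new_prediction)
```

The same fitted line would serve an inferential purpose if the interest were in a p-value or interval for the slope instead of the predictions.
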
29
KNN should not be used for inference because
Its nature makes the math required for inference intractable
30
The predictive capacities of descriptive and inferential models should not be ignored because they model
How variables relate to the probability of outcomes
31
Predictive performance relates to how close the model's
Fitted values are to the observed data
32
Whether a model is appropriate cannot be determined by ______ alone
Statistical significance
33
Unsupervised models learn patterns, clusters, or other characteristics of data (understand relationships between variables) but lack
An outcome (dependent variable). Examples: principal component analysis (PCA), clustering, and autoencoders
34
Supervised models have an outcome variable. Examples are...
Linear regression, neural networks, etc.
35
Two sub-categories of supervised models
Regression (predicts a numeric outcome); classification (predicts an outcome that is an ordered or unordered set of qualitative values)
36
Outcomes (what is being predicted) are also known as...
Labels, endpoints, or dependent variables
37
Independent variables (used to make predictions) also known as...
Predictors, features, or covariates
38
Exploratory data analysis shows
How variables are related to each other, along with their distributions, typical ranges, and other attributes
39
During EDA, two main questions should be answered, which are ...
How did I come by these data? Are the data relevant to the problem?
40
Performance metrics should be identified prior to
The analysis process
41
Phases of Modeling
EDA (iterate between numerical analysis and visualization); feature engineering (use existing variables to create new variables); model tuning and selection (specify or optimize the structural parameters of models); model evaluation (assess model performance, examine residual plots)
42
A main effect is a model term that contains a
Single predictor variable
43
Root mean squared error (RMSE) is calculated in regression models from the residuals, which are the differences between the
Observed and predicted values
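
A short Python sketch of the RMSE calculation, using made-up observed and predicted values: square each residual, average the squares, and take the square root.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_squared_error = sum(r ** 2 for r in residuals) / len(residuals)
    return math.sqrt(mean_squared_error)

# Hypothetical values; residuals are 1, 0, -1, -2
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]

error = rmse(observed, predicted)
print(error)
```

The result is in the same units as the outcome, and smaller values mean predictions closer to the observed data.
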
44
Primary approach for empirical model validation is to split the existing pool of data into two distinct sets, which are
Training set - majority of data; used to build the model. Test set - determines whether the model is successful (should only be looked at once, or it becomes part of the modeling process)
45
Simple random sampling is the most common method used to
Split data into training and test sets
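
A minimal Python sketch of a split by simple random sampling. The 80/20 proportion, the seed, and the function name `simple_random_split` are assumptions made for illustration.

```python
import random

def simple_random_split(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and split them into (training, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)      # copy so the input is left untouched
    rng.shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical data pool of 100 row identifiers
rows = list(range(100))
train, test = simple_random_split(rows)
print(len(train), len(test))
```

Every row lands in exactly one of the two sets, and the test set is held back until the model built on the training set is ready for its single evaluation.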