FA4 + M4 - Sheet1 Flashcards

(53 cards)

1
Q

Which of the following activities are associated with Data Exploration?

Data cleaning
Data augmentation and transformation
Exploratory data analysis
Feature selection
Identify data dependencies and correlations
Identify trends or anomalies in the data

A

Exploratory data analysis
Identify data dependencies and correlations
Identify trends or anomalies in the data

2
Q

Which of the following activities are associated with Data Exploration?

Group of answer choices

Identify data dependencies and correlations

Identify trends or anomalies in the data

Exploratory data analysis

Data cleaning

Feature selection

Data augmentation and transformation

A

Identify data dependencies and correlations

Identify trends or anomalies in the data

Exploratory data analysis

3
Q

Which of the following activities are associated with Data Modification?

Group of answer choices

Data cleaning

Data augmentation and transformation

Exploratory data analysis

Feature selection

Identify data dependencies and correlations

Identify trends or anomalies in the data

A

Data cleaning

Data augmentation and transformation

Feature selection

(Note: the third item should be feature selection, not "Identify trends or anomalies in the data".)

4
Q

Which activity involves adding new data points or modifying existing ones to improve the dataset?

Group of answer choices

Data augmentation

Data cleaning

Exploratory data analysis

Feature selection

A

Data augmentation

5
Q

Which of the following is NOT typically a part of Data Exploration?

Group of answer choices

Cleaning the data

Identifying data dependencies

Identifying trends in the data

Exploratory data analysis

A

Cleaning the data

6
Q

Which activity is crucial for understanding the relationships between different variables in a dataset?

Group of answer choices

Identifying data dependencies and correlations

Data cleaning

Data augmentation

Feature selection

A

Identifying data dependencies and correlations
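
One common way to quantify the relationship between two variables is the Pearson correlation coefficient. The sketch below is only an illustration: the deck does not name a specific measure, and the `ad_spend` and `sales` figures are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of x and y around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: advertising spend vs. sales.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(ad_spend, sales)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 suggests a strong linear dependency; a value near 0 suggests little linear relationship.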

7
Q

What does the data say will happen?

A

Predictive Analytics

8
Q

What has happened or what is happening now?

A

Descriptive Analytics

9
Q

Why did it happen?

A

Diagnostic Analytics

10
Q

What will likely happen?

A

Predictive Analytics

11
Q

Predictive Analytics Process:

A

Project Design
Data Sampling
Data Exploration
Data Modification
Model Development
Model Validation
Project Design

12
Q

Project Design:

A

Kickoff meeting
Understand modeling objective
Define acceptance criteria
Document data and deployment requirements

13
Q

Data Sampling

A

Data extraction
Apply filters and exclusions
Identify external data sources
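
As a rough illustration of "apply filters and exclusions", the sketch below keeps only the rows that pass a filter and are not explicitly excluded. The records, field names, and rules are all hypothetical.

```python
# Hypothetical customer records; the field names are illustrative only.
records = [
    {"id": 1, "region": "north", "age": 34, "test_account": False},
    {"id": 2, "region": "south", "age": 17, "test_account": False},
    {"id": 3, "region": "north", "age": 45, "test_account": True},
    {"id": 4, "region": "north", "age": 52, "test_account": False},
]


def build_sample(rows):
    """Keep rows that pass the filters and are not excluded."""
    sample = []
    for row in rows:
        # Filter: only adult customers in the region under study.
        passes_filter = row["region"] == "north" and row["age"] >= 18
        # Exclusion: internal test accounts never enter the sample.
        is_excluded = row["test_account"]
        if passes_filter and not is_excluded:
            sample.append(row)
    return sample


sample = build_sample(records)
sample_ids = [row["id"] for row in sample]
print(sample_ids)
```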

14
Q

Data Exploration

A

Exploratory data analysis
Identify data dependencies and correlations
Identify trends or anomalies in the data

15
Q

Data Modification

A

Data Cleaning
Data augmentation and transformation
Feature selection
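
The sketch below walks through three of these activities on a tiny made-up dataset: cleaning (dropping rows with missing values), transformation (a log scale), and a deliberately crude feature selection (dropping columns that never vary). Augmentation is not shown, and real projects typically use richer rules than these; every name and number here is hypothetical.

```python
import math

# Hypothetical raw data; None marks a missing value.
raw = [
    {"income": 30000.0, "visits": 3, "constant": 1, "spend": 120.0},
    {"income": None, "visits": 5, "constant": 1, "spend": 200.0},
    {"income": 90000.0, "visits": 9, "constant": 1, "spend": 410.0},
    {"income": 60000.0, "visits": 6, "constant": 1, "spend": 260.0},
]

# Data cleaning: drop rows that contain any missing value.
clean = [dict(row) for row in raw if all(v is not None for v in row.values())]

# Transformation: add a log-scaled copy of a skewed feature.
transformed = [{**row, "log_income": math.log(row["income"])} for row in clean]

# Feature selection (crude): keep only candidate features that vary,
# since a column with a single value cannot help explain the target.
target = "spend"
candidates = [name for name in transformed[0] if name != target]
selected = [
    name for name in candidates
    if len({row[name] for row in transformed}) > 1
]
print(selected)
```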

16
Q

Model Validation

A

Model performance review
Feedback based on business knowledge and inputs from subject matter experts (SMEs)

17
Q

Model Development

A

Apply different modeling techniques and select final methodology

18
Q

Dependent Variable (Value to be predicted)

A

y

19
Q

Beta coefficient (Rate multiplied to X)

A

β

20
Q

Independent variable (Value driving prediction)

A

x

21
Q

Alpha intercept (Baseline figure for y)

A

α
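
To see how the intercept and the coefficient are obtained from data, here is a minimal sketch that estimates both for a single predictor. It assumes ordinary least squares as the fitting method, which the deck does not name, and the function name and data points are made up.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                 # rate multiplied to x
    alpha = mean_y - beta * mean_x   # baseline figure for y
    return alpha, beta


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
alpha, beta = fit_simple_regression(xs, ys)
prediction = alpha + beta * 5.0
print(alpha, beta, prediction)
```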

22
Q

Error term (Balancing figure)

23
Q

To account for unexplained variability in the dependent variable for other relevant independent variables, which may not have been included in the model

A

Inclusion for the Error Term

24
Q

To capture measurement error in both the dependent and independent variables

A

Inclusion for the Error Term
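
Put together, the pieces on the preceding cards (dependent variable, intercept, coefficient, independent variable, error term) form the simple linear regression equation. Writing the error term as ε is an assumption here, since the deck does not give its symbol:

```latex
y = \alpha + \beta x + \varepsilon
```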

25
Q

You can have more than one predictor variable (x1 to xn)

A

Multiple Linear Regression

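With several predictors, the regression equation extends as sketched below. The subscripted coefficients and the error symbol ε are notation assumed here, not taken from the deck:

```latex
y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon
```
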
26
Q

Training vs. Validation vs. Test Data

A

Splitting the Dataset

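A minimal sketch of a three-way split, assuming a random shuffle and a 60/20/20 ratio. The deck does not prescribe proportions, so the function name, the fractions, and the seed are all illustrative choices.

```python
import random


def split_dataset(rows, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test sets."""
    if train_frac <= 0 or valid_frac < 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]   # whatever remains
    return train, valid, test


train, valid, test = split_dataset(range(10))
print(len(train), len(valid), len(test))
```

The training set is used to fit the model, the validation set to compare candidate models, and the test set is held back for a final check.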
27
Q

Can I already use the model for prediction? You still need to investigate the model's ______. You need to show whether your predictors are ____.

A

goodness-of-fit; significant

28
Q

The ________ is a goodness-of-fit measure

A

coefficient of multiple determination, R^2

29
Q

___ is a figure of merit

A

R^2

30
Q

The ____ the R^2, the more successful the model is in explaining the variation in the response using the set of predictors

A

higher

31
Q

___ is normally expressed as a percentage and is interpreted as the amount of variability in the response explained by the independent variables

A

R^2

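As a small worked example, R^2 can be computed as one minus the ratio of unexplained to total variation. The function name and the numbers below are hypothetical; for a least-squares fit with an intercept this is the same as the explained share of the total variation.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_y = sum(actual) / len(actual)
    ss_total = sum((y - mean_y) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    if ss_total == 0:
        raise ValueError("R^2 is undefined when the response never varies")
    return 1.0 - ss_residual / ss_total


# Hypothetical observed responses and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.4, 6.9, 8.9]
r2 = r_squared(actual, predicted)
print(f"R^2 = {r2:.1%}")
```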
32
Q

The _____ is a decomposition of the total variation in the response into explained (pattern) and unexplained (error) parts

A

ANOVA

33
Q

ANOVA meaning:

A

Analysis of Variance

34
Q

The ____ variability is the amount of variation in the response variable that may be attributed to the predictors explicitly stated in the model

A

explained

35
Q

The _____ variability is the amount of variation attributed to random error

A

unexplained

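Written out, the decomposition looks as follows. The notation is assumed here rather than taken from the deck: y_i are the observed responses, ŷ_i the fitted values, and ȳ the mean response. The identity holds for a least-squares fit that includes an intercept.

```latex
\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{\text{total}}
  = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{\text{explained (regression)}}
  + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{\text{unexplained (residual)}}
```
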
36
Q

SS refers to

A

Sum of Squares

37
Q

There is good fit if the Regression Sum of Squares is ____ than the Residual Sum of Squares

A

much larger

38
Q

The df column refers to the ____

A

degrees of freedom

39
Q

The df for Regression is always the ________

A

number of regression parameters minus one

40
Q

The df for Residual is the sample size minus the _____

A

number of regression parameters

41
Q

The total df is the _____

A

sum of those two degrees of freedom

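The SS and df columns can be sketched for a one-predictor model as below. This assumes a least-squares fit with an intercept, so there are two regression parameters; the function name and the data are made up.

```python
def anova_table(xs, ys):
    """SS and df columns for a simple linear regression fitted by least squares."""
    n = len(xs)
    n_params = 2  # intercept and one slope
    if n != len(ys) or n <= n_params:
        raise ValueError("need more observations than regression parameters")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    fitted = [alpha + beta * x for x in xs]
    ss_regression = sum((f - mean_y) ** 2 for f in fitted)          # explained
    ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))     # unexplained
    df_regression = n_params - 1
    df_residual = n - n_params
    return {
        "regression": {"ss": ss_regression, "df": df_regression},
        "residual": {"ss": ss_residual, "df": df_residual},
        "total": {"ss": ss_regression + ss_residual,
                  "df": df_regression + df_residual},
    }


# Hypothetical data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
table = anova_table(xs, ys)
for source, row in table.items():
    print(source, round(row["ss"], 3), row["df"])
```

Here the regression sum of squares is far larger than the residual sum of squares, which is the pattern the cards describe as a good fit.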
42
Q

MS refers to _____.

A

Mean Squares

43
Q

The values in this column are the ratio of each sum of squares to its respective degrees of freedom.

A

Mean Squares

44
Q

____ have no physical meaning but are instrumental in computing the F-statistic

A

Mean Squares

45
Q

Mean squares have no physical meaning but are instrumental in computing the _____

A

F-statistic

46
Q

The ____ determines if regression is meaningful for the data at hand

A

F-test

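The arithmetic behind the MS column and the F-statistic is short enough to sketch directly: each mean square is a sum of squares divided by its degrees of freedom, and F is the regression mean square divided by the residual mean square. The function name and the input figures are hypothetical.

```python
def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """Return (MS regression, MS residual, F) from the SS and df columns."""
    if df_regression <= 0 or df_residual <= 0:
        raise ValueError("degrees of freedom must be positive")
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    if ms_residual == 0:
        raise ValueError("F is undefined when the residual mean square is zero")
    return ms_regression, ms_residual, ms_regression / ms_residual


# Hypothetical ANOVA figures: SS = 80 on 2 df, SS = 20 on 10 df.
ms_reg, ms_res, f_stat = f_statistic(80.0, 2, 20.0, 10)
print(ms_reg, ms_res, f_stat)
```

Turning F into a p-value requires the F distribution, which is left out of this sketch.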
47
Q

When the ____ is small, it means that there is at least one significant predictor in the analysis

A

p-value

48
Q

When the p-value is _____, it means that there is at least one significant predictor in the analysis

A

small

49
Q

When p is ___, H0 must ___

A

low; go

50
Q

The p-value is _____ than the significance level α

A

low if it is less

51
Q

The ___ helps in assessing if an individual predictor is significant

A

t-test

52
Q

If p < 0.05:

A

significant predictor

53
Q

If p > 0.05:

A

insignificant predictor
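
The decision rule on the last two cards can be sketched as a simple comparison against the significance level. The predictor names and p-values below are made up, and treating p exactly equal to 0.05 as not significant is an assumption, since the cards do not cover that case.

```python
def significant_predictors(p_values, alpha=0.05):
    """Flag each predictor as significant when its p-value is below alpha."""
    return {name: p < alpha for name, p in p_values.items()}


# Hypothetical t-test p-values for two predictors.
p_values = {"price": 0.003, "season": 0.41}
flags = significant_predictors(p_values)
for name, is_significant in flags.items():
    label = "significant" if is_significant else "insignificant"
    print(f"{name}: {label} predictor")
```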