L:2 Data Handling Flashcards

1
Q

What does CRISP-DM stand for?

A

Cross-Industry Standard Process for Data Mining

2
Q

Which of the following statements about CRISP-DM are TRUE?

A) It is a framework that can help structure the approach to data analytics projects
B) Its utility lies in helping to turn vague business questions into explicit analytical tasks
C) The model includes a data mapping stage
D) The model includes a deployment stage
E) The model includes a data preparation stage

A

A) It is a framework that can help structure the approach to data analytics projects
B) Its utility lies in helping to turn vague business questions into explicit analytical tasks
D) The model includes a deployment stage
E) The model includes a data preparation stage

3
Q

What are the 6 stages of CRISP-DM?

A

1) Business understanding
2) Data understanding
3) Data preparation
4) Modeling
5) Evaluation
6) Deployment

4
Q

Business understanding (stage 1) is about turning vague, verbally stated business objectives into explicit, quantitative data analytics tasks.

TRUE/FALSE

A

TRUE

5
Q

Data understanding (stage 2) concerns identifying and inspecting the data in order to understand its limitations, missingness, need for transformation, etc.

TRUE/FALSE

A

TRUE

6
Q

Data preparation (stage 3) entails formatting, cleaning, transforming, and combining data to enable the intended analysis.

TRUE/FALSE

A

TRUE
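
The four activities named on this card (formatting, cleaning, transforming, combining) can be sketched on a toy example. This is only an illustration under stated assumptions: the two input tables, their field names, and the helpers `clean_customer` and `prepare` are hypothetical and not part of the course material.

```python
from datetime import date

# Hypothetical raw inputs: two small tables that share a customer id.
raw_customers = [
    {"id": " 1 ", "name": "alice ", "signup": "2023-01-15"},
    {"id": "2", "name": "BOB", "signup": "2023-02-01"},
    {"id": "2", "name": "BOB", "signup": "2023-02-01"},  # duplicate row
]
raw_orders = [
    {"customer_id": "1", "amount": "19.90"},
    {"customer_id": "2", "amount": "5.00"},
    {"customer_id": "1", "amount": "10.10"},
]


def clean_customer(row):
    """Format and clean one customer record (strip whitespace, fix types)."""
    return {
        "id": int(row["id"].strip()),
        "name": row["name"].strip().title(),
        "signup": date.fromisoformat(row["signup"]),
    }


def prepare(customers, orders):
    """Clean customers, aggregate orders, and combine both into one table."""
    # Cleaning: keying by id keeps one record per customer (drops duplicates).
    cleaned = {}
    for row in customers:
        customer = clean_customer(row)
        cleaned[customer["id"]] = customer

    # Transforming: turn order lines into a total amount per customer.
    totals = {}
    for order in orders:
        cid = int(order["customer_id"].strip())
        totals[cid] = totals.get(cid, 0.0) + float(order["amount"])

    # Combining: attach the total to each customer record.
    return [
        dict(customer, total_spent=round(totals.get(cid, 0.0), 2))
        for cid, customer in sorted(cleaned.items())
    ]


prepared = prepare(raw_customers, raw_orders)
for record in prepared:
    print(record)
```

In a real project these steps would typically be done with a dataframe library, but the order of operations stays the same.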

7
Q

Modelling (stage 4) entails applying analytical techniques to the data in order to identify the appropriate technique for the given business problem, including tuning considerations.

TRUE/FALSE

A

TRUE

8
Q

_______ involves adjusting hyperparameters to optimise performance. The process can also include feature selection, which involves adjusting the variables or features used in the model to improve its predictive capabilities.

a) model tuning
b) iterative modelling
c) tuning parameters
d) modelling parameters

A

a) model tuning

9
Q

A typical model tuning process involves iteratively running several models with default parameters, fine-tuning them, and running them again.
A single run of a single model or parameterisation will not sufficiently address the use case.

TRUE/FALSE

A

TRUE
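
The run, adjust, re-run loop described on this card can be sketched as a small grid search. This is a minimal illustration, not a prescribed method: the nearest-neighbour classifier, the synthetic data, the default `k = 1`, and the candidate values are all assumptions chosen to keep the example self-contained.

```python
import random


def knn_predict(train, x, k):
    """Predict a 0/1 label for x by majority vote among the k nearest points."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0


def accuracy(train, data, k):
    """Share of points in `data` that the k-NN model classifies correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)


def make_data(n, rng):
    """Synthetic data: label is 1 when x > 5, with 15% of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        label = 1 if x > 5 else 0
        if rng.random() < 0.15:
            label = 1 - label
        data.append((x, label))
    return data


rng = random.Random(0)
train = make_data(200, rng)
valid = make_data(100, rng)

# First run: the (assumed) default hyperparameter.
default_k = 1
# Tuning runs: re-fit and re-score the model for each candidate value.
candidate_ks = [1, 3, 5, 9, 15, 25]  # odd values avoid tied votes
scores = {k: accuracy(train, valid, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)

print("default k =", default_k, "accuracy =", scores[default_k])
print("best k    =", best_k, "accuracy =", scores[best_k])
```

Because the validation set is used to pick `k`, a further untouched test set would be needed for an unbiased final estimate of performance.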

10
Q

Evaluation (stage 5) involves assessing how well the model performs, e.g., in terms of its predictive capabilities.

In this assessment, the generalisability of the model is evaluated directly from its performance on new data.

TRUE/FALSE

A

TRUE
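
Evaluating on new data is commonly approximated with a hold-out split: fit on one part of the data and score on a part the model has never seen. The sketch below assumes a simple least-squares line and synthetic data; the functions `fit_line` and `mse` and the 80/20 split are illustrative choices, not something the card specifies.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, points):
    """Mean squared error of the fitted line on the given points."""
    slope, intercept = model
    errors = [(y - (slope * x + intercept)) ** 2 for x, y in points]
    return sum(errors) / len(errors)


# Synthetic data (an assumption): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0, 1)))

# Hold out 20% of the rows; the model never sees them during fitting.
rng.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

model = fit_line(train)
train_mse = mse(model, train)
test_mse = mse(model, test)

print("train MSE:", round(train_mse, 3))
print("test MSE: ", round(test_mse, 3))
```

A test error far above the training error would suggest the model does not generalise well.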

11
Q

After model evaluation, which step is crucial before going to the deployment stage?

A

It is crucial to return to the business goals documented in stage 1 (Business understanding) and reflect on whether the results address them comprehensively.

12
Q

Deployment (stage 6) entails putting the model into practice to produce value, and considering questions such as how the model can be integrated into existing business operations.

TRUE/FALSE

A

TRUE

13
Q

Observations with values that are missing completely at random (MCAR) should not simply be removed from the data.

TRUE/FALSE

A

FALSE

If data is MCAR, the usual approach is simply to delete the observations with missing values.
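
Deleting every observation that has a missing value (often called listwise deletion) can be sketched as follows. The toy records and the helper name `drop_incomplete` are hypothetical, and missing values are assumed to be encoded as `None`.

```python
# Toy records (an assumption); None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 29, "income": None},
    {"age": 41, "income": 61000},
]


def drop_incomplete(rows):
    """Keep only the observations in which every field is present."""
    return [row for row in rows if all(v is not None for v in row.values())]


complete = drop_incomplete(rows)
print(len(rows), "rows before,", len(complete), "rows after deletion")
```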

14
Q

If data is NOT MCAR, which of the following statements is FALSE?

A) These observations should be deleted
B) The observations where data is not MCAR should not be deleted
C) An alternative to deletion is to replace all NA values with the mean of the observed values in that column

A

FALSE: A) These observations should be deleted

Observations should be deleted when the data is MCAR. When the data is not MCAR, you should keep them, or replace the NA with zero or the column mean, since deleting data that is not missing at random can introduce bias.
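
Replacing missing values with the column mean can be sketched as below. The helper `impute_mean`, the column name, and the `None` encoding for NA are assumptions made for illustration.

```python
def impute_mean(rows, column):
    """Return copies of the rows with None in `column` replaced by the mean
    of the observed (non-missing) values in that column."""
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    mean = sum(observed) / len(observed)
    return [
        dict(row, **{column: mean if row[column] is None else row[column]})
        for row in rows
    ]


# Toy records (an assumption); None marks a missing value.
rows = [{"income": 40000}, {"income": None}, {"income": 60000}]
imputed = impute_mean(rows, "income")
print(imputed)
```

All observations are kept, and the original rows are left unchanged, so the imputed and raw versions can still be compared.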
