Data Science Risk Management Flashcards

1
Q

Data Quality

A

Poor data quality is a major risk in data science. Data must be cleaned and preprocessed to handle missing values, outliers, and inconsistencies, because a model can only be as reliable as the accuracy and completeness of the data it is built on.
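
As a rough illustration of handling missing values and outliers, the sketch below fills gaps with the median and flags values outside the usual 1.5 x IQR fences. The `ages` list and both helper names are made up for this card; median imputation and the IQR rule are just one common choice, not the only valid one.

```python
import statistics

def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample: two missing entries and one implausible age.
ages = [34, 29, None, 41, 38, 250, 31, None, 36]
filled = impute_missing(ages)
flagged = iqr_outliers(filled)
print(filled)   # missing entries replaced by the median
print(flagged)  # values that deserve a manual check
```

Flagged values are returned rather than dropped, since whether an outlier is an error or a genuine extreme is a judgment the analyst still has to make.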

2
Q

Model Validity

A

The risk that a model is incorrectly specified or rests on inappropriate assumptions, which can lead to incorrect or misleading results. Data scientists must ensure their models are valid for the purpose for which they’re being used.

3
Q

Overfitting

A

This risk involves a model learning the noise along with the underlying pattern in the training data, which makes it perform poorly on unseen data. Cross-validation helps detect this risk, and techniques such as regularization and pruning help reduce it.
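
A minimal sketch of how cross-validation can expose overfitting, assuming a synthetic noisy linear dataset and a 1-nearest-neighbour predictor chosen only because it memorizes its training data. All names here (`k_fold_indices`, `predict_1nn`, the data itself) are illustrative.

```python
import random

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; fold f holds indices i with i % k == f."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test

def predict_1nn(train_x, train_y, x):
    """Predict with the target of the single nearest training point."""
    nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[nearest]

def mse(pairs):
    """Mean squared error over (prediction, target) pairs."""
    pairs = list(pairs)
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)

# Synthetic data (an assumption for this sketch): y = 2x plus Gaussian noise.
rng = random.Random(0)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + rng.gauss(0, 3) for x in xs]

# Error on the data the model has already seen: 1-NN simply recalls each point.
train_mse = mse((predict_1nn(xs, ys, x), y) for x, y in zip(xs, ys))

# Error on held-out folds: the memorized noise no longer helps.
fold_scores = []
for train, test in k_fold_indices(len(xs), k=5):
    tx = [xs[i] for i in train]
    ty = [ys[i] for i in train]
    fold_scores.append(mse((predict_1nn(tx, ty, xs[i]), ys[i]) for i in test))
cv_mse = sum(fold_scores) / len(fold_scores)

print(f"training MSE: {train_mse:.2f}, cross-validated MSE: {cv_mse:.2f}")
```

A large gap between the training error and the cross-validated error is the usual warning sign that the model has fitted noise.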

4
Q

Underfitting

A

The risk that the model is too simple to capture the underlying trend in the data, resulting in poor performance on both the training data and unseen data.
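
To make the symptom concrete, here is a small sketch, assuming made-up data that follows y = 3x + 1 exactly. A constant (mean-only) model is too simple for it and scores badly on both the training and the held-out points, while a least-squares line does not. The function names are illustrative.

```python
def fit_constant(xs, ys):
    """Too-simple model: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ordinary least-squares line through the training points."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: even x values for training, odd x values held out.
train_x = [float(x) for x in range(0, 20, 2)]
test_x = [float(x) for x in range(1, 20, 2)]
train_y = [3 * x + 1 for x in train_x]
test_y = [3 * x + 1 for x in test_x]

constant = fit_constant(train_x, train_y)
line = fit_line(train_x, train_y)

const_train = mse(constant, train_x, train_y)
const_test = mse(constant, test_x, test_y)
line_test = mse(line, test_x, test_y)
print(const_train, const_test, line_test)
```

High error on both sets is what separates underfitting from overfitting, where only the unseen-data error is high.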

5
Q

Data Privacy

A

Data science often involves dealing with sensitive data. Ensuring this data is handled ethically and in compliance with privacy laws is a major concern.

6
Q

Bias and Fairness

A

Models can inadvertently perpetuate biases in the data they’re trained on. Risk management should involve testing models for fairness and bias, and mitigating these issues when found.
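
One simple fairness test is to compare how often the model gives a positive prediction to each group (the demographic parity difference). The sketch below assumes binary predictions and a single group label per record; the data and function names are invented for illustration, and other fairness metrics may suit a given problem better.

```python
def selection_rate(predictions, groups, group):
    """Share of positive (1) predictions among members of one group."""
    chosen = [p for p, g in zip(predictions, groups) if g == group]
    return sum(chosen) / len(chosen)

def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest per-group selection rates."""
    rates = {g: selection_rate(predictions, groups, g) for g in set(groups)}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical model outputs for ten applicants in two groups.
preds = [1, 0, 1, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

gap, rates = demographic_parity_difference(preds, groups)
print(rates)
print(f"demographic parity difference: {gap:.2f}")
```

A gap near zero means the groups are selected at similar rates; how large a gap is acceptable depends on the application and is not something the code can decide.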

7
Q

Reproducibility

A

Data science results should be reproducible. This requires careful management of data, code, and computational environments.
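
A minimal sketch of what managing data, code, and environment can look like in practice: fix the random seed, fingerprint the input data, and record the interpreter version next to the result. The `rows` data, the `run_experiment` helper, and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
import json
import platform
import random

def fingerprint(rows):
    """Stable SHA-256 hash of the dataset, so later runs can confirm the same input."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def run_experiment(rows, seed):
    """Draw a seeded sample and return it with the metadata needed to repeat the run."""
    rng = random.Random(seed)  # local generator: no hidden global state
    sample = rng.sample(rows, k=3)
    return {
        "seed": seed,
        "data_sha256": fingerprint(rows),
        "python": platform.python_version(),
        "sample": sample,
    }

# Hypothetical dataset.
rows = [{"id": i, "value": i * i} for i in range(10)]

first = run_experiment(rows, seed=42)
second = run_experiment(rows, seed=42)
print(first == second)
```

Storing the returned record alongside the code version lets someone else check that they are rerunning the same experiment on the same data.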

8
Q

Operational Risks

A

These are risks that arise when data science results are implemented in real-world systems. For example, a model used for decision-making needs to be robust, reliable, and able to handle the full range of inputs it may receive, including unexpected ones.
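
One way to make a deployed model more robust is to validate every input before it reaches the model. The sketch below assumes numeric features with known plausible ranges; the feature names, the ranges, and the `validate_features` helper are hypothetical.

```python
import math

def validate_features(features, expected):
    """Return a list of problems; an empty list means the input is safe to score.

    `expected` maps each required feature name to an inclusive (low, high) range.
    """
    errors = []
    for name, (low, high) in expected.items():
        value = features.get(name)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name}: missing or not numeric")
        elif math.isnan(value) or not (low <= value <= high):
            errors.append(f"{name}: {value} outside [{low}, {high}]")
    return errors

# Hypothetical ranges for a credit-scoring model.
EXPECTED = {"age": (18, 120), "income": (0, 10_000_000)}

good = validate_features({"age": 42, "income": 55_000}, EXPECTED)
bad = validate_features({"age": -5, "income": "unknown"}, EXPECTED)
print(good)
print(bad)
```

Rejecting or logging bad inputs at this boundary is usually easier to reason about than letting the model produce a confident prediction from values it never saw in training.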

9
Q

Interpretability

A

Especially in sensitive or regulated domains, it’s important that model predictions can be explained. If a model is a “black box”, it’s hard to trust its predictions or debug them when they’re wrong.

10
Q

Legal and Regulatory Compliance

A

Depending on the industry, data science work may need to comply with specific regulations. These can include data privacy laws, rules on explainability and fairness, and requirements for documentation and reporting.
