15 Data Quality and Management Flashcards

1
Q

What is quality control?

A

The process of testing data to ensure data integrity

Quality control is essential because bad data can lead to misleading results.

2
Q

When should you check for data quality?

A

Any time there is a major change, such as:
* Data acquisition
* Data transformation
* Data manipulation
* Final product review

Beyond these trigger points, routine checks on a regular schedule are also important.

3
Q

What is data acquisition?

A

The process of obtaining new data

It requires checking for bias and the current state of the data.

4
Q

What does data transformation involve?

A

Changing data from one form to another, including:
* Intrahops
* Pass-throughs
* Conversions

Transformations should ideally be done in new variables.
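
As a minimal sketch of transforming into a new variable, the example below converts a weight from pounds to kilograms and stores the result in a new field, leaving the source field untouched. The records, the field names `weight_lb` and `weight_kg`, and the helper `add_weight_kg` are hypothetical and chosen only for illustration.

```python
# Hypothetical records; field names are illustrative only.
records = [
    {"id": 1, "weight_lb": 150.0},
    {"id": 2, "weight_lb": 200.0},
]

LB_PER_KG = 2.20462  # approximate conversion factor


def add_weight_kg(rows):
    """Return new rows with an added weight_kg field.

    The input rows are not modified, so the original values stay
    available for checking the conversion afterwards.
    """
    return [
        {**row, "weight_kg": round(row["weight_lb"] / LB_PER_KG, 2)}
        for row in rows
    ]


converted = add_weight_kg(records)
```

Keeping the original field makes it possible to spot-check the converted values against their source.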

5
Q

What is data manipulation?

A

Changing the shape of the data without altering its content

Examples include breaking down or combining variables.
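
A small sketch of both cases, assuming a hypothetical row with a `full_name` field to break down and separate `year`, `month`, and `day` fields to combine. The content is unchanged; only its shape differs.

```python
# Hypothetical input row; field names are illustrative only.
row = {"full_name": "Ada Lovelace", "year": 2024, "month": 3, "day": 9}

# Breaking down: one variable becomes two.
first_name, last_name = row["full_name"].split(" ", 1)

# Combining: three variables become one ISO-style date string.
date = f'{row["year"]:04d}-{row["month"]:02d}-{row["day"]:02d}'

reshaped = {"first_name": first_name, "last_name": last_name, "date": date}
```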

6
Q

What are the data quality dimensions?

A

Key dimensions include:
* Data consistency
* Data accuracy
* Data completeness
* Data integrity
* Data attribute limitations

These dimensions help assess the quality of data.

7
Q

What is data consistency?

A

Ensuring data is uniform and reported the same way across different levels

This applies to both individual variables and broader databases.
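
One way to check consistency for a single variable is to compare its values against an agreed set of codes. The sketch below assumes a hypothetical list of state values and a hypothetical `allowed` set; anything outside that set is reported as inconsistent.

```python
from collections import Counter

# Hypothetical values for one variable, as reported by different sources.
values = ["NY", "ny", "New York", "CA", "NY"]

# Hypothetical agreed-upon codes for this variable.
allowed = {"NY", "CA"}

counts = Counter(values)

# Values that are reported in a form other than the agreed codes.
inconsistent = sorted(v for v in counts if v not in allowed)
```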

8
Q

What is data accuracy?

A

Whether the data is correct

Checking data accuracy often involves verifying it against an outside source.

9
Q

What is data completeness?

A

Checking for gaps in data, such as missing values or entirely missing variables

This is essential for valid analyses.
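
A completeness check can be sketched as counting missing values per variable and listing variables that never appear. The rows, the `expected_fields` list, and the helper `missing_counts` are hypothetical; here a value counts as missing if it is absent, `None`, or an empty string.

```python
# Hypothetical rows; the third row has no "age" field at all.
rows = [
    {"id": 1, "age": 34, "city": "Oslo"},
    {"id": 2, "age": None, "city": ""},
    {"id": 3, "city": "Lima"},
]

# Hypothetical list of variables the dataset is expected to contain.
expected_fields = ["id", "age", "city", "email"]


def missing_counts(rows, fields):
    """Count rows where each field is absent, None, or an empty string."""
    return {
        field: sum(1 for row in rows if row.get(field) in (None, ""))
        for field in fields
    }


report = missing_counts(rows, expected_fields)

# Variables that are missing from every row.
absent_variables = [f for f in expected_fields if all(f not in r for r in rows)]
```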

10
Q

What does data integrity encompass?

A

It includes consistency, accuracy, completeness, and security

Data integrity is crucial in regulated fields like pharmaceuticals.

11
Q

What are data quality rules and metrics?

A

Guidelines that define acceptable data standards and formats

These include cutoff scores and conformity rules.
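
A conformity rule and a cutoff score can be sketched together: the rule decides whether each value has the expected format, and the cutoff decides whether enough values pass. The five-digit ZIP code rule and the 95% cutoff below are hypothetical examples, not standards taken from the cards.

```python
import re

# Hypothetical conformity rule: a ZIP code is exactly five digits.
ZIP_RULE = re.compile(r"\d{5}")

# Hypothetical cutoff: at least 95% of values must conform.
CUTOFF = 0.95

zip_codes = ["10001", "9021", "30301", "ABCDE"]

# fullmatch requires the whole string to match the rule.
passed = [z for z in zip_codes if ZIP_RULE.fullmatch(z)]
pass_rate = len(passed) / len(zip_codes)
meets_cutoff = pass_rate >= CUTOFF
```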

12
Q

What is cross-validation?

A

A statistical analysis that checks if results can be generalized

It helps assess model effectiveness and estimate test error.
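
A minimal k-fold sketch using only the standard library. The "model" here is a deliberately trivial assumption (predict the mean of the training values), and the helpers `k_fold_indices` and `cross_validate` are hypothetical names. Folds are contiguous for readability; in practice the data would usually be shuffled first.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(y, k):
    """Average held-out mean squared error of a predict-the-mean model."""
    fold_errors = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train = [v for i, v in enumerate(y) if i not in held_out]
        prediction = mean(train)  # "fit" on the training folds only
        fold_errors.append(mean([(y[i] - prediction) ** 2 for i in test_idx]))
    return mean(fold_errors)


# Hypothetical target values.
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
cv_error = cross_validate(y, k=3)
```

Each value is scored by a model that never saw it during fitting, which is what makes the averaged error an estimate of how well results generalize.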

13
Q

What are sample/spot checks?

A

Quick checks focusing on one or two data quality dimensions

They are often prompted by unusual data observations.

14
Q

What are reasonable expectations in data quality?

A

Assessing whether data values make sense based on historical norms

This can involve formalized processes for flagging outliers.
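
One possible formalized check is a z-score test against historical values. This is only one of several reasonable approaches; the `history` values, the helper `is_unexpected`, and the cutoff of three standard deviations are assumptions made for illustration.

```python
from statistics import mean, stdev

# Hypothetical historical values for one metric.
history = [100, 102, 98, 101, 99, 100]


def is_unexpected(value, history, z_cutoff=3.0):
    """Flag a value that lies more than z_cutoff sample standard
    deviations away from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    return abs(value - mu) > z_cutoff * sigma


flag_small_change = is_unexpected(103, history)
flag_large_change = is_unexpected(120, history)
```

With these values, 103 falls within the expected range and 120 is flagged for review.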

15
Q

What is data profiling?

A

A formal process that checks data quality across entire databases

It usually includes structure, content, and relationship discovery.
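
The three kinds of discovery can be sketched on two tiny tables. The tables, their column names, and the helper `profile_column` are hypothetical; a real profiling tool would cover every column and table.

```python
# Hypothetical tables.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Bo"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 3, "total": 40.0},
]


def profile_column(rows, column):
    """Summarize one column: data types (structure) and value range (content)."""
    values = [row[column] for row in rows]
    return {
        "types": sorted({type(v).__name__ for v in values}),
        "distinct": len(set(values)),
        "min": min(values),
        "max": max(values),
    }


# Structure and content discovery for one column.
total_profile = profile_column(orders, "total")

# Relationship discovery: orders that point to a customer that does not exist.
known_ids = {c["customer_id"] for c in customers}
orphan_orders = [o["order_id"] for o in orders if o["customer_id"] not in known_ids]
```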

16
Q

What is a data audit?

A

A systematic check to see if a dataset meets specific goals

Audits are often scheduled and performed at all stages of the data lifecycle.

17
Q

What is master data management (MDM)?

A

The process of creating and managing a centralized data system

MDM aims to create a ‘golden record’ for improved data quality.

18
Q

When should MDM be used?

A

In situations such as:
* Mergers and acquisitions
* Compliance checks
* Streamlining data access

MDM helps integrate disparate data sources and manage protected data.

19
Q

What is the benefit of having a golden record?

A

It provides a single source of truth with clean, standardized data

This facilitates faster access and higher data quality.

20
Q

What challenges are associated with implementing MDM?

A

It can be labor-intensive and expensive to set up

Many companies may only implement MDM for specific data types.

21
Q

What is policy in the context of data management?

A

Policy refers to compliance: keeping all records organized so that regulatory checks are easier.

22
Q

What does streamlining data access mean?

A

Streamlining data access means making data quick to retrieve from a single table, without complex queries.

23
Q

What is the first step in the MDM process?

A

Consolidation

24
Q

What does consolidation involve in MDM?

A

Consolidation involves creating the golden record by combining data from multiple sources into one place.
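
A rough sketch of consolidation, assuming two hypothetical sources (`crm` and `billing`) keyed by the same customer ID. The merge rule used here (the first non-empty value for each field wins) is an assumption; real MDM tools apply more elaborate matching and survivorship rules.

```python
# Hypothetical source systems, keyed by a shared customer ID.
crm = {
    "c1": {"name": "Ana Ruiz", "email": ""},
}
billing = {
    "c1": {"name": "ana ruiz", "email": "ana@example.com"},
    "c2": {"name": "Bo Chen", "email": "bo@example.com"},
}


def consolidate(*sources):
    """Merge records by key; earlier sources take priority, and empty
    values never overwrite or block a non-empty value."""
    golden = {}
    for source in sources:
        for key, record in source.items():
            merged = golden.setdefault(key, {})
            for field, value in record.items():
                if value and not merged.get(field):
                    merged[field] = value
    return golden


golden_records = consolidate(crm, billing)
```

Here the name for `c1` comes from `crm`, while its email is filled in from `billing` because the `crm` value was empty.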

25
Q

What is the purpose of standardization in MDM?

A

Standardization makes data uniform, ensuring all data works together and is consistent.

26
Q

What is a data dictionary?

A

A data dictionary is a document that defines variables, their attributes, structure, and relationships.

27
Q

Why are data dictionaries important?

A

They help ensure that multiple users understand the data and its usage.

28
Q

What does data quality control involve?

A

Data quality control involves checking for accuracy, consistency, and reliability of data.

29
Q

When should data quality be checked?

A

Data quality should be checked after data manipulation, after data transformation, and before the final report.

30
Q

Which of the following is a data quality dimension: Data completeness, Data retention, Rows passed, or Data manipulation?

A

Data completeness

31
Q

What is data profiling?

A

Data profiling is a structured formal process for assessing the quality and efficiency of an entire database.

32
Q

True or False: Acquisitions are an appropriate time to institute MDM.

A

True

33
Q

Creating a document that explains variables in a dataset represents which part of the MDM process?

A

Data dictionary

34
Q

Fill in the blank: _______ is the process of combining data from multiple sources into one place.

A

Consolidation

35
Q

Fill in the blank: A data dictionary provides definitions for every variable, as well as how they are used and how they _______.

A

relate to other variables

36
Q

What should be included in a data dictionary?

A

Definitions and attributes for every variable, structure, relationships, and data organization.

37
Q

What is the significance of having a data dictionary in a collaborative database environment?

A

It ensures that all users understand the data and its usage, preventing confusion.

38
Q

List three circumstances where data quality should be checked.

A

* After data manipulation
* After data transformation
* Before the final report

39
Q

What is the main goal of standardization in data management?

A

To ensure all data works together and is consistent across different sources.