Data Quality Flashcards

1
Q

Data Quality Indicators (TARMAC)

A

Trackability, Acceptability, Relevance, Measurability, Accountability, Controllability

2
Q

TARMAC - trackability

A

Can measure data quality over time

3
Q

TARMAC - Acceptability

A

be able to define what good looks like

4
Q

TARMAC - Relevance

A

Make sure you are measuring something relevant to the business

5
Q

TARMAC - Measurability

A

What will actually be measured? How can it be measured?

6
Q

TARMAC - Accountability/Stewardship

A

Who will be held accountable if it goes wrong?

7
Q

TARMAC - Controllability

A

Defining remedial actions in advance of the thing going wrong

8
Q

What must you know to be able to define data quality?

A

the intended purpose (use) of the data

9
Q

How can you manage data quality when you don’t know the purpose?

A

don’t over-assume; stick to basic validity checks

10
Q

Reference Data

A

data that rarely changes, e.g., identifiers

11
Q

Master Data

A

Descriptive attributes of business entities

12
Q

What can be defined as data standards?

A

data types, acceptable values, attribute domains, metadata format

13
Q

What are the two types of data quality management?

A
  1. Governing/Strategic
  2. Tactical
14
Q

What is tactical data quality management?

A

short-term fixing of problems

15
Q

What is governing/strategic data quality management?

A

Overarching long-term goals, e.g., root cause analysis

16
Q

Common DQ mistakes

A
  • failing to consider the intended use of the data
  • confusing validity and accuracy
  • treating it as a one-time activity
  • not fixing at the source
  • applying software quality principles
  • laziness, blaming the system
  • believing good data quality is the end goal (not the use of that data)
  • believing that quantity beats quality
17
Q

Data quality firewall

A

taking external data and applying data cleansing before it is stored in the DB
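As a minimal Python sketch of a data quality firewall, the function below cleanses each incoming external record and only inserts it into the database if it passes validation; anything else is held in a quarantine list for review. The in-memory SQLite table, the `customers` schema and the two rules (a non-blank `customer_id` and an email containing "@") are hypothetical assumptions chosen for illustration.

```python
import sqlite3

def cleanse(record: dict) -> dict:
    # Trim whitespace and lower-case the email before validation.
    return {
        "customer_id": (record.get("customer_id") or "").strip(),
        "email": (record.get("email") or "").strip().lower(),
    }

def passes_rules(record: dict) -> bool:
    # Hypothetical rules: ID must be present, email must contain "@".
    return bool(record["customer_id"]) and "@" in record["email"]

def firewall_load(conn: sqlite3.Connection, incoming: list[dict]) -> list[dict]:
    """Insert records that pass into the table; return the quarantined ones."""
    quarantined = []
    for raw in incoming:
        rec = cleanse(raw)
        if passes_rules(rec):
            conn.execute(
                "INSERT INTO customers (customer_id, email) VALUES (?, ?)",
                (rec["customer_id"], rec["email"]),
            )
        else:
            quarantined.append(raw)  # keep the original for investigation
    conn.commit()
    return quarantined

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT, email TEXT)")
rejected = firewall_load(conn, [
    {"customer_id": " C1 ", "email": "Ann@Example.com"},
    {"customer_id": "", "email": "bob@example.com"},
    {"customer_id": "C3", "email": "not-an-email"},
])
print(conn.execute("SELECT * FROM customers").fetchall())  # [('C1', 'ann@example.com')]
print(len(rejected))  # 2
```

Quarantining rather than silently dropping failed records keeps them available for root cause analysis and correction at the source.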

18
Q

Impacts of poor data

A
  1. aggravation
  2. loss of reputation
  3. loss of business
  4. regulatory risk
  5. loss of life
19
Q

Why is it important to communicate the cost of poor quality data?

A

to raise awareness

20
Q

Data Quality Management Cycle

A

Plan -> Deploy -> Monitor -> Act

21
Q

What are the four data quality governance steps?

A
  • standardisation
  • assignment
  • escalation
  • completion
22
Q

Causes of data issues

A

Human causes
Organisational (system)
Physical causes

23
Q

Roles of the data quality oversight board

A
  • setting data quality improvement priorities
  • establishing communications & feedback mechanisms
  • producing certification & compliance policies
  • approving data quality strategies
24
Q

Data Quality Service Level Agreement (SLA) will include

A

defining roles & responsibilities for data quality

25
What is a key process of defining data quality business rules?
separating data that does not meet business needs from the data that does
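A minimal Python sketch of that separation, assuming business rules can be written as named predicates over a record. The two order rules and the field names are hypothetical; recording which rules each failing record broke makes the failures easier to report and fix.

```python
from typing import Callable

# Hypothetical business rules: a name plus a predicate over one record.
RULES: dict[str, Callable[[dict], bool]] = {
    "order_total_positive": lambda r: r["total"] > 0,
    # ISO-format date strings compare correctly as plain strings.
    "ship_date_after_order": lambda r: r["ship_date"] >= r["order_date"],
}

def split_by_rules(records):
    """Separate records that meet every rule from those that do not."""
    passed, failed = [], []
    for rec in records:
        broken = [name for name, check in RULES.items() if not check(rec)]
        if broken:
            failed.append((rec, broken))
        else:
            passed.append(rec)
    return passed, failed

orders = [
    {"id": 1, "total": 25.0, "order_date": "2024-01-02", "ship_date": "2024-01-03"},
    {"id": 2, "total": -5.0, "order_date": "2024-01-02", "ship_date": "2024-01-01"},
]
ok, bad = split_by_rules(orders)
print([r["id"] for r in ok])  # [1]
print(bad[0][1])  # ['order_total_positive', 'ship_date_after_order']
```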
26
Why is top down and bottom up profiling best done together?
it balances the business relevance and the actual state of the data
27
Steps in root cause analysis
  1. define the problem
  2. collect data
  3. identify all possible causal factors
  4. identify the root cause(s)
  5. recommend and implement solutions
28
Dimensions of data quality
Completeness, Consistency, Currency, Reasonableness, Integrity, Timeliness, Validity, Accuracy, Uniqueness (sometimes also privacy and precision)
29
Completeness Data Quality Dimension
All mandatory values are present
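A Python sketch of measuring completeness as the percentage of mandatory values present. The mandatory field list is a hypothetical assumption, and blank or whitespace-only strings are treated as missing; optional fields are simply left out of the calculation.

```python
MANDATORY = ("customer_id", "name", "email")  # hypothetical mandatory fields

def is_present(value) -> bool:
    # Treat None and blank/whitespace-only strings as missing.
    return value is not None and str(value).strip() != ""

def completeness_pct(records, mandatory=MANDATORY) -> float:
    """Percentage of mandatory values that are present across all records."""
    expected = len(records) * len(mandatory)
    if expected == 0:
        return 100.0
    present = sum(is_present(rec.get(f)) for rec in records for f in mandatory)
    return 100.0 * present / expected

rows = [
    {"customer_id": "C1", "name": "Ann", "email": "ann@example.com"},
    {"customer_id": "C2", "name": "  ", "email": None},
]
print(round(completeness_pct(rows), 1))  # 66.7
```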
30
Consistency Data Quality Dimension
Data representing a concept in one system matches the same concept in another system
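As a Python sketch, consistency between two systems can be checked by comparing the value each holds for the same key. The CRM and billing dictionaries are hypothetical, and the check only covers keys present in both systems (keys missing from one side are a separate completeness question).

```python
def inconsistencies(system_a: dict, system_b: dict) -> dict:
    """Keys held by both systems whose values disagree, mapped to (a, b) pairs."""
    return {
        key: (system_a[key], system_b[key])
        for key in system_a.keys() & system_b.keys()
        if system_a[key] != system_b[key]
    }

crm = {"C1": "London", "C2": "Leeds", "C3": "York"}            # hypothetical CRM cities
billing = {"C1": "London", "C2": "Manchester", "C4": "Bath"}   # hypothetical billing cities
print(inconsistencies(crm, billing))  # {'C2': ('Leeds', 'Manchester')}
```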
31
Currency Data Quality Dimension
Is the data up to date?
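A Python sketch of a currency check, assuming each record carries a hypothetical `last_updated` timestamp and that the business tolerates data up to 90 days old; both the field name and the threshold are illustrative assumptions.

```python
from datetime import datetime, timedelta

def stale_records(records, now: datetime, max_age: timedelta):
    """IDs of records whose last update is older than max_age."""
    return [r["id"] for r in records if now - r["last_updated"] > max_age]

now = datetime(2024, 6, 30)
rows = [
    {"id": "A", "last_updated": datetime(2024, 6, 29)},
    {"id": "B", "last_updated": datetime(2023, 12, 1)},
]
print(stale_records(rows, now, max_age=timedelta(days=90)))  # ['B']
```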
32
Reasonableness Data Quality Dimension
Business rules: does it feel right? Is it in line with what is expected?
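One way to sketch a reasonableness check in Python is to compare a new value with what history suggests it should be. The order counts and the 50% tolerance below are hypothetical assumptions; in practice the expected range would come from business rules and domain knowledge.

```python
from statistics import mean

def is_unreasonable(history, today, tolerance=0.5):
    """Flag today's value if it differs from the historical mean by more than `tolerance` (a fraction)."""
    expected = mean(history)
    return abs(today - expected) > tolerance * abs(expected)

# Hypothetical daily order counts for the previous four days.
history = [100, 110, 95, 105]
print(is_unreasonable(history, 98))  # False
print(is_unreasonable(history, 40))  # True
```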
33
Integrity Data Quality Dimension
Child data must have a parent
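A Python sketch of a referential integrity check that finds orphans, i.e. child records whose foreign key is missing or does not match any parent. The customer/order tables and the `customer_id` key name are hypothetical.

```python
def find_orphans(children, parents, fk="customer_id", pk="customer_id"):
    """Child records whose foreign key does not match any parent's primary key."""
    parent_keys = {p[pk] for p in parents}
    return [c for c in children if c.get(fk) not in parent_keys]

customers = [{"customer_id": "C1"}, {"customer_id": "C2"}]
orders = [
    {"order_id": 10, "customer_id": "C1"},
    {"order_id": 11, "customer_id": "C9"},  # invalid reference: no such customer
    {"order_id": 12, "customer_id": None},  # missing reference
]
print([o["order_id"] for o in find_orphans(orders, customers)])  # [11, 12]
```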
34
Timeliness Data Quality Dimension
Accessibility/availability: is the data available when it is needed?
35
Validity Data Quality Dimension
Is the value in the correct domain?
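A Python sketch of validity checks against two kinds of domain: a fixed list of allowed codes and a format pattern. The status codes are hypothetical, and the postcode regular expression is a deliberately simplified approximation of the UK format, not a complete one.

```python
import re

# Hypothetical domains: a fixed code list and a simplified format pattern.
VALID_STATUSES = {"ACTIVE", "SUSPENDED", "CLOSED"}
UK_POSTCODE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")

def validity_errors(record: dict) -> list[str]:
    """Names of fields whose values fall outside their domain."""
    errors = []
    if record["status"] not in VALID_STATUSES:
        errors.append("status")
    if not UK_POSTCODE.match(record["postcode"]):
        errors.append("postcode")
    return errors

print(validity_errors({"status": "ACTIVE", "postcode": "SW1A 1AA"}))  # []
print(validity_errors({"status": "OPEN", "postcode": "12345"}))  # ['status', 'postcode']
```

A value that passes these checks is valid, but not necessarily accurate: a correctly formatted postcode can still be the wrong one for that customer.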
36
Accuracy Data Quality Dimension
Does the data correctly represent the real-life model?
37
Uniqueness Data Quality Dimension
A business concept must not be duplicated.
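A Python sketch of a simple uniqueness check that groups records sharing the same normalised business key. Using name plus email as the key is a hypothetical choice, and only exact matches after trimming and lower-casing are caught; real matching often also needs fuzzy comparison.

```python
from collections import defaultdict

def duplicate_groups(records, key_fields=("name", "email")):
    """Groups of record IDs that share the same normalised business key."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(str(rec[f]).strip().lower() for f in key_fields)
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

people = [
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com"},
    {"id": 2, "name": "ann lee ", "email": "ANN@example.com"},
    {"id": 3, "name": "Bob Ray", "email": "bob@example.com"},
]
print(duplicate_groups(people))  # [[1, 2]]
```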
38
How to measure data quality?
  • statistics (sampling, basic summaries, process control charts)
  • profiling (manual or tool-based; columnar, intra-table, cross-table)
  • information flow diagrams
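Process control charts can be sketched in Python by deriving control limits from a stable baseline and flagging later measurements that fall outside them. This simplified version uses the mean plus or minus three sample standard deviations over hypothetical daily error rates; formal Shewhart charts (e.g. p-charts or individuals charts) compute their limits in slightly different ways.

```python
from statistics import mean, stdev

def control_limits(baseline):
    """Lower limit, centre line and upper limit (mean +/- 3 sample std devs)."""
    centre = mean(baseline)
    sigma = stdev(baseline)
    return centre - 3 * sigma, centre, centre + 3 * sigma

def out_of_control(values, limits):
    """Indexes of measurements outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical daily error rates (%) from a stable baseline period.
baseline = [2.0, 2.2, 1.9, 2.1, 2.0, 1.8, 2.0]
limits = control_limits(baseline)
print(out_of_control([2.1, 1.9, 3.5, 2.0], limits))  # [2]
```

Re-running the check on each new measurement gives the trend over time that the trackability indicator calls for.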
39
process of profiling
  1. identify a subset of the data
  2. understand its business use
  3. load it into the profiling tool
  4. list potential anomalies
  5. prioritise by criticality
40
What's the output of a profiling tool?
counts, summaries, data types, PKs, percentage of completeness, identification (e.g., duplicated records, out of range values).
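A Python sketch of the kind of column-level summary a profiling tool produces. The `ages` column is hypothetical; real tools add much more, such as pattern analysis and cross-column or cross-table checks.

```python
from collections import Counter

def profile_column(values):
    """Basic profile of one column: counts, completeness, distinct values, types, duplicates, range."""
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    types = sorted({type(v).__name__ for v in non_null})
    single_type = len(types) == 1  # only compute min/max for one comparable type
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "completeness_pct": 100.0 * len(non_null) / len(values) if values else 0.0,
        "distinct": len(counts),
        "types": types,
        "duplicated_values": [v for v, n in counts.items() if n > 1],
        "min": min(non_null) if single_type else None,
        "max": max(non_null) if single_type else None,
    }

# Hypothetical "age" column with blanks, a duplicate and an out-of-range value.
ages = [34, 41, None, 34, 230, ""]
p = profile_column(ages)
print(p["null_count"], p["distinct"], p["duplicated_values"], p["max"])  # 2 3 [34] 230
```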
41
Inspecting the quality of data using statistical techniques is called
data profiling
42
Advantages of defining data quality rules upfront
  • setting clear expectations for data quality
  • creating the foundation for ongoing data quality measurement
  • providing the requirements for system controls that prevent quality issues
  • providing data quality requirements to external parties
43
What 3 levels of data granularity should you measure for data quality?
Data element value, record & dataset
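A Python sketch showing how one check can be rolled up across the three levels: a single element value, a record (all its elements pass) and a dataset (share of passing records). The element-level rule (present and non-blank) and the field names are hypothetical.

```python
def element_ok(value) -> bool:
    # Element level: one value (hypothetical rule: present and non-blank).
    return value is not None and str(value).strip() != ""

def record_ok(record: dict, fields) -> bool:
    # Record level: every checked element in the record passes.
    return all(element_ok(record.get(f)) for f in fields)

def dataset_score(records, fields) -> float:
    # Dataset level: percentage of records that pass.
    if not records:
        return 100.0
    return 100.0 * sum(record_ok(r, fields) for r in records) / len(records)

rows = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@x.com"},
    {"id": 4, "email": None},
]
print(dataset_score(rows, ["id", "email"]))  # 50.0
```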
44
How are data quality management and data governance linked?
  • both are essential for organisational success
  • both are ongoing efforts
  • governance supports DQ
  • DQ sustains governance
45
Where to focus for DQ
  • focus on critical data
  • focus on preventing errors (not just fixing them)
  • address the root cause of problems
  • enforce quality standards
46
Shewhart data quality cycle stages
Plan, Do, Check, Act (for data quality: Plan, Deploy, Monitor, Act)
47
When measuring data quality which three levels of granularity should you measure?
Data element, record, dataset
48
Data profiling software
Data profiling software investigates data to understand its structure, content and quality. It helps us find patterns and problems in the data.
49
Data Quality dimensions
Accuracy, Completeness, Integrity, Uniqueness, Consistency
50
Data Accuracy
How closely data represents reality (hard to measure; usually checked against trusted sources, e.g., check that postcodes match real postcodes)
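A Python sketch of measuring accuracy against a trusted source, following the postcode example. The trusted lookup and customer records are hypothetical; "M2 2BB" is a validly formatted postcode but still counts as inaccurate because it does not match the trusted value, and records with no trusted value are left out of the measure.

```python
# Hypothetical trusted source, e.g. a verified address file keyed by customer.
TRUSTED_POSTCODES = {"C1": "SW1A 1AA", "C2": "M1 1AE"}

def normalise(postcode: str) -> str:
    # Ignore spacing and case differences when comparing postcodes.
    return postcode.replace(" ", "").upper()

def accuracy_pct(records, trusted=TRUSTED_POSTCODES):
    """Percentage of checkable records whose postcode matches the trusted source."""
    checkable = [r for r in records if r["customer_id"] in trusted]
    if not checkable:
        return None  # accuracy cannot be measured without a reference
    matches = sum(
        normalise(r["postcode"]) == normalise(trusted[r["customer_id"]])
        for r in checkable
    )
    return 100.0 * matches / len(checkable)

records = [
    {"customer_id": "C1", "postcode": "sw1a1aa"},
    {"customer_id": "C2", "postcode": "M2 2BB"},   # valid format, but wrong
    {"customer_id": "C3", "postcode": "LS1 4AP"},  # no trusted value to compare
]
print(accuracy_pct(records))  # 50.0
```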
51
Complete data
all data is present, no gaps (depends on mandatory or optional fields)
52
Data consistency
Making sure 2 or more representations of something are the same
53
Data Integrity
Making sure data is complete, accurate and consistent (making sure data objects are connected properly)
54
Referential Integrity
the connections between data objects are consistent
55
Internal Consistency Problem Example
A list of names and emails where two people have the same name, or some names don't have emails.
56
Orphan
A data object with a missing or invalid reference to another data object
57
Data Quality Oversight Board
Provides strategic direction with policies & activities.
58
Data Value Domain
A set of rules that describe the set of values that can be taken.
59
Business rules are not required for...
critical data improvement