Data Quality Flashcards

(23 cards)

1
Q

What undermines reliable and trustworthy data?

A
  • Lack of understanding of the effects of poor-quality data on organizational success
  • Bad planning
  • Siloed system design
  • Inconsistent development processes
  • Incomplete documentation
  • Lack of standards
  • Lack of governance
2
Q

Business Drivers for Establishing a formal Data Quality Management Program

A
  • Increasing the value of organizational data and the opportunities to use it
  • Reducing risks and costs associated with poor quality data
  • Improving organizational efficiency and productivity
  • Protecting and enhancing the organization’s reputation
3
Q

DQ Goals

A
  • Developing a governed approach to make data fit for purpose based on data consumers’ requirements
  • Defining standards and specifications for data quality controls as part of the data lifecycle
  • Defining and implementing processes to measure, monitor, and report on data quality levels
  • Identifying and advocating for opportunities to improve the quality of data, through changes to processes and systems and engaging in activities that measurably improve the quality of data based on data consumer requirements

Approach/standards/processes/proactiveness

4
Q

DQ Principles

A
  • Criticality: A DQ Program should focus on the data most critical to the enterprise and its customers. Priorities for improvement should be based on the criticality of the data and on the level of risk if data is not correct
  • Lifecycle management: DQ should be managed across the data lifecycle, from creation or procurement through disposal. This includes managing data as it moves within and between systems.
  • Prevention: The focus of a DQ Program should be on preventing data errors and conditions that reduce the usability of data; it should not be focused on simply correcting records.
  • Root cause remediation: Improving the quality of data goes beyond correcting errors. Problems with the quality of data should be understood and addressed at their root causes, rather than just their symptoms. Because these causes are often related to process or system design, improving data quality often requires changes to processes and the systems that support them.
  • Governance: DG activities must support the development of high quality data, and DQ program activities must support and sustain a governed data environment.
  • Standards-driven: All stakeholders in the data lifecycle have data quality requirements. To the degree possible, these requirements should be defined in the form of measurable standards and expectations against which the quality of data can be measured.
  • Objective measurement and transparency: DQ levels need to be measured objectively and consistently. Measurements and measurement methodology should be shared with stakeholders, since they are the arbiters of quality.
  • Embedded in business processes: Business process owners are responsible for the quality of data produced through their processes. They must enforce data quality standards in their processes.
  • Systematically enforced: System owners must systematically enforce data quality requirements.
  • Connected to service levels: DQ reporting and issues management should be incorporated into Service Level Agreements.
5
Q

Concept: Data Quality

A
  • Data is of high quality to the degree that it meets the expectations and needs of data consumers
  • Data quality is thus dependent on context and on the needs of the data consumer
  • Expectations related to quality are not always known. Customers may not articulate them, so data management professionals need to work to understand the requirements.
6
Q

Concept: Critical Data

A
  • One principle of data quality management is to focus improvement efforts on the data that is most important to the organization and its customers, in order to make a direct, measurable impact on business needs.
  • Data can be assessed based on whether it is required by:
    Regulatory reporting
    Financial reporting
    Business policy
    Ongoing operations
    Business strategy, especially efforts at competitive differentiation
  • Master data is critical by definition. Data sets or individual data elements can be assessed for criticality based on the processes that consume them, the nature of the reports they appear in, or the financial, regulatory, or reputational risk to the organization if something were to go wrong with the data.
7
Q

Concept: Data Quality Dimensions: Strong-Wang Framework

A
  • A DQ dimension is a measurable feature or characteristic of data

4 General Categories and 15 dimensions:

  • Intrinsic DQ: Accuracy; Objectivity; Believability; Reputation
  • Contextual DQ: Value-Added; Relevancy; Timeliness; Completeness; Appropriate amount of data
  • Representational DQ: Interpretability; Ease of understanding; Representational consistency; Concise representation
  • Accessibility DQ: Accessibility; Access Security
8
Q

Concept: Data Quality: Redman Data Model

A

Redman defines a data item as a “representable triple”:
“a value from the domain of an attribute within an entity.”
Example: Donald Trump, 80 years old. Value: 80; attribute: age; domain: the set of permitted age values; entity: the person (DJT himself)

Redman gives 3 general categories: Data Model, Data Values, Data Representation. More than 20 Dimensions
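
The "representable triple" can be sketched as a small data structure. This is only an illustration: the class name `DataItem`, the `is_valid_age` domain test, and the 0 to 150 age limits are assumptions made for the example, not part of Redman's model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DataItem:
    """A value from the domain of an attribute within an entity."""
    entity: str                        # which real-world thing is described
    attribute: str                     # which property of that thing
    value: object                      # the recorded value
    domain: Callable[[object], bool]   # membership test for the attribute's domain

    def in_domain(self) -> bool:
        # True when the recorded value belongs to the attribute's domain.
        return self.domain(self.value)


def is_valid_age(value: object) -> bool:
    # Hypothetical domain for "age": whole numbers from 0 to 150.
    return isinstance(value, int) and 0 <= value <= 150


item = DataItem(entity="person-001", attribute="age", value=80, domain=is_valid_age)
bad_item = DataItem(entity="person-001", attribute="age", value=-5, domain=is_valid_age)
```

Here `item.in_domain()` holds and `bad_item.in_domain()` does not, because -5 falls outside the assumed age domain.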

9
Q

Redman MVR DQ Model: Model, Value, and Representation

A
  • Content: Relevance of data, the ability to obtain the values, clarity of definitions
  • Level of Details: Attribute granularity, Precision of attribute domains
  • Composition:
    Naturalness: The idea that each attribute should have a simple counterpart in the real world and that each attribute should bear on a single fact about the entity;
    Identifiability: Each entity should be distinguishable from every other entity.
    Homogeneity; Minimum necessary redundancy
  • Consistency: Semantic consistency of the components of the model; Structural consistency of attributes across entity types
  • Reaction to Change: Robustness and Flexibility
10
Q

Redman MVR DQ Values:

A
  • Accuracy
  • Completeness
  • Currency
  • Consistency
11
Q

Redman MVR DQ Representation:

A
  • Appropriateness
  • Interpretability
  • Portability
  • Format Precision
  • Format Flexibility
  • Ability to represent null values
  • Efficient use of storage
  • Physical instances of data being in accord with their formats
12
Q

Larry English Inherent and Pragmatic Quality Characteristics

A
  • Inherent: Definition conformance; Completeness of values; Validity or business rule conformance; Accuracy to a surrogate source; Accuracy to reality; Precision; Non-duplication; Equivalence of redundant or distributed data; Concurrency of redundant or distributed data
  • Pragmatic: Accessibility; Timeliness; Contextual clarity; Usability; Derivation integrity; Rightness or fact completeness

Optus data

13
Q

DAMA UK 6 Core Dimensions of DQ

A
  • Completeness: The proportion of data stored against the potential for 100%
  • Uniqueness: No entity instance will be recorded more than once based upon how that thing is identified
  • Timeliness: the degree to which data represent reality from the required point in time
  • Validity: Data is valid if it conforms to the syntax of its definition
  • Accuracy: The degree to which data correctly describes the “real world” object or event being described
  • Consistency: the absence of difference, when comparing two or more representations of a thing against a definition.
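
Three of these dimensions are easy to turn into ratios. The sketch below is a minimal illustration, assuming records are plain dictionaries; the sample records, the field names, and the four-digit postcode pattern are made up for the example.

```python
import re

# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "email": "ann@example.com", "postcode": "2000"},
    {"id": 2, "email": None, "postcode": "3000"},
    {"id": 2, "email": "bob@example.com", "postcode": "30A0"},
    {"id": 3, "email": "cy@example.com", "postcode": "4000"},
]


def completeness(rows, field):
    """Share of rows where the field is populated (stored vs. potential 100%)."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, key):
    """Distinct key values divided by the number of rows (1.0 means no duplicates)."""
    if not rows:
        return 0.0
    return len({r[key] for r in rows}) / len(rows)


def validity(rows, field, pattern):
    """Share of populated values that conform to the expected syntax."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)


email_completeness = completeness(records, "email")         # one email is missing
id_uniqueness = uniqueness(records, "id")                   # id 2 appears twice
postcode_validity = validity(records, "postcode", r"\d{4}")  # "30A0" is malformed
```

Accuracy, timeliness, and consistency usually need a reference source, a timestamp, or a second representation to compare against, so they are left out of this sketch.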
14
Q

DAMA UK Other Characteristics

A
  • Usability: Is the data understandable, simple, relevant, accessible, maintainable and at the right level of precision?
  • Timing issues: Is it stable yet responsive to legitimate change requests?
  • Flexibility: Is the data comparable and compatible with other data? Does it have useful groupings and classifications? Can it be repurposed? Is it easy to manipulate?
  • Confidence: Are Data Governance, Data Protection, and Data Security processes in place? What is the reputation of the data, and is it verified or verifiable?
  • Value: Is there a good cost/benefit case for the data? Is it being optimally used? Does it endanger people’s safety or privacy, or legal responsibilities of the enterprise? Does it support or contradict the corporate image or the corporate message?
15
Q

Data Quality and Metadata

A
  • Metadata is critical to managing the quality of Data
  • Metadata defines what the data represents. Having a robust process by which data is defined supports the ability of an organization to formalize and document the standards and requirements by which the quality of data can be measured
  • Data quality is about expectations, and Metadata is a primary means of clarifying expectations
  • Metadata when managed well can support the effort to improve the quality of data.
16
Q

DQ ISO Standard

A
  • The ability to create, collect, store, maintain, transfer, process and present data to support business processes in a timely and cost-effective manner requires both an understanding of the characteristics of the data that determine its quality and an ability to measure, manage and report on data quality.
  • Structure: DQ Planning - Control - Assurance - Improvement
17
Q

DQ Improvement Lifecycle

A

Shewhart/Deming Cycle: Plan - do - check - act

  • For a given dataset, DQM starts by identifying the data that does not meet data consumers’ requirements and the data issues that block business objectives.
  • Plan stage: The DQ team assesses the scope, impact, and priority of known issues, and evaluates alternatives to address them.
  • Do stage: Address the root causes of issues and plan for ongoing monitoring of data. For non-technical root causes, work with process owners. For technical root causes, implement the changes and make sure they do not introduce new errors.
  • Check stage: Actively monitor the quality of data as measured against requirements.
  • Act stage: Activities to address and resolve emerging data quality issues. The cycle restarts as the causes of issues are assessed and solutions proposed. Continuous improvement is achieved by starting a new cycle.
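
The Check stage amounts to comparing measured quality levels with the levels that data consumers require. A minimal sketch follows, assuming both are kept as dictionaries of ratios; the function name `find_breaches`, the metric names, and the numbers are hypothetical.

```python
def find_breaches(measured, required):
    """Return {metric: (measured_level, required_level)} for every unmet requirement."""
    breaches = {}
    for metric, threshold in required.items():
        level = measured.get(metric)
        # A metric with no measurement at all is reported as a breach as well.
        if level is None or level < threshold:
            breaches[metric] = (level, threshold)
    return breaches


# Hypothetical requirements agreed with data consumers, and one measurement run.
required = {"completeness": 0.95, "uniqueness": 1.0, "validity": 0.98}
measured = {"completeness": 0.97, "uniqueness": 0.999, "validity": 0.90}

breaches = find_breaches(measured, required)
```

With these numbers, `breaches` holds the `uniqueness` and `validity` entries. Those entries are what the Act stage would pick up.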
18
Q

Data Quality Business Rule Types

A

Rules: Describe how data should exist in order to be useful and usable within an organization. These rules can be aligned with dimensions of quality and used to describe data quality requirements.
Rules are commonly implemented in software, or by using document templates for data entry.

19
Q

Common simple business rule types

A
  • Definitional conformance: Confirm that the same understanding of data definitions is implemented and used properly in processes across the organization
  • Value presence and record completeness: Rules defining the conditions under which missing values are acceptable or unacceptable
  • Format compliance: The value must match a defined pattern
  • Value domain membership: Specify that a data element’s assigned value is included in those enumerated in a defined data value domain (i.e. asserts an attribute’s value must come from a predefined data domain)
  • Range conformance: The value must fall within a defined numeric or lexicographic range
  • Mapping conformance: Indicates that the value assigned to a data element must correspond to one selected from a value domain that maps to other equivalent corresponding value domains
  • Consistency rules: Conditional assertions that refer to maintaining a relationship between two attributes based on the actual values of those attributes (e.g. State Name vs. Post Code)
  • Accuracy verification: Compare a value against the corresponding value in a source system
  • Uniqueness verification: Only one record exists for each represented real-world object
  • Timeliness validation: Expectations for the accessibility and availability of data
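
Several of these rule types can be expressed as simple checks on a record. The sketch below is illustrative only: the field names, the `VALID_STATES` domain, the 0 to 120 age range, and the state-to-postcode-prefix mapping are simplified assumptions, not real reference data.

```python
import re

VALID_STATES = {"NSW", "VIC", "QLD"}                          # hypothetical value domain
STATE_POSTCODE_PREFIX = {"NSW": "2", "VIC": "3", "QLD": "4"}  # simplified, illustrative mapping


def check_record(rec):
    """Return the names of the rule types that the record violates."""
    failures = []

    # Value presence: customer_id must be populated.
    if not rec.get("customer_id"):
        failures.append("value_presence")

    # Format compliance: postcode must be exactly four digits.
    postcode = rec.get("postcode") or ""
    if not re.fullmatch(r"\d{4}", postcode):
        failures.append("format_compliance")

    # Value domain membership: state must come from the defined domain.
    state = rec.get("state")
    if state not in VALID_STATES:
        failures.append("domain_membership")

    # Range conformance: age must be a whole number from 0 to 120.
    age = rec.get("age")
    if not (isinstance(age, int) and 0 <= age <= 120):
        failures.append("range_conformance")

    # Consistency: the postcode must agree with the state (when the state is known).
    prefix = STATE_POSTCODE_PREFIX.get(state)
    if prefix is not None and not postcode.startswith(prefix):
        failures.append("consistency")

    return failures


good = {"customer_id": "C1", "state": "NSW", "postcode": "2000", "age": 41}
mismatch = {"customer_id": "C2", "state": "VIC", "postcode": "2000", "age": 30}
broken = {"customer_id": "", "state": "WA", "postcode": "20", "age": 200}
```

`check_record(good)` returns an empty list, `check_record(mismatch)` reports only the consistency rule, and `check_record(broken)` reports the other four. Accuracy, uniqueness, and timeliness rules need a reference source, the full data set, or timestamps, so they do not fit a single-record check like this one.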
20
Q

Common Cause of DQ Issues

A
  • Data Entry
  • Data Processing
  • System Design
  • Manual Intervention in Automated Processes
21
Q

Barriers to effective management of data quality include (Leadership lacking):

A
  • Lack of awareness on the part of leadership and staff
  • Lack of business governance
  • Lack of leadership and management
  • Difficulty in justification of improvements
  • Inappropriate or ineffective instruments to measure the value
22
Q

Issues caused by Data Entry Processes

A
  • Data entry interface issues (e.g. mandatory fields can be skipped)
  • List entry placement: the order of values within a drop-down list
  • Field overloading: reusing a field for a purpose other than the one it was defined for (customer ID, etc.)
  • Training issues: Lack of process knowledge can lead to incorrect data entry, even if controls and edits are in place.
  • Changes to business processes
  • Inconsistent business process execution. Data created through processes that are executed inconsistently is likely to be inconsistent.
23
Q

Issues caused by data processing functions

A
  • Incorrect assumptions about data sources
  • Stale business rules
  • Changed data structures: Source systems may change structures without informing downstream consumers (human and system) or without providing sufficient time to account for the changes.
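
One way a downstream consumer can guard against unannounced structure changes is to compare each incoming record with the structure it expects. This is a minimal sketch, assuming records arrive as dictionaries; `EXPECTED_SCHEMA`, its field names, and the `schema_drift` helper are hypothetical.

```python
# Hypothetical structure the downstream process was built against.
EXPECTED_SCHEMA = {"customer_id": str, "signup_date": str, "balance": float}


def schema_drift(expected, record):
    """Report fields that are missing, unexpected, or of a different type."""
    missing = sorted(set(expected) - set(record))
    unexpected = sorted(set(record) - set(expected))
    retyped = sorted(
        name
        for name in expected.keys() & record.keys()
        if record[name] is not None and not isinstance(record[name], expected[name])
    )
    return {"missing": missing, "unexpected": unexpected, "retyped": retyped}


# The source renamed signup_date and started sending balance as text.
incoming = {"customer_id": "C1", "signup_dt": "2024-01-01", "balance": "10.50"}
drift = schema_drift(EXPECTED_SCHEMA, incoming)
```

For `incoming`, the report lists `signup_date` as missing, `signup_dt` as unexpected, and `balance` as retyped, which lets the consumer stop and investigate before stale assumptions corrupt downstream data.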