Term Glossary (Topic 1.4-1.6) Flashcards

1
Q

State the three stages of data profiling

A

1) Create simple summary statistics (Counts, means, min/max)
2) Check data quality
3) Identify problems for future data integrations (mislabelled columns, inconsistent data formats)
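As a rough illustration of the three stages, the sketch below profiles a tiny, made-up dataset using only the Python standard library. The records, the column names (`id`, `age`, `joined`) and the `profile` helper are assumptions invented for this example, not part of the course material.

```python
import re
from statistics import mean

# Hypothetical sample records; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "joined": "2021-03-01"},
    {"id": 2, "age": None, "joined": "01/04/2021"},
    {"id": 3, "age": 51, "joined": "2021-05-17"},
]

def profile(rows, column):
    """Stages 1 and 2: summary statistics plus a simple quality check (missing count)."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }

age_profile = profile(rows, "age")

# Stage 3: spot a problem for future integration, here dates that are
# not in the (assumed) target format YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
non_iso_ids = [r["id"] for r in rows if not ISO_DATE.fullmatch(r["joined"])]
```

Here `age_profile` reports one missing age, and `non_iso_ids` picks out record 2, whose date uses a different format from the others.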

2
Q

What are the four options when an error is found?

A

1) Accept it - Keep the error (perhaps flag it or offer an explanation)
2) Reject the data entry (remove the entry)
3) Correct the error (if possible, identify and amend the data entry)
4) Create default value (replace the error with a set value to help with data consistency)
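The four options can be sketched as a single function. This is only an illustration: the function name `handle_error`, the strategy labels and the `flagged` field are hypothetical choices made for this example.

```python
def handle_error(record, field, strategy, corrected=None, default=None):
    """Apply one of the four options to a record whose `field` holds an error.

    Returns the resulting record, or None if the record is rejected.
    """
    result = dict(record)  # work on a copy so the original is untouched
    if strategy == "accept":
        result["flagged"] = True      # keep the error, but flag it
    elif strategy == "reject":
        return None                   # remove the entry
    elif strategy == "correct":
        result[field] = corrected     # amend the entry with the right value
    elif strategy == "default":
        result[field] = default       # replace the error with a set value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return result

bad = {"id": 7, "age": -5}
accepted = handle_error(bad, "age", "accept")
rejected = handle_error(bad, "age", "reject")
corrected = handle_error(bad, "age", "correct", corrected=55)
defaulted = handle_error(bad, "age", "default", default=0)
```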

3
Q

What does CUVCAT stand for?

A
C - Completeness
U - Uniqueness
V - Validity
C - Consistency
A - Accuracy
T - Timeliness
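
Three of these dimensions are easy to measure as simple ratios. The sketch below does so for a made-up list of email addresses; the data and the very crude validity rule (the value contains an `@`) are assumptions chosen only to keep the example short.

```python
# Hypothetical column of email addresses; None marks a missing value.
emails = ["a@example.com", None, "b@example.com", "a@example.com", "not-an-email"]

present = [e for e in emails if e is not None]

# Completeness: share of entries that are not missing.
completeness = len(present) / len(emails)

# Uniqueness: share of present entries that are distinct.
uniqueness = len(set(present)) / len(present)

# Validity: share of present entries that match the expected format
# (deliberately simplistic rule for illustration).
validity = sum("@" in e for e in present) / len(present)
```

Consistency, accuracy and timeliness usually need a second source or a timestamp to compare against, so they are not shown here.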
4
Q

Describe ‘Data Migration’

A

The physical movement of data from one source to a destination.

5
Q

Explain ‘Master Data Management (MDM)’

A

MDM describes identifying, protecting and properly handling the data that is core to business operations. It is important to identify the specific datasets that are critically sensitive.

6
Q

State the three elements that should be considered within ‘Integration Design’

A

1) Rules and Requirements
2) Objectives and Deliverables
3) Support models and SLAs

7
Q

Describe ‘Rules and Requirements’ with relation to Integration Design

A

An organisation will likely have a set of rules and requirements that govern how its data is integrated, so that it remains legally compliant, maintains security, retains performance, and so on.

8
Q

.

A

.

9
Q

Describe ‘Support models and SLAs’ with relation to Integration Design

A

Database models (a recap from Data analysis concepts) should be set up to support easy data integration. Remember that even dashboards linked to multiple tables are examples of data integration. SLAs can define the level of output required from the data integration system.

10
Q

State the three elements that should be considered within ‘Data Integration Tools’

A

1) Future Scalability
2) Implementation
3) Support Costs

11
Q

Describe ‘Future Scalability’ & state three requirements for future scalability with relation to Data Integration Tools

A

When setting up databases, it is important to account for the possibility that further tables may need to be added in future. This means making sure that:
1) All tables have primary keys (even tables that do not connect to other tables).
2) Field names are kept consistent (so that columns can be matched with equivalent columns in other tables).
3) Data formats are kept consistent.
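
The three requirements above can be illustrated with an in-memory SQLite database. The table and column names (`customers`, `orders`, `customer_id`) and the choice of ISO 8601 date strings are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- every table has a primary key
    name        TEXT NOT NULL,
    joined_date TEXT NOT NULL          -- dates stored as YYYY-MM-DD
);

-- A table added later: it reuses the field name customer_id and the
-- same date format, so it can be matched to customers without rework.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT NOT NULL          -- dates stored as YYYY-MM-DD
);

INSERT INTO customers VALUES (1, 'Ada', '2021-03-01');
INSERT INTO orders VALUES (10, 1, '2021-04-15');
""")

# Because the field names match, the join needs no column mapping.
joined = conn.execute(
    "SELECT name, order_date FROM customers JOIN orders USING (customer_id)"
).fetchall()
conn.close()
```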

12
Q

Describe ‘Implementation’ with relation to Data Integration Tools

A

A major issue is combining data that was previously measured or recorded in different ways. This would lead to inconsistent data stores, so it has to be resolved prior to integration.
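
As a minimal sketch of resolving such a mismatch before integration, the example below assumes two hypothetical sources that record weight in different units (kilograms and pounds) and converts both to kilograms. The field names and sample records are invented for illustration.

```python
LB_TO_KG = 0.45359237  # one pound in kilograms

# Hypothetical sources that recorded the same measure in different units.
source_a = [{"item": "crate", "weight_kg": 12.0}]
source_b = [{"item": "pallet", "weight_lb": 100.0}]

def to_kg(record):
    """Return the record with its weight expressed in kilograms."""
    if "weight_kg" in record:
        weight = record["weight_kg"]
    else:
        weight = round(record["weight_lb"] * LB_TO_KG, 3)
    return {"item": record["item"], "weight_kg": weight}

# Only after normalising are the two sources combined.
combined = [to_kg(r) for r in source_a + source_b]
```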

13
Q

Describe ‘Support costs’ with relation to Data Integration Tools

A

Integrating large amounts of data efficiently will involve significant expense, primarily from the man-hours needed and any new hardware/software required.

14
Q

Define ‘Data Synchronisation’

A

This is a form of data integration that aims to keep the records stored in one location consistent with records stored in another location through a continuous updating process. By contrast, data integration more generally is the process of connecting data sets together, often in single events. There are a number of reasons for synchronising data.
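
One pass of such an updating process might look like the sketch below, which models each location as a plain dictionary keyed by record ID. The `sync` function and the one-way direction (source to replica) are assumptions for illustration; real synchronisation tools also have to handle conflicts and two-way changes.

```python
def sync(source, replica):
    """One-way sync: make `replica` match `source` and report what changed."""
    changes = {"upserted": [], "deleted": []}
    # Add new records and overwrite records that differ.
    for key, value in source.items():
        if key not in replica or replica[key] != value:
            replica[key] = value
            changes["upserted"].append(key)
    # Remove records that no longer exist in the source.
    for key in list(replica):
        if key not in source:
            del replica[key]
            changes["deleted"].append(key)
    return changes

source = {1: "alice@example.com", 2: "bob@example.com"}
replica = {1: "alice@example.com", 2: "bob@old.example.com", 3: "carol@example.com"}

first_pass = sync(source, replica)
second_pass = sync(source, replica)  # nothing left to change
```

Running the pass repeatedly (for example on a schedule) is what makes the process continuous; a second pass with no intervening changes reports nothing to do.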

15
Q

Define ‘Technical Acceptance Testing (TAT)’

A

A set of tests designed to find whether a piece of software (such as a dashboard) has satisfied its technical requirements. It is often done just before or simultaneously with User Acceptance Testing (UAT). TAT is often done if the non-functional requirements of a system change.

16
Q

Define ‘User Acceptance Testing (UAT)’

A

This is where a system is tested by end users to see if it functions properly and fulfils business requirements. UAT is often done if the functional requirements of a system change.

17
Q

Define ‘Performance Stress Testing (PST)’

A

This form of test checks how a system performs under excessive and sudden increases in demand/load. For example, a live database needs to be tested with multiple complex queries occurring simultaneously to see if it remains stable.