Warehousing and BI Flashcards

1
Q

Why do we use a data warehouse

A
  • standardise from multiple sources
  • saves time when building reports & analysis
  • allows new ways to report and analyse
  • prevent adverse impact on operational systems
  • enable comparison without underlying data changing
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

OLAP

A

Online analytical Processing (large amounts records and heavy duty queries, small number of knowledge areas)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

OLTP

A

Online transaction processing (lots of data coming from lots of different areas)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Data warehouse

A

(big warehouse where we store all kinds of information from all of the different source systems in our organization)
subject orientated, integrated, time variant and non-volatile collection of summary and detailed historical data used to support the strategic decision making processes for the corporation.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Once data is in a data warehouse…

A

it does not change (non-volatile)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How does warehoused data vary from transactional data?

A
  • subject orientated, only a specific subject area
  • integrates multiple sources of data
  • contains data over time periods, not just most recent data
  • data is non-volatile, it will not change once its in the warehouse
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Data Mart

A

Data stored for particular type of analytics, highly processed and easy to navigate.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Cubes

A

Multi dimensional storage (software e.g., PowerBI, tableau and hardware)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Data lake vs warehouse

A

lake is broader and nearer real time, often has data pipelined through, limited processing (no cleaning, integration etc).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Data lake data is used by

A

A data scientist

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Active vs classic data warehousing difference

A

active add operational BI (i.e., closer to real time, operational services etc)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Big Data points of definition

A
  • Viscosity (how difficult to use and integrate)
  • Volatility (how often changes are made, how long is data useful)
  • Veracity (how trustworthy)
  • Variability (the forms that data is stored, inconsistent across data sets)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Veracity

A

How trustworthy data is

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Data warehousing (processes)

A
  • extract processes
  • cleansing processes
  • transformation processes
  • load processes
  • associated control processes
  • the use of meta data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

3 types of datawarehouse

A

conventional, dimensional virtual

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Conventional Data warehouse (Inmon)

A

source -> transformation -> staging area -> data warehouse (normalised, atomic) -> data marts (normalised) / cubes (denormalised) -> analysis

entire datawarehouse, pull out data marts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Dimensional data warehouse (kimball)

A

sources -> transformation -> staging area -> data marts (all as one data warehouse) -> cubes

lots of datamarts ready for use, assembled into data warehouse. Denormalised.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Dimensions vs facts

A

Do analysis on facts, item/thing being descirbed
Do analysis by dimensio, background/context e.g., time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Another word for a dimensional model

A

star schema

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Dimensional modelling is (normalised/denormalised)

A

denormalised

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What makes up a dimensional model

A

dimension tables (nouns)
fact tables (events/measures)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Hierarchies for dimensions are…

A

stored in the dimensional table itself so there is no need for a separate hierachical table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is hierarchical data

A

Data where items are linked to each other in parent-child relationships e.g., animals, mammals, primates

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Additive facts

A

facts that can be added up across multiple dimensions without loosing value

e.g., number of units sold across product and customer etc

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

How to deal with changing dimensions?

A
  • overwrite (loose history)
  • add a new row (for every product which was re categorised, with a new surrogate key which points to the facts)
  • add a new attribute (i.e. new category) (not good if lots of changes)
  • add history table ( )
26
Q

example of changing dimensions

A

sales category changed

27
Q

Problem with changes dimensions

A
  • destroying history
28
Q

How to change a dimension for lots of back to back comparisons

A

add an attribute

29
Q

How to change dimension for in depth over time queries

A

add rows

30
Q

Aggregated facts

A

associating the facts with a certain dimension and creating a data store

31
Q

Types of BI Tools

A

-Query and reporting tools
- OLAP (online analytical processing)
- Analytical Applications (in a box, feed the exact data but then heaps of analytics available)
- dashboard & scorecard
- performance management tools (e.g., budgeting)
- predictive analytics and data mining
- advanced visualisation and discovery tools (interact with the visual)

32
Q

Dashboard

A

dynamic presentation of operational information

33
Q

Scorecard

A

static representation of progress towards longer term goals

34
Q

Data Mining

A

Interactive exploration of data with no specific opening question, explore relationships via algorithms

35
Q

Predictive Anlaytics

A

tools to predict the future and support what if analysis

36
Q

Dimensional data modelling allows

A

ease of use and fast queries (but is difficult to set up).

37
Q

Optimal data mining tool use

A
  • Fraud detection
  • customer segregation and scoring
  • predictive analysis
  • identifying potential loan defaulters
38
Q

Examples of BI

A
  • strategic analytics for business decisions
  • decision support systems
  • supporting risk management decision reporting
  • identifying top quartile customers
39
Q

What do analytic applications provide?

A

a pre-built solution to optimise a functional area or industry segment

40
Q

Slice dice roll-up and pivot are terms used in what type of data processing?

A

OLAP

41
Q

Active Data warehousing

A

(new) architectural approach where volatile data is provisioned in a data warehouse structure to provide transactional systems with a combination of historical and near real time data to meet customer needs

42
Q

ODS acronym

A

Operational data store

43
Q

EDI acronym

A

electronic data interchange

44
Q

What implementation approaches do not support OLAP?

A

Flat Online Analytics Processing

45
Q

Which of these is a core idea of Data Warehousing shared by both Inmon and Kimball?

A

Storing and organising data in a Data Warehouse increases its value

46
Q

Data Warehouses need to be incrementally developed. Which process is critical to the success of such a project?

A

A strong release management process

47
Q

Business intelligence

A

(1) a way of analyzing data to understand how an organization works.
(2) a set of computer tools that help people do this kind of analysis.

48
Q

Purposes of a data warehouse

A

support operational functions
support compliance requirements
enable effective business analysis and decision making,

49
Q

Kimball DW approach

A

A data warehouse is copy of transaction data specifically structured for query and analysis

(dimensional model)

50
Q

Inmon DW approach

A

A data warehouse is a
subject orientated
integrated
time-variant
non-volatile
collection of data in support of managements decision-making process

(More complex to query and analyse)

51
Q

Facts represent…

A

Facts represent the actual events or transactions e.g., number of products sold

52
Q

Dimensions represent…

A

dimensions provide context for those events e.g., names, locations

53
Q

What is a conformed dimension

A

A dimension that is shared across multiple fact tables

54
Q

OLTP
vs
OLAP

A

OLTP captures, stores and processes data from transactions in real time
OLAP uses complex queries to analyse aggregated historical data from OLTP systems.

54
Q

Inmon vs kimball

A

Inmon focusses on organisation and normalised data
Kimball makes data analysis easy

55
Q

ERP

A

enterprise resource planning

56
Q

Active Data Warehousing

A

the technical ability to capture transactions when they change and integrate them into the warehouse, along with maintaining batch or scheduled cycle refreshes.
(automated updating to OLTP)

57
Q

What implementation approaches support OLAP?

A

Relational Online Analytical Processing
Hybrid Online Analytical Processing
Multi-dimensional Online Analytical Processing

58
Q

When supporting multi-dimensional business report requests, which would be best?

A

OLAP

59
Q

ESB =>
(tight or loose)

A

Loosley coupled

60
Q

What is true of a data vault

A

Data is cleansed and standardized as part of the staging process
History is stored in a normalized atomic structure, dimensional surrogate, primary and alternate keys are defined
Business key and surrogate key relationships should remain intact to provide the history of the data mart
Reloading facts is possible when later increments introduce grain changes

61
Q

When performing source to target mapping, one of the main challenges with any mapping effort is:

A

Determining valid links or equivalencies between data elements in multiple systems