ADBS - Data Warehousing Flashcards

1
Q

What is data warehousing?

A

Data warehousing is a set of hardware and software components used to better analyse data to make better business decisions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the benefits of data warehousing?

A
  • Understand business trends and make better forecasting decisions
  • Bring better products to market in a more timely manner
  • Analyse daily sales information and make quick decisions that can significantly affect your company’s performance

ROI - return on investment can range between 40% and 600%

Competitive Advantage - discover trends, customers and demands

Increase in productivity from decision makers - contains integrated database of consistent, subject oriented, historical data

Transforms data into meaningful information

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Name 3 applications of data warehousing

A
  • Sales and marketing analysis across all industries
  • Inventory turn and product tracking in manufacturing
  • Category management, vendor analysis, and marketing program effectiveness analysis in retail
  • Profitability analysis or risk assessment in banking
  • Claims analysis or fraud detection in insurance
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a data warehouse?

A

A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management’s decisions. (Inmon05)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is subject oriented?

A

data stored in a warehouse is about the major subjects of an enterprise (such as customers, products, and sales)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is integrated?

A

source data for a warehouse is from different enterprise-wide applications systems. The data is often inconsistent, for example, in different formats. The source data must be integrated into a consistent format in order to present a unified view of the data to users.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is non-volatile?

A

New data is always added as a supplement to the database, rather than a replacement.

The database continually absorbs this new data, incrementally integrating it with the previous data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is time-variant?

A

data in a warehouse is only accurate and valid at some point in time or over some time interval.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is Operational Data?

A

data used to run business. This data is normally stored, retrieved, and updated by an Online Transactional Processing (OLTP) system. An OLTP system may be, for example, a reservations system, an accounting application, or an order entry application.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What are the characteristics of operational data?

A

Updated often and through online transactions

Non-historical data (not more than three to six months old)

Optimised for transactional processing

Highly normalised in the relational database for easy update, maintenance, and integrity

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is informational data?

A

Informational data is typically stored in a format that makes analysis much easier. Analysis can be in the form of decision support (queries), report generation, executive information systems, and more in-depth statistical analysis.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are the characteristics of informational data?

A

Summarised operational data
 De-normalised and replicated data
 Infrequently updated from the operational systems
 Optimised for decision support applications
 Possibly “read only” (no updates allowed)
 Stored on separate systems to lessen impact on operational systems

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are differences between operational data and informational data

A

Operational Data
current, detailed, dynamic, repetitive processing, transaction driven, application oriented, day to day, serves clerical/operational users

Informational
historical data, summarised, static data, ad hoc unstructured heuristic, analysis driven, subject oriented, managerial, strategic

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Name 10 problems of data warehousing

A
Underestimate resources for data loading
Hidden problems with source systems
Required data not captured
Increased end user demands
data homogenisation
high demand for resources - space
data ownership - sensitivity is lost
high maintenance
time to implement
difficulty in integration
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Describe the architecture of a data warehouse

A

Operational data, ODS, Load Manager, Warehouse Manager, Query manager,

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

In the architecture, describe the operational data

A

data from multiple sources sych as hierarchical or network databases, VSAM, RMS, Relational DBMS, private data, data from workstations or servers

17
Q

In the architecture, describe the operational data store

A

this is a repository of operational and integrated operational data used for analysis, already extracted and cleaned

18
Q

In the architecture, describe the Load Manager

A

responsible for extracting and loading data into the warehouse

19
Q

In the architecture, describe the Warehouse manager

A
Management of data in the warehouse
analyzes data for consistency
transforms and merges data
create index and views
aggregates data
back up and archiving
20
Q

In the architecture, describe the Query manager

A

management of user queries and directs to appropriate data source

consists of end user access tools, monitoring tools, OLAP, data mining

21
Q

What is meta data?

A

data about data
map data to a common view of data
automate the production of summary tables
direct to most appropriate data source
helps to define data source, consists of technical and business data.

22
Q

What is a data mart?

A

a subset of a data warehouse used to support the requirements of a particular department

23
Q

Name three differences between a Data Mart and a Data Warehouse

A

focuses on requirements of users within department
does not contain detailed operational data
contains substantially less data making it easier to navigate

24
Q

Name 5 reasons for implementing a data mart

A

reduced cost and time

users can access the data they need

less management of data - reduction in volume of data to be accessed

Less data - cleansing, loading, transformation and integration is easier

provide appropriately structured data in conjunction with end-user access tools

all users have the same collective view of the data in a particular department

easily targeted users for department as opposed to corporation

25
Q

Name 6 issues of Data Marts

A
functionality
size
load performance
accessing data in multiple data marts
administration
user access
data mart versioning, integrity and metadata
installation
26
Q

What is dimensional modelling?

A

A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access.

27
Q

what are the characteristics of dimensional modelling?

A

uses the concepts of er modelling
composed of one table with composite primary key (fact table)
contains smaller tables called dimension tables

28
Q

What is the star schema?

A

it is a database structure that has a fact table surrounded by dimension tables containing reference data

29
Q

what are facts?

A

these are generated by past events and are unlikely to change

30
Q

what are dimensions?

A

dimensions contain descriptive textual information. Used as constraints in data warehouse queries

31
Q

What is the snowflake schema?

A

simila to teh star schema however dimension tables do not contain denormalised data

32
Q

What is the starflake schema?

A

this is a mixture of star and snowflake schemas which contains denormalised star and normalised snowflake

33
Q

What are the advantages of dimensional modelling?

A

Efficiency - efficient access to data

Change - Handles changing requirements, supports ad hoc queries

Extensible - add new facts consistent with granularity, add new dimensions, add new attributes to dimensions

Model business situations

Query processing is predictable

34
Q

What are the differences between dimensional modelling and entity relational modelling?

A

ER modelling usually a database for OLTP, identifies relationships among entities, removes redundancy in data, normalised, focused on transactional processing

Dimensional modelling usually a component of the data warehouse, supports ad-hoc queries, decomposed into multiple DM from ER

35
Q

What is the 4 step proces for dimensional modelling design for data warehouses?

A

Choose the process - what is teh subject matter of the data mart

Declare the grain - What is the fact table going to represent e.g.

Identify Dimensions - Dimensions set the context for asking questions about the facts in the fact table. A well built set of dimensions makes the data mart understandable and easy to use.

Identify facts - The grain of the fact table determines which facts can be used in the data mart. All the facts must be expressed at the level implied by the grain.

36
Q

What are the issues of choosing a DBMS?

A

potential size

parallelism in the database

performance

scalability

availability

manageability

37
Q

What are the requirements of a DBMS?

A

Load performance - incremental loading frequently, must be done in minimum amount of time

Load processing - steps to load new or updated data like conversions, filtering, reformatting, integrity checks, physical storage, indexing

data quality management

query performance

terabyte scalability

user scalability

concurrency with users

Networking data warehouses

Administration

Integrated dimensional analysis

Advanced query functionality