Week 9 Flashcards

(62 cards)

1
Q

what a data warehouse

A

subject oriented
integrated
time varaint
non updatable

COLLECTION OF DATA1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

4 CHARACTERISTICS

A

subject oriented: CUSTOMERS/PATIENTS,STUDENTS

integrated: consistent naming conventions, info from multiple sources

time varaint: can study trends and changes

non updatable: read only, periodicaly refershed

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

why do we need daata warehousing 2 NEEDS FOR WAREHOUSING

A
  1. searaintg operational and info systems and data
  2. integrated company wife view of high quality info form disprate databaseds
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

what is OLTP

A

OPERATIONAL SYSTEM

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

OLTP/ operational system characterisitcs
- primary purppose
- type of data
- primary users
- scope of usage
- design goal
- volume

A
  • run the business on a current basis

current representaiton of state of the business

clerks, salespersons, admins

narrow, planned and simple updates and queries

erformance: throguhput, availability, relaibility, alignment with business rules

many constant updates and quesies on one or fw table rows

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

INFO systems/ OLAP
- primary purppose
- type of data
- primary users
- scope of usage
- design goal
- volume

** FOR DATA MINING

A

support managerial desciion making

historical point in time SNAPSHOTS + predictions

managers, business analysts, customers

broad, ad hoc, complex quieries ad analysis

ase and low cost of flexible access and use

periodic batch updates and queries requring many or all rows

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the issue with comany wide views?

A

incosnistent key structures, snyonyms, FREE FORM VS STRUCTURED FIEDLS, MISSING/INCSSINSTNT DATA

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

3 DATA MART ARCHITECTURES

A

-INDEPENDANT
-DEPENDANT/OPERATIONAL
-LOGICAL & REAL-TIME

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a data mart

A

smaller data structure that stores the data for a particualr purpose

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

A DATA WAREHOUSE IS JUST ASSEMBLED VERSION OF DATA MARTS

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

what is ETL

A

extract, trasnform, and load (ETL) process of combining data from multiple sources into large central repo (DATA WAREHOUSE)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

INDEPENDANT DATA MART ARCHIETECTURE:

WHAT IS flow of data from source system to end user

A

source data systems (compters, internal, external)

> extract >

data staging areas (proceessing) / ttransform; PUT INTOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO DM

> load >

data & metadata storage area (data warehouse consisting od several smaller marts)

>

end user presentation tools : ad-hoc query, OLAP tools, modeling, visualzing, business performance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

limitaitons of indepednant data marts

A
  • separate ETL processes for each mart leads to redundant data & prcoessing
  • incosistency between marts
  • difficult to drill down for related facts
  • excess scaling costs when ore app are built
  • high cost fo robratining consstisncy between marts
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

DEPENDANT DATA MART WITH OPERATIONAL DATA STORE ARCHITECTURE

ODS provides options for transofrming current data, single ETL for ent data warehouse (EDW), dependant data marts loaded from EDW

A

source data systems

> extract >

data staging area

> load into> data and metadata storage area
feed> end user presentation tools

–Data and metadata storage area set up–
instead of separate data mart for everything ,there is one central data base (enterprise EDW), AND YOU LOAD OUT DIFFERENT DATA MARTS AFTERWARDS!

–Data storage area– IS ODS!!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

what is ODS

A

smaller data warehouse that stores recently ahppened data for operationa luse

FOR ANALYTICAL PURPOSE NOT TRANSACTIONAL PURPOSE

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

logical DATA MART AND REAL TIME WAREHOUSE

A

There is a lot of data! so updating data real time must be really hard

this is very similar to dependant data mart with ODS,b ut the ODS & EDW [data staging area & data/metadata storage area] they BECOME ONE

DATA MARTS ARE NOT SEPEARATE DATABASE, BUT LOGICLA VIEWS OF THE DATA WAREHOUSE!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

pros of Logical data mart & real time wahreohsueo

A

EASIER TO CREATE NEW DATA MARTS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

logical DATA MART AND REAL TIME WAREHOUSE ARCHITECTURE

A

source data systems

> extract>

data staging area + data/metadata storage area

> feed>

end user presentaiton tools

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

data warehouse vs data marts: scope comparison

A

dw: applciation independant, centralized/ent wide, planned

dm: specfiici dss applicaiton, decnetralized, organic/unplanned at times

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

data warehouse vs data marts: subject and sources comparison

A

DW: data is historical, detialed, summarized
- lightly denomalized (STILL TONS OF SEPARATER DATa)
- multi subjects
- many internal and external sources

DM: data has some history, detail and summarizd
- highly denomrialize (less tables!)
- one central subject or conecrn to usesr
- few internal and external sources
HIGHLY DENORMALIZED DATA MARTS= MROE EFFIECTIN AT TIMES CUZ REPEATED DUPLICATED DATA

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

DW vs DM : other comaprisons

A

DW- flexible, data orientedd, long llife, large, single complex structure

DM: restircitve, project oreindted, sohrtlife, starts small become large, multi-simi-complex structures together complex

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

3 layer data archietecture for data warehous

A

TOP: Derived data (from DATA mart METADATA)

MIDDLE: reconciled data (from EDW metadata)

BOTTOM: operational data (from oeprational metadata)

YOU MOVE UPWARDS!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

DW data characteristics

A

Status vs Event Data:
- status: before and after images
- event: something that causes change to status (like a transaction)

Transient vs periodic data
- transient: changes to existing records are written over previous records (DESTORY PREVIOUS DATA)
- Periodic: data is never deleted after being added to the store, just updated

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

transient vs periodic daata

A

Transient vs periodic data
- transient: changes to existing records are written over previous records (DESTORY PREVIOUS DATA)
- Periodic: data is never deleted after being added to the store, just updated

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Derived data is ?
DATA MART
26
oBJECTIVES OF DERIVED DATA
EASE OF USE FOR DS applications fast response to predefined user queries Customized data for particualr target audiences Ad-hoc query support data mining capabilities
27
Characteristics of derived data
detailed/periodic data aggregate (for summary) distributed (to deparmetnetal servers)
28
most common derived data model?
dimensional model -> star scema
29
What is star schema
CONTAINS A FACT TABLE AS THE CENTRE, AN ASSOCIATIVE ENTITIY and the dimensions taht are relevant ot this fact are the tables that come off the centre one dimension: only one "arm" of tables comming off the fact table... if you have multi dimension- yo uhave tables coming out of the tables
30
what are dimesnion table keys
SURROGATE KEYS (NON-intelligent & non-business related) BECAUSE - keys may change - helps keep track of non key attribute value - surrgoate keys are simpler and shorter - surrogate keys can be same length these are the alternative keys in the table- like AW000010001 then AW000010002...
31
WHAT is granularity of the fact table
GRANULARITY- WHAT LEVEL OF DETAIL DO YOU WANT? Transactional grain: finest level Aggregated grain: more summarized
32
Finer grains bring beter market based anlsysis caabitliy Finer grain implies more dimensions tables, more rows in fact table..
33
in web based commerce, the finest granularity
is A CLICK
34
IS Date improatnt in fact table?
YES! for every fact, there should be a date, and for each of them there should be a key so like diff attributes are: OrderDateKey, OrderDate in the dimension table you need a table for DimDate
35
some variations of the star schema
multi fact table: can improve performance, often used to store facts for differnt combo of dimensions, confromed dimensions Factless Facts tables: no non key data, but foreign keys for associated dims, used for TRACKING EVENTS / INVETORY COVERAGE
36
Role of Metadata Operational metadata: Describe the data in the various operational systems that feed the enterprise DW. It may exist in a number of different formats. EDW: Derived from the enterprise data model, describes the reconciled data layer and the rules for ETL operational data into reconciled data. Data mart metadata: describes the derived data layer and the rules for transforming reconciled data into derived data
37
factless fact table
no data in fact table, just keys assocaiting dimensions records, just erecords the currency of something like attendance
38
2 OPTIONS WHN NORMALIZING DIMENSION TABLES
- include all info in a single denormalized table - normalize the dimensions
39
Fixed product hierarchy
dimension hierarchies help to provide levls of aggregation for users who want to summary info in a DW this is snowflake schema
40
snowflake schema= searching is more efficeint
41
slowly changing dimensions KIMBALL APPROACH
for some dimensions taht change slowly youw ant to keep history of the past Kimball’s approach: — Create a new dimension table row each time the dimension object changes, with all dimension characteristics at the time of chanG BASICALLY HAVE A STARTDATAE AND END DATE AS DIMESNSIONS
42
ETL PROCESS
ETL = Extract, transform, and load — Capture/Extract — Scrub or data cleansing — Transform — Load and Index
43
DATA RECONCILIATION PROCESS
Data reconciliation involves capture/extract, cleanse, transform, and load/index. https://d2l.ucalgary.ca/d2l/le/content/617907/viewContent/6720112/View 36
44
RECORD LEVEL TRANSFORMAITON FUNCTIONS Transofrming info at the record level
Selection – the process of partitioning data according to predefined criteria Joining – the process of combining data from various sources into a single table or view Normalization – the process of decomposing relations with anomalies to produce smaller, well-structured relations Aggregation – the process of transforming data from detailed to summary leve
45
single filed transofrmations: basic
sourec record: datapoint x goes through transformationg t target record: f(x)
46
single filed transofrmations: algortihimic
same as basic, but transformed by using some formula or logical expression
47
single filed transofrmations table lookup
same as basic but transformation refecens a different table
48
multifield taransformations
source record: select 2+ attributes transofrmatinon T Target record: combined the 2 atteibutes to one attribute
49
multifield taransformations one source to many targets
source record: select 1 tranformaiotn to split it into separate ares target recrod: 2+
50
pros and cons of fact tabel
pro: excellent for ad hoc queries cons: bad for onlin tps
51
A class of database technology used to store textual and other unstructured data is called:
nosql
52
DW IS A COLLECTION OF DMS
53
DW does nnot have current data
it is not supporting operational data processing
54
the independant data marts approach does not create one dataa warehous
creates many separaate data marts
55
Operational data store (ODS) An integrated, subject-oriented, continuously updateable, current- valued (with recent history), enterprise-wide, detailed database designed to serve operational users as they do decision support processing.
56
A logical data mart is a(n):
data mart created by a relational view of a slightly denormalized data warehouse.
57
FACTLESS FACT TBALES MAY APPLY WHEN
we are deleting correlated data.
58
Data that are detailed, current, and intended to be the single, authoritative source of all decision support applications are called ________ data.
RECONCILED
59
Informational and operational data differ in all of the following ways EXCEPT:
LEVEL OF DETAIL
60
A data mart is a(n):
DW that is limited in scop
61
All of the following are tasks of data cleansing EXCEPT:
creating FKS
62