Data Warehouse Flashcards

1
Q

Characteristics

A
  1. Object-Oriented:
    ○ Main object types: .COM and .NET
    1. Integrated:
      ○ Integration in a data warehouse involves bringing together similar data from different databases into a reliable and shared entity.
      ○ The integration process ensures consistent naming conventions, format, and codes, enabling effective analysis.
    2. Time-Variant:
      ○ The data warehouse contains not only the current data but also a historical perspective, allowing users to analyze and understand changes and trends over time.
    3. Non-Volatile:
      ○ Data in a data warehouse is permanent and not erased or deleted when new data is inserted.
      ○ Data is read-only, refreshed at intervals.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Data Operations

A

○ Two primary data operations in a data warehouse are Data Loading and Data Access.
○ Data loading involves extracting, transforming, and loading data into the warehouse. (ETL Operation)
○ Data access enables users to retrieve and analyze integrated data for decision-making and reporting.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Other Characteristics

A

→ Concurrent Use:
○ Data warehouses support concurrent use, allowing many different users to access and utilize the data simultaneously.
○ This is particularly useful in scenarios where multiple users want to interact with the data at the same time, such as when booking airline tickets to the same destination

→ Metadata: 
	○ Metadata is crucial in a data warehouse environment. 
	○ It provides information about the data, such as its source, meaning, relationships, and transformations. 
	○ Metadata helps users understand and manage the data within the warehouse.

→ Multi-dimensional Data Model: 
	○ Data warehouses often use a multi-dimensional data model, which means they organize data into dimensions and facts. 
	○ Dimensions are the perspectives by which data can be analyzed (e.g., time, geography), and facts are the measurable metrics (e.g., sales, revenue).

→ Aggregated Data: 
	○ Data in warehouses is often aggregated, meaning that it is collected and summarized from various sources to provide a higher-level perspective and support efficient querying and reporting. 
	○ Aggregations can include summaries, averages, totals, or other statistical measures.

→ Dimension Tables: 
	○ Data warehouses organize information into dimension tables, where each table represents data from a specific perspective. 
	○ For instance, understanding the effect of inflation on the Euro from a time-based perspective.

→ Synchronization Periodically: 
	○ Data warehouses synchronize periodically, typically through periodic transactions rather than after every transaction. 
	○ This approach is chosen to prevent potential crashes that might occur if synchronization happens too frequently.
	
→ Software Independent:
	○ Definition: Data warehouses are software-independent, meaning they are not tied to a specific database software, and they can handle various data, relationships, schemas, and database languages.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

OLAP and Cube Operations

A

→ OLAP stands for Online Analytical Processing, which is a technology that enables multi-dimensional analysis of business data.
→ OLAP is used to support business intelligence and decision-making processes.
→ Grouping of data in a multidimensional matrix is called data cubes.
→ A data cubes are used to handle multidimensional data that allows for complex analysis and querying.
→ Operations on data cubes typically involve manipulating and aggregating data along different dimensions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Data cube Operations

A
  1. Roll-up:
    ○ Roll-up involves summarizing data at a higher level of abstraction.
    ○ It is the process of moving from detailed data to more aggregated levels.
    ○ For example, rolling up monthly sales data to quarterly or yearly totals.
    1. Drill-down:
      ○ Drill-down is the opposite of roll-up.
      ○ It involves breaking down aggregated data into more detailed levels.
      ○ For instance, drilling down from yearly sales to quarterly or monthly details.
    2. Slice:
      ○ Slicing involves selecting a single layer or “slice” of the cube by fixing one dimension at a specific value.
      ○ This operation provides a 2D view of the data cube.
      ○ For example, selecting sales data for a particular region and a specific time period.
    3. Dice:
      ○ Dicing involves selecting a sub-cube by fixing values for two or more dimensions.
      ○ This operation allows you to focus on a specific subset of the data.
      ○ For example, selecting sales data for a particular region, product category, and time period.
    4. Pivot (Rotate):
      ○ An imaging operation that rotates the data to present it from a different angle. Pivoting rotates the data on a specific axis.
      ○ This operation can be useful for analyzing data from different angles.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Commands used in slicing

A
  • Statistical Functions (MIN(), MAX(), SUM(), …)
    • WHERE
    • GROUP BY
    • ORDER BY
    • DISTINCT
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

N dimensions data model (Data warehouse Schema Types)

A
  1. Star Schema:
    ○ Description: In a star schema, there is a central computer (or node) that connects to multiple computers around it. These peripheral computers represent different dimensions of the data.
    1. Snowflake Schema:
      ○ Description: Similar to the star schema, the snowflake schema also has a central device, but the computers around it can have additional computers around them, forming subnetworks. This structure resembles the branches of a snowflake.
    2. Galaxy Schema:
      ○ Description: In a galaxy schema, everything is connected without a central node. All computers or nodes are interconnected, forming a network without a specific center.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Business Intelligence

A

encompasses a range of activities and tools aimed at transforming raw data into meaningful insights for business decision-making.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Steps for Business Intelligence

A
  1. Basic Querying and Reporting:
    → Querying: Involves retrieving specific data from a database using SQL or other query languages.
    → Reporting: Involves presenting the queried data in a structured format, often through tables and charts. Reporting tools allow users to create and schedule regular reports.
    1. Business Analysis (BA):
      → Involves the examination and interpretation of data to support decision-making.
      → Business analysts use various techniques and tools to analyze trends, patterns, and relationships within the data to provide insights that can guide strategic decisions.
    2. Data Mining:
      → Data mining is the process of discovering patterns, trends, correlations, or valuable insights from large sets of data.
      → It involves using various techniques, algorithms, and methods to extract useful and previously unknown information from raw or unstructured data.
      → It involves the use of a range of techniques, including slicing, dicing, roll-up, pivoting, statistical techniques, and machine learning, contributing to comprehensive data analysis and to uncover hidden information within the data.
    3. Dashboards:
      → Provide a visual representation of key performance indicators (KPIs) and other important metrics.
      → Dashboards consolidate and display data in a way that is easy to understand, allowing users to monitor the health and performance of their business at a glance.
      → They often include charts, graphs, and other visual elements.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Goals of Data Warehouse

A
  • Prediction: Involves forecasting future trends or outcomes based on historical data.
    • Descriptive: Focuses on summarizing and presenting historical data to understand past events.
    1. Prediction:
      → To forecast future trends, outcomes, or events based on historical data and patterns
      → Good methods:
      § Probability: Involve calculating probabilities to estimate the likelihood of specific events or outcomes.
      § Fourier Transformation: is a mathematical technique used to analyze patterns in time-dependent data, often in the frequency domain. It’s valuable for understanding periodic trends and cycles. Fourier analysis can enhance the accuracy of predictions by capturing cyclic behavior in the data.
      § Extension of Data: involves incorporating additional historical data or expanding the dataset to enhance the predictive model’s robustness. This can include gathering more historical records, leading to more accurate predictions.
    2. Descriptive:
      → To provide a detailed and comprehensive understanding of historical data.
      → Involves summarizing and presenting historical data to describe what has happened in the past.
      → Benefit: Understanding historical patterns and trends helps in gaining insights into the factors that have influenced business performance.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Data Warehouse vs Data Models
(Dominant Operations, Goals, Type of model, Technology)

A
  • Dominant Operations:
    → Data warehouse: Data cube operations (slicing, dicing,…)
    → Data Model: DISTINCT, GROUP BY, SELECT, WHERE, SORT (not for data mining)
    • Goals:
      → Data warehouse: Prediction and descriptive
      → Data Model: normalization, independence, data integrity, referential integrity, structuring data
    • Type of model:
      → Data warehouse: star, snowflake, and galaxy models
      → Data Model: relational, entity, and hierarchical models
    • Technology:
      → Data warehouse: Software independent, concurrent users, meta data, aggregated data…
      Data Model: relations, constraints, DBMS, SCAN, SEEK
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

OLTP vs Warehousing
(Dominant Operations, Goals, Type of model, Technology)

A

OLTP:
Dominant Operations

  1. Maintenance Operations: Involves inserting, modifying, and deleting data as part of transactional processes.

Optimal Goals

Improving Maintenance Efficiency: Achieving acceptable transaction processing times for routine maintenance operations (insertion, modification, deletion).

Model

Normalized model with a focus on maintaining data integrity, often in a 2D structure

DB Technology

  1. Fundamentally Access by Content: Designed for content-based access, which may involve scanning data.
  2. Navigation by Foreign Key: Utilizes foreign keys for navigating relationships between tables.

VS.

Data Warehousing:

Dominant Operations

  1. Complex Queries: Involves querying large datasets for analytical purposes, often with aggregations, filtering, and grouping.
  2. Grouping and Ranking: Organizing and ranking individuals (objects, events) based on specified criteria and hierarchical structures.

Optimal Goals

Improving Query Efficiency: Achieving acceptable response times for complex queries and reporting.

Model

Multi-dimensional model with N dimensions and data blocks, supporting efficient querying and analysis.

DB Technology

  1. Relational Database with Star Structure: Tables exhibit relationships in a star schema, optimized for analytical queries.
  2. Multidimensional Database: Utilizes a multidimensional structure for faster access, often based on navigation indicators.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly