Introduction Flashcards

1
Q

What is the difference between the Star and Snowflake schema?

A

A star schema has a single fact table at the center of the model, surrounded by one level of dimension tables arranged in a star pattern. Because there is only one level of dimension tables, queries need few joins and are very easy to write.

A snowflake schema is simply a more complex version of a star schema.
The main difference is that its dimension tables can have multiple levels.
Each additional level of dimension tables adds joins, which can greatly
increase the amount of processing required to run queries.
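A minimal sketch of the difference, using Python's built-in sqlite3 module. The table names, column names, and sample rows are made up for illustration. The same question (revenue per category) takes one join in the star layout and two joins in the snowflake layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star: one dimension level; the category is stored on the product row.
    CREATE TABLE dim_product_star (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    -- Snowflake: the product dimension points to a second-level category dimension.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_product_snow (product_id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    -- Shared fact table (hypothetical sales amounts).
    CREATE TABLE fact_sales (product_id INTEGER, amount REAL);

    INSERT INTO dim_product_star VALUES
        (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'), (3, 'Desk', 'Furniture');
    INSERT INTO dim_category VALUES (10, 'Electronics'), (20, 'Furniture');
    INSERT INTO dim_product_snow VALUES (1, 'Laptop', 10), (2, 'Phone', 10), (3, 'Desk', 20);
    INSERT INTO fact_sales VALUES (1, 1000.0), (2, 500.0), (3, 200.0), (1, 1000.0);
""")

# Star schema: a single join from the fact table to the dimension.
star_rows = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

# Snowflake schema: an extra join to reach the second dimension level.
snow_rows = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_snow p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()

print(star_rows)
print(snow_rows)
```

Both queries return the same totals; the snowflake version trades the repeated category text for an additional join.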

2
Q

Are Star Schemas usually normalised?

A

No.

Star schemas are usually not normalized: with only one level of dimension tables, each dimension table holds all of its attributes, so the same values are repeated across rows.

Dimension tables are normalized in a snowflake schema because second- and third-level dimension tables can be joined to higher-level dimension
tables. This largely removes the need for duplicate data, which results in normalized data.

3
Q

Define the main differences between SQL and NoSQL

A

SQL Database

FIXED schema

Vertically scalable

Table based

ACID

NoSQL Databases

Not FIXED schema (structured/semi-structured/unstructured)

Horizontally scalable

Type: Document, Key-Value, Graph

BASE
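The fixed versus flexible schema point can be sketched as follows. This is only an illustration: sqlite3 stands in for a SQL database, and a plain dictionary of JSON strings stands in for a document store (not a real NoSQL client). The table, field names, and sample values are made up.

```python
import json
import sqlite3

# SQL side: the table definition fixes the set of columns up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))

try:
    # 'nickname' is not part of the schema, so the insert is rejected.
    conn.execute(
        "INSERT INTO users (id, name, nickname) VALUES (?, ?, ?)",
        (2, "Grace", "Amazing"),
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# Document side (stand-in): each document carries its own structure.
documents = {}
documents["1"] = json.dumps({"name": "Ada"})
documents["2"] = json.dumps({"name": "Grace", "nickname": "Amazing", "tags": ["navy"]})

# Collect the field names present in each stored document.
fields_per_doc = {key: sorted(json.loads(value)) for key, value in documents.items()}

print(schema_rejected)
print(fields_per_doc)
```

Adding the nickname on the SQL side would first need a schema change such as ALTER TABLE, while the document side accepts the new shape as-is.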

4
Q

Define the key differences between ETL and ELT processes

A

ETL

Extract (Data Factory)

Transform (Databricks)

Load (into SQL)

ELT

Extract (into staging area, Data Factory)

Load (PolyBase into Data Lake)

Transform (Databricks)
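The difference in step order can be sketched as follows. This is a toy illustration only: sqlite3 stands in for the target store, and the function names, table names, and sample rows are hypothetical (no Data Factory, PolyBase, or Databricks calls are involved). In the ETL path the data is cleaned in application code before loading; in the ELT path the raw data is loaded first and transformed later inside the target.

```python
import sqlite3

# Hypothetical raw source rows: untrimmed names and amounts stored as text.
RAW_ROWS = [("  alice ", "10"), ("BOB", "20"), ("carol", "not-a-number")]


def extract():
    """Return a copy of the raw source rows."""
    return list(RAW_ROWS)


def transform(rows):
    """Clean names and keep only rows whose amount is a whole number."""
    cleaned = []
    for name, amount in rows:
        if amount.isdigit():
            cleaned.append((name.strip().lower(), int(amount)))
    return cleaned


def run_etl(conn):
    """Extract, transform in application code, then load the clean rows."""
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", transform(extract()))
    return conn.execute("SELECT name, amount FROM sales_etl ORDER BY name").fetchall()


def run_elt(conn):
    """Extract, load the raw rows untouched, then transform inside the target."""
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", extract())
    return conn.execute("""
        SELECT LOWER(TRIM(name)), CAST(amount AS INTEGER)
        FROM sales_raw
        WHERE amount <> '' AND amount NOT GLOB '*[^0-9]*'
        ORDER BY 1
    """).fetchall()


conn = sqlite3.connect(":memory:")
etl_rows = run_etl(conn)
elt_rows = run_elt(conn)
raw_count = conn.execute("SELECT COUNT(*) FROM sales_raw").fetchone()[0]

print(etl_rows)
print(elt_rows)
print(raw_count)
```

Both paths produce the same cleaned rows, but only the ELT path keeps every raw row in the target, so a different transformation can be applied later without extracting again.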

5
Q

What are the key benefits of ETL vs ELT?

A

ETL (mainly SQL)

Not good for loading massive amounts of data

May transform data that is not used

ELT (Synapse/Cosmos)

Transform when needed

Need good governance

6
Q

Define Data Democratization and Data Governance

A

Data Democratization

Data democratization is about how accessible your data is.

It means making data accessible to the people who need it, not just to IT or through IT.

Data Governance

Data democratization can be a bad thing if it is applied without controls. The other side of that coin is data governance: deciding who should have access to which data, how the data is stored, and how it is controlled or released to other interested parties.
