Data Engineering Part 3 Flashcards
(15 cards)
What is a relational database?
A database that organizes data into tables (relations) of rows and columns, links tables through keys, and is typically queried with SQL.
What is SQL?
Structured Query Language, used for managing and querying relational databases.
What is a primary key?
A column, or combination of columns, whose value uniquely identifies each row in a table and cannot be NULL.
What is a foreign key?
A column, or combination of columns, whose values must match a primary key (or other unique key) in another table, or in the same table.
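A minimal sketch of both kinds of key, using Python's built-in sqlite3 module. The customers and orders tables and the try_insert helper are hypothetical names chosen for illustration. SQLite only enforces foreign keys once the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: customers.id is a primary key,
# orders.customer_id is a foreign key that references it.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 25.0)")


def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Reusing primary key 1 violates uniqueness.
duplicate_pk_ok = try_insert("INSERT INTO customers (id, name) VALUES (1, 'Bob')")
# Customer 99 does not exist, so the foreign key rejects the order.
orphan_order_ok = try_insert(
    "INSERT INTO orders (id, customer_id, total) VALUES (11, 99, 5.0)"
)
print(duplicate_pk_ok, orphan_order_ok)
```

Both inserts should be rejected: the first by the primary key, the second by the foreign key.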
What is normalization in databases?
Organizing data into related tables so that each fact is stored only once, which reduces redundancy and improves integrity.
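One way to picture normalization, sketched with sqlite3. The table names (orders_flat, customers, orders) and the sample rows are made up for illustration. The flat table repeats a customer's email on every order, while the normalized layout stores it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized: the customer's email is repeated on every order row.
conn.execute(
    "CREATE TABLE orders_flat ("
    " order_id INTEGER PRIMARY KEY, customer_name TEXT,"
    " customer_email TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [
        (1, "Ada", "ada@example.com", 25.0),
        (2, "Ada", "ada@example.com", 40.0),
        (3, "Bob", "bob@example.com", 10.0),
    ],
)

# Normalized: each customer is stored once; orders point to it by key.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE)"
)
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(id), total REAL)"
)
conn.execute(
    "INSERT INTO customers (name, email) "
    "SELECT DISTINCT customer_name, customer_email FROM orders_flat"
)
conn.execute(
    "INSERT INTO orders (id, customer_id, total) "
    "SELECT f.order_id, c.id, f.total "
    "FROM orders_flat f JOIN customers c ON c.email = f.customer_email"
)

# In the normalized layout, changing an email touches exactly one row.
cur = conn.execute("UPDATE customers SET email = 'ada@new.example' WHERE name = 'Ada'")
rows_updated = cur.rowcount

# In the flat layout, the same fact lives in every one of Ada's order rows.
email_copies_flat = conn.execute(
    "SELECT COUNT(*) FROM orders_flat WHERE customer_name = 'Ada'"
).fetchone()[0]
print(rows_updated, email_copies_flat)
```

In the flat table, an update that misses one of the copies leaves the data inconsistent. That is the integrity problem normalization is meant to prevent.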
What is a data warehouse?
A central repository for structured data used for analysis and reporting.
How does a data warehouse differ from an operational database?
Warehouses are optimized for analytical queries over large volumes of historical data; operational databases are optimized for frequent, small transactions.
What is OLAP?
Online Analytical Processing — used for complex analytical queries in warehouses.
What is OLTP?
Online Transaction Processing — used for frequent, simple transactions in operational databases.
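The two workloads differ mainly in query shape. The sketch below uses a hypothetical sales table in SQLite to contrast them. It only illustrates the access patterns; it is not how a real warehouse engine is built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (id, region, amount) VALUES (?, ?, ?)",
    [(1, "north", 100.0), (2, "south", 50.0), (3, "north", 25.0)],
)

# OLTP-style: change one row identified by its key, in a short transaction.
with conn:
    conn.execute("UPDATE sales SET amount = 60.0 WHERE id = 2")

# OLAP-style: scan many rows and aggregate them.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```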
Name examples of cloud data warehouses.
Amazon Redshift, Google BigQuery, Snowflake.
What is a data lake?
A storage system that holds raw structured, semi-structured, and unstructured data at scale, usually in its native format.
How is a data lake different from a data warehouse?
Lakes store raw, unprocessed data; warehouses store processed, structured data.
What are typical file formats in a data lake?
Parquet, Avro, ORC.
What is schema-on-read?
Schema is applied when the data is read, not when it’s written.
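A small sketch of the idea in plain Python. The raw_lines records, the schema mapping, and the read_with_schema helper are all hypothetical. Records are stored exactly as they arrive, and the reader decides which fields to keep and how to type them.

```python
import json

# Raw records are written as-is; nothing validates their shape on write.
raw_lines = [
    '{"user": "ada", "age": "36"}',
    '{"user": "bob"}',
    '{"user": "cy", "age": 41, "extra": true}',
]

# The schema lives with the reader: field name -> (type to cast to, default).
schema = {"user": (str, ""), "age": (int, None)}


def read_with_schema(line, schema):
    """Parse one raw line and project it onto the reader's schema."""
    record = json.loads(line)
    row = {}
    for field, (cast, default) in schema.items():
        value = record.get(field)
        # Missing fields get the default; present ones are cast to the schema type.
        row[field] = cast(value) if value is not None else default
    return row


rows = [read_with_schema(line, schema) for line in raw_lines]
print(rows)
```

A different reader could apply a different schema to the same raw lines. With schema-on-write, records that do not match the schema are rejected or transformed before they are stored.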
What is a data swamp?
A poorly governed data lake whose data is hard to find, undocumented, or low quality.