Lakehouse Flashcards
(49 cards)
What is a lakehouse?
A lakehouse presents as a database and is built on top of a data lake using Delta format tables.
What capabilities do lakehouses combine?
The SQL-based analytical capabilities of a relational data warehouse and the flexibility and scalability of a data lake.
What types of data formats can lakehouses store?
All data formats.
What is the advantage of lakehouses being cloud-based?
They can scale automatically and provide high availability and disaster recovery.
What processing engines do lakehouses use?
Spark and SQL engines.
What is the schema-on-read format?
Data is organized in a schema-on-read format, meaning the schema is defined as needed rather than having a predefined schema.
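Schema-on-read can be sketched in plain Python, without a lakehouse or Spark. The records, the `schema` mapping, and the `read_with_schema` helper below are all hypothetical names chosen for illustration: the raw data is kept exactly as it arrived, and types are applied only when a reader asks for them.

```python
import json

# Raw records are stored exactly as they arrived; no schema is enforced on write.
raw_lines = [
    '{"id": "1", "amount": "19.99", "note": "first order"}',
    '{"id": "2", "amount": "5"}',
]

# Hypothetical schema chosen by the reader at query time: column name -> type.
schema = {"id": int, "amount": float}


def read_with_schema(lines, schema):
    """Parse each raw line and cast only the columns the schema names."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({col: cast(record[col]) for col, cast in schema.items()})
    return rows


rows = read_with_schema(raw_lines, schema)
print(rows)
```

A different reader could apply a different schema to the same raw lines, for example one that also pulls in the optional `note` field.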
What does ACID stand for in the context of lakehouses?
Atomicity, Consistency, Isolation, Durability.
What are the roles of different users in a lakehouse?
Data engineers, data scientists, and data analysts access and use data.
What is the ETL process?
Extract, Transform, Load.
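The three ETL stages can be shown with a minimal sketch that uses only the Python standard library. This is an illustration of the pattern, not how Fabric performs ETL: the in-memory CSV stands in for a source file, and an in-memory SQLite table (the hypothetical `sales` table) stands in for the destination.

```python
import csv
import io
import sqlite3

# Extract: read rows from a source (an in-memory CSV stands in for a file or API).
source = io.StringIO("name,amount\nalice, 10 \nbob,\ncarol,7\n")
extracted = list(csv.DictReader(source))

# Transform: trim whitespace, drop rows with no amount, cast amounts to integers.
transformed = [
    (row["name"].strip(), int(row["amount"].strip()))
    for row in extracted
    if row["amount"].strip()
]

# Load: write the cleaned rows into a destination table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", transformed)
conn.commit()

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```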
What types of data sources can be ingested into a lakehouse?
Local files, databases, or APIs.
What are Fabric shortcuts?
Links to data in external sources, such as Azure Data Lake Storage Gen2 or OneLake.
What tools can be used to transform ingested data?
Apache Spark with notebooks or Dataflows Gen2.
What is the purpose of Data Factory pipelines?
To orchestrate different ETL activities and land prepared data into the lakehouse.
What familiar tool do Dataflows Gen2 utilize?
Power Query.
How can you analyze data in a lakehouse?
By querying it with SQL.
What can be developed in Power BI using a lakehouse?
Reports.
How is lakehouse access managed?
Through workspace roles or item-level sharing.
What are sensitivity labels used for in lakehouses?
Data governance: they classify and help protect sensitive data.
True or False: Item-level sharing is best for granting access for read-only needs.
True.
Fill in the blank: Lakehouses support _______ transactions through Delta Lake formatted tables.
ACID
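The "A" in ACID (atomicity) can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in, not Delta Lake, and the `accounts` table and `transfer` helper are hypothetical. The point is only that a transaction either applies all of its changes or none of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds in one transaction; any failure rolls back both updates."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


ok = transfer(conn, "a", "b", 30)       # succeeds and is committed
failed = transfer(conn, "a", "b", 500)  # overdraws, so both updates are undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

After the failed transfer, neither balance has changed, which is the behavior atomicity guarantees.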
What is a key benefit of using a lakehouse for analytics?
Scalable analytics solution that maintains data consistency.
What three items are automatically created in your workspace when you create a new lakehouse?
The lakehouse, the Semantic model (default), and the SQL analytics endpoint.
The lakehouse contains shortcuts, folders, files, and tables, and serves as a central hub for data management.
What does the Semantic model (default) provide for Power BI report developers?
An easy data source.
The Semantic model simplifies data representation for reporting.
What is the purpose of the SQL analytics endpoint in a lakehouse?
Allows read-only access to query data with SQL.
This endpoint enables SQL-based interaction with the lakehouse data.
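Read-only SQL access can be sketched by analogy. The example below does not connect to a Fabric SQL analytics endpoint; it opens a local SQLite file in read-only mode, with a hypothetical `orders` table, to show the general behavior such an endpoint offers: queries succeed while writes are rejected.

```python
import pathlib
import sqlite3
import tempfile

# Create a small database file that stands in for the lakehouse tables.
db_path = pathlib.Path(tempfile.mkdtemp()) / "demo.db"
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
writer.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
writer.commit()
writer.close()

# Open the same data read-only: SELECT works, INSERT is refused.
reader = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
row_count = reader.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

try:
    reader.execute("INSERT INTO orders VALUES (3, 1.0)")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True
reader.close()

print(row_count, write_rejected)
```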