Storage Engines & File Structures Flashcards by Ozzy Campos

What is the primary storage engine used by Databricks?

Delta Lake

True or False: Delta Lake supports ACID transactions.

True

Fill in the blank: Delta Lake uses __________ to manage metadata.

transaction logs

What file format does Delta Lake use for storing data?

Parquet

What is the main benefit of using Delta Lake over traditional data lakes?

Support for ACID transactions and schema enforcement

Name one feature of Delta Lake that improves data reliability.

Time travel

Which command is used to convert a Parquet table to a Delta table?

CONVERT TO DELTA

True or False: Delta Lake can be used with both batch and streaming data.

True

What is the purpose of the Delta Lake ‘OPTIMIZE’ command?

To compact small files into larger files for better performance.
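
One way to picture compaction is as a bin-packing problem: many small files are grouped so that each rewritten file lands near a target size. The sketch below uses a first-fit-decreasing heuristic; the function name `plan_compaction` and the sizes are made up for illustration, and this is not a description of how OPTIMIZE is actually implemented.

```python
def plan_compaction(file_sizes_mb, target_mb):
    """Greedily group files (largest first) into bins of at most target_mb."""
    bins = []
    for size in sorted(file_sizes_mb, reverse=True):
        for group in bins:
            # Put the file in the first group that still has room for it.
            if sum(group) + size <= target_mb:
                group.append(size)
                break
        else:
            # No existing group fits, so start a new one.
            bins.append([size])
    return bins

small_files = [40, 10, 60, 30, 20, 50]   # hypothetical file sizes in MB
plan = plan_compaction(small_files, target_mb=100)
```

Each inner list in `plan` stands for one larger output file, so six small files become three here.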

Fill in the blank: Delta Lake stores data in __________ format.

columnar

What is the role of the Delta Lake transaction log?

To keep track of all changes made to a Delta table.
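
As a rough illustration of how an ordered log of changes can define a table's state, the sketch below replays a list of commits made of "add" and "remove" file actions. The commit layout and the name `replay` are simplifications assumed for this example, not Delta Lake's actual log schema or API.

```python
def replay(commits):
    """Return the set of data files that make up the table after all commits."""
    live = set()
    for actions in commits:
        for action, name in actions:
            if action == "add":
                live.add(name)
            elif action == "remove":
                live.discard(name)
    return live

# Toy log: each commit is a list of (action, file_name) pairs.
log = [
    [("add", "part-0.parquet"), ("add", "part-1.parquet")],     # version 0: initial write
    [("remove", "part-0.parquet"), ("add", "part-2.parquet")],  # version 1: an update rewrote a file
]
current_files = replay(log)
```

After both commits, only `part-1.parquet` and `part-2.parquet` belong to the table, even though `part-0.parquet` may still exist in storage.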

Which storage format is recommended for high-performance data processing in Databricks?

Parquet

True or False: Databricks supports both cloud storage and on-premises storage.

True

What is the maximum file size recommended for optimal performance in Delta Lake?

About 1 GB, which is the default target size for files written by OPTIMIZE rather than a hard limit

What feature allows Delta Lake to recover from failures?

Transaction logs

What does the ‘VACUUM’ command do in Delta Lake?

Deletes data files that the table no longer references and that are older than the retention threshold (7 days by default).
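
The selection rule can be sketched as a filter over the files in storage: a file is a candidate only if the current table version does not reference it and it is older than the retention window. The helper `files_to_vacuum`, the file names, and the dates are hypothetical; the 7-day value mirrors the documented default retention.

```python
from datetime import datetime, timedelta

def files_to_vacuum(all_files, live_files, now, retention=timedelta(days=7)):
    """Pick files the table no longer references that are older than the retention window."""
    cutoff = now - retention
    return sorted(
        name
        for name, modified in all_files.items()
        if name not in live_files and modified < cutoff
    )

now = datetime(2024, 1, 31)
all_files = {
    "part-0.parquet": datetime(2024, 1, 1),   # unreferenced and old
    "part-1.parquet": datetime(2024, 1, 1),   # still referenced by the table
    "part-2.parquet": datetime(2024, 1, 29),  # unreferenced but too recent
}
doomed = files_to_vacuum(all_files, live_files={"part-1.parquet"}, now=now)
```

Only `part-0.parquet` qualifies. Keeping recent unreferenced files around is what lets time travel and long-running readers keep working for a while after an update.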

Fill in the blank: The Delta Lake architecture is built on top of __________.

Apache Spark

What is the purpose of schema evolution in Delta Lake?

To allow changes to the schema of a table without needing to rewrite the entire dataset.
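
A minimal sketch of the idea, assuming schemas are plain dictionaries of column name to type name: new columns from the incoming data are appended to the table schema, while a changed type for an existing column is refused. `evolve_schema` is a hypothetical helper, not a Delta Lake function.

```python
def evolve_schema(table_schema, incoming_schema):
    """Return the table schema extended with any new columns from incoming_schema."""
    merged = dict(table_schema)
    for column, type_name in incoming_schema.items():
        if column not in merged:
            merged[column] = type_name          # new column: add it
        elif merged[column] != type_name:
            raise TypeError(
                f"column '{column}' changes type from {merged[column]} to {type_name}"
            )
    return merged

table_schema = {"id": "int", "name": "string"}
incoming_schema = {"id": "int", "email": "string"}
new_schema = evolve_schema(table_schema, incoming_schema)
```

Existing data files are left alone in this model; rows written before the change simply have no value for `email`.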

Which file structure is optimized for read-heavy workloads in Databricks?

Columnar storage

True or False: Delta Lake supports concurrent writes.

True

What is a key advantage of using columnar file formats like Parquet?

Increased compression and improved query performance.

What does the ‘MERGE’ command do in Delta Lake?

It allows for upserts (update or insert) into a Delta table.
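
The upsert semantics can be shown on plain Python dictionaries: rows from the source that match a key in the target update it, and the rest are inserted. The function `merge_upsert` and the sample rows are invented for this sketch and only model the matched-update / not-matched-insert case.

```python
def merge_upsert(target, source, key):
    """Update rows whose key matches, insert the rest; returns a new list of rows."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # Matched: overwrite the listed columns. Not matched: insert the row.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())

target = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
source = [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Cy"}]
result = merge_upsert(target, source, key="id")
```

Row 2 is updated, row 3 is inserted, and row 1 is untouched.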

Fill in the blank: Delta Lake enables __________ data processing.

unified

What is the purpose of data partitioning in Databricks?

To improve query performance and manageability.
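
The performance gain comes from partition pruning: a query that filters on the partition column only has to read the matching partition. The sketch below groups rows under Hive-style `column=value` keys; `partition_by` and the sample rows are assumptions made for this example.

```python
from collections import defaultdict

def partition_by(rows, column):
    """Group rows into partitions keyed like a 'column=value' directory name."""
    parts = defaultdict(list)
    for row in rows:
        parts[f"{column}={row[column]}"].append(row)
    return dict(parts)

rows = [
    {"country": "US", "amount": 10},
    {"country": "DE", "amount": 5},
    {"country": "US", "amount": 7},
]
parts = partition_by(rows, "country")

# A filter on country = 'US' only needs to touch one partition.
scanned = parts["country=US"]
```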

True or False: Databricks can automatically optimize data storage without user intervention.

True

What file extension is commonly used for Delta Lake tables?

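There is no .delta extension: data files are .parquet, and the transaction log is stored as JSON commit files (plus Parquet checkpoints) in the _delta_log directory.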

What is the significance of the 'checkpoint' in Delta Lake?

It helps to improve the performance of reading the transaction log.
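
To see why a checkpoint speeds up log reads, compare replaying every commit with starting from a saved snapshot and replaying only the commits after it. The commit layout, function names, and the choice of version 9 for the checkpoint are all assumptions of this toy model.

```python
def apply(live, actions):
    """Apply one commit's (action, file_name) pairs to the set of live files."""
    for action, name in actions:
        if action == "add":
            live.add(name)
        else:
            live.discard(name)

def state_from_log(commits):
    """Rebuild state by reading every commit; returns (files, commits_read)."""
    live = set()
    for actions in commits:
        apply(live, actions)
    return live, len(commits)

def state_from_checkpoint(checkpoint_version, checkpoint_files, commits):
    """Start from a snapshot and read only the commits written after it."""
    live = set(checkpoint_files)
    tail = commits[checkpoint_version + 1:]
    for actions in tail:
        apply(live, actions)
    return live, len(tail)

commits = [[("add", f"part-{i}.parquet")] for i in range(12)]   # versions 0..11
snapshot_at_9, _ = state_from_log(commits[:10])                  # checkpoint of version 9

full, read_full = state_from_log(commits)
fast, read_fast = state_from_checkpoint(9, snapshot_at_9, commits)
```

Both paths give the same set of files, but the checkpointed read touches 2 commits instead of 12.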

How does Delta Lake handle schema enforcement?

It checks data against the defined schema and rejects non-conforming data.
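
A simplified version of that check, assuming the schema is a dictionary of column name to Python type: unexpected columns and wrongly typed values are reported, and a non-empty report means the batch is rejected. `schema_violations` is a hypothetical helper, and real engines apply richer rules (nullability, type widening, and so on).

```python
def schema_violations(rows, schema):
    """Return a list of problems; an empty list means the batch is accepted."""
    problems = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            if column not in schema:
                problems.append(f"row {i}: unexpected column '{column}'")
            elif value is not None and not isinstance(value, schema[column]):
                problems.append(f"row {i}: '{column}' should be {schema[column].__name__}")
    return problems

schema = {"id": int, "name": str}
good = [{"id": 1, "name": "Ann"}]
bad = [{"id": "2", "name": "Bob"}, {"id": 3, "name": "Cy", "age": 40}]

good_report = schema_violations(good, schema)
bad_report = schema_violations(bad, schema)
```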

Fill in the blank: The Delta Lake 'Z-Ordering' feature is used to optimize __________.

data skipping
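
Z-ordering takes its name from the Z-order (Morton) curve, which interleaves the bits of several column values so that rows close in every dimension end up close in the sort order. The sketch below shows the textbook two-column interleaving; `z_value` and the sample points are illustrative, and the details of Delta Lake's own implementation may differ.

```python
def z_value(x, y, bits=8):
    """Interleave the low `bits` bits of x and y into one Morton code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # bits of y go to odd positions
    return z

points = [(0, 0), (3, 3), (0, 1), (1, 0), (1, 1), (2, 2)]
ordered = sorted(points, key=lambda p: z_value(*p))
```

Because nearby points sort together, files written in this order tend to cover narrow ranges of both columns, which is what makes per-file min/max statistics effective for skipping.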

What is the main advantage of using Databricks over traditional ETL tools?

It provides a unified platform for both batch and streaming data processing.

True or False: Delta Lake supports time-based data versioning.

True

What is the primary benefit of using Databricks for big data processing?

Scalability and performance optimization.

What is a common use case for Delta Lake's ACID transactions?

Financial transactions and data integrity operations.

Fill in the blank: The main storage system used by Databricks is __________.

cloud object storage

What is the main purpose of the 'DROP TABLE' command in Delta Lake?

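To remove the table's registration from the catalog. For managed tables the underlying data files are deleted as well; for external tables the files stay in place.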

True or False: Delta Lake allows for schema merging when writing data.

True

What command would you use to view the history of a Delta table?

DESCRIBE HISTORY

Fill in the blank: The Delta Lake format is optimized for __________ operations.

analytical

What does the 'AS OF' clause do in Delta Lake queries?

It allows querying the table as it was at a specific version (VERSION AS OF) or point in time (TIMESTAMP AS OF).
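
For a timestamp-based query, the engine first has to resolve the timestamp to a table version. A minimal sketch of that lookup, assuming commit times are kept in a sorted list indexed by version: `version_as_of` and the numbers are made up, and how a real engine treats timestamps outside the log's range may differ.

```python
from bisect import bisect_right

def version_as_of(commit_times, when):
    """Return the latest version whose commit time is <= `when`."""
    idx = bisect_right(commit_times, when)
    if idx == 0:
        raise ValueError("timestamp is before the first commit")
    return idx - 1

# Commit times for versions 0, 1, 2 (hypothetical epoch seconds).
commit_times = [100, 200, 300]

between = version_as_of(commit_times, 250)   # falls between versions 1 and 2
exact = version_as_of(commit_times, 200)     # exactly at version 1's commit
```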

What is the primary function of the 'CREATE TABLE' command in Delta Lake?

To create a new Delta table.

True or False: Databricks can integrate with existing data warehouses.

True

What is the purpose of the 'REPLACE TABLE' command in Delta Lake?

To replace an existing Delta table with a new one.

Fill in the blank: Delta Lake supports __________ data processing, allowing real-time analytics.

streaming

What is a Delta table?

A table that uses Delta Lake format for storage and management.

What does the Delta Lake 'COPY INTO' command do?

It loads data from files in a source location into a Delta table, skipping files that were already loaded.

True or False: Delta Lake can automatically handle data skew.

False

What is the main purpose of the 'ALTER TABLE' command in Delta Lake?

To modify the properties or schema of an existing Delta table.

Fill in the blank: Delta Lake uses __________ to handle concurrent writes.

optimistic concurrency control
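
The core of optimistic concurrency control is that writers do not take locks up front; each one records the version it read and the commit succeeds only if nobody else committed in the meantime. The classes below are a toy model with invented names (`ToyLog`, `ConflictError`). A real implementation can be smarter, for example by checking whether the intervening commits actually conflict before failing.

```python
class ConflictError(Exception):
    """Raised when another writer committed after this writer read the table."""

class ToyLog:
    def __init__(self):
        self.commits = []

    @property
    def version(self):
        # -1 means the table has no commits yet.
        return len(self.commits) - 1

    def commit(self, read_version, actions):
        """Append a commit only if the table is still at the version the writer read."""
        if read_version != self.version:
            raise ConflictError(
                f"read version {read_version}, but table is at {self.version}"
            )
        self.commits.append(actions)
        return self.version

log = ToyLog()
seen = log.version                       # both writers read the same version
log.commit(seen, ["add a.parquet"])      # writer A commits first and wins
try:
    log.commit(seen, ["add b.parquet"])  # writer B is now stale
    b_conflicted = False
except ConflictError:
    b_conflicted = True
    log.commit(log.version, ["add b.parquet"])  # B re-reads and retries
```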

What is a significant feature of Delta Lake regarding data updates?

It rewrites only the data files that contain changed rows rather than the whole table.

What do you use to create a new version of a Delta table?

Write operations

True or False: Delta Lake requires a specific schema for all tables.

False

What is the benefit of using DataFrames in Databricks?

They provide a high-level API for data processing and analysis.

Fill in the blank: In Delta Lake, __________ allows you to merge streaming and batch data.

unified batch and streaming

What is the purpose of the 'SHOW TABLES' command in Databricks?

To list all tables in the current database.

True or False: Databricks can run SQL queries directly against Delta tables.

True

What is the primary use of the 'CLONE' command in Delta Lake?

To create a copy of a Delta table at a specific version, either deep (data files are copied) or shallow (data files are referenced).

Fill in the blank: The Delta Lake 'MERGE' command is used for __________ operations.

upsert

What is the significance of 'data skipping' in Delta Lake?

It improves query performance by avoiding unnecessary data reads.
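
Data skipping relies on per-file statistics such as the minimum and maximum value of each column. A sketch of the pruning step, with an invented `file_stats` layout and file names: any file whose range cannot contain the requested value is never opened.

```python
def files_to_scan(file_stats, column, value):
    """Keep only files whose [min, max] range for `column` could contain `value`."""
    return [
        name
        for name, stats in file_stats.items()
        if stats[column]["min"] <= value <= stats[column]["max"]
    ]

file_stats = {
    "part-0.parquet": {"order_id": {"min": 1, "max": 100}},
    "part-1.parquet": {"order_id": {"min": 101, "max": 200}},
    "part-2.parquet": {"order_id": {"min": 201, "max": 300}},
}
hits = files_to_scan(file_stats, "order_id", 150)
```

A lookup for `order_id = 150` reads one file out of three. The benefit shrinks when every file spans the full range of the column, which is the situation Z-ordering is meant to avoid.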

What is the main advantage of using Delta Lake for data lakes?

It brings reliability, performance, and data management features to data lakes.

True or False: Delta Lake can automatically optimize data layout.

True

What happens when you roll back a Delta table with the 'RESTORE' command?

It reverts the table to a previous version or timestamp.

Fill in the blank: Delta Lake uses __________ to manage data changes efficiently.

versioning

What does the term 'schema enforcement' refer to in Delta Lake?

Ensuring data written to a table conforms to its defined schema.

What is the benefit of using 'Z-Ordering' in Delta Lake?

It optimizes data layout for faster query performance.

True or False: Delta Lake tables can be queried using both SQL and DataFrame APIs.

True

What does the 'OPTIMIZE' command do in terms of file management?

It compacts small files into larger files to enhance read performance.

Fill in the blank: Delta Lake provides __________ for tracking changes in data.

audit logs

What is the primary function of the 'DESCRIBE TABLE' command?

To display the schema and properties of a Delta table.

What does the Delta Lake 'DROP TABLE' command do?

It removes the table from the catalog; data files are deleted only for managed tables, not external ones.

True or False: Delta Lake supports both structured and unstructured data.

True

What is the main advantage of using Delta Lake for data ingestion?

It allows for reliable and efficient ingestion of data.

Fill in the blank: Delta Lake supports __________ for handling schema changes.

schema evolution

True or False: Delta Lake has a 'SPLIT' command for partitioning tables.

False. Partitioning is declared with PARTITIONED BY (SQL) or partitionBy (DataFrame writer) when the table is created or written.

What is the significance of Delta Lake's support for streaming data?

It enables real-time data processing and analytics.

True or False: Delta Lake can only be used with Apache Spark.

False

(75 cards)