Indexes & Access Paths Flashcards
(166 cards)
What is Databricks Delta?
Databricks Delta is a storage layer that brings ACID transactions to Apache Spark and big data workloads.
True or False: Databricks supports only one type of file format.
False
What are the primary file formats supported by Databricks?
Parquet, JSON, CSV, Avro, and Delta.
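A quick PySpark sketch of how each of these formats is typically read; the paths are hypothetical, and the Avro reader assumes the spark-avro package (bundled with Databricks Runtime):
    df_parquet = spark.read.parquet("/data/events.parquet")
    df_json = spark.read.json("/data/events.json")
    df_csv = spark.read.option("header", "true").csv("/data/events.csv")
    df_avro = spark.read.format("avro").load("/data/events.avro")
    df_delta = spark.read.format("delta").load("/data/events_delta")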
Fill in the blank: Databricks uses __________ to optimize read and write operations.
Delta Lake
What processing engine does Databricks primarily use for large-scale data processing?
Apache Spark
What is the purpose of the Delta Lake transaction log?
To track changes and provide ACID compliance.
Which file format is known for its columnar storage and is optimized for analytical queries?
Parquet
True or False: Databricks can read data from both structured and semi-structured sources.
True
What feature allows Delta Lake to handle schema evolution?
Schema evolution support, which can be enabled per write (for example with the mergeSchema option); schema enforcement otherwise rejects writes whose schema does not match the table.
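A minimal sketch of per-write schema evolution, assuming new_df adds columns to a hypothetical Delta table at /data/events_delta:
    # mergeSchema lets Delta add the new columns instead of rejecting the write.
    new_df.write.format("delta") \
        .mode("append") \
        .option("mergeSchema", "true") \
        .save("/data/events_delta")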
What is the main advantage of using Delta Lake over traditional data lakes?
Delta Lake provides ACID transactions, data reliability, and improved performance.
Which command is used to convert a Parquet table to a Delta table?
CONVERT TO DELTA
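A hedged example of running the command from PySpark; the Parquet directory path is hypothetical:
    # Converts the files in place and creates the Delta transaction log.
    spark.sql("CONVERT TO DELTA parquet.`/data/events_parquet`")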
What is the primary role of the Databricks File System (DBFS)?
DBFS is a distributed file system abstraction layered over cloud object storage, giving notebooks and jobs a common path for storing and accessing data in Databricks.
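A small illustration, assuming a Databricks notebook where dbutils is available; the path shown is the built-in sample-datasets location:
    # List files under a DBFS path.
    dbutils.fs.ls("dbfs:/databricks-datasets/")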
True or False: Delta Lake supports time travel features.
True
What does the term ‘time travel’ in Delta Lake refer to?
Accessing previous versions of data for auditing or rollback.
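A minimal sketch of time-travel reads; the path, version number, and timestamp are hypothetical:
    # Read the table as of a specific version...
    old_df = spark.read.format("delta").option("versionAsOf", 3).load("/data/events_delta")
    # ...or as of a specific timestamp.
    old_df = spark.read.format("delta").option("timestampAsOf", "2024-01-01").load("/data/events_delta")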
Fill in the blank: The __________ command is used to optimize the layout of data files in Delta Lake.
OPTIMIZE
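An illustrative invocation from PySpark; the table path and the Z-ORDER column are hypothetical:
    # Compact small files and co-locate data by a frequently filtered column.
    spark.sql("OPTIMIZE delta.`/data/events_delta` ZORDER BY (event_date)")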
What is the role of checkpoints in Delta Lake?
To speed up reading the transaction log: Delta periodically consolidates the JSON commit files into a Parquet checkpoint file, so the table state can be reconstructed without replaying every commit.
What is the default storage format for Databricks tables?
Delta format
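A small sketch, assuming a recent Databricks Runtime where CREATE TABLE without a USING clause defaults to Delta; the table name and columns are hypothetical:
    # Created as a Delta table by default.
    spark.sql("CREATE TABLE sales (id INT, amount DOUBLE)")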
True or False: Databricks can integrate with cloud storage services like AWS S3 and Azure Blob Storage.
True
Which method is used to read a Delta table in Databricks?
spark.read.format('delta').load('path/to/delta/table')
Fill in the blank: The __________ command is used to create a Delta table from an existing DataFrame.
write (the DataFrameWriter API, as in df.write.format('delta'))
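A minimal sketch of both common variants, assuming a hypothetical DataFrame df and hypothetical table/path names:
    # Create a managed Delta table...
    df.write.format("delta").saveAsTable("events")
    # ...or write a path-based Delta table.
    df.write.format("delta").mode("overwrite").save("/data/events_delta")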
How does Databricks handle data versioning?
By maintaining a transaction log that records all changes.
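The recorded versions can be inspected with DESCRIBE HISTORY; a hedged example against a hypothetical table path:
    # Shows one row per commit: version, timestamp, operation, and more.
    spark.sql("DESCRIBE HISTORY delta.`/data/events_delta`").show()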
What is the significance of the ‘MERGE’ command in Delta Lake?
It allows for upserts (update or insert) into a Delta table.
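A sketch of an upsert, assuming a hypothetical target table events and a staged updates table (or temp view), both keyed on id:
    spark.sql("""
        MERGE INTO events AS t
        USING updates AS s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)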
True or False: Data in Delta Lake can be stored in multiple formats.
False. A Delta table's data files are always stored as Parquet; the transaction log adds JSON commit files and Parquet checkpoints.
What is a key benefit of using columnar storage formats like Parquet?
Efficient data compression and improved query performance.