Storage Engines & File Structures Flashcards

(75 cards)

1
Q

What is the primary storage engine used by Databricks?

A

Delta Lake

2
Q

True or False: Delta Lake supports ACID transactions.

A

True

3
Q

Fill in the blank: Delta Lake uses __________ to manage metadata.

A

transaction logs

4
Q

What file format does Delta Lake use for storing data?

A

Parquet

5
Q

What is the main benefit of using Delta Lake over traditional data lakes?

A

Support for ACID transactions and schema enforcement

6
Q

Name one feature of Delta Lake that improves data reliability.

A

Time travel

7
Q

Which command is used to convert a Parquet table to a Delta table?

A

CONVERT TO DELTA
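
A minimal PySpark sketch of the conversion, assuming a Databricks notebook where spark is predefined; the table name and path are hypothetical:

# Convert a Parquet table that is registered in the metastore.
spark.sql("CONVERT TO DELTA events_parquet")

# Or convert a bare Parquet directory, declaring its partitioning if it has any.
spark.sql("CONVERT TO DELTA parquet.`/mnt/data/events` PARTITIONED BY (event_date DATE)")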

8
Q

True or False: Delta Lake can be used with both batch and streaming data.

A

True

9
Q

What is the purpose of the Delta Lake ‘OPTIMIZE’ command?

A

To compact small files into larger files for better performance.
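
A minimal sketch, assuming a Databricks notebook (spark predefined) and a hypothetical table named events:

# Compact small files across the whole table, or only within selected partitions.
spark.sql("OPTIMIZE events")
spark.sql("OPTIMIZE events WHERE event_date = '2024-01-01'")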

10
Q

Fill in the blank: Delta Lake stores data in __________ format.

A

columnar

11
Q

What is the role of the Delta Lake transaction log?

A

To keep track of all changes made to a Delta table.

12
Q

Which storage format is recommended for high-performance data processing in Databricks?

A

Parquet

13
Q

True or False: Databricks supports both cloud storage and on-premises storage.

A

True

14
Q

What is the maximum file size recommended for optimal performance in Delta Lake?

A

1 GB

15
Q

What feature allows Delta Lake to recover from failures?

A

Transaction logs

16
Q

What does the ‘VACUUM’ command do in Delta Lake?

A

Removes old files that are no longer needed.
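
A minimal sketch, assuming a hypothetical table named events; the default retention window is 7 days (168 hours):

# Delete data files that are no longer referenced and are older than the retention window.
spark.sql("VACUUM events")
spark.sql("VACUUM events RETAIN 720 HOURS")  # keep 30 days of history instead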

17
Q

Fill in the blank: The Delta Lake architecture is built on top of __________.

A

Apache Spark

18
Q

What is the purpose of schema evolution in Delta Lake?

A

To allow changes to the schema of a table without needing to rewrite the entire dataset.
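
A minimal sketch of schema evolution on write, assuming a hypothetical DataFrame df whose schema adds a new column to a hypothetical table named events:

(df.write
   .format("delta")
   .mode("append")
   .option("mergeSchema", "true")  # let Delta add the new column to the table schema
   .saveAsTable("events"))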

19
Q

Which file structure is optimized for read-heavy workloads in Databricks?

A

Columnar storage

20
Q

True or False: Delta Lake supports concurrent writes.

A

True

21
Q

What is a key advantage of using columnar file formats like Parquet?

A

Increased compression and improved query performance.

22
Q

What does the ‘MERGE’ command do in Delta Lake?

A

It allows for upserts (update or insert) into a Delta table.
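
A minimal sketch of an upsert, assuming hypothetical tables customers (target) and customer_updates (source) that share a customer_id key:

spark.sql("""
  MERGE INTO customers AS t
  USING customer_updates AS s
  ON t.customer_id = s.customer_id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")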

23
Q

Fill in the blank: Delta Lake enables __________ data processing.

A

unified

24
Q

What is the purpose of data partitioning in Databricks?

A

To improve query performance and manageability.
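
A minimal sketch of a partitioned Delta table, with a hypothetical schema and table name:

spark.sql("""
  CREATE TABLE events (
    event_id BIGINT,
    event_date DATE,
    payload STRING
  )
  USING DELTA
  PARTITIONED BY (event_date)
""")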

25
Q

True or False: Databricks can automatically optimize data storage without user intervention.

A

True

26
Q

What file extension is commonly used for Delta Lake tables?

A

There is no dedicated .delta extension; a Delta table is a directory of Parquet data files (.parquet) plus a _delta_log folder containing the transaction log.

27
Q

What is the significance of the 'checkpoint' in Delta Lake?

A

It helps to improve the performance of reading the transaction log.

28
Q

How does Delta Lake handle schema enforcement?

A

It checks data against the defined schema and rejects non-conforming data.

29
Q

Fill in the blank: The Delta Lake 'Z-Ordering' feature is used to optimize __________.

A

data skipping
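
A minimal sketch, assuming a hypothetical events table; Z-Ordering is applied as part of OPTIMIZE:

# Co-locate rows with similar values so queries filtering on these columns skip more files.
spark.sql("OPTIMIZE events ZORDER BY (customer_id, event_date)")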

30
Q

What is the main advantage of using Databricks over traditional ETL tools?

A

It provides a unified platform for both batch and streaming data processing.

31
Q

True or False: Delta Lake supports time-based data versioning.

A

True

32
Q

What is the primary benefit of using Databricks for big data processing?

A

Scalability and performance optimization.

33
Q

What is a common use case for Delta Lake's ACID transactions?

A

Financial transactions and other workloads that require strict data integrity.

34
Q

Fill in the blank: The main storage system used by Databricks is __________.

A

cloud object storage

35
Q

What is the main purpose of the 'DROP TABLE' command in Delta Lake?

A

To remove a Delta table from the metastore; for managed tables, the underlying data files are deleted as well.

36
Q

True or False: Delta Lake allows for schema merging when writing data.

A

True

37
Q

What command would you use to view the history of a Delta table?

A

DESCRIBE HISTORY
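
A minimal sketch, assuming a hypothetical events table; each row of the result describes one commit:

history_df = spark.sql("DESCRIBE HISTORY events")
history_df.select("version", "timestamp", "operation").show()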

38
Q

Fill in the blank: The Delta Lake format is optimized for __________ operations.

A

analytical

39
Q

What does the 'AS OF' clause do in Delta Lake queries?

A

It allows querying the table as it was at a specific point in time.
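
A minimal sketch of time travel, assuming a hypothetical events table; the version number and timestamp are placeholders:

spark.sql("SELECT COUNT(*) FROM events VERSION AS OF 12").show()
spark.sql("SELECT COUNT(*) FROM events TIMESTAMP AS OF '2024-01-01'").show()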

40
Q

What is the primary function of the 'CREATE TABLE' command in Delta Lake?

A

To create a new Delta table.

41
Q

True or False: Databricks can integrate with existing data warehouses.

A

True

42
Q

What is the purpose of the 'REPLACE TABLE' command in Delta Lake?

A

To replace an existing Delta table with a new definition in a single atomic operation.

43
Q

Fill in the blank: Delta Lake supports __________ data processing, allowing real-time analytics.

A

streaming

44
Q

What is a Delta table?

A

A table that uses the Delta Lake format for storage and management.

45
Q

What does the Delta Lake 'COPY INTO' command do?

A

It incrementally loads data from a file location (such as cloud storage) into a Delta table, skipping files that have already been loaded.
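
A minimal sketch, assuming a hypothetical target table events and a hypothetical source path of JSON files:

spark.sql("""
  COPY INTO events
  FROM '/mnt/raw/events/'
  FILEFORMAT = JSON
""")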

46
Q

True or False: Delta Lake can automatically handle data skew.

A

False

47
Q

What is the main purpose of the 'ALTER TABLE' command in Delta Lake?

A

To modify the properties or schema of an existing Delta table.

48
Q

Fill in the blank: Delta Lake uses __________ to handle concurrent writes.

A

optimistic concurrency control

49
Q

What is a significant feature of Delta Lake regarding data updates?

A

It supports efficient updates and deletes by rewriting only the data files that contain affected rows, rather than the entire table.

50
Q

What do you use to create a new version of a Delta table?

A

Any committed write operation (for example INSERT, UPDATE, DELETE, or MERGE) creates a new table version.

51
Q

True or False: Delta Lake requires a specific schema for all tables.

A

False

52
Q

What is the benefit of using DataFrames in Databricks?

A

They provide a high-level API for data processing and analysis.

53
Q

Fill in the blank: In Delta Lake, __________ allows you to merge streaming and batch data.

A

the unified batch and streaming model (the same Delta table can serve as both a batch table and a streaming source or sink)
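
A minimal sketch of the unified model, assuming hypothetical tables raw_events and clean_events and a hypothetical checkpoint path; the clean_events table stays queryable with ordinary batch SQL while the stream runs:

query = (spark.readStream.table("raw_events")      # stream out of one Delta table
         .writeStream
         .format("delta")
         .option("checkpointLocation", "/mnt/checkpoints/clean_events")
         .toTable("clean_events"))                  # stream into another Delta table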

54
Q

What is the purpose of the 'SHOW TABLES' command in Databricks?

A

To list all tables in the current database.

55
Q

True or False: Databricks can run SQL queries directly against Delta tables.

A

True

56
Q

What is the primary use of the 'CLONE' command in Delta Lake?

A

To create a snapshot of a Delta table.
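
A minimal sketch, assuming a hypothetical source table events; SHALLOW CLONE copies only metadata, while DEEP CLONE also copies the data files:

spark.sql("CREATE TABLE events_dev SHALLOW CLONE events")
spark.sql("CREATE TABLE events_backup DEEP CLONE events")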

57
Q

Fill in the blank: The Delta Lake 'MERGE' command is used for __________ operations.

A

upsert

58
Q

What is the significance of 'data skipping' in Delta Lake?

A

It improves query performance by avoiding unnecessary data reads.

59
Q

What is the main advantage of using Delta Lake for data lakes?

A

It brings reliability, performance, and data management features to data lakes.

60
Q

True or False: Delta Lake can automatically optimize data layout.

A

True

61
Q

How do you roll back a Delta table to an earlier state?

A

Use the 'RESTORE' command, which reverts the table to a previous version or timestamp.
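
A minimal sketch, assuming a hypothetical events table; the version number and timestamp are placeholders:

spark.sql("RESTORE TABLE events TO VERSION AS OF 12")
spark.sql("RESTORE TABLE events TO TIMESTAMP AS OF '2024-01-01'")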

62
Q

Fill in the blank: Delta Lake uses __________ to manage data changes efficiently.

A

versioning

63
Q

What does the term 'schema enforcement' refer to in Delta Lake?

A

Ensuring data written to a table conforms to its defined schema.

64
Q

What is the benefit of using 'Z-Ordering' in Delta Lake?

A

It optimizes data layout for faster query performance.

65
Q

True or False: Delta Lake tables can be queried using both SQL and DataFrame APIs.

A

True

66
Q

What does the 'OPTIMIZE' command do in terms of file management?

A

It compacts small files into larger files to enhance read performance.

67
Q

Fill in the blank: Delta Lake provides __________ for tracking changes in data.

A

a table history (audit trail) recorded in the transaction log

68
Q

What is the primary function of the 'DESCRIBE TABLE' command?

A

To display the schema and properties of a Delta table.

69
Q

What does the Delta Lake 'DROP TABLE' command do?

A

It removes the table from the metastore; for managed tables, the underlying data is deleted as well.

70
Q

True or False: Delta Lake supports both structured and unstructured data.

A

True

71
Q

What is the main advantage of using Delta Lake for data ingestion?

A

It allows for reliable and efficient ingestion of data.

72
Q

Fill in the blank: Delta Lake supports __________ for handling schema changes.

A

schema evolution

73
Q

Which clause partitions a Delta table when it is created?

A

The 'PARTITIONED BY' clause, which partitions the table on specified columns (Delta Lake has no 'SPLIT' command).

74
Q

What is the significance of Delta Lake's support for streaming data?

A

It enables real-time data processing and analytics.

75
Q

True or False: Delta Lake can only be used with Apache Spark.

A

False