Architecture, Tradeoffs & Case Studies Flashcards

(74 cards)

1
Q

What is the primary programming language used in Databricks?

A

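There is no single primary language: Databricks notebooks support Python, SQL, Scala, and R. Apache Spark itself, which Databricks is built on, is written mainly in Scala.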

2
Q

True or False: Databricks is built on Apache Spark.

A

True.

3
Q

Fill in the blank: Databricks provides a unified analytics platform that integrates __________, data engineering, and machine learning.

A

data science.

4
Q

What is a Databricks workspace?

A

A collaborative environment where teams can work on data projects.

5
Q

Which feature allows users to create notebooks in Databricks?

A

Databricks Notebooks.

6
Q

What is Delta Lake?

A

An open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.
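
A toy sketch of the idea behind atomic commits: Delta Lake records each transaction as a numbered JSON file in a `_delta_log` directory, and only one writer can create a given version. The code below imitates that with exclusive file creation on a local temporary directory. The function name `try_commit` and the simplified `add` actions are assumptions for illustration, not the Delta Lake implementation.

```python
import json
import os
import tempfile


def try_commit(log_dir: str, version: int, actions: list) -> bool:
    """Try to write commit `version`; return False if another writer got there first."""
    # Commit files are named by a zero-padded 20-digit version number.
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_EXCL makes creation fail if the file already exists, so only
        # one writer can ever claim a given version number.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return True


# Two writers race for version 0; the second one loses and would have to
# re-read the log and retry with the next version number.
log_dir = tempfile.mkdtemp()  # stand-in for a table's _delta_log directory
first = try_commit(log_dir, 0, [{"add": {"path": "part-0000.parquet"}}])
second = try_commit(log_dir, 0, [{"add": {"path": "part-0001.parquet"}}])
committed = sorted(os.listdir(log_dir))
```

Readers see a table version only once its commit file exists, which is what keeps a half-finished write from becoming visible.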

7
Q

True or False: Databricks can only be deployed on AWS.

A

False.

8
Q

What is the primary benefit of using Delta Lake with Databricks?

A

It provides reliability and performance improvements for big data workloads.

9
Q

What is a cluster in Databricks?

A

A set of computation resources and configurations for running notebooks and jobs.

10
Q

Which of the following is a key feature of Databricks: A) Data Warehousing B) Data Lakes C) Both A and B

A

C) Both A and B.

11
Q

What is the purpose of Databricks Jobs?

A

To run automated tasks and workflows in Databricks.

12
Q

Fill in the blank: The Databricks runtime is a set of __________ that Databricks uses to run Spark applications.

A

core components (Apache Spark plus Databricks optimizations).

13
Q

What is the significance of the Databricks REST API?

A

It allows programmatic access to Databricks functionalities.
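
As a minimal sketch of what programmatic access can look like, the snippet below builds (but does not send) an authenticated request to the clusters list endpoint using only the Python standard library. The workspace URL and token are hypothetical placeholders.

```python
import urllib.request

host = "https://example.cloud.databricks.com"  # hypothetical workspace URL
token = "dapi-EXAMPLE"  # hypothetical personal access token

# Requests are authenticated with a bearer token in the Authorization header.
request = urllib.request.Request(
    url=f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
    method="GET",
)

# urllib.request.urlopen(request) would send it; that call is left out
# so the sketch runs without network access or real credentials.
```

The same pattern applies to other endpoints, such as those for jobs, with the path and HTTP method changed.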

14
Q

True or False: Databricks supports real-time analytics.

A

True.

15
Q

What is MLflow?

A

An open-source platform for managing the machine learning lifecycle.

16
Q

Which language is primarily used for data engineering in Databricks?

A

SQL.

17
Q

What are the two types of clusters in Databricks?

A

All-purpose (interactive) clusters and job clusters.

18
Q

Fill in the blank: Databricks allows collaboration through __________.

A

shared notebooks.

19
Q

What is the primary tradeoff of using Databricks?

A

Cost versus performance depending on usage patterns.

20
Q

What does the term ‘autoscaling’ refer to in the context of Databricks?

A

The ability to automatically adjust the number of nodes in a cluster based on workload.
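
To make the idea concrete, here is an illustrative sketch of a scaling decision: pick enough workers for the pending work, clamped to a configured range. The function `target_workers` and its load measure are assumptions and not Databricks's actual algorithm. The `autoscale` block with `min_workers` and `max_workers` mirrors how a range is declared in a cluster specification.

```python
import math


def target_workers(pending_tasks: int, tasks_per_worker: int,
                   min_workers: int, max_workers: int) -> int:
    """Return a worker count sized to the backlog, kept within [min, max]."""
    needed = math.ceil(pending_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))


# The bounds a cluster is allowed to scale between.
cluster_spec = {"autoscale": {"min_workers": 2, "max_workers": 8}}
bounds = cluster_spec["autoscale"]

idle = target_workers(0, 4, bounds["min_workers"], bounds["max_workers"])
busy = target_workers(20, 4, bounds["min_workers"], bounds["max_workers"])
overloaded = target_workers(100, 4, bounds["min_workers"], bounds["max_workers"])
```

The lower bound keeps a baseline of capacity available, while the upper bound caps cost, which ties autoscaling to the cost versus performance tradeoff.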

21
Q

True or False: Databricks supports integration with various data sources.

A

True.

22
Q

What is the role of the Databricks File System (DBFS)?

A

To provide a distributed file system for storing data in Databricks.

23
Q

Fill in the blank: To improve performance, Databricks uses __________ to optimize query execution.

A

caching.

24
Q

What is the purpose of Databricks SQL?

A

To allow SQL querying on data stored in Databricks.

25
Which of the following is NOT a feature of Databricks: A) Collaborative Workspaces B) Version Control C) Manual Scaling
C) Manual Scaling.
26
What type of analytics does Databricks primarily support?
Batch and streaming analytics.
27
True or False: Databricks supports machine learning model deployment.
True.
28
What is the function of the Databricks Runtime ML?
It is optimized for machine learning workloads.
29
Fill in the blank: Databricks integrates seamlessly with __________ for data visualization.
Tableau.
30
What type of data storage does Databricks primarily use?
Data lakes.
31
What is a common use case for Databricks?
Data engineering and ETL processes.
32
True or False: Databricks can be used for both data preparation and data analysis.
True.
33
What does the 'Unity Catalog' feature in Databricks do?
It provides centralized governance for data and AI assets.
34
Fill in the blank: Databricks can run on multiple cloud platforms including __________, __________, and __________.
AWS, Azure, Google Cloud.
35
What is the advantage of using a managed service like Databricks?
Reduced operational overhead and simplified infrastructure management.
36
What is the primary function of the Databricks workspace?
To enable collaboration among data teams.
37
What type of processing can Databricks handle?
Both batch and real-time processing.
38
True or False: Databricks supports running Apache Spark jobs on Kubernetes.
True.
39
What is the purpose of the Databricks CLI?
To provide command-line interface access to Databricks resources.
40
Fill in the blank: Databricks enables __________ through its collaborative notebooks.
team collaboration.
41
What is a key feature of the Databricks Delta format?
It supports ACID transactions.
42
What does 'data lineage' refer to in Databricks?
The ability to track the flow of data through various transformations.
43
True or False: Databricks does not support Python for data analysis.
False.
44
What is the benefit of using Databricks for machine learning?
It simplifies the model training and deployment process.
45
What is the purpose of the Databricks community edition?
To provide free access to Databricks for learning and experimentation.
46
Fill in the blank: Databricks can be integrated with __________ for enhanced data processing capabilities.
Apache Kafka.
47
What is a common tradeoff when using Databricks?
The balance between cost and performance.
48
What type of analytics does Databricks support for IoT data?
Streaming analytics.
49
True or False: Databricks can automatically optimize query performance.
True.
50
What is the function of the Databricks query optimizer?
To enhance the performance of SQL queries.
51
Fill in the blank: Databricks supports __________ for version control and collaboration.
Git integration.
52
What is the significance of the Databricks data science workspace?
It facilitates collaborative data science projects.
53
What does the term 'notebook' refer to in Databricks?
An interactive document for running code and visualizing results.
54
True or False: Databricks can only process structured data.
False.
55
What is the role of the Databricks data engineer?
To design and build data pipelines.
56
Fill in the blank: __________ is a key component in the architecture of Databricks, providing the underlying processing engine.
Apache Spark.
57
What is the primary advantage of using Databricks over traditional data warehouses?
Scalability and flexibility in handling big data.
58
What is a common use case for Delta Lake in Databricks?
Building reliable data lakes with ACID compliance.
59
True or False: Databricks supports batch processing only.
False.
60
What is the function of the Databricks SQL Analytics service (since renamed Databricks SQL)?
To provide a serverless SQL analytics experience.
61
Fill in the blank: Databricks allows users to create __________ for data visualization and reporting.
dashboards.
62
What is the primary purpose of Databricks Repos?
To manage code and collaborate on data projects.
63
What is the benefit of using a serverless architecture in Databricks?
It simplifies resource management and scaling.
64
True or False: Databricks enables machine learning model versioning.
True.
65
What is the purpose of Delta Engine in Databricks?
To accelerate query performance on Delta Lake.
66
Fill in the blank: The __________ feature in Databricks allows for the orchestration of complex workflows.
Job scheduler.
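
A minimal sketch of how a multi-task workflow might be described: each task names the tasks it depends on, and a schedule says when the job runs. The job name, task keys, and notebook paths are hypothetical. The field names follow the shape of a Jobs API job specification, and the `run_order` helper is only an illustration of how dependencies determine execution order.

```python
from graphlib import TopologicalSorter

job_spec = {
    "name": "nightly-etl",  # hypothetical job name
    # Quartz cron fields: second minute hour day-of-month month day-of-week.
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
    "tasks": [
        {"task_key": "ingest",
         "notebook_task": {"notebook_path": "/Repos/example/ingest"}},
        {"task_key": "transform",
         "depends_on": [{"task_key": "ingest"}],
         "notebook_task": {"notebook_path": "/Repos/example/transform"}},
        {"task_key": "report",
         "depends_on": [{"task_key": "transform"}],
         "notebook_task": {"notebook_path": "/Repos/example/report"}},
    ],
}


def run_order(tasks: list) -> list:
    """Return task keys in an order that respects every depends_on edge."""
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in tasks
    }
    return list(TopologicalSorter(graph).static_order())


order = run_order(job_spec["tasks"])
```

Here the schedule runs the job daily at 02:00 UTC, and `report` cannot start until `transform` has finished.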
67
What is the advantage of using Databricks with Apache Kafka?
It allows real-time processing of streaming data.
68
What does the Databricks platform provide for data governance?
Centralized management of data access and security.
69
True or False: Databricks can be used for data exploration and visualization.
True.
70
What is the primary function of the Databricks workspace UI?
To provide an interactive interface for managing data projects.
71
Fill in the blank: Databricks allows users to perform __________ on large datasets efficiently.
data transformations.
72
What is a key consideration when scaling Databricks clusters?
The cost associated with cluster usage.
73
What is the purpose of the Databricks Academy?
To provide training and resources for Databricks users.
74
True or False: Databricks can only be accessed through a web interface.
False.