Architecture, Tradeoffs & Case Studies Flashcards
(74 cards)
What is the primary programming language used in Databricks?
Apache Spark, which Databricks is built on, is written mainly in Scala; Databricks itself supports Python, SQL, Scala, and R.
True or False: Databricks is built on Apache Spark.
True.
Fill in the blank: Databricks provides a unified analytics platform that integrates __________, data engineering, and machine learning.
data science.
What is a Databricks workspace?
A collaborative environment where teams can work on data projects.
Which feature allows users to create notebooks in Databricks?
Databricks Notebooks.
What is Delta Lake?
An open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.
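Example (a minimal sketch, not Delta Lake's actual implementation): Delta Lake records every change as a numbered JSON commit file in a `_delta_log` directory, and a writer that loses the race for a version number must retry on the next one. The toy code below imitates only that idea with local files. The helper names `commit` and `latest_version` and the Parquet file names are hypothetical. Real Delta Lake adds checkpoints, conflict detection, and storage-specific atomicity guarantees.

```python
import json
import os
import tempfile


def commit(log_dir: str, version: int, actions: list) -> bool:
    """Try to write commit `version`; return False if it already exists."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # Mode "x" fails if the file exists, so only one writer can claim a version.
        with open(path, "x", encoding="utf-8") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
        return True
    except FileExistsError:
        return False


def latest_version(log_dir: str) -> int:
    """Return the highest committed version, or -1 for an empty log."""
    versions = [int(name[:-5]) for name in os.listdir(log_dir) if name.endswith(".json")]
    return max(versions, default=-1)


with tempfile.TemporaryDirectory() as table_dir:
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir)

    # First writer claims version 0.
    first = commit(log_dir, 0, [{"add": {"path": "part-0000.parquet"}}])
    # Second writer also tries version 0 and is rejected.
    conflict = commit(log_dir, 0, [{"add": {"path": "part-0001.parquet"}}])
    # The rejected writer re-reads the log and retries on the next version.
    retried = commit(log_dir, latest_version(log_dir) + 1,
                     [{"add": {"path": "part-0001.parquet"}}])
    final_version = latest_version(log_dir)

print(first, conflict, retried, final_version)
```

Readers see only complete commit files, which is what lets a table appear to change all at once or not at all.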
True or False: Databricks can only be deployed on AWS.
False.
What is the primary benefit of using Delta Lake with Databricks?
It provides reliability and performance improvements for big data workloads.
What is a cluster in Databricks?
A set of computation resources and configurations for running notebooks and jobs.
Which of the following is a key feature of Databricks: A) Data Warehousing B) Data Lakes C) Both A and B
C) Both A and B.
What is the purpose of Databricks Jobs?
To run automated tasks and workflows in Databricks.
Fill in the blank: The Databricks runtime is a set of __________ that Databricks uses to run Spark applications.
core components, including Apache Spark and Databricks-specific optimizations.
What is the significance of the Databricks REST API?
It allows programmatic access to Databricks functionalities.
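Example (a minimal sketch): programmatic access usually means an HTTPS request carrying a bearer token. The code below builds, but does not send, a request to the Clusters API list endpoint. The workspace URL and token are hypothetical placeholders, the helper name `build_list_clusters_request` is made up for illustration, and the `/api/2.0/clusters/list` path should be checked against the API version your workspace documents.

```python
import urllib.request

# Hypothetical placeholders: substitute a real workspace URL and access token.
WORKSPACE_URL = "https://example.cloud.databricks.com"
TOKEN = "dapi-EXAMPLE-TOKEN"


def build_list_clusters_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists clusters."""
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


request = build_list_clusters_request(WORKSPACE_URL, TOKEN)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here because it needs a real workspace and credentials.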
True or False: Databricks supports real-time analytics.
True.
What is MLflow?
An open-source platform for managing the machine learning lifecycle.
Which language is primarily used for data engineering in Databricks?
SQL, though Python (PySpark) and Scala are also widely used.
What are the two types of clusters in Databricks?
All-purpose clusters (formerly called interactive clusters) and job clusters.
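Example (a minimal sketch): in a Jobs API task definition, the difference between the two cluster types shows up as `existing_cluster_id` (reuse a running interactive, or all-purpose, cluster) versus `new_cluster` (create a job cluster for the run and discard it afterwards). The helper `notebook_task` is hypothetical, and the notebook paths, cluster ID, runtime version string, and node type are example values only.

```python
import json


def notebook_task(task_key, notebook_path, *, existing_cluster_id=None, new_cluster=None):
    """Return a task dict that runs a notebook on exactly one kind of compute."""
    if (existing_cluster_id is None) == (new_cluster is None):
        raise ValueError("specify exactly one of existing_cluster_id or new_cluster")
    task = {"task_key": task_key, "notebook_task": {"notebook_path": notebook_path}}
    if existing_cluster_id is not None:
        task["existing_cluster_id"] = existing_cluster_id  # interactive / all-purpose
    else:
        task["new_cluster"] = new_cluster  # job cluster, created per run
    return task


# Example values only; real IDs and versions come from your workspace.
interactive_task = notebook_task(
    "explore",
    "/Users/someone@example.com/explore",
    existing_cluster_id="0000-000000-example",
)
job_task = notebook_task(
    "nightly_etl",
    "/Users/someone@example.com/etl",
    new_cluster={
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
)
print(json.dumps(job_task, indent=2))
```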
Fill in the blank: Databricks allows collaboration through __________.
shared notebooks.
What is the primary tradeoff of using Databricks?
Cost versus performance: larger or always-on clusters finish work sooner but cost more, so the right balance depends on usage patterns.
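Example (a rough sketch with made-up numbers): Databricks usage is billed in Databricks Units (DBUs), while the cloud provider bills the virtual machines separately. The function below combines the two to compare a small cluster that runs longer with a large cluster that finishes sooner. The function name `job_cost`, all rates, and both runtimes are invented for illustration and are not real prices or benchmarks.

```python
def job_cost(workers, hours, dbu_per_node_hour, price_per_dbu, vm_price_per_hour):
    """Estimate the cost of one run: (workers + 1 driver) x hours x hourly rate."""
    nodes = workers + 1  # one driver node plus the workers
    hourly_rate_per_node = dbu_per_node_hour * price_per_dbu + vm_price_per_hour
    return nodes * hours * hourly_rate_per_node


# Made-up rates: 1.0 DBU per node-hour, 0.50 per DBU, 0.50 per VM-hour.
RATES = dict(dbu_per_node_hour=1.0, price_per_dbu=0.5, vm_price_per_hour=0.5)

# Hypothetical runtimes: doubling the workers does not quite halve the runtime.
small = job_cost(workers=4, hours=2.0, **RATES)
large = job_cost(workers=8, hours=1.25, **RATES)

print(f"small cluster: {small:.2f}, large cluster: {large:.2f}")
```

Under these assumed numbers the larger cluster finishes 45 minutes sooner but costs more in total, which is the tradeoff the card describes.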
What does the term ‘autoscaling’ refer to in the context of Databricks?
The ability to automatically adjust the number of nodes in a cluster based on workload.
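Example (a toy illustration, not Databricks' actual scaling algorithm): an autoscaling cluster is configured with a minimum and maximum worker count, and the platform picks a size within that range. The sketch below clamps a demand estimate to those bounds. The function `target_workers` and the `tasks_per_worker` heuristic are assumptions made for illustration. The `autoscale`, `min_workers`, and `max_workers` keys mirror the field names used in cluster specifications.

```python
import math

# Bounds as they would appear in a cluster specification.
cluster_spec = {"autoscale": {"min_workers": 2, "max_workers": 8}}


def target_workers(pending_tasks, tasks_per_worker, min_workers, max_workers):
    """Return a worker count sized to the backlog, clamped to the allowed range."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    needed = math.ceil(pending_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))


bounds = cluster_spec["autoscale"]
for backlog in (0, 20, 100):
    size = target_workers(backlog, 4, bounds["min_workers"], bounds["max_workers"])
    print(f"{backlog} pending tasks -> {size} workers")
```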
True or False: Databricks supports integration with various data sources.
True.
What is the role of the Databricks File System (DBFS)?
To provide a distributed file system abstraction over cloud object storage, so data can be read and written from Databricks clusters using file paths.
Fill in the blank: To improve performance, Databricks uses __________ to optimize query execution.
caching.
What is the purpose of Databricks SQL?
To allow SQL querying on data stored in Databricks.