General Flashcards

1
Q

Syntax for Generated Column

A

GENERATED ALWAYS AS (CAST(orderTime as DATE))
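Example: a sketch of where this clause sits in a full table definition. The table name `orders` and the columns `orderId` and `orderDate` are assumptions for illustration; only `orderTime` and the `GENERATED ALWAYS AS` clause come from the card. The statement is kept as a Python string, so running it still needs a Spark session on Databricks (for example `spark.sql(generated_column_ddl)`).

```python
# Delta table DDL with a generated column, kept as a string.
# Table and column names other than orderTime are hypothetical.
generated_column_ddl = """
CREATE TABLE orders (
  orderId   BIGINT,
  orderTime TIMESTAMP,
  orderDate DATE GENERATED ALWAYS AS (CAST(orderTime AS DATE))
)
""".strip()

print(generated_column_ddl)
```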

2
Q

What is the main difference between AUTO LOADER and COPY INTO?

A

Auto Loader supports both directory listing and file notification modes, while COPY INTO supports only directory listing.

3
Q

Why does AUTO LOADER require schema location?

A

Schema location is used to store schema inferred by AUTO LOADER

Explanation: On later runs Auto Loader starts faster, because it reuses the last known schema from the schema location instead of inferring the schema every time.
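Example: a minimal sketch of the reader options involved, assuming JSON input. The helper name `auto_loader_options` and both paths are hypothetical; the option keys are the Auto Loader `cloudFiles.*` settings. The helper only builds a dictionary, so it runs anywhere; the commented lines show how it would likely be used in a Databricks notebook.

```python
def auto_loader_options(schema_location, file_format="json", use_notifications=False):
    """Build Auto Loader reader options; paths and format are caller-supplied."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader keeps the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # "false" = directory listing, "true" = file notification mode.
        "cloudFiles.useNotifications": str(use_notifications).lower(),
    }

options = auto_loader_options("/tmp/schemas/orders")  # hypothetical path
print(options)

# On Databricks (not executed here; `spark` is the notebook's session):
# df = (spark.readStream.format("cloudFiles")
#       .options(**options)
#       .load("/tmp/landing/orders"))  # hypothetical source directory
```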

4
Q

You are designing a data model that must serve both machine learning on images and batch ETL/ELT workloads. Which features of a data lakehouse help meet the needs of both workloads?

A

A data lakehouse can store unstructured data and supports ACID transactions.

5
Q

Where does Databricks architecture host jobs/pipelines and queries?

A

Control Plane

6
Q

Databricks Repos can implement what CI/CD operations?

A

Pull the latest version of code into production folder

Explanation: Repos does not handle pull requests or code reviews; those are handled by the Git provider (e.g. GitHub).

7
Q

What’s the syntax for creating or overwriting an existing delta table?

A

CREATE OR REPLACE TABLE

Explanation: Tables created in Databricks are stored in Delta format by default.

8
Q

When a managed table is dropped, what happens to the data, metadata, and history?

A

They are also dropped from storage

9
Q

When a notebook is detached and re-attached, what happens to session-scoped temporary views?

A

They are lost

10
Q

When a notebook is detached and re-attached, what happens to global temporary views?

A

They can still be accessed

11
Q

Use colon (:) syntax in queries to access subfields in ____ and use period (.) syntax in queries to access subfields in ____

A

JSON strings

Struct types
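Example: the two access styles side by side, as query strings. The table `events` and the columns `raw` and `profile` are hypothetical; the queries are only held as Python strings here and would need a Databricks SQL context to run.

```python
# Hypothetical table `events` with two columns:
#   raw     STRING holding JSON text such as {"address": {"city": "Oslo"}}
#   profile STRUCT<address: STRUCT<city: STRING>>

# Colon (:) reaches into the JSON string; nested fields then use dots.
json_string_query = "SELECT raw:address.city AS city FROM events"

# Period (.) alone walks the fields of a struct column.
struct_query = "SELECT profile.address.city AS city FROM events"

print(json_string_query)
print(struct_query)
```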

12
Q

Assert syntax

A

assert row_count == 10, "Error message"
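Example: a small runnable sketch of the same pattern. The helper name `check_row_count` and the message text are made up for illustration.

```python
def check_row_count(row_count, expected=10):
    """Raise AssertionError with a readable message when the count is off."""
    assert row_count == expected, f"Expected {expected} rows, got {row_count}"
    return True

check_row_count(10)  # passes silently

try:
    check_row_count(9)
except AssertionError as err:
    message = str(err)
    print(message)
```

Note that `assert` statements are skipped when Python runs with the `-O` flag, so they suit tests and sanity checks rather than production validation.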

13
Q

Python error handling

A

try: except:
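Example: a minimal sketch of the full block. The function `safe_divide` is hypothetical; catching a specific exception type (here `ZeroDivisionError`) is generally preferred over a bare `except:`.

```python
def safe_divide(numerator, denominator):
    """Return the quotient, or None when the division is not possible."""
    try:
        return numerator / denominator
    except ZeroDivisionError as err:
        # Only the expected failure is handled; other errors still surface.
        print(f"Cannot divide {numerator} by {denominator}: {err}")
        return None

print(safe_divide(6, 3))
print(safe_divide(1, 0))
```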

14
Q

What is the PySpark syntax to create a view on top of a Delta stream (a stream on a Delta table)?

A

spark.readStream.table("sales").createOrReplaceTempView("streaming_vw")

15
Q

You are building a data pipeline and notice that the data source has many data quality issues. You need to monitor and enforce data quality as part of the ingestion process. Which tool can be used to address this problem?

A

Delta Live Tables

Explanation: Delta Live Tables expectations can be used to identify and quarantine bad data. The data quality metrics are stored in the event log, which can be analyzed and monitored later.
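Example: to make "identify and quarantine" concrete, here is a plain-Python analogy. It is not the Delta Live Tables API: each rule is a named predicate, rows failing any rule go to a quarantine list, and the per-rule failure counts stand in for the metrics that DLT records in its event log. The function name, rule names, and sample rows are all made up.

```python
def apply_expectations(rows, expectations):
    """Split rows into valid and quarantined, counting failures per rule."""
    valid, quarantined = [], []
    failures = {name: 0 for name in expectations}
    for row in rows:
        failed = [name for name, rule in expectations.items() if not rule(row)]
        for name in failed:
            failures[name] += 1
        (quarantined if failed else valid).append(row)
    return valid, quarantined, failures

# Hypothetical rules and data.
expectations = {
    "valid_id": lambda r: r.get("id") is not None,
    "positive_amount": lambda r: r.get("amount", 0) > 0,
}
rows = [
    {"id": 1, "amount": 20.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -1.0},
]

valid, quarantined, failures = apply_expectations(rows, expectations)
print(len(valid), len(quarantined), failures)
```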

16
Q

What are the different ways you can schedule a job in Databricks workspace?

A

Immediate, CRON, Continuous, when new files arrive

17
Q

Databricks SQL queries are running slowly. All the queries run in parallel on a SQL endpoint (SQL warehouse) with a single cluster. What can you do to improve the performance and response times of the queries?

A

Increase the maximum bound of the SQL endpoint’s scaling range.

Explanation: This question tests whether you know how to scale a SQL endpoint (SQL warehouse). Look for cue words that tell you whether the queries run sequentially or concurrently. If they run sequentially, scale up (increase the cluster size, from 2X-Small up to 4X-Large). If they run concurrently or there are more users, scale out (add more clusters).
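Example: the decision rule from the explanation, written as a tiny helper. The function name and the returned wording are illustrative only; nothing here calls a Databricks API.

```python
def scaling_advice(queries_run_concurrently):
    """Map the workload pattern to the scaling direction described above."""
    if queries_run_concurrently:
        # Many queries or users at once: add clusters.
        return "scale out: raise the maximum number of clusters"
    # Queries run one after another: make each cluster bigger.
    return "scale up: choose a larger cluster size"

print(scaling_advice(True))
print(scaling_advice(False))
```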

18
Q

What does the Auto Stop feature do?

A

It automatically terminates the cluster when you are not using it

19
Q

Unity Catalog simplifies managing multiple workspaces by storing and managing permissions and ACLs at the _______ level

A

Account

20
Q

What section in the UI can be used to manage permissions and grants to tables?

A

Data Explorer

21
Q

What is not a privilege in the Unity catalog?

A

DELETE

Explanation: DELETE and UPDATE privileges do not exist; use MODIFY, which grants both update and delete permissions.

Also: Table ACL privilege types are different from Unity Catalog privilege types, so read the question carefully.

22
Q

Syntax for transferring ownership of a table to a group

A

ALTER TABLE table_name OWNER TO `group`

23
Q

What is the aggregate function that takes an input column and returns an array of its unique values?

A

collect_set()

24
Q

What is the default location where spark stores user databases?

A

dbfs:/user/hive/warehouse

25
Q

When can INSERT OVERWRITE update the schema?

A

When spark.databricks.delta.schema.autoMerge.enabled is set to true

26
Q

Which of these is NOT a valid messaging option for job notifications? (SMS, Email, PagerDuty, Messaging Webhook, SES, SNS)

A

SMS

27
Q

Is the Databricks web application hosted in the control plane or the data plane?

A

Control Plane

28
Q

Are notebooks and jobs hosted in the control plane or the data plane?

A

Control Plane

29
Q

What output modes are supported when writing a stream to a Delta table? (The mode is set with .outputMode(), not with .trigger().)

A

Append and Complete

30
Q

What command can be used to write data into a Delta table while avoiding the writing of duplicate records?

A

MERGE
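Example: a sketch of an insert-only merge, which adds source rows whose key is not yet in the target and skips the rest. The table names `target` and `updates` and the key column `id` are assumptions; the statement is kept as a Python string and would run on Databricks with something like `spark.sql(merge_sql)`.

```python
# Insert-only MERGE: rows already present in the target (matched on id)
# are left untouched, so re-running the load does not create duplicates.
# Table and column names are hypothetical.
merge_sql = """
MERGE INTO target AS t
USING updates AS u
  ON t.id = u.id
WHEN NOT MATCHED THEN
  INSERT *
""".strip()

print(merge_sql)
```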

31
Q

Why does AUTO LOADER require schema location?

A

The schema location is used to store the schema inferred by Auto Loader

32
Q

What tool provides Data Access control, Access Audit, Data Lineage, and Data discovery?

A

Unity Catalog

33
Q

What is the default trigger interval for structured streaming queries?

A

half a second

34
Q

How do you establish a trigger to micro-batch data every 5 seconds?

A

.trigger(processingTime="5 seconds")
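Example: a sketch of where the trigger call sits in a streaming write. The function name, the append output mode, and the idea of passing in a table name and checkpoint path are assumptions for illustration; `streaming_df` is expected to be a streaming DataFrame from a Spark session, which this snippet does not create.

```python
def start_micro_batch_query(streaming_df, target_table, checkpoint_path):
    """Start a streaming write that processes one micro-batch every 5 seconds."""
    return (
        streaming_df.writeStream
        .trigger(processingTime="5 seconds")  # fixed-interval micro-batches
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )

# On Databricks (not executed here; table and path are hypothetical):
# query = start_micro_batch_query(
#     spark.readStream.table("sales"), "sales_bronze", "/tmp/checkpoints/sales"
# )
```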

35
Q

How do you return a GroupedData object?

A

DataFrame.groupBy()

36
Q

How would you describe a database named db_hr?

A

DESCRIBE DATABASE db_hr;

37
Q
A