Concepts Flashcards

(33 cards)

1
Q

What does a JOIN operation do in Databricks SQL?

A

A JOIN operation in Databricks SQL combines rows from a left table with a right table based on specified join criteria.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the default join type in Databricks SQL?

A

The default join type in Databricks SQL is INNER JOIN, which returns only rows with matching values in both tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is the difference between INNER JOIN and LEFT OUTER JOIN?

A

INNER JOIN returns only matching rows from both tables, while LEFT OUTER JOIN returns all rows from the left table and matched rows from the right, filling in NULLs for unmatched rows.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How does RIGHT OUTER JOIN differ from LEFT OUTER JOIN?

A

RIGHT OUTER JOIN returns all rows from the right table and matched rows from the left table, inserting NULLs where there is no match.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is FULL OUTER JOIN in Databricks SQL?

A

FULL OUTER JOIN returns all rows from both tables, inserting NULLs in columns where there is no match from the other table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What does LEFT SEMI JOIN return?

A

LEFT SEMI JOIN returns rows from the left table that have at least one match in the right table, without including columns from the right table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the function of LEFT ANTI JOIN?

A

LEFT ANTI JOIN returns rows from the left table that do not have any match in the right table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What result does a CROSS JOIN produce?

A

A CROSS JOIN returns the Cartesian product of two tables, combining each row of the first table with all rows from the second table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is the purpose of NATURAL JOIN in Databricks SQL?

A

NATURAL JOIN implicitly matches columns with the same names in both tables, combining rows with equal values in those columns.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What happens if you omit the join criteria in a JOIN operation?

A

If the join criteria is omitted, any JOIN type will behave like a CROSS JOIN, creating a Cartesian product.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How does the ON clause work in JOINs?

A

The ON clause defines a boolean condition that specifies how rows from the two tables should be matched.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What does the USING clause do in a JOIN?

A

USING clause matches rows by comparing equality across specified columns that must exist in both tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What effect does USING or NATURAL JOIN have on SELECT *?

A

Using USING or NATURAL JOIN removes duplicate columns from the result when SELECT * is used, showing matched columns only once.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is a table alias in the context of JOINs?

A

A table alias is a temporary name assigned to a table in a JOIN, optionally with column identifiers for clarity or brevity.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How does SELECT * behave differently with USING compared to ON?

A

With USING, SELECT * displays only one instance of each matched column, followed by all other columns from the left and then the right table, excluding the duplicated columns.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How can USING (a, b) be rewritten using ON?

A

USING (a, b) is equivalent to ON first.a = second.a AND first.b = second.b, with SELECT * excluding repeated a and b columns.

17
Q

What does a LATERAL JOIN do in Databricks SQL?

A

A LATERAL JOIN allows a subquery on the right side of the JOIN to reference columns from the left table, effectively making the subquery dependent on each row of the left table.

18
Q

What is the difference between LATERAL INNER JOIN and LATERAL LEFT JOIN?

A

LATERAL INNER JOIN returns only rows with matches in the subquery, while LATERAL LEFT JOIN returns all left rows and NULLs for non-matching subquery results.

19
Q

What practical example illustrates an INNER JOIN between employee and department?

A

An INNER JOIN returns only employees whose deptno exists in department. For example, Paul (deptno 3) joins with Engineering.

20
Q

What practical example illustrates a LEFT JOIN between employee and department?

A

A LEFT JOIN returns all employees, with department info for matches and NULLs otherwise. Chloe, Evan, and Amy have NULL deptnames.

21
Q

What result does a FULL JOIN produce in a real-world example?

A

It returns all employees and departments, matching where possible and inserting NULLs for unmatched entries.

22
Q

How does a SEMI JOIN behave in an employee-department example?

A

It returns only employees with matching department keys, excluding any department columns.

23
Q

How does an ANTI JOIN behave in an employee-department example?

A

It returns only employees without a corresponding department in the department table.

24
Q

What does a CROSS JOIN look like with employee and department?

A

It creates a Cartesian product where each employee appears once for every department, greatly increasing row count.

25
How do flows in Lakeflow Declarative Pipelines handle processing steps?
Flows orchestrate processing steps automatically to ensure correct execution order and maximum parallelism. They retry transient failures starting at the task level, then the flow, and finally the entire pipeline if needed.
26
What does declarative processing in Lakeflow Declarative Pipelines enable?
It allows users to express complex logic with minimal code, reducing manual Spark and streaming code to a few lines. The AUTO CDC API supports handling CDC events, SCD Type 1 & 2, and simplifies streaming processing.
27
How does incremental processing work in Lakeflow Declarative Pipelines?
It enables processing only new or changed data using materialized views. You write transformation logic with batch semantics, and the engine handles incremental updates automatically.
28
What is a flow in Lakeflow Declarative Pipelines?
A flow is a unit of data processing supporting both batch and streaming semantics. It reads from a source, applies logic, and writes to a target. Types include Append, AUTO CDC, and Materialized View.
29
What is a streaming table in Lakeflow Declarative Pipelines?
A streaming table is a managed table that serves as a target for streaming flows like Append and AUTO CDC. Flows can be defined independently or within the table definition.
30
How do materialized views function in Lakeflow Declarative Pipelines?
They are managed batch tables with one or more flows that only process new or changed source data. Flows are always defined implicitly in the view definition.
31
What are sinks in Lakeflow Declarative Pipelines?
Sinks are streaming targets that support destinations like Delta tables, Kafka, and Azure EventHubs. They can receive one or more Append flows.
32
What is a pipeline in Lakeflow Declarative Pipelines?
A pipeline defines and runs multiple flows, tables, views, and sinks. It determines execution order and parallelism based on dependencies automatically.
33
How does Databricks SQL support Lakeflow Declarative Pipelines?
Databricks SQL supports creating and refreshing streaming tables and materialized views using standard SQL. These have the same semantics and processing behavior as their pipeline equivalents.