Advance SQL Concepts Flashcards

(54 cards)

1
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are window functions?

A

Functions that perform calculations across sets of rows related to the current row without collapsing them.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What does ROW_NUMBER() do?

A

Assigns a unique sequential integer to rows within a partition.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What does RANK() do?

A

Gives ranks with gaps for ties in ordered results.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What does DENSE_RANK() do?

A

Gives ranks without gaps for ties.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What clause defines partitions for window functions?

A

PARTITION BY inside the OVER() clause.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What clause defines order for window functions?

A

ORDER BY inside the OVER() clause.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the difference between GROUP BY and window functions?

A

GROUP BY collapses rows into one per group, window functions keep individual rows while computing aggregates.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a common table expression (CTE)?

A

A temporary result set defined using WITH that can be referenced within the main query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is the difference between a CTE and a subquery?

A

CTEs can be referenced multiple times and improve readability; subqueries cannot.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How do you calculate cumulative sums using window functions?

A

SUM(column) OVER(ORDER BY column) computes cumulative totals across rows.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How do you find the previous row’s value in a window?

A

Use LAG(column) OVER(ORDER BY column) to access the previous row value.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How do you find the next row’s value in a window?

A

Use LEAD(column) OVER(ORDER BY column) to access the next row value.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How do you select the top N rows per group?

A

Use ROW_NUMBER() or RANK() with PARTITION BY to rank rows, then filter on the rank.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How do you calculate percentiles or top X% per group?

A

Use NTILE(n) OVER(PARTITION BY group_column ORDER BY value_column) to divide rows into n buckets.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How can you combine multiple CTEs?

A

Chain them using WITH cte1 AS (…), cte2 AS (…) SELECT … FROM cte2 JOIN cte1 …

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

How do you filter aggregated groups using window functions?

A

Window functions don’t filter rows directly; use ROW_NUMBER()/RANK in a subquery or CTE and then filter in the outer query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

How do you calculate running averages per user?

A

Use AVG(column) OVER(PARTITION BY user_id ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

How do you find gaps or sequences in data?

A

Use LAG() or LEAD() to compare current row with previous/next rows, then filter differences.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

How do you handle ties in ranking?

A

RANK() leaves gaps, DENSE_RANK() does not; choose based on requirement.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

How do you calculate cumulative counts per group?

A

COUNT(*) OVER(PARTITION BY group_column ORDER BY sort_column ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

How do you write a recursive CTE?

A

WITH RECURSIVE cte_name AS (anchor_query UNION ALL recursive_query) SELECT * FROM cte_name;

23
Q

How do you calculate a moving average over last N rows?

A

AVG(column) OVER(ORDER BY date ROWS BETWEEN N PRECEDING AND CURRENT ROW);

24
Q

How do you join multiple CTEs?

A

Define them in WITH clause separated by commas, then reference them in the main SELECT with JOINs.

25
How do you implement a top-N-per-category query?
Use ROW_NUMBER() OVER(PARTITION BY category ORDER BY metric DESC) as rank, then filter WHERE rank <= N.
26
How do you calculate cumulative sums per category?
SUM(amount) OVER(PARTITION BY category ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
27
How do you identify first and last events per group?
Use ROW_NUMBER() OVER(PARTITION BY group_column ORDER BY date ASC) = 1 for first, DESC = 1 for last.
28
How can you rank rows across the entire table vs per group?
Omit PARTITION BY for entire table ranking; include PARTITION BY for per-group ranking.
29
How do you calculate difference between current and previous row?
Use column - LAG(column) OVER(ORDER BY column) to get the difference.
30
How do you filter top X% of rows in SQL?
Use CTE/subquery with RANK()/NTILE() and filter rank <= X% of total count.
31
32
Return only rows from A that have at least one match in B — what’s this join called?
Left semi-join.
33
Return only rows from A that have no matches in B — what’s this join called?
Left anti-join.
34
Write the most portable SQL pattern for a left semi-join between users(u) and orders(o) on user_id.
SELECT u.* FROM users u WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.user_id);
35
Write a left semi-join using IN (and name the caveat).
SELECT u.* FROM users u WHERE u.user_id IN (SELECT o.user_id FROM orders o); Caveat: NULLs in the subquery can cause unexpected behavior.
36
Write a left semi-join using LEFT JOIN (and filter) and say when DISTINCT might be needed.
SELECT DISTINCT u.* FROM users u LEFT JOIN orders o ON o.user_id = u.user_id WHERE o.user_id IS NOT NULL; Use DISTINCT if multiple matches in B would duplicate A.
37
Write the preferred SQL pattern for a left anti-join between users(u) and orders(o) on user_id.
SELECT u.* FROM users u WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.user_id);
38
Write a left anti-join using NOT IN and state the risk.
SELECT u.* FROM users u WHERE u.user_id NOT IN (SELECT o.user_id FROM orders o); Risk: if the subquery returns any NULL, the result can be empty.
39
Write a left anti-join using LEFT JOIN (and filter).
SELECT u.* FROM users u LEFT JOIN orders o ON o.user_id = u.user_id WHERE o.user_id IS NULL;
40
Why prefer EXISTS/NOT EXISTS over IN/NOT IN for semi/anti joins?
Better NULL semantics and typically more optimizer-friendly; avoids NOT IN + NULL trap.
41
If table B has multiple matching rows per A, how many rows from A appear in a semi-join result?
One per A (semi-join acts like a filter on A).
42
You need columns from B in the result—should you use a semi-join?
No. Use an inner/left join and aggregate/limit as needed.
43
Which column should you reference in WHERE o.col IS NULL for an anti-join via LEFT JOIN?
A column from B that participates in the join (to correctly detect “no match”).
44
What index helps most for EXISTS/NOT EXISTS on orders(user_id)?
An index on orders(user_id).
45
Engine-specific: How do you write a semi-join in Spark SQL/PySpark?
dfA.join(dfB, on=..., how="left_semi")
46
Engine-specific: How do you write an anti-join in Spark SQL/PySpark?
dfA.join(dfB, on=..., how="left_anti")
47
SQL dialects (Postgres/MySQL/BigQuery/Snowflake): is there a SEMI JOIN keyword?
No; use EXISTS/NOT EXISTS or LEFT JOIN + (NOT) NULL filters.
48
Interview check: “Return users who placed at least one order last month.” Which pattern?
Left semi-join via EXISTS.
49
Interview check: “Find users who have never ordered.” Which pattern?
Left anti-join via NOT EXISTS.
50
Interview check: “Filter products that appear in a top_sellers table.” Which pattern?
Left semi-join via EXISTS (or IN if no NULLs).
51
Give a minimal EXISTS template you can adapt quickly.
SELECT a.* FROM A a WHERE EXISTS (SELECT 1 FROM B b WHERE b.key = a.key AND );
52
Give a minimal NOT EXISTS template you can adapt quickly.
SELECT a.* FROM A a WHERE NOT EXISTS (SELECT 1 FROM B b WHERE b.key = a.key AND );
53
Explain the NOT IN + NULL pitfall in one line.
If the subquery returns any NULL, x NOT IN (… NULL …) evaluates to UNKNOWN, often yielding zero rows.
54
True/False: Planners often rewrite EXISTS into a semi-join internally.
True.