SQL - 05: Window functions Flashcards

1
Q

What is the WHERE clause used for in SQL?

A

The WHERE clause is used to filter records before any groupings are made. It restricts which rows are included in the result set based on specific conditions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the HAVING clause used for in SQL?

A

The HAVING clause is used to filter groups after aggregations have been performed. It is used in conjunction with the GROUP BY clause to filter results based on an aggregate condition.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How do WHERE and HAVING clauses differ in their execution?

A

The WHERE clause filters rows before grouping and aggregation, reducing the number of rows that need to be processed. The HAVING clause filters after grouping and aggregation, applying conditions to the grouped data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Can you use a HAVING clause without a GROUP BY clause?

A

Yes, the HAVING clause can be used without a GROUP BY clause to filter rows based on an aggregate function directly. However, this is less common and typically it is used with GROUP BY.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Give an example of a query using both WHERE and HAVING clauses.

A

SELECT department_id, AVG(salary) AS average_salary
FROM employees
WHERE department_id IS NOT NULL
GROUP BY department_id
HAVING AVG(salary) > 50000;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Why can’t you use column aliases in WHERE clauses but can use them in HAVING clauses?

A

Column aliases are not recognized in WHERE clauses because WHERE is processed before column aliases have been assigned, which happens during the select statement. HAVING is processed after the select phase, so it recognizes column aliases.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What types of conditions can you specify in HAVING and WHERE clauses?

A

WHERE clauses can contain conditions on any of the individual table columns, including checks for values, ranges, and patterns. HAVING clauses can contain conditions on aggregate functions like SUM, AVG, COUNT, etc.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

How does the presence of WHERE and HAVING clauses affect the performance of SQL queries?

A

Using WHERE clauses can significantly improve the performance of SQL queries by reducing the number of rows processed in the early stages of query execution. HAVING clauses are applied after aggregation and can also enhance performance by filtering groups but might be less impactful since the data has already been processed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a practical use case for combining WHERE and HAVING clauses?

A

A practical use case is when you need to analyze specific subsets of data, such as finding departments in certain locations (filtered by WHERE) that have an average salary above a certain threshold (filtered by HAVING).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is a window function in SQL?

A

A window function performs a calculation across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions, window functions do not cause rows to become grouped into a single output row — the rows retain their separate identities.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How is a window function different from regular aggregate functions?

A

Window functions allow you to perform calculations across sets of rows without collapsing these rows into a single output row, whereas aggregate functions collapse rows. Window functions also preserve the detail of each row.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are some common types of window functions?

A

Common types include:
ROW_NUMBER(): Assigns a unique number to each row starting from 1.
RANK(): Assigns a rank to each row within a partition of a result set, with gaps in rank values if there are ties.
DENSE_RANK(): Similar to RANK, but without gaps in the ranking values.
SUM(), AVG(), MIN(), MAX(): Perform sum, average, minimum, and maximum calculations over a set of rows, respectively.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is the syntax for using a window function?

A

The syntax for using a window function generally looks like this:

SELECT column_name, WINDOW_FUNCTION() OVER (PARTITION BY column_name ORDER BY column_name)
FROM table_name;
The PARTITION BY clause divides the result set into partitions to which the window function is applied. ORDER BY dictates the order within each partition.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the purpose of the PARTITION BY clause in window functions?

A

The PARTITION BY clause is used to divide the result set into partitions where the window function is applied independently. This is similar to the role of GROUP BY in aggregate functions but without collapsing the rows into a single output row.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Can window functions be used in UPDATE statements?

A

No, window functions cannot be directly used in UPDATE statements. They are typically used in SELECT statements to calculate values over a window of rows.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Give an example of a window function used to calculate running totals.

A

SELECT order_id, order_date, amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

How do you handle ties in ranking functions like RANK and DENSE_RANK?

A

RANK gives the same rank to tied values but leaves gaps in the ranking for subsequent values. DENSE_RANK also gives the same rank to ties but does not leave gaps, ensuring ranks are consecutive.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is the LEAD and LAG window functions used for?

A

LEAD and LAG are used to access data from the next row (LEAD) or the previous row (LAG) in the result set, without using a self-join. They are useful for comparing current row values with values in other rows.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Can window functions be nested within other window functions?

A

No, window functions cannot be nested within other window functions. Each window function can be used only in the select list or ORDER BY clause of the OVER clause.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is a window frame in SQL?

A

A window frame is a subset of rows in a partition of a result set that a window function operates over. It defines the range of rows used to perform the calculation for each row in the partition.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

How do you specify a window frame in SQL?

A

A window frame is specified using the ROWS or RANGE clause within the OVER clause of a window function. For example:

SUM(column_name) OVER (PARTITION BY column_name ORDER BY column_name
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What is the difference between ROWS and RANGE in defining a window frame?

A

ROWS specifies a window frame in terms of physical rows within the partition, while RANGE defines the frame in terms of logical ranges of values, which might include rows with equal values. RANGE is often dependent on the ORDER BY clause.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What does ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING mean?

A

This defines a window frame that includes the row before the current row, the current row itself, and the row after the current row for each row in the partition.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Can window frames overlap in SQL?

A

Yes, window frames can overlap. Each row in a partition might calculate its result using a window frame that overlaps with frames of adjacent rows, especially when using PRECEDING and FOLLOWING keywords.

25
Q

What does ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW mean?

A

This specifies a window frame that starts from the first row of the partition and extends to the current row, inclusively. This is useful for running totals or cumulative calculations.

26
Q

How do you use RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING?

A

This frame includes all rows in the partition, allowing the window function to compute a value based on all rows in the partition for each row. It’s useful for calculations that depend on the entire set, such as percentages or proportions relative to totals.

27
Q

What is a practical example of using window frames in a business analysis context?

A

A practical example might be calculating a moving average of sales over a specific period, like the last three months, to smooth out fluctuations and better observe trends:

AVG(sales) OVER (PARTITION BY product_id ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)

28
Q

What happens if you do not explicitly define a window frame?

A

If no window frame is explicitly defined, the default window frame, which includes all rows from the start of the partition to the current row (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), is used.

29
Q

Are there any performance considerations when using window frames?

A

Yes, using window frames can impact performance, especially with large datasets and complex window frames that include many rows. Indexing the columns used in the ORDER BY clause of the window function can help improve performance.

30
Q

What is the LAG function in SQL?

A

The LAG function in SQL is a window function that allows you to access data from a previous row in the same result set, relative to the current row. It is useful for comparing current values with values from previous rows.

31
Q

What is the LEAD function in SQL?

A

The LEAD function is a window function that allows you to access data from a following row in the same result set, relative to the current row. It is used to compare current values with those of upcoming rows.

32
Q

How do you specify the number of rows to look back or ahead using LAG and LEAD?

A

Both LAG and LEAD take an optional argument that specifies the number of rows to look back or ahead. If not specified, the default is 1. For example:

LAG(column_name, 1) OVER (ORDER BY column_name) – looks one row back
LEAD(column_name, 1) OVER (ORDER BY column_name) – looks one row ahead

33
Q

What happens if there is no previous or next row when using LAG or LEAD?

A

If LAG or LEAD tries to access a row that does not exist, such as when the current row is the first or last row of the dataset, the result is NULL, unless a default value is specified as the third argument in the function.

34
Q

Can you provide an example of using LAG to calculate the difference in sales between two consecutive months?

A

Yes, to calculate the month-over-month sales difference, you might use:

SELECT month, sales, LAG(sales) OVER (ORDER BY month) AS previous_month_sales,
sales - LAG(sales) OVER (ORDER BY month) AS difference
FROM sales_data;

35
Q

Can you provide an example of using LEAD to forecast next month’s sales?

A

Sure, to forecast next month’s sales, you might use:

SELECT month, sales, LEAD(sales) OVER (ORDER BY month) AS next_month_forecast
FROM sales_data;

36
Q

What types of analyses are LAG and LEAD particularly useful for?

A

LAG and LEAD are particularly useful for time series analysis, financial analysis for computing gains or losses over time, and any analysis that requires comparing sequence data such as customer purchase behavior over consecutive visits.

37
Q

How do LAG and LEAD contribute to data integrity in reporting?

A

By allowing direct access to rows that precede or follow without complex joins, LAG and LEAD help maintain data integrity and simplify SQL queries, reducing the risk of errors in data transformation and analysis.

38
Q

What is a practical application of LAG and LEAD in a business context?

A

In a business context, LAG might be used to compare this quarter’s revenue with last quarter’s, while LEAD could be used to prepare for inventory needs based on expected future sales, as indicated by trends.

39
Q

Are there any performance considerations when using LAG and LEAD?

A

While LAG and LEAD are generally efficient, their performance can be impacted by the size of the dataset and the complexity of the partitioning and ordering clauses. Proper indexing can help improve performance.

40
Q

What is the FIRST_VALUE() function in SQL?

A

FIRST_VALUE() is a window function that returns the first value in an ordered set of values within the partition. It’s often used to compare or retrieve the first entry of a specified column over a defined range or the entire partition.

41
Q

What is the NTH_VALUE() function in SQL?

A

NTH_VALUE() is a window function that returns the value of a specified column at the nth position within an ordered partition of the result set. It allows for more flexibility than FIRST_VALUE() by letting you specify which position to pull from the ordered set.

42
Q

How do you use FIRST_VALUE() in a query with a window frame?

A

To use FIRST_VALUE(), you specify the column from which to pull the first value, followed by the OVER() clause where you can define partitioning and ordering. For example:

SELECT employee_id, department_id, salary,
FIRST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS highest_salary
FROM employees;

43
Q

How do you specify NTH_VALUE() in a SQL query?

A

Use NTH_VALUE() by specifying the column, the nth position, and the OVER() clause with partitioning and ordering. Here’s an example:

SELECT employee_id, department_id, salary,
NTH_VALUE(salary, 2) OVER (PARTITION BY department_id ORDER BY salary DESC) AS second_highest_salary
FROM employees;

44
Q

What are the syntax options for window frames with FIRST_VALUE() and NTH_VALUE()?

A

You can define a window frame using the ROWS or RANGE clauses within the OVER() clause. For example:

NTH_VALUE(salary, 3) OVER (
PARTITION BY department_id
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS third_highest_salary

45
Q

What is the default window frame for FIRST_VALUE() and NTH_VALUE()?

A

The default window frame when using FIRST_VALUE() and NTH_VALUE() is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which considers all rows from the start of the partition to the current row.

46
Q

How do you handle ties or duplicates with FIRST_VALUE()?

A

When ties or duplicates occur, FIRST_VALUE() returns the first occurrence of the value based on the order specified in the OVER() clause. The function does not differentiate between tied values.

47
Q

Can NTH_VALUE() handle cases where n is larger than the number of rows in the partition?

A

If n exceeds the number of rows in the partition, NTH_VALUE() returns NULL unless otherwise specified with a default value as the third parameter in the function.

48
Q

What is a practical use case for combining FIRST_VALUE() and NTH_VALUE()?

A

In financial analysis, you might use FIRST_VALUE() to retrieve the earliest transaction and NTH_VALUE() to check a specific subsequent transaction within each customer’s activity log, providing insights into spending behavior over time.

49
Q

What are performance considerations when using FIRST_VALUE() and NTH_VALUE()?

A

Both functions can be resource-intensive when used over large datasets or complex partitions. Efficient use of indexing on the order and partition columns can help mitigate performance issues.

50
Q

Can you give an example of a simple SQL aggregation without GROUP BY?

A

Certainly! Here’s an example that calculates the total salary of all employees in a company:

SELECT SUM(salary) AS total_salary FROM employees;

51
Q

What happens when you use an aggregate function without a GROUP BY clause in SQL?

A

Using an aggregate function like SUM, AVG, MIN, MAX, or COUNT without a GROUP BY clause aggregates over the entire result set, returning a single value.

52
Q

How can you use aggregate functions without GROUP BY in a query with a WHERE clause?

A

Aggregate functions can be used with a WHERE clause to aggregate only over filtered data. For example, to find the average salary of employees in a specific department:

SELECT AVG(salary) AS average_salary FROM employees WHERE department_id = 5;

53
Q

What is a scalar subquery, and how does it relate to aggregation without GROUP BY?

A

A scalar subquery is a subquery that returns exactly one row and one column. It often uses aggregation without GROUP BY to return a single value that can be used in a larger query’s SELECT, WHERE, or other clauses.

54
Q

Can window functions be used to perform aggregation without GROUP BY?

A

Yes, window functions can perform aggregations across sets of rows related to the current row without collapsing the result set into a single row, effectively allowing aggregate calculations without GROUP BY.

55
Q

Give an example of using a window function to aggregate without GROUP BY.

A

Here’s an example that calculates the total salary across all employees alongside individual salaries:

SELECT salary, SUM(salary) OVER () AS total_salary FROM employees;

56
Q

What is the use of aggregate functions without GROUP BY in conditional expressions?

A

Aggregate functions can be used in conditional expressions to create metrics based on specific criteria. For example, using CASE statements within SUM or AVG to selectively aggregate data based on certain conditions.

57
Q

How does using DISTINCT with aggregate functions affect the results?

A

Using DISTINCT within an aggregate function, like SUM(DISTINCT column_name), calculates the aggregate based only on unique values of the specified column across the entire dataset.

58
Q

What is an example of using aggregate functions in the HAVING clause without GROUP BY?

A

Technically, HAVING should be used with GROUP BY. However, you can manipulate this by using a dummy group, like:

SELECT COUNT() FROM employees HAVING COUNT() > 5;
This example uses a meaningless condition just to illustrate the concept but isn’t practical since HAVING without GROUP BY isn’t standard usage.

59
Q

How can aggregate functions be used for data validation?

A

Aggregate functions without GROUP BY can be used to validate data, for example, checking the integrity of data by comparing counts or sums against expected values or thresholds.