SQL - 04 Flashcards

1
Q

What is a self join in the context of SQL?

A

A self join is a type of SQL join where a table is joined with itself. It’s used for comparing rows within the same table to each other.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

When would you typically use a self join?

A

A self join is used when you need to compare or contrast data within the same table, such as finding pairs of rows that meet certain criteria.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How do you differentiate between the two instances of the same table in a self join?

A

In a self join, you use aliases to differentiate between the two instances of the same table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Can you give an example of a situation where a self join would be useful?

A

A self join is useful in situations like finding employees who work in the same department, or products in the same category with similar characteristics.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a correlated subquery in SQL?

A

A correlated subquery is a subquery that refers to columns from a table in the outer query, making the subquery dependent on the outer query. It is executed repeatedly, once for each row that might be selected by the main query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How does a correlated subquery differ from a non-correlated subquery?

A

Unlike a non-correlated subquery, which runs once and returns a value or set that can be used by the outer query, a correlated subquery must be re-evaluated for each row processed by the outer query because it depends on data from the outer query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

In what SQL clauses can correlated subqueries appear?

A

Correlated subqueries can appear in various SQL clauses, including SELECT, WHERE, and HAVING clauses.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Can you give an example of a correlated subquery used in a WHERE clause?

A

Yes, for instance, to find employees whose salary is greater than the average salary in their respective departments:

SELECT employee_name, salary
FROM employees e1
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e1.department_id = e2.department_id
)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is the performance impact of using correlated subqueries?

A

Correlated subqueries can negatively impact performance because the subquery may be executed many times; once for each row evaluated by the outer query. Optimization or alternative querying methods may be needed for large datasets.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is a common use case for correlated subqueries?

A

A common use case is to determine if a record exists in a database that meets specific conditions relative to each row processed by the outer query, such as checking each department for employees earning above a certain threshold.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How do correlated subqueries interact with the EXISTS operator?

A

Correlated subqueries are often used with the EXISTS operator to check for the existence of rows that meet certain conditions. EXISTS returns true as soon as it finds a matching row, making it efficient for this purpose.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Can correlated subqueries update data in SQL tables?

A

Yes, correlated subqueries can also be used in UPDATE and DELETE statements to specify which rows should be updated or deleted based on conditions related to data in the same table or in other tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Give an example of a correlated subquery used in an UPDATE statement.

A

For example, to increase the salary of employees who earn less than the average salary in their department:
UPDATE employees e1
SET salary = salary * 1.1
WHERE salary < (
SELECT AVG(salary)
FROM employees e2
WHERE e1.department_id = e2.department_id
)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are the challenges of writing correlated subqueries?

A

The main challenges include ensuring the correctness of the reference to the outer query, avoiding excessive performance costs, and managing the complexity that comes with multiple dependencies and repeated evaluations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is the GROUP BY clause used for in SQL?

A

The GROUP BY clause in SQL is used to arrange identical data into groups. This clause is often used with SQL aggregate functions (COUNT, MAX, MIN, SUM, AVG) to perform a calculation on each group of data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How do you use a basic GROUP BY clause in an SQL query?

A

A basic GROUP BY clause groups rows that have the same values in specified columns into summary rows. For example, to count the number of employees in each department:

SELECT department_id, COUNT(employee_id)
FROM employees
GROUP BY department_id;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is an aggregate function in the context of SQL?

A

An aggregate function performs a calculation on a set of values and returns a single value. Common aggregate functions include SUM, AVG (average), MAX (maximum), MIN (minimum), and COUNT.

18
Q

Can you combine GROUP BY with ORDER BY in a single query?

A

Yes, you can use ORDER BY to sort the results of a GROUP BY operation. The ORDER BY clause is placed after the GROUP BY clause:

SELECT department_id, COUNT(employee_id) AS employee_count
FROM employees
GROUP BY department_id
ORDER BY employee_count DESC;

19
Q

How does the HAVING clause work with GROUP BY?

A

The HAVING clause is used to filter records that work on summarized group data, unlike WHERE which works on individual records. This allows you to filter groups based on the result of an aggregate function:

SELECT department_id, COUNT(employee_id) AS employee_count
FROM employees
GROUP BY department_id
HAVING COUNT(employee_id) > 10;

20
Q

What happens if you omit the GROUP BY clause when using an aggregate function?

A

If you omit the GROUP BY clause while using an aggregate function, SQL treats the entire table as a single group, and the aggregate function applies to all rows in the table.

21
Q

Can GROUP BY aggregate multiple columns?

A

Yes, you can group by multiple columns in SQL to get more detailed aggregation. The rows are grouped by the unique combinations of the specified columns:

SELECT department_id, job_id, AVG(salary)
FROM employees
GROUP BY department_id, job_id;

22
Q

What are some common mistakes to avoid when using GROUP BY?

A

Common mistakes include forgetting to include the GROUP BY clause when using aggregates, using non-aggregated columns in the SELECT that are not part of the GROUP BY, and misunderstanding the interaction between GROUP BY and ORDER BY.

23
Q

How can GROUP BY be used to perform complex data transformations?

A

GROUP BY can be used with different aggregate functions to transform data into summaries for reports, perform statistical analyses, or prepare data for machine learning algorithms.

24
Q

What is a practical example of using GROUP BY in business analytics?

A

In business analytics, GROUP BY is often used to aggregate sales data by region, product category, or time period to analyze trends, performance, and opportunities for improvement.

25
Q

What does the COUNT(*) function do in SQL?

A

COUNT(*) counts all rows in a table, regardless of null values in any column. It is useful when you want the total number of records.

26
Q

How does the COUNT(column_name) function treat null values?

A

COUNT(column_name) counts only the non-null values in the specified column. Rows where the column contains a null value are not included in the count.

27
Q

How can you count all rows, including those with null values in a specific column?

A

To include rows with null values in your count for a specific column, use COUNT(*) while filtering the data with a WHERE clause if needed. For example:

SELECT COUNT(*)
FROM employees
WHERE department_id IS NULL;

28
Q

How can you count rows where a certain column is not null?

A

To count only the rows where a specific column is not null, use COUNT(column_name) directly, or use COUNT(*) with a WHERE clause that filters out null values:

SELECT COUNT(*)
FROM employees
WHERE department_id IS NOT NULL;

29
Q

What is the difference between COUNT(1) and COUNT(column_name)?

A

COUNT(1) is a method that effectively behaves like COUNT(*), counting all rows regardless of nulls because 1 as an expression always exists. COUNT(column_name), however, counts only non-null entries in the specified column.

30
Q

Can you use COUNT() with multiple columns?

A

While COUNT() typically works with a single column, if you need to consider multiple columns, you might use it in conjunction with CASE statements or other functions to handle specific counting logic across multiple fields.

31
Q

How does using COUNT(DISTINCT column_name) affect counting with nulls?

A

COUNT(DISTINCT column_name) counts unique non-null values only. Null values are ignored because null is not considered a distinct value in SQL.

32
Q

What SQL function can you use to explicitly check for null values in a count?

A

You can use COUNT with a CASE statement to conditionally count rows based on null status in a column. For example:

SELECT COUNT(CASE WHEN department_id IS NULL THEN 1 ELSE NULL END) AS null_count
FROM employees;

33
Q

How can aggregation functions other than COUNT be used to handle null values?

A

Functions like SUM and AVG automatically ignore null values. To handle nulls explicitly, you may use COALESCE to set a default value when nulls are encountered.

34
Q

What is a practical application of counting null values in a database?

A

Counting null values can help in data quality checks, such as identifying incomplete entries in a dataset, which is crucial for maintaining the accuracy and reliability of data analysis and reporting.

35
Q

How do SUM and COUNT handle null values?

A

SUM ignores null values in its calculation, adding only the non-null values. COUNT(column_name), on the other hand, counts only non-null entries in the specified column, while COUNT(*) counts all rows, including those with null values.

36
Q

Can SUM and COUNT be used together in a query?

A

Yes, SUM and COUNT can be used together in a query to provide both a total and a count of rows in a single SQL statement, often in conjunction with GROUP BY to segment the aggregated results.

SELECT department_id, COUNT(employee_id) AS number_of_employees, SUM(salary) AS total_salaries
FROM employees
GROUP BY department_id;

37
Q

What types of data can SUM and COUNT be applied to?

A

SUM is applied to numeric data fields, as it performs arithmetic operations. COUNT can be applied to any type of data field, as it simply totals the number of items in a set, including distinct values if specified.

38
Q

How might you use SUM and COUNT to analyze data differently?

A

SUM is useful for financial calculations, such as total sales or expenses, while COUNT is essential for quantifying records, like the number of transactions or the number of customers in different regions.

39
Q

What is a practical example of using both SUM and COUNT to get a meaningful business insight?

A

A practical example could be calculating the average sales per transaction in a store:

SELECT SUM(total_sale) / COUNT(transaction_id) AS average_sale
FROM sales_transactions;
This uses SUM to calculate the total sales and COUNT to determine the number of transactions, with the result giving the average sale amount per transaction.

40
Q

What are the limitations of using SUM?

A

The main limitation of using SUM is the potential for overflow with very large numbers. Additionally, SUM is only applicable to numerical data and ignores null values, which might affect the total calculation if not accounted for properly.

41
Q

How do performance considerations differ between SUM and COUNT?

A

Performance considerations for SUM and COUNT generally relate to the size of the dataset and the presence of indexes. COUNT(*) is usually fast on indexed columns but can be slower on large datasets without indexes. SUM may require more processing power due to arithmetic calculations, especially if the dataset is large and the operation is not indexed.

42
Q

How can SUM and COUNT affect the interpretation of data in reports?

A

In reporting, using SUM helps in understanding the total impact of numeric values, such as total revenue. COUNT helps in understanding volume and frequency, such as the number of sales. Misinterpreting these can lead to incorrect conclusions about data trends or business health.