SQL - Basics Flashcards

(49 cards)

1
Q

When talking about relational databases, what is a fact table? what is a dim table?

A

A fact table contains core data for analysis.
Measures and records business events - in this case, job postings
A dim table is associated to the fact table through the use of a primary key. They usually have fewer rows of data but are generally more descriptive.
In this case - the skills table (skills_dim) or the company table (company_dim) are dimension tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

When would you use a right join vs. a left join?

A

The choice between a RIGHT JOIN and a LEFT JOIN depends on which table’s data you want to prioritize in the result. A LEFT JOIN includes all rows from the left table and matching rows from the right, while a RIGHT JOIN includes all rows from the right table and matching rows from the left.

But practically speaking any right join can be re-written as a left join and vice versa.

From reddit: “queries grow over time usually and become more and more complex. . . . someone needs something “quick” and you look at a mess of code. Maybe a RIGHT JOIN will save you. . . . Generally you want to rewrite those queries if you have time . . . “

Or you might use a right join when you remember you biggest table at the end of writing your query and you don’t want to go back and change it. Fringe cases really.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

When talking about relational databases, what is a dimension table?

A

A dimension table describes attributes or dimensions of the data.
They support filtering, labeling and grouping of facts in the fact table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is an ERD?

A

Entity Relationship Diagram - it’s a diagram that shows the fact table, and all the columns in the fact table and it also shows all the dimensional tables, and all the columns in the dimensional tables, and it has lines showing how they relate to each other.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

what do the following keywords do in SQL?
SELECT
*
FROM

A

“select” identifies the column or columns we want to connect to and
“from” identifies the table or tables we want to connect to. and
* means “all” or “everything”. So . . .
SELECT *
FROM ellensBigFacts
would select all the columns from the table ellensBigFacts

Syntax: SELECT always comes before FROM

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What does this do in SQLiteViz?
CNTRL + ENTER

A

It’s a short cut to run the query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

are keywords, table names and column names case sensitive?

A

Keywords are not case sensitive. but using ALL CAPS for keywords will make it easier to read.

table names and column names can be case sensitive depending on what database you’re using. So it’s best to just always treat them like they ARE case sensitive.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

how do you select more than one column in a table?

A

use a comma!

SELECT big_facts_1, big_facts_2

will retrieve both the big_facts_1 AND the big_facts_2 column

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

how do you use the dot notation in SQL? And when is it most commonly used?

A

You can use dot notation to specify table and then column.
For example:

SELECT
bigfax_fact.company_info

this will select the company_info column from the bigfax_fact table

Used mainly when you are selecting columns from different tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What does the following keyword do?LIMIT

A

LIMIT limits the amount of rows that return to you
Used when you are working with large datasets

Syntax: It is the VERY LAST statement you write – so after the SELECT and FROM statements – after everything

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What does the following keyword do?DISTINCT
Where is it used? What immediately preceeds it?

A

It retrieves unique rows in a column
- usually isn’t used for the whole query b/c it uses a lot of processing power. Usually used inside of another function.

Example:
SELECT DISTINCT job_title_column

Syntax: it goes right after SELECT

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

When do you use the semi-colon?

A

Use it after each SQL statement

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

WHERE

A

sets a condition for the query and filters out certain data. Only retrieves data that matches what you put in the WHERE condition.

Example:
SELECT *
FROM job_postings_fact
WHERE job_title_short = ‘Data Analyst’

This will select rows where the job title is “Data Analyst” and it will return all the columns in that row.

You can use =, <, > etc. You could filter all the salaries that were above a certain threshold for example.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are the COMPARISON and LOGICAL operators that you can use with the WHERE keyword

A

Comparison operators:
= <> < > <= >=
Logical Operators:
AND OR BETWEEN IN NOT

NOT and <> mean the same thing - they essentially mean ‘does not equal’

WHERE job_via <> ‘via Ai-Jobs.net’ – this will filter out all the jobs from Ai-jobs.net

WHERE NOT job_via = ‘via Ai-Jobs.net’ – this will also filter out all the jobs from Ai-jobs.net

WHERE NOT job_via <> ‘via Ai-Jobs.net’ – this will actually return ONLY jobs from Ai-jobs.net because it’s a double negative. But you could say this in a better, less confusing way by just saying:
WHERE job_via = ‘via Ai-jobs.net’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How do you make a single line comment in SQL?

A

– this is a comment
anything to the right of the two dashes will be a comment

you can also use this to comment out
certain parts of your query when you are troubleshooting

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How do you make a multi-line comment in SQL?

A

/* Use the forward slash and the start to make a multi-line comment in SQL*/

/* commentzz
zzzzzz */

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What does this keyword do in SQL?ORDER BY

A

It sorts the rows that are returned.

The default sort is ascending which means smallest to largest, or A-Z.
(BTW, ‘Null’ is considered the smallest value.)
If you want it in descending order you would write DESC after the column name.
Example:
ORDER BY salary_year_avg DESC

SYNTAX: It comes almost last - the only thing that would come after this is the LIMIT keyword.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Syntax: what order do you need to write your query in????

A

SFWGHOL (San Fran wig hole)

SELECT column1, column2, . . .
FROM table_name
WHERE condition
GROUP BY column
HAVING condition
ORDER BY column1 (ASC, DESC)
LIMIT number

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

When and why would you use these operators?
AND
OR

A

-use AND when you want to meet more than one ccondition
-use OR when you want to meet ANY of the conditions

They are both used in the WHERE clause

For example:
WHERE job_salary > 50000 AND job_title = ‘Data Engineer’

Use OR with the WHERE keyword when you want to see data that meets ANY of the conditions.
WHERE job_salary >100000 OR job_title =’Data Engineer’
it will return all the jobs with the job title of Data Engineer, and it will return all the jobs where the salary is >100000.

You MUST have a COMPLETE statement on both sides of an AND or OR operator!
column_Name condition AND column_name condition.
You need the column name on BOTH sides of the operator - not just the condition.

This is tricky because it probably wont throw and error if you forget the column name, but your data will be WRONG!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

When and why would you use this operator?
BETWEEN

A

Use BETWEEN with the WHERE keyword when you want to return values that are with in a certain range.
For example:
WHERE job_salary BETWEEN 60000 AND 90000
this is the same as saying
WHERE job_salary >= 60000 AND job_salary <= 90000

can also use this for text data and dates

21
Q

When and why would you use this operator?
IN

A

Used most commonly in text data. For example, if I were looking for a job in Boston or a remote job I could run a query like this

WHERE job_location IN (‘Boston, MA’, ‘Anywhere’)

You can keep adding values - just separate them with a comma

It’s looking for that value (whatever is between the quotes) IN the job_location column

22
Q

What is a wildcard? Where is it used?

A

It’s used to substitute one or more characters within a string.

Wildcards are used in the WHERE clause

the % and _ wildcards are used with the LIKE operator

23
Q

How is the % wildcard used? Where is it used?

A

the % wildcard is used to substitute zero, one or more characters within a string.

% are used in the WHERE clause

% are used with the LIKE operator

% represents 0, 1 or more characters within a string.

Example:
WHERE job_title LIKE ‘%Analyst%’
– this will cover anything that has the word analyst in it. If it has characters or spaces before and after, it will still return it.
Examples:
1. %Business%Analyst% it would return data that had both ‘business’ and ‘Analyst’ even though there might be words, spaces or characters between them. It would not return data that had just ‘Analyst’
2. Similarly %Business_Analyst% would only return data where ‘business’ and ‘analyst’ were both present and there was only 1 space between them.

24
Q

How is the _ wildcard used? Where is it used?

A

_ represents a single character– it matches any single character (letter, number, or symbol) in a string.

Used with LIKE in the WHERE clause
_ is particularly useful when you need to match a pattern where a specific character at a particular position is unknown or variable.

Example:
You want to find employees whose names start with “S” and are followed by any single character.

WHERE employee_name LIKE ‘S_’

This query will return rows where the employee_name matches the pattern “S” followed by any single character, such as “Sam” or “Sarah

25
What is an alias? How is it used? AS
You can use AS to rename a column in a table temporarily. It can be useful to make a column name more readable. Use AS in the SELECT clause Example: SELECT salary_year_avg AS salary Can be used for columns or tables Renaming a table is very common when you're doing joins. *You can leave out the AS! and just leave a space and it will still work, but it's harder to read. Mostly just be aware of it because you might see it in other people's queries. You can use AS to return the entire column several times! You could say SELECT hours_rate AS rate_original, hour_rate - 5 AS dropped_rate
26
whats important to remember with these two operators? - (subtract) and / (divide) ?
- subtracts the SECOND number from the first 9-8=1 / divides the FIRST number by the SECOND 10/2 = 5
27
As and operator, what does this % do and what is it called?
It's called Modulus It returns the remainder of a division operation.
28
Where can you perform arithmetic operations? Like what clauses can you perform them in?
SELECT clause WHERE clause ORDER BY clause GROUP BY clause HAVING clause
29
How would you do arithmetic operations in the SELECT clause?
SELECT hours_rate AS rate_original, hours_rate - 5 AS rate_dropped, hours_rate +5 AS rate_hike arithmatic is done on every value in the column and it's returned AS a new column.
30
How would you do arithmetic operations in the WHERE clause?
SELECT * FROM orders WHERE (quantity + unit_price) > 100 quantity and unit_price would be columns in the table
31
How would you do arithmetic operations in the ORDER BY clause?
SELECT (current_date - birth_date)/ 365 AS age_years FROM people_table ORDER BY (current_date - birth_day)/365 If you're going to do arithmetic in the ORDER BY clause, you (probably) need to return that arithmetic operation AS a new column in the select statement.
32
How would you do arithmetic operations in the GROUP BY and HAVING clause?
SELECT department, SUM(salary) FROM employee_table GROUP BY department HAVING SUM(salary) > 100000
33
Modulus operator
SELECT hours_spent % 8 AS extra_hours If an employee worked 10 hours, 10 % 8 would give 2. Which means the employee worked 2 hours beyond the standard work day.
34
what are the 5 aggregation functions?
SUM() - adds all the values in a column COUNT() - counts all the rows that meet a certain condition AVG() - calculates the average of a column with numeric data MIN() - finds the minimum value in a set a values MAX() - finds the largest value in a set a values
35
the main difference between arithmetic functions vs. aggregation function
arithmetic function allow you to perform arithmetic across columns (perform arithmetic on several columns at once) while aggregation function allow you to perform arithmetic within a column (or down a column) for example - take the average of this column or find the smallest value in this column etc.
36
What clauses are aggregation functions usually used with?
They can be used with SELECT HAVING GROUP BY
37
How do you use the SUM() function
SUM (column_name) will return the sum of every value in the column
38
How do you use the COUNT() function
COUNT(*) counts the rows of every column in the database You can use COUNT in conjunction with other keywords and operators. COUNT (DISTINCT column_name) -- this counts the unique rows or (ibid) counts the rows that match the specified criteria. You can count how many distinct companies there are in the database or you can count how many distinct work locations are in the database
39
Can you return all the values in one column and then return an aggregate value for another column
Seems like you can't mix Aggregate functions with other things. Like you can't have one column return all the values and the count column only return the count value
40
How do you use the AVG(), MIN() and MAX() functions
They all work in a similar way. You just put a column name in the parenthesis. AVG(column_name) MIN(column_name) etc.
41
How does the GROUP BY statement work? And what is a KEY POINT to remember?
GROUP BY automatically finds the unique values in that column and groups things by those unique values. or in other words: the GROUP BY clause is used to group rows with the same values into summary rows. Think of it like categorizing data based on a common attribute. -- Once the data is grouped, you can use aggregate functions (like SUM(), COUNT(), AVG(), MAX(), MIN()) to perform calculations on the data within each group. Result: The GROUP BY clause returns a single row for each group, along with the results of any aggregate functions applied to that group. EXAMPLE: -- find the total amount of orders for each customer: SELECT CustomerID, SUM(Order_Amount) AS TotalOrderAmount FROM Orders_table GROUP BY CustomerID; This query would group the orders by CustomerID and then calculate the sum of the Amount for each customer, displaying the CustomerID and their TotalOrderAmount. Key Points: Any column in the SELECT statement that is not included in an aggregate function MUSt be included in the GROUP BY clause.
42
How does the HAVING clause work in sql
--the HAVING clause is used to filter grouped data based on the results of aggregate functions. 1. use the GROUP BY clause to group rows with identical values. This allows you to apply aggregate functions (like SUM(), COUNT(), AVG(), MIN(), MAX()) to each group. 2. After the GROUP BY operation has created the groups and the aggregate functions have been calculated, the HAVING clause filters these groups based on a specified condition that typically involves the results of the aggregate functions. --Think of it as a WHERE clause specifically designed for groups created by the GROUP BY clause. Key Differences from the WHERE clause: 1. WHERE filters individual rows before grouping and aggregation, while HAVING filters groups after grouping and aggregation. 2. Aggregate Functions: You cannot use aggregate functions directly in the WHERE clause. The HAVING clause is specifically designed for this purpose. Example: "orders" table with columns "product_id" and "quantity." You want to find products with a total quantity ordered exceeding 100: SELECT product_id, SUM(quantity) as total_quantity FROM orders GROUP BY product_id HAVING SUM(quantity) > 100; It's weird because it seems like you're doing the SUM first, but really it's doing the SUM after it groups that data by product_id.
43
What are the 4 most common types of joins?
Left Join Right Join inner join full outer join
44
When would you use a left join? What is an example of a Left Join?
A LEFT JOIN (aka LEFT OUTER JOIN) is used to combine rows from two or more tables based on a related column between them. Syntax: SELECT column_list FROM table1 (this is the Left table) LEFT JOIN table2 (this is the Right table) ON table1.column = table2.column; (this is the Joining condition) It returns all rows from the left table (the table specified first in the FROM clause). It returns matching rows from the right table (the table specified in the LEFT JOIN clause). Key Points -- If there's no matching row in the right table for a row in the left table, the result will still include the row from the left table, but the columns from the right table will have NULL values. -- the LEFT JOIN keyword indicates that you want to perform a left join. -- The ON clause specifies the condition for joining the tables, typically based on a common column. -- The order of the tables in the FROM and LEFT JOIN clauses is crucial, as it determines which table is considered the "left" table.
45
When would you use a right join? what is an example of a Right Join?
A RIGHT JOIN (aka RIGHT OUTER JOIN) returns all rows from the right table and the matching records from the left table. -- For each row in the right table, it looks for matching rows in the left table based on the join condition you provide. -- If a match is found, the data from both tables is combined into a single row in the result set. -- If no match is found in the left table, the row will contain NULL values in the result. In essence, a RIGHT JOIN guarantees that all records from the right table are included in the result, even if they don't have a corresponding match in the left table. Syntax: SELECT column_list FROM table1 (this is the left table) RIGHT JOIN table2 (-- the right table) ON table1.column = table2.column; if the right table has less rows than the left table, you will still get as many rows returned as are in the left table.
46
When would you use an inner join? What is an example of an inner join?
INNER JOIN is used to combine rows from two or more tables based on a related column that exists in both tables. The INNER JOIN retrieves only the rows where the join condition is true (i.e., where there is a match in both tables). Think of it as finding the intersection between two sets of data. Example: If you have a table of customers and a table of orders, an INNER JOIN on CustomerID would return only the customers who have placed orders. Customers without orders would be excluded from the result set. Syntax: SELECT column_names FROM table1 INNER JOIN table2 ON table1.column_name = table2.column_name; -- Combines matching rows: It only returns rows where the specified join condition is met in both tables. -- Excludes non-matching rows: Rows that do not have a corresponding match in the other table are not included in the result. -- Default join type: In many database systems, JOIN is equivalent to INNER JOIN.
47
When would you use a Full Outer Join? What is an example of a full out Join?
FULL OUTER JOIN (also known as a FULL JOIN) combines rows from two tables. It returns all rows from both the left table and the right table, regardless of whether there is a match based on the join condition. -- it Includes all rows: It returns all records from both tables, even if there's no corresponding row in the other table. If a row from one table does not have a match in the other table the unmatched table will have NULL values in the result. Syntax: SELECT column_list FROM table1 FULL OUTER JOIN table2 ON table1.column = table2.column; When to use a FULL OUTER JOIN: A FULL OUTER JOIN is suitable for combining data while retaining all records from both tables, which is helpful when data is incomplete or missing.
48
How is "Order of Execution" different from the order in which you write the query?
"Order of Execution" is the order in which the database performs the query. It uses a specific order to optimize the query and make it run quickly and logically.
49
What is the "Order of Execution"?
1. From/Join - Specifies the tables to retrieve data from and how to join them 2. Where - Filters rows based on a condition 3. Group By - Group rows that share a property so that aggregate functions can be applied to the group 4. Having - filters GROUPS based on aggregate conditions (that you just did in GROUP BY) 5. Select - select specific columns to display in the final result 6. Distinct - removes duplicate rows from the result set 7. Order By - sorts the result set based on specific values 8. Limit/Offset - limits the # of rows returned often used for pagination.