SQL - Basics Flashcards
(49 cards)
When talking about relational databases, what is a fact table? what is a dim table?
A fact table contains core data for analysis.
Measures and records business events - in this case, job postings
A dim table is associated to the fact table through the use of a primary key. They usually have fewer rows of data but are generally more descriptive.
In this case - the skills table (skills_dim) or the company table (company_dim) are dimension tables.
When would you use a right join vs. a left join?
The choice between a RIGHT JOIN and a LEFT JOIN depends on which table’s data you want to prioritize in the result. A LEFT JOIN includes all rows from the left table and matching rows from the right, while a RIGHT JOIN includes all rows from the right table and matching rows from the left.
But practically speaking any right join can be re-written as a left join and vice versa.
From reddit: “queries grow over time usually and become more and more complex. . . . someone needs something “quick” and you look at a mess of code. Maybe a RIGHT JOIN will save you. . . . Generally you want to rewrite those queries if you have time . . . “
Or you might use a right join when you remember you biggest table at the end of writing your query and you don’t want to go back and change it. Fringe cases really.
When talking about relational databases, what is a dimension table?
A dimension table describes attributes or dimensions of the data.
They support filtering, labeling and grouping of facts in the fact table.
What is an ERD?
Entity Relationship Diagram - it’s a diagram that shows the fact table, and all the columns in the fact table and it also shows all the dimensional tables, and all the columns in the dimensional tables, and it has lines showing how they relate to each other.
what do the following keywords do in SQL?
SELECT
*
FROM
“select” identifies the column or columns we want to connect to and
“from” identifies the table or tables we want to connect to. and
* means “all” or “everything”. So . . .
SELECT *
FROM ellensBigFacts
would select all the columns from the table ellensBigFacts
Syntax: SELECT always comes before FROM
What does this do in SQLiteViz?
CNTRL + ENTER
It’s a short cut to run the query.
are keywords, table names and column names case sensitive?
Keywords are not case sensitive. but using ALL CAPS for keywords will make it easier to read.
table names and column names can be case sensitive depending on what database you’re using. So it’s best to just always treat them like they ARE case sensitive.
how do you select more than one column in a table?
use a comma!
SELECT big_facts_1, big_facts_2
will retrieve both the big_facts_1 AND the big_facts_2 column
how do you use the dot notation in SQL? And when is it most commonly used?
You can use dot notation to specify table and then column.
For example:
SELECT
bigfax_fact.company_info
this will select the company_info column from the bigfax_fact table
Used mainly when you are selecting columns from different tables.
What does the following keyword do?LIMIT
LIMIT limits the amount of rows that return to you
Used when you are working with large datasets
Syntax: It is the VERY LAST statement you write – so after the SELECT and FROM statements – after everything
What does the following keyword do?DISTINCT
Where is it used? What immediately preceeds it?
It retrieves unique rows in a column
- usually isn’t used for the whole query b/c it uses a lot of processing power. Usually used inside of another function.
Example:
SELECT DISTINCT job_title_column
Syntax: it goes right after SELECT
When do you use the semi-colon?
Use it after each SQL statement
WHERE
sets a condition for the query and filters out certain data. Only retrieves data that matches what you put in the WHERE condition.
Example:
SELECT *
FROM job_postings_fact
WHERE job_title_short = ‘Data Analyst’
This will select rows where the job title is “Data Analyst” and it will return all the columns in that row.
You can use =, <, > etc. You could filter all the salaries that were above a certain threshold for example.
What are the COMPARISON and LOGICAL operators that you can use with the WHERE keyword
Comparison operators:
= <> < > <= >=
Logical Operators:
AND OR BETWEEN IN NOT
NOT and <> mean the same thing - they essentially mean ‘does not equal’
WHERE job_via <> ‘via Ai-Jobs.net’ – this will filter out all the jobs from Ai-jobs.net
WHERE NOT job_via = ‘via Ai-Jobs.net’ – this will also filter out all the jobs from Ai-jobs.net
WHERE NOT job_via <> ‘via Ai-Jobs.net’ – this will actually return ONLY jobs from Ai-jobs.net because it’s a double negative. But you could say this in a better, less confusing way by just saying:
WHERE job_via = ‘via Ai-jobs.net’
How do you make a single line comment in SQL?
– this is a comment
anything to the right of the two dashes will be a comment
you can also use this to comment out
certain parts of your query when you are troubleshooting
How do you make a multi-line comment in SQL?
/* Use the forward slash and the start to make a multi-line comment in SQL*/
/* commentzz
zzzzzz */
What does this keyword do in SQL?ORDER BY
It sorts the rows that are returned.
The default sort is ascending which means smallest to largest, or A-Z.
(BTW, ‘Null’ is considered the smallest value.)
If you want it in descending order you would write DESC after the column name.
Example:
ORDER BY salary_year_avg DESC
SYNTAX: It comes almost last - the only thing that would come after this is the LIMIT keyword.
Syntax: what order do you need to write your query in????
SFWGHOL (San Fran wig hole)
SELECT column1, column2, . . .
FROM table_name
WHERE condition
GROUP BY column
HAVING condition
ORDER BY column1 (ASC, DESC)
LIMIT number
When and why would you use these operators?
AND
OR
-use AND when you want to meet more than one ccondition
-use OR when you want to meet ANY of the conditions
They are both used in the WHERE clause
For example:
WHERE job_salary > 50000 AND job_title = ‘Data Engineer’
Use OR with the WHERE keyword when you want to see data that meets ANY of the conditions.
WHERE job_salary >100000 OR job_title =’Data Engineer’
it will return all the jobs with the job title of Data Engineer, and it will return all the jobs where the salary is >100000.
You MUST have a COMPLETE statement on both sides of an AND or OR operator!
column_Name condition AND column_name condition.
You need the column name on BOTH sides of the operator - not just the condition.
This is tricky because it probably wont throw and error if you forget the column name, but your data will be WRONG!
When and why would you use this operator?
BETWEEN
Use BETWEEN with the WHERE keyword when you want to return values that are with in a certain range.
For example:
WHERE job_salary BETWEEN 60000 AND 90000
this is the same as saying
WHERE job_salary >= 60000 AND job_salary <= 90000
can also use this for text data and dates
When and why would you use this operator?
IN
Used most commonly in text data. For example, if I were looking for a job in Boston or a remote job I could run a query like this
WHERE job_location IN (‘Boston, MA’, ‘Anywhere’)
You can keep adding values - just separate them with a comma
It’s looking for that value (whatever is between the quotes) IN the job_location column
What is a wildcard? Where is it used?
It’s used to substitute one or more characters within a string.
Wildcards are used in the WHERE clause
the % and _ wildcards are used with the LIKE operator
How is the % wildcard used? Where is it used?
the % wildcard is used to substitute zero, one or more characters within a string.
% are used in the WHERE clause
% are used with the LIKE operator
% represents 0, 1 or more characters within a string.
Example:
WHERE job_title LIKE ‘%Analyst%’
– this will cover anything that has the word analyst in it. If it has characters or spaces before and after, it will still return it.
Examples:
1. %Business%Analyst% it would return data that had both ‘business’ and ‘Analyst’ even though there might be words, spaces or characters between them. It would not return data that had just ‘Analyst’
2. Similarly %Business_Analyst% would only return data where ‘business’ and ‘analyst’ were both present and there was only 1 space between them.
How is the _ wildcard used? Where is it used?
_ represents a single character– it matches any single character (letter, number, or symbol) in a string.
Used with LIKE in the WHERE clause
_ is particularly useful when you need to match a pattern where a specific character at a particular position is unknown or variable.
Example:
You want to find employees whose names start with “S” and are followed by any single character.
WHERE employee_name LIKE ‘S_’
This query will return rows where the employee_name matches the pattern “S” followed by any single character, such as “Sam” or “Sarah