Lesson 3 Flashcards
Primary vs Foreign Keys; DISTINCT (10 cards)
Duplicate data can happen due to a variety of reasons such as ____ or ____, but many times duplicate values are ____.
Duplicate data can happen due to a variety of reasons such as logging errors or data transformations, but many times duplicate values are there for a reason.
Very often, we need to understand the unique values in a dataset, without any dupes or repeats.
We can do this by using ____ in SQL.
It’s really useful in SQL, esp when ___
DISTINCT
- Goes before a column name
- Will only return the distinct values in that column – no duplicates
____________________________________________________________
- Really useful in SQL, especially when it comes to aggregating and counting (will get to later in the course)
DISTINCT
can be ____.
So:
- Only use it ____
- Use it with caution on ____
DISTINCT
can be taxing to run.
So:
- Only use it when necessary
- Use it with caution on large datasets
What is a primary key?
And what should you do in terms of them?
A primary key of a dataset is:
- a column of unique, non-null values that uniquely identify every single row in the table.
____________________________________________________________
- You (should) specify a primary key when you create a table in a database.
- By definition, the values in the primary key column cannot have ANY duplicates.
- It’s an actual restriction put on that column when the table is created in the database (if it’s set up correctly)
Primary keys are very useful when it comes to:
- Pulling data
- Filtering
How can you pull all the unique order_id
s from the orders
table?
(Write code)
SELECT DISTINCT order_id FROM orders
How can you pull all the unique customer_id
s from the orders
table?
(Write code)
SELECT DISTINCT customer_id FROM orders
What is a foreign key?
A foreign key is:
- a column in ONE table that maps to a primary key in ANOTHER table
- CAN have duplicate values in the table where it’s a foreign key
- But can NOT have duplicate values in the other table, where it’s a primary key
It’s a way to link the tables together
- (Will learn more with joins at the end of the course)
____________________________________________________________
A foreign key is a column that CAN have multiple duplicate values, since each value doesn’t have to be unique.
Since each customer can have multiple orders, this is called a ____ between customers and orders.
Since each customer can have multiple orders, this is called a one to many (1:M) relationship between customers and orders.
Relationships between tables and entities are really important when ____ different datasets
- Which we’ll do at the end of the course with ____
Relationships between tables and entities are really important when combining and transforming different datasets
- Which we’ll do at the end of the course with joins