Section 3 - Data Exploration with SQL Flashcards
Learn fundamentals for SQL (21 cards)
What is Normalization?
In simple terms, data normalization is the practice of organizing data entries to ensure they appear similar across all fields and records, making information easier to find, group and analyze
Primary Keys vs. Foreign Keys
Primary keys are unique to a table and distinguish each data point. Foreign keys give reference to another related table. Data can be propagated from one table to another using foreign keys.
3 levels of Normalization. NF - Normalization Forms
1NF - Removes repetative entities.
2NF - Creates a separate table for sets of values dependent on different keys in composite primary keys.
3NF - Eliminates fields that do not depend on the primary key.
Calculate Death Rate for a Specific Country
Task: Write a SQL query to calculate the death rate (as a percentage) for the United States, defined as (total_deaths / total_cases) * 100. Select Location, total_deaths, total_cases and the calculated death rate for the latest date in the dataset. Round the death rate to 2 decimal places.
SELECT Location, date, total_deaths, total_cases, ROUND(total_deaths*100/total_cases,2) AS DeathRate
FROM CovidDeaths
WHERE Location =’Afghanistan’ AND date = (SELECT MAX(date) FROM CovidDeaths);
List Countries with High Infection Rates above 5 %
Task: Write a SQL query to find countries with an infection rate (TotalCases / Population * 100) greater than 5% as of the latest date. Select Location, total_cases, population, and the infection rate, sorted in descending order of infection rate. Round the infection rate to 2 decimal places.
SELECT Location, date, total_cases, population, ROUND(total_cases100/population,2) AS infection_rate
FROM CovidDeaths
WHERE date = (SELECT MAX(date) FROM CovidDeaths) AND (total_cases100/population) > 5
ORDER BY infection_rate DESC
Compare New Cases Across Two Dates
Task: Write a SQL query to select Country, DateReported, and NewCases for all countries on two specific dates, e.g., ‘2021-01-01’ and ‘2021-06-01’. Arrange the results by Country in ascending order and DateReported in ascending order.
SELECT Country, DateReported, NewCases
FROM CovidData
WHERE DateReported IN (‘2021-01-01’, ‘2021-06-01’)
ORDER BY Country ASC, DateReported ASC;
What is the meaning of VARCHR(n)
Variable character. String of characters. N is no.of letters in string.
VARCHR(255) is not related to 255 bit storage. It aligns with byte-based storage limits (1 byte - 255)
What is 1NF in database normalization
All attributes have atomic (indivisible) values.
Each row is unique (typically with a primary key).
No repeating groups or multi-valued attributes.
Example: Split a column with “Math, Physics” into separate rows.
What is 2NF in database normalization?
All non-key attributes are fully dependent on the entire primary key (no partial dependencies).
Applies to composite keys.
Example: Move StudentName (dependent only on StudentID) to a separate Students table.
What is 3NF in database normalization?
No non-key attribute depends on another non-key attribute (no transitive dependencies).
Example: Move InstructorOffice (dependent on Instructor) to a separate Instructors table.
What is the order of SQL statements?
SELECT -> FROM -> WHERE -> GROUP BY -> ORDER BY
What is the code for system stored procedure on getting information about a database?
EXEC sp-help CovidDeaths
Why will the below code give an error?
SELECT SUM(new_cases) as total_cases, SUM(cast(new_deaths as int)) as total_deaths, (total_deaths/total_cases)*100 as DeathPercentage
total_deaths is neither an actual column present in the database nor a column calculated by an aggregate function that is provided in a GROUP BY clause.
Will SUM function ignore NULL values?
YES
How to write comments in SQL?
– Single Line
/* Multiple
Lines */
Write an SQL program to check for duplicate rows with the same date and location in the CovidVaccinations table.
Get full row with duplicates -
SELECT *
FROM Test.dbo.CovidVaccinations cv
WHERE EXISTS (
SELECT 1
FROM Test.dbo.CovidVaccinations cv2
WHERE cv2.location = cv.location
AND cv2.date = cv.date
GROUP BY location, date
HAVING COUNT(*) > 1
)
ORDER BY location, date;
Only check if multiple rows are present -
SELECT location, date, COUNT() as row_count
FROM Test.dbo.CovidVaccinations
WHERE continent IS NOT NULL
GROUP BY location, date
HAVING COUNT() > 1;
Write an SQL program to delete duplicate rows from CovidVaccinations. The rows have the same location and date.
WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY location, date ORDER BY (SELECT NULL)) as rn
FROM Test.dbo.CovidVaccinations
)
DELETE FROM CTE
WHERE rn > 1;
What is CTE, and how are they used?
CTE (Common Table Expression) is a temporary table result that exists only inside a query.
They are used for readability. CTEs help you recall a result multiple times in the same query as ‘CTE’
Eg -
WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY location, date ORDER BY (SELECT NULL)) as rn
FROM Test.dbo.CovidVaccinations
)
DELETE FROM CTE
WHERE rn > 1;
Write a skeletal code for creating and inserting values in a temporary table
In what feature in SQL is the ORDER BY statement invalid?
View Table
What is a view? Why is it useful?
A view is a virtual table in the database. It is generated by other tables in the database.
It is used to restrict users from viewing certain data from databases.