SQL Flashcards by Annie Clarnette

what code for skipping rows in the returned result?

OFFSET

SELECT column_names
FROM table_name
ORDER BY column_name
LIMIT number_of_rows
OFFSET offset_value;

(OFFSET 9, skips first 9 rows)

-used for skipping specified rows, returning chunks for exporting, focusing on data in a particular range
-always use order by first

How well did you know this?

Not at all

Perfectly

syntax for using NOT?

SELECT column1, column2, …
FROM table_name
WHERE NOT condition;

e.g.
where not country = ‘spain’

when combining with something else then put with that
e.g.
where country not in (‘’)

How well did you know this?

Not at all

Perfectly

how does between work?

inclusive
e.g.
where number between 1 and 5

How well did you know this?

Not at all

Perfectly

whats the syntax for window functions

SELECT start_terminal,
duration_seconds,
SUM(duration_seconds) OVER
(PARTITION BY start_terminal ORDER BY start_time rows between 13 preceding and current row)
AS running_total
FROM tutorial.dc_bikeshare_q1_2012

-can do without the partition and it will do over the whole dataframe, otherwise partitions over the sections and goes up in order so dont forget the order by
-the 13 means consider the current row and the previous 13 in the calculations

How well did you know this?

Not at all

Perfectly

what functions apart from the standard aggregations can be run on window functions and what do they do?

ROW_NUMBER() -no arg accepted, starts at 1 at each partition

RANK() - starts at 1, if same occurs twice gives both same number and then skips next number

DENSE_RANK() - starts at 1, if same occurs twice gives both same number and then goes to next number

PERCENT_RANK() - relative rank of the current row, rank/partition - ranges from 0 to 1 - gives percentile ranking

NTILE(# of tiles) - e.g. NTILE(4) gives quartiles, NTILE(100) gives percentiles. careful with NTILE(100) when small number of entries

LAG(column, 1) - gives entry of specified column in the row before

LAG(column 3) -gives entry of specified column 3 rows ahead

How well did you know this?

Not at all

Perfectly

how do you alias a window function when you are using multiple with the same window

SELECT start_terminal,
duration_seconds,
NTILE(4) OVER ntile_window AS quartile,
NTILE(5) OVER ntile_window AS quintile,
NTILE(100) OVER ntile_window AS percentile
FROM tutorial.dc_bikeshare_q1_2012
WHERE start_time < ‘2012-01-08’
WINDOW ntile_window AS
(PARTITION BY start_terminal ORDER BY duration_seconds)
ORDER BY start_terminal, duration_seconds

-window clause must come after where cause

How well did you know this?

Not at all

Perfectly

how do you create a ‘difference’ column? like finding the difference between a value and the value before it

SELECT start_terminal,
duration_seconds,
duration_seconds -LAG(duration_seconds, 1) OVER
(PARTITION BY start_terminal ORDER BY duration_seconds)
AS difference
FROM tutorial.dc_bikeshare_q1_2012
WHERE start_time < ‘2012-01-08’
ORDER BY start_terminal, duration_seconds

How well did you know this?

Not at all

Perfectly

Datetypes in SQL:

DATE: YYYY-MM-DD
TIME: HH:MM:SS
TIMESTAMP: YYYY-MM-DD HH:MM:SS
INTERVAL: interval between two date/time values ‘x year x month…’

How well did you know this?

Not at all

Perfectly

What functions to get current date/time?

NOW(): current date, time and Timezone
CURRENT_DATE: current date
CURRENT_TIME: current time and Timezone
CURRENT_TIMESTAMP: current date, time and Timezone

How well did you know this?

Not at all

Perfectly

how do you calculate the difference between 2 times or dates

just use minus operator
date - date
returns an interval

How well did you know this?

Not at all

Perfectly

how do you add or subtract a certain amount of time from a date or time

using interval
date + INTERVAL ‘1 day’
date - INTERVAL ‘1 month’
etc

How well did you know this?

Not at all

Perfectly

how can you extract different parts of a date?

select date_part(‘year’, date)

select extract(year from date)

-both extract a part of the date, extract can handle timezones

date_trunc(‘year’, date)
-truncates date to certain part

How well did you know this?

Not at all

Perfectly

how do you convert dates to strings and strings to dates?

date to specified string format, choose whichever string format you want:
to_char(date, ‘YY/MM/DD’)

string to specified date format. here you need to specify the format that the string is in e.g.:
to_date(‘01/02/2024’, ‘DD/MM/YYYY)

to_timestamp (same as above but includes time)

How well did you know this?

Not at all

Perfectly

when to use in and exists?

select * from …
where x in (select * from ….)
-checks all rows for the in statement

select * from …
where exists
(select 1 from … where x=x)
-checks if the row with the condition exists, stops as soon as it does - better for large datasets

How well did you know this?

Not at all

Perfectly

what are some ways of creating conditional logic and the syntaxes? ie new columns with logic conditional on other columns

select x, y,
case when
condition then result
condtion2 then result 2
else result3
end as new_column
end
from …

coalesce returns the first non-null value:
select coalesce(col1, col2, ‘newoption’) as newcolumn
from …

NULLIF returns null if two expressions are equal
select nulliff(col1, 0) from …
-can be used to prevent division by 0 error

How well did you know this?

Not at all

Perfectly

how do you use conditional logic to calculate things like counting active users, or sales from US customers

Study These Flashcards

select count(case when x = ‘active’ then 1 else 0) as active_users,
count(*) as total_users
from …

SELECT
SUM(CASE WHEN country = ‘US’ THEN revenue ELSE 0 END) AS us_revenue,
SUM(revenue) AS total_revenue
FROM sales;

what are the 2 ways to cast types in postgres

Study These Flashcards

select 123::integer
select now()::date

select cast(123 as integer)
select cast(‘hello’ as text)

when and how to pivot data?

Study These Flashcards

pivot = turn rows into columns
e.g. show sales per month as columns like ‘feb sales’ ‘march sales’

SELECT
product,
SUM(CASE WHEN month = ‘Jan’ THEN sales ELSE 0 END) AS jan_sales,
SUM(CASE WHEN month = ‘Feb’ THEN sales ELSE 0 END) AS feb_sales
FROM sales_data
GROUP BY product;

group by the column you want to pivot on, then use case when to group

how and when to unpivot?

Study These Flashcards

unpivot = turn columns into rows
e.g. turn a table with columns of KPIs into one KPI name column and one value column

SELECT product, ‘Jan’ AS month, jan_sales AS sales FROM monthly_sales
UNION ALL
SELECT product, ‘Feb’ AS month, feb_sales FROM monthly_sales;

manually create each name/value column and union all

what is normalisation and denormalisation?

Study These Flashcards

normalisation = storing data to reduce redundancy. data is split into multiple related tables. reduces duplicates - is more efficient for storage. harder to read - need joins

denormalisation = combine tables to reduce joins and make more readable. improves read performance at the cost of some redundancy

what is collation and what are the different sensitivity types?

Study These Flashcards

collation is the set of rules around how string data is stored

case sensitive/insensitive - a and A
accent sensitive/insensitive - e and é
-others for different languages

what and how to use histogram buckets

Study These Flashcards

histogram buckets use to group continuous data into buckets
-organises values into fixed width intervals and count how many fall into each one

SELECT
width_bucket(age, 0, 100, 10) AS bucket_number, – 10 buckets from 0 to 100
FROM users

width_bucket(value, min, max, num_buckets)

-returns which bucket the value belongs to. buckets are equal width and span from min to max]

e.g. segment users by age, group orders by price etc.

how to select random rows

Study These Flashcards

…
order by random ()
limit 10

how do you calculate something when its asking you say the percentage of the total dataframe. e.g. find percentage of users where a certain column = a certain value

Study These Flashcards

select
count(case when close_date = ‘2020-01-01’ then 1 end) * 100 / count(*)
as pc
from accounts

use aggegate with case when

how can you handle for nulls correctly when comparing columns

when comparing nulls to something, it will never be = or !=, its just unknown. you can use when col1 is distinct from col2 , to handle nulls. is distinct null will treat a null and something else as distinct from each other. and it will treat two nulls as not distinct from each other. good for when comparing if columns are equal or not

when comparing columns with each other, how would you ensure the accuracy of this comparison

- handle nulls correctly (is distinct from) -check data types, formats and meanings of columns -for text columns, standardise using lower(trim(col)) -for dates, make sure same timezones and format -validate assumptions: data in approprirate ranges, foreign keys lining up “I’d also document any assumptions made and clarify edge cases like NULLs, unexpected data types, or transformations, to ensure the comparison is both correct and reproducible.”

query to find strings but case insensitive

use ILIKE instead of LIKE

SQL Flashcards

(28 cards)