SQL Flashcards

(28 cards)

1
Q

what code for skipping rows in the returned result?

A

OFFSET

SELECT column_names
FROM table_name
ORDER BY column_name
LIMIT number_of_rows
OFFSET offset_value;

(OFFSET 9, skips first 9 rows)

-used for skipping specified rows, returning chunks for exporting, focusing on data in a particular range
-always use order by first

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

syntax for using NOT?

A

SELECT column1, column2, …
FROM table_name
WHERE NOT condition;

e.g.
where not country = ‘spain’

when combining with something else then put with that
e.g.
where country not in (‘’)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

how does between work?

A

inclusive
e.g.
where number between 1 and 5

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

whats the syntax for window functions

A

SELECT start_terminal,
duration_seconds,
SUM(duration_seconds) OVER
(PARTITION BY start_terminal ORDER BY start_time rows between 13 preceding and current row)
AS running_total
FROM tutorial.dc_bikeshare_q1_2012

-can do without the partition and it will do over the whole dataframe, otherwise partitions over the sections and goes up in order so dont forget the order by
-the 13 means consider the current row and the previous 13 in the calculations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

what functions apart from the standard aggregations can be run on window functions and what do they do?

A

ROW_NUMBER() -no arg accepted, starts at 1 at each partition

RANK() - starts at 1, if same occurs twice gives both same number and then skips next number

DENSE_RANK() - starts at 1, if same occurs twice gives both same number and then goes to next number

PERCENT_RANK() - relative rank of the current row, rank/partition - ranges from 0 to 1 - gives percentile ranking

NTILE(# of tiles) - e.g. NTILE(4) gives quartiles, NTILE(100) gives percentiles. careful with NTILE(100) when small number of entries

LAG(column, 1) - gives entry of specified column in the row before

LAG(column 3) -gives entry of specified column 3 rows ahead

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

how do you alias a window function when you are using multiple with the same window

A

SELECT start_terminal,
duration_seconds,
NTILE(4) OVER ntile_window AS quartile,
NTILE(5) OVER ntile_window AS quintile,
NTILE(100) OVER ntile_window AS percentile
FROM tutorial.dc_bikeshare_q1_2012
WHERE start_time < ‘2012-01-08’
WINDOW ntile_window AS
(PARTITION BY start_terminal ORDER BY duration_seconds)
ORDER BY start_terminal, duration_seconds

-window clause must come after where cause

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

how do you create a ‘difference’ column? like finding the difference between a value and the value before it

A

SELECT start_terminal,
duration_seconds,
duration_seconds -LAG(duration_seconds, 1) OVER
(PARTITION BY start_terminal ORDER BY duration_seconds)
AS difference
FROM tutorial.dc_bikeshare_q1_2012
WHERE start_time < ‘2012-01-08’
ORDER BY start_terminal, duration_seconds

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Datetypes in SQL:

A

DATE: YYYY-MM-DD
TIME: HH:MM:SS
TIMESTAMP: YYYY-MM-DD HH:MM:SS
INTERVAL: interval between two date/time values ‘x year x month…’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What functions to get current date/time?

A

NOW(): current date, time and Timezone
CURRENT_DATE: current date
CURRENT_TIME: current time and Timezone
CURRENT_TIMESTAMP: current date, time and Timezone

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

how do you calculate the difference between 2 times or dates

A

just use minus operator
date - date
returns an interval

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

how do you add or subtract a certain amount of time from a date or time

A

using interval
date + INTERVAL ‘1 day’
date - INTERVAL ‘1 month’
etc

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

how can you extract different parts of a date?

A

select date_part(‘year’, date)

select extract(year from date)

-both extract a part of the date, extract can handle timezones

date_trunc(‘year’, date)
-truncates date to certain part

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

how do you convert dates to strings and strings to dates?

A

date to specified string format, choose whichever string format you want:
to_char(date, ‘YY/MM/DD’)

string to specified date format. here you need to specify the format that the string is in e.g.:
to_date(‘01/02/2024’, ‘DD/MM/YYYY)

to_timestamp (same as above but includes time)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

when to use in and exists?

A

select * from …
where x in (select * from ….)
-checks all rows for the in statement

select * from …
where exists
(select 1 from … where x=x)
-checks if the row with the condition exists, stops as soon as it does - better for large datasets

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

what are some ways of creating conditional logic and the syntaxes? ie new columns with logic conditional on other columns

A

select x, y,
case when
condition then result
condtion2 then result 2
else result3
end as new_column
end
from …

coalesce returns the first non-null value:
select coalesce(col1, col2, ‘newoption’) as newcolumn
from …

NULLIF returns null if two expressions are equal
select nulliff(col1, 0) from …
-can be used to prevent division by 0 error

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

how do you use conditional logic to calculate things like counting active users, or sales from US customers

A

select count(case when x = ‘active’ then 1 else 0) as active_users,
count(*) as total_users
from …

SELECT
SUM(CASE WHEN country = ‘US’ THEN revenue ELSE 0 END) AS us_revenue,
SUM(revenue) AS total_revenue
FROM sales;

17
Q

what are the 2 ways to cast types in postgres

A

select 123::integer
select now()::date

OR

select cast(123 as integer)
select cast(‘hello’ as text)

18
Q

when and how to pivot data?

A

pivot = turn rows into columns
e.g. show sales per month as columns like ‘feb sales’ ‘march sales’

SELECT
product,
SUM(CASE WHEN month = ‘Jan’ THEN sales ELSE 0 END) AS jan_sales,
SUM(CASE WHEN month = ‘Feb’ THEN sales ELSE 0 END) AS feb_sales
FROM sales_data
GROUP BY product;

group by the column you want to pivot on, then use case when to group

19
Q

how and when to unpivot?

A

unpivot = turn columns into rows
e.g. turn a table with columns of KPIs into one KPI name column and one value column

SELECT product, ‘Jan’ AS month, jan_sales AS sales FROM monthly_sales
UNION ALL
SELECT product, ‘Feb’ AS month, feb_sales FROM monthly_sales;

manually create each name/value column and union all

20
Q

what is normalisation and denormalisation?

A

normalisation = storing data to reduce redundancy. data is split into multiple related tables. reduces duplicates - is more efficient for storage. harder to read - need joins

denormalisation = combine tables to reduce joins and make more readable. improves read performance at the cost of some redundancy

21
Q

what is collation and what are the different sensitivity types?

A

collation is the set of rules around how string data is stored

case sensitive/insensitive - a and A
accent sensitive/insensitive - e and é
-others for different languages

22
Q

what and how to use histogram buckets

A

histogram buckets use to group continuous data into buckets
-organises values into fixed width intervals and count how many fall into each one

SELECT
width_bucket(age, 0, 100, 10) AS bucket_number, – 10 buckets from 0 to 100
FROM users

width_bucket(value, min, max, num_buckets)

-returns which bucket the value belongs to. buckets are equal width and span from min to max]

e.g. segment users by age, group orders by price etc.

23
Q

how to select random rows

A


order by random ()
limit 10

24
Q

how do you calculate something when its asking you say the percentage of the total dataframe. e.g. find percentage of users where a certain column = a certain value

A

select
count(case when close_date = ‘2020-01-01’ then 1 end) * 100 / count(*)
as pc
from accounts

use aggegate with case when

25
how can you handle for nulls correctly when comparing columns
when comparing nulls to something, it will never be = or !=, its just unknown. you can use when col1 is distinct from col2 , to handle nulls. is distinct null will treat a null and something else as distinct from each other. and it will treat two nulls as not distinct from each other. good for when comparing if columns are equal or not
26
when comparing columns with each other, how would you ensure the accuracy of this comparison
- handle nulls correctly (is distinct from) -check data types, formats and meanings of columns -for text columns, standardise using lower(trim(col)) -for dates, make sure same timezones and format -validate assumptions: data in approprirate ranges, foreign keys lining up “I’d also document any assumptions made and clarify edge cases like NULLs, unexpected data types, or transformations, to ensure the comparison is both correct and reproducible.”
27
query to find strings but case insensitive
use ILIKE instead of LIKE
28