SQL Advanced Flashcards

(42 cards)

1
Q

What does this query do?

SELECT companies.permalink,
companies.founded_at_clean,
acquisitions.acquired_at_cleaned,
acquisitions.acquired_at_cleaned -
companies.founded_at_clean::timestamp AS time_to_acquisition
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acquisitions
ON acquisitions.company_permalink = companies.permalink
WHERE founded_at_clean IS NOT NULL

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What does this query do?

SELECT companies.permalink,
companies.founded_at_clean,
companies.founded_at_clean::timestamp +
INTERVAL ‘1 week’ AS plus_one_week
FROM tutorial.crunchbase_companies_clean_date companies
WHERE founded_at_clean IS NOT NULL

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Interval

A

The interval is defined using plain-English terms like ‘10 seconds’ or ‘5 months’. Also note that adding or subtracting a date column and an interval column results in another date column as in the above query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

NOW()

A

get the current server time

SELECT companies.permalink, companies.founded_at_clean, NOW() - companies.founded_at_clean::timestamp AS founded_time_ago FROM tutorial.crunchbase_companies_clean_date companies WHERE founded_at_clean IS NOT NULL

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Write a query that counts the number of companies acquired within 3 years, 5 years, and 10 years of being founded (in 3 separate columns). Include a column for total companies acquired as well. Group by category and limit to only rows with a founding date.

A

SELECT companies.category_code,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL ‘3 years’
THEN 1 ELSE NULL END) AS acquired_3_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL ‘5 years’
THEN 1 ELSE NULL END) AS acquired_5_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL ‘10 years’
THEN 1 ELSE NULL END) AS acquired_10_yrs,
COUNT(1) AS total
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acquisitions
ON acquisitions.company_permalink = companies.permalink
WHERE founded_at_clean IS NOT NULL
GROUP BY 1
ORDER BY 5 DESC

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

LEFT

A

You can use LEFT to pull a certain number of characters from the left side of a string and present them as a separate string. The syntax is LEFT(string, number of characters).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

SELECT incidnt_num,
date,
LEFT(date, 10) AS cleaned_date
FROM tutorial.sf_crime_incidents_2014_01

What will this do with this database?

A

It will take the first 10 characters from the date column and move them to cleaned_date a string

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

RIGHT

A

same as left but works from the right side

SELECT incidnt_num, date, LEFT(date, 10) AS cleaned_date, RIGHT(date, 17) AS cleaned_time FROM tutorial.sf_crime_incidents_2014_01

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

LENGTH()

A

returns the length of a string

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

SELECT incidnt_num, date, LEFT(date, 10) AS cleaned_date, RIGHT(date, LENGTH(date) - 11) AS cleaned_time FROM tutorial.sf_crime_incidents_2014_01

What will this produce with this database?

A

inner functions are always evaluated first

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

TRIM

A

The TRIM function takes 3 arguments. First, you have to specify whether you want to remove characters from the beginning (‘leading’), the end (‘trailing’), or both (‘both’, as used above). Next you must specify all characters to be trimmed. Any characters included in the single quotes will be removed from both beginning, end, or both sides of the string. Finally, you must specify the text you want to trim using FROM.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

SELECT location, TRIM(both ‘()’ FROM location) FROM tutorial.sf_crime_incidents_2014_01

What will this do in this database?

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

POSITION

A

POSITION allows you to specify a substring, then returns a numerical value equal to the character number (counting from left) where that substring first appears in the target string

POSITION(‘A’ IN position)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

SELECT incidnt_num,
descript,
POSITION(‘A’ IN descript) AS a_position
FROM tutorial.sf_crime_incidents_2014_01

What will this do in this database?

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

SUBSTR

A

SUBSTR(*string*, *starting character position*, *# of characters*):

removes strings at a particular location

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

SELECT incidnt_num, date,

SUBSTR(date, 4, 2) AS day

FROM tutorial.sf_crime_incidents_2014_01

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Write a query that separates the location field into separate fields for latitude and longitude.

A

SELECT location,

TRIM(leading ‘(‘ FROM LEFT(location, POSITION(‘,’ IN location) - 1)) AS lattitude,

TRIM(trailing ‘)’ FROM RIGHT(location, LENGTH(location) - POSITION(‘,’ IN location) ) )

AS longitude

FROM tutorial.sf_crime_incidents_2014_01

18
Q

CONCAT

A

Simply order the values you want to concatenate and separate them with commas. If you want to hard-code values, enclose them in single quotes.

19
Q

SELECT incidnt_num, day_of_week,

LEFT(date, 10) AS cleaned_date, CONCAT(day_of_week, ‘, ‘, LEFT(date, 10)) AS day_and_date

FROM tutorial.sf_crime_incidents_2014_01

What will this produce?

20
Q

Concatenate the lat and lon fields to form a field that is equivalent to the location field. (Note that the answer will have a different decimal precision.)

A

SELECT lat,
lon,
location,
CONCAT(‘(‘,lat,’, ‘,lon,’)’) AS concat_loc
FROM tutorial.sf_crime_incidents_2014_01

21
Q

Create the same concatenated location field from lat and lon, but using the || syntax instead of CONCAT.

A

SELECT lat,
lon,
location,
‘(‘ || lat || ‘,’ || lon || ‘)’ AS concat_loc
FROM tutorial.sf_crime_incidents_2014_01

22
Q

Write a query that creates a date column formatted YYYY-MM-DD.

A

SELECT date,
CONCAT(SUBSTR(date, 7, 4), ‘-‘, SUBSTR(date, 1, 2), ‘-‘, SUBSTR(date, 4, 2)) AS con_date
FROM tutorial.sf_crime_incidents_2014_01

23
Q

Changing case with UPPER and LOWER

A

SELECT incidnt_num, address, UPPER(address) AS address_upper, LOWER(address) AS address_lower FROM tutorial.sf_crime_incidents_2014_01

24
Q

Write a query that returns the category field, but with the first letter capitalized and the rest of the letters in lower-case.

A

SELECT incidnt_num, category, UPPER(LEFT(category, 1)) || LOWER(RIGHT(category, LENGTH(category) - 1)) AS category_cleaned FROM tutorial.sf_crime_incidents_2014_01

25
Write a query that creates an accurate timestamp using the date and time columns in tutorial.sf\_crime\_incidents\_2014\_01. Include a field that is exactly 1 week later as well.
SELECT date, time, CONCAT(SUBSTR(date,7,4),'-',SUBSTR(date,1,2),'-',SUBSTR(date,4,2), ' ', time, ':00')::timestamp AS date\_time, CONCAT(SUBSTR(date,7,4),'-',SUBSTR(date,1,2),'-',SUBSTR(date,4,2), ' ', time, ':00')::timestamp + INTERVAL '1 week' AS date\_time\_week FROM tutorial.sf\_crime\_incidents\_2014\_01
26
EXTRACT
SELECT cleaned\_date, EXTRACT('year' FROM cleaned\_date) AS year, EXTRACT('month' FROM cleaned\_date) AS month, EXTRACT('day' FROM cleaned\_date) AS day, EXTRACT('hour' FROM cleaned\_date) AS hour, EXTRACT('minute' FROM cleaned\_date) AS minute, EXTRACT('second' FROM cleaned\_date) AS second, EXTRACT('decade' FROM cleaned\_date) AS decade, EXTRACT('dow' FROM cleaned\_date) AS day\_of\_week FROM tutorial.sf\_crime\_incidents\_cleandate
27
DATE\_TRUNC
SELECT cleaned\_date, DATE\_TRUNC('year' , cleaned\_date) AS year, DATE\_TRUNC('month' , cleaned\_date) AS month, DATE\_TRUNC('week' , cleaned\_date) AS week, DATE\_TRUNC('day' , cleaned\_date) AS day, DATE\_TRUNC('hour' , cleaned\_date) AS hour, DATE\_TRUNC('minute' , cleaned\_date) AS minute, DATE\_TRUNC('second' , cleaned\_date) AS second, DATE\_TRUNC('decade' , cleaned\_date) AS decade FROM tutorial.sf\_crime\_incidents\_cleandate
28
Write a query that counts the number of incidents reported by week. Cast the week as a date to get rid of the hours/minutes/seconds.
SELECT DATE\_TRUNC('week', cleaned\_date)::date AS week\_beginning, COUNT(\*) AS incidents FROM tutorial.sf\_crime\_incidents\_cleandate GROUP BY 1 ORDER BY 1
29
COALESCE
In cases like this, you can use COALESCE to replace the null values SELECT incidnt\_num, descript, COALESCE(descript, 'No Description') FROM tutorial.sf\_crime\_incidents\_cleandate ORDER BY descript DESC
30
SELECT COUNT(incidnt\_num), COUNT(descript), COUNT(COALESCE(descript, 'No Description')) FROM tutorial.sf\_crime\_incidents\_cleandate
31
subqueries
Subqueries (also known as inner queries or nested queries) are a tool for performing operations in multiple steps. For example, if you wanted to take the sums of several columns, then average all of those values, you'd need to do each aggregation in a distinct step.
32
Write a query that selects all Warrant Arrests from the tutorial.sf\_crime\_incidents\_2014\_01 dataset, then wrap it in an outer query that only displays unresolved incidents.
SELECT warrents.\* FROM ( SELECT \* FROM tutorial.sf\_crime\_incidents\_2014\_01 WHERE descript = 'WARRANT ARREST') warrents WHERE warrents.resolution = 'NONE'
33
What if you wanted to figure out how many incidents get reported on each day of the week? Better yet, what if you wanted to know how many incidents happen, on average, on a Friday in December? In January?
There are two steps to this process: counting the number of incidents each day (inner query), then determining the monthly average (outer query): SELECT LEFT(sub.date, 2) AS cleaned\_month, sub.day\_of\_week, AVG(sub.incidents) AS average\_incidents FROM ( SELECT day\_of\_week, date, COUNT(incidnt\_num) AS incidents FROM tutorial.sf\_crime\_incidents\_2014\_01 GROUP BY 1,2 ) sub GROUP BY 1,2 ORDER BY 1,2 Step 1 count the number of incidents for each day and day of week. Step two then take the average for each month and day of week from the counts
34
Write a query that displays the average number of monthly incidents for each category. Hint: use tutorial.sf\_crime\_incidents\_cleandate to make your life a little easier.
SELECT sub.category, AVG(sub.incidents) AS avg\_incidents\_per\_month FROM ( SELECT EXTRACT('month' FROM cleaned\_date) AS month, category, COUNT(1) AS incidents FROM tutorial.sf\_crime\_incidents\_cleandate GROUP BY 1,2 ) sub GROUP BY 1
35
Subqueries in conditional logic
ou can use subqueries in conditional logic (in conjunction with WHERE, JOIN/ON, or CASE). SELECT \* FROM tutorial.sf\_crime\_incidents\_2014\_01 WHERE Date = (SELECT MIN(date) FROM tutorial.sf\_crime\_incidents\_2014\_01 ) The following query returns all of the entries from the earliest date in the dataset
36
SELECT \* FROM tutorial.sf\_crime\_incidents\_2014\_01 WHERE Date IN (SELECT date FROM tutorial.sf\_crime\_incidents\_2014\_01 ORDER BY date LIMIT 5 )
NO ALIAS WHEN subqueries are in conditionals IN allows you to use subqueries with multiple return values. This will return the top 5 dates
37
SELECT incidents.\*, sub.incidents AS incidents\_that\_day FROM tutorial.sf\_crime\_incidents\_2014\_01 incidents JOIN ( SELECT date, COUNT(incidnt\_num) AS incidents FROM tutorial.sf\_crime\_incidents\_2014\_01 GROUP BY 1 ) sub ON incidents.date = sub.date ORDER BY sub.incidents DESC, time
The following query ranks all of the results according to how many incidents were reported in a given day. It does this by aggregating the total number of incidents each day in the inner query, then using those values to sort the outer query:
38
Write a query that displays all rows from the three categories with the fewest incidents reported.
SELECT incidents.\*, sub.count AS total\_incidents\_in\_category FROM tutorial.sf\_crime\_incidents\_2014\_01 incidents JOIN ( SELECT category, COUNT(\*) AS count FROM tutorial.sf\_crime\_incidents\_2014\_01 GROUP BY 1 ORDER BY 2 LIMIT 3 ) sub ON sub.category = incidents.category
39
Write a query that counts the number of companies founded and acquired by quarter starting in Q1 2012. Create the aggregations in two separate queries, then join them. USING acquisitions and investments
SELECT COALESCE(companies.quarter, acquisitions.quarter) AS quarter, companies.companies\_founded, acquisitions.companies\_acquired FROM ( SELECT founded\_quarter AS quarter, COUNT(permalink) AS companies\_founded FROM tutorial.crunchbase\_companies WHERE founded\_year \>= 2012 GROUP BY 1 ) companies LEFT JOIN ( SELECT acquired\_quarter AS quarter, COUNT(DISTINCT company\_permalink) AS companies\_acquired FROM tutorial.crunchbase\_acquisitions WHERE acquired\_year \>= 2012 GROUP BY 1 ) acquisitions ON companies.quarter = acquisitions.quarter ORDER BY 1
40
EXPLAIN SELECT \* FROM benn.sample\_event\_table WHERE event\_date \>= '2014-03-01' AND event\_date \< '2014-04-01' LIMIT 100
The entry at the bottom of the list is executed first. So this shows that the WHERE clause, which limits the date range, will be executed first. Then, the database will scan 600 rows (this is an approximate number). You can see the cost listed next to the number of rows—higher numbers mean longer run time.
41
SELECT teams.conference AS conference, players.year, COUNT(1) AS players FROM benn.college\_football\_players players JOIN benn.college\_football\_teams teams ON teams.school\_name = players.school\_name GROUP BY 1,2 ORDER BY 1,2 Pivot this to give the total players, the nuber of players per year by conference
SELECT conference, SUM(players) AS total\_players, SUM(CASE WHEN year = 'FR' THEN players ELSE NULL END) AS fr, SUM(CASE WHEN year = 'SO' THEN players ELSE NULL END) AS so, SUM(CASE WHEN year = 'JR' THEN players ELSE NULL END) AS jr, SUM(CASE WHEN year = 'SR' THEN players ELSE NULL END) AS sr FROM ( SELECT teams.conference AS conference, players.year, COUNT(1) AS players FROM benn.college\_football\_players players JOIN benn.college\_football\_teams teams ON teams.school\_name = players.school\_name GROUP BY 1,2 ) sub GROUP BY 1 ORDER BY 2 DESC
42