SQL Warehouse (Test Qs) Flashcards

1
Q

The data analyst team had put together queries that identify items that are out of stock based on orders and replenishment but when they run all together for final output the team noticed it takes a really long time, you were asked to look at the reason why queries are running slow and identify steps to improve the performance and when you looked at it you noticed all the code queries are running sequentially and using a SQL endpoint cluster. Which of the following steps can be taken to resolve the issue?

Here is the example query

— Get order summary
create or replace table orders_summary
as
select product_id, sum(order_count) order_count
from
(
select product_id,order_count from orders_instore
union all
select product_id,order_count from orders_online
)
group by product_id
– get supply summary
create or repalce tabe supply_summary
as
select product_id, sum(supply_count) supply_count
from supply
group by product_id

– get on hand based on orders summary and supply summary

with stock_cte
as (
select nvl(s.product_id,o.product_id) as product_id,
nvl(supply_count,0) - nvl(order_count,0) as on_hand
from supply_summary s
full outer join orders_summary o
on s.product_id = o.product_id
)
select *
from
stock_cte
where on_hand = 0

Turn on Auto Stop feature for the SQL endpoint

Turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to “Reliability Optimized,”

Increase the cluster size of the SQL Endpoint

Increase the maximum bound of the SQL endpoint’s scaling range.

Turn on the Serverless feature for the SQL Endpoint.

A

The answer is to increase the cluster size of the SQL Endpoint,

here queries are running sequentially and since the single query can not span more than one cluster adding more clusters won’t improve the query but rather increasing the cluster size will improve performance so it can use additional compute in a warehouse.

SQL endpoint scales horizontally(scale-out) and vertically (scale-up), you have to understand when to use what.

Scale-up-> Increase the size of the cluster from x-small to small, to medium, X Large….

If you are trying to improve the performance of a single query having additional memory, additional nodes and cpu in the cluster will improve the performance.

Scale-out -> Add more clusters, change max number of clusters

If you are trying to improve the throughput, being able to run as many queries as possible then having an additional cluster(s) will improve the performance.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

You had worked with the Data analysts team to set up a SQL Endpoint(SQL warehouse) point so they can easily query and analyze data in the gold layer, but once they started consuming the SQL Endpoint(SQL warehouse) you noticed that during the peak hours as the number of users increase you are seeing queries taking longer to finish, which of the following steps can be taken to resolve the issue?

*Please note Databricks recently renamed SQL endpoint to SQL warehouse.

They can turn on the Serverless feature for the SQL endpoint (SQL warehouse)

They can increase the maximum bound of the SQL endpoint (SQL warehouse)’s scaling range.

They can increase the cluster size from 2X- Small to 4X- Large of the SQL endpoint (SQL warehouse)

They can turn on the Auto Stop feature for the SQL endpoint (SQL warehouse)

They can turn on the Serverless feature for the SQL endpoint (SQL warehouse) and change the Spot Instance Policy from “Cost Optimised” to “Reliability Optimised”

A

They can increase the maximum bound of the SQL endpoint’s scaling range,

when you increase the maximum bound you can add more clusters to the warehouse which can then run additional queries that are waiting in the queue to run, focus on the below explanation that talks about Scale-out.

The question is looking to test your ability to know how to scale a SQL Endpoint(SQL Warehouse) and you have to look for cue words or need to understand if the queries are running sequentially or concurrently. if the queries are running sequentially then scale up(Size of the cluster from 2X-Small to 4X-Large) if the queries are running concurrently or with more users then scale out(add more clusters).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

The research team has put together a funnel analysis query to monitor the customer traffic on the e-commerce platform, the query takes about 30 mins to run on a small SQL endpoint cluster with max scaling set to 1 cluster. What steps can be taken to improve the performance of the query?

They can turn on the Serverless feature for the SQL endpoint

They can increase the maximum bound of the SQL endpoints scaling range anywhere between 1 to 100 to revew the performance and select the size that meets the required SLA.

They can increase the cluster size from X small to 3XL to review the performance and select the size that meets the required SLA.

They can turn on the Auto Stop feature for the SQL endpoint to more than 15 mins.

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy from “Cost Optimised” to “Reliability Optimised”

A

The answer is, They can increase the cluster size anywhere from 2X-Small to 4XL(Scale Up) to review the performance and select the size that meets your SLA.

If you are trying to improve the performance of a single query at a time having additional memory, additional worker nodes mean that more tasks can run in a cluster which will improve the performance of that query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Which of the following is correct for the global temporary view?

Global temporary views can be still accessed even if the notebook is detached and attached

Global temporary view cannot be accessed once the notebook is detached and attached

global temporary views can be accessed across many clusters

global temporary views can can still be accessed even if the cluster is restarted

global temporary vies are creating ins a database called temp database.

A

The answer is global temporary views can be still accessed even if the notebook is detached and attached

There are two types of temporary views that can be created Local and Global

· A local temporary view is only available with a spark session, so another notebook in the same cluster can not access it. if a notebook is detached and reattached local temporary view is lost.

· A global temporary view is available to all the notebooks in the cluster, even if the notebook is detached and reattached it can still be accessible but if a cluster is restarted the global temporary view is lost.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Which of the following is a true statement about the global temporary view?

a global temp view is available only o the cluster it was created, when the cluster restarts global temp view is automatically dropped

a global temp view is available on all clusters for a given workspace

a global temp view persists even if the cluster is restarted

a global temp view is stored in a user database

a global temp view is automatically dropped after 7 days.

A

The answer is, A global temporary view is available only on the cluster it was created.

Two types of temporary views can be created Session scoped and Global

A session scoped temporary view is only available with a spark session, so another notebook in the same cluster can not access it. if a notebook is detached and re attached the temporary view is lost.

A global temporary view is available to all the notebooks in the cluster, if a cluster restarts global temporary view is lost.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

A data engineer wants to create a relational object by pulling data from two tables. The relational object must be used by other data engineers in other sessions on the same cluster only. In order to save on storage costs, the date engineer wants to avoid copying and storing physical data.

Which of the following relational objects should the data engineer create?

Temp view

External table

Managed table

Global Temp view

View

A

global temp view

In order to avoid copying and storing physical data, the data engineer must create a view object. A view in databricks is a virtual table that has no physical data. It’s just a saved SQL query against actual tables.

The view type should be Global Temporary view that can be accessed in other sessions on the same cluster. Global Temporary views are tied to a cluster temporary database called global_temp.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly