General Flashcards

1
Q

How do you cache a DataFrame?

A

.cache() or .persist(), followed by an action, e.g. .cache().count()
Caching is lazy: it doesn't have to be count, but it has to be an action that touches every single record.
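
A minimal sketch of the lazy-caching behaviour described above, assuming a local SparkSession; the object name `CacheSketch` and the sample data are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.SparkSession

object CacheSketch {
  // Caches a small DataFrame and returns the row count produced by the action.
  def run(): Long = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local mode, for illustration only
      .appName("cache-sketch")
      .getOrCreate()

    val df = spark.range(0, 1000).toDF("id")

    df.cache()                     // lazy: only marks the DataFrame for caching
    val rows = df.count()          // action that scans every row, so every partition is materialized

    spark.stop()
    rows
  }

  def main(args: Array[String]): Unit = println(run())
}
```

An action such as `take(1)` may read only some partitions, which is why an action that touches every record is used here.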

2
Q

How can you select where your cache is stored?

A

.persist(storageLevel), e.g. .persist(StorageLevel.DISK_ONLY)

3
Q

What is the default storage level for persist and cache on a DataFrame?

A

MEMORY_AND_DISK

4
Q

How do you uncache data?

A

.unpersist()

5
Q

How can you determine the storage level of your DataFrame?

A

.storageLevel
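
The storage-level cards can be tied together in one hedged sketch: persist with an explicit level, read it back, then release the cache. The object name `PersistSketch`, the choice of `DISK_ONLY`, and the sample data are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Returns the storage level while persisted and again after unpersisting.
  def run(): (StorageLevel, StorageLevel) = {
    val spark = SparkSession.builder()
      .master("local[*]")            // assumption: local mode, for illustration only
      .appName("persist-sketch")
      .getOrCreate()

    val df = spark.range(0, 1000).toDF("id")

    df.persist(StorageLevel.DISK_ONLY) // choose where the cached data lives
    df.count()                         // action that materializes the cache
    val whilePersisted = df.storageLevel

    df.unpersist()                     // takes effect without a follow-up action
    val afterUnpersist = df.storageLevel

    spark.stop()
    (whilePersisted, afterUnpersist)
  }

  def main(args: Array[String]): Unit = println(run())
}
```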

6
Q

How do you register a function as a DataFrame function (UDF)?

A

val function_udf = udf(stringConcat(_:ParamType…):ReturnType)

7
Q

How do you register a function as a SQL function?

A

spark.udf.register("new_function_name", function signature)
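
A hedged sketch showing both registration styles side by side, assuming a local SparkSession. The two-argument `stringConcat` helper, the column names, and the SQL function name `string_concat` are hypothetical, chosen only to make the example concrete.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object UdfSketch {
  // Hypothetical helper: joins two strings with an underscore.
  def stringConcat(a: String, b: String): String = a + "_" + b

  // Returns the result of calling the UDF through the DataFrame API and through SQL.
  def run(): (String, String) = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local mode, for illustration only
      .appName("udf-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("spark", "udf")).toDF("first_part", "second_part")

    // DataFrame function: wrap the Scala function with udf(...)
    val stringConcatUdf = udf(stringConcat(_: String, _: String): String)
    val viaDataFrame = df
      .select(stringConcatUdf(col("first_part"), col("second_part")).as("joined"))
      .as[String]
      .head()

    // SQL function: register under a name that SQL queries can call
    spark.udf.register("string_concat", stringConcat(_: String, _: String): String)
    df.createOrReplaceTempView("pairs")
    val viaSql = spark
      .sql("SELECT string_concat(first_part, second_part) AS joined FROM pairs")
      .as[String]
      .head()

    spark.stop()
    (viaDataFrame, viaSql)
  }

  def main(args: Array[String]): Unit = println(run())
}
```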

8
Q

How can you create a table from a DataFrame?

A

.write.saveAsTable("table_name")

9
Q

How can you set the number of partitions for a shuffle?

A

spark.conf.set("spark.sql.shuffle.partitions", 50)

10
Q

How do you get the number of partitions in a given DataFrame?

A

.rdd.getNumPartitions

11
Q

How do you repartition a DataFrame?

A

.repartition(2)

12
Q

How can you change the number of partitions on a single node?

A

.coalesce(2)

13
Q

Which causes a shuffle: repartition or coalesce?

A

repartition
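
The partitioning cards can be sketched together as follows, assuming a local SparkSession. The object name `PartitionSketch` and the partition counts are hypothetical. The sketch also shows that `coalesce` only reduces the partition count: asking for more partitions than exist leaves the count unchanged.

```scala
import org.apache.spark.sql.SparkSession

object PartitionSketch {
  // Returns partition counts: (start, after repartition, after coalesce down, after coalesce up).
  def run(): (Int, Int, Int, Int) = {
    val spark = SparkSession.builder()
      .master("local[*]")            // assumption: local mode, for illustration only
      .appName("partition-sketch")
      .getOrCreate()

    // Start from a known partition count so the comparison is deterministic.
    val df = spark.range(0, 1000).toDF("id").repartition(8)
    val start = df.rdd.getNumPartitions

    // repartition: full shuffle, can raise or lower the partition count
    val afterRepartition = df.repartition(2).rdd.getNumPartitions

    // coalesce: merges existing partitions without a full shuffle
    val afterCoalesceDown = df.coalesce(2).rdd.getNumPartitions

    // coalesce cannot increase the count, so this stays at the starting value
    val afterCoalesceUp = df.coalesce(16).rdd.getNumPartitions

    spark.stop()
    (start, afterRepartition, afterCoalesceDown, afterCoalesceUp)
  }

  def main(args: Array[String]): Unit = println(run())
}
```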

14
Q

How do you enable Adaptive Query Execution (AQE)?

A

spark.conf.set("spark.sql.adaptive.enabled", true)

15
Q

What are the elements of the Apache Spark execution hierarchy?

A

Jobs, stages, and tasks

16
Q

Adaptive Query Execution re-optimizes the query plan in the middle of the query execution based on accurate runtime statistics T/F

A

True

17
Q

With AQE, logical optimization and physical planning are removed. T/F

A

False

18
Q

What does spark.sql.autoBroadcastJoinThreshold do?

A

Configures the maximum size in bytes for a table that will be broadcast to all worker nodes when performing a join. Setting it to -1 disables broadcasting.

19
Q

How do you turn off dynamic partition coalescing?

A

spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", false)
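
The runtime settings from the configuration cards can be applied and read back in one place. This is a sketch assuming a local SparkSession; the object name `AqeConfigSketch` and the chosen values are hypothetical, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

object AqeConfigSketch {
  // Applies the settings and returns them as read back from the session.
  def run(): Map[String, String] = {
    val spark = SparkSession.builder()
      .master("local[*]")            // assumption: local mode, for illustration only
      .appName("aqe-config-sketch")
      .getOrCreate()

    spark.conf.set("spark.sql.shuffle.partitions", 50)                     // partitions per shuffle
    spark.conf.set("spark.sql.adaptive.enabled", true)                     // turn AQE on
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", false) // no dynamic coalescing
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)             // -1 disables broadcast joins

    val keys = Seq(
      "spark.sql.shuffle.partitions",
      "spark.sql.adaptive.enabled",
      "spark.sql.adaptive.coalescePartitions.enabled",
      "spark.sql.autoBroadcastJoinThreshold"
    )
    // Values come back as strings regardless of the type passed to set(...)
    val settings = keys.map(k => k -> spark.conf.get(k)).toMap

    spark.stop()
    settings
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```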

20
Q

What allows you to control how deeply complex (nested) types are printed in a schema?

A

.printSchema(level), e.g. .printSchema(1) prints only the top level

21
Q

How do you turn on schema inference when reading a file?

A

.option("inferSchema", true)

22
Q

How do you make a DataFrame into a table or a view?

A

.createOrReplaceTempView("view_name")
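
A short sketch of registering a temporary view and querying it with SQL, assuming a local SparkSession. The view name `numbers` and the object name `TempViewSketch` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object TempViewSketch {
  // Registers a DataFrame as a temp view and counts the even ids through SQL.
  def run(): Long = {
    val spark = SparkSession.builder()
      .master("local[*]")            // assumption: local mode, for illustration only
      .appName("temp-view-sketch")
      .getOrCreate()

    val df = spark.range(0, 10).toDF("id")

    // The view exists only for this SparkSession and is not written to storage.
    df.createOrReplaceTempView("numbers")

    val evens = spark.sql("SELECT id FROM numbers WHERE id % 2 = 0").count()

    spark.stop()
    evens
  }

  def main(args: Array[String]): Unit = println(run())
}
```

A temp view disappears with the session, whereas `.write.saveAsTable(...)` persists the data as a table.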