Apache Spark Core Concepts Flashcards

1
Q

What is Spark Context?

A

Spark Context (SparkContext) is the main entry point for Spark functionality, allowing interaction with the Spark cluster.
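For context, a minimal sketch of creating a SparkContext directly, in the pre-2.0 style; the app name and local master URL are illustrative placeholders:

```python
from pyspark import SparkConf, SparkContext

# Illustrative configuration: the app name and local[*] master are placeholders.
conf = SparkConf().setAppName("example-app").setMaster("local[*]")
sc = SparkContext(conf=conf)

# The context is now connected to an execution environment (here, local
# threads) and can be used to create RDDs and run jobs.
print(sc.version)
```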

2
Q

What role does Spark Context play in a Spark application?

A

Spark Context sets up internal services and establishes a connection to the Spark execution environment.

3
Q

How is Spark Context initialized in a Spark application?

A

In Spark 2.x and later, the SparkContext is typically created and accessed through the SparkSession object (as spark.sparkContext) rather than instantiated directly.
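A minimal sketch of that Spark 2.x pattern; the app name is an illustrative placeholder:

```python
from pyspark.sql import SparkSession

# Build (or reuse) a SparkSession; "example-app" is a placeholder name.
spark = SparkSession.builder.appName("example-app").getOrCreate()

# In Spark 2.x+, the underlying SparkContext is exposed as an attribute
# of the session rather than being constructed by hand.
sc = spark.sparkContext
```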

4
Q

What functionalities are available through Spark Context?

A

Spark Context provides access to various functionalities like creating RDDs (Resilient Distributed Datasets), performing transformations, and executing actions on distributed data.
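A short sketch of those steps, assuming an existing SparkContext named sc (for example, obtained as in the previous card):

```python
# Create an RDD from a local Python collection.
rdd = sc.parallelize([1, 2, 3, 4, 5])

# Transformation: lazily defines a new RDD from the existing one.
squared = rdd.map(lambda x: x * x)

# Action: triggers execution on the cluster and returns a value (55).
total = squared.reduce(lambda a, b: a + b)
print(total)
```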

5
Q

What is the significance of Spark Context’s stop() method?

A

The stop() method shuts down the Spark application, releasing the resources (executors, connections, caches) acquired by the SparkContext.
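For example, assuming the spark session and sc context from the earlier sketches:

```python
# Stopping the session also stops its underlying SparkContext.
spark.stop()   # equivalently, sc.stop() when working with a bare SparkContext
```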

6
Q

Types of Spark contexts

A

SparkContext (legacy):
In Spark 1.x, SparkContext was the main entry point for creating RDDs and running basic operations. It is now the legacy way of interacting with Spark, superseded by SparkSession.
SQLContext:
SQLContext is used to work with structured data. It provides the entry point to Spark SQL, enabling SQL queries against Spark DataFrames and tables.
HiveContext:
HiveContext is an extension of SQLContext that adds HiveQL support. It provides access to Hive tables and metastore metadata, which is convenient for users already familiar with Hive.
SparkSession:
Introduced in Spark 2.0, SparkSession unifies SQLContext and HiveContext into a single entry point, exposing the DataFrame API, SQL, and Structured Streaming through one interface.
StreamingContext:
StreamingContext is the entry point for the legacy DStream (Discretized Stream) API, which processes real-time data as a sequence of micro-batches in a way similar to working with RDDs.
PySpark shell context:
The PySpark interactive shell initializes the execution environment automatically, exposing a ready-made SparkContext (sc) and SparkSession (spark) to Python code.
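An illustrative sketch of how these contexts are created in PySpark. The names and the 10-second batch interval are placeholders; SQLContext, HiveContext, and StreamingContext are the legacy APIs described above and are deprecated (or removed) in recent Spark releases:

```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, HiveContext, SparkSession
from pyspark.streaming import StreamingContext

sc = SparkContext(conf=SparkConf().setAppName("contexts-demo"))

sql_ctx = SQLContext(sc)    # legacy entry point for structured data
hive_ctx = HiveContext(sc)  # legacy SQLContext extension with Hive support
ssc = StreamingContext(sc, batchDuration=10)  # DStreams, 10-second micro-batches

# Since Spark 2.0, SparkSession wraps the SQL/Hive functionality in one object.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
```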

7
Q

SparkSession

What is the unified entry point introduced in Spark 2.0?

A

SparkSession, introduced in Spark 2.0, is the unified entry point that combines the older contexts (SQLContext, HiveContext) into a single interface.

8
Q

What functionalities does SparkSession provide?

A

It provides access to DataFrame APIs, SQL, and streaming capabilities, unifying various Spark features.
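A brief sketch of the DataFrame and SQL sides of that unified interface; the view name and sample rows are made up for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("unified-demo").getOrCreate()

# DataFrame API: build a small DataFrame from local rows.
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# SQL: register the DataFrame as a temporary view and query it.
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE id = 2").show()
```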

9
Q

Which context is used to work with structured data in Spark?

A

SQLContext is used to work with structured data in Spark, providing access to Spark SQL functionalities.

10
Q

What does SQLContext enable in Spark?

A

It enables the execution of SQL queries against Spark DataFrames and tables.
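A minimal legacy-style sketch with illustrative table and column names; since Spark 2.0 the same calls are normally made on a SparkSession instead:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sql_ctx = SQLContext(sc)  # deprecated in newer releases; shown for the legacy API

df = sql_ctx.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df.createOrReplaceTempView("t")
sql_ctx.sql("SELECT COUNT(*) FROM t").show()
```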

11
Q

What is the main entry point for Spark functionality?

A

SparkContext is the main entry point for Spark functionality, allowing interaction with the Spark cluster.

12
Q

What does SparkContext primarily set up?

A

It sets up internal services and establishes a connection to the Spark execution environment.

13
Q

What is the primary abstraction in Apache Spark?

A

The RDD (Resilient Distributed Dataset) is the primary abstraction in Apache Spark, representing an immutable collection of elements partitioned across the nodes of a cluster.

14
Q

How are RDDs resilient?

A

RDDs are resilient because lost partitions can be recomputed from their lineage: the recorded chain of transformations that produced them, which allows reconstruction in case of failure.
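One way to inspect that lineage, assuming an existing SparkContext named sc; toDebugString() shows the chain of transformations Spark would replay to rebuild lost partitions:

```python
# Build a small RDD through a chain of transformations.
rdd = sc.parallelize(range(10)).map(lambda x: x + 1).filter(lambda x: x % 2 == 0)

# toDebugString() returns the lineage graph (as bytes in PySpark).
print(rdd.toDebugString().decode("utf-8"))
```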

15
Q

What operations can you perform on RDDs?

A

RDDs support two types of operations: transformations (which lazily define new RDDs from existing ones) and actions (which trigger computation and return values to the driver).
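A short sketch contrasting the two, assuming an existing SparkContext named sc; note that the transformation alone triggers no work:

```python
nums = sc.parallelize([1, 2, 3])

doubled = nums.map(lambda x: x * 2)  # transformation: lazy, no job runs yet

print(doubled.collect())  # action: triggers a job, returns [2, 4, 6]
print(doubled.count())    # action: runs another job, returns 3
```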
