SQL Flashcards

1
Q

Spark SQL is pronounced

A

Spark Sequel

2
Q

Spark SQL extends RDDs to a

A

“DataFrame” object

3
Q

DataFrames contain _____ objects

A

Row
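In PySpark, `Row` is a subclass of `tuple` whose fields can also be read by name. The sketch below mimics that behavior with the standard library's `collections.namedtuple` so it runs without Spark; `PersonRow` and its fields are hypothetical, and this is a stand-in for illustration, not Spark's own class.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row: a tuple whose fields also have names.
PersonRow = namedtuple("PersonRow", ["name", "age"])

# A DataFrame can be pictured as a collection of such rows sharing one set of fields.
rows = [PersonRow("Alice", 34), PersonRow("Bob", 27)]

# Fields are reachable by name or by position.
first = rows[0]
print(first.name, first[1])  # Alice 34
```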

4
Q

DataFrames have a ____, which leads to more efficient storage

A

schema
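One intuition for why a schema helps: when every record is known to have the same typed fields, field names and types can be stored once instead of being repeated in every record. The sketch below compares self-describing JSON text with fixed-width binary records packed via Python's `struct` module; the record layout (`id` as a 4-byte int, `score` as an 8-byte float) is an assumption for illustration, and Spark's real storage formats are considerably more sophisticated.

```python
import json
import struct

records = [{"id": i, "score": i * 1.5} for i in range(1000)]

# Without a schema: each JSON record repeats its field names as text.
json_bytes = "\n".join(json.dumps(r) for r in records).encode("utf-8")

# With a schema known up front (assumed: id -> int32, score -> float64),
# every record packs into the same 12 bytes and carries no field names.
RECORD_FORMAT = "<id"  # little-endian, int32 followed by float64, no padding
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)
packed = b"".join(struct.pack(RECORD_FORMAT, r["id"], r["score"]) for r in records)

print(len(json_bytes), len(packed))
```

Because every packed record has the same size, record `n` can also be located directly at byte offset `n * RECORD_SIZE` without parsing the records before it.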

5
Q

DataFrames can run _____ queries

A

SQL

6
Q

Parquet is a

A

popular columnar data storage format
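"Columnar" means values are laid out column by column rather than record by record, so a query that touches one column can skip the others, and similar values stored together tend to compress well. The plain-Python sketch below shows the two layouts side by side; it is a conceptual model only, not how Parquet actually encodes data on disk, and the sample table and the `to_columns` helper are made up for illustration.

```python
# Row-oriented layout: each record keeps all of its fields together.
row_layout = [
    {"city": "Oslo", "year": 2015, "temp": 6.1},
    {"city": "Lima", "year": 2015, "temp": 19.4},
    {"city": "Oslo", "year": 2016, "temp": 6.8},
]

def to_columns(rows):
    """Hypothetical helper: transpose a list of records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

column_layout = to_columns(row_layout)

# Averaging one column reads only that column's list; "city" and "year" are never touched.
temps = column_layout["temp"]
print(sum(temps) / len(temps))
```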

7
Q

Spark SQL can read and write

A

Hive tables, JSON, and Parquet files

8
Q

Import SQLContext and Row (Python)

A

from pyspark.sql import SQLContext, Row

9
Q

To use Spark SQL, the first thing you do is create a

A

HiveContext

10
Q

Create a HiveContext

A

from pyspark.sql import HiveContext
hiveContext = HiveContext(sc)  # sc is an existing SparkContext

11
Q

Load a JSON file into a DataFrame

A

inputData = hiveContext.read.json(dataFile)  # older API: hiveContext.jsonFile(dataFile), deprecated since Spark 1.4
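When Spark SQL loads JSON, it scans the records and infers a schema (field names and types) automatically. The sketch below imitates the idea in plain Python for newline-delimited JSON: it takes the union of field names seen across records and collects the Python type names observed for each field. The function `infer_schema` and its type-naming scheme are hypothetical; Spark's real inference uses its own SQL types and type-widening rules.

```python
import json

def infer_schema(json_lines):
    """Hypothetical helper: map each field to the set of Python type names seen for it."""
    schema = {}
    for line in json_lines:
        record = json.loads(line)
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema

sample = [
    '{"name": "Alice", "age": 34}',
    '{"name": "Bob", "city": "Lima"}',  # "city" is missing from the first record
]
print(infer_schema(sample))
```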

12
Q

JSON is pronounced

A

Jay Sahn

13
Q

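Register inputData as a temporary table so SQL queries can refer to it by name (its schema was already inferred when the JSON was loaded)

A

inputData.registerTempTable("myStructuredStuff")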

14
Q

Run a SQL query and get the result back as a DataFrame

A

myResultDataFrame = hiveContext.sql("""SELECT foo FROM bar ORDER BY footer""")
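To see what the query on this card returns without a Spark installation, the sketch below runs the same SQL against Python's built-in `sqlite3` module as a stand-in. Creating the table `bar` plays the role that registering a temporary table plays in Spark (giving the data a name SQL can refer to); the sample rows are invented, and SQLite's dialect is not identical to Spark SQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Stand-in for registering data under the table name "bar" (made-up sample rows).
conn.execute("CREATE TABLE bar (foo TEXT, footer INTEGER)")
conn.executemany("INSERT INTO bar VALUES (?, ?)", [("c", 3), ("a", 1), ("b", 2)])

# Same query text as the card; ORDER BY may use a column that is not selected.
result = conn.execute("""SELECT foo FROM bar ORDER BY footer""").fetchall()
print(result)  # [('a',), ('b',), ('c',)]
conn.close()
```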

15
Q

alternative to HiveContext

A

SQLContext

16
Q

Difference between HiveContext and SQLContext

A

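HiveContext is a superset of SQLContext: it adds Hive compatibility (the more complete HiveQL parser, Hive UDFs, and reading Hive tables). It pulls in Hive's heavier dependencies, but at least as of Spark 1.5 it also supports more features than the plain SQLContext.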