minTemperatures Flashcards

1
Q

basic Spark import

A

from pyspark import SparkConf, SparkContext

2
Q

create Spark conf (for my machine)

A

conf = SparkConf().setMaster("local").setAppName("MinTemperatures")

3
Q

create Spark context

A

sc = SparkContext(conf = conf)

4
Q

split delimited string into fields

A

fields = line.split(',')

5
Q

function syntax

A
def myFunctionName(myInput):
    ...  # function body
    return myOutputTuple  # e.g. (field1, field2)

6
Q

convert string to float

A

temperatureC = float(fields[3])
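
A minimal sketch of how cards 4 to 6 combine into a map function. The comma-separated layout (station ID, date, entry type, temperature value, ...) and the sample line are assumptions for illustration; the name parseLine matches card 8, and the returned tuple puts the entry type at index 1 and the temperature at index 2, which is what cards 9 and 10 index into.

```python
def parseLine(line):
    # Assumed layout: stationID,date,entryType,value,...
    fields = line.split(",")
    stationID = fields[0]
    entryType = fields[2]
    temperature = float(fields[3])  # convert the string field to a float
    return (stationID, entryType, temperature)


# Hypothetical sample line in the assumed layout
sample = "ITE00100554,18000101,TMIN,-148,,,E,"
print(parseLine(sample))  # ('ITE00100554', 'TMIN', -148.0)
```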

7
Q

read text file into RDD

A

lines = sc.textFile("file:///path/filename.csv")

8
Q

map an RDD

A

parsedLines = lines.map(parseLine), where parseLine is the function applied to each line

9
Q

filter an RDD

A

minTemps = parsedLines.filter(lambda x: "TMIN" in x[1])

10
Q

map subset of fields

A

stationTemps = minTemps.map(lambda x: (x[0], x[2]))

11
Q

reduce by key with min lambda

A

minTemps = stationTemps.reduceByKey(lambda x, y: min(x,y))
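
Cards 9 to 11 chain filter, map, and reduceByKey. The sketch below is a plain-Python analogue (not Spark code) of what that chain computes on a small in-memory list, assuming records shaped like (stationID, entryType, temperature). A dictionary grouping stands in for reduceByKey, so it shows the result, not how Spark distributes the work; the sample records are hypothetical.

```python
from functools import reduce


def min_temps_by_station(records):
    # filter: keep only TMIN entries (like card 9)
    min_entries = [r for r in records if "TMIN" in r[1]]
    # map: keep (stationID, temperature) pairs (like card 10)
    station_temps = [(r[0], r[2]) for r in min_entries]
    # reduceByKey analogue: combine all values for each key with min()
    grouped = {}
    for station, temp in station_temps:
        grouped.setdefault(station, []).append(temp)
    return {station: reduce(lambda x, y: min(x, y), temps)
            for station, temps in grouped.items()}


records = [
    ("ITE00100554", "TMIN", -148.0),
    ("ITE00100554", "TMAX", -75.0),
    ("ITE00100554", "TMIN", -125.0),
    ("EZE00100082", "TMIN", -135.0),
]
print(min_temps_by_station(records))
# {'ITE00100554': -148.0, 'EZE00100082': -135.0}
```

The reducing function must be associative and commutative (as min is), because Spark may combine values for a key in any order and across partitions.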

12
Q

collect an RDD's contents into a local Python list

A

results = minTemps.collect()

13
Q

loop through a collection “results”

A

for result in results:
    ...  # some action

14
Q

print value result[0]

A

print(result[0])

15
Q

print tab escape sequence

A

\t

16
Q

format a float “result” as a string with 2 decimal places

A

"{:.2f}".format(result)
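
Putting cards 13 to 16 together: a minimal sketch that loops over collected (key, value) pairs and prints each key and value separated by a tab, with the float shown to two decimal places. The helper name format_result and the sample results list are hypothetical.

```python
def format_result(result):
    # result is assumed to be a (stationID, temperature) tuple
    return result[0] + "\t" + "{:.2f}".format(result[1])


results = [("ITE00100554", -14.8), ("EZE00100082", -13.5)]
for result in results:
    print(format_result(result))
# ITE00100554	-14.80
# EZE00100082	-13.50
```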

17
Q

insert an int and a string value into a string with format()

A

"{} {}".format(myInt, myStr)

18
Q

split a whitespace-delimited line and take only the third field (as a string)

A

line.split()[2]

19
Q

sort a dictionary's items by key

A

sortedResult = sorted(result.items())

20
Q

loop through the sorted (key, value) pairs of a dictionary

A

for key, value in sortedResult:

21
Q

count occurrences of each value in an RDD

A

result = ratings.countByValue()
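
Cards 18 to 21 combine into a ratings histogram. countByValue() returns its counts to the driver as a dictionary of value to count, so the counting step can be imitated locally with collections.Counter. The sketch below is that plain-Python analogue (not Spark), assuming hypothetical whitespace-delimited lines whose third field is a rating.

```python
from collections import Counter


def rating_histogram(lines):
    # Take the third whitespace-delimited field as a string (card 18)
    ratings = [line.split()[2] for line in lines]
    # Local stand-in for ratings.countByValue(): value -> count
    result = Counter(ratings)
    # sorted() on items() gives (key, value) pairs ordered by key (card 19)
    return sorted(result.items())


lines = ["196\t242\t3\t881250949", "186\t302\t3\t891717742", "22\t377\t1\t878887116"]
sortedResult = rating_histogram(lines)
for key, value in sortedResult:
    print(key, value)
# 1 1
# 3 2
```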

22
Q

clean a word so it shows as ASCII

A

cleanWord = word.encode("ascii", "ignore").decode("ascii")
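
The "ignore" error handler drops any character that has no ASCII equivalent. In Python 3, encode() returns bytes rather than a str, so decoding again gives a string that prints cleanly. A minimal sketch with a hypothetical sample word:

```python
word = "café"
raw = word.encode("ascii", "ignore")  # bytes: non-ASCII characters dropped
cleanWord = raw.decode("ascii")       # back to a str for display
print(raw)        # b'caf'
print(cleanWord)  # caf
```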

23
Q

how do you import the regular expressions module?

A

import re

24
Q

what library can you use for natural language processing?

A

nltk (Natural Language Toolkit)

25
Q

what are regular expressions?

A

a pattern language for searching, matching, and splitting text

26
Q

split “text” into words, allowing for Unicode

A

re.compile(r'\W+', re.UNICODE).split(text.lower())
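
A minimal sketch of the split above wrapped in a helper (the name normalizeWords is illustrative). Splitting on \W+ leaves an empty string when the text starts or ends with punctuation, so the helper filters those out; the sample sentence is hypothetical. In Python 3, Unicode matching is already the default for str patterns, so re.UNICODE is harmless but redundant.

```python
import re

WORD_SPLITTER = re.compile(r"\W+", re.UNICODE)


def normalizeWords(text):
    # Lowercase, split on runs of non-word characters, drop empty strings
    return [w for w in WORD_SPLITTER.split(text.lower()) if w]


print(normalizeWords("Hello, World! Ça va?"))  # ['hello', 'world', 'ça', 'va']
```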

27
Q

broadcast a dictionary in Spark

A

nameDict = sc.broadcast(loadMovieNames())
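
loadMovieNames() is not defined in these cards. A hypothetical sketch of such a helper, assuming pipe-delimited lines of the form movieID|title|..., builds the dictionary that would be passed to sc.broadcast(). Unlike the zero-argument call above, this version takes an iterable of lines instead of opening a file, so it runs standalone; the sample lines are hypothetical.

```python
def loadMovieNames(lines):
    # Hypothetical helper: map integer movie ID -> title
    movieNames = {}
    for line in lines:
        fields = line.split("|")
        movieNames[int(fields[0])] = fields[1]
    return movieNames


sample = ["1|Toy Story (1995)|01-Jan-1995", "2|GoldenEye (1995)|01-Jan-1995"]
names = loadMovieNames(sample)
print(names[1])  # Toy Story (1995)
```

Broadcasting ships this dictionary to each executor once, rather than with every task, and tasks read it back through the broadcast variable's .value attribute.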

28
Q

retrieve a value from the broadcast dictionary

A

nameDict.value[keyVal]