PySpark vs pandas Flashcards

(20 cards)

1
Q

How do you create a DataFrame in pandas?

A

pd.DataFrame(data)
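
A minimal sketch, assuming `data` is a dict that maps column names to equal-length lists. The column names and values are illustrative only:

```python
import pandas as pd

# Hypothetical example data: column name -> list of values
data = {"kol": [1, 7, 3], "kol2": [10.0, 20.0, 30.0]}

# pandas builds the DataFrame in memory immediately
df = pd.DataFrame(data)
print(df)
```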

2
Q

How do you create a DataFrame in PySpark?

A

spark.createDataFrame(data)

3
Q

How do you read a CSV file in pandas?

A

pd.read_csv('fil.csv')
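
A self-contained sketch that first writes a small, hypothetical `fil.csv` to a temporary directory. By default `pd.read_csv` uses the first line as the header and infers column types:

```python
import os
import tempfile

import pandas as pd

# Create a small illustrative CSV file in a temporary directory
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "fil.csv")
with open(path, "w") as f:
    f.write("kol,kol2\n1,a\n7,b\n3,c\n")

# The first line becomes the header; dtypes are inferred from the values
df = pd.read_csv(path)
print(df.dtypes)
```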

4
Q

How do you read a CSV file in PySpark?

A

spark.read.csv('fil.csv', header=True, inferSchema=True)

5
Q

How do you filter rows in pandas?

A

df[df['kol'] > 5]
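
`df['kol'] > 5` evaluates to a boolean Series, and indexing with it keeps only the rows where the value is `True`. A sketch with hypothetical data, including how to combine conditions:

```python
import pandas as pd

df = pd.DataFrame({"kol": [1, 7, 3, 9], "kol2": ["a", "b", "c", "d"]})

# Boolean mask: one True/False value per row
mask = df["kol"] > 5
filtered = df[mask]

# Combine conditions with & or |, wrapping each comparison in parentheses
both = df[(df["kol"] > 5) & (df["kol2"] != "d")]
print(filtered)
```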

6
Q

How do you filter rows in PySpark?

A

df.filter(df.kol > 5)

7
Q

How do you add a new column in pandas?

A

df['ny'] = df['kol'] + 1
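
A sketch with hypothetical data. Direct assignment modifies `df` in place, while `assign` returns a new DataFrame (the column name `ny2` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"kol": [1, 7, 3]})

# Assignment adds the column to df itself; values are computed immediately
df["ny"] = df["kol"] + 1

# assign() returns a new DataFrame and leaves df unchanged
df2 = df.assign(ny2=df["kol"] * 2)
print(df2)
```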

8
Q

How do you add a new column in PySpark?

A

df.withColumn('ny', df.kol + 1)

9
Q

How do you group data in pandas?

A

df.groupby('kol').mean()
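
A sketch with hypothetical data. In pandas 2.x, `mean()` on a grouped DataFrame typically raises a `TypeError` when non-numeric columns are present, so the sketch either selects the column to aggregate or passes `numeric_only=True`:

```python
import pandas as pd

df = pd.DataFrame({
    "kol": ["x", "y", "x", "y"],
    "kol2": [1.0, 2.0, 3.0, 6.0],
    "note": ["a", "b", "c", "d"],  # non-numeric column
})

# Mean of one column per group: a Series indexed by the group key
per_group = df.groupby("kol")["kol2"].mean()

# Mean of every numeric column per group, skipping non-numeric ones
all_numeric = df.groupby("kol").mean(numeric_only=True)
print(per_group)
```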

10
Q

How do you group data in PySpark?

A

df.groupBy('kol').agg({'kol2': 'mean'})

11
Q

How do you sort data in pandas?

A

df.sort_values('kol')
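
A sketch with hypothetical data showing ascending, descending, and multi-column sorting. `sort_values` returns a new DataFrame and keeps the original index labels:

```python
import pandas as pd

df = pd.DataFrame({"kol": [3, 1, 2], "kol2": ["c", "a", "b"]})

# Ascending is the default
asc = df.sort_values("kol")

# Descending order
desc = df.sort_values("kol", ascending=False)

# Several keys, each with its own direction
multi = df.sort_values(["kol2", "kol"], ascending=[True, False])
print(asc)
```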

12
Q

How do you sort data in PySpark?

A

df.orderBy('kol')

13
Q

How do you write to CSV in pandas?

A

df.to_csv('fil.csv', index=False)
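
A sketch that writes hypothetical data to a temporary file. Without `index=False`, pandas would also write the row index as an unnamed first column:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"kol": [1, 2], "kol2": ["a", "b"]})
path = os.path.join(tempfile.mkdtemp(), "fil.csv")

# index=False writes only the data columns, not the row index
df.to_csv(path, index=False)

# Read the file back as text to inspect exactly what was written
with open(path) as f:
    content = f.read()
print(content)
```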

14
Q

How do you write to CSV in PySpark?

A

df.write.csv('sti', header=True)

15
Q

How do you check data types in pandas?

A

df.dtypes
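
A sketch with hypothetical data. `df.dtypes` is a Series that maps each column name to its dtype, and `astype` converts a column:

```python
import pandas as pd

df = pd.DataFrame({"kol": [1, 2], "kol2": [1.5, 2.5], "kol3": ["a", "b"]})

# One dtype per column, indexed by column name
print(df.dtypes)

# Convert a column to another dtype
df["kol"] = df["kol"].astype("float64")
```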

16
Q

How do you check data types in PySpark?

A

df.printSchema()

17
Q

How do you display the first rows in pandas?

18
Q

How do you display the first rows in PySpark?

19
Q

Are operations eager or lazy in pandas?

A

Eager (executed immediately)

20
Q

Are operations eager or lazy in PySpark?

A

Lazy (executed only when an action is called)