Pandas Flashcards

1
Q

What is a Series in pandas?

A

A one-dimensional labelled array: values of a single dtype paired with an index of labels.

2
Q

How would you create a Series from a list or a dictionary? How do you specify the index?

A

pd.Series(['a', 'b', 'c'], index=[1, 11, 111])   (list: pass the index explicitly)

pd.Series({1: 'a', 11: 'b', 111: 'c'})   (dict: the keys become the index)
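
As a quick check, the two constructions can be compared directly. This is a minimal sketch; the variable names are made up for illustration.

```python
import pandas as pd

# From a list: values in order, index passed explicitly (illustrative data)
s_list = pd.Series(['a', 'b', 'c'], index=[1, 11, 111])

# From a dict: keys become the index, values become the data
s_dict = pd.Series({1: 'a', 11: 'b', 111: 'c'})

print(s_list.equals(s_dict))  # True: same labels, same values
print(s_list.loc[11])         # b
```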

3
Q

How do you get the values, index, shape and data type from a series?

A

s.values
s.index
s.shape
s.dtype

4
Q

What is the underlying data structure of a Pandas Series?

A

A NumPy ndarray, for the standard dtypes (extension dtypes such as categorical use pandas' own array types).

5
Q

What is the difference between a DataFrame and a Series?

A

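A Series is one-dimensional (a single labelled column); a DataFrame is two-dimensional, a table of Series (one per column, each possibly of a different dtype) sharing one index.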
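
A small sketch of the relationship, using a made-up DataFrame: selecting one column gives back a Series that carries the DataFrame's index.

```python
import pandas as pd

# Illustrative data: two columns with different dtypes
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]}, index=['x', 'y', 'z'])

col = df['a']                       # one column of the DataFrame
print(type(col).__name__)           # Series
print(col.index.equals(df.index))   # True: the column shares df's index
print(df.ndim, col.ndim)            # 2 1
```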

6
Q

Construct a DataFrame from a list, a Series, a list of lists, a list of dictionaries, a dictionary of lists, and a dictionary of Series.

A

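pd.DataFrame(…), passing any one of:
['a', 'b', 'c']   (list: a single column)
s   (Series: a single column)
[['a', 1, True], ['b', 2, False]]   (list of lists: one inner list per row)
[{'a': 1, 'b': 2}, {'a': 11, 'b': 22}]   (list of dicts: one dict per row, keys become columns)
{'a': [1, 2, 3], 'b': [True, True, False]}   (dict of lists: keys become columns)
{'a': s1, 'b': s2}   (dict of Series: keys become columns, rows aligned on the Series' index)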
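
The six constructions can be run side by side to see the shape each one produces. A minimal sketch; `s1`, `s2` and the dictionary labels are illustrative.

```python
import pandas as pd

# Illustrative Series for the Series-based constructions
s1 = pd.Series([1, 2, 3])
s2 = pd.Series([True, True, False])

frames = {
    'list': pd.DataFrame(['a', 'b', 'c']),                              # one column named 0
    'series': pd.DataFrame(s1),                                          # unnamed Series -> column 0
    'list of lists': pd.DataFrame([['a', 1, True], ['b', 2, False]]),   # one inner list per row
    'list of dicts': pd.DataFrame([{'a': 1, 'b': 2}, {'a': 11, 'b': 22}]),  # keys -> columns
    'dict of lists': pd.DataFrame({'a': [1, 2, 3], 'b': [True, True, False]}),
    'dict of Series': pd.DataFrame({'a': s1, 'b': s2}),                  # aligned on the index
}

for name, frame in frames.items():
    print(f'{name}: shape={frame.shape}, columns={list(frame.columns)}')
```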

7
Q

How do you get the index, columns, values, shape from a DataFrame?

A

df.index
df.columns
df.values
df.shape

8
Q

How would you create a pandas Index object?

A

pd.Index([1,2,3])

9
Q

What is the difference between .loc and .iloc?

A

.loc[:, :] selects by index and column labels; label slices include both endpoints.
.iloc[:, :] selects by integer position in df.index/df.columns; slices exclude the upper bound, as in plain Python.
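
The endpoint difference is easiest to see on a small frame whose labels differ from its positions. A sketch with made-up data:

```python
import pandas as pd

# Illustrative frame with string labels, so labels and positions differ
df = pd.DataFrame(
    {'x': [10, 20, 30, 40], 'y': [1, 2, 3, 4]},
    index=['a', 'b', 'c', 'd'],
)

# .loc is label-based: 'a':'c' INCLUDES the end label 'c'
by_label = df.loc['a':'c', 'x']
print(list(by_label))      # [10, 20, 30]

# .iloc is position-based: 0:2 EXCLUDES position 2
by_position = df.iloc[0:2, 0]
print(list(by_position))   # [10, 20]
```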

10
Q

How do you do Boolean indexing?

A

df[mask]

where mask is a boolean Series aligned with df's index (the same length as a column), e.g. df[df['a'] > 0]
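
A minimal sketch with an illustrative DataFrame, showing a single mask and a combined condition (each condition parenthesised and joined with `&`).

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'name': ['ann', 'bob', 'cat'], 'age': [31, 17, 45]})

# The comparison produces a boolean Series aligned with df's index
mask = df['age'] >= 18
adults = df[mask]                       # keeps rows where mask is True
print(adults['name'].tolist())          # ['ann', 'cat']

# Combine conditions with & (and), | (or), ~ (not); parenthesise each one
both = df[(df['age'] > 20) & (df['name'] != 'cat')]
print(both['name'].tolist())            # ['ann']
```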

11
Q

How do you select just the desired columns in pandas?

A

df[list_of_cols], e.g. df[['a', 'b']]

12
Q

What is a ufunc? Why should you use them?

A

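Universal functions (ufuncs) are NumPy functions such as np.sqrt or np.add that operate element-wise over whole arrays in compiled code. Applied to pandas objects they keep the index, and binary ufuncs on two Series align on index labels first. Use them instead of Python loops wherever possible: they are much faster.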
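
A sketch of both behaviours on illustrative Series: a unary ufunc keeps the index, and a binary ufunc aligns labels, producing NaN where a label exists in only one operand.

```python
import numpy as np
import pandas as pd

# Illustrative data
s = pd.Series([1.0, 4.0, 9.0], index=['a', 'b', 'c'])

# Unary ufunc: element-wise, index preserved
roots = np.sqrt(s)
print(roots.tolist())        # [1.0, 2.0, 3.0]

# Binary ufunc: labels are aligned before adding, not positions
t = pd.Series([10.0, 20.0], index=['b', 'a'])
total = np.add(s, t)         # a: 1 + 20, b: 4 + 10, c: NaN (no 'c' in t)
print(total)
```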

13
Q

What is ‘broadcasting’ in pandas?

A

A scalar (or lower-dimensional operand) is stretched to match the other operand's shape: s + 3 behaves like s + pd.Series([3, 3, …], index=s.index).
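
A minimal sketch on illustrative data: scalar broadcasting over a Series, and a row Series broadcast down every row of a DataFrame (by default the Series is matched to the DataFrame's columns).

```python
import pandas as pd

# Illustrative Series
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

plus_scalar = s + 3                                      # 3 broadcast to every element
plus_series = s + pd.Series([3, 3, 3], index=s.index)    # explicit equivalent
print(plus_scalar.equals(plus_series))                   # True

# DataFrame minus one of its rows: the row is broadcast down all rows
df = pd.DataFrame({'x': [1, 2], 'y': [10, 20]})
centred = df - df.iloc[0]
print(centred.values.tolist())                           # [[0, 0], [1, 10]]
```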

14
Q

How do you apply a function element-wise to a series or DataFrame?

A

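s.map(func) or s.apply(func) for a Series
df.map(func) for a DataFrame (df.applymap(func) in pandas < 2.1)
Note that df.apply(func) is not element-wise: it passes whole columns (or rows, with axis=1) to func.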
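
A sketch contrasting element-wise calls with `DataFrame.apply`. Because the DataFrame element-wise method was renamed from `applymap` to `map` in pandas 2.1, the code picks whichever one exists; the data is illustrative.

```python
import pandas as pd

# Illustrative data
s = pd.Series([1, 2, 3])
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Series: the function is called once per element
print(s.map(lambda v: v * 10).tolist())            # [10, 20, 30]

# DataFrame: DataFrame.map exists from pandas 2.1, applymap before that
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
squared = elementwise(lambda v: v ** 2)
print(squared.values.tolist())                     # [[1, 9], [4, 16]]

# DataFrame.apply passes whole columns (axis=0) to the function
col_sums = df.apply(lambda col: col.sum())
print(col_sums.tolist())                           # [3, 7]
```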

15
Q

Give two ways to handle missing data, and explain when you might use each.

A

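Fill: df.fillna(value), df.ffill() / df.bfill() (older code: df.fillna(method='ffill'/'bfill')), df.interpolate(), or df.fillna(df.mean())
Drop: df.dropna(axis=0) drops rows containing missing values; df.dropna(axis=1) drops such columns
Fill when you cannot afford to lose rows or the gaps can be estimated sensibly (e.g. forward-filling a time series); drop when only a few rows are affected, or when a column is mostly empty.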
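
A sketch of the fill and drop options on a small illustrative frame, where column `x` has a gap and column `y` is complete.

```python
import numpy as np
import pandas as pd

# Illustrative data: one gap in column x
df = pd.DataFrame({'x': [1.0, np.nan, 3.0], 'y': [4.0, 5.0, 6.0]})

filled_const = df.fillna(0)          # replace NaN with a constant
filled_ffill = df.ffill()            # carry the last valid value forward
filled_interp = df.interpolate()     # linear interpolation between neighbours
filled_mean = df.fillna(df.mean())   # per-column mean imputation

dropped_rows = df.dropna(axis=0)     # drop rows containing NaN
dropped_cols = df.dropna(axis=1)     # drop columns containing NaN

print(filled_ffill['x'].tolist())    # [1.0, 1.0, 3.0]
print(filled_interp['x'].tolist())   # [1.0, 2.0, 3.0]
print(dropped_rows.shape, dropped_cols.shape)  # (2, 2) (3, 1)
```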

16
Q

Give two ways of combining DataFrames. How do you specify whether they go side by side or on top of each other?

A

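pd.concat(list_of_dfs, axis=0) stacks them on top of each other (rows appended, columns matched by name); axis=1 places them side by side (columns appended, rows matched on the index).
df1.merge(df2, …) or df1.join(df2) combines them side by side by matching key columns or the index.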
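
A sketch of the two `concat` directions with illustrative frames. `ignore_index=True` is used in the vertical case so the stacked rows get a fresh 0..n-1 index instead of repeating labels.

```python
import pandas as pd

# Illustrative frames
top = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
bottom = pd.DataFrame({'a': [5], 'b': [6]})
right = pd.DataFrame({'c': [7, 8]})

# axis=0: on top of each other, columns matched by name
stacked = pd.concat([top, bottom], axis=0, ignore_index=True)
print(stacked.shape)                    # (3, 2)

# axis=1: side by side, rows matched on the index
side_by_side = pd.concat([top, right], axis=1)
print(side_by_side.columns.tolist())    # ['a', 'b', 'c']
```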

17
Q

What arguments does the df.merge() function take?

A

pd.merge(df1, df2, …) or df1.merge(df2, …), with on='col' (or left_on='col1' / left_index=True, plus the matching right_on / right_index) and how='inner'/'left'…
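
A sketch of a merge where the key columns have different names, comparing an inner and a left join. The tables and column names are made up for illustration.

```python
import pandas as pd

# Illustrative tables: customer 3 has no row in customers
orders = pd.DataFrame({'cust_id': [1, 2, 2, 3], 'amount': [10, 20, 30, 40]})
customers = pd.DataFrame({'id': [1, 2, 4], 'name': ['ann', 'bob', 'dee']})

# Different key names on each side, so use left_on / right_on
inner = orders.merge(customers, left_on='cust_id', right_on='id', how='inner')
left = orders.merge(customers, left_on='cust_id', right_on='id', how='left')

print(len(inner), len(left))        # 3 4
print(left['name'].isna().sum())    # 1: the unmatched order keeps NaN
```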

18
Q

What is meant by split-apply-combine?

A

This is how pandas carries out a groupby: split the data into groups by the values of one or more key columns, apply a function to each group (e.g. an aggregation that reduces each group to a single row), then combine the per-group results into a new Series or DataFrame.
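
To make the three steps concrete, this sketch performs them by hand on illustrative data and checks the result against a single `groupby` call.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'team': ['x', 'y', 'x', 'y'], 'score': [1, 2, 3, 4]})

# Split: one sub-frame per distinct key value
groups = {key: sub for key, sub in df.groupby('team')}

# Apply: reduce each group to a single value
totals = {key: sub['score'].sum() for key, sub in groups.items()}

# Combine: gather the per-group results into one Series
manual = pd.Series(totals, name='score')

# groupby does all three steps in one call
builtin = df.groupby('team')['score'].sum()
print(manual.to_dict() == builtin.to_dict())   # True
```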

19
Q

How would you group the DataFrame based on a column and apply multiple aggregation functions to it?

A

df.groupby(column).agg({col1: agg1, …}), where each agg can be a name such as 'mean' or a list of names
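
A sketch of the dict form, plus pandas' named-aggregation form, which yields flat column names. Column names and output labels are illustrative.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({
    'team': ['x', 'x', 'y', 'y'],
    'score': [1, 3, 5, 7],
    'minutes': [10, 20, 30, 40],
})

# Dict form: column -> aggregation name or list of names (MultiIndex columns)
summary = df.groupby('team').agg({'score': ['mean', 'max'], 'minutes': 'sum'})
print(summary.loc['y', ('score', 'mean')])   # 6.0

# Named aggregation: output_name=(column, aggregation)
named = df.groupby('team').agg(avg_score=('score', 'mean'), total_min=('minutes', 'sum'))
print(named.columns.tolist())                # ['avg_score', 'total_min']
```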

20
Q

How would you implement the ‘Having’ statement from SQL in Pandas?

A

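df.groupby(…).filter(lambda g: condition), which keeps every row of the groups for which the function returns True, e.g. lambda g: len(g) >= 2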
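
A sketch on illustrative data. `filter` returns the original rows of the qualifying groups; aggregating first and then masking is closer to a literal SQL `GROUP BY ... HAVING`, which returns one row per group.

```python
import pandas as pd

# Illustrative data: team y has only one row
df = pd.DataFrame({'team': ['x', 'x', 'y', 'z', 'z', 'z'], 'score': [1, 2, 9, 3, 4, 5]})

# Keep all rows of groups with at least 2 rows
kept = df.groupby('team').filter(lambda g: len(g) >= 2)
print(sorted(set(kept['team'])))               # ['x', 'z']

# GROUP BY team HAVING COUNT(*) >= 2: aggregate, then mask the result
counts = df.groupby('team').size()
print(counts[counts >= 2].index.tolist())      # ['x', 'z']
```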

21
Q

How can we join aggregate statistics back to our DataFrame?

A

df['new_col'] = df[key].map(df.groupby(key)[col].agg(func)) or

df['new_col'] = df.groupby(key)[col].transform(func)   (returns one value per original row, aligned with df's index)
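
Both approaches can be checked against each other. A minimal sketch using an illustrative `team`/`score` frame in place of the placeholder `key`/`col`.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'team': ['x', 'x', 'y'], 'score': [1, 3, 10]})

# transform: one value per original row, aligned on df's index
df['team_mean'] = df.groupby('team')['score'].transform('mean')

# Alternative: aggregate per group, then map each row's key to its group value
group_means = df.groupby('team')['score'].mean()
df['team_mean_mapped'] = df['team'].map(group_means)

print(df['team_mean'].tolist())   # [2.0, 2.0, 10.0]
```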

22
Q

How is a pivot table useful? How would you implement it?

A

Useful for displaying an aggregation over two grouping keys in one table: one key along the rows, the other along the columns. df.pivot_table(index=, columns=, values=, aggfunc=…)
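
A sketch with illustrative sales data: regions along the rows, products along the columns, and total units in each cell.

```python
import pandas as pd

# Illustrative data
sales = pd.DataFrame({
    'region': ['north', 'north', 'south', 'south', 'south'],
    'product': ['a', 'b', 'a', 'a', 'b'],
    'units': [5, 3, 2, 4, 7],
})

# Rows = region, columns = product, cell = sum of units for that pair
table = sales.pivot_table(index='region', columns='product', values='units', aggfunc='sum')
print(table)
print(int(table.loc['south', 'a']))   # 6 (2 + 4)
```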

23
Q

How do you change the index of a DataFrame?

A

df.set_index(…, inplace=True)

24
Q

How do you change the sampling frequency of the data?

A

df.resample(rule).agg(func), e.g. df.resample('W').mean(); this needs a DatetimeIndex (or a datetime column passed as on=…)
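
A sketch downsampling two weeks of made-up daily data to weekly bins. 2024-01-01 is a Monday, so with the default `'W'` rule (weeks ending Sunday) the 14 days fall into exactly two bins.

```python
import pandas as pd

# Illustrative data: 14 consecutive daily values 0..13
idx = pd.date_range('2024-01-01', periods=14, freq='D')
daily = pd.Series(range(14), index=idx)

# Downsample to weekly bins and aggregate each bin
weekly = daily.resample('W').agg(['sum', 'mean'])
print(weekly)
print(weekly['sum'].tolist())   # [21, 70]
```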

25
Q

How would you create a 30-day rolling average of a DataFrame?

A

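df[…].rolling(30).mean()   (window of the last 30 rows, i.e. 30 days only for gap-free daily data)
df[…].rolling('30D').mean()   (time-based 30-day window; needs a DatetimeIndex)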
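
A sketch comparing the count-based and time-based windows on made-up, gap-free daily data. The count-based window needs 30 observations before it produces a value; the time-based one defaults to `min_periods=1`. Once both are full, they agree.

```python
import pandas as pd

# Illustrative data: 60 gap-free daily values 0..59
idx = pd.date_range('2024-01-01', periods=60, freq='D')
prices = pd.Series(range(60), index=idx, dtype=float)

by_rows = prices.rolling(30).mean()     # last 30 rows
by_time = prices.rolling('30D').mean()  # last 30 days of timestamps

print(by_rows.isna().sum())   # 29: the first 29 windows are incomplete
print(by_rows.iloc[29])       # 14.5: mean of 0..29
print(by_time.iloc[0])        # 0.0: uses whatever data the window contains
```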

26
Q

Give examples of other time-series methods.

A

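.shift(1)   (lag: moves values down one period; the first becomes NaN)

.diff(1)   (first difference: each value minus the previous one, i.e. s - s.shift(1))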
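
A sketch on an illustrative Series confirming that `diff(1)` is the same as subtracting the lagged series.

```python
import pandas as pd

# Illustrative data
s = pd.Series([10.0, 12.0, 15.0, 11.0])

lagged = s.shift(1)   # [NaN, 10.0, 12.0, 15.0]
change = s.diff(1)    # [NaN, 2.0, 3.0, -4.0]

print(change.tolist()[1:])          # [2.0, 3.0, -4.0]
print(change.equals(s - lagged))    # True
```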