Pandas Groupby Flashcards

1
Q

What is the purpose of the pandas.DataFrame.groupby method?

A

Groupby is used to split the data into groups based on some criteria. It involves a combination of splitting the data, applying a function, and combining the results.
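
As a rough illustration of the split, apply, combine sequence described above, the sketch below groups a small made-up table and sums each group. The column names ‘city’ and ‘sales’ and all values are hypothetical.

```python
import pandas as pd

# Hypothetical sales data, used only for illustration.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome", "Rome"],
    "sales": [10, 20, 5, 15, 25],
})

# Split the rows by city, apply sum to each group,
# and combine the per-group results into one Series.
totals = df.groupby("city")["sales"].sum()
print(totals)
```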

2
Q

What does the ‘by’ parameter in pandas.DataFrame.groupby do?

A

The ‘by’ parameter is used to determine the groups for the groupby operation. It can be a mapping, function, label, pd.Grouper or a list of such.

3
Q

How does the ‘axis’ parameter in pandas.DataFrame.groupby work?

A

The ‘axis’ parameter determines whether grouping is done along rows (0 or ‘index’) or columns (1 or ‘columns’). It is deprecated as of pandas 2.1; to group columns, transpose the frame and group its rows instead.

4
Q

What does the ‘level’ parameter in pandas.DataFrame.groupby do?

A

The ‘level’ parameter is used when the axis is a MultiIndex (hierarchical). It specifies the level(s) to be grouped by.

5
Q

How does the ‘as_index’ parameter in pandas.DataFrame.groupby work?

A

The ‘as_index’ parameter, when set to True, returns an object with group labels as the index for aggregated output. It’s only relevant for DataFrame input.

6
Q

How does the ‘sort’ parameter in pandas.DataFrame.groupby work?

A

The ‘sort’ parameter, when set to True, sorts the group keys. Turning this off might result in better performance.

7
Q

How does the ‘group_keys’ parameter in pandas.DataFrame.groupby work?

A

The ‘group_keys’ parameter, when set to True, adds the group keys to the index to identify pieces when calling apply. Before pandas 2.0, leaving it unset omitted the keys when the result’s index (and column) labels matched the input’s and included them otherwise; since 2.0 the default is True.

8
Q

How does the ‘observed’ parameter in pandas.DataFrame.groupby work?

A

The ‘observed’ parameter applies only if any of the groupers are Categoricals. If True, only observed values for categorical groupers are shown. If False, all values for categorical groupers are shown.
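
A small sketch of the difference, assuming a made-up categorical column ‘size’ in which the category "L" is declared but never occurs. ‘observed’ is passed explicitly in both calls so the example does not depend on the default.

```python
import pandas as pd

# Hypothetical data: "L" is a valid category but has no rows.
df = pd.DataFrame({
    "size": pd.Categorical(["S", "M", "S"], categories=["S", "M", "L"]),
    "qty": [1, 2, 3],
})

# observed=False keeps every declared category, including the empty one.
all_cats = df.groupby("size", observed=False)["qty"].sum()

# observed=True keeps only categories that actually appear in the data.
seen_only = df.groupby("size", observed=True)["qty"].sum()

print(all_cats)
print(seen_only)
```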

9
Q

What does the ‘dropna’ parameter in pandas.DataFrame.groupby do?

A

The ‘dropna’ parameter, when set to True, drops the rows (or columns) whose group key is NA. If False, NA is treated as a group key of its own.
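
The effect can be seen on a tiny made-up frame in which two rows have a missing key. The column names ‘key’ and ‘val’ are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing group keys in rows 1 and 3.
df = pd.DataFrame({"key": ["a", np.nan, "a", np.nan], "val": [1, 2, 3, 4]})

# dropna=True: rows with an NA key are left out of every group.
dropped = df.groupby("key", dropna=True)["val"].sum()

# dropna=False: NA becomes a group key of its own.
kept = df.groupby("key", dropna=False)["val"].sum()

print(dropped)
print(kept)
```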

10
Q

What does pandas.DataFrame.groupby return?

A

pandas.DataFrame.groupby returns a DataFrameGroupBy object that contains information about the groups.

11
Q

What is the pd.Grouper in pandas.DataFrame.groupby?

A

pd.Grouper is a class in pandas that allows more flexible groupby instructions. It can be used with the ‘by’ parameter in pandas.DataFrame.groupby.

12
Q

How to group data by a single label in pandas?

A

To group data by a single label, pass the label as a string to the ‘by’ parameter of the pandas.DataFrame.groupby method.

13
Q

How to group data by multiple labels in pandas?

A

To group data by multiple labels, pass the labels as a list to the ‘by’ parameter of the pandas.DataFrame.groupby method.
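
A minimal sketch of grouping by two columns at once, using hypothetical ‘region’, ‘year’, and ‘sales’ columns. Each distinct (region, year) pair forms one group.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "year": [2023, 2024, 2023, 2023],
    "sales": [1, 2, 3, 4],
})

# A list of labels produces one group per combination of values;
# the result is indexed by a MultiIndex of (region, year).
by_both = df.groupby(["region", "year"])["sales"].sum()
print(by_both)
```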

14
Q

What does the ‘as_index=False’ option in pandas.DataFrame.groupby do?

A

‘as_index=False’ in pandas.DataFrame.groupby provides a SQL-style grouped output where group labels are not set as the index for the aggregated output.
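
For comparison, the sketch below aggregates the same made-up frame with and without ‘as_index=False’. The column names ‘team’ and ‘pts’ are hypothetical.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"team": ["x", "y", "x"], "pts": [1, 2, 3]})

# Default (as_index=True): the group labels become the index.
indexed = df.groupby("team").sum()

# as_index=False: the group labels stay as an ordinary column,
# and the rows get a default integer index.
flat = df.groupby("team", as_index=False).sum()

print(indexed)
print(flat)
```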

15
Q

How to use a function with the ‘by’ parameter in pandas.DataFrame.groupby?

A

To use a function with the ‘by’ parameter in pandas.DataFrame.groupby, pass the function which will be called on each value of the object’s index.
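
As a sketch of this, the example below groups rows by the first letter of each index label. The index labels and the ‘qty’ column are made up for illustration.

```python
import pandas as pd

# Hypothetical data indexed by product name.
df = pd.DataFrame({"qty": [3, 4, 5]}, index=["apple", "avocado", "banana"])

# The function receives each index label in turn; its return value
# is the group key for that row.
by_initial = df.groupby(lambda label: label[0])["qty"].sum()
print(by_initial)
```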

16
Q

What does the ‘sort=False’ option in pandas.DataFrame.groupby do?

A

Setting ‘sort=False’ in pandas.DataFrame.groupby can improve performance by not sorting group keys. It does not influence the order of observations within each group.
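
A short sketch of the ordering difference on a made-up frame whose keys first appear in the order "b", then "a". It does not measure performance.

```python
import pandas as pd

# Hypothetical data: key "b" appears before key "a".
df = pd.DataFrame({"key": ["b", "a", "b", "a"], "val": [1, 2, 3, 4]})

# Default (sort=True): group keys come out sorted.
sorted_keys = df.groupby("key")["val"].sum()

# sort=False: group keys come out in order of first appearance.
first_seen = df.groupby("key", sort=False)["val"].sum()

# Either way, rows inside a group keep their original order.
rows_in_b = df.groupby("key", sort=False).get_group("b")["val"].tolist()

print(sorted_keys)
print(first_seen)
```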

17
Q

How to use a pd.Grouper with the ‘by’ parameter in pandas.DataFrame.groupby?

A

To use a pd.Grouper with the ‘by’ parameter in pandas.DataFrame.groupby, pass the pd.Grouper specifying any additional parameters like ‘key’, ‘level’, or ‘freq’ as needed.
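
One possible use, sketched under the assumption of a made-up datetime column ‘when’: a pd.Grouper with ‘key’ and ‘freq’ buckets the rows by calendar day.

```python
import pandas as pd

# Hypothetical timestamped amounts, used only for illustration.
df = pd.DataFrame({
    "when": pd.to_datetime(
        ["2024-01-01 09:00", "2024-01-01 17:00", "2024-01-02 12:00"]
    ),
    "amount": [10, 20, 5],
})

# key names the column to group on; freq="D" makes one bucket per day.
daily = df.groupby(pd.Grouper(key="when", freq="D"))["amount"].sum()
print(daily)
```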

18
Q

How to group by a level in a MultiIndex DataFrame in pandas?

A

To group by a level in a MultiIndex DataFrame in pandas, pass the level (as integer or level name) to the ‘level’ parameter of the pandas.DataFrame.groupby method.

19
Q

What does the ‘group_keys=False’ option in pandas.DataFrame.groupby do?

A

Setting ‘group_keys=False’ in pandas.DataFrame.groupby will not add group keys to the index when calling apply.
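
The sketch below contrasts the two settings using a made-up helper, demean, that returns a result with the same index as its input. Both values are passed explicitly, and a single column is selected before calling apply so that only the values reach the function.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 3, 10]})


def demean(values):
    """Hypothetical helper: subtract the group's mean from each value."""
    return values - values.mean()


# group_keys=True: the group key is added as an extra outer index level.
with_keys = df.groupby("key", group_keys=True)["val"].apply(demean)

# group_keys=False: the result keeps only the original row index.
without_keys = df.groupby("key", group_keys=False)["val"].apply(demean)

print(with_keys)
print(without_keys)
```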

20
Q

What does the ‘observed=True’ option in pandas.DataFrame.groupby do?

A

Setting ‘observed=True’ in pandas.DataFrame.groupby will show only observed values for categorical groupers.

21
Q

How to group by multiple levels in a MultiIndex DataFrame in pandas?

A

To group by multiple levels in a MultiIndex DataFrame in pandas, pass the levels (as a list of integers or level names) to the ‘level’ parameter of the pandas.DataFrame.groupby method.
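
A minimal sketch with a made-up three-level index (‘region’, ‘year’, ‘quarter’), grouping first by one level and then by two, by name and by position.

```python
import pandas as pd

# Hypothetical MultiIndex data, used only for illustration.
index = pd.MultiIndex.from_tuples(
    [("N", 2023, "q1"), ("N", 2023, "q2"), ("S", 2023, "q1"), ("S", 2024, "q1")],
    names=["region", "year", "quarter"],
)
df = pd.DataFrame({"sales": [1, 2, 3, 4]}, index=index)

# A single level, given by name.
by_region = df.groupby(level="region")["sales"].sum()

# Several levels, given as a list of names or of integer positions.
by_region_year = df.groupby(level=["region", "year"])["sales"].sum()
by_position = df.groupby(level=[0, 1])["sales"].sum()

print(by_region)
print(by_region_year)
```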

22
Q

How to use a mapping with the ‘by’ parameter in pandas.DataFrame.groupby?

A

To use a mapping with the ‘by’ parameter in pandas.DataFrame.groupby, pass the mapping (dict or Series). The mapping’s values will be used to determine the groups.
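
For instance, a dict that maps each index label to a category could be used like this. The labels, the ‘qty’ column, and the mapping itself are hypothetical.

```python
import pandas as pd

# Hypothetical data indexed by product name.
df = pd.DataFrame({"qty": [1, 2, 3]}, index=["apple", "carrot", "pear"])

# The dict keys are matched against the index labels;
# the dict values become the group keys.
kind = {"apple": "fruit", "carrot": "vegetable", "pear": "fruit"}
by_kind = df.groupby(kind)["qty"].sum()
print(by_kind)
```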

23
Q

What does the ‘dropna=False’ option in pandas.DataFrame.groupby do?

A

Setting ‘dropna=False’ in pandas.DataFrame.groupby will treat NA values as the key in groups.

24
Q

How to use a list or ndarray with the ‘by’ parameter in pandas.DataFrame.groupby?

A

To use a list or ndarray with the ‘by’ parameter in pandas.DataFrame.groupby, pass the list or ndarray of length equal to the selected axis. The values are used as-is to determine the groups.
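
A small sketch, assuming a made-up array of labels with one entry per row. The label at position i decides the group of row i.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"val": [1, 2, 3, 4]})

# One label per row, in row order; the array is not a column of df.
labels = np.array(["odd", "even", "odd", "even"])
by_labels = df.groupby(labels)["val"].sum()
print(by_labels)
```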

25
Q

How to use a label or list of labels with the ‘by’ parameter in pandas.DataFrame.groupby?

A

To use a label or list of labels with the ‘by’ parameter in pandas.DataFrame.groupby, pass the label(s). They are used to group by the columns in self.

26
Q

How to iterate through groups in pandas DataFrameGroupBy object?

A

To iterate through groups in a pandas DataFrameGroupBy object, use a ‘for’ loop. Each iteration yields a tuple whose first item is the group key and whose second item is the group’s data.

27
Q

How to select a group in pandas DataFrameGroupBy object?

A

To select a group in a pandas DataFrameGroupBy object, use the ‘get_group’ method with the group key as argument.
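
The sketch below shows ‘get_group’ next to plain iteration over the same grouped object, on a made-up frame with hypothetical ‘key’ and ‘val’ columns.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"key": ["a", "b", "a"], "val": [1, 2, 3]})
grouped = df.groupby("key")

# Iterating yields (group key, sub-DataFrame) pairs.
sizes = {}
for key, group in grouped:
    sizes[key] = len(group)

# get_group returns the rows of one group directly.
group_a = grouped.get_group("a")

print(sizes)
print(group_a)
```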

28
Q

How to apply a function to each group in pandas DataFrameGroupBy object?

A

To apply a function to each group in a pandas DataFrameGroupBy object, use the ‘apply’ method with the function as argument.

29
Q

How to compute aggregate functions on groups in pandas DataFrameGroupBy object?

A

To compute aggregate functions on groups in a pandas DataFrameGroupBy object, use methods like ‘sum’, ‘mean’, ‘max’, ‘min’, etc.

30
Q

How does pandas handle categorical data with the ‘groupby’ method?

A

In pandas, if the ‘groupby’ method includes categorical data, it handles them based on the ‘observed’ parameter. If ‘observed’ is True, it shows only observed values. If ‘observed’ is False, it shows all values.

31
Q

What happens when both ‘by’ and ‘level’ parameters are specified in pandas.DataFrame.groupby?

A

In pandas.DataFrame.groupby, ‘by’ and ‘level’ are alternative ways to define the groups, so specify one or the other and do not rely on what happens when both are passed. Passing neither raises a TypeError.

32
Q

What happens when ‘as_index’ is set to False in pandas.DataFrame.groupby?

A

When ‘as_index’ is set to False in pandas.DataFrame.groupby, the group labels will not be used as the index of the resulting DataFrame. Instead, they are kept as regular columns and a default integer index is used.

33
Q

How does ‘group_keys’ parameter affect the result of ‘apply’ method in pandas.DataFrame.groupby?

A

The ‘group_keys’ parameter in pandas.DataFrame.groupby, when set to True, adds group keys to index when the ‘apply’ method is called. It helps in identifying pieces.

34
Q

What is the default value of ‘axis’ parameter in pandas.DataFrame.groupby?

A

The default value of ‘axis’ parameter in pandas.DataFrame.groupby is 0 or ‘index’, which means the operation is performed along rows.

35
Q

What is the default value of ‘as_index’ parameter in pandas.DataFrame.groupby?

A

The default value of ‘as_index’ parameter in pandas.DataFrame.groupby is True, which means the group labels will be set as the index of the resulting DataFrame.

36
Q

What is the default value of ‘sort’ parameter in pandas.DataFrame.groupby?

A

The default value of ‘sort’ parameter in pandas.DataFrame.groupby is True, which means the group keys will be sorted.

37
Q

What is the default value of ‘group_keys’ parameter in pandas.DataFrame.groupby?

A

The default value of ‘group_keys’ parameter in pandas.DataFrame.groupby is True, which means group keys will be added to index when calling ‘apply’.

38
Q

What is the default value of ‘observed’ parameter in pandas.DataFrame.groupby?

A

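The default value of ‘observed’ parameter in pandas.DataFrame.groupby has historically been False, which means all values for categorical groupers will be shown. Relying on that default is deprecated as of pandas 2.1 and it is slated to change to True, so passing ‘observed’ explicitly is the safer choice.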

39
Q

What is the default value of ‘dropna’ parameter in pandas.DataFrame.groupby?

A

The default value of ‘dropna’ parameter in pandas.DataFrame.groupby is True, which means NA values and the corresponding row/column will be dropped if group keys contain NA values.

40
Q

What happens if ‘by’ parameter in pandas.DataFrame.groupby is a function?

A

If ‘by’ parameter in pandas.DataFrame.groupby is a function, it’s called on each value of the object’s index.

41
Q

What happens if ‘by’ parameter in pandas.DataFrame.groupby is a dict or Series?

A

If ‘by’ parameter in pandas.DataFrame.groupby is a dict or Series, the Series or dict VALUES will be used to determine the groups.

42
Q

What happens if ‘by’ parameter in pandas.DataFrame.groupby is a list or ndarray?

A

If ‘by’ parameter in pandas.DataFrame.groupby is a list or ndarray, the values are used as-is to determine the groups.

43
Q

What happens if ‘by’ parameter in pandas.DataFrame.groupby is a label or list of labels?

A

If ‘by’ parameter in pandas.DataFrame.groupby is a label or list of labels, they are used to group by the columns in self.

44
Q

How is ‘sort’ parameter in pandas.DataFrame.groupby related to performance?

A

Turning off the ‘sort’ parameter in pandas.DataFrame.groupby (i.e., setting ‘sort’ to False) can improve performance by avoiding the sort operation on group keys.

45
Q

What does ‘group_keys’ parameter in pandas.DataFrame.groupby affect when calling ‘apply’?

A

The ‘group_keys’ parameter in pandas.DataFrame.groupby affects whether group keys are added to the index when calling ‘apply’. It is especially relevant when the applied function returns a like-indexed (transform-style) result.

46
Q

How does ‘observed’ parameter in pandas.DataFrame.groupby affect categorical groupers?

A

The ‘observed’ parameter in pandas.DataFrame.groupby decides whether to show only observed values for categorical groupers (if True), or show all values for categorical groupers (if False).

47
Q

What happens when ‘dropna’ parameter in pandas.DataFrame.groupby is True?

A

When ‘dropna’ parameter in pandas.DataFrame.groupby is True, and if group keys contain NA values, those NA values along with the corresponding row/column will be dropped.

48
Q

What is the significance of ‘apply’ method in pandas.DataFrame.groupby?

A

The ‘apply’ method in pandas.DataFrame.groupby is used to apply a certain function to each group of values. The function should take a DataFrame, and return either a pandas object (e.g., DataFrame, Series) or a scalar; the combine operation will be tailored to the type of output returned.

49
Q

What is the significance of ‘agg’ or ‘aggregate’ method in pandas.DataFrame.groupby?

A

The ‘agg’ or ‘aggregate’ method in pandas.DataFrame.groupby is used to apply one or multiple functions to the grouped data. The functions can be standard aggregation functions like ‘mean’, ‘sum’, etc., or custom ones.
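
One way to combine standard and custom aggregations is named aggregation, sketched below. The output names ‘total’, ‘average’, and ‘spread’ are hypothetical, as are the data.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 3, 10]})

# Each keyword names an output column; the value is either the name of
# a built-in aggregation or a custom function applied to each group.
summary = df.groupby("key")["val"].agg(
    total="sum",
    average="mean",
    spread=lambda values: values.max() - values.min(),
)
print(summary)
```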

50
Q

What is the significance of ‘transform’ method in pandas.DataFrame.groupby?

A

The ‘transform’ method in pandas.DataFrame.groupby is used to perform a transformation that returns a like-indexed object. The function must return a result that is either the same size as the group chunk or broadcastable to the size of the group chunk, so a per-group scalar is broadcast back to every row of its group.
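
A brief sketch of both cases on made-up data: a per-group scalar that is broadcast to every row, and a same-size result computed row by row.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 3, 10]})

# A per-group scalar (the mean) is broadcast to each row of its group.
group_mean = df.groupby("key")["val"].transform("mean")

# A same-size result: each value minus its group's mean.
centered = df.groupby("key")["val"].transform(lambda values: values - values.mean())

# Both results share the original row index, so they can be assigned
# back to df as new columns.
print(group_mean)
print(centered)
```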

51
Q

What is the significance of ‘filter’ method in pandas.DataFrame.groupby?

A

The ‘filter’ method in pandas.DataFrame.groupby is used to discard some groups, according to a group-wise computation that evaluates True or False. This method is used to filter the data by a condition that is applied on the group level.
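
As an illustration, the sketch below keeps only the groups that have at least two rows. The threshold and the data are made up.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 3, 10]})

# The function is evaluated once per group and must return True or False;
# rows of groups that return False are discarded.
big_groups = df.groupby("key").filter(lambda group: len(group) >= 2)
print(big_groups)
```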