Pandas Flashcards

(46 cards)

1
Q

Filter df for when Column is null

A

df[ df.Column.isnull() ]

2
Q

Filter df for when Column is not null

A

df[ df.Column.notna() ]

3
Q

Create a boolean series for when colA >100 AND colB <0

A

(df.colA > 100) & (df.colB < 0)

4
Q

Create a boolean series for when colA >100 OR colB <0

A

(df.colA > 100) | (df.colB < 0)
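The two cards above depend on operator precedence: `&` and `|` bind more tightly than `>` and `<`, so each comparison needs its own parentheses. A minimal sketch, assuming a small made-up dataframe (the values are illustrative only):

```python
import pandas as pd

# Hypothetical sample data; colA and colB mirror the names used in the cards.
df = pd.DataFrame({"colA": [50, 150, 200], "colB": [-1, -5, 3]})

# Wrap each comparison in parentheses before combining the masks.
both = (df.colA > 100) & (df.colB < 0)
either = (df.colA > 100) | (df.colB < 0)

print(both.tolist())    # [False, True, False]
print(either.tolist())  # [True, True, True]
```

Either mask can be passed straight to `df[...]` to filter rows.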

5
Q

Return a dataframe’s data types

A

df.dtypes

6
Q

Return the dimensions of a dataframe

A

df.shape

7
Q

Rename column MANUfacturer as ‘manufacturer’

A

df.rename(columns={'MANUfacturer': 'manufacturer'}, inplace=True)

8
Q

Convert a string column to a float

A

df['column'] = df['column'].astype(float)

9
Q

Extract first prefix when string column is split by a dash

A

df.column.str.split('-').str[0]
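As a quick illustration of the split-and-index idiom, here is a sketch on a made-up series (the strings are assumptions for the example):

```python
import pandas as pd

# Hypothetical string column with dash-separated parts.
s = pd.Series(["abc-123", "de-45-x", "nodash"])

# .str.split returns a list per row; .str[0] picks the first element of each list.
prefix = s.str.split("-").str[0]

print(prefix.tolist())  # ['abc', 'de', 'nodash']
```

A value with no dash comes back unchanged, because the split yields a one-element list.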

10
Q

Replace values in Column using a mapping dictionary

A

df.column = df.column.map({'Key': 'newkey', 'Key1': 'newkey1'})
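One detail worth seeing in action: `map` with a dictionary turns any value that is missing from the dictionary into NaN, while `replace` leaves such values alone. A small sketch, assuming a made-up series with one unmapped value:

```python
import pandas as pd

# Hypothetical data: "Other" has no entry in the mapping.
s = pd.Series(["Key", "Key1", "Other"])
mapping = {"Key": "newkey", "Key1": "newkey1"}

mapped = s.map(mapping)        # unmapped values become NaN
replaced = s.replace(mapping)  # unmapped values are kept as they are

print(mapped.isna().tolist())  # [False, False, True]
print(replaced.tolist())       # ['newkey', 'newkey1', 'Other']
```

Which one fits depends on whether unmapped values should be flagged as missing or passed through.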

11
Q

Export dataframe to csv file without index values

A

df.to_csv('filename.csv', index=False)

12
Q

Get meta-data information for the columns of a dataframe

A

df.info()

13
Q

Get the name of the columns in a dataframe

A

df.columns

14
Q

Get descriptive statistics for a column

A

df.column.describe()

15
Q

Get frequencies for each unique value in a column

A

df.column.value_counts()

16
Q

Get the averages of col_B grouped by col_A

A

df.groupby(df.col_A).col_B.mean()

17
Q

Apply the size, min, and max functions to the dataframe grouped by col_A

A

df.groupby(df.col_A).agg(['size', 'min', 'max'])
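The two groupby cards above can be tried on a tiny example. This is a sketch with made-up values; selecting `col_B` before aggregating is a choice made here to keep the output small:

```python
import pandas as pd

# Hypothetical data with two groups in col_A.
df = pd.DataFrame({"col_A": ["x", "x", "y"], "col_B": [1, 5, 4]})

# Mean of col_B per group.
means = df.groupby("col_A")["col_B"].mean()

# Several aggregations at once, named as strings.
summary = df.groupby("col_A")["col_B"].agg(["size", "min", "max"])

print(means.tolist())                # [3.0, 4.0]
print(summary.loc["x"].tolist())     # [2, 1, 5]
```

Passing aggregation names as strings avoids the warnings that recent pandas versions emit for the Python builtins `min`, `max`, and `sum`.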

18
Q

Create a pivot table where col_V1 is summed and col_V2 shows the min and max. Have col_1 and col_2 as rows and col_A and col_B as columns. Include grand totals.

A
df.pivot_table(
     values=['col_V1', 'col_V2'],
     index=['col_1', 'col_2'],
     columns=['col_A', 'col_B'],
     aggfunc={
          'col_V1': 'sum',
          'col_V2': ['min', 'max']},
     margins=True)
19
Q

List the index (aka row labels) of a dataframe

A

df.index

20
Q

Convert a dataframe (or series) into a numpy array

A

df.to_numpy()

21
Q

Assign colA and colB as the index (multiIndex) of the dataframe

A

df.set_index(['colA', 'colB'], inplace=True)

22
Q

Vertically append two dataframes and assign an additional index indicating which df the row came from

A

pd.concat( [df1,df2], keys=[1, 2])
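To see what the `keys` argument adds, here is a minimal sketch with two made-up dataframes. The keys become the outer level of a MultiIndex, so each row records which frame it came from:

```python
import pandas as pd

# Hypothetical inputs with a shared column.
df1 = pd.DataFrame({"v": [10, 20]})
df2 = pd.DataFrame({"v": [30]})

stacked = pd.concat([df1, df2], keys=[1, 2])

print(stacked.index.tolist())      # [(1, 0), (1, 1), (2, 0)]
print(stacked.xs(2)["v"].tolist()) # [30]
```

`stacked.xs(2)` pulls back only the rows that originated in the second dataframe.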

23
Q

Do an inner join on the indexes of two dataframes and add a suffix to duplicated column names

A

df1.merge(df2,
     how='inner',
     left_index=True,
     right_index=True,
     suffixes=('_df1', '_df2'))
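A small worked example of the index join, assuming two made-up dataframes that share a column name and overlap on one index label:

```python
import pandas as pd

# Hypothetical inputs: both have a "val" column; only label "b" is in both indexes.
df1 = pd.DataFrame({"val": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"val": [3, 4]}, index=["b", "c"])

joined = df1.merge(
    df2,
    how="inner",            # inner is also the default
    left_index=True,
    right_index=True,
    suffixes=("_df1", "_df2"),
)

print(joined.columns.tolist())  # ['val_df1', 'val_df2']
print(joined.index.tolist())    # ['b']
```

Only the shared label survives the inner join, and the clashing column is split into one suffixed column per source.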

24
Q

Apply a function element-wise to a series

A

df.col_name.apply(function_name)
-- OR --
df.col_name.map(function_name)

25
Apply a function element-wise to a dataframe
df.applymap(function_name)  # renamed to df.map(function_name) in pandas 2.1+
26
Apply a function along the columns of a dataframe
df.apply(function_name)
27
Unpivot a dataframe and rename the variables as ID and the values as 'fact'
df.melt(
     id_vars=[col1, col2],
     value_vars=[col3, col4],  # defaults to all non-id_vars
     var_name='ID',
     value_name='fact')
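Here is what the unpivot looks like on a tiny wide table. The column names follow the card; the values are made up for illustration:

```python
import pandas as pd

# Hypothetical wide table: col1/col2 identify a row, col3/col4 hold measurements.
df = pd.DataFrame(
    {"col1": ["a", "b"], "col2": [1, 2], "col3": [10, 20], "col4": [30, 40]}
)

long = df.melt(
    id_vars=["col1", "col2"],
    value_vars=["col3", "col4"],
    var_name="ID",        # holds the former column names
    value_name="fact",    # holds the former cell values
)

print(long.columns.tolist())  # ['col1', 'col2', 'ID', 'fact']
print(long["fact"].tolist())  # [10, 20, 30, 40]
```

Each original row appears once per melted column, so two rows and two value columns give four rows.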
28
Return a boolean mask if a regex pattern is found in a certain column
df[col_name].str.contains(pattern)
29
Extract a regex capture group from a column
df[col_name].str.extract(pattern)
30
Extract more than one group of patterns from a column
df[col_name].str.extract(pattern_with_multiple_capture_groups)
31
Replace a regex or string in a column with another string
df[col_name].str.replace(pattern, replacement_string, regex=True)  # use regex=False for a literal string
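The four regex cards above can be compared side by side. This is a sketch on a made-up series; the patterns are assumptions chosen for the example:

```python
import pandas as pd

# Hypothetical codes: letters, a dash, then digits; the last row does not match.
s = pd.Series(["AB-123", "CD-456", "none"])

# Boolean mask: does the pattern occur in each value?
found = s.str.contains(r"[A-Z]+-\d+")

# Two capture groups give a DataFrame with one column per group.
parts = s.str.extract(r"([A-Z]+)-(\d+)")

# regex=True makes the pattern a regular expression rather than a literal string.
cleaned = s.str.replace(r"\d+", "#", regex=True)

print(found.tolist())          # [True, True, False]
print(parts.iloc[0].tolist())  # ['AB', '123']
print(cleaned.tolist())        # ['AB-#', 'CD-#', 'none']
```

Rows that do not match the extraction pattern come back as NaN in every group column.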
32
Calculate the number of missing values in each column
df.isnull().sum()
33
Drop rows with any missing values
df.dropna()
34
Drop specific columns
df.drop(columns_to_drop, axis=1)
35
Drop columns with fewer than a certain number of non-null values
df.dropna(thresh = min_nonnull, axis=1)
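The missing-value cards above are easier to remember with concrete numbers. A minimal sketch, assuming a made-up dataframe with gaps in two columns:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" has 1 gap, "b" has 2 gaps, "c" is complete.
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0], "c": [7.0, 8.0, 9.0]}
)

missing = df.isnull().sum()   # count of missing values per column
complete_rows = df.dropna()   # rows with no missing values at all

min_nonnull = 2               # illustrative threshold
kept = df.dropna(thresh=min_nonnull, axis=1)  # keep columns with >= 2 non-null values

print(missing.tolist())              # [1, 2, 0]
print(complete_rows.index.tolist())  # [2]
print(kept.columns.tolist())         # ['a', 'c']
```

Note that `thresh` counts non-null values, so a column is kept when it meets or exceeds the threshold.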
36
Replace missing values in a column with another value
df[col_name].fillna(replacement_value)
37
Show all duplicate rows in a dataframe
df[ df.duplicated( keep=False ) ]
38
Drop rows with duplicate values in only certain columns. Keep the last duplicate row
df.drop_duplicates( [col_1, col_2], keep='last')
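The difference between whole-row duplicates and duplicates on a subset of columns shows up clearly on a small example. The data below is made up for illustration:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical; row 3 repeats only col_1 and col_2.
df = pd.DataFrame(
    {
        "col_1": [1, 1, 2, 1],
        "col_2": ["x", "x", "y", "x"],
        "val": [10, 10, 30, 40],
    }
)

# keep=False marks every member of a duplicate group, not just the repeats.
all_dupes = df[df.duplicated(keep=False)]

# Compare only col_1 and col_2, keeping the last row of each group.
deduped = df.drop_duplicates(["col_1", "col_2"], keep="last")

print(all_dupes.index.tolist())  # [0, 1]
print(deduped.index.tolist())    # [2, 3]
```

The original row labels are preserved, which makes it easy to check which rows survived.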
39
Replace values of column_A with values of column B when column_A is less than zero
df.column_A = df.column_A.mask(df.column_A < 0, df.column_B)
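`mask` replaces values where the condition is True and keeps the rest. A short sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical data: two negative values in column_A.
df = pd.DataFrame({"column_A": [-1, 5, -3], "column_B": [10, 20, 30]})

# Where column_A is negative, take the value from column_B on the same row.
df["column_A"] = df["column_A"].mask(df["column_A"] < 0, df["column_B"])

print(df["column_A"].tolist())  # [10, 5, 30]
```

`Series.where` is the mirror image: it keeps values where the condition is True and replaces the others.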
40
Reset the index
df.reset_index(inplace=True)
41
Do a left join on a shared column named 'ID'
df1.merge(df2, on='ID', how='left')
42
Fill in missing values of a dataframe with zeros
df.fillna(0, inplace=True)
43
Find the correlations between columns in a dataset
df.corr()
44
Convert a column of a dataframe into a list
new_list = df.column.tolist()
45
Get all rows of a dataframe where the value of a column is not in the elements of a list
df[ ~df.column_name.isin(list_values) ]
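The membership test here is `isin`, and `~` inverts the resulting mask. A minimal sketch, assuming made-up values for both the column and the list:

```python
import pandas as pd

# Hypothetical column and exclusion list.
df = pd.DataFrame({"column_name": ["a", "b", "c", "d"]})
list_values = ["b", "d"]

# isin builds a True/False mask; ~ flips it to select the rows NOT in the list.
remaining = df[~df["column_name"].isin(list_values)]

print(remaining["column_name"].tolist())  # ['a', 'c']
```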
46
Sort a dataframe by col_A descending and col_B ascending and reset the index
df.sort_values( ['col_A', 'col_B'], ascending=[False, True], ignore_index=True, inplace=True)
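A sketch of the mixed-direction sort on made-up data. It returns a new dataframe instead of using `inplace=True`, which is a choice made here so the original stays available for comparison:

```python
import pandas as pd

# Hypothetical data with ties in col_A so the secondary sort matters.
df = pd.DataFrame({"col_A": [1, 2, 2, 1], "col_B": [9, 8, 7, 6]})

sorted_df = df.sort_values(
    ["col_A", "col_B"],
    ascending=[False, True],  # one flag per sort column, in the same order
    ignore_index=True,        # relabel the rows 0..n-1 after sorting
)

print(sorted_df.values.tolist())  # [[2, 7], [2, 8], [1, 6], [1, 9]]
print(sorted_df.index.tolist())   # [0, 1, 2, 3]
```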