Pandas Flashcards by Kevin Taylor

Filter df for when Column is null

df[ df.Column.isnull() ]

How well did you know this?

Not at all

Perfectly

Filter df for when Column is not null

df[ df.Column.notna() ]

How well did you know this?

Not at all

Perfectly

Create a boolean series for when colA >100 AND colB <0

(df.colA >100 & df.colB <0)

How well did you know this?

Not at all

Perfectly

Create a boolean series for when colA >100 OR colB <0

(df.colA >100 | df.colB <0)

How well did you know this?

Not at all

Perfectly

Return a dataframe’s data types

df.dtypes

How well did you know this?

Not at all

Perfectly

Return the dimensions of a dataframe

df.shape

How well did you know this?

Not at all

Perfectly

Rename column MANUfacturer as ‘manufacturer’

df.rename( columns= {‘MANUfacturer’ : ‘manufacturer’}, inplace=True)

How well did you know this?

Not at all

Perfectly

Convert a string column to a float

df[‘column’] = df[‘column’].astype(float)

How well did you know this?

Not at all

Perfectly

Extract first prefix when string column is split by a dash

df.column.str.split(‘-‘).str[0]

How well did you know this?

Not at all

Perfectly

Replace values in Column using a mapping dictionary

df.column = df.column.map( {‘Key’ : ‘newkey’, ‘Key1’ : ‘newkey1’ } )

How well did you know this?

Not at all

Perfectly

Export dataframe to csv file without index values

df.to_csv(‘filename.csv’, index=False)

How well did you know this?

Not at all

Perfectly

Get meta-data information for the columns of a dataframe

df.info()

How well did you know this?

Not at all

Perfectly

Get the name of the columns in a dataframe

df.columns

How well did you know this?

Not at all

Perfectly

Get descriptive statistics for a column

df.column.describe()

How well did you know this?

Not at all

Perfectly

Get frequencies for each unique value in a column

df.column.value_counts()

How well did you know this?

Not at all

Perfectly

Get the averages of col_B grouped by col_A

df.groupby(df.col_A).col_B.mean()

How well did you know this?

Not at all

Perfectly

Apply the size, min, and max functions to the dataframe grouped by col_A

df.groupby(df.col_A).agg( [‘size’, min, max] )

How well did you know this?

Not at all

Perfectly

Create a pivot table where col_V1 is sumed and col_V2 shows the min and max. Have col_1 and col_2 as rows and col_A and col_B as columns. Include grand totals.

df.pivot_table(
     values=['col_V1', col_V2'],
     index=['col_1', 'col_2'],
     columns=['col_A', 'col_B'],
    aggfunc={
          'col_v1': sum, 
          'col_V2': [min, max] },
     margins=True)

How well did you know this?

Not at all

Perfectly

List the index (aka row labels) of a dataframe

Study These Flashcards

df.index

Convert a dataframe (or series) into a numpy array

Study These Flashcards

df.to_numpy()

Assign colA and colB as the index (multiIndex) of the dataframe

Study These Flashcards

df.set_index( [‘colA’, ‘colB’], inplace=True)

Vertically append two dataframes and assign an additional index indicating which df the row came from

Study These Flashcards

pd.concat( [df1,df2], keys=[1, 2])

Do an inner join on the indexes two dataframes and add a suffix to duplicated column names

Study These Flashcards

df1.merge(df2,
left_index=True,
right_index=True,
suffixes= (‘_df1’,’_df2’))

Apply a function element-wise to a series

Study These Flashcards

df. col_name.apply(function_name)
- - OR –
df. col_name.map(function_name)

Apply a function element-wise to a dataframe

df.applymap(function_name)

Apply a function along the columns of a dataframe

df.apply(function_name)

Unpivot a dataframe and rename the variables as ID and the values as 'fact'

df.melt( id_vars=[col1, col2], value_vars=[col3, col4], #defaults to all non-id_vars var_name= 'ID', value_name = 'fact')

Return a boolean mask if a regex pattern is found in a certain column

df[col_name].str.contains(pattern)

Extract a regex capture group from a column

df[col_name].str.extract(pattern)

Extract more than one group of patterns from a column

df[col_name].str.extract(pattern_with_multiple_capture_groups)

Replace a regex or string in a column with another string

df[col_name].str.replace(pattern, replacement_string)

Calculate the number of missing values in each column

df.isnull().sum()

Drop rows with any missing values

df.dropna()

Drop specific columns

df.drop(columns_to_drop, axis=1)

Drop columns with less than a certain number of non-null value

df.dropna(thresh = min_nonnull, axis=1)

Replace missing values in a column with another value

df[col_name].fillna(replacement_value)

Show all duplicate rows in a dataframe

df[ df.duplicated( keep=False ) ]

Drop rows with duplicate values in only certain columns. Keep the last duplicate row

df.drop_duplicates( [col_1, col_2], keep='last')

Replace values of column_A with values of column B when column_A is less than zero

df. column_A = df.column_A.mask( df. column_A < 0, df. column_b)

Resetting the index

df.reset_index(inplace=True)

Do a left join on a shared column named 'ID;

df1.merge( df2, on='ID', how-'left')

Fill in missing values of a datafram with zeros

df.fillna(0, inplace=True)

Find the correlations between columns in a dataset

df.corr()

Convert a column if a dataframe into a list

new_list = df.column.tolist()

Get all rows of a dataframe where the value of a column is not in the elements of a list

df[ ~df.column_name.isna( [list_values] ) ]

Sort a dataframe by a col_A descinding and col_B ascending and reset the index

df.sort_values( ['col_A', 'col_B'], ascending=[False, True], ignore_index=True, inplace=True)

Pandas Flashcards

(46 cards)