Communicating Results Flashcards

Summarizing descriptive statistics, plotting visualizations, drawing conclusions, and customizing visuals to communicate results (66 cards)

1
Q

What is the .groupby() method?

A

This allows you to group data by one or more columns and aggregate information about the groupings. The numeric_only=True argument excludes columns that aren't numeric.

df.groupby("column_name").mean(numeric_only=True)
or
df.groupby(["workclass", "race"], as_index=False)["capital-gain"].mean()

2
Q

What is summation or .sum()?

A

It aggregates data vertically with .sum(axis=0) or horizontally with .sum(axis=1).

df_census[["capital-gain", "capital-loss"]].sum()

3
Q

Visualize how to get the sum while using .groupby and then sort the values in descending order

A

df.groupby(by="column").sum(numeric_only=True).sort_values(by="column2", ascending=False)

4
Q

What are the measures of center?

A

Mean = .mean()
Median = .median()
Mode = .mode()
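A minimal sketch of all three on a small pandas Series (the values are made up):

import pandas as pd
s = pd.Series([1, 2, 2, 3, 10])
print(s.mean())    # 3.6
print(s.median())  # 2.0
print(s.mode())    # a Series containing 2 (there can be more than one mode)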

5
Q

What is the mean?

A

It is the average: the sum of all values in the set divided by the number of values in the set.

6
Q

What is the median?

A

The center value in a sorted set. Always sort the values first, then take the middle value (or the mean of the two middle values when the set has an even number of values).
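A quick check with pandas (made-up values):

import pandas as pd
print(pd.Series([5, 1, 3]).median())     # 3.0 - the middle of the sorted values
print(pd.Series([5, 1, 3, 7]).median())  # 4.0 - the mean of the two middle values, 3 and 5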

7
Q

What is the mode?

A

It is the value with the highest frequency in a set

8
Q

If you had a column named color that contained the following values
df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green', 'blue', 'blue']}), what would you get if you used df['color'].value_counts()?

A

color : count
blue : 3
red : 2
green : 1

9
Q

What does df['col'].values do?

A

It returns a NumPy array containing all the values in the column, including duplicates, in the order they appear. It contains only the raw data: no column labels or index.
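A small sketch (hypothetical column) showing what .values returns:

import pandas as pd
df = pd.DataFrame({'col': ['red', 'blue', 'red']})
print(df['col'].values)  # array(['red', 'blue', 'red'], dtype=object)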

10
Q

What does df.index do?

A

.index returns the row labels of a DataFrame or Series. You can use .index to view, inspect, or change the row labels.

11
Q

What does .columns do?

A

Provides an Index object containing the names of all the columns.

12
Q

Why does df.index output: RangeIndex(start=0, stop=3, step=1)?

A

Pandas assigns a numerical RangeIndex starting from 0 when you create a DataFrame without specifying row labels.

This describes the range of the row indices in the DataFrame:
start=0: The first row index starts at 0.
stop=3: The range stops before 3, so the indices are [0, 1, 2].
step=1: The indices increment by 1 between rows.
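A minimal example (hypothetical three-row DataFrame) showing the default RangeIndex:

import pandas as pd
df = pd.DataFrame({'a': [10, 20, 30]})
print(df.index)  # RangeIndex(start=0, stop=3, step=1)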

13
Q

Visualize how to:
- Use .index to get the unique values.
- Use .values to get the corresponding counts.
- Combine .index and .values to iterate through the value counts.
- Use .sort_index() or .sort_values() to sort as needed.

A

counts = df['column'].value_counts()

# .index gives you the unique values (categories)
unique_values = counts.index
print(unique_values)

# The counts themselves are accessed using .values
count_values = counts.values
print(count_values)

# Sort the results by unique values
sorted_counts = counts.sort_index()

14
Q

What is zip()?

A

zip() is a built-in Python function that combines two or more iterables (lists, tuples, strings, etc.) into a single iterator of tuples.

ex.
list1 = [1, 2, 3]
list2 = ['a', 'b', 'c']
zipped = zip(list1, list2)
print(list(zipped))
Output: [(1, 'a'), (2, 'b'), (3, 'c')]

15
Q

What are the two ways to check if a column contains a value?

A

A.
Uses the bitwise | to filter based on two conditions
var = df[(df['col'] == 'data1') | (df['col'] == 'data2')]
B.
Uses .isin() to check if the values exist in the column.
var = df[df['col'].isin(['data1', 'data2'])]

16
Q

What is the union of two events?

A

This calculates the probability that either event A or event B happens (or both).
It's P(A or B).
You add the probabilities P(A) and P(B) because you're counting all outcomes that include A or B.
You then subtract the overlap P(A∩B) so it isn't counted twice.

17
Q

What is the formula for union of two events?

A

P(A or B) = P(A) + P(B) - P(A∩B)
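A quick numeric check of the formula (the probabilities are made up):

p_a, p_b, p_a_and_b = 0.5, 0.4, 0.2
p_a_or_b = p_a + p_b - p_a_and_b
print(p_a_or_b)  # 0.7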

18
Q

What is the intersection of two events?

A

It's P(A and B): the probability that events A and B happen at the same time.
You only consider the overlap between A and B, where both events occur.

19
Q

What is the formula for the intersection of two events?

A

P(A and B) = P(A∩B). For independent events, P(A∩B) = P(A) × P(B).

20
Q

What is conditional probability?

A

This calculates the probability that A happens given that B has already happened. You're "zooming in" on the subset of outcomes where B happens and asking, "What proportion of those also include A?"

21
Q

What is the formula for conditional probability?

A

P(A | B) = P(A∩B)/P(B)
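A minimal pandas sketch of the formula, assuming a hypothetical DataFrame where event B is "shown_ad is True" and event A is "clicked is True":

import pandas as pd
df = pd.DataFrame({'shown_ad': [True, True, True, False],
                   'clicked':  [True, False, True, False]})
p_b = df['shown_ad'].mean()                          # P(B) = 0.75
p_a_and_b = (df['shown_ad'] & df['clicked']).mean()  # P(A∩B) = 0.5
print(p_a_and_b / p_b)                               # P(A | B) ≈ 0.67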

22
Q

Visualize how to find the count for unique values in a column?

A

Filter for the unique values in the column, then count them:

query_df = df.query('column == "data" or column == "data2"')
# Count the unique values
unique_column_value_count = query_df['column'].nunique()

23
Q

Visualize how to convert the column names to lowercase

A

# Lowercase the column names themselves
df.columns = df.columns.str.lower()
# Lowercase the values within a single column
df['column'].str.lower()

24
Q

Explain what each part of this code means

df.groupby('veh_class').agg(mean_cmb_mpg=('cmb_mpg', 'mean')).reset_index()

A

df.groupby('veh_class') groups the rows by the veh_class column.
mean_cmb_mpg is the name of the new column that will hold the mean values.
('cmb_mpg', 'mean') specifies that you want to calculate the mean of the cmb_mpg column.
.reset_index() resets the index of the new dataframe that is created.

25
Q

T or F: you can use .agg() with NaNs

A

True, technically: aggregations such as 'mean' skip NaN values by default (skipna=True). It is still good practice to handle missing values explicitly with .dropna() or .fillna() before calculating the mean, so you know exactly what the result describes.

26
Q

Visualize how to calculate an increase for a column

A

# Calculate the mean for both years
veh_08 = df_08.groupby('veh_class').agg(mean_cmb_mpg_08=('cmb_mpg', 'mean')).reset_index()
veh_18 = df_18.groupby('veh_class').agg(mean_cmb_mpg_18=('cmb_mpg', 'mean')).reset_index()

# Now, let's merge the two DataFrames on 'veh_class' to compare them
inc = pd.merge(veh_08, veh_18, on='veh_class')

# Calculate the increase in fuel economy
inc['increase'] = inc['mean_cmb_mpg_18'] - inc['mean_cmb_mpg_08']

# Display
inc

27
Q

Explain what this code is doing:

pd.DataFrame({"year": ["2008", "2018"],
              "model_num": [smart_08["cmb_mpg"].mean(), smart_18["cmb_mpg"].mean()]})

A

pd.DataFrame(...) creates a new DataFrame from a dictionary where:
"year" corresponds to a list of years, ["2008", "2018"].
"model_num" corresponds to the average combined MPG for SmartWay vehicles in 2008 and 2018.
smart_08["cmb_mpg"].mean() calculates the mean of the cmb_mpg column in the smart_08 DataFrame.
smart_18["cmb_mpg"].mean() similarly calculates the mean of the cmb_mpg column in the smart_18 DataFrame.

28
Q

Visualize how to convert "1970-01-01 00:00:00.000000162" to a regular datetime

A

# Sample DataFrame with runtime stored as a datetime-like string
data = {'runtime': ['1970-01-01 00:00:00.000000162',
                    '1970-01-01 00:00:00.000000120',
                    '1970-01-01 00:00:00.00000090']}
df_m = pd.DataFrame(data)

# Convert 'runtime' to datetime
df_m['runtime'] = pd.to_datetime(df_m['runtime'])

# Function to convert a datetime to "Xh Ym" format
def format_runtime(dt):
    total_minutes = int(dt.timestamp() // 60)  # total seconds -> whole minutes
    hours = total_minutes // 60                # hours
    minutes = total_minutes % 60               # remaining minutes
    return f"{hours}h {minutes}m"

# Apply the function to the 'runtime' column
df_m['runtime'] = df_m['runtime'].apply(format_runtime)

29
Q

Visualize how to extract the genre name and id from a list of dictionaries stored in a column, e.g. [{"id": 28, "name": "Action"}]

A

import ast

# Create a new DataFrame from the genres column
df_genre = df_m['genres']
df_g = pd.DataFrame({'genres': df_genre})

# Convert the string representation of lists to actual lists
df_g['genres'] = df_g['genres'].apply(ast.literal_eval)

# Extract the first genre's dictionary
df_g['first_genre'] = df_g['genres'].apply(lambda x: x[0] if len(x) > 0 else None)

# Create separate columns for the genre name and id
df_g['first_genre_name'] = df_g['first_genre'].apply(lambda x: x['name'] if x is not None else None)
df_g['first_genre_id'] = df_g['first_genre'].apply(lambda x: x['id'] if x is not None else None)

# Display the resulting DataFrame
print(df_g[['first_genre_name', 'first_genre_id']])

30
Q

Visualize how to re-order the columns in a dataframe

A

Create a list with the desired column order, then use that list to index the DataFrame or pass it to the reindex() method. This lets you change the order of columns as needed.

new_order = ['col_b', 'col_a', 'col_c']  # placeholder column names in the desired order
df = df.reindex(columns=new_order)
# or equivalently: df = df[new_order]

31
Q

What is the ast.literal_eval() function?

A

Part of the ast (Abstract Syntax Trees) module. You can use it to safely convert a string to the Python object it represents, such as a list of dictionaries that you can then work with in your code.

# Import the library
import ast

# String representation of a list of dictionaries
data_string = "[{'name': 'Action', 'id': 28}, {'name': 'Adventure', 'id': 12}]"
data_list = ast.literal_eval(data_string)
print(data_list)
# Output: [{'name': 'Action', 'id': 28}, {'name': 'Adventure', 'id': 12}]

32
Q

How can you generate a random array with 25 rows and 5 columns? Note: make sure it rounds to 2 decimal points.

A

import numpy as np
np.random.rand(25, 5).round(decimals=2)

33
Q

.tolist()

A

Converts a NumPy array or pandas Series into a Python list.

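A tiny sketch (made-up Series) showing the conversion:

import pandas as pd
s = pd.Series([1, 2, 3])
print(s.tolist())         # [1, 2, 3] - a plain Python list
print(s.values.tolist())  # same result, via the underlying NumPy array
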
34
Q

.eval()

A

Converts the string representation of a list into a Python list (ast.literal_eval() is the safer choice for untrusted data).

df['col1'] = "[0.53, 0.59, 0.45]"
df['col1'] = df['col1'].apply(lambda u: eval(u))
# df['col1'] now holds the list [0.53, 0.59, 0.45]

35
Q

Visualize how to explode a list into rows

A

df_explode = df.explode(column="column_name")

36
Q

Visualize how to explode data into multiple columns

A

df_new = pd.DataFrame(df_explode["column_name"].tolist(), columns=["col1", "col2", "col3"])

37
Q

What chart displays data in bins and shows the frequency of observations in each bin?

A

A histogram

df.hist() or df['col_name'].hist()

38
Q

What chart is good for categorical data?

A

Bar charts

df.plot(kind='bar') or df['col'].plot(kind='bar')

39
Q

What chart shows the relationship between two numerical variables on a two-dimensional graph?

A

Scatter plot

df.plot(x='col1', y='col2', kind='scatter') or pd.plotting.scatter_matrix(df)

40
Q

What chart displays the distribution of numerical data as min, 1st quartile, median, 3rd quartile, and max?

A

Box plot

df.plot(kind='box') or df['col'].plot(kind='box')

41
Q

Visualize how to plot using the kind parameter

A

df.plot(x="col1", y="col2", kind=" ")  # e.g. kind='line', 'bar', 'hist', 'box', or 'scatter'

42
Q

Visualize how to retrieve the lowest value of column 1 and find the corresponding value in column 2

A

Use .idxmin():

# Find the index of the minimum for column 1
lowest_value_index = df['column 1'].idxmin()
# Access the corresponding value in column 2 using that index
lowest_value = df.loc[lowest_value_index, 'column 2']

43
Q

What function returns the index for the minimum value of a column?

A

.idxmin()

44
Q

What parameter allows you to set the size of the figure or plot in Pandas?

A

figsize

df.plot(figsize=(width, height))

45
Q

What allows you to set the title for your Pandas plot?

A

df.plot(title='Title of plot')

46
Q

How can you customize the labels of the x-axis and y-axis?

A

df.plot(xlabel='X axis label')
df.plot(ylabel='Y axis label')
df.plot(ylim=(min_value, max_value))  # sets the min and max values for the y-axis

47
Q

How do you change the color of the Pandas plot?

A

df.plot(color='color_name')

48
Q

How do you provide the legend in a pandas plot?

A

df.plot(legend=True)

49
Q

Visualize the plotting of a bar chart in order

A

x = df['column'].value_counts().index
df['column'].value_counts()[x].plot(kind='bar');

50
Q

Define .value_counts()

A

Counts the unique values in a Series and returns a new Series containing the counts of each unique value.

51
Q

What would occur if you used
color = ['blue', 'red', 'blue', 'blue', 'red', 'green']
df['color'].value_counts()

A

color | count
blue  | 3
red   | 2
green | 1

52
Q

What would occur if you used
color = ['blue', 'red', 'blue', 'blue', 'red', 'green']
value_counts = df['color'].value_counts()
print(value_counts.index)
print(value_counts.values)

A

Index(['blue', 'red', 'green'], dtype='object')
[3 2 1]

53
Q

Visualize how to convert the data from .values to a Python list

A

values_numpy = df['column'].values
python_values = values_numpy.tolist()
print(python_values)

54
Q

How do you filter rows based on multiple conditions?

A

Using the bitwise OR operator |

x = df[(df['column'] == 'data') | (df['column'] == 'data2')]

55
Q

Visualize how to use .isin() to verify that a value is in a list of options

A

df[df['column'].isin(['value_1', 'value_2'])]

56
Q

What should you do before calculating the mean on a column?

A

Check for NaN values and drop them:

df['column'].isnull().sum()
df.dropna(subset=['column'])

57
Q

Visualize how to calculate the means of two columns for years 2024 and 2025 from two different dataframes (df_24['col_2024'] and df_25['col_2025']) so that you can calculate the increase

A

# Calculate the mean for both years
value_24 = df_24.groupby('data').agg(mean_col_2024=('col_2024', 'mean')).reset_index()
value_25 = df_25.groupby('data').agg(mean_col_2025=('col_2025', 'mean')).reset_index()

# Merge the two dataframes on column 'data'
merged_dfs = pd.merge(value_24, value_25, on='data')

# Calculate the increase from 2024 to 2025
merged_dfs['increase'] = merged_dfs['mean_col_2025'] - merged_dfs['mean_col_2024']

# Display results
merged_dfs

58
Q

What does this code do?
df.label.unique()

A

df refers to the dataframe.
.label accesses the column named label within the df dataframe.
.unique() returns an array of the unique values present in the label column.
Output: array(['no', 'yes'], dtype=object)

You can then use the labels to filter:
df_filter = df.query('label == "yes"')

59
Q

When can you use data.column or "dot notation"? e.g. df.column.value_counts()

A

Only when the column name is a valid Python identifier: no spaces or special characters, and not starting with a number.
The name also must not conflict with an existing DataFrame attribute or method, such as columns or values.
For dynamic column access (where the name is stored in a variable), you must use bracket notation instead:
col_name = 'age'
print(df[col_name])

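A small sketch (hypothetical DataFrame) contrasting when dot notation works and when bracket notation is required:

import pandas as pd
df = pd.DataFrame({'age': [25, 30], 'first name': ['Ana', 'Bo']})
print(df.age)            # dot notation works: valid identifier, no attribute conflict
print(df['first name'])  # bracket notation required: the name contains a space
col_name = 'age'
print(df[col_name])      # bracket notation required for dynamic access
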
60
Q

How can I convert a date and time string to a datetime datatype?

A

pd.to_datetime()

61
Q

Visualize using a function to convert a datetime column to an "Xh Ym" hours-and-minutes string

A

# Sample DataFrame with runtime as a datetime-like string
data = {'runtime': ['1970-01-01 00:00:00.000000162',
                    '1970-01-01 00:00:00.000000120',
                    '1970-01-01 00:00:00.00000090']}
df_m = pd.DataFrame(data)

# Convert 'runtime' to datetime
df_m['runtime'] = pd.to_datetime(df_m['runtime'])

# Function to convert a datetime to "Xh Ym" format
def format_runtime(dt):
    total_minutes = int(dt.timestamp() // 60)  # total seconds -> whole minutes
    hours = total_minutes // 60                # hours
    minutes = total_minutes % 60               # remaining minutes
    return f"{hours}h {minutes}m"

# Apply the function to the 'runtime' column
df_m['runtime'] = df_m['runtime'].apply(format_runtime)

62
Q

Visualize the steps to break up the list below into new columns, from a dataframe titled df_m with a column labeled 'genres':
[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]

A

import pandas as pd
import ast

# Assuming df_m is your original DataFrame
# Create a new DataFrame to clean up the genres column
df_genre = df_m['genres']
df_g = pd.DataFrame({'genres': df_genre})

# Convert the string representation of lists into actual lists
df_g['genres'] = df_g['genres'].apply(ast.literal_eval)

# Explode the genres column to get each genre in a separate row
df_g = df_g.explode('genres')

# Extract the genre's name and id into separate columns
df_g['genre_name'] = df_g['genres'].apply(lambda x: x['name'] if isinstance(x, dict) else None)
df_g['genre_id'] = df_g['genres'].apply(lambda x: x['id'] if isinstance(x, dict) else None)

# Drop the original 'genres' column if you no longer need it
df_g = df_g.drop(columns=['genres'])

# Display the resulting DataFrame with genre names and IDs
print(df_g[['genre_name', 'genre_id']])

63
Q

How could you keep multiple descriptors for a value that would otherwise produce duplicates, e.g. a movie with multiple genres, a person with multiple names, a car with multiple colors?

A

Aggregate the information. If you want to summarize instead of listing all combinations, you can aggregate the data, for example by concatenating the genre names into a single string for each movie:

df_g['genres'] = df_g.groupby('Movie Title')['genre_name'].transform(lambda x: ', '.join(x))
df_g = df_g.drop_duplicates(subset=['Movie Title'])  # Keep only one row per movie

64
Q

Visualize how to drop a row with a specific index

A

dataframe.drop(index=###, errors='ignore')

65
Q

Visualize how to load sqlite3 and read a database into pandas

A

import sqlite3

conn = sqlite3.connect('file_path_to_database')
c = conn.cursor()
data = pd.read_sql("SQL query", conn)

66
Q

Visualize how to pass SQL statements using sqlite3

A

var = pd.read_sql("SQL query", sqlite3.connect('file_path_to_database'))