Python Pandas Flashcards
(35 cards)
Create a dictionary called my_dict with the following three key-value pairs:
key 'country' and value names.
key 'drives_right' and value dr.
key 'cars_per_cap' and value cpc.
my_dict = {'country': names, 'drives_right': dr, 'cars_per_cap': cpc}
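A minimal sketch of where such a dictionary usually goes next: passing it to pd.DataFrame(). The list values below are illustrative assumptions, not data from the cards.

```python
import pandas as pd

# Illustrative values (assumed for this sketch, not taken from the cards)
names = ['United States', 'Australia', 'Japan']
dr = [True, False, False]
cpc = [809, 731, 588]

my_dict = {'country': names, 'drives_right': dr, 'cars_per_cap': cpc}

# Each key becomes a column name; each list becomes that column's values
cars = pd.DataFrame(my_dict)
print(cars)
```

All lists must have the same length, since each one fills a full column.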
Use pd.read_csv() to import cars.csv data as a DataFrame. Store this DataFrame as cars.
cars = pd.read_csv('cars.csv')
Specify the index_col argument inside pd.read_csv(): set it to 0, so that the first column is used as row labels.
cars = pd.read_csv('cars.csv', index_col=0)
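To see what index_col=0 changes, here is a small sketch that reads the same CSV text twice. The CSV content is a hypothetical stand-in for cars.csv, loaded from a string so the example runs without a file.

```python
import io
import pandas as pd

# Hypothetical stand-in for cars.csv; the first (unnamed) column holds row labels
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
JPN,588,Japan,False
"""

# Without index_col, pandas generates a 0..n-1 index and keeps the labels as a column
default_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the row labels
labeled_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_index.index.tolist())
print(labeled_index.index.tolist())
```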
Use single square brackets to print out the 'country' column of cars as a Pandas Series.
print(cars['country'])
Use double square brackets to print out the 'country' column of cars as a Pandas DataFrame.
print(cars[['country']])
Use double square brackets to print out a DataFrame with both the 'country' and 'drives_right' columns of cars, in this order.
print(cars[['country', 'drives_right']])
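The difference between single and double brackets can be checked directly by looking at the type of each result. This is a sketch on a small made-up DataFrame, not the actual cars data.

```python
import pandas as pd

# Small illustrative frame (values assumed for this sketch)
cars = pd.DataFrame(
    {'country': ['United States', 'Japan'], 'drives_right': [True, False]},
    index=['US', 'JPN'],
)

as_series = cars['country']    # single brackets with a string: one column as a Series
as_frame = cars[['country']]   # a list inside the brackets: a one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```

The inner brackets are just a Python list of column names, which is why a list with one element still returns a DataFrame.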
Select the first 3 observations from cars and print them out.
print(cars[0:3])
Select the fourth, fifth and sixth observations, corresponding to row indexes 3, 4 and 5, and print them out.
print(cars[3:6])
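Row slicing with integers works by position and excludes the end of the slice, the same as list slicing. A sketch with assumed values, comparing it to the explicit iloc form:

```python
import pandas as pd

# Illustrative frame with string row labels (values assumed for this sketch)
cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200, 70]},
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR'],
)

first_three = cars[0:3]   # positions 0, 1, 2
next_three = cars[3:6]    # positions 3, 4, 5

print(first_three.index.tolist())
print(next_three.index.tolist())

# The same selections written with iloc, which is always position-based
same_as_iloc = first_three.equals(cars.iloc[0:3])
```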
3 ways to inspect a dataframe
Inspect the first few rows (including index labels)
print(df.head())
Inspect the last few rows
print(df.tail())
Inspect random sample rows
print(df.sample(5))
Use loc or iloc to select the observation corresponding to Japan as a Series. The label of this row is JPN, the index is 2. Make sure to print the resulting Series.
print(cars.loc['JPN'])
print(cars.iloc[2])
Print out the 'drives_right' value of the row corresponding to Morocco (its row label is MOR)
print(cars.loc['MOR', 'drives_right'])
Print out a sub-DataFrame, containing the observations for Russia and Morocco and the columns 'country' and 'drives_right'.
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])
Print out the drives_right column of cars as a Series using loc
print(cars.loc[:, 'drives_right'])
Print out the drives_right column as a DataFrame using loc
print(cars.loc[:, ['drives_right']])
Print out both the cars_per_cap and drives_right columns as a DataFrame using loc
print(cars.loc[:, ['cars_per_cap', 'drives_right']])
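The loc and iloc cards above can be tried together on a small frame. The values here are assumed for illustration; the row labels follow the ones the cards mention (JPN at position 2, RU, MOR).

```python
import pandas as pd

# Illustrative data (assumed); labels chosen to match the cards
cars = pd.DataFrame(
    {
        'country': ['United States', 'Australia', 'Japan', 'Russia', 'Morocco'],
        'drives_right': [True, False, False, True, True],
        'cars_per_cap': [809, 731, 588, 200, 70],
    },
    index=['US', 'AUS', 'JPN', 'RU', 'MOR'],
)

by_label = cars.loc['JPN']     # loc selects by row label
by_position = cars.iloc[2]     # iloc selects by integer position

# Row label plus column label returns a single value
mor_drives_right = cars.loc['MOR', 'drives_right']

# Lists of labels on both axes return a sub-DataFrame
sub = cars.loc[['RU', 'MOR'], ['country', 'drives_right']]

# A bare colon keeps every row; a list of columns keeps the DataFrame type
one_col_frame = cars.loc[:, ['drives_right']]
print(sub)
```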
Which areas in my_house are greater than 18.5 or smaller than 10?
print(np.logical_or(my_house > 18.5, my_house < 10))
Which areas are smaller than 11 in both my_house and your_house? Make sure to wrap both commands in print() statement, so that you can inspect the output.
print(np.logical_and(my_house < 11, your_house < 11))
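A runnable sketch of the two comparisons above. The cards do not define my_house and your_house, so the arrays below are assumed values chosen for illustration.

```python
import numpy as np

# Assumed room areas for this sketch (not given in the cards)
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# Element-wise OR: True where either condition holds
either = np.logical_or(my_house > 18.5, my_house < 10)

# Element-wise AND: True only where both conditions hold
both = np.logical_and(my_house < 11, your_house < 11)

print(either)
print(both)
```

Python's plain `or` and `and` do not work element-wise on arrays, which is why the NumPy functions (or the `|` and `&` operators with parentheses) are used instead.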
Write an if statement that prints out "looking around in the kitchen." if room equals "kit".
if room == "kit":
    print("looking around in the kitchen.")
Write another if statement that prints out "big place!" if area is greater than 15.
if area > 15:
    print("big place!")
Extract the drives_right column as a Pandas Series and store it as dr.
dr = cars['drives_right']
Use dr, a boolean Series, to subset the cars DataFrame. Store the resulting selection in sel.
sel = cars[dr]
Select the cars_per_cap column as a Pandas Series and store it as cpc
cpc = cars['cars_per_cap']
Create car_maniac: observations that have a cars_per_cap over 500
cpc = cars['cars_per_cap']
many_cars = cpc > 500  # This creates a boolean Series
car_maniac = cars[many_cars]  # Keep only the rows where the mask is True
print(car_maniac)
Create medium: observations with cars_per_cap between 100 and 500
cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]
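The filtering cards above all follow one pattern: build a boolean Series, then pass it to cars[...]. A sketch on assumed data, also showing the `&` operator as an alternative spelling of np.logical_and:

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed for this sketch)
cars = pd.DataFrame(
    {
        'cars_per_cap': [809, 731, 588, 200, 70],
        'drives_right': [True, False, False, True, True],
    },
    index=['US', 'AUS', 'JPN', 'RU', 'MOR'],
)

# A boolean column can be used as a mask directly
sel = cars[cars['drives_right']]

cpc = cars['cars_per_cap']
car_maniac = cars[cpc > 500]
medium = cars[np.logical_and(cpc > 100, cpc < 500)]

# Same filter with operators; the parentheses are required
medium_alt = cars[(cpc > 100) & (cpc < 500)]

print(car_maniac)
print(medium)
```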