pandas Flashcards

Question 1

Q

what is loc?

Answer

A

it selects rows (and columns) by label. for example: elections.loc[0:4] returns the rows labelled 0 through 4. loc slices include the end label, so with the default integer index (0, 1, 2, ...) this is the first 5 rows of the elections table.
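A minimal sketch of label slicing. It uses a made-up toy table in place of the real elections data, and the column names are assumed:

```python
import pandas as pd

# Toy stand-in for the elections table; the values are made up
elections = pd.DataFrame({
    "Year": [2000, 2000, 2004, 2004, 2008, 2008],
    "Candidate": ["Ann", "Bob", "Cal", "Dee", "Eve", "Fay"],
})

# loc slices by index label, and the end label (4) is included
first_five = elections.loc[0:4]
print(first_five.index.tolist())  # [0, 1, 2, 3, 4]
```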

Question 2

Q

what is the tail function?

Answer

A

returns the last n rows of a table (n defaults to 5): elections.tail(2) returns the last 2 rows in the elections table.

Question 3

Q

how can you use loc to select columns by name?

Answer

A

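pass the row selection first and the column selection second, separated by a comma. a slice of column names includes both end columns: elections.loc[0:4, "Year":"Party"]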
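A short sketch of both a column slice and an explicit column list with loc. The toy table and its column names are assumptions, not the real dataset:

```python
import pandas as pd

# Toy table with assumed column names; the values are made up
elections = pd.DataFrame({
    "Year": [2000, 2000, 2004],
    "Candidate": ["Ann", "Bob", "Cal"],
    "Party": ["Red", "Blue", "Red"],
    "Result": ["win", "loss", "win"],
})

# Rows labelled 0..1 and every column from "Year" through "Party" (both ends included)
sliced = elections.loc[0:1, "Year":"Party"]

# Lists pick exact labels instead of a range
picked = elections.loc[[0, 2], ["Candidate", "Result"]]

print(sliced.columns.tolist())  # ['Year', 'Candidate', 'Party']
```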

Question 4

Q

what is iloc?

Answer

A

selects values by integer position. example: elections.iloc[[1, 2, 3], [0, 1, 2]] returns the 2nd, 3rd, and 4th rows and, within them, the 1st, 2nd, and 3rd columns.
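A minimal sketch of positional selection on a made-up toy table (the column names are assumed):

```python
import pandas as pd

# Toy table; the values are made up
elections = pd.DataFrame({
    "Year": [2000, 2000, 2004, 2004, 2008],
    "Candidate": ["Ann", "Bob", "Cal", "Dee", "Eve"],
    "Party": ["Red", "Blue", "Red", "Blue", "Red"],
    "Result": ["win", "loss", "win", "loss", "win"],
})

# Row positions 1, 2, 3 (2nd-4th rows) and column positions 0, 1, 2 (1st-3rd columns)
block = elections.iloc[[1, 2, 3], [0, 1, 2]]
print(block["Candidate"].tolist())  # ['Bob', 'Cal', 'Dee']

# iloc slices work like Python list slices: the end position is excluded
first_three = elections.iloc[0:3]
print(len(first_three))  # 3
```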

Question 5

Q

how do loc and iloc differ?

Answer

A

loc gets rows (and/or columns) with particular labels. iloc gets rows (and/or columns) at integer locations.
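A small sketch of the difference. It uses a hypothetical Series whose index labels are deliberately not 0, 1, 2, ... so that labels and positions disagree:

```python
import pandas as pd

# Labels 10, 20, 30, 40 sit at positions 0, 1, 2, 3
s = pd.Series(["a", "b", "c", "d"], index=[10, 20, 30, 40])

by_label = s.loc[10:30]    # labels 10 through 30, end label included
by_position = s.iloc[0:3]  # positions 0, 1, 2, end position excluded
print(by_label.tolist())     # ['a', 'b', 'c']
print(by_position.tolist())  # ['a', 'b', 'c']

print(s.iloc[0])  # a (first position)
# s.loc[0] would raise a KeyError: no row is labelled 0
```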

Question 6

Q

how do we get a DataFrame (rather than a Series) out of a single-column selection?

Answer

A

use .to_frame(), which converts a Series into a one-column DataFrame:
elections["Candidate"].tail(5).to_frame()
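A quick sketch on a made-up toy table showing which selections return a Series and which return a DataFrame. Double brackets are an alternative to .to_frame():

```python
import pandas as pd

# Toy table; the values are made up
elections = pd.DataFrame({"Candidate": ["Ann", "Bob", "Cal"], "Year": [2000, 2004, 2008]})

as_series = elections["Candidate"]            # single brackets -> Series
as_frame = elections["Candidate"].to_frame()  # Series converted to a one-column DataFrame
also_frame = elections[["Candidate"]]         # a list of column names -> DataFrame

print(type(as_series).__name__, type(as_frame).__name__)  # Series DataFrame
```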

Question 7

Q

what is isin?

Answer

A

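isin checks each value in a column against a list and returns a boolean Series (True where the value is in the list). indexing the DataFrame with that Series keeps only the matching rows:
wanted = ["Anti-Masonic", "American", "Anti-Monopoly", "American Independent"]
elections[elections["Party"].isin(wanted)]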
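A minimal sketch on a made-up toy table, splitting the one-liner into the boolean mask and the filtering step:

```python
import pandas as pd

# Toy table; the values are made up
elections = pd.DataFrame({
    "Candidate": ["Ann", "Bob", "Cal", "Dee"],
    "Party": ["American", "Whig", "Anti-Masonic", "Democratic"],
})
wanted = ["Anti-Masonic", "American", "Anti-Monopoly", "American Independent"]

mask = elections["Party"].isin(wanted)  # one True/False per row
filtered = elections[mask]              # keep only the rows where mask is True
print(mask.tolist())  # [True, False, True, False]
```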

Question 8

Q

what is str.startswith?

Answer

A

str.startswith returns True for each value in a string column that begins with the given prefix (the match is case-sensitive). indexing with the result keeps those rows:

elections[elections["Party"].str.startswith("A")]
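A short sketch on a made-up Series showing that the prefix can be longer than one letter and that the match is case-sensitive:

```python
import pandas as pd

# Toy values; made up for illustration
parties = pd.Series(["American", "anti-war", "Whig", "Anti-Masonic"])

print(parties.str.startswith("A").tolist())     # [True, False, False, True]
print(parties.str.startswith("Anti").tolist())  # [False, False, False, True]
```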

Question 9

Q

how do we use query?

Answer

A

query filters rows using a boolean expression written as a string. columns are referenced by name and can be compared with numbers (==, >=, <, and so on) or with specific values such as "win":

elections.query('Year >= 2000 and Result == "win"')
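A minimal sketch on a made-up toy table comparing query with the equivalent boolean mask. It also shows referencing a Python variable inside the query string with @ (the variable name start is hypothetical):

```python
import pandas as pd

# Toy table; the values are made up
elections = pd.DataFrame({
    "Year": [1996, 2000, 2004, 2008],
    "Candidate": ["Ann", "Bob", "Cal", "Dee"],
    "Result": ["win", "win", "loss", "win"],
})

recent_wins = elections.query('Year >= 2000 and Result == "win"')

# The same filter written as a boolean mask
same = elections[(elections["Year"] >= 2000) & (elections["Result"] == "win")]

# Python variables can be used inside the query string with @
start = 2004
later_wins = elections.query("Year >= @start and Result == 'win'")
print(recent_wins["Candidate"].tolist())  # ['Bob', 'Dee']
```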

Question 10

Q

what does .describe() return?

Answer

A

summary statistics for each numeric column: count, mean, std, min, 25%, 50% (the median), 75%, and max.
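A small sketch on a made-up numeric table showing the rows that describe() produces:

```python
import pandas as pd

# Toy numeric table; the values are made up
df = pd.DataFrame({"Year": [2000, 2004, 2008, 2012], "%": [40.0, 50.0, 60.0, 50.0]})

summary = df.describe()
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.loc["mean", "%"])  # 50.0
```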

Question 11

Q

what is value_counts()?

Answer

A

value_counts() returns the number of occurrences of each unique value in a Series, sorted from most to least frequent.
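A minimal sketch on a made-up Series. normalize=True returns proportions instead of counts:

```python
import pandas as pd

# Toy values; made up for illustration
parties = pd.Series(["Red", "Blue", "Red", "Green", "Red", "Blue"])

counts = parties.value_counts()  # most frequent value first
print(counts.to_dict())          # {'Red': 3, 'Blue': 2, 'Green': 1}

shares = parties.value_counts(normalize=True)  # fraction of rows per value
print(shares["Red"])  # 0.5
```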

Question 12

Q

what is unique()?

Answer

A

returns every unique value in the specified column, in order of first appearance:

elections["Party"].unique()
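A quick sketch on a made-up Series contrasting unique() (the values themselves) with nunique() (how many there are):

```python
import pandas as pd

# Toy values; made up for illustration
parties = pd.Series(["Red", "Blue", "Red", "Green", "Blue"])

print(list(parties.unique()))  # ['Red', 'Blue', 'Green'] (order of first appearance)
print(parties.nunique())       # 3
```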

Question 13

Q

what is sort_values()?

Answer

A

this function sorts a DataFrame by a column. by default text columns sort A-Z and numeric columns sort smallest first; pass ascending=False to reverse the order, as with the numeric column "%":

elections.sort_values("%", ascending=False)
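A minimal sketch on a made-up toy table showing the default (ascending) order and the reversed order:

```python
import pandas as pd

# Toy table; the values are made up
elections = pd.DataFrame({
    "Candidate": ["Cal", "Ann", "Bob"],
    "%": [30.5, 48.0, 21.5],
})

by_name = elections.sort_values("Candidate")         # A-Z (ascending is the default)
by_pct = elections.sort_values("%", ascending=False)  # largest % first
print(by_pct["Candidate"].tolist())  # ['Ann', 'Cal', 'Bob']
```

Sorting keeps the original index labels attached to each row, so loc still finds rows by their old labels afterwards.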

Question 14

Q

what is str.len()?

Answer

A

a function that returns the length (number of characters) of each string in a specified column.

babyname_lengths = babynames["Name"].str.len()
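A common follow-up is to store the lengths as a column and sort by it. This is a sketch on a made-up toy table; the column name name_lengths is an assumption:

```python
import pandas as pd

# Toy table; the values are made up
babynames = pd.DataFrame({"Name": ["Al", "Maximilian", "Ava", "Jo"]})

# Character count of each name, stored in a (hypothetical) name_lengths column
babynames["name_lengths"] = babynames["Name"].str.len()

longest_first = babynames.sort_values("name_lengths", ascending=False)
print(longest_first["Name"].iloc[0])  # Maximilian
```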

Question 15

Q

how do we drop a column?

Answer

A

drop returns a new DataFrame without the column, so reassign the result:
babynames = babynames.drop("name_lengths", axis="columns")

Question 16

Q

HIGH LEVEL: in the babynames DataFrame, sort by the number of occurrences of "dr" plus the number of occurrences of "ea" in each name.

Answer


A

first create a temporary column holding the count for each name, then sort by it:

def dr_ea_count(string):
    return string.count("dr") + string.count("ea")

babynames["dr_ea_count"] = babynames["Name"].map(dr_ea_count)

babynames = babynames.sort_values(by="dr_ea_count", ascending=False)
babynames.head()
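A sketch on a made-up toy table comparing map with a vectorized alternative, Series.str.count, which counts pattern matches in each string. Both counts are case-sensitive, so "Drew" scores 0 for "dr":

```python
import pandas as pd

# Toy table; the values are made up
babynames = pd.DataFrame({"Name": ["Andrea", "Sean", "Drew", "Leandra"]})

def dr_ea_count(string):
    # Plain Python str.count on one name at a time
    return string.count("dr") + string.count("ea")

via_map = babynames["Name"].map(dr_ea_count)

# Vectorized version: str.count treats its argument as a regex,
# which is fine here because "dr" and "ea" contain no special characters
via_str = babynames["Name"].str.count("dr") + babynames["Name"].str.count("ea")
print(via_map.tolist())  # [2, 1, 0, 2]
```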

Question 17

Q

use groupby: try to create a groupby.agg call that gives the total number of babies born with each name.

Answer


A

puzzle1 = female_babynames.groupby("Name")[["Count"]].agg("sum")
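A minimal sketch on a made-up toy table. Passing "sum" as a string to agg and calling .sum() directly give the same result:

```python
import pandas as pd

# Toy table; the values are made up
female_babynames = pd.DataFrame({
    "Name": ["Ann", "Bea", "Ann", "Bea", "Cat"],
    "Year": [2000, 2000, 2001, 2001, 2001],
    "Count": [5, 3, 7, 1, 4],
})

# One row per name, with Count summed over all years
totals = female_babynames.groupby("Name")[["Count"]].agg("sum")
totals_short = female_babynames.groupby("Name")[["Count"]].sum()
print(totals["Count"].to_dict())  # {'Ann': 12, 'Bea': 4, 'Cat': 4}
```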

Question 18

Q

how do we rename columns?

Answer


A

rtp_table = rtp_table.rename(columns={"Count": "Count RTP"})

Question 19

Q

make the babynames pivot table (fill in the blanks):

babynames_pivot = babynames.pivot_table(
    index=
    columns=
    values=
    aggfunc=
)
babynames_pivot.head(6)

Answer


A

babynames_pivot = babynames.pivot_table(
    index='Year',      # the rows (turned into the index)
    columns='Sex',     # the column values
    values=['Count'],  # the field(s) to aggregate in each group
    aggfunc='sum',     # group operation
)
babynames_pivot.head(6)
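A sketch on a made-up toy table showing that a pivot table is a groupby on two keys with one key moved into the columns. Passing values as a plain string (rather than a list) keeps the columns flat:

```python
import pandas as pd

# Toy table; the values are made up
babynames = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001],
    "Sex": ["F", "M", "F", "F", "M"],
    "Count": [5, 3, 2, 4, 6],
})

pivot = babynames.pivot_table(index="Year", columns="Sex", values="Count", aggfunc="sum")

# Same numbers via groupby on both keys, then unstacking "Sex" into the columns
via_groupby = babynames.groupby(["Year", "Sex"])["Count"].sum().unstack()
print(pivot.to_numpy().tolist())  # [[7, 3], [4, 6]]
```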

Question 20

Q

how do we merge two tables whose shared values (here, names) are stored in columns with different names?

Answer


A

merged = pd.merge(left=elections, right=male_2020_babynames,
                  left_on="First Name", right_on="Name")
merged
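A minimal sketch with made-up toy tables. The default is an inner join (only matching names survive, and both key columns are kept). how="left" keeps every left row and fills missing matches with NaN:

```python
import pandas as pd

# Toy tables; the values are made up
elections = pd.DataFrame({
    "First Name": ["Ann", "Bob", "Cal"],
    "Year": [2000, 2004, 2008],
})
male_2020_babynames = pd.DataFrame({
    "Name": ["Bob", "Cal", "Dan"],
    "Count": [120, 80, 50],
})

# Inner join on differently named key columns
merged = pd.merge(left=elections, right=male_2020_babynames,
                  left_on="First Name", right_on="Name")
print(merged["First Name"].tolist())  # ['Bob', 'Cal']

# Left join: Ann stays, with a missing Count
left_joined = pd.merge(elections, male_2020_babynames, how="left",
                       left_on="First Name", right_on="Name")
```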


(20 cards)