pandas Flashcards

1
Q

what is loc?

A

it selects rows (and optionally columns) by label. for example: elections.loc[0:4] returns the first 5 rows of the elections table (assuming the default integer index), because loc slices include both endpoints: rows 0 through 4.
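A minimal sketch of the inclusive endpoint, assuming a made-up stand-in for the elections table with the default 0, 1, 2, ... index (the rows and column names below are hypothetical, not the real data):

```python
import pandas as pd

# Hypothetical stand-in for the elections table; values are invented.
elections = pd.DataFrame({
    "Year": [2000, 2000, 2004, 2004, 2008, 2008],
    "Candidate": ["A", "B", "C", "D", "E", "F"],
})

label_slice = elections.loc[0:4]   # labels 0 through 4, end included
position_slice = elections[0:4]    # plain slicing: positions 0 through 3

print(len(label_slice))     # 5
print(len(position_slice))  # 4
```

The extra row comes from loc treating 4 as a label to include rather than a position to stop before.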

2
Q

what is the tail function?

A

returns the last n rows of a table: elections.tail(2) returns the last 2 rows of the elections table.

3
Q

how can you use loc for column names?

A

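example: elections.loc[0:4, "year":"party"] selects the rows labeled 0 through 4 and every column from "year" through "party", both ends inclusive.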

4
Q

what is iloc?

A

selects rows and columns by integer position. example: elections.iloc[[1, 2, 3], [0, 1, 2]] returns the 2nd, 3rd, and 4th rows and the 1st, 2nd, and 3rd columns.

5
Q

how do loc and iloc differ?

A

loc gets rows (and/or columns) with particular labels. iloc gets rows (and/or columns) at integer locations.
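The difference is easiest to see when the index labels are not 0, 1, 2, .... A small sketch, assuming a hypothetical table with labels 10, 20, 30 (all values invented):

```python
import pandas as pd

# Hypothetical table whose index labels differ from the row positions.
df = pd.DataFrame(
    {"Year": [1996, 2000, 2004], "Result": ["loss", "win", "loss"]},
    index=[10, 20, 30],
)

by_label = df.loc[20, "Result"]   # row labeled 20, column labeled "Result"
by_position = df.iloc[1, 1]       # second row, second column (the same cell)
subset = df.iloc[[0, 2], [0]]     # 1st and 3rd rows, 1st column

print(by_label, by_position)      # win win
print(subset.index.tolist())      # [10, 30]
```

Here df.loc[1] would raise a KeyError, since no row carries the label 1.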

6
Q

how do we get a DataFrame out of a loc or iloc selection that returns a Series?

A

use to_frame(), which converts a Series into a one-column DataFrame:
elections["candidate"].tail(5).to_frame()
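As a sketch, selecting with double brackets is another way to end up with a DataFrame. The table below is a hypothetical stand-in with invented values:

```python
import pandas as pd

# Hypothetical stand-in for the elections table.
elections = pd.DataFrame({
    "candidate": ["A", "B", "C"],
    "party": ["X", "Y", "Z"],
})

as_series = elections["candidate"].tail(2)      # single brackets give a Series
as_frame = as_series.to_frame()                 # Series -> one-column DataFrame
also_frame = elections[["candidate"]].tail(2)   # double brackets give a DataFrame

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame
```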

7
Q

what is isin?

A

isin checks each value in a column against a list and returns a boolean Series; indexing with that Series keeps the rows whose value is in the list:
wanted = ["Anti-Masonic", "American", "Anti-Monopoly", "American Independent"]
elections[elections["Party"].isin(wanted)]
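A short sketch of the boolean mask that isin produces, using a hypothetical four-row table. Note that isin matches whole values, so "American" does not match "American Independent":

```python
import pandas as pd

# Hypothetical stand-in for the elections table.
elections = pd.DataFrame({
    "Party": ["Anti-Masonic", "American", "American Independent", "Republican"],
})

wanted = ["Anti-Masonic", "American"]
mask = elections["Party"].isin(wanted)   # one True/False per row
filtered = elections[mask]               # keep only the True rows

print(mask.tolist())   # [True, True, False, False]
```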

8
Q

what is str.startswith?

A

str.startswith returns a boolean Series that is True wherever the string in a column starts with the given prefix; indexing with it keeps those rows:

elections[elections["Party"].str.startswith("A")]

9
Q

how do we use query?

A

query filters rows using a condition written as a string. the condition can compare a column to a number (>=, <, ==) or to a specific value such as "win", and conditions can be combined with and/or.

elections.query('Year >= 2000 and Result == "win"')
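For comparison, the same filter can be written as a boolean mask. A sketch on a hypothetical table (values invented):

```python
import pandas as pd

# Hypothetical stand-in for the elections table.
elections = pd.DataFrame({
    "Year": [1996, 2000, 2000, 2004],
    "Result": ["win", "win", "loss", "win"],
})

via_query = elections.query('Year >= 2000 and Result == "win"')
via_mask = elections[(elections["Year"] >= 2000) & (elections["Result"] == "win")]

print(via_query.index.tolist())   # [1, 3]
```

Both forms return the same rows; query tends to read better when several conditions pile up.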

10
Q

what does .describe() return?

A

summary statistics for each numeric column: count, mean, std, min, the 25%, 50%, and 75% percentiles, and max.
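A tiny sketch of what comes back, assuming a hypothetical one-column table with invented values:

```python
import pandas as pd

# Hypothetical numeric column.
scores = pd.DataFrame({"%": [40.0, 50.0, 60.0]})

summary = scores.describe()   # a DataFrame with one row per statistic

print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.loc["mean", "%"])   # 50.0
```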

11
Q

what is value_counts()?

A

value_counts() returns the number of occurrences of each unique value, sorted from most to least frequent.

12
Q

what is unique()?

A

returns every unique value in the specified column:

elections['party'].unique()
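To contrast unique() with value_counts() from the earlier card, here is a sketch on a hypothetical column (values invented):

```python
import pandas as pd

# Hypothetical stand-in for the elections table.
elections = pd.DataFrame({
    "party": ["Democratic", "Republican", "Democratic", "Whig", "Democratic"],
})

distinct = elections["party"].unique()       # distinct values, in order of first appearance
counts = elections["party"].value_counts()   # Series mapping each value to its row count

print(list(distinct))         # ['Democratic', 'Republican', 'Whig']
print(counts["Democratic"])   # 3
```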

13
Q

what is sort_values()?

A

sorts the rows of a table by the values in a column. by default text columns sort A-Z and numeric columns sort smallest to largest; pass ascending=False to reverse the order, as here with the column "%":

elections.sort_values("%", ascending=False)
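A sketch of both directions on a hypothetical table (values invented). sort_values returns a new DataFrame and leaves the original order alone unless the result is assigned back:

```python
import pandas as pd

# Hypothetical stand-in for the elections table.
elections = pd.DataFrame({
    "Candidate": ["C", "A", "B"],
    "%": [30.5, 50.1, 19.4],
})

by_name = elections.sort_values("Candidate")             # A-Z (ascending is the default)
by_share = elections.sort_values("%", ascending=False)   # largest share first

print(by_name["Candidate"].tolist())    # ['A', 'B', 'C']
print(by_share["%"].tolist())           # [50.1, 30.5, 19.4]
print(elections["Candidate"].tolist())  # ['C', 'A', 'B']
```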

14
Q

what is str.len()?

A

a function that returns the length of each string in the specified column.

babyname_lengths = babynames["Name"].str.len()

15
Q

how do we drop a column?

A

babynames = babynames.drop("name_lengths", axis="columns")

16
Q

HIGH LEVEL: in the babynames df, sort by the number of occurrences of "dr" plus the number of occurrences of "ea".

A

create a temporary column holding the count, then sort by it:

def dr_ea_count(string):
    return string.count("dr") + string.count("ea")

babynames["dr_ea_count"] = babynames["Name"].map(dr_ea_count)

babynames = babynames.sort_values(by="dr_ea_count", ascending=False)
babynames.head()
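As an alternative sketch, sort_values accepts a key function (assuming pandas 1.1 or newer), which avoids the temporary column. The names below are hypothetical sample data:

```python
import pandas as pd

def dr_ea_count(string):
    # str.count is case-sensitive, so the "Dr" in "Drew" is not counted.
    return string.count("dr") + string.count("ea")

# Hypothetical stand-in for the babynames table.
babynames = pd.DataFrame({"Name": ["Sam", "Andrea", "Drew", "Deandrea"]})

# key receives the whole "Name" column and must return a Series of sort keys.
result = babynames.sort_values(
    "Name", key=lambda names: names.map(dr_ea_count), ascending=False
)

print(result["Name"].tolist()[:2])   # ['Deandrea', 'Andrea']
```

"Deandrea" scores 3 (one "dr", two "ea") and "Andrea" scores 2; the remaining names tie at 0.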

17
Q

use groupby: create a groupby.agg call that gives the total number of babies born with each name.

A

puzzle1 = female_babynames.groupby("Name")[["Count"]].agg("sum")
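A runnable sketch of the same call on a hypothetical five-row table (names and counts invented), showing that each name collapses to a single row holding its total:

```python
import pandas as pd

# Hypothetical stand-in for female_babynames.
female_babynames = pd.DataFrame({
    "Name": ["Mary", "Anna", "Mary", "Anna", "Emma"],
    "Year": [1910, 1910, 1911, 1911, 1911],
    "Count": [10, 5, 20, 7, 3],
})

# Group rows by name, keep only the Count column, and add up each group.
totals = female_babynames.groupby("Name")[["Count"]].agg("sum")

print(totals.loc["Mary", "Count"])   # 30
print(totals.index.tolist())         # ['Anna', 'Emma', 'Mary']
```

The double brackets around "Count" keep the result a DataFrame; single brackets would give a Series.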

18
Q

how do we rename columns?

A

rtp_table = rtp_table.rename(columns={"Count": "Count RTP"})

19
Q

complete the babynames pivot table:

babynames_pivot = babynames.pivot_table(
    index=
    columns=
    values=
    aggfunc=
)
babynames_pivot.head(6)

A

babynames_pivot = babynames.pivot_table(
    index="Year",      # the rows (turned into the index)
    columns="Sex",     # the column values
    values=["Count"],  # the field(s) to be processed in each group
    aggfunc="sum",     # group operation
)
babynames_pivot.head(6)
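A sketch of the same pivot on a hypothetical five-row table (values invented). Because values is passed as a list, the resulting columns are two-level: ("Count", "F") and ("Count", "M"):

```python
import pandas as pd

# Hypothetical stand-in for the babynames table.
babynames = pd.DataFrame({
    "Year": [1910, 1910, 1910, 1911, 1911],
    "Sex": ["F", "F", "M", "F", "M"],
    "Name": ["Mary", "Anna", "John", "Mary", "John"],
    "Count": [10, 5, 8, 7, 4],
})

babynames_pivot = babynames.pivot_table(
    index="Year",      # one row per year
    columns="Sex",     # one column per sex
    values=["Count"],  # the field to aggregate
    aggfunc="sum",     # add up the counts in each (Year, Sex) cell
)

print(babynames_pivot.columns.tolist())            # [('Count', 'F'), ('Count', 'M')]
print(babynames_pivot.loc[1910, ("Count", "F")])   # 15
```

Passing values="Count" (a plain string) would instead give single-level columns F and M.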

20
Q

how do we merge two tables that have name columns in common?

A

merged = pd.merge(left=elections, right=male_2020_babynames,
                  left_on="First Name", right_on="Name")
merged
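A sketch of the merge on two hypothetical tables (all names and counts invented). The default is an inner join, so rows without a match on either side are dropped:

```python
import pandas as pd

# Hypothetical stand-ins for the two tables.
elections = pd.DataFrame({
    "Candidate": ["Alex Smith", "Jordan Lee", "Casey Wu"],
    "First Name": ["Alex", "Jordan", "Casey"],
})
male_2020_babynames = pd.DataFrame({
    "Name": ["Alex", "Jordan", "Riley"],
    "Count": [100, 50, 25],
})

merged = pd.merge(left=elections, right=male_2020_babynames,
                  left_on="First Name", right_on="Name")

print(merged["Name"].tolist())   # ['Alex', 'Jordan']
print(merged.columns.tolist())   # ['Candidate', 'First Name', 'Name', 'Count']
```

"Casey" and "Riley" each appear in only one table, so neither shows up in merged; both key columns are kept because their names differ.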