3rd Flashcards

1
Q

move a column to become the index

A

df.set_index('c')

2
Q

Reset index

A

df.reset_index()

3
Q

Reset the index and discard the old index column

A

df.reset_index(drop=True)

4
Q

Multiple index columns

A

df.set_index(['c', 'c1'])
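
A minimal sketch of the index operations on the cards so far, assuming a small made-up table. The column names 'c' and 'c1' come from the cards; the column 'v' and all values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; 'c' and 'c1' stand in for real column names.
df = pd.DataFrame({"c": ["a", "b", "c"], "c1": [1, 2, 3], "v": [10, 20, 30]})

indexed = df.set_index("c")                # 'c' becomes the index
restored = indexed.reset_index()           # 'c' returns as a regular column
dropped = indexed.reset_index(drop=True)   # 'c' is discarded entirely
multi = df.set_index(["c", "c1"])          # two-level MultiIndex
```

None of these calls changes df in place; each returns a new DataFrame.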

5
Q

Select only year from date

A

df['c'].dt.year

6
Q

Select only month from date

A

df['c'].dt.month
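
A short sketch of the .dt accessor, assuming column 'c' holds datetime values. The dates are made up, and pd.to_datetime is used here because .dt only works on datetime-like columns, not on plain strings.

```python
import pandas as pd

# Hypothetical dates; parse strings first so that .dt is available.
df = pd.DataFrame({"c": pd.to_datetime(["2021-03-15", "2022-11-02"])})

years = df["c"].dt.year     # Series of years
months = df["c"].dt.month   # Series of month numbers (1 to 12)
```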

7
Q

Detect missing values

A

df.isna()

    name  breed
0  False  False
1  False   True
2  False   True
3  False  False
8
Q

Check whether each column of a df contains any NAs

A

df.isna().any()

name     False
breed    True
color    True

9
Q

Count the number of NAs in each column

A

df.isna().sum()

name 0
breed 2
color 3

10
Q

Remove all rows that contain NAs from a df

A

df.dropna()

11
Q

Replace NAs with a fill value

A

df.fillna(0)
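
The missing-value cards can be tried together on one small table. This is a sketch on made-up data: the pattern of missing breeds mirrors the isna() output shown earlier, but the names and the "unknown" fill value are hypothetical.

```python
import pandas as pd

# Hypothetical data: two of the four breeds are missing.
df = pd.DataFrame({
    "name": ["Ginger", "Scout", "Max", "Bella"],
    "breed": ["Lab", None, None, "Poodle"],
})

mask = df.isna()              # True wherever a value is missing
has_na = df.isna().any()      # one True/False per column
na_counts = df.isna().sum()   # number of missing values per column
complete = df.dropna()        # keeps only rows with no missing values
filled = df.fillna("unknown") # replaces every missing value
```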

12
Q

List of dictionaries

A

list_of_dicts = [
    {'name': 'Ginger', 'breed': 'Lab', 'kg': 22},
    {'name': 'qwuire', 'breed': 'good', 'kg': 12}
]
df = pd.DataFrame(list_of_dicts)

     name breed  kg
0  Ginger   Lab  22
1  qwuire  good  12
13
Q

Dictionary of lists

A

dict_of_lists = {
    'name': ['Ginger', 'qwuire'],
    'breed': ['Lab', 'good'],
    'weight': [22, 12]
}
df = pd.DataFrame(dict_of_lists)

     name breed  weight
0  Ginger   Lab      22
1  qwuire  good      12
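
A sketch comparing the two construction styles: a list of dictionaries builds the table row by row, a dictionary of lists builds it column by column. The key 'kg' is used in both here (an assumption made only so the two results can be compared directly).

```python
import pandas as pd

# Row by row: each dictionary is one row.
list_of_dicts = [
    {"name": "Ginger", "breed": "Lab", "kg": 22},
    {"name": "qwuire", "breed": "good", "kg": 12},
]

# Column by column: each key is a column, each list holds its values.
dict_of_lists = {
    "name": ["Ginger", "qwuire"],
    "breed": ["Lab", "good"],
    "kg": [22, 12],
}

by_row = pd.DataFrame(list_of_dicts)
by_col = pd.DataFrame(dict_of_lists)
```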
14
Q

Write csv

A

df.to_csv('file/path')
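
to_csv is a method of the DataFrame, not a function on pd. A minimal sketch, writing to an in-memory buffer in place of a real file path; the data and the index=False choice are assumptions for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["Ginger"], "kg": [22]})

buffer = io.StringIO()           # stands in for a file path
df.to_csv(buffer, index=False)   # index=False leaves out the row index
csv_text = buffer.getvalue()
```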

15
Q

Inner join

A

Only return rows where the values match in both tables

new_df = df1.merge(df2, on='col')

16
Q

Add suffixes to joins

A

suffixes = ('_x', '_y')

new_df = df1.merge(df2, on='col', suffixes=('_x', '_y'))
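
Suffixes only apply to non-key columns that share a name in both tables. A small sketch on hypothetical data, where both tables have a column called 'val':

```python
import pandas as pd

# Hypothetical tables that both contain a 'val' column.
df1 = pd.DataFrame({"col": [1, 2], "val": [10, 20]})
df2 = pd.DataFrame({"col": [1, 2], "val": [30, 40]})

# The clashing 'val' columns are renamed with the given suffixes.
merged = df1.merge(df2, on="col", suffixes=("_df1", "_df2"))
```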

17
Q

Merge on multiple columns

A

new_df = df1.merge(df2, on=['col', 'col1'])

18
Q

Merge multiple tables

A

new_df = df1.merge(df2, on='col') \
    .merge(df3, on='col1')
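
A sketch of merging on two key columns and of chaining merges, on hypothetical data. Parentheses are used for the chain instead of a backslash; both styles work.

```python
import pandas as pd

# Hypothetical tables; only the key names 'col' and 'col1' come from the cards.
df1 = pd.DataFrame({"col": [1, 2], "col1": ["a", "b"], "v1": [10, 20]})
df2 = pd.DataFrame({"col": [1, 2], "col1": ["a", "z"], "v2": [30, 40]})
df3 = pd.DataFrame({"col1": ["a", "b"], "v3": [50, 60]})

# Both keys must match, so only the row (1, 'a') survives.
two_keys = df1.merge(df2, on=["col", "col1"])

# Chain: join df2's 'v2' on 'col', then join df3 on 'col1'.
chained = (
    df1.merge(df2[["col", "v2"]], on="col")
       .merge(df3, on="col1")
)
```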

19
Q

Left join

A

Returns all rows from the left table and only the matching rows from the right table

new_df = df1.merge(df2, on='col', how='left')

20
Q

Merge on same column with two different names

A

new_df = df1.merge(df2, left_on='df1_col', right_on='df2_col')
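
A sketch with left_on and right_on, assuming hypothetical tables whose key columns are named differently. Note that both key columns are kept in the result.

```python
import pandas as pd

# Hypothetical tables with differently named key columns.
df1 = pd.DataFrame({"df1_col": [1, 2], "a": ["x", "y"]})
df2 = pd.DataFrame({"df2_col": [2, 3], "b": ["p", "q"]})

# Inner join by default: only the key 2 appears in both tables.
merged = df1.merge(df2, left_on="df1_col", right_on="df2_col")
```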

21
Q

Outer join

A

Returns all rows from both tables, whether or not there is a match between them

new_df = df1.merge(df2, on='col', how='outer')
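
The inner, left, and outer joins can be compared side by side. A minimal sketch on made-up tables whose keys only partly overlap:

```python
import pandas as pd

# Hypothetical tables: keys 2 and 3 appear in both.
df1 = pd.DataFrame({"col": [1, 2, 3], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"col": [2, 3, 4], "right_val": ["x", "y", "z"]})

inner = df1.merge(df2, on="col")               # keys 2, 3
left = df1.merge(df2, on="col", how="left")    # keys 1, 2, 3
outer = df1.merge(df2, on="col", how="outer")  # keys 1, 2, 3, 4
```

Rows without a match get NaN in the columns that come from the other table.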

22
Q

Semi join

A

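Returns the rows of the left table that have a match in the right table. Only columns from the left table are kept, and there are no duplicates: each left row appears once, even if it matches several rows in the right table.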

23
Q

Perform a filtering join (semi join)

A

new_df = df1.merge(df2, on='col')
obs = df1[df1['col'].isin(new_df['col'])]
print(obs)
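
A sketch of the semi-join pattern above on hypothetical data, where one left key matches two right rows. The inner join repeats that key, but the filtered result does not.

```python
import pandas as pd

# Hypothetical tables: key 2 appears twice in df2.
df1 = pd.DataFrame({"col": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"col": [2, 2, 3], "other": ["x", "y", "z"]})

new_df = df1.merge(df2, on="col")            # inner join: key 2 appears twice
obs = df1[df1["col"].isin(new_df["col"])]    # left rows that have a match
```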

24
Q

Anti Join

A

Returns the rows of the left table that have no match in the right table, with only the left table's columns

Step 1
df_merge = df1.merge(df2, on='col', how='left', indicator=True)

Step 2
df_bool = df_merge.loc[df_merge['_merge'] == 'left_only', 'col']

Step 3
df_left = df1[df1['col'].isin(df_bool)]
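
The three steps above can be sketched end to end on made-up data. The name left_only_keys is a hypothetical stand-in for df_bool.

```python
import pandas as pd

# Hypothetical tables: only key 1 is missing from df2.
df1 = pd.DataFrame({"col": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"col": [2, 3], "other": ["x", "y"]})

# Step 1: left join with a '_merge' column recording where each row came from.
df_merge = df1.merge(df2, on="col", how="left", indicator=True)

# Step 2: keys that exist only in the left table.
left_only_keys = df_merge.loc[df_merge["_merge"] == "left_only", "col"]

# Step 3: keep the left rows whose key is in that list.
df_left = df1[df1["col"].isin(left_only_keys)]
```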

25
Q

Concatenate

A

pd.concat([df, df1, df2], axis=0)

26
Q

Concatenate and ignore index

A

pd.concat([df, df1, df2], axis=0, ignore_index=True)

27
Q

Concatenate and set labels in index of original tables

A

pd.concat([df, df1, df2], axis=0, ignore_index=False, keys=['1', '2', '3'])

       name  cid
1  0   jj    234
   1   ss    11
2  0
   1
3  0
   1
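
A sketch of the three concat variants, assuming three small tables with the same columns. The first table reuses the 'jj' and 'ss' rows shown above; the other two tables are made up.

```python
import pandas as pd

df = pd.DataFrame({"name": ["jj", "ss"], "cid": [234, 11]})
df1 = pd.DataFrame({"name": ["aa", "bb"], "cid": [5, 6]})   # hypothetical
df2 = pd.DataFrame({"name": ["cc", "dd"], "cid": [7, 8]})   # hypothetical

stacked = pd.concat([df, df1, df2], axis=0)                        # index repeats 0, 1
renumbered = pd.concat([df, df1, df2], axis=0, ignore_index=True)  # index 0 to 5
labelled = pd.concat([df, df1, df2], axis=0, keys=["1", "2", "3"]) # MultiIndex

first_table = labelled.loc["1"]   # rows that came from df
```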
28
Q

Merge Ordered

A

Helpful for ordered or time series data as the results are sorted

df_merge = pd.merge_ordered(df1, df2, on='col', suffixes=('_df1', '_df2'))

29
Q

Merge ordered forward fill

A

Fills missing values with the previous value

df_merge = pd.merge_ordered(df1, df2, on='col', suffixes=('_df1', '_df2'), fill_method='ffill')
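
A sketch of merge_ordered with and without forward fill, on hypothetical data where the second table is missing two keys. merge_ordered performs an outer join by default.

```python
import pandas as pd

# Hypothetical tables: df2 has no rows for keys 2 and 4.
df1 = pd.DataFrame({"col": [1, 2, 3, 4], "val": [10, 20, 30, 40]})
df2 = pd.DataFrame({"col": [1, 3], "val": [100, 300]})

# Without fill_method, the gaps stay as NaN.
plain = pd.merge_ordered(df1, df2, on="col", suffixes=("_df1", "_df2"))

# With ffill, each gap takes the value from the row above.
df_merge = pd.merge_ordered(
    df1, df2, on="col", suffixes=("_df1", "_df2"), fill_method="ffill"
)
```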

30
Q

Merge as of

A

Similar to merge ordered

Matches on the nearest key value rather than requiring an exact match (by default, the closest right key at or before the left key)

The merged-on columns must be sorted

df_merge = pd.merge_asof(df1, df2, on='col', suffixes=('_df1', '_df2'))
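
A sketch of merge_asof on made-up, already sorted data. With the default settings each left row takes the last right row whose key is less than or equal to its own key. No suffixes are needed here because the value columns have different names.

```python
import pandas as pd

# Hypothetical tables; both are sorted by 'col'.
df1 = pd.DataFrame({"col": [1, 5, 10], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"col": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]})

# Key 1 matches 1 exactly, 5 falls back to 3, and 10 falls back to 7.
df_merge = pd.merge_asof(df1, df2, on="col")
```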