Pandas Flashcards

(45 cards)

1
Q

Axis?

A

Axis 0 = rows

Axis 1 = columns
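A minimal sketch of what the axis argument does in a reduction, assuming a small made-up frame (the column names `a` and `b` are illustrative only). With `axis=0` the operation runs down the rows and gives one result per column; with `axis=1` it runs across the columns and gives one result per row.

```python
import pandas as pd

# Hypothetical example frame: two rows, two columns
df = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

# axis=0: collapse the rows, one total per column
col_totals = df.sum(axis=0)

# axis=1: collapse the columns, one total per row
row_totals = df.sum(axis=1)

print(col_totals)
print(row_totals)
```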

2
Q

Reorder columns in dataframe
Option 1)
Option 2)

A

1) df.sort_index(axis=1)

2) df.reindex(columns=sorted(df.columns))
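Both options sort the columns alphabetically. A short sketch on a made-up frame, which also shows plain column selection as one way to get an arbitrary order (that third variant is an addition, not part of the card):

```python
import pandas as pd

# Hypothetical frame with columns in the order b, c, a
df = pd.DataFrame({'b': [1], 'c': [2], 'a': [3]})

# Option 1: sort the column labels
by_sort = df.sort_index(axis=1)

# Option 2: reindex with an explicitly sorted label list
by_reindex = df.reindex(columns=sorted(df.columns))

# Arbitrary order: select the columns in the order you want
custom = df[['c', 'a', 'b']]

print(list(by_sort.columns), list(by_reindex.columns), list(custom.columns))
```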

3
Q

Explore document
1) first rows
2)
3)

A

1) df.head()
2) df.info()
3) df.describe()

4
Q

Reorder dataframe

1) by index
2) by a particular column

A

1) df.sort_index()

2) df.sort_values(by=column)

5
Q

Remove columns from dataframe

A

df.drop([col1, col2], axis=1)

6
Q

Filter dataframe by a value in a column?

A

df[df['column'] < 5]

7
Q

Filter dataframe by several conditions

A

df[(df['column1'] > 0) & (df['column2'] == 'Berlin')]
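Element-wise conditions on a Series are combined with `&` (and) or `|` (or), and each condition needs its own parentheses; Python's `and` raises a `ValueError` on a Series. A minimal sketch, assuming made-up columns `sales` and `city`:

```python
import pandas as pd

# Hypothetical data: column names are illustrative only
df = pd.DataFrame({
    'sales': [0, 5, 3],
    'city': ['Berlin', 'Berlin', 'Hamburg'],
})

# Parentheses are required because & binds tighter than > and ==
mask = (df['sales'] > 0) & (df['city'] == 'Berlin')
filtered = df[mask]

print(filtered)
```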

8
Q

Filter column in df by list of values

A

values = ['one', 'two']

df[df['col'].isin(values)]

9
Q

Filter column values by contains

A

df[df['col'].str.contains(pattern)]

10
Q

Unique values in column

A

df['col'].unique()

11
Q

Filter column values by does not contain?

A

df[~df['col'].str.contains('blabla')]
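Negating the boolean mask with `~` keeps the rows that do not contain the pattern. A small sketch on made-up data; `na=False` is an added assumption so that missing values count as "no match" instead of producing NaN entries in the mask:

```python
import pandas as pd

# Hypothetical column with one matching value and one missing value
df = pd.DataFrame({'col': ['blabla one', 'two', None]})

# True where the text contains 'blabla'; missing values are treated as False
contains = df['col'].str.contains('blabla', na=False)

# ~ flips the mask, keeping rows without the pattern
no_match = df[~contains]

print(no_match)
```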

12
Q

Join dataframes

1) merge
2) join

A

1)
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
…)

2)
left.join(right, how='left')
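A minimal sketch contrasting the two, assuming two made-up frames that share a `key` column. `pd.merge` matches on columns by default, while `DataFrame.join` matches on the index and defaults to a left join.

```python
import pandas as pd

# Hypothetical frames: names and values are illustrative only
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'b', 'd'], 'y': [10, 20, 40]})

# 1) merge on a shared column, keeping only keys present in both frames
inner = pd.merge(left, right, how='inner', on='key')

# 2) join on the index, keeping every row of the left frame
joined = left.set_index('key').join(right.set_index('key'), how='left')

print(inner)
print(joined)
```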

13
Q

Concatenate datasets

A

pd.concat([df1, df2], axis=0)

14
Q

Create new columns based on existing columns?

1) 2 conditions
2) 2 conditions

A

1) if/else
np.where(condition, 'yes', 'no')

2) ternary expression
df['col'] = df['number'].apply(lambda x:
    'more than 5' if x > 5 else '5 or less')
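Both variants side by side on a made-up `number` column (the output column names `flag` and `label` are illustrative). `np.where` works on the whole column at once, while `apply` calls the lambda once per value.

```python
import numpy as np
import pandas as pd

# Hypothetical input column
df = pd.DataFrame({'number': [3, 5, 8]})

# 1) vectorised if/else
df['flag'] = np.where(df['number'] > 5, 'yes', 'no')

# 2) ternary expression applied value by value
df['label'] = df['number'].apply(
    lambda x: 'more than 5' if x > 5 else '5 or less'
)

print(df)
```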

15
Q

Create column based on existing column

1) one column, 3+ conditions

A

df['ncol'] = df['oldcol'].apply(function)
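With three or more outcomes, a named function is usually easier to read than nested ternaries. A sketch under assumed thresholds; `size_bucket` and its cut-offs are hypothetical:

```python
import pandas as pd

def size_bucket(x):
    """Hypothetical helper: map a number to one of three labels."""
    if x < 5:
        return 'small'
    elif x < 10:
        return 'medium'
    return 'large'

# Made-up input column
df = pd.DataFrame({'oldcol': [2, 7, 15]})

# apply() calls size_bucket once per value of the column
df['ncol'] = df['oldcol'].apply(size_bucket)

print(df)
```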

16
Q

Create new column based on multiple columns

A

df['newcol'] = df.apply(lambda row: function(row), axis=1)
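With `axis=1`, `apply` passes each row as a Series, so the function can read several columns by name. A sketch with made-up columns `price` and `qty`; the helper `order_size` and its threshold are hypothetical:

```python
import pandas as pd

def order_size(row):
    """Hypothetical helper: label a row by its total value."""
    total = row['price'] * row['qty']
    return 'big' if total >= 100 else 'small'

# Made-up data
df = pd.DataFrame({'price': [20.0, 5.0], 'qty': [6, 2]})

# axis=1 hands one row at a time to the function
df['newcol'] = df.apply(lambda row: order_size(row), axis=1)

print(df)
```

For simple arithmetic like this, a vectorised expression on the columns themselves is usually faster than a row-wise `apply`.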

17
Q

Convert x to string
1)
2)

A

1) .astype(str)

2) str()

18
Q

Check missing values in column

A

df['col'].isnull().any()

19
Q

Remove rows with missing values

20
Q

Fill missing values
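A minimal sketch for checking, removing, and filling missing values, assuming `dropna()` and `fillna()` are the intended answers (the cards themselves do not state them). The frame and the fill values are made up.

```python
import pandas as pd

# Hypothetical frame with one missing value in each column
df = pd.DataFrame({'a': [1.0, None, 3.0], 'b': ['x', 'y', None]})

# Check: does column 'a' contain any missing value?
has_missing = df['a'].isnull().any()

# Remove every row that has at least one missing value
dropped = df.dropna()

# Fill missing values, with a separate fill value per column
filled = df.fillna({'a': 0.0, 'b': 'unknown'})

print(has_missing)
print(dropped)
print(filled)
```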

21
Q

Filter column values by regex

A

df[df['col'].str.contains(pattern, regex=True)]

22
Q

Filter groups where the max. value of cumsum_cpo > 0

A

test_converting = test.groupby('campaign_id').filter(lambda g: g['cumsum_cpo'].max() > 0)
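`groupby(...).filter(...)` keeps or drops whole groups: the lambda receives each group as a frame and returns a single True or False. A sketch on made-up values for the card's column names:

```python
import pandas as pd

# Made-up data: campaign 1 reaches a positive cumsum_cpo, campaign 2 never does
test = pd.DataFrame({
    'campaign_id': [1, 1, 2, 2],
    'cumsum_cpo': [0.0, 2.5, 0.0, 0.0],
})

# Keep all rows of every group whose maximum is above zero
test_converting = test.groupby('campaign_id').filter(
    lambda g: g['cumsum_cpo'].max() > 0
)

print(test_converting)
```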

23
Q

Cumsum by groups

A

facebook_costs['cumsum'] = facebook_costs.groupby('campaign_id')['spend'].cumsum()
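The running total restarts for each group and the result keeps the original row order, so it can be assigned straight back as a column. A sketch with made-up spend values:

```python
import pandas as pd

# Made-up data: rows of two campaigns are interleaved
facebook_costs = pd.DataFrame({
    'campaign_id': [1, 2, 1, 2],
    'spend': [10, 5, 20, 5],
})

# Cumulative sum within each campaign, aligned to the original index
facebook_costs['cumsum'] = (
    facebook_costs.groupby('campaign_id')['spend'].cumsum()
)

print(facebook_costs)
```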

24
Q

Select index by condition and convert to list

A

conversions_sitelinks.index[conversions_sitelinks['conversions'].str.len() > 7].tolist()

25
Find maximum values per group
result = df.groupby('A').apply(lambda g: g[g['B'] == g['B'].max()])
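An alternative sketch that avoids `apply`, assuming the same column names `A` and `B` on made-up data. `transform('max')` broadcasts each group's maximum back to its rows, so ties are all kept; `idxmax()` keeps only the first maximal row per group.

```python
import pandas as pd

# Made-up data: group 'y' has a tie for its maximum
df = pd.DataFrame({'A': ['x', 'x', 'y', 'y'], 'B': [1, 3, 2, 2]})

# Rows whose B equals the maximum of their group (ties included)
result = df[df['B'] == df.groupby('A')['B'].transform('max')]

# Exactly one row per group: the first row holding the maximum
first_max = df.loc[df.groupby('A')['B'].idxmax()]

print(result)
print(first_max)
```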
26
Count of values
df['col'].value_counts()
27
Count of values, relative %
df['col'].value_counts(normalize=True)[value] | df['col'].value_counts(normalize=True)
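A short sketch of absolute and relative counts on a made-up Series. With `normalize=True` the result holds fractions that sum to 1, and indexing by a value returns that value's share.

```python
import pandas as pd

# Made-up values
s = pd.Series(['a', 'a', 'b', 'c'])

# Absolute counts per distinct value
counts = s.value_counts()

# Relative frequencies (fractions of the total)
shares = s.value_counts(normalize=True)

# Share of a single value
share_a = shares['a']

print(counts)
print(shares)
```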
28
group by and aggregate
data[data['item'] == 'call'].groupby('month').agg(
    # Get max of the duration column for each group
    max_duration=('duration', 'max'),
    # Get min of the duration column for each group
    min_duration=('duration', 'min'),
    # Get sum of the duration column for each group
    total_duration=('duration', 'sum'),
    # Apply a lambda to date column
    num_days=('date', lambda x: (max(x) - min(x)).days),
)
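A runnable version of this named aggregation on a small made-up frame. The column names follow the card; the values are invented for illustration.

```python
import pandas as pd

# Made-up call log
data = pd.DataFrame({
    'item': ['call', 'call', 'call', 'sms'],
    'month': ['2014-11', '2014-11', '2014-12', '2014-11'],
    'duration': [10.0, 30.0, 5.0, 1.0],
    'date': pd.to_datetime(
        ['2014-11-01', '2014-11-15', '2014-12-03', '2014-11-02']
    ),
})

# Each keyword becomes an output column: name=(source column, aggregation)
summary = data[data['item'] == 'call'].groupby('month').agg(
    max_duration=('duration', 'max'),
    min_duration=('duration', 'min'),
    total_duration=('duration', 'sum'),
    # Days between the first and last call of the month
    num_days=('date', lambda x: (x.max() - x.min()).days),
)

print(summary)
```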
29
Correlation two columns
df['col1'].corr(df['col2'])
30
Percent change to previous row
df['col'].pct_change() | use a sorted datetime index
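`pct_change()` compares each row with the one before it, so the rows should already be in time order. A sketch on a made-up daily Series:

```python
import pandas as pd

# Made-up values on a sorted datetime index
s = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03']),
)

# Fractional change relative to the previous row; the first row has no predecessor
change = s.pct_change()

print(change)
```

The result is a fraction (0.1 means +10 %); multiply by 100 for percent.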
31
Iterate through two lists at once
```
>>> letters = ['a', 'b', 'c']
>>> numbers = [0, 1, 2]
>>> for l, n in zip(letters, numbers):
...     print(f'Letter: {l}')
...     print(f'Number: {n}')
```
32
Create new column with percentage of total from another column
df['col_pct'] = df.col / df.col.sum()
33
Print dimensions of df
print(f'df:{df.shape}')
34
Number of unique values in a column
len(set(df.col))
35
Correlation with all other columns in df
result = df.corr()[['col']]
36
Join dataframes (merge) 1) index 2) other columns
```
result = df1.merge(df2, left_index=True, right_index=True, how='left')
result = df1.merge(df2, left_on='col1', right_on='col2', how='left')
```
37
Display information about a function
?name
38
display version of pandas
pd.__version__
39
Set max. number of rows
pd.set_option('display.max_rows', 500) | pd.options.display.
40
Remove duplicates from list: 1) list comprehension 2) set
1) res = []
   [res.append(x) for x in test_list if x not in res]
2) list(set(test_list))
41
Unique values in a column
set(df.col)
42
rename column in dataframe
df.rename(columns={'two':'new_name'}, inplace=True)
43
replace string in a column with another string
df['newcol'] = df['oldcol'].str.replace('str','another_str')
44
stack two dataframes
transactions = pd.concat([transactions_1, transactions_2],axis=0)
45
new column: sum over groups
final_new['sum'] = final_new.groupby('created')['count_transactions'].transform("sum")
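Unlike `agg`, `transform` returns one value per original row, so each row receives its group's total. A sketch with made-up values for the card's column names:

```python
import pandas as pd

# Made-up data: two rows created on d1, one on d2
final_new = pd.DataFrame({
    'created': ['d1', 'd1', 'd2'],
    'count_transactions': [2, 3, 4],
})

# Group total broadcast back to every row of the group
final_new['sum'] = (
    final_new.groupby('created')['count_transactions'].transform('sum')
)

print(final_new)
```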