Pandas Flashcards

(34 cards)

1
Q

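Fill NAs in one column with values from another

fullDf['forecast_date'] has NAs; fill them with fullDf.day

A

fullDf['forecast_date'] = fullDf['forecast_date'].fillna(fullDf['day'])
# assign the result back; fillna(..., inplace=True) on a selected column may not update fullDf in recent pandas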
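A minimal sketch of this card on made-up data. The sample dates are assumptions for illustration only; the column names come from the card.

```python
import pandas as pd

# Hypothetical data: two missing forecast dates, with 'day' as the fallback.
fullDf = pd.DataFrame({
    "forecast_date": pd.to_datetime(["2017-07-17", None, None]),
    "day": pd.to_datetime(["2017-07-10", "2017-07-11", "2017-07-12"]),
})

# fillna aligns on the index, so each gap takes the value
# from the same row of 'day'; existing values are left alone.
fullDf["forecast_date"] = fullDf["forecast_date"].fillna(fullDf["day"])
```

Rows that already had a forecast_date keep it; only the missing ones are taken from day.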

2
Q

calls_final.calls_tw has NaN values. Replace them with 5.

A

calls_final.calls_tw = calls_final.calls_tw.fillna(5)
# the lambda form, .map(lambda x: x if np.isfinite(x) else 5), also replaces inf values

3
Q

Add a series of integer day offsets (df_test1['add']) to the dates in df_test1['forweek'] to get df_test1['forecast_date']

A
df_test1['add'] = pd.to_timedelta(df_test1['add'], unit='D')
# convert integers to days this way
df_test1['forecast_date'] = df_test1['forweek'] + df_test1['add']
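A small worked version of this card, assuming two sample weeks and offsets of 1 and 3 days (the values are made up; the column names come from the card). This sketch keeps the integer column as it is and stores the timedeltas in a separate variable.

```python
import pandas as pd

df_test1 = pd.DataFrame({
    "forweek": pd.to_datetime(["2017-07-17", "2017-07-24"]),
    "add": [1, 3],  # integer day offsets (sample values)
})

# Integers cannot be added to datetimes directly; convert them to timedeltas first.
offsets = pd.to_timedelta(df_test1["add"], unit="D")
df_test1["forecast_date"] = df_test1["forweek"] + offsets
```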
4
Q

Calculate the day of the week from a date in Python

df['date'] is a series of dates

A
df['date'].dt.day_name()  # replaces dt.weekday_name, which was removed in pandas 1.0
# df['date'].dt.dayofweek gives numbers instead, with Monday = 0
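A quick check of both accessors on two sample dates (the dates are illustrative). It uses dt.day_name(), the accessor available in pandas 1.0 and later.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2017-07-17", "2017-07-23"])})

names = df["date"].dt.day_name()   # weekday names as strings
numbers = df["date"].dt.dayofweek  # integers, Monday = 0 ... Sunday = 6

print(names.tolist())    # ['Monday', 'Sunday']
print(numbers.tolist())  # [0, 6]
```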
5
Q

Sort dataframe ‘df’ by index and save

A

df.sort_index(inplace=True)

6
Q

Get today's date and convert the datetime object into a string in '2017-07-17' format

A
from datetime import datetime
Today = datetime.now()
Today = Today.strftime("%Y-%m-%d")
7
Q

Summary for a dataframe df

A

df.info()  # column names, dtypes, non-null counts, and memory usage

df.describe()  # summary statistics for the numeric columns

8
Q

Filter df to keep only the rows with values 3, 6, or 9 in column A

A

df[df['A'].isin([3, 6, 9])]
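A short sketch of isin on sample data (the values and the extra column B are assumptions). It also shows the negated form, which is often needed alongside it.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 6, 9, 12], "B": list("vwxyz")})

wanted = [3, 6, 9]
kept = df[df["A"].isin(wanted)]      # rows whose A is in the list
dropped = df[~df["A"].isin(wanted)]  # ~ inverts the boolean mask

print(kept["A"].tolist())     # [3, 6, 9]
print(dropped["A"].tolist())  # [1, 12]
```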

9
Q

Drop the rows at positions 1 and 3 in df

A

df.drop(df.index[[1,3]])  # returns a new dataframe; df.drop([1, 3]) drops by index label instead
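The difference between dropping by position and by label only shows up when the index is not 0, 1, 2, ... A minimal sketch, assuming a reversed integer index chosen for illustration:

```python
import pandas as pd

# Index labels deliberately differ from row positions.
df = pd.DataFrame({"x": [10, 20, 30, 40]}, index=[3, 2, 1, 0])

by_position = df.drop(df.index[[1, 3]])  # removes the 2nd and 4th rows (labels 2 and 0)
by_label = df.drop([1, 3])               # removes the rows labelled 1 and 3

print(by_position["x"].tolist())  # [10, 30]
print(by_label["x"].tolist())     # [20, 40]
```

Neither call changes df itself; each returns a new dataframe.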

10
Q

Convert series df['date'] to datetime objects

A

df['date'] = df['date'].map(lambda x: pd.to_datetime(x, dayfirst=True))  # element by element (slow)
df.date = pd.to_datetime(df.date, dayfirst=True)  # vectorised equivalent
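What dayfirst=True changes is easiest to see on an ambiguous string. A small sketch with assumed sample dates in day/month/year order:

```python
import pandas as pd

df = pd.DataFrame({"date": ["03/07/2017", "17/07/2017"]})

# With dayfirst=True, "03/07/2017" is read as 3 July, not 7 March.
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

print(df["date"].dt.day.tolist())    # [3, 17]
print(df["date"].dt.month.tolist())  # [7, 7]
```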

11
Q

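Reset index of dataframe df

A

df = df.reset_index(drop=True)
# drop=True discards the old index; without it, reset_index() adds the old index as a column named 'index', which then needs del df['index']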

12
Q

Check if col1 and col2 in df are equal

A

df['col1'].equals(df['col2'])

13
Q

Convert series df.col1 to a list

A

list1=df.col1.tolist()

14
Q

Drop rows with null values in column df.col1

A

df = df[np.isfinite(df.col1)]  # numeric columns only; also drops inf values
df = df[pd.notnull(df.col1)]  # works for any dtype

df = df[pd.isnull(df.col1)]  # the opposite: keep only the rows where col1 is null
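The same two filters can also be written with dropna and isna, which some readers find easier to scan. A sketch on assumed sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, np.nan, 3.0], "col2": ["a", "b", "c"]})

non_null = df.dropna(subset=["col1"])  # drop rows where col1 is null
only_null = df[df["col1"].isna()]      # keep only rows where col1 is null

print(non_null["col2"].tolist())   # ['a', 'c']
print(only_null["col2"].tolist())  # ['b']
```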

15
Q

Filtering a dataframe df by column Gender

A

df[df['Gender'] == 'Male']

16
Q

Filtering a dataframe df by two columns Gender and Year

A
df[(df['Gender'] == 'Male') & (df['Year'] == 2014)]
# Don't forget the round brackets: & binds more tightly than ==
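A short sketch of combining conditions, on assumed sample rows. It uses & for "and" and | for "or"; each comparison sits in its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male"],
    "Year": [2014, 2014, 2015, 2014],
})

both = df[(df["Gender"] == "Male") & (df["Year"] == 2014)]      # both conditions hold
either = df[(df["Gender"] == "Female") | (df["Year"] == 2015)]  # at least one holds

print(both.index.tolist())    # [0, 3]
print(either.index.tolist())  # [1, 2]
```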
17
Q

Delete col1 from dataframe df

A

del df['col1']

df.drop(df.columns[[0, 1, 3]], axis=1)  # delete columns by position; returns a new dataframe

18
Q

Check datatype of col1 or of whole dataframe

A

df.dtypes

df.col1.dtype

19
Q

sort dataframe df according to col1 values

then according to col1 and col2 values

A
#Sorting the dataframe based on values of one column 
df.sort_values(by='col1',ascending=True)
#based on two columns
df.sort_values(['col1', 'col2'], ascending=[True, False])
20
Q

Print variable and string together

String : hello Variable: Name

A

print("hello %s" % Name)  # Python 3; equivalently print(f"hello {Name}")

21
Q

correlation between df.col1 and df.col2

A

np.corrcoef(df.col1,df.col2)[0,1]
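np.corrcoef returns the full 2x2 correlation matrix, which is why the card indexes [0, 1]. A sketch on assumed sample values, compared against pandas' own Series.corr (Pearson by default):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [2, 4, 6, 9]})

r_np = np.corrcoef(df.col1, df.col2)[0, 1]  # off-diagonal entry of the 2x2 matrix
r_pd = df["col1"].corr(df["col2"])          # same Pearson coefficient via pandas
```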

22
Q

Rename col1 of df to col2

A

df = df.rename(columns={'col1': 'col2'})

23
Q

groupby by categorical col3 and aggregate mean

A

gb = df.groupby(df.col3)
gb.agg('mean')

gb.agg({'col1': 'sum', 'col2': 'mean'})
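A worked example of both aggregation forms, assuming a small frame with two numeric columns and two groups (the values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": [10.0, 20.0, 30.0, 50.0],
    "col3": ["a", "a", "b", "b"],
})

gb = df.groupby("col3")
means = gb.agg("mean")                          # mean of every remaining column per group
mixed = gb.agg({"col1": "sum", "col2": "mean"})  # a different aggregation per column

print(means.loc["a", "col1"])  # 1.5
print(mixed.loc["b", "col1"])  # 7
```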

24
Q

mapping df.country to create df.capital

A

map1 = {'India': 'Delhi', 'Canada': 'Ottawa'}

df['capital'] = df['country'].map(map1)

25
drop duplicate rows from a df
df.drop_duplicates()  # drops fully duplicated rows, keeping the first occurrence
26
drop duplicates from a df for a column col1
df.drop_duplicates(['col1'])  # drops rows duplicated in col1, keeping the first occurrence
27
Drop duplicates from a df for a column col1, keeping the last one
df.drop_duplicates(['col1'], keep='last')  # keeps the last occurrence of each duplicate
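The three drop_duplicates cards differ only in which row survives. A sketch on assumed sample data, where col2 makes the surviving row visible:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["x", "x", "y"], "col2": [1, 2, 3]})

first = df.drop_duplicates(["col1"])              # default keep='first'
last = df.drop_duplicates(["col1"], keep="last")  # keep the last occurrence

print(first["col2"].tolist())  # [1, 3]
print(last["col2"].tolist())   # [2, 3]
```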
28
pivot dataframe
df_piv = df.pivot(index='date',columns='variable',values='value')
29
All types of merges?
a1 = pd.merge(dframe1, dframe2)  # default: inner join on the columns both frames share
a = pd.merge(dframe1, dframe2, on='key')  # inner join
b = pd.merge(dframe1, dframe2, on='key', how='left')  # left join
c = pd.merge(dframe1, dframe2, on='key', how='outer')  # outer join
d = pd.merge(df_left, df_right, on=['key1', 'key2'], how='outer')  # on multiple keys
e = pd.merge(left, right, left_on='key1', right_on='key2')  # key columns named differently in each frame
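How the join types differ is clearest from the row counts. A minimal sketch: dframe2 matches the frame defined in card 32, while dframe1 and its data_set_1 column are assumptions for illustration.

```python
import pandas as pd

dframe1 = pd.DataFrame({"key": ["X", "Y", "Z"], "data_set_1": [1, 2, 3]})  # hypothetical
dframe2 = pd.DataFrame({"key": ["Q", "Y", "Z"], "data_set_2": [1, 2, 3]})

inner = pd.merge(dframe1, dframe2, on="key")               # keys in both: Y, Z
left = pd.merge(dframe1, dframe2, on="key", how="left")    # all keys of dframe1: X, Y, Z
outer = pd.merge(dframe1, dframe2, on="key", how="outer")  # union of keys: Q, X, Y, Z

print(len(inner), len(left), len(outer))  # 2 3 4
```

Keys with no match on the other side get NaN in the columns that come from that side.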
30
read csv and write csv syntax?
```
a = pd.read_csv('lec25.csv')
b = pd.read_table('lec25.csv', sep=',')
c = pd.read_csv('lec25.csv', header=None)
d = pd.read_csv('lec25.csv', header=None, nrows=2)
dframe1.to_csv('mytextdata_out.csv')
```
31
concatenate df1 and df2
pd.concat([df1,df2])
32
create a dataframe df
from numpy.random import randn
from pandas import DataFrame
df1 = DataFrame(randn(25).reshape((5, 5)), columns=list('abcde'), index=list('12345'))
dframe2 = DataFrame({'key': ['Q', 'Y', 'Z'], 'data_set_2': [1, 2, 3]})
33
Pivot df syntax?
Long to wide is pivot. Pivot takes 3 arguments with the following names: index, columns (categorical), and values (numeric).
Entries inside the columns column will be used to create new columns.
index will have distinct values.
values will go inside the table.
p = d.pivot(index='Item', columns='CType')
If you omit values, all remaining columns in the dataframe will be used and a MultiIndex will be created on the columns.
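A small sketch of both forms of pivot. The Item and CType names come from the card; the USD and EU columns and all values are assumptions for illustration.

```python
import pandas as pd

d = pd.DataFrame({
    "Item": ["Item0", "Item0", "Item1", "Item1"],
    "CType": ["Gold", "Bronze", "Gold", "Silver"],
    "USD": [1, 2, 3, 4],          # hypothetical value column
    "EU": [1.0, 2.0, 3.0, 4.0],   # hypothetical value column
})

# One value column: plain columns named after the CType entries.
p = d.pivot(index="Item", columns="CType", values="USD")

# values omitted: every remaining column is pivoted and the
# result has two column levels, (value column, CType).
p_all = d.pivot(index="Item", columns="CType")
```

Combinations that never occur in the data, such as Item0 with Silver, come out as NaN.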
34
Unpivot
Wide to long is unpivot/melt.
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'}, 'B': {0: 1, 1: 3, 2: 5}, 'C': {0: 2, 1: 4, 2: 6}})
pd.melt(df, id_vars=['A'], value_vars=['B', 'C'], var_name="Person", value_name="Score")
# Set the columns that should stay unchanged as id_vars. If you do not specify value_vars, every column not listed in id_vars is unpivoted: its name goes into a new column (named by var_name) and its values into another (named by value_name).