Pandas Flashcards
(34 cards)
Filling NAs in one column with another
fullDf['forecast_date'] has NAs; fill them with fullDf.day
fullDf['forecast_date'] = fullDf['forecast_date'].fillna(fullDf.day)
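A minimal sketch of this card on made-up data (the frame contents and the name full_df are assumptions for illustration). fillna with a Series aligns on the index, so each missing value is taken from the same row of the other column:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two of the three forecast dates are missing.
full_df = pd.DataFrame({
    "forecast_date": ["2017-07-17", np.nan, np.nan],
    "day": ["2017-07-10", "2017-07-11", "2017-07-12"],
})

# Assign the result back instead of relying on inplace=True on a column selection.
full_df["forecast_date"] = full_df["forecast_date"].fillna(full_df["day"])

print(full_df["forecast_date"].tolist())
```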
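calls_final.calls_tw has NaN values. Replace them with 5
calls_final.calls_tw = calls_final.calls_tw.map(lambda x: x if np.isfinite(x) else 5)  # also replaces inf
calls_final.calls_tw = calls_final.calls_tw.fillna(5)  # simpler; replaces NaN only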
Add a series of integers (days) df_test1['add'] to the dates in df_test1['forweek'] to get df_test1['forecast_date']
df_test1['add'] = pd.to_timedelta(df_test1['add'], unit='D')  # convert integers to days this way
df_test1['forecast_date'] = df_test1['forweek'] + df_test1['add']
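A small worked example of the same idea, assuming 'forweek' already holds datetimes and 'add' holds whole numbers of days (the sample values are made up):

```python
import pandas as pd

# Hypothetical data: a base date per row and a number of days to add.
df_test1 = pd.DataFrame({
    "forweek": pd.to_datetime(["2017-07-17", "2017-07-24"]),
    "add": [1, 3],
})

# Integers cannot be added to datetimes directly; convert them to timedeltas first.
offsets = pd.to_timedelta(df_test1["add"], unit="D")
df_test1["forecast_date"] = df_test1["forweek"] + offsets

print(df_test1)
```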
Calculate the day of the week from a date
df['date'] is a series of dates
df['date'].dt.day_name()  # weekday names (dt.weekday_name was removed in pandas 1.0); df['date'].dt.dayofweek gives numbers, with Monday = 0
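A quick sketch showing both forms side by side, on two made-up dates (17 July 2017 was a Monday, 22 July 2017 a Saturday):

```python
import pandas as pd

# Hypothetical data: the column must be datetime64 for the .dt accessor to work.
df = pd.DataFrame({"date": pd.to_datetime(["2017-07-17", "2017-07-22"])})

names = df["date"].dt.day_name()    # weekday names as strings
numbers = df["date"].dt.dayofweek   # integers, Monday = 0 ... Sunday = 6

print(names.tolist(), numbers.tolist())
```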
Sort dataframe ‘df’ by index and save
df.sort_index(inplace=True)
Get today's date and convert the datetime object into a string in '2017-07-17' format
from datetime import datetime
Today = datetime.now()
Today = Today.strftime("%Y-%m-%d")
Summary for a dataframe df
df.info()  # summary of the whole dataframe: column dtypes and non-null counts
df.describe()  # summary statistics for the numeric columns
Filter df to keep only rows where column A is 3, 6, or 9
df[df['A'].isin([3, 6, 9])]
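A minimal sketch of isin filtering on a made-up frame (column B is an assumption, added only to make the surviving rows easy to see):

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"A": [1, 3, 6, 7, 9], "B": list("abcde")})

# isin builds a boolean mask that is True where A is one of the listed values.
filtered = df[df["A"].isin([3, 6, 9])]

# Prefix the mask with ~ to keep everything except those values.
others = df[~df["A"].isin([3, 6, 9])]

print(filtered)
```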
Drop the rows at positions 1 and 3 in df
df = df.drop(df.index[[1, 3]])  # drop returns a new dataframe, so assign it back
Convert series df['date'] to datetime objects
df['date'] = df['date'].map(lambda x: pd.to_datetime(x, dayfirst=True))
df.date = pd.to_datetime(df.date, dayfirst=True)  # vectorized form of the line above
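A short sketch of why dayfirst matters, assuming the dates are stored as day/month/year strings (the sample values are made up):

```python
import pandas as pd

# Hypothetical data: "03/08/2017" is ambiguous without dayfirst.
df = pd.DataFrame({"date": ["17/07/2017", "03/08/2017"]})

# dayfirst=True reads the second value as 3 August, not 8 March.
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

print(df["date"].tolist())
```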
Reset index of dataframe df
df = df.reset_index()
del df['index']  # or do both in one step: df = df.reset_index(drop=True)
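A minimal sketch of the one-step variant on a made-up frame with a non-sequential index. With drop=True the old index is discarded instead of being added as an 'index' column:

```python
import pandas as pd

# Hypothetical data with a scrambled index.
df = pd.DataFrame({"x": [10, 20, 30]}, index=[5, 2, 9])

# drop=True resets to 0..n-1 without keeping the old index as a column.
df = df.reset_index(drop=True)

print(df)
```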
Check if col1 and col2 in df are equal
df['col1'].equals(df['col2'])
Convert series df.col1 to a list
list1=df.col1.tolist()
Dropping rows by null values in a column df.col1
df = df[np.isfinite(df.col1)]
df=df[pd.notnull(df.col1)]
df=df[pd.isnull(df.col1)]  # the opposite: keeps only the rows where col1 is null
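A small sketch comparing the mask approach with DataFrame.dropna on made-up data. Both give the same rows here; dropna(subset=...) is offered as an alternative spelling, not as the card's original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the middle row has a null in col1.
df = pd.DataFrame({"col1": [1.0, np.nan, 3.0], "col2": ["a", "b", "c"]})

kept_mask = df[df["col1"].notna()]        # boolean-mask version
kept_dropna = df.dropna(subset=["col1"])  # same result with dropna

only_nulls = df[df["col1"].isna()]        # the rows that were dropped

print(kept_mask)
```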
Filtering a dataframe df by column Gender
df[df['Gender'] == 'Male']
Filtering a dataframe df by two columns Gender and Year
df[(df['Gender'] == 'Male') & (df['Year'] == 2014)]  # don't forget the round brackets
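A minimal sketch of the two-condition filter on made-up data. The parentheses matter because & binds more tightly than ==, so each comparison has to be wrapped before the masks are combined:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male"],
    "Year": [2014, 2014, 2015, 2014],
    "Name": ["A", "B", "C", "D"],
})

# Use & for "and", | for "or"; Python's own and/or do not work on Series.
result = df[(df["Gender"] == "Male") & (df["Year"] == 2014)]

print(result)
```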
Delete col1 from dataframe df
del df['col1']
df.drop(df.columns[[0, 1, 3]], axis=1) # delete columns by number
Check datatype of col1 or of whole dataframe
df.dtypes
df.col1.dtype
sort dataframe df according to col1 values
then according to col1 and col2 values
df.sort_values(by='col1', ascending=True)  # sort by the values of one column
df.sort_values(['col1', 'col2'], ascending=[True, False])  # sort by two columns
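A short sketch of the two-column sort on made-up data: col1 ascending, with ties broken by col2 descending. sort_values returns a new dataframe, so the result is assigned to a new name here:

```python
import pandas as pd

# Hypothetical data with ties in col1.
df = pd.DataFrame({"col1": [2, 1, 2, 1], "col2": [5, 6, 7, 8]})

# One ascending flag per sort column, in the same order as the column list.
sorted_df = df.sort_values(["col1", "col2"], ascending=[True, False])

print(sorted_df)
```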
Print variable and string together
String: hello, variable: Name
print("hello %s" % Name)
correlation between df.col1 and df.col2
np.corrcoef(df.col1,df.col2)[0,1]
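A small sketch on made-up numbers showing that the pandas method Series.corr gives the same Pearson coefficient as the np.corrcoef answer. One difference worth knowing: Series.corr skips rows with missing values, while np.corrcoef returns nan if either input contains NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0, 4.0], "col2": [2.0, 4.1, 5.9, 8.5]})

# corrcoef returns a 2x2 matrix; [0, 1] is the col1-col2 entry.
r_numpy = np.corrcoef(df.col1, df.col2)[0, 1]

# Series.corr defaults to the Pearson coefficient.
r_pandas = df.col1.corr(df.col2)

print(r_numpy, r_pandas)
```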
Rename col1 of df to col2
df = df.rename(columns={'col1': 'col2'})
groupby by categorical col3 and aggregate mean
gb =df.groupby(df.col3)
gb.agg('mean')
gb.agg({'col1': 'sum', 'col2': 'mean'})
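A minimal sketch of the per-column aggregation on made-up data, assuming col3 holds two categories. The dict maps each column to the function applied within each group:

```python
import pandas as pd

# Hypothetical data: two groups, "a" and "b".
df = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": [10.0, 20.0, 30.0, 40.0],
    "col3": ["a", "a", "b", "b"],
})

gb = df.groupby("col3")

# Sum col1 and average col2 within each group; the group keys become the index.
result = gb.agg({"col1": "sum", "col2": "mean"})

print(result)
```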
mapping df.country to create df.capital
map1 = {'India': 'Delhi', 'Canada': 'Ottawa'}
df['capital'] = df['country'].map(map1)
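A short sketch of the mapping on made-up data, including a country that is not in the dict ('France' is an assumption added for illustration). Keys missing from the dict map to NaN rather than raising an error:

```python
import pandas as pd

# Hypothetical data: "France" has no entry in map1.
df = pd.DataFrame({"country": ["India", "Canada", "France"]})
map1 = {"India": "Delhi", "Canada": "Ottawa"}

# map looks up each value in the dict; unmatched values become NaN.
df["capital"] = df["country"].map(map1)

print(df)
```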