Pandas Flashcards
(39 cards)
Creating a Series
pd.series(list)
pd.Series(data=my_list,index=labels)
Creating series along with the index
ser1 = pd.Series([1,2,3,4],index = [‘USA’, ‘Germany’,’USSR’, ‘Japan’])
ser1
USA 1
Germany 2
USSR 3
Japan 4
dtype: int64
fetch elements in series
varname[index]
ser1[‘USA’]
Create a DataFrame
pd.DataFrame(data, index= , columns = )
pd.DataFrame(randn(5,4),index=’A B C D E’.split(),columns=’W X Y Z’.split())
W X Y Z A 2.706850 0.628133 0.907969 0.503826 B 0.651118 -0.319318 -0.848077 0.605965 C -2.018168 0.740122 0.528813 -0.589001 D 0.188695 -0.758872 -0.933237 0.955057 E 0.190794 1.978757 2.605967 0.683509
fetching column from a dataframe
dataframe[col_name] –> df[‘W’]
dataframe[[col1,col2 ]] –> df[[‘W’,’Z’]]
Creating a new column in a dataframe
df[‘new’] = df[‘W’] + df[‘Y’]
how to remove a column
df.drop(‘col_name’,axis=1,inplace=True)
how to remove a rows
df.drop(‘index’, axis=0)
Selecting Rows from a DataFrame
df.loc[“index”]
df.loc[“label”]
Selecting multiple Rows and column in a DataFrame
df.loc[row, col]
df.loc[ [row1,row2,… ] , [col1,col2,… ] ]
df.loc[[‘A’,’B’],[‘W’,’Y’]]
Conditional Selection
An important feature of pandas is conditional selection using bracket notation, very similar
to numpy:
dataFrame[condition]
df[df>0]
df[df[‘col’] >0]
Selecting single and multiple columns
dataframe[ dataframe [“col_name”] < 5] [ ‘colname’]
df[df[‘W’]>0][‘Y’]
dataframe[ dataframe [“col_name”] < 5] [ [‘col1’,col2’] ]
df[df[‘W’]>0][[‘Y’,’X’]]
Selecting single and multiple columns with multiple conditions
For two conditions you can use | and & with parenthesis:
dataframe[ (condition1) & (condition2)]
df[(df[‘W’]>0) & (df[‘Y’] > 1)]
how to reset index
Reset to default 0,1…n index
dataframe.reset_index()
df.reset_index()
how to set specific column as index in a dataframe
Here we can give column name or we can provide pandas series
Dataframe.set_index(“col_name”)
df.set_index(‘States’)
df.set_index(‘States’,inplace=True) to change in permanently
how to create Multi-Index in a dataframe
zip the 2 list convert to list of tuples
first create 2 list outside and inside index
outside = [‘G1’,’G1’,’G1’,’G2’,’G2’,’G2’]
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
#set multiindex from tuple
hier_index = pd.MultiIndex.from_tuples(hier_index)
print(pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=[‘A’,’B’]))
A B
G1 1 -0.419202 -0.516732
2 0.036076 -0.205218
3 0.148782 0.942314
G2 1 0.001280 -1.956888
2 1.069982 -2.220442
3 1.180214 0.924161
How to fetch elements from the Multi-Index dataframe
For index hierarchy we use df.loc[], if this was on the
columns axis, you would just use normal bracket notation df[]. Calling one level of the index
returns the sub-dataframe:
df.loc[“G1”]
df.loc[‘G1’].loc[1]
how to drop missing values
df.dropna() This deletes from row wise
df.dropna(axis=1) this deletes from column wise
df.dropna(thresh=2) removes rows having 2 missing values
how to replace null values
df.fillna(value=’FILL VALUE’)
df[‘A’].fillna(value=df[‘A’].mean())
hello
Groupby
ok
this
data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'], 'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'], 'Sales':[200,120,340,124,243,350]} df.groupby('Company')
Now you can use the .groupby() method to group rows together based off of a column
name. For instance let’s group based off of Company. This will create a DataFrameGroupBy
object
Company Person Sales GOOG Sam 200 GOOG Charlie 120 MSFT Amy 340 MSFT Vanessa 124 FB Carl 243 FB Sarah 350
df.groupby("Company").mean()
And then call aggregate methods off the object. we have to use any one aggregate methods
what are the aggregate methods used in group by
std(), min(), max(), count(), describe()
how to you transpose the dataframe
dataFrame.transpose()
what are the ways to combine dataframes together
Merging, Joining, and Concatenating
how to concatenate dataframes
Concatenation basically glues together DataFrames. Keep in mind that dimensions should
match along the axis you are concatenating on. You can use pd.concat and pass in a list of
DataFrames to concatenate together
pd.concat([dataframe1], [dataframe2], [dataframe3], ..)
pd.concat([dataframe1], [dataframe2], [dataframe3], .., axis=1)