Pandas Flashcards by Mahesh màŕťìáñ

Creating a Series

pd.series(list)

pd.Series(data=my_list,index=labels)

How well did you know this?

Not at all

Perfectly

Creating series along with the index

ser1 = pd.Series([1,2,3,4],index = [‘USA’, ‘Germany’,’USSR’, ‘Japan’])

ser1

USA 1
Germany 2
USSR 3
Japan 4

dtype: int64

How well did you know this?

Not at all

Perfectly

fetch elements in series

varname[index]

ser1[‘USA’]

How well did you know this?

Not at all

Perfectly

Create a DataFrame

pd.DataFrame(data, index= , columns = )

pd.DataFrame(randn(5,4),index=’A B C D E’.split(),columns=’W X Y Z’.split())

          W                X                     Y                    Z A     2.706850     0.628133     0.907969     0.503826 B     0.651118     -0.319318    -0.848077    0.605965 C     -2.018168     0.740122     0.528813    -0.589001 D     0.188695     -0.758872     -0.933237   0.955057 E     0.190794     1.978757       2.605967    0.683509

How well did you know this?

Not at all

Perfectly

fetching column from a dataframe

dataframe[col_name] –> df[‘W’]

dataframe[[col1,col2 ]] –> df[[‘W’,’Z’]]

How well did you know this?

Not at all

Perfectly

Creating a new column in a dataframe

df[‘new’] = df[‘W’] + df[‘Y’]

How well did you know this?

Not at all

Perfectly

how to remove a column

df.drop(‘col_name’,axis=1,inplace=True)

How well did you know this?

Not at all

Perfectly

how to remove a rows

df.drop(‘index’, axis=0)

How well did you know this?

Not at all

Perfectly

Selecting Rows from a DataFrame

df.loc[“index”]

df.loc[“label”]

How well did you know this?

Not at all

Perfectly

Selecting multiple Rows and column in a DataFrame

df.loc[row, col]

df.loc[ [row1,row2,… ] , [col1,col2,… ] ]

df.loc[[‘A’,’B’],[‘W’,’Y’]]

How well did you know this?

Not at all

Perfectly

Conditional Selection

An important feature of pandas is conditional selection using bracket notation, very similar
to numpy:

dataFrame[condition]

df[df>0]

df[df[‘col’] >0]

How well did you know this?

Not at all

Perfectly

Selecting single and multiple columns

dataframe[ dataframe [“col_name”] < 5] [ ‘colname’]

df[df[‘W’]>0][‘Y’]

dataframe[ dataframe [“col_name”] < 5] [ [‘col1’,col2’] ]

df[df[‘W’]>0][[‘Y’,’X’]]

How well did you know this?

Not at all

Perfectly

Selecting single and multiple columns with multiple conditions

For two conditions you can use | and & with parenthesis:

dataframe[ (condition1) & (condition2)]

df[(df[‘W’]>0) & (df[‘Y’] > 1)]

How well did you know this?

Not at all

Perfectly

how to reset index

Reset to default 0,1…n index

dataframe.reset_index()

df.reset_index()

How well did you know this?

Not at all

Perfectly

how to set specific column as index in a dataframe

Here we can give column name or we can provide pandas series
Dataframe.set_index(“col_name”)

df.set_index(‘States’)

df.set_index(‘States’,inplace=True) to change in permanently

How well did you know this?

Not at all

Perfectly

how to create Multi-Index in a dataframe

Study These Flashcards

zip the 2 list convert to list of tuples

first create 2 list outside and inside index
outside = [‘G1’,’G1’,’G1’,’G2’,’G2’,’G2’]
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
#set multiindex from tuple
hier_index = pd.MultiIndex.from_tuples(hier_index)

print(pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=[‘A’,’B’]))

A B
G1 1 -0.419202 -0.516732
2 0.036076 -0.205218
3 0.148782 0.942314
G2 1 0.001280 -1.956888
2 1.069982 -2.220442
3 1.180214 0.924161

How to fetch elements from the Multi-Index dataframe

Study These Flashcards

For index hierarchy we use df.loc[], if this was on the
columns axis, you would just use normal bracket notation df[]. Calling one level of the index
returns the sub-dataframe:

df.loc[“G1”]

df.loc[‘G1’].loc[1]

how to drop missing values

Study These Flashcards

df.dropna() This deletes from row wise

df.dropna(axis=1) this deletes from column wise

df.dropna(thresh=2) removes rows having 2 missing values

how to replace null values

Study These Flashcards

df.fillna(value=’FILL VALUE’)

df[‘A’].fillna(value=df[‘A’].mean())

hello

Groupby

this

Study These Flashcards

data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
'Sales':[200,120,340,124,243,350]}

df.groupby('Company')

Now you can use the .groupby() method to group rows together based off of a column
name. For instance let’s group based off of Company. This will create a DataFrameGroupBy
object

Company Person Sales
GOOG   Sam 200 
GOOG   Charlie 120
MSFT   Amy 340
MSFT   Vanessa 124
FB   Carl 243
FB   Sarah 350

df.groupby("Company").mean()

And then call aggregate methods off the object. we have to use any one aggregate methods

what are the aggregate methods used in group by

Study These Flashcards

std(), min(), max(), count(), describe()

how to you transpose the dataframe

Study These Flashcards

dataFrame.transpose()

what are the ways to combine dataframes together

Study These Flashcards

Merging, Joining, and Concatenating

how to concatenate dataframes

Study These Flashcards

Concatenation basically glues together DataFrames. Keep in mind that dimensions should
match along the axis you are concatenating on. You can use pd.concat and pass in a list of
DataFrames to concatenate together

pd.concat([dataframe1], [dataframe2], [dataframe3], ..)

pd.concat([dataframe1], [dataframe2], [dataframe3], .., axis=1)

Merging multiple dataframes

The merge function allows you to merge DataFrames together using a similar logic as merging SQL Tables together. For example: pd.merge(left,right, how = "inner", on = "col_name")

``` left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'], 'key2': ['K0', 'K1', 'K0', 'K1'], 'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']}) right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'], 'key2': ['K0', 'K0', 'K0', 'K0'], 'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']}) pd.merge(left, right, on=['key1', 'key2']) ``` Here above it merges side by side

how to fetch the unique values and its count

dataframe["col_name"].unique() --> fetching unique values dataframe["col_name"].nunique() --> fetching count of unique values df['col2'].value_counts() --> fetching values and count

how to Select from DataFrame using criteria from multiple columns

dataframe[(dataframe['col1']>2) & (dataframe['col2']==444)] df[(df['col1']>2) & (df['col2']==444)]

how to use apply functions in pandas

dataframe["col_name"].apply(function_name) every data in the column will apply this function df['col3'].apply(len) df['col3'].apply(lambda x: len(x))

how to Permanently Remove` a Column

del df['col_name']

how to Get column and index names

dataFrame.columns RangeIndex(start=0, stop=2, step=1) dataframe.index Index(['col2', 'col3'], dtype='object')

how to Sorting and Ordering a DataFrame

dataframe.sort_values(by = "colname") #inplace=False by default

how to Find Null Values or Check for Null Values

dataframe.isnull()

how to drop null rows

df.dropna()

how to Fill in Null values with something else

df.fillna("value")

how to create pivot table

data = {'A':['foo','foo','foo','bar','bar','bar'], 'B':['one','one','two','two','one','one'], 'C':['x','y','x','y','x','y'], 'D':[1,3,2,5,4,1]} df = pd.DataFrame(data) A B C D 0 foo one x 1 1 foo one y 3 2 foo two x 2 3 bar two y 5 4 bar one x 4 5 bar one y 1 df.pivot_table(values='D',index=['A', 'B'],columns=['C']) C x y A B bar one 4.0 1.0 two NaN 5.0 foo one 1.0 3.0 two 2.0 NaN

importing csv_file and exporting csv file in pandas

pd.read_csv(file_name) dataframe.to_csv(file_name, index=False)

importing excel_file and exporting excel file in pandas

Pandas can read and write excel files, keep in mind, this only imports data. Not formulas or images, having images or macros may cause this read_excel method to crash. pd.read_excel('Excel_Sample.xlsx',sheetname='Sheet1') dataframe.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1')

how to read html tables from the website

Pandas read_html function will read tables off of a webpage and return a list of DataFrame objects: df = pd.read_html('website link')

Pandas Flashcards

(39 cards)