Pandas Flashcards

(39 cards)

1
Q

Creating a Series

A

pd.series(list)

pd.Series(data=my_list,index=labels)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Creating series along with the index

A

ser1 = pd.Series([1,2,3,4],index = [‘USA’, ‘Germany’,’USSR’, ‘Japan’])

ser1

USA 1
Germany 2
USSR 3
Japan 4

dtype: int64

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

fetch elements in series

A

varname[index]

ser1[‘USA’]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Create a DataFrame

A

pd.DataFrame(data, index= , columns = )

pd.DataFrame(randn(5,4),index=’A B C D E’.split(),columns=’W X Y Z’.split())

          W                X                     Y                    Z A     2.706850     0.628133     0.907969     0.503826 B     0.651118     -0.319318    -0.848077    0.605965 C     -2.018168     0.740122     0.528813    -0.589001 D     0.188695     -0.758872     -0.933237   0.955057 E     0.190794     1.978757       2.605967    0.683509
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

fetching column from a dataframe

A

dataframe[col_name] –> df[‘W’]

dataframe[[col1,col2 ]] –> df[[‘W’,’Z’]]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Creating a new column in a dataframe

A

df[‘new’] = df[‘W’] + df[‘Y’]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

how to remove a column

A

df.drop(‘col_name’,axis=1,inplace=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

how to remove a rows

A

df.drop(‘index’, axis=0)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Selecting Rows from a DataFrame

A

df.loc[“index”]

df.loc[“label”]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Selecting multiple Rows and column in a DataFrame

A

df.loc[row, col]

df.loc[ [row1,row2,… ] , [col1,col2,… ] ]

df.loc[[‘A’,’B’],[‘W’,’Y’]]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Conditional Selection

A

An important feature of pandas is conditional selection using bracket notation, very similar
to numpy:

dataFrame[condition]

df[df>0]

df[df[‘col’] >0]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Selecting single and multiple columns

A

dataframe[ dataframe [“col_name”] < 5] [ ‘colname’]

df[df[‘W’]>0][‘Y’]

dataframe[ dataframe [“col_name”] < 5] [ [‘col1’,col2’] ]

df[df[‘W’]>0][[‘Y’,’X’]]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Selecting single and multiple columns with multiple conditions

A

For two conditions you can use | and & with parenthesis:

dataframe[ (condition1) & (condition2)]

df[(df[‘W’]>0) & (df[‘Y’] > 1)]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

how to reset index

A

Reset to default 0,1…n index

dataframe.reset_index()

df.reset_index()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

how to set specific column as index in a dataframe

A

Here we can give column name or we can provide pandas series
Dataframe.set_index(“col_name”)

df.set_index(‘States’)

df.set_index(‘States’,inplace=True) to change in permanently

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

how to create Multi-Index in a dataframe

A

zip the 2 list convert to list of tuples

first create 2 list outside and inside index
outside = [‘G1’,’G1’,’G1’,’G2’,’G2’,’G2’]
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
#set multiindex from tuple
hier_index = pd.MultiIndex.from_tuples(hier_index)

print(pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=[‘A’,’B’]))

A B
G1 1 -0.419202 -0.516732
2 0.036076 -0.205218
3 0.148782 0.942314
G2 1 0.001280 -1.956888
2 1.069982 -2.220442
3 1.180214 0.924161

17
Q

How to fetch elements from the Multi-Index dataframe

A

For index hierarchy we use df.loc[], if this was on the
columns axis, you would just use normal bracket notation df[]. Calling one level of the index
returns the sub-dataframe:

df.loc[“G1”]

df.loc[‘G1’].loc[1]

18
Q

how to drop missing values

A

df.dropna() This deletes from row wise

df.dropna(axis=1) this deletes from column wise

df.dropna(thresh=2) removes rows having 2 missing values

19
Q

how to replace null values

A

df.fillna(value=’FILL VALUE’)

df[‘A’].fillna(value=df[‘A’].mean())

20
Q

hello

Groupby

ok

this

A
data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
'Sales':[200,120,340,124,243,350]}

df.groupby('Company')

Now you can use the .groupby() method to group rows together based off of a column
name. For instance let’s group based off of Company. This will create a DataFrameGroupBy
object

Company Person Sales
GOOG   Sam 200 
GOOG   Charlie 120
MSFT   Amy 340
MSFT   Vanessa 124
FB   Carl 243
FB   Sarah 350

df.groupby("Company").mean()

And then call aggregate methods off the object. we have to use any one aggregate methods

21
Q

what are the aggregate methods used in group by

A

std(), min(), max(), count(), describe()

22
Q

how to you transpose the dataframe

A

dataFrame.transpose()

23
Q

what are the ways to combine dataframes together

A

Merging, Joining, and Concatenating

24
Q

how to concatenate dataframes

A

Concatenation basically glues together DataFrames. Keep in mind that dimensions should
match along the axis you are concatenating on. You can use pd.concat and pass in a list of
DataFrames to concatenate together

pd.concat([dataframe1], [dataframe2], [dataframe3], ..)

pd.concat([dataframe1], [dataframe2], [dataframe3], .., axis=1)

25
Merging multiple dataframes
The merge function allows you to merge DataFrames together using a similar logic as merging SQL Tables together. For example: pd.merge(left,right, how = "inner", on = "col_name")
26
``` left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'], 'key2': ['K0', 'K1', 'K0', 'K1'], 'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']}) right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'], 'key2': ['K0', 'K0', 'K0', 'K0'], 'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']}) pd.merge(left, right, on=['key1', 'key2']) ``` Here above it merges side by side
27
how to fetch the unique values and its count
dataframe["col_name"].unique() --> fetching unique values dataframe["col_name"].nunique() --> fetching count of unique values df['col2'].value_counts() --> fetching values and count
28
how to Select from DataFrame using criteria from multiple columns
dataframe[(dataframe['col1']>2) & (dataframe['col2']==444)] df[(df['col1']>2) & (df['col2']==444)]
29
how to use apply functions in pandas
dataframe["col_name"].apply(function_name) every data in the column will apply this function df['col3'].apply(len) df['col3'].apply(lambda x: len(x))
30
how to Permanently Remove` a Column
del df['col_name']
31
how to Get column and index names
dataFrame.columns RangeIndex(start=0, stop=2, step=1) dataframe.index Index(['col2', 'col3'], dtype='object')
32
how to Sorting and Ordering a DataFrame
dataframe.sort_values(by = "colname") #inplace=False by default
33
how to Find Null Values or Check for Null Values
dataframe.isnull()
34
how to drop null rows
df.dropna()
35
how to Fill in Null values with something else
df.fillna("value")
36
how to create pivot table
data = {'A':['foo','foo','foo','bar','bar','bar'], 'B':['one','one','two','two','one','one'], 'C':['x','y','x','y','x','y'], 'D':[1,3,2,5,4,1]} df = pd.DataFrame(data) A B C D 0 foo one x 1 1 foo one y 3 2 foo two x 2 3 bar two y 5 4 bar one x 4 5 bar one y 1 df.pivot_table(values='D',index=['A', 'B'],columns=['C']) C x y A B bar one 4.0 1.0 two NaN 5.0 foo one 1.0 3.0 two 2.0 NaN
37
importing csv_file and exporting csv file in pandas
pd.read_csv(file_name) dataframe.to_csv(file_name, index=False)
38
importing excel_file and exporting excel file in pandas
Pandas can read and write excel files, keep in mind, this only imports data. Not formulas or images, having images or macros may cause this read_excel method to crash. pd.read_excel('Excel_Sample.xlsx',sheetname='Sheet1') dataframe.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1')
39
how to read html tables from the website
Pandas read_html function will read tables off of a webpage and return a list of DataFrame objects: df = pd.read_html('website link')