Python Pandas - Udemy Flashcards

Question 1

Q

What are Variables?

Answer

A

Named placeholders that refer to values stored in memory.

Question 2

Q

Are the terms list and array the same in python?

Answer

A

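Not exactly, although the terms are often used interchangeably. A list is Python's built-in sequence type and can hold values of mixed types; an array (for example a NumPy array) normally holds values of a single type.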

Question 3

Q

What does len(df) return?

Answer

A

The number of elements in a list or array. For a DataFrame df, it is the number of rows.

Question 4

Q

What is a Dictionary?

Answer

A

A data type that stores keys and corresponding values.

A dictionary is represented by { }

Question 5

Q

What is a Series?

Answer

A

A series is a one dimensional labeled array

Question 6

Q

How do you convert a list to a series object?

Answer

A

pd.Series(list)

Question 7

Q

What is the difference between a list and series?

Answer

A

A list can only be indexed by integer position, while the index of a series can hold any labels you like (strings, dates, and so on).
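
A minimal sketch of the difference, assuming made-up fruit labels and prices that are not part of the course material:

```python
import pandas as pd

# A plain list is addressed by integer position only.
prices = [10.5, 20.0, 7.25]
first_price = prices[0]

# A Series can carry any labels as its index (the fruit names are invented here).
labelled = pd.Series(prices, index=["apple", "pear", "plum"])
pear_price = labelled["pear"]
```

The same values are stored in both objects; only the way they are addressed differs.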

Question 8

Q

what does series.values give us?

Answer

A

All the values in the series as an array

Question 9

Q

What does series.index give us?

Answer

A

The index of the series

Question 10

Q

What do the

series.sum()
series.product()
series.mean()

return?

Answer

A

The sum, the product, and the mean of the values in the series, respectively.

Question 11

Q

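What does pd.read_csv(usecols=["abc"], squeeze=True) do?

Answer

A

It reads only the column 'abc' from the CSV file and, because a single column is selected, returns it as a series instead of a dataframe. The squeeze parameter was deprecated in pandas 1.4 and removed in pandas 2.0; in current versions, call .squeeze("columns") on the result of pd.read_csv() instead.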
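
A hedged sketch of the same idea for current pandas versions. The CSV text below is a hypothetical stand-in for a real file, and DataFrame.squeeze("columns") is used in place of the squeeze=True argument:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "abc,other\n1,x\n2,y\n3,z\n"

# Read only the 'abc' column; the result is still a one-column DataFrame.
frame = pd.read_csv(io.StringIO(csv_text), usecols=["abc"])

# Collapse the single column into a Series.
series = frame.squeeze("columns")
```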

Question 12

Q

What does x = df.head() or x = df.tail() do?

Answer

A

head() and tail() return a new object holding the first or last 5 rows by default (a dataframe when called on a dataframe, a series when called on a series), so the variable x holds that new object.

Question 13

Q

what does dir(s) do? ( where ‘s’ is a series)

Answer

A

gives you a list of attributes and methods available with that series.

Question 14

Q

what does sorted(series) do?

Answer

A

Python's built-in sorted() returns a new list with all the values of the series in ascending order. (There is no built-in function named sort.)

Question 15

Q

what does

list(series)

dict(series)

do?

Answer

A

list(series) turns the series into a list

dict(series) turns the series into a dictionary

Question 16

Q

what does

series.is_unique

do?

Answer

A

returns True or False to show if all values in the series are unique

Question 17

Q

What does

series.sort_values do?

Answer

A

sorts the series in ascending order and returns a brand new series. You can also chain other methods onto the newly returned series

eg. series.sort_values().head() will return the first 5 values of the newly created series, i.e. the 5 smallest.

Question 18

Q

What does the inplace=True parameter do?

Answer

A

Applies the change to the existing object instead of returning a new one; the method then returns None.

Question 19

Q

What does the statement:

'abc' in series do?

Answer

A

returns a boolean value by checking for 'abc' in the index of the series. If you want to check for 'abc' in the values of the series you must use:

'abc' in series.values
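
A small sketch of this behaviour, using an invented series with string labels and integer values:

```python
import pandas as pd

s = pd.Series([10, 20], index=["a", "b"])

label_present = "a" in s           # membership test looks at the index labels
value_in_index = 10 in s           # 10 is a value, not a label
value_in_values = 10 in s.values   # test the underlying values explicitly
```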

Question 20

Q

what does

series[-30:-10]

return?

Answer

A

returns the values from the 30th-from-last position up to, but not including, the 10th-from-last position (20 values if the series is long enough).

Question 21

Q

What is the difference between

len(series) and series.count() ?

Answer

A

len(series) returns the length of the series, including the rows that hold NaN values.

series.count() returns only the number of rows that have values and excludes rows that hold NaN.
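
For illustration, a minimal sketch with an invented series that contains two missing values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

total_rows = len(s)       # counts every row, NaN included
non_null_rows = s.count() # counts only rows that hold a value
```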

Question 22

Q

Good to remember:

What are some of the mathematical functions available with series?

Answer

A

series.sum()
series.mean()
series.std()
series.median()
series.describe()

Question 23

Q

What does

series.idxmax()
series.idxmin()

return?

Answer

A

They return the index labels of the maximum and the minimum value in the series, respectively.

Nice way of using this is:

series[series.idxmax()]

will return the same value as

series.max()

Question 24

Q

What does the

series.value_counts()

do?

Answer

A

returns the number of times each unique value occurs.

series.value_counts().sum()

will return the number of non-null values, which is the same as len(series) when the series has no NaN values (value_counts() excludes NaN by default).

Good to remember: value_counts() has the ascending=True/False parameter
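
A short sketch of how the counts relate to the length of the series, assuming a made-up series with one missing value:

```python
import pandas as pd

s = pd.Series(["x", "y", "x", None, "x"])

counts = s.value_counts()                    # missing values are left out by default
counted_total = counts.sum()                 # matches s.count(), not len(s)
total_with_missing = s.value_counts(dropna=False).sum()  # now matches len(s)
```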

Question 25

Q

What does series.apply()

do?

Answer

A

series.apply() accepts a function as a parameter and then applies that function to all the values in the series.

eg:

series.apply(lambda stockprice: stockprice + 1)

Question 26

Q

What does the

series.map()

do?

Answer

A

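Performs a VLOOKUP-style lookup: each value in the series is replaced by the value it maps to. The mapping can be a dictionary, another series (matched on that series' index), or a function. Values with no match become NaN.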
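
A minimal sketch of a lookup with map(), using invented pet names and sounds. Both a dictionary and a second series (matched on its index) are shown:

```python
import pandas as pd

pets = pd.Series(["cat", "dog", "cat"])

# Lookup table as a dictionary.
sounds = {"cat": "meow", "dog": "woof"}
mapped_from_dict = pets.map(sounds)

# The same lookup table as a Series; its index plays the role of the key column.
lookup = pd.Series({"cat": "meow", "dog": "woof"})
mapped_from_series = pets.map(lookup)
```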

Question 27

Q

True or False:

The index labels in a pandas Series must be unique

Answer

A

False. Index labels may repeat; selecting a repeated label returns every matching row.

Question 28

Q

What are pandas DataFrames?

Answer

A

DataFrames are 2-dimensional labeled arrays. Two dimensions means you need 2 pieces of information to access a particular value, i.e. a row and a column.

Question 29

Q

A csv file contains integer values but when you read it into a dataframe it shows up as a float….When?

Answer

A

If any value in the column is NaN (missing), pandas stores the entire column as floats. NaN is itself a floating-point value, and the default integer dtype has no way to represent a missing value.
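
A hedged sketch that reproduces the effect with hypothetical CSV text. The conversion to the nullable "Int64" dtype at the end is an addition not covered by the card; it is one way to keep integers alongside missing values:

```python
import io

import pandas as pd

# Hypothetical CSV: the 'score' column has one empty cell.
csv_text = "id,score\n1,10\n2,\n3,30\n"
df = pd.read_csv(io.StringIO(csv_text))

id_dtype = str(df["id"].dtype)        # no missing values, stays integer
score_dtype = str(df["score"].dtype)  # one missing value, becomes float

# Optional: pandas' nullable integer dtype can hold missing values.
nullable_scores = df["score"].astype("Int64")
```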

Question 30

Q

What does

df.info()

return?

Answer

A

Basic info about the dataframe as well as the number of non null values in each column.

Question 31

Q

What does

df.axes

return?

Answer

A

axes is an attribute, not a method. It returns a list holding the combined result of

df.index and df.columns

Question 32

Q

What does

df.sum(axis=1)

or

df.sum(axis="columns")

return?

Answer

A

The horizontal (left to right) total of each row of the dataframe.

Question 33

Q

How to extract a single column ‘abc’ from a Dataframe df?

Answer

A

df["abc"]

This returns a series.

Question 34

Q

How do you extract multiple columns from a DataFrame?

Answer

A

df[["abc", "def"]]

or

select = ["abc", "def"]

df[select]

Both of the above return the same resulting DataFrame.

Question 35

Q

How do you insert a new column ‘Sport’ in a Dataframe?

Answer

A

df["Sport"] = "Basket Ball"

inserts the column Sport at the end of the DataFrame and populates all rows with the value 'Basket Ball'

df.insert(5, column="Sport", value="Basket Ball")

this inserts the 'Sport' column at position 5 (the sixth column, because positions start at 0) with the value 'Basket Ball' in all rows
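
A small sketch of both approaches on an invented two-column dataframe. The extra "Team" column and the position 1 are illustrative choices, not taken from the card:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Bob"], "Salary": [50000, 60000]})

# Plain assignment appends the new column at the end.
df["Sport"] = "Basket Ball"

# insert() places the new column at a chosen position (0-based).
df.insert(1, column="Team", value="Legal")

column_order = list(df.columns)
```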

Question 36

Q

How do you add 20 to every value in the column ‘Salary’ of a dataframe?

Answer

A

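df["Salary"].add(20)

or

df["Salary"] + 20

These are called broadcasting operations, and the same pattern works with the other arithmetic methods and operators as well. Both forms return a new series; assign the result back to df["Salary"] to keep the change.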

Question 37

Q

How do you use value_counts() with a DataFrame?

Answer

A

df["abc"].value_counts().head()

Imp: in older pandas versions value_counts() could only be used on series objects. Since pandas 1.1, df.value_counts() also exists and counts unique rows.

Question 38

Q

How do you remove rows with null values in a DataFrame?

Answer

A

df.dropna()

By default this method removes every row that has at least one NaN value in any column.

Question 39

Q

How do you remove a row from a DataFrame only when a particular column has a Nan value?

Answer

A

df.dropna(subset=["column name", "column name 2"])

Question 40

Q

How do you replace a nan value with a particular value in a column of a DataFrame?

Answer

A

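df["abc"] = df["abc"].fillna("Hello")

This fills all NaN values in the 'abc' column with the string "Hello". Assigning the result back is more reliable than calling fillna(inplace=True) on a selected column, which may not update the original dataframe in recent pandas versions.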

Question 41

Q

How do you convert the 'Salary' column from float to integer in a DataFrame?

Answer

A

df["Salary"] = df["Salary"].astype("int")

Must remember that all NaNs must be removed or replaced for this method to work
- There is no inplace parameter, so you must assign the result back for the change to be permanent

Question 42

Q

How do you sort a dataframe?

Answer

A

A DataFrame is sorted by the values of one or more of its columns.

df.sort_values("Salary")

If the column has NaN values, those rows are placed at the end of the dataframe by default.

Question 43

Q

How do you do a sort on multiple columns in a dataframe?

Answer

A

df.sort_values(["col 1", "col 2"], ascending=[True, False])

This sorts the dataframe first by col 1 and then by col 2: col 1 in ascending order and col 2 in descending order.

Question 44

Q

How do you convert a string to a Date type?

Answer

A

df["String_Date"] = pd.to_datetime(df["String_Date"])

Question 45

Q

How do you convert a string type to a category type?

Answer

A

df["Management"] = df["Management"].astype("category")

Question 46

Q

How do you filter a dataframe so that only the rows where Gender = 'Male' are returned as a dataframe?

Answer

A

df[df["Gender"] == "Male"]

or

filter = df["Gender"] == "Male"

df[filter]

Question 47

Q

How to filter a dataframe using more than one condition eg

Gender = Male

Team = Marketing

?

Answer

A

filter1 = df["Gender"] == "Male"

filter2 = df["Team"] == "Marketing"

df[filter1 & filter2]
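
A runnable sketch of combining conditions, using a small invented dataframe. The | (or) variant is included for contrast and is not part of the card:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cal", "Dee"],
        "Gender": ["Female", "Male", "Male", "Female"],
        "Team": ["Marketing", "Marketing", "Legal", "Sales"],
    }
)

is_male = df["Gender"] == "Male"
in_marketing = df["Team"] == "Marketing"

both = df[is_male & in_marketing]     # rows that satisfy both conditions
either = df[is_male | in_marketing]   # rows that satisfy at least one condition
```

Each condition is a boolean series, so the conditions are combined with & and | rather than the Python keywords and / or.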

Question 48

Q

Write code to filter a dataframe where Team = 'Legal', 'Marketing' or 'Sales'

Answer

A

filter = df["Team"].isin(["Legal", "Sales", "Marketing"])

df[filter]

You can also pass a series into the isin() method

eg. df["Team"].isin(df2["Team"])

Question 49

Q

What do the isnull() and notnull() methods do?

Answer

A

isnull() returns True for each value that is NaN, else False.

notnull() returns True for each value that is not NaN, else False.

Question 50

Q

Write code to filter Salary >= 60,000 and <= 70,000?

Answer

A

df[df["Salary"].between(60000, 70000)]

or

x = df["Salary"] >= 60000

y = df["Salary"] <= 70000

df[x & y]
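
A short check that the two approaches agree, on invented salary figures. between() includes both end points by default:

```python
import pandas as pd

df = pd.DataFrame({"Salary": [55000, 60000, 65000, 70000, 75000]})

# One call with inclusive bounds.
with_between = df[df["Salary"].between(60000, 70000)]

# The same filter written as two comparisons.
lower = df["Salary"] >= 60000
upper = df["Salary"] <= 70000
with_masks = df[lower & upper]
```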

Question 51

Q

What does the ~ symbol do?

Answer

A

It returns the reverse of a Boolean value, element by element when applied to a Boolean series.

i.e. True becomes False

False becomes True

Question 52

Q

What does df["Name"].duplicated() return?

Answer

A

Returns the boolean value True for all duplicate values of the Name column except for the first occurrence, which returns False.

If there are 4 Toms it will return 1 False and 3 Trues
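
The four-Toms case can be sketched as follows. The keep="last" and keep=False variants are shown for comparison and are not part of the card:

```python
import pandas as pd

names = pd.Series(["Tom", "Ann", "Tom", "Tom", "Tom"])

first_kept = names.duplicated()             # first Tom is not flagged
last_kept = names.duplicated(keep="last")   # last Tom is not flagged
none_kept = names.duplicated(keep=False)    # every repeated name is flagged
```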

Question 53

Q

Remove duplicate values from a dataframe where Name and Team are duplicates

Answer

A

df.drop_duplicates(subset=["Name", "Team"], keep=False, inplace=True)

Note that keep=False removes every row that has a duplicate, including the first occurrence.

Question 54

Q

What do unique() and nunique() do?

Answer

A

unique() returns an array of the unique values; NaN is included as one of them.

nunique() returns an integer count of the unique values. This does not count NaN, because the parameter dropna=True is set by default
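
A minimal sketch of the difference, with an invented series that holds one NaN:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 2.0, np.nan])

distinct_values = s.unique()                 # array that still contains NaN
count_without_nan = s.nunique()              # NaN ignored by default
count_with_nan = s.nunique(dropna=False)     # NaN counted as a value
```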

Question 55

Q

How do you set a particular column as the index of a dataframe?

Answer

A

df.set_index("Col_name")

To reverse the change:

df.reset_index()

Question 56

Q

What does the df.loc[] method do?

Answer

A

extract rows using index labels

Question 57

Q

Extract rows from a dataframe between index 18 and 35?

Answer

A

df.iloc[18:36]

Note: the end position 36 is not included in the result of iloc[].

Question 58

Q

What is the df.ix[] indexer?

Answer

A

It was a combination of iloc[] and loc[]: it accepted both string labels and integer positions as arguments. It was deprecated in pandas 0.20 and removed in pandas 1.0, so current code should use loc[] for labels and iloc[] for positions.

Note:

When using labels in ix[] with a range or a list, and one of the labels did not exist in the dataframe, pandas returned a NaN value for the missing label.

BUT

When using integer positions in ix[] with a range or a list, and one of the positions did not exist in the dataframe, pandas raised an error for the entire query.

Question 59

Q

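How do you write a value to a given row and column in a dataframe using ix[]?

Answer

A

df.ix["James", "Salary"] = 80000

This sets the value in the James row and Salary column to 80000. In pandas 1.0 and later, where ix[] no longer exists, df.loc["James", "Salary"] = 80000 does the same.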

Question 60

Q

filter = df["Team"] == "Marketing"

df.ix[filter, "Team"] = "Online Marketing"

What does this piece of code do?

Answer

A

Finds all rows where Team = 'Marketing' and then replaces 'Marketing' with 'Online Marketing' in the Team column of the dataframe. In pandas 1.0 and later, write df.loc[filter, "Team"] instead.
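
Because ix[] is gone from current pandas, the last few cards can be sketched with loc[] and iloc[] instead. The dataframe below is invented for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"Team": ["Marketing", "Legal", "Marketing"], "Salary": [70000, 65000, 72000]},
    index=["James", "Maria", "Li"],
)

# Write a single value by row label and column label.
df.loc["James", "Salary"] = 80000

# Overwrite a column for every row that matches a condition.
mask = df["Team"] == "Marketing"
df.loc[mask, "Team"] = "Online Marketing"

# Select rows by integer position; the end position is excluded.
first_two = df.iloc[0:2]
```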

Question 61

Q

How do you change the name of columns in a dataframe?

Answer

A

df.rename(columns={"Team": "Dept", "Salary": "Compensation"}, inplace=True)

rename() accepts a dictionary that maps old names to new names. Pass it as columns= so that the column names, not the row labels, are renamed.
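
A small sketch on an invented dataframe, showing why the columns= keyword matters: a dictionary passed positionally is applied to the row labels.

```python
import pandas as pd

df = pd.DataFrame({"Team": ["Legal", "Sales"], "Salary": [65000, 70000]})

# Rename columns; without inplace=True a new DataFrame is returned.
renamed = df.rename(columns={"Team": "Dept", "Salary": "Compensation"})

# A positional dictionary targets the row index instead.
rows_renamed = df.rename({0: "first"})
```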

Question 62

Q

What are the 3 methods to delete columns from a dataframe?

Answer

A

df.drop(columns="Team", inplace=True)

(without columns= or axis=1, drop() looks for a row labelled "Team")

or

df.pop("Team")

This method removes "Team" from the dataframe and returns the column Team as a series.

or

del df["Team"]

Question 63

Q

How do you extract 5 random rows from your dataset? Also how do you extract 25% of your data set randomly?

Answer

A

df.sample(n=5)

and

df.sample(frac=0.25)

Question 64

Q

How to find the 5 highest values in the ‘Revenue’ column without using sort method?

Answer

A

df.nlargest(5, "Revenue")

or

df["Revenue"].nlargest(5)

The same syntax can be used for nsmallest() as well

Answer 64

A

All string methods on a series must be called through the .str accessor

eg

df["Name"].str.len()

df["Name"].str.upper()

df["Name"].str.lower()

df["Name"].str.title()

Answer 65

A

df["Team"] = df["Team"].str.replace("Mkt", "Marketing")

Answer 66

A

df["Name"].str.lower().str.contains("john")

returns True for all rows where Name contains 'john', irrespective of the position

df["Name"].str.lower().str.startswith("john")

returns True for all rows where Name begins with 'john'

df["Name"].str.lower().str.endswith("john")

returns True for all rows where Name ends with 'john'

Answer 67

A

Removes spaces from both the left and right, from the left only, and from the right only of a string

Answer 68

A

String methods are called in the same way on the index and columns as well.

eg.

df.index.str.upper()

and

df.columns.str.upper()

Answer 69

A

df["Name"].str.split(", ").str.get(0).value_counts().head()

Answer 70

A

df["Name"].str.split(",").str.get(1).str.strip().str.split(" ").str.get(0).value_counts().head(10)

Answer 71

A

str.split(expand=True, n=2)

The expand parameter, when set to True, returns a dataframe with one column per piece.

n determines the maximum number of splits
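
A hedged sketch of both parameters on two invented names:

```python
import pandas as pd

names = pd.Series(["Smith, John A", "Doe, Jane B"])

# expand=True spreads the pieces across DataFrame columns 0, 1, ...
parts = names.str.split(",", expand=True)

# n=1 stops after the first split, so the rest of the string stays together.
limited = names.str.split(" ", n=1, expand=True)
```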

Answer 72

A

x=df[“NM”].tolist()
y=df[“NM”].to_frame()

Answer 73

A

df.to_csv("Tial and Error", index=False, columns=["BRTH_YR", "NM"])

index=False does not copy the index

columns=[] allows you to copy only certain columns if you so desire

Answer 74

A

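df = pd.read_excel('C:/Users/SHAWN/Desktop/Python Pandas/Data - Multiple Worksheets.xlsx', sheet_name=None)

The resulting output will be a dictionary of dataframes, one per worksheet, keyed by sheet name. (Older pandas versions spelled this parameter sheetname.)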

Answer 75

A

df.set_index(["Date", "Country"], inplace=True)

OR

You can do it directly while importing the csv file like this

df = pd.read_csv('C:/Users/SHAWN/Desktop/bigmac.csv', index_col=["Date", "Country"])

Answer 76

A

df.index.get_level_values(0)

or

df.index.get_level_values("Date")

Answer 77

A

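df.index = df.index.set_names(["Day", "Location"])

(set_names() is the index method for renaming levels; an index has no set_index() method.)

Tip:

Assume you want the first index level to keep its name but change the second level; then just pass the existing name for the first level in the arguments.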

Answer 78

A

df.loc[("2016-10-10", "China")]

for a multi index, .loc[] accepts a tuple as an argument

Answer 79

A

df.transpose()

Answer 80

A

df.swaplevel()

Answer 81

A

stack() takes the columns and stacks the columns as rows.

unstack()

unstacks the rows and makes them columns

Answer 82

A

Group = df.groupby("Dept.")

groupby() creates a separate groupby object. A groupby object by itself is meaningless until you call methods on it.

Answer 83

A

g1.size()

Answer 84

A

It returns the index value of all of the rows that fall within each group

Answer 85

A

G1 = df.groupby("Dept")

G1.get_group("Marketing")

Answer 86

A

Returns sum of ‘Revenue’ column for N groups present in sectors
Returns Max of ‘Profits’ column for N groups present in sectors
Returns Min of ‘Profits’ column for N groups present in sectors
Returns average no of ‘Employees’ column for N groups present in sectors
- This is how you choose more than one column and return their sum.

Answer 87

A

sectors = df.groupby(["Sector", "Industry"])

Answer 88

A

There are 2 ways to use .agg():

Passing a dictionary as a parameter
Passing a list as a parameter

sectors.agg({"Revenue": ["sum", "mean"],

"Profits": "sum",

"Employees": "mean"})

and

sectors.agg(["size", "sum", "mean"])
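
A runnable sketch of the dictionary form. The figures are invented, and the grouping uses a single "Sector" key for brevity rather than the two keys shown in the earlier answer:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sector": ["Tech", "Tech", "Retail"],
        "Revenue": [100, 200, 50],
        "Profits": [10, 30, 5],
        "Employees": [1000, 3000, 500],
    }
)

sectors = df.groupby("Sector")

# Dictionary form: choose the aggregation(s) per column.
summary = sectors.agg(
    {"Revenue": ["sum", "mean"], "Profits": "sum", "Employees": "mean"}
)

# Number of rows in each group.
group_sizes = sectors.size()
```

The result has two-level column labels such as ("Revenue", "sum"), so a single cell is read with summary.loc["Tech", ("Revenue", "sum")].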
