Lesson7 Numpy_Pandas analysis Flashcards by Emma Harvey

Create an array of 10 zeros and ensure they are integers.

np.zeros(10, dtype='int')

Create a matrix with a predefined value of 5.45 with 3 rows and 5 cols.

np.full((3,5),5.45)

Create an array of 5 evenly spaced numbers between 0 and 2.

np.linspace(0, 2, 5)
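
A short sketch of what the answer above returns, next to np.arange for contrast. The variable names are only illustrative.

```python
import numpy as np

# linspace takes the number of points; both endpoints are included by default.
even = np.linspace(0, 2, 5)
print(even.tolist())     # [0.0, 0.5, 1.0, 1.5, 2.0]

# arange takes a step size instead, and the stop value is excluded.
stepped = np.arange(0, 2, 0.5)
print(stepped.tolist())  # [0.0, 0.5, 1.0, 1.5]
```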

Create a 3x3 array of random numbers drawn from a normal distribution with mean 0 and standard deviation 1.

np.random.normal(0, 1, (3,3))

Combine the following arrays:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [21,21,21]

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [21,21,21]
np.concatenate([x, y,z])

Concatenate the grid array with itself: grid = np.array([[1,2,3],[4,5,6]]).

grid = np.array([[1,2,3],[4,5,6]])
np.concatenate([grid,grid])
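
As a rough illustration, np.concatenate joins along axis 0 (rows) unless told otherwise. The axis=1 call is an addition for comparison, not part of the card.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

# Default axis=0: the second copy is stacked below the first.
stacked_rows = np.concatenate([grid, grid])
print(stacked_rows.shape)  # (4, 3)

# axis=1: the second copy is placed to the right of the first.
stacked_cols = np.concatenate([grid, grid], axis=1)
print(stacked_cols.shape)  # (2, 6)
```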

Create a dataframe from a dictionary with the columns Fruit and Items. The values for Fruit are ['Peach', 'Apple', 'Pear', 'Plum', 'Kiwi'] and the values for Items are [121, 40, 100, 130, 11].

data = pd.DataFrame({'Fruit': ['Peach', 'Apple', 'Pear', 'Plum', 'Kiwi'],
'Items': [121, 40, 100, 130, 11]})

How do you get complete information on the dataset?

data.info()

Make a dataframe with the column names group and kg. Group values: 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'. kg values: 4, 3, 12, 6, 7.5, 8, 3, 5, 6.

data = pd.DataFrame({'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'], 'kg': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

Sort the values in the data df by kg in ascending order, changing the original df.

data = pd.DataFrame({'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'], 'kg': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

data.sort_values(by=['kg'], ascending=True, inplace=True)

Sort by multiple columns - do this for data. Sort group by ascending order and kg by descending order. Make sure you don’t modify the original dataset.

data.sort_values(by=['group', 'kg'], ascending=[True, False], inplace=False)
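
A minimal sketch of the multi-column sort, assuming the group/kg dataframe from the earlier card. The name sorted_data is illustrative. Since inplace defaults to False, the original stays untouched.

```python
import pandas as pd

data = pd.DataFrame({
    'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'kg': [4, 3, 12, 6, 7.5, 8, 3, 5, 6],
})

# group ascending, then kg descending within each group; returns a new frame.
sorted_data = data.sort_values(by=['group', 'kg'], ascending=[True, False])

print(sorted_data['kg'].tolist())  # [12.0, 4.0, 3.0, 8.0, 7.5, 6.0, 6.0, 5.0, 3.0]
print(data['kg'].tolist()[:3])     # [4.0, 3.0, 12.0] (original order kept)
```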

Remove duplicate rows from the following dataframe.

data = pd.DataFrame({'names': ['Mila']*3 + ['Igor']*4, 'Age': [3, 2, 1, 3, 3, 4, 4]})

data.drop_duplicates()

Remove rows with duplicate values in the names column.

data = pd.DataFrame({'names': ['Mila']*3 + ['Igor']*4, 'Age': [3, 2, 1, 3, 3, 4, 4]})

data.drop_duplicates(subset='names')
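
To make the difference between the two drop_duplicates cards concrete, here is a small sketch. The keep='last' call is an extra for comparison and does not appear in the deck.

```python
import pandas as pd

data = pd.DataFrame({'names': ['Mila'] * 3 + ['Igor'] * 4,
                     'Age': [3, 2, 1, 3, 3, 4, 4]})

# No subset: a row is dropped only if every column matches an earlier row.
full = data.drop_duplicates()
print(len(full))  # 5

# subset='names': keep the first row seen for each name (keep='first' is the default).
first_per_name = data.drop_duplicates(subset='names')
print(first_per_name.index.tolist())  # [0, 3]

# keep='last' keeps the final row for each name instead.
last_per_name = data.drop_duplicates(subset='names', keep='last')
print(last_per_name.index.tolist())   # [2, 6]
```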

For the farm shop df (data), create a new column animal2 that maps each value in the food column to its animal using meat_to_animal. Make sure the food values are lowercased first.

data['animal2'] = data['food'].map(str.lower).map(meat_to_animal)
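
The deck does not show the farm shop data or the meat_to_animal lookup, so the sketch below makes both up purely for illustration. Only the chained map calls follow the card.

```python
import pandas as pd

# Hypothetical stand-ins: neither the dataframe contents nor the dictionary
# are given in the deck.
data = pd.DataFrame({'food': ['Bacon', 'beef', 'Pulled Pork', 'BEEF'],
                     'kg': [4, 3, 12, 6]})
meat_to_animal = {'bacon': 'pig', 'pulled pork': 'pig', 'beef': 'cow'}

# Lowercase each food name, then look it up in the dictionary.
data['animal2'] = data['food'].map(str.lower).map(meat_to_animal)
print(data['animal2'].tolist())  # ['pig', 'cow', 'pig', 'cow']
```

Any food missing from the dictionary would come out as NaN, which is why the lowercasing step matters.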

Remove the animal2 column (series) from the dataset.

data.drop('animal2', axis='columns', inplace=True)

Make a new series using assign

data.assign(new_variable=data['kg']*10)
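
One point worth seeing in action: assign returns a new dataframe and leaves the original alone. The sketch assumes the group/kg dataframe from the earlier cards; with_new is an illustrative name.

```python
import pandas as pd

data = pd.DataFrame({
    'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'kg': [4, 3, 12, 6, 7.5, 8, 3, 5, 6],
})

# assign builds a copy with the extra column; data itself is not modified.
with_new = data.assign(new_variable=data['kg'] * 10)

print('new_variable' in with_new.columns)  # True
print('new_variable' in data.columns)      # False
```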

Make a dataframe that has the values 0-11 in a matrix of 3 rows and 4 columns. Use these index and column names:

index=['London', 'Manchester', 'Brighton']
columns=['one', 'two', 'three', 'four']

data = pd.DataFrame(np.arange(12).reshape((3, 4)),
index=['London', 'Manchester', 'Brighton'],
columns=['one', 'two', 'three', 'four'])

Rename Manchester to Cardiff and in the columns one to one_p and two to two_p for the dataframe data. Make sure to change the original df.

data.rename(index={'Manchester': 'Cardiff'}, columns={'one': 'one_p', 'two': 'two_p'}, inplace=True)

Convert the index to upper case and the column names to title case.

data.rename(index = str.upper, columns=str.title,inplace=True)

Create categories for this variable ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]. Use the bins bins = [18, 25, 35, 60, 100]

categories = pd.cut(ages, bins)

Include the left bin edge in each bin instead of the right one.

pd.cut(ages,bins,right=False)
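
A brief sketch of how right=False changes which bin a boundary value lands in, using the ages and bins from the card. The variable names are illustrative.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]

# Default right=True: bins are (18, 25], (25, 35], ... so 25 falls in the first bin.
right_closed = pd.cut(ages, bins)
# right=False: bins are [18, 25), [25, 35), ... so 25 falls in the second bin.
left_closed = pd.cut(ages, bins, right=False)

# .codes gives the bin number of each age; ages[2] is 25.
print(right_closed.codes[2])  # 0
print(left_closed.codes[2])   # 1
```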

See how many observations (the frequency or count of observations that belong to each bin) fall under each bin. Do this for the categories variable.

pd.Series(categories).value_counts()

Add a unique name to each category, then check how many observations fall under each bin. bin_names = ['Youth', 'Early 20s', 'Middle Age', 'Senior']

bin_names = ['Youth', 'Early 20s', 'Middle Age', 'Senior']
new_cats = pd.cut(ages, bins, labels=bin_names)

pd.Series(new_cats).value_counts()
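
For reference, a sketch of the counts these labels produce with the ages and bins from the earlier card. It wraps the result in a Series and calls value_counts on that, since the top-level pd.value_counts is deprecated in recent pandas versions.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
bin_names = ['Youth', 'Early 20s', 'Middle Age', 'Senior']

new_cats = pd.cut(ages, bins, labels=bin_names)

# Count how many ages fall under each labelled bin.
counts = pd.Series(new_cats).value_counts()
print(counts.to_dict())
# contains: Youth 5, Early 20s 3, Middle Age 3, Senior 1
```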

Create a date range called dates starting from 20210701 with a length of 7 periods. Then create a pandas DataFrame with 7 rows and 4 columns of random values drawn from a normal distribution. Set the row index to the dates variable created above and label the columns 'A', 'B', 'C', and 'D'.

dates = pd.date_range('20210701', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))
df

Get the first 3 rows from the df

df[:3]

Slice df based on date range 20210703 to 20210705

df['20210703':'20210705']

Slice df on the column names A and B

df.loc[:,['A','B']]

Slice df based on the dates 20210703 to 20210705 and the column names A and B.

df.loc['20210703':'20210705',['A','B']]

Select the row at index position 2 (the third row) of the df.

df.iloc[2]

Return a specific range of rows based on index position. Return the rows at positions 2 and 3 (the slice 2:4 excludes position 4) for the first two columns.

df.iloc[2:4, 0:2]
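
The slicing cards above mix label-based and position-based selection, which treat the end of a slice differently. The sketch below swaps the random values for np.arange so the output is repeatable; that substitution is an assumption made for illustration only.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20210701', periods=7)
# Deterministic values instead of np.random.randn, for a repeatable example.
df = pd.DataFrame(np.arange(28).reshape(7, 4), index=dates, columns=list('ABCD'))

# loc slices by label and includes the end label: 3 July, 4 July and 5 July.
by_label = df.loc['20210703':'20210705', ['A', 'B']]
print(by_label.shape)     # (3, 2)

# iloc slices by position and excludes the end: positions 2 and 3 only.
by_position = df.iloc[2:4, 0:2]
print(by_position.shape)  # (2, 2)
```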

Return specific rows (second and sixth row) and columns (first and third) using lists containing columns or row indexes.

df.iloc[[1,5],[0,2]]

Copy the dataframe df and add a new column E. Name it df2.

df2 = df.copy()
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three', 'two']

Select rows based on column values. Using df2, select the rows where column E contains two or four.

df2[df2['E'].isin(['two','four'])]

Select all rows except those where column E contains two or four. Use df2.

df2[~df2['E'].isin(['two','four'])]
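
A small sketch of how the isin mask and its negation split the rows. It uses a simplified df2 with fixed numbers in place of the random columns, which is an assumption for repeatability.

```python
import pandas as pd

# Simplified stand-in for df2: one numeric column plus the E column from the card.
df2 = pd.DataFrame({'A': range(7),
                    'E': ['one', 'one', 'two', 'three', 'four', 'three', 'two']})

# isin returns a boolean Series, True where E is 'two' or 'four'.
mask = df2['E'].isin(['two', 'four'])

matching = df2[mask]    # rows where E is two or four
remaining = df2[~mask]  # ~ flips the mask, giving every other row

print(matching.index.tolist())   # [2, 4, 6]
print(remaining.index.tolist())  # [0, 1, 3, 5]
```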

Make a series of 40 random integers from 1 to 9 (the upper bound of 10 passed to randint is exclusive). Then make a dataframe from this series and reshape it to 8 rows and 5 columns.

ser = pd.Series(np.random.randint(1, 10, 40))
df = pd.DataFrame(ser.values.reshape(8, 5))
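
Two details are easy to miss here: np.random.randint excludes its upper bound, and reshape fills the new shape row by row. The sketch below passes 11 as the upper bound so that 10 can appear; that change is an illustration, not the card's answer.

```python
import numpy as np
import pandas as pd

# randint(low, high) draws from low up to high - 1, so 11 is needed to allow 10.
ser = pd.Series(np.random.randint(1, 11, 40))

# reshape fills row by row: the first 5 values form row 0, the next 5 row 1, and so on.
df = pd.DataFrame(ser.values.reshape(8, 5))

print(df.shape)                  # (8, 5)
print(df.iloc[1, 0] == ser[5])   # True
```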

Create a dataframe with two columns, Name and Age, where the values are names = ['Alice', 'Bob', 'Charlie'] and ages = [25, 30, 35].

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]
# Create DataFrame
df = pd.DataFrame({'Name': names, 'Age': ages})

(36 cards)