Merging Dataframes / Datacamp(1,2,3 not project) Flashcards

1
Q

Use a loop to store these files in a list as data frames and unpacked the list into variables called gold,silver, bronze

Gold.csv
Silver.csv
Bronze.csv

A

import pandas as pd
filenames = [‘Gold.csv’, ‘Silver.csv’, ‘Bronze.csv’]
dataframes = []
for filename in filenames:
dataframes.append(pd.read_csv(filename))

gold,silver,bronze=dataframes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Make a copy of the df gold and call it medals

A

medals = gold.copy()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Rename columns of df as Code, Country, Total

A B C
0 USA United States 2088.0
1 URS Soviet Union 838.0

A
new_labels = ['NOC', 'Country', 'Gold']
medals.columns = new_labels
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Add columns ‘Total’ from data frames Silver and Bronze into dataframe gold

A
medals['Silver'] = silver['Total']
medals['Bronze'] = bronze['Total']
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Change order of the dataframe

0 1
2013-12-31 -0.31 0.83
2014-12-31 -0.63 -0.19
2015-12-31 -0.53 -0.85

0 1
2015-12-31 -0.53 -0.85
2014-12-31 -0.63 -0.19
2013-12-31 -0.31 0.83

A

df.sort_index(ascending=False,inplace=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Change order of the dataframe

0 1
2013-12-31 -0.31 0.83
2014-12-31 -0.63 -0.19
2015-12-31 -0.53 -0.85

#Outcome:
 0 1
2014-12-31 -0.63 -0.19
2015-12-31 -0.53 -0.85
2013-12-31 -0.31 0.83
A

df.sort_values(0, ascending=True, inplace=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Fill forward NaN

 A B C D
0 5.0 NaN 4 5.0
1 3.0 2.0 3 4.0
2 NaN 4.0 8 2.0
3 4.0 3.0 5 NaN
 A B C D
0 5.0 NaN 4 5.0
1 3.0 2.0 3 4.0
2 3.0 4.0 8 2.0
3 4.0 3.0 5 2.0
A

df.ffill(axis = 0)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Fill forward NaN

 A B C D
0 5.0 NaN 4 5.0
1 3.0 2.0 3 4.0
2 NaN 4.0 8 2.0
3 4.0 3.0 5 NaN
A B C D
0 5.0 5.0 4.0 5.0
1 3.0 2.0 3.0 4.0
2 NaN 4.0 8.0 2.0
3 4.0 3.0 5.0 5.0
A

df.ffill(axis = 1)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Outcome after reindexing df

a b c
A 0.338244 0.548040 0.898135
B 0.709872 0.965031 0.521506
C 0.605053 0.340734 0.139700

df1.reindex([‘Ro’, ‘A’, ‘B’, ‘Ri’,])

A
a b c
Ro NaN NaN NaN
A 0.187632 0.587335 0.141530
B 0.900119 0.984410 0.648169
Ri NaN NaN NaN
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Transform the first dataframe into the second one using a list called new_indexes

a b c
A 0.800152 0.485063 0.607966
B 0.509770 0.856334 0.820924
C 0.249211 0.675843 0.096703

 a b c
Ro missing missing missing
A 0.800152 0.485063 0.607966
B 0.50977 0.856334 0.820924
Ri missing missing missing
A

new_index=[‘Ro’, ‘A’, ‘B’, ‘Ri’,]

print(df1.reindex(new_index, axis =’rows’, fill_value =’missing’))

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Add these two dataframes and guess the output

a b c
A 0.8 0.4 0.3
B 0.3 0.9 0.2
C 0.9 0.9 0.4

a b c
Z 0.4 0.5 0.3
B 0.1 0.4 0.4
C 0.4 0.7 0.7

A

df0+df1

print(df0+df1)

aN NaN NaN
B 0.4 1.3 0.6
C 1.3 1.6 1.1
Z NaN NaN NaN

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is the output of multiplying 10 with this dataframe?

ax bx cx
Z 0.1 0.7 0.9
B 0.8 0.6 0.1
C 0.1 0.5 0.1

A

ax bx cx
Z 1.0 7.0 9.0
B 8.0 6.0 1.0
C 1.0 5.0 1.0

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Resample into year and get the mean

 0 1
2013-01-31 -2.10 -1.73
2013-02-28 -1.15 -0.74
2013-03-31 -0.51 1.65
2013-04-30 -1.10 -2.13
A

yearly = df.resample(‘A’).mean()
yearly

0 1
2013-12-31 -1.215 -0.7375

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Percentage growth of ‘c’

a b c
A 0.5 1.0 0.4
B 0.2 0.8 0.5
C 0.8 0.5 0.6

A

df_1=df[‘growth_c%’] = df0[‘c’].pct_change()*100

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Make sure the dataframe gets the dates

sp500 = pd.read_csv(‘sp500.csv’,index_col=’Date’,…)

A

sp500 = pd.read_csv(‘sp500.csv’,index_col=’Date’, parse_dates=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Convert the sp500 into GBP

#df_sp500 in USD
 Open Close
Date 
2015-01-02 2058.899902 2058.199951
2015-01-05 2054.439941 2020.579956
2015-01-06 2022.150024 2002.609985
#df_exchange GBP/USD
Date 
2015-01-02 0.65101
2015-01-05 0.65644
2015-01-06 0.65896
A

pounds = df_sp500.multiply(df_exchange[‘GBP/USD’], axis=’rows’)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Append df_silver and df_bronze into a df called combined and print United States result

#df_silver
Country
United States 1052.0
Soviet Union 584.0
United Kingdom 505.0
#df_bronze
Country
United States 1195.0
Soviet Union 627.0
United Kingdom 591.0
A

combined=bronze.append(silver)

combined.loc[‘United States’]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Append the Series Units for both df into a series called q1 and add all the units sold

#df_jan
Date 
2015-01-21 Streeplex Hardware 11
2015-01-09 Streeplex Service 8
2015-01-06 Initech Hardware 17
#df_feb
 Company Product Units
Date 
2015-02-26 Streeplex Service 4
2015-02-16 Hooli Software 10
2015-02-03 Initech Software 13
A

jan_units = jan[‘Units’]
feb_units = feb[‘Units’]
q1=jan_units.append(feb_units)
q1.sum()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Use a loop to concatenate data frames (jan,feb, mar) by its Units into a dataframe called quarter1

 Company Product Units
Date 
2015-02-26 Streeplex Service 4
2015-02-16 Hooli Software 10
2015-02-03 Initech Software 13
A

units = []
for month in [jan, feb, mar]:
units.append(month[‘Units’])
quarter1 = pd.concat(units, axis=’rows’)

20
Q

Concatenate both data frames

df1
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
df2
A B C D
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
A

Import pandas as pd

pd.concat([df1, df2])

21
Q

difference between append and concat method in pandas

A

Append combines the rows of one dataframe to another dataframe. To be honest, I don’t use this method ever.
Concat can take a group of 2+ dataframes and combines the dataframes via the rows or columns.

22
Q

Outcome of:

df1.append(df2)

#df1
a b
0 1 5
1 2 6
2 3 7
#df2
a b
0 9 4
1 6 2
2 3 10
A
a b
0 1 5
1 2 6
2 3 7
3 4 8
0 9 4
1 6 2
2 3 10
23
Q

Append df1 and df2 and make the index to be a sequence of integers starting from 0

A

df1.append(df2, ignore_index = True)

24
Q

Concatenate horizontally df1 and df2

A

pd.concat([df1,df2],axis=1)

25
Q

Concatenate vertically df1 and df2, and make the index to be a sequence of integers starting from 0

A

pd.concat([df1,df2],axis=0,ignore_index=True)

26
Q

Use a loop to read csv files named [gold_top5.csv, silver.., bronze] . Create a df with 4 columns (Country, medals) and make the Country the index. Use append method

#Outcome:
 bronze silver gold
France 475.0 461.0 NaN
Germany 454.0 NaN 407.0
Italy NaN 394.0 460.0
A

medals_type=[‘bronze’, ‘silver’, ‘gold’]
medals =[]

for medal in medal_types:
 file_name = "%s_top5.csv" % medal
 columns = ['Country', medal]
 medal_df = pd.read_csv(file_name, index_col='Country', names=columns)
 medals.append(medal_df)

medals_df = pd.concat(medals, axis=’columns’)
print(medals_df.head(3)

27
Q
#Concatenating vertically to get MultiIndexed rows
#Concatenate a list of dataframes
A

To do

medals = pd.concat(medals,keys=[‘bronze’, ‘silver’, ‘gold’])

28
Q
#Code to get the given outcome:
df=medals
Total
 Country 
bronze United States 1052.0
 Soviet Union 584.0
 United Kingdom 505.0
… 
gold United Kingdom 498.0
 Italy 460.0
 Germany 407.0
#Outcome:
 Total
 Country 
bronze France 475.0
silver France 461.0
bronze Germany 454.0
gold Germany 407.0
A

medals.sort_index(level=1)

29
Q

Slice index to get medals from UK

 Country 
bronze United States 1052.0
 Soviet Union 584.0
 United Kingdom 505.0
...
A

idx = pd.IndexSlice

medals_sorted.loc[idx[:,’United Kingdom’],:]

#Outcome:
 Total
 Country 
 bronze United Kingdom 505.0
 gold United Kingdom 498.0
 silver United Kingdom 591.0
30
Q

Get silver medals

Country
bronze United States 1052.0
Soviet Union 584.0
United Kingdom 505.0

A

medals_sorted.loc[‘silver’]

Outcome:
 Total
Country 
France 461.0
Italy 394.0
Soviet Union 627.0
United Kingdom 591.0
United States 1195.0
31
Q

You have 3 dataframes in a dataframes list. Concatenate horizontally using their product a hierarchical column that contains the rest of the information

#df1
 Company Product Units
Date 
2015-02-04 Acme Coporation Hardware 14
2015-02-07 Acme Coporation Hardware 1
2015-02-19 Mediacore Hardware 16
#Outcome:
 Hardware Software Service 
 Company Product Units …….
Date 
2015-02-02 Hooli Software 3 …..
A

pd.concat(dataframes, keys=[‘Hardware’,’Software’,’Service’], axis=1)

32
Q

In a multi-indexed column dataframe with Hardware’,’Software’,’Service as top columns and Company Product Units second level coluumns, get Company data for 2015-2-2 to 2015-2-8

 Hardware Software Service
 Company Company Company
Date 
2015-02-02 08:33:01 NaN Hooli NaN
2015-02-02 20:54:49 Mediacore NaN NaN
A

idx = pd.IndexSlice

february.loc[‘2015-2-2’:’2015-2-8’, idx[:,’Company’]]

33
Q

Use the list month_list to populate a dict the value as df. Grouped by Company. Sum its values. Then concatenate the dictionary into a dataframe called sales

month_list = [(‘january’, jan), (‘february’, feb), (‘march’, mar)]

A

Outcome:

month_dict = {}
for month_name, month_data in month_list:
month_dict[month_name] = month_data.groupby(‘Company’).sum()

sales = pd.concat(month_dict)

 Units
 Company 
february Acme Coporation 34
 Hooli 30
 Initech 30
 Mediacore 45
34
Q

Inner join of three similar dataframes (gold, silver, bronze) and visualize more or less the final dataframe

#bronze
 Total
Country 
United States 1052.0
Soviet Union 584.0
United Kingdom 505.0
A

medals=pd.concat(medal_list,keys=[‘bronze’, ‘silver’, ‘gold’],axis=1,join=’inner’)

#Outcome
 bronze silver gold
 Total Total Total
Country 
United States 1052.0 1195.0 2088.0
Soviet Union 584.0 627.0 838.0
35
Q

What does inner join do?

A

keep only rows that share common index labels.

36
Q

Change Date to annually using the last value of each year, chain a percentage growth and delete null values

 China
Year 
1961-01-01 49.557050
1962-01-01 46.685179
1963-01-01 50.097303
A

Outcome:

china_annual = china.resample(‘A’).last().pct_change(10).dropna()

China
Year
1971-12-31 0.988860
1972-12-31 1.402472

37
Q

Use inner join with two dataframes called china, us with similar structure

#Outcome
 China US
Year 
1971-12-31 0.988860 1.052270
1972-12-31 1.402472 1.172566
A

pd.concat([china_annual, us_annual], axis=1, join=’inner’)

38
Q

Merge both df. How many rows will have the new df?

#df_1
 city revenue
0 Austin 100
1 Denver 83
2 Springfield 4
#df_2
 city manager
0 Austin Charlers
1 Denver Joel
2 Mendocino Brett
A

combined = pd.merge(revenue, managers, on=’city’)

Two lines

39
Q

Merge both using cities. What will be the structure of the new dataframe?

city branch_id state revenue
0 Austin 10 TX 100
1 Denver 20 CO 83

branch branch_id state manager
0 Austin 10 TX Charlers
1 Denver 20 CO Joel

A

combined = pd.merge(revenue,managers,left_on=’city’,right_on=’branch’)

Columns of the new df=city branch_id_x state_x revenue branch branch_id_y state_y manager

40
Q

Outcome of:

pd.merge(revenue, managers, on=’city’)

city branch_id state revenue
0 Austin 10 TX 100
1 Denver 20 CO 83

branch branch_id state manager
0 Austin 10 TX Charlers
1 Denver 20 CO Joel

A
Traceback (most recent call last):
 ...  ...
 pd.merge(revenue, managers, on='city')
 ...  ...
KeyError: 'city'
41
Q

Join both dataframes df1,df2 and visualize Outcome:

A B
K0 A0 B0
K1 A1 B1
K2 A2 B2

C D
K0 C0 D0
K1 C1 D1
K2 C2 D2

A

left.join(right)

A B C D
K0 A0 B0 C0 D0
K1 A1 B1 C1 D1
K2 A2 B2 C2 D2

42
Q

Join using suffix and prefix. Visualize outcome

#revenue
 city state revenue
branch_id 
10 Austin TX 100
20 Denver CO 83
30 Springfield IL 4
#managers
 branch state manager
branch_id 
10 Austin TX Charlers
20 Denver CO Joel
47 Mendocino CA Brett
A

revenue.join(managers,lsuffix=’_rev’,rsuffix=’_mng’,how=’outer’)

 city state_rev revenue branch state_mng manager
branch_id 
10 Austin TX 100.0 Austin TX Charlers
20 Denver CO 83.0 Denver CO Joel
30 Springfield IL 4.0 NaN NaN NaN
31 NaN NaN NaN Springfield MO Sally
43
Q

What is the difference between merge and join in pandas?

A

join() methods as a convenient way to access the capabilities of pandas. … join(df2) always joins via the index of df2 , but df1. merge(df2) can join to one or more columns of df2 (default) or to the index of df2 (with right_index=True ). lookup on left table: by default, df1

44
Q

What does left join do?

A

Can add suffixes

Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null.

pd.merge(df_a, df_b, on=’subject_id’, how=’left’)

pd.merge(df_a, df_b, on=’subject_id’, how=’left’, suffixes=(‘_left’, ‘_right’)

45
Q

What does merge_ordered do?

A

Perform merge with optional filling/interpolation.

Designed for ordered data like time series data