Pandas Data Frame Flashcards

1
Q

import pandas as pd
purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

df.head()

A

The heart of the pandas library is the DataFrame: a two-dimensional labeled data structure, like an Excel table.

         Cost  Item Purchased  Name
Store 1  22.5  Dog Food        Chris
Store 1   2.5  Kitty Litter    Kevyn
Store 2   5.0  Bird Seed       Vinod
2
Q

purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

df.loc['Store 2']

A

Gives all of the information indexed under 'Store 2'.

Cost 5
Item Purchased Bird Seed
Name Vinod
Name: Store 2, dtype: object

3
Q

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

type(df.loc['Store 2'])

A

When .loc returns a single row, the type is a Series.

pandas.core.series.Series

4
Q

purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

df.loc['Store 1']

A

Returns the rows indexed under 'Store 1': the first two purchase Series.

         Cost  Item Purchased  Name
Store 1  22.5  Dog Food        Chris
Store 1   2.5  Kitty Litter    Kevyn
5
Q

purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

df.loc['Store 1', 'Cost']

A

Selects only the 'Cost' column for the rows labeled 'Store 1'.

Store 1 22.5
Store 1 2.5
Name: Cost, dtype: float64

6
Q

purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

df.T

A

Transposes the DataFrame, swapping the rows and the columns.

                Store 1   Store 1       Store 2
Cost            22.5      2.5           5
Item Purchased  Dog Food  Kitty Litter  Bird Seed
Name            Chris     Kevyn         Vinod
7
Q

purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

df.T.loc['Cost']

A

.loc and .iloc are used for row selection; that's why we transposed first, to make 'Cost' a row.

Store 1 22.5
Store 1 2.5
Store 2 5
Name: Cost, dtype: object

8
Q

purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

df['Cost']

A

Columns always have a label in a pandas DataFrame, so a column can be selected directly by its name.

Store 1 22.5
Store 1 2.5
Store 2 5.0
Name: Cost, dtype: float64

9
Q

purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

df.loc[:, ['Name', 'Cost']]

A

Gives the 'Name' and 'Cost' values for all stores. The column selector can be a single string or a list of strings.

.loc also supports slicing: ':' selects all rows. 'Store 1' could be used instead to restrict the rows.

         Name   Cost
Store 1  Chris  22.5
Store 1  Kevyn   2.5
Store 2  Vinod   5.0

10
Q

purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

df.drop('Store 1')

A

Returns a copy of the DataFrame with 'Store 1' deleted; the original DataFrame is still intact.

drop has two commonly used optional parameters. If the inplace parameter is True, it changes the original DataFrame instead of returning a copy.

The second is axis, which selects what is dropped: 0 (the default) drops rows, 1 drops columns.

copy_df.drop(labels, axis=0, level=None, inplace=False, errors='raise')

         Cost  Item Purchased  Name
Store 2   5.0  Bird Seed       Vinod
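A sketch of those two parameters in action (copy_df is just an illustrative name here, and the frame is rebuilt inline so the block is self-contained):

```python
import pandas as pd

df = pd.DataFrame({'Cost': [22.5, 2.5, 5.0],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Name': ['Chris', 'Kevyn', 'Vinod']},
                  index=['Store 1', 'Store 1', 'Store 2'])

copy_df = df.copy()
# axis=1 drops a column instead of a row; inplace=True mutates copy_df directly
copy_df.drop('Name', axis=1, inplace=True)

print(copy_df.columns.tolist())   # ['Cost', 'Item Purchased']
print('Name' in df.columns)       # True -- the original frame is untouched
```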
11
Q

df.drop('Store 1')

print(df)

A

The full DataFrame is still intact, because drop returned a copy that was never assigned.

         Cost  Item Purchased  Name
Store 1  22.5  Dog Food        Chris
Store 1   2.5  Kitty Litter    Kevyn
Store 2   5.0  Bird Seed       Vinod
12
Q

del copy_df['Name']

A

An easier way of dropping a column; unlike drop, del modifies the DataFrame in place immediately.

13
Q

df['Location'] = None

df

A

Creates a new column and broadcasts the default value (None) to every row.


         Cost  Item Purchased  Name   Location
Store 1  22.5  Dog Food        Chris  None
Store 1   2.5  Kitty Litter    Kevyn  None
Store 2   5.0  Bird Seed       Vinod  None
14
Q

costs = df['Cost']

costs

A

Gives the same result as df['Cost'].

Store 1 22.5
Store 1 2.5
Store 2 5.0
Name: Cost, dtype: float64

15
Q

costs = df['Cost']

costs + 2

A

Broadcasting: the scalar 2 is added to every value in the Series.

Before:
Store 1 22.5
Store 1 2.5
Store 2 5.0

After:
Store 1    24.5
Store 1     4.5
Store 2     7.0
Name: Cost, dtype: float64
16
Q

df = pd.read_csv('olympics.csv')

A

Reads the CSV file as-is and stores it in a DataFrame; by default pandas creates a fresh integer index and uses the first row for the column names.

17
Q

df = pd.read_csv('olympics.csv', index_col=0, skiprows=1)

A

Reads the CSV file in with index_col=0 making the first column the index, and skiprows=1 skipping the top row of labels, so the second row supplies the column names.

18
Q

df.columns

A

pandas stores the column labels in the columns attribute, an Index object.

Index(['№ Summer', '01 !', '02 !', '03 !', 'Total', '№ Winter', '01 !.1', '02 !.1', '03 !.1', 'Total.1', '№ Games', '01 !.2', '02 !.2', '03 !.2', 'Combined total'], dtype='object')

19
Q

for col in df.columns:
    if col[:2]=='01':
        df.rename(columns={col:'Gold' + col[4:]}, inplace=True)
    if col[:2]=='02':
        df.rename(columns={col:'Silver' + col[4:]}, inplace=True)
    if col[:2]=='03':
        df.rename(columns={col:'Bronze' + col[4:]}, inplace=True)
    if col[:1]=='№':
        df.rename(columns={col:'#' + col[1:]}, inplace=True)

A

Renames the columns so they are clearer.

Each test compares the first characters of the column label against a prefix such as '01'.

inplace=True tells pandas to update df directly rather than return a renamed copy.

20
Q

df['Feedback'] = ['Positive', None, 'Negative']

df

A

Creates a new column with the list values, one per row.

You need to make sure the list has exactly one value for every row.

21
Q

adf = df.reset_index()
adf['Date'] = pd.Series({0: 'December 1', 2: 'mid-May'})
adf

A

Supplying a Series keyed by index means you don't have to type None values for every row.

When you assign a Series to a new column, rows whose index is not referenced in the dictionary are filled with NaN.

22
Q

pd.merge(staff_df, student_df, how='outer', left_index=True, right_index=True)

A
# we want to do an outer join (the union of the two indexes).
# to join the index of one frame against a column of the other, use left_index=True together with right_on='Column Name'.
       Role            School
Name
James  Grader          Business
Kelly  Director of HR  NaN
Mike   NaN             Law
Sally  Course liasion  Engineering
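staff_df and student_df are never defined in this deck; a minimal reconstruction, inferred from the outputs shown on these merge cards, would be:

```python
import pandas as pd

# hypothetical reconstruction of the two frames used by the merge cards
staff_df = pd.DataFrame([{'Name': 'Kelly', 'Role': 'Director of HR'},
                         {'Name': 'Sally', 'Role': 'Course liasion'},
                         {'Name': 'James', 'Role': 'Grader'}]).set_index('Name')
student_df = pd.DataFrame([{'Name': 'James', 'School': 'Business'},
                           {'Name': 'Mike', 'School': 'Law'},
                           {'Name': 'Sally', 'School': 'Engineering'}]).set_index('Name')

# outer join: the union of both indexes, with NaN where one side has no match
merged = pd.merge(staff_df, student_df, how='outer',
                  left_index=True, right_index=True)
print(merged)
```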
23
Q

pd.merge(staff_df, student_df, how='inner', left_index=True, right_index=True)

A

we want to do an inner join: only names present in both DataFrames are kept.

Role School
Name
James Grader Business
Sally Course liasion Engineering

24
Q

pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)

A

we want all staff, but we also want their student info if they're a student (left join).

       Role            School
Name
Kelly  Director of HR  NaN
Sally  Course liasion  Engineering
James  Grader          Business
25
Q

pd.merge(staff_df, student_df, how='right', left_index=True, right_index=True)

A

we want all students, but we also want their staff info if they're staff (right join).

       Role            School
Name
James  Grader          Business
Mike   NaN             Law
Sally  Course liasion  Engineering
26
Q

staff_df = staff_df.reset_index()
student_df = student_df.reset_index()
pd.merge(staff_df, student_df, how='left', left_on='Name', right_on='Name')

A

You can join on columns instead of indexes, using left_on and right_on.

Name Role School
0 Kelly Director of HR NaN
1 Sally Course liasion Engineering
2 James Grader Business

27
Q

pd.merge(staff_df, student_df, how='left', left_on='Name', right_on='Name')

A

Demonstrates what happens when the two DataFrames share a conflicting (non-key) column: pandas appends _x to the left DataFrame's column and _y to the right's.

28
Q

pd.merge(staff_df, student_df, how='inner', left_on=['First Name','Last Name'], right_on=['First Name','Last Name'])

A
# many staff and students might have matching first names, but not matching last names.
# the inner join only returns rows in which both keys match, so only Sally Brooks is returned.
29
Q

(df.where(df['SUMLEV']==50)
    .dropna()
    .set_index(['STNAME','CTYNAME'])
    .rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'}))

A
# demonstrates method chaining.
# wrapping the whole expression in parentheses lets it span multiple lines.
30
Q
def min_max(row):
    data = row[['POPESTIMATE2010',
                'POPESTIMATE2011',
                'POPESTIMATE2012',
                'POPESTIMATE2013',
                'POPESTIMATE2014',
                'POPESTIMATE2015']]
    return pd.Series({'min': np.min(data), 'max': np.max(data)})

df.apply(min_max, axis=1)

A
# apply is like the map function: it takes a function and applies it along an axis.
# apply takes the function and the axis on which we want the function to operate.
# axis=1 applies the function to each row.
                          max       min
STNAME   CTYNAME
Alabama  Autauga County   55347.0   54660.0
         Baldwin County  203709.0  183193.0
         Barbour County   27341.0   26489.0
31
Q

df.apply(lambda x: np.max(x[rows]), axis=1)  # rows is the list of POPESTIMATE columns from the previous card

A
# apply is often used w/ lambdas.
# you can chain several apply calls w/ lambdas together.
# calculates the max for each row in a single apply call.
32
Q

for state in df['STNAME'].unique():
    avg = np.average(df.where(df['STNAME']==state).dropna()['CENSUS2010POP'])
    print('Counties in state ' + state + ' have an average population of ' + str(avg))

A
# here we use the census data and get a list of unique states.
# for each state we reduce the dataframe and calculate the average.
# this takes a long time to run.
33
Q

for group, frame in df.groupby('STNAME'):
    avg = np.average(frame['CENSUS2010POP'])
    print('Counties in state ' + group + ' have an average population of ' + str(avg))

A
# this does the same thing as the for loop w/ the unique method, but is much faster because it uses a groupby object.
# we're interested in grouping by state name, and we calculate the average using one column and all of the data in that column.
34
Q

df = df.set_index('STNAME')

def fun(item):
    if item[0] < 'M':
        return 0
    if item[0] < 'Q':
        return 1
    return 2
A

groupby is used 99% of the time on one or more columns, but you can also provide a function to group by.

The fun function tells groupby how to split up the DataFrame; it is applied to each index value.

If the first letter falls before 'M' we return 0, before 'Q' we return 1, and otherwise we return 2.
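A self-contained sketch of grouping by a function, using toy state names in place of the census data:

```python
import pandas as pd

df = pd.DataFrame({'CENSUS2010POP': [100, 200, 300, 400]},
                  index=['Alabama', 'Michigan', 'Ohio', 'Texas'])

def fun(item):
    # item is an index label; batch the states by their first letter
    if item[0] < 'M':
        return 0
    if item[0] < 'Q':
        return 1
    return 2

# a function passed to groupby is called on each index label
groups = {group: list(frame.index) for group, frame in df.groupby(fun)}
print(groups)   # {0: ['Alabama'], 1: ['Michigan', 'Ohio'], 2: ['Texas']}
```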

35
Q

df.groupby('STNAME').agg({'CENSUS2010POP': np.average})

A
# it is common for groupby to split the data, apply some function, and then combine the results: the split-apply-combine pattern.
# agg (aggregate) applies a function to the columns of data in each group and returns the results.
# builds a summary data frame.
# pass in a dictionary mapping the column to the function you want to apply.
            CENSUS2010POP
STNAME
Alabama      71339.343284
Alaska       24490.724138
Arizona     426134.466667
Arkansas     38878.906667
California  642309.586207
Colorado     78581.187500
36
Q
# there are two different groupbys, the Series groupby and DataFrame groupby
print(type(df.groupby(level=0)['POPESTIMATE2010','POPESTIMATE2011']))
A

DataFrame groupby

37
Q
# there are two different groupbys, the Series groupby and DataFrame groupby
print(type(df.groupby(level=0)['POPESTIMATE2010'])) # Series groupby
A

Series Groupby

38
Q

(df.set_index('STNAME').groupby(level=0)['CENSUS2010POP']
    .agg({'avg': np.average, 'sum': np.sum}))

A
# calls set_index and tells pandas to group by the index, using the level parameter.
# level=0 refers to the first (and here only) level of the index, STNAME.
# since there's only one column of data selected, both functions are applied to that column.
# notice the groupby here is a Series groupby.
          sum      avg
STNAME
Alabama   4779736  71339.343284
Alaska     710231  24490.724138
Arizona   6392017  426134.466667
Arkansas  2915918  38878.906667
39
Q

(df.set_index('STNAME').groupby(level=0)['POPESTIMATE2010','POPESTIMATE2011']
    .agg({'avg': np.average, 'sum': np.sum}))

A
# calls set_index and tells pandas to groupby index using level parameter.
# does the same thing as above, but on two different columns
# notice this is a DataFrame
         sum                               avg
         POPESTIMATE2010  POPESTIMATE2011  POPESTIMATE2010  POPESTIMATE2011
STNAME
Alabama  4785161          4801108          71420.313433     71658.328358
Alaska   714021           722720           24621.413793     24921.379310
40
Q

(df.set_index('STNAME').groupby(level=0)['POPESTIMATE2010','POPESTIMATE2011']
    .agg({'POPESTIMATE2010': np.average, 'POPESTIMATE2011': np.sum}))

A
# pandas maps the functions directly to columns here instead of creating a hierarchical column.
# pandas uses the dictionary keys as the column names.
         POPESTIMATE2011  POPESTIMATE2010
STNAME
Alabama  4801108          71420.313433
Alaska   722720           24621.413793
41
Q

Four Scales

A
# 1. Ratio scale:
    # units are equally spaced
    # mathematical operations (+ - * /) are all valid
    # e.g. height and weight measurements
# 2. Interval scale:
    # units are equally spaced, but there is no true zero
    # temperature is an example: 0 degrees doesn't represent a real zero
# 3. Ordinal scale:
    # the order of the units is important, but they are not evenly spaced
    # letter grades such as A+, A are a good example
    # in terms of underlying percentages, the gap between adjacent grades is not constant
# 4. Nominal scale:
    # categories of data, but the categories have no order w/ respect to one another
    # e.g. teams of a sport
    # categories w/ only two possible values are referred to as binary
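A sketch of how the four scales tend to map onto pandas types (example values assumed; this uses the newer pd.CategoricalDtype API rather than the astype('category', ...) form shown on the following cards):

```python
import pandas as pd

# ratio: equally spaced units with a true zero -- plain floats
heights = pd.Series([1.62, 1.75, 1.80])

# interval: equally spaced but no true zero -- also floats; the
# distinction matters for interpretation, not for the dtype
temps_celsius = pd.Series([-5.0, 0.0, 12.5])

# ordinal: ordered categories
grade_type = pd.CategoricalDtype(categories=['C', 'B', 'A'], ordered=True)
grades = pd.Series(['A', 'C', 'B']).astype(grade_type)

# nominal: unordered categories
teams = pd.Series(['Red', 'Blue', 'Red']).astype('category')

print(grades.cat.ordered, teams.cat.ordered)   # True False
```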
42
Q

df = pd.DataFrame(['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D'],
                  index=['excellent', 'excellent', 'excellent', 'good', 'good', 'good', 'ok', 'ok', 'ok', 'poor', 'poor'])
df.rename(columns={0: 'Grades'}, inplace=True)
df

A

starts w/ nominal data which is called category data in pandas.

           Grades
excellent  A+
excellent  A
excellent  A-
good       B+
good       B
good       B-
ok         C+
43
Q

df['Grades'].astype('category').head()

A
# instructing pandas to refer to data as categorical data.
# dtype has been set to category.
44
Q

grades = df['Grades'].astype('category',
                             categories=['D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+'],
                             ordered=True)
grades.head()

A
# you can change this to ordinal data w/ the ordered=True flag.
# ordinal data helps for boolean masking.
# if we compared the values lexicographically, C+ and C- would both be greater than C.
# marking the categories as ordered makes the true order of the grades explicit.
45
Q

grades = df['Grades'].astype('category',
                             categories=['D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+'],
                             ordered=True)

grades > 'C'

A
excellent     True
excellent     True
excellent     True
good          True
good          True
good          True
ok            True
ok           False
ok           False
46
Q

df = df.set_index('STNAME').groupby(level=0)['CENSUS2010POP'].agg({'avg': np.average})

pd.cut(df['avg'], 10)

A
# cut takes an argument of some array-like structure and an int that represents a number of bins.
# separates states into categories by average census populations.
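A minimal self-contained sketch of cut with toy values:

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7, 9])
binned = pd.cut(s, 3)   # 3 equal-width bins spanning the range 1..9

print(binned.cat.categories)                         # the three intervals
print(binned.value_counts().sort_index().tolist())   # [2, 1, 2]
```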
47
Q

df.pivot_table(values='(kW)', index='YEAR', columns='Make', aggfunc=np.mean)

A
# a pivot table is itself a data frame.
# it lets us pivot one column out as the column headers and compare its values against another column.
48
Q

df.pivot_table(values='(kW)', index='YEAR', columns='Make', aggfunc=[np.mean, np.min], margins=True)

A
# you can pass aggfunc a list of functions you want to apply.
# you can pass any function you want to aggfunc including those you write yourself.
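The df on these pivot cards comes from an electric-cars dataset that isn't included in the deck; a toy frame (column names assumed to match) shows the shape of the result:

```python
import pandas as pd
import numpy as np

cars = pd.DataFrame({'YEAR': [2012, 2012, 2013, 2013],
                     'Make': ['Ford', 'Tesla', 'Ford', 'Tesla'],
                     '(kW)': [49, 85, 53, 90]})

# one row per YEAR, one column per Make, mean power in each cell
pivot = cars.pivot_table(values='(kW)', index='YEAR',
                         columns='Make', aggfunc=np.mean)
print(pivot)
```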
49
Q

pd.Timestamp('9/1/2016 10:05AM')

A

A Timestamp is interchangeable w/ Python's datetime in most cases.

50
Q

pd.Period('1/2016')

A

Creates a time period (here, the month of January 2016).

51
Q

t1 = pd.Series(list('abc'), [pd.Timestamp('2016-09-01'), pd.Timestamp('2016-09-02'), pd.Timestamp('2016-09-03')])
t1

A

Here each value has a Timestamp index; the index is a DatetimeIndex.

2016-09-01 a
2016-09-02 b
2016-09-03 c

52
Q

t2 = pd.Series(list('def'), [pd.Period('2016-09'), pd.Period('2016-10'), pd.Period('2016-11')])
t2

A

This Series has a PeriodIndex.

2016-09 d
2016-10 e
2016-11 f

53
Q

d1 = ['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16']
ts3 = pd.DataFrame(np.random.randint(10, 100, (4,2)), index=d1, columns=list('ab'))
ts3

A
# four dates in four different string formats.
# uses the dates as the index and fills the rows w/ random values.
              a   b
2 June 2013   99  69
Aug 29, 2014  73  76
2015-06-26    82  71
7/12/16       83  30
54
Q

ts3.index = pd.to_datetime(ts3.index)

ts3

A

Converts the index values into Datetime format.

55
Q

pd.Timestamp('9/3/2016') - pd.Timestamp('9/1/2016')

A

Tells us exactly how much time passed between the two dates: Timedelta('2 days 00:00:00').
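Timedelta arithmetic also works in the other direction (the dates here are illustrative):

```python
import pandas as pd

delta = pd.Timestamp('9/3/2016') - pd.Timestamp('9/1/2016')
print(delta)    # 2 days 00:00:00

# a Timedelta can be added to a Timestamp as well
later = pd.Timestamp('9/2/2016 8:10AM') + pd.Timedelta('12D 3H')
print(later)    # 2016-09-14 11:10:00
```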

56
Q

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
dates

A
# we want to look at 9 measurements taken bi-weekly, every other Sunday.
# we use date_range to create the index.

DatetimeIndex(['2016-10-02', '2016-10-16', '2016-10-30', '2016-11-13',
               '2016-11-27', '2016-12-11', '2016-12-25', '2017-01-08',
               '2017-01-22'],
              dtype='datetime64[ns]', freq='2W-SUN')

57
Q

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')

df = pd.DataFrame({'Count 1': 100 + np.random.randint(-5, 10, 9).cumsum(),
                   'Count 2': 120 + np.random.randint(-5, 10, 9)}, index=dates)
df

A

Creates a DataFrame using the dates as the index and random data as the values.

58
Q

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')

df = pd.DataFrame({'Count 1': 100 + np.random.randint(-5, 10, 9).cumsum(),
                   'Count 2': 120 + np.random.randint(-5, 10, 9)}, index=dates)

df.index.weekday_name

A

Creates a list of the day of the week for each date in the index.

59
Q

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')

df = pd.DataFrame({'Count 1': 100 + np.random.randint(-5, 10, 9).cumsum(),
                   'Count 2': 120 + np.random.randint(-5, 10, 9)}, index=dates)

df.diff()

A

Finds the difference between each date's values and the previous date's values.

            Count 1  Count 2
2016-10-02      NaN      NaN
2016-10-16      1.0      6.0
2016-10-30      4.0      3.0
60
Q

df = pd.DataFrame({'Count 1': 100 + np.random.randint(-5, 10, 9).cumsum(),
                   'Count 2': 120 + np.random.randint(-5, 10, 9)}, index=dates)

df.resample(‘M’).mean()

A

Resamples to monthly frequency ('M') and finds the mean count for each month.

	Count 1	Count 2
2016-10-31	103.0	120.0
2016-11-30	105.5	118.0
2016-12-31	107.0	123.0
2017-01-31	106.0	121.0
61
Q

df['2017']

A

use partial string indexing to find values from a particular year or month

Count 1 Count 2
2017-01-08 108 116
2017-01-22 104 126

62
Q

df['2016-12']

A

use partial string indexing to find values from a particular year and month

            Count 1  Count 2
2016-12-11      104      119
2016-12-25      110      127
63
Q

df['2016-12':]

A
# slice on a range of dates starting December 2016
# here we only want the dates from Dec. 2016 onward.
64
Q

df.asfreq('W', method='ffill')

A

Changes the frequency of the dates from bi-weekly to weekly, using forward fill to fill in the newly created dates.
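A self-contained sketch with toy bi-weekly counts:

```python
import pandas as pd

dates = pd.date_range('10-01-2016', periods=3, freq='2W-SUN')
df = pd.DataFrame({'Count 1': [100, 103, 105]}, index=dates)

# re-index from bi-weekly to weekly; forward fill copies each known
# value into the newly created in-between Sundays
weekly = df.asfreq('W', method='ffill')
print(weekly['Count 1'].tolist())   # [100, 100, 103, 103, 105]
```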

65
Q

import matplotlib.pyplot as plt
%matplotlib inline

df.plot()

A

Plots the DataFrame, letting you visualize the time series inline in the notebook.