Data Science Intro Flashcards

1
Q

cylinders = set(d['cyl'] for d in mpg)

A

Use set() to get the unique values of the number of cylinders that the cars in our dataset have.

2
Q

sum(float(d['hwy']) for d in mpg) / len(mpg)

A

This is how to find the average hwy fuel economy across all cars.

3
Q

len(mpg)  # mpg is a list of dictionaries, one per row of the CSV file

A

csv.DictReader has read in each row of our CSV file as a dictionary. len shows that our list consists of 234 dictionaries.

4
Q

import csv

%precision 2

with open('mpg.csv') as csvfile:
    mpg = list(csv.DictReader(csvfile))

mpg[:3] # The first three dictionaries in our list.

A

Reads the CSV file and stores its rows in a list named mpg. %precision 2 is an IPython magic that displays floats with two decimal places.
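A self-contained sketch of the same workflow, assuming a tiny hypothetical in-memory CSV (via io.StringIO) in place of mpg.csv, with three made-up rows and only three columns. It also shows why the cards wrap values in float(): DictReader returns every field as a string.

```python
import csv
import io

# Hypothetical stand-in for mpg.csv: three made-up rows and three columns.
fake_csv = io.StringIO(
    "cyl,cty,hwy\n"
    "4,18,29\n"
    "6,16,26\n"
    "4,20,31\n"
)

# DictReader turns each row into a dict keyed by the header names.
rows = list(csv.DictReader(fake_csv))

print(len(rows))          # 3
print(rows[0]['hwy'])     # 29 (stored as the string '29')

# Every value is a string, so convert before doing arithmetic.
cylinders = set(d['cyl'] for d in rows)                   # unique values
avg_hwy = sum(float(d['hwy']) for d in rows) / len(rows)  # mean hwy mpg

print(sorted(cylinders))  # ['4', '6']
print(round(avg_hwy, 2))  # 28.67
```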

5
Q

What will the output be?

sales_record = {
    'price': 3.24,
    'num_items': 4,
    'person': 'Chris'}

sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'

print(sales_statement.format(sales_record['person'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items'] * sales_record['price']))

A

Chris bought 4 item(s) at a price of 3.24 each for a total of 12.96

6
Q

x = ('Christopher', 'Brooks', 'brooksch@umich.edu')
fname, lname, email = x

print(fname)

A

Christopher

7
Q

Tuple format?

A

my_tuple = ("Hi", "Dave", 4)  # parentheses; tuples are immutable

8
Q

List format?

A

my_list = ["hi", 4, 2, "Dave"]  # square brackets; lists are mutable
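A short sketch of the practical difference between the two formats: a list can be changed in place, while assigning to a tuple element raises a TypeError. The values mirror the two cards; the appended 7 is just an illustrative extra item.

```python
my_tuple = ("Hi", "Dave", 4)     # parentheses: tuple (immutable)
my_list = ["hi", 4, 2, "Dave"]   # square brackets: list (mutable)

my_list[0] = "hello"             # lists can be changed in place
my_list.append(7)
print(my_list)                   # ['hello', 4, 2, 'Dave', 7]

try:
    my_tuple[0] = "Hello"        # tuples reject item assignment
except TypeError as err:
    print("TypeError:", err)
```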

9
Q

x = {'Christopher Brooks': 'brooksch@umich.edu', 'Bill Gates': 'billg@microsoft.com'}

x['Christopher Brooks']

A

Retrieve a value by using the indexing operator.

'brooksch@umich.edu'
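Indexing with a key that is not in the dictionary raises a KeyError. This sketch, using the same two entries, shows dict.get as a lookup that returns a default instead; 'Ada Lovelace' is only a hypothetical missing key.

```python
x = {'Christopher Brooks': 'brooksch@umich.edu',
     'Bill Gates': 'billg@microsoft.com'}

print(x['Christopher Brooks'])             # brooksch@umich.edu

# x['Ada Lovelace'] would raise KeyError; .get returns a default instead.
print(x.get('Ada Lovelace'))               # None
print(x.get('Ada Lovelace', 'not found'))  # not found

# Iterating over items() yields (key, value) pairs.
for name, email in x.items():
    print(name, '->', email)
```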

10
Q

CtyMpgByCyl = []

for c in cylinders:  # iterate over all the cylinder levels
    summpg = 0
    cyltypecount = 0
    for d in mpg:  # iterate over all dictionaries
        if d['cyl'] == c:  # if the cylinder level type matches,
            summpg += float(d['cty'])  # add the cty mpg
            cyltypecount += 1  # increment the count
    CtyMpgByCyl.append((c, summpg / cyltypecount))  # append the tuple ('cylinder', 'avg mpg')

CtyMpgByCyl.sort(key=lambda x: x[0])
CtyMpgByCyl

A

Computes the average city mpg for each number of cylinders.

The lambda makes sort order CtyMpgByCyl by the first element of each tuple (the cylinder count).

[('4', 21.01), ('5', 20.50), ('6', 16.22), ('8', 12.57)]
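The nested loops above scan the whole list once per cylinder level. A one-pass alternative (a swapped-in technique, not the course's code) accumulates sums and counts in dictionaries. The three rows below are hypothetical stand-ins for mpg, shaped like csv.DictReader output.

```python
from collections import defaultdict

# Hypothetical rows shaped like the csv.DictReader output (values are strings).
mpg = [
    {'cyl': '4', 'cty': '18'},
    {'cyl': '6', 'cty': '16'},
    {'cyl': '4', 'cty': '20'},
]

sums = defaultdict(float)   # total cty mpg per cylinder level
counts = defaultdict(int)   # number of cars per cylinder level

for d in mpg:               # single pass over the data
    sums[d['cyl']] += float(d['cty'])
    counts[d['cyl']] += 1

# Build (cylinder, average mpg) tuples sorted by cylinder, as in the card.
cty_mpg_by_cyl = sorted((c, sums[c] / counts[c]) for c in sums)
print(cty_mpg_by_cyl)       # [('4', 19.0), ('6', 16.0)]
```

Note that the cylinder keys are strings, so they sort alphabetically; that matches numeric order here only because every key is a single digit.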

11
Q

vehicleclass = set(d['class'] for d in mpg)

vehicleclass

A

Finds the vehicle classes in the dataset, listing each class only once.

{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}

12
Q

HwyMpgByClass = []

for t in vehicleclass:  # iterate over all the vehicle classes
    summpg = 0
    vclasscount = 0
    for d in mpg:  # iterate over all dictionaries
        if d['class'] == t:  # if the vehicle class matches,
            summpg += float(d['hwy'])  # add the hwy mpg
            vclasscount += 1  # increment the count
    HwyMpgByClass.append((t, summpg / vclasscount))  # append the tuple ('class', 'avg mpg')

HwyMpgByClass.sort(key=lambda x: x[1])
HwyMpgByClass

A

An example of how to find the average hwy mpg for each vehicle class in our dataset; sorting with key=lambda x: x[1] orders the results by that average.

[('pickup', 16.88),
 ('suv', 18.13),
 ('minivan', 22.36),
 ('2seater', 24.80),
 ('midsize', 27.29),
 ('subcompact', 28.14),
 ('compact', 28.30)]
13
Q

import datetime as dt
import time as tm

tm.time()

A

tm.time() returns the current time as a float: the number of seconds since the epoch (January 1, 1970, UTC).

14
Q

import datetime as dt
import time as tm

dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow

A

Converts the timestamp to a datetime object in local time.

15
Q

dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second

A

Get the year, month, day, hour, minute, and second attributes of a datetime.

16
Q
delta = dt.timedelta(days=100)  # create a timedelta of 100 days

delta

A

timedelta is a duration expressing the difference between two dates.

17
Q
delta = dt.timedelta(days = 100)
today = dt.date.today()

today - delta

A

Returns the date 100 days before today. The result depends on when you run it; for example, if today is November 21, 2016:

datetime.date(2016, 8, 13)
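To get a reproducible result, this sketch fixes a hypothetical "today" instead of calling dt.date.today(). It covers both the subtraction card and the comparison card below: subtracting a timedelta from a date gives a date, and subtracting two dates gives a timedelta.

```python
import datetime as dt

# Fixed, hypothetical "today" so the output does not depend on the run date.
today = dt.date(2016, 11, 21)
delta = dt.timedelta(days=100)

past = today - delta
print(past)                 # 2016-08-13
print(today > past)         # True
print((today - past).days)  # 100: date minus date gives a timedelta
```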

18
Q

today > today-delta

A

Compare dates with the usual comparison operators.

Returns True, since today is later than 100 days ago.

19
Q

store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]
cheapest = map(min, store1, store2)
cheapest

A

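map applies min to the elements of store1 and store2 pairwise, giving the lowest price for each item. In Python 3, map returns a lazy map object rather than a list; use list(cheapest) to get [9.0, 11.0, 12.34, 2.01].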
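A sketch of how the lazy map object behaves in Python 3: nothing is computed until it is consumed, and it can only be consumed once. The zip-based list comprehension at the end is shown as an equivalent alternative.

```python
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

cheapest = map(min, store1, store2)  # lazy iterator; nothing computed yet
print(cheapest)                      # <map object at 0x...>

cheapest_list = list(cheapest)       # consumes the iterator
print(cheapest_list)                 # [9.0, 11.0, 12.34, 2.01]
print(list(cheapest))                # []: a map object is exhausted after one pass

# An equivalent list comprehension using zip:
same = [min(a, b) for a, b in zip(store1, store2)]
```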

20
Q

my_function = lambda a, b, c : a + b

my_function(1, 2, 3)

A

Here's an example of a lambda that takes three parameters and adds the first two; my_function(1, 2, 3) returns 3.
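A lambda is just an anonymous one-expression function, so it can be compared with an equivalent def. Its most common use in these cards is as a sort key; the pairs below reuse three of the class averages from the highway-mpg card as sample data.

```python
my_function = lambda a, b, c: a + b

# The equivalent named function:
def my_function_def(a, b, c):
    return a + b

print(my_function(1, 2, 3))      # 3
print(my_function_def(1, 2, 3))  # 3

# Lambdas are often used as short key functions, e.g. sorting by the second item.
pairs = [('suv', 18.13), ('pickup', 16.88), ('compact', 28.30)]
by_mpg = sorted(pairs, key=lambda p: p[1])
print(by_mpg)  # [('pickup', 16.88), ('suv', 18.13), ('compact', 28.3)]
```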

21
Q
my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
my_list
A

Builds a list of the even numbers in range(0, 1000), i.e. 0, 2, ..., 998.

22
Q

my_list = [number for number in range(0,1000) if number % 2 == 0]
my_list

A

This list comprehension is shorthand for:

my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
my_list
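For this particular filter, range can also produce the even numbers directly with a step of 2. This sketch checks that both forms give the same list.

```python
# Comprehension from the card: keep only the even numbers.
evens_comp = [number for number in range(0, 1000) if number % 2 == 0]

# range can generate the same values directly with a step of 2.
evens_range = list(range(0, 1000, 2))

print(evens_comp == evens_range)  # True
print(len(evens_comp))            # 500
print(evens_comp[:5])             # [0, 2, 4, 6, 8]
```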
23
Q

import numpy as np

m = np.array([[7, 8, 9], [10, 11, 12]])  # create a 2-D array with NumPy

m.shape

A

Use the shape attribute to find the dimensions of the array as (rows, columns).

(2, 3)

24
Q

n = np.arange(0, 30, 2)

n

A

arange returns evenly spaced values within a given interval.

Start at 0, count up by 2, and stop before 30.

array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

25
Q

n = np.array([0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

n = n.reshape(3, 5)
n

A

reshape returns an array with the same data with a new shape.

reshape array to be 3x5

array([[ 0, 2, 4, 6, 8],
[10, 12, 14, 16, 18],
[20, 22, 24, 26, 28]])

26
Q

o = np.linspace(0, 4, 9)

o

A

linspace returns evenly spaced numbers over a specified interval.

return 9 evenly spaced values from 0 to 4

array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])
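arange and linspace can describe the same values in two ways: arange takes a step and excludes the stop value, while linspace takes a number of points and includes the stop value by default. NumPy's documentation recommends linspace for non-integer steps, since arange's length can be affected by floating-point rounding.

```python
import numpy as np

# arange: you choose the step; the stop value is excluded.
a = np.arange(0, 4.5, 0.5)
# linspace: you choose the number of points; the stop value is included.
b = np.linspace(0, 4, 9)

print(np.allclose(a, b))   # True: both are 0.0, 0.5, ..., 4.0
print(len(a), len(b))      # 9 9
```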

27
Q

o = np.linspace(0, 4, 9)  # array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])

o.resize(3, 3)
o

A

resize changes the shape and size of the array in place.

array([[ 0. , 0.5, 1. ],
[ 1.5, 2. , 2.5],
[ 3. , 3.5, 4. ]])
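A minimal sketch contrasting the two cards: reshape returns a new array object and leaves the original's shape alone, while resize modifies the array itself and returns None.

```python
import numpy as np

# reshape returns a new array object; the original keeps its shape.
n = np.arange(0, 30, 2)
n2 = n.reshape(3, 5)
print(n.shape, n2.shape)   # (15,) (3, 5)

# resize changes the array itself (in place) and returns None.
o = np.linspace(0, 4, 9)
result = o.resize(3, 3)
print(o.shape, result)     # (3, 3) None
```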

28
Q

np.ones((3, 2))

A

ones returns a new array of given shape and type, filled with ones.

array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])

29
Q

np.zeros((2, 3))

A

zeros returns a new array of given shape and type, filled with zeros.

array([[ 0., 0., 0.],
[ 0., 0., 0.]])

30
Q

np.eye(3)

A

eye returns a 2-D array with ones on the diagonal and zeros elsewhere.

array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])

31
Q

y = np.array([4, 5, 6])

np.diag(y)

A

diag extracts a diagonal or constructs a diagonal array.

array([[4, 0, 0],
[0, 5, 0],
[0, 0, 6]])

32
Q

np.array([1, 2, 3] * 3)

A

Create an array from a repeated list (see also np.tile).

array([1, 2, 3, 1, 2, 3, 1, 2, 3])

33
Q

np.repeat([1, 2, 3], 3)

A

Repeat elements of an array using repeat.

array([1, 1, 1, 2, 2, 2, 3, 3, 3])

34
Q

p = np.array([[1, 1, 1],
              [1, 1, 1]])

np.vstack([p, 2*p])

A

Use vstack to stack arrays in sequence vertically (row wise).

array([[1, 1, 1],
[1, 1, 1],
[2, 2, 2],
[2, 2, 2]])

35
Q

p = np.array([[1, 1, 1],
              [1, 1, 1]])

np.hstack([p, 2*p])

A

Use hstack to stack arrays in sequence horizontally (column wise).

array([[1, 1, 1, 2, 2, 2],
[1, 1, 1, 2, 2, 2]])
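p needs to be a NumPy array for 2*p to double its values; with a plain nested list, 2*p repeats the list instead. This sketch shows that difference and checks that, for 2-D arrays, vstack and hstack match np.concatenate along axis 0 and axis 1.

```python
import numpy as np

p = np.array([[1, 1, 1],
              [1, 1, 1]])

# With a NumPy array, 2 * p doubles every element.
print(2 * p)
# With a plain nested list, 2 * p repeats the list instead.
p_list = [[1, 1, 1], [1, 1, 1]]
print(2 * p_list)   # [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]]

# For 2-D arrays, vstack/hstack match concatenate along axis 0 / axis 1.
v = np.concatenate([p, 2 * p], axis=0)   # shape (4, 3)
h = np.concatenate([p, 2 * p], axis=1)   # shape (2, 6)

print(np.array_equal(v, np.vstack([p, 2 * p])))   # True
print(np.array_equal(h, np.hstack([p, 2 * p])))   # True
```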

36
Q
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x + y)  # elementwise addition
print(x - y)  # elementwise subtraction

A

[5 7 9]

[-3 -3 -3]

37
Q
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x * y)  # elementwise multiplication
print(x / y)  # elementwise division

A

[ 4 10 18]

[ 0.25 0.4 0.5 ]

38
Q

x = np.array([1, 2, 3])

print(x**2)

A

Raises each element to the power of 2.

[1 4 9]

39
Q

x.dot(y)  # dot product 1*4 + 2*5 + 3*6

A

Dot product: 1*4 + 2*5 + 3*6

x[0]*y[0] + x[1]*y[1] + x[2]*y[2] = 32
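Assuming x = [1, 2, 3] and y = [4, 5, 6] as in the previous cards, this sketch computes the same dot product three ways, which makes the "multiply elementwise, then add up" definition explicit.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Three equivalent ways to compute 1*4 + 2*5 + 3*6.
print(x.dot(y))        # 32
print(np.sum(x * y))   # 32: elementwise multiply, then add up
print(x @ y)           # 32: the matrix-multiplication operator
```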

40
Q

z = np.array([y, y**2])

print(len(z))

A

len returns the size of the first dimension, i.e. the number of rows in a 2-D array, so this prints 2.

41
Q

z = np.array([[ 4,  5,  6],
              [16, 25, 36]])

z.shape  # (2, 3)
z.T

A

Transposing (.T) swaps rows and columns, so the shape changes from (2, 3) to (3, 2).

array([[ 4, 16],
[ 5, 25],
[ 6, 36]])

42
Q

z.dtype

A

Use .dtype to see the data type of the elements in the array.

dtype('int64')

43
Q

# z starts off as int
z = z.astype('f')
z.dtype

A

Use .astype to cast to a specific type.

dtype('float32')

44
Q

a.argmax()  # a is any NumPy array

a.argmin()

A

argmax and argmin return the index of the maximum and minimum values in the array.
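A small sketch with a hypothetical example array a. It also shows that on a 2-D array argmax returns an index into the flattened array unless you pass an axis.

```python
import numpy as np

a = np.array([-4, -2, 1, 3, 5])   # hypothetical example array

print(a.argmax(), a.max())   # 4 5: index of the largest value, then the value
print(a.argmin(), a.min())   # 0 -4

m = np.array([[1, 9],
              [7, 3]])
print(m.argmax())            # 1: index into the flattened array [1, 9, 7, 3]
print(m.argmax(axis=0))      # [1 0]: row index of the max in each column
```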

45
Q

s = np.arange(13)**2

s

A

Creates an array of the squares of the integers 0 through 12.

array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])

46
Q

s = np.array([0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])

s[0], s[4], s[-1]

A

Use bracket notation to get the value at a specific index. Remember that indexing starts at 0.

(0, 16, 144)

47
Q

s = np.array([0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])

s[1:5]

A

Use : to indicate a range: array[start:stop]. The stop index itself is excluded.
Leaving start or stop empty defaults to the beginning or end of the array.

array([ 1, 4, 9, 16])

48
Q

s = np.array([0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])

s[-4:]

A

Use negative indices to count from the end.

array([ 81, 100, 121, 144])

49
Q

s = np.array([0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])

s[-5::-2]

A

A second : can be used to indicate step-size. array[start:stop:stepsize]

Here we start at the 5th element from the end and count backwards by 2 until the beginning of the array is reached.

array([64, 36, 16, 4, 0])

50
Q
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])

r[3, 3:6]

A

Use : to select a range of rows or columns. Here we take row 3, columns 3 up to (but not including) 6.

array([21, 22, 23])

51
Q
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])

r[:2, :-1]

A

Here we are selecting all the rows up to (and not including) row 2, and all the columns up to (and not including) the last column.

array([[ 0, 1, 2, 3, 4],
[ 6, 7, 8, 9, 10]])

52
Q
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])

r[-1, ::2]

A

This is a slice of the last row, and only every other element.

array([30, 32, 34])

53
Q
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])

r[r > 30]

A

We can also perform conditional indexing. Here we are selecting values from the array that are greater than 30. (Also see np.where)

array([31, 32, 33, 34, 35])

54
Q
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])

r[r > 30] = 30
r

A

Here we are assigning all values in the array that are greater than 30 to the value of 30.

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])
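The card's masked assignment changes r in place. If you want a capped copy and want to keep r unchanged, np.where(condition, a, b) picks from a where the condition holds and from b elsewhere. This sketch rebuilds the same r with np.arange(36).reshape(6, 6).

```python
import numpy as np

r = np.arange(36).reshape(6, 6)   # same values as r in the cards

# np.where(condition, a, b): take 30 where r > 30, otherwise keep r's value.
capped = np.where(r > 30, 30, r)  # returns a new array; r is unchanged

print(capped[-1])    # [30 30 30 30 30 30]
print(r[-1])         # [30 31 32 33 34 35]

# np.minimum gives the same result for this particular cap.
print(np.array_equal(capped, np.minimum(r, 30)))   # True
```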
55
Q

r2[:] = 0

r2

A

Sets every value in this slice to zero ([:] selects all elements).

If r2 was created as a slice of r, the corresponding positions in r change too, because a slice is a view on r's data, not a copy. Use r.copy() when you need an independent array.
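A sketch of the view-versus-copy behavior, assuming r2 is the top-left 3x3 block of r (a hypothetical choice of slice; the card does not say which slice r2 is).

```python
import numpy as np

r = np.arange(36).reshape(6, 6)   # same values as r in the cards

r2 = r[:3, :3]          # hypothetical slice: basic slicing returns a view of r
r2[:] = 0               # zeroes the slice...
top_left_sum = r[:3, :3].sum()
print(top_left_sum)     # 0: ...and therefore the top-left block of r as well

r = np.arange(36).reshape(6, 6)
r_copy = r[:3, :3].copy()   # an independent copy of the same block
r_copy[:] = 0               # r is not affected
print(r.sum())              # 630, the untouched sum of 0..35
```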

56
Q

test = np.random.randint(0, 10, (4,3))

test

A

Create a new 4 by 3 array of random integers from 0 to 9 (the upper bound, 10, is excluded). Your values will differ from this sample:

array([[0, 8, 0],
[0, 5, 7],
[3, 7, 4],
[3, 4, 9]])

57
Q

test = np.array([[0, 8, 0],
                 [0, 5, 7],
                 [3, 7, 4],
                 [3, 4, 9]])

for row in test:
    print(row)

A

Iterate by row:

Iterating over a 2-D array yields one row (a 1-D array) at a time.

[0 8 0]
[0 5 7]
[3 7 4]
[3 4 9]

58
Q

test = np.array([[0, 8, 0],
                 [0, 5, 7],
                 [3, 7, 4],
                 [3, 4, 9]])

for i in range(len(test)):
    print(test[i])

A

Iterate by index:

[0 8 0]
[0 5 7]
[3 7 4]
[3 4 9]

59
Q

test = np.array([[0, 8, 0],
                 [0, 5, 7],
                 [3, 7, 4],
                 [3, 4, 9]])

for i, row in enumerate(test):
    print('row', i, 'is', row)

A

Iterate by row and index:

row 0 is [0 8 0]
row 1 is [0 5 7]
row 2 is [3 7 4]
row 3 is [3 4 9]

60
Q

test = np.array([[0, 8, 0],
                 [0, 5, 7],
                 [3, 7, 4],
                 [3, 4, 9]])

test2 = test**2
test2
# array([[ 0, 64,  0],
#        [ 0, 25, 49],
#        [ 9, 49, 16],
#        [ 9, 16, 81]])

for i, j in zip(test, test2):
    print(i, '+', j, '=', i+j)

A

Use zip to iterate over multiple iterables.

[0 8 0] + [ 0 64  0] = [ 0 72  0]
[0 5 7] + [ 0 25 49] = [ 0 30 56]
[3 7 4] + [ 9 49 16] = [12 56 20]
[3 4 9] + [ 9 16 81] = [12 20 90]
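zip pairs the i-th item of each iterable and stops at the shortest one. This sketch checks that the row-by-row zip loop gives the same result as adding the whole arrays at once, which NumPy does without an explicit loop.

```python
import numpy as np

test = np.array([[0, 8, 0],
                 [0, 5, 7],
                 [3, 7, 4],
                 [3, 4, 9]])
test2 = test**2

# zip pairs the i-th row of test with the i-th row of test2.
sums = [i + j for i, j in zip(test, test2)]

# Adding the whole arrays gives the same rows without an explicit loop.
print(np.array_equal(np.array(sums), test + test2))   # True

# zip stops at the shortest input.
print(list(zip([1, 2, 3], ['a', 'b'])))   # [(1, 'a'), (2, 'b')]
```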