Data Science Intro Flashcards

(60 cards)

1
Q

cylinders = set(d['cyl'] for d in mpg)

A

Use set to return the unique values for the number of cylinders the cars in our dataset have.

2
Q

sum(float(d['hwy']) for d in mpg) / len(mpg)

A

This is how to find the average hwy fuel economy across all cars.

3
Q

len(mpg), where mpg is the name of the list of dictionaries built from the csv file.

A

csv.DictReader has read in each row of our csv file as a dictionary. len shows that our list is made up of 234 dictionaries.

4
Q

import csv

%precision 2

with open('mpg.csv') as csvfile:
    mpg = list(csv.DictReader(csvfile))

mpg[:3] # The first three dictionaries in our list.

A

Reads the csv file and makes a list of dictionaries named mpg, one dictionary per row.
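The same pattern can be tried without the mpg.csv file. The sketch below is only a stand-in: it feeds csv.DictReader three made-up rows through io.StringIO (the sample values are hypothetical, not taken from the real dataset) and then applies the len, set, and average idioms from the earlier cards.

```python
import csv
import io

# Hypothetical sample rows standing in for mpg.csv (not the real data).
sample_csv = """manufacturer,cyl,cty,hwy,class
audi,4,18,29,compact
audi,6,16,26,compact
ford,8,11,17,suv
"""

# DictReader uses the header row as keys; every value is read as a string.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

num_rows = len(rows)                                      # one dict per data row
cylinders = set(d['cyl'] for d in rows)                   # unique cylinder counts
avg_hwy = sum(float(d['hwy']) for d in rows) / len(rows)  # mean highway mpg

print(num_rows, sorted(cylinders), avg_hwy)
```

Note that the values stay strings until they are converted, which is why the average needs float().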

5
Q

What will the output be?

sales_record = {
    'price': 3.24,
    'num_items': 4,
    'person': 'Chris'}

sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'

print(sales_statement.format(sales_record['person'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items']*sales_record['price']))

A

Chris bought 4 item(s) at a price of 3.24 each for a total of 12.96
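The total happens to print cleanly here, but floating-point products often do not. A minimal sketch, assuming two decimal places are wanted, uses named fields and format specifications; the field names in the template are chosen here for illustration.

```python
sales_record = {'price': 3.24, 'num_items': 4, 'person': 'Chris'}

# Named fields are filled from the dictionary via ** unpacking;
# :.2f rounds the displayed value to two decimal places.
template = ('{person} bought {num_items} item(s) at a price of '
            '{price:.2f} each for a total of {total:.2f}')

total = sales_record['num_items'] * sales_record['price']
statement = template.format(total=total, **sales_record)
print(statement)
```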

6
Q

x = ('Christopher', 'Brooks', 'brooksch@umich.edu')
fname, lname, email = x

print(fname)

A

Christopher

7
Q

Tuple format?

A

x = ("Hi", "Dave", 4)

8
Q

List format?

A

x = ["hi", 4, 2, "Dave"]
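The brackets are not the only difference between the two formats. A short sketch of the practical one: lists can be changed in place, tuples cannot. The variable names are illustrative.

```python
t = ("Hi", "Dave", 4)          # tuple: immutable
items = ["hi", 4, 2, "Dave"]   # list: mutable

items.append(3.5)              # lists can grow in place
items[0] = "hello"             # and items can be reassigned

try:
    t[0] = "Hello"             # tuples do not support item assignment
    tuple_was_changed = True
except TypeError:
    tuple_was_changed = False

print(items, t, tuple_was_changed)
```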

9
Q

x = {'Christopher Brooks': 'brooksch@umich.edu', 'Bill Gates': 'billg@microsoft.com'}

x['Christopher Brooks']

A

Retrieve a value by using the indexing operator

'brooksch@umich.edu'

10
Q

CtyMpgByCyl = []

for c in cylinders: # iterate over all the cylinder levels
    summpg = 0
    cyltypecount = 0
    for d in mpg: # iterate over all dictionaries
        if d['cyl'] == c: # if the cylinder level type matches,
            summpg += float(d['cty']) # add the cty mpg
            cyltypecount += 1 # increment the count
    CtyMpgByCyl.append((c, summpg / cyltypecount)) # append the tuple ('cylinder', 'avg mpg')

CtyMpgByCyl.sort(key=lambda x: x[0])
CtyMpgByCyl

A

Computes the average cty mpg for each number of cylinders.

The lambda sorts CtyMpgByCyl by the first element of each tuple (the cylinder count).

[('4', 21.01), ('5', 20.50), ('6', 16.22), ('8', 12.57)]
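The nested loop scans the whole dataset once per cylinder value. As a sketch of an alternative (not the approach the card uses), collections.defaultdict can collect the sums and counts in a single pass. The sample rows are hypothetical. The sort key converts to int because the cylinder values are strings, and as strings '10' would sort before '4'.

```python
from collections import defaultdict

# Hypothetical rows in the same shape csv.DictReader produces (all strings).
rows = [
    {'cyl': '4', 'cty': '21'},
    {'cyl': '8', 'cty': '12'},
    {'cyl': '4', 'cty': '19'},
    {'cyl': '10', 'cty': '10'},
]

totals = defaultdict(float)   # cyl -> running sum of cty mpg
counts = defaultdict(int)     # cyl -> number of cars seen

for d in rows:                # one pass over the data
    totals[d['cyl']] += float(d['cty'])
    counts[d['cyl']] += 1

avg_by_cyl = [(c, totals[c] / counts[c]) for c in totals]
avg_by_cyl.sort(key=lambda pair: int(pair[0]))   # numeric, not string, order
print(avg_by_cyl)
```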

11
Q

vehicleclass = set(d['class'] for d in mpg)

vehicleclass

A

Use set to return the unique vehicle classes in the dataset, one of each.

{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}

12
Q

HwyMpgByClass = []

for t in vehicleclass: # iterate over all the vehicle classes
    summpg = 0
    vclasscount = 0
    for d in mpg: # iterate over all dictionaries
        if d['class'] == t: # if the vehicle class matches,
            summpg += float(d['hwy']) # add the hwy mpg
            vclasscount += 1 # increment the count
    HwyMpgByClass.append((t, summpg / vclasscount)) # append the tuple ('class', 'avg mpg')

HwyMpgByClass.sort(key=lambda x: x[1])
HwyMpgByClass

A

Example of how to find the average hwy mpg for each class of vehicle in our dataset, sorted from lowest to highest average.

[('pickup', 16.88),
 ('suv', 18.13),
 ('minivan', 22.36),
 ('2seater', 24.80),
 ('midsize', 27.29),
 ('subcompact', 28.14),
 ('compact', 28.30)]
13
Q

import datetime as dt
import time as tm

tm.time()

A

time returns the current time in seconds since the Epoch (January 1st, 1970).

14
Q

import datetime as dt
import time as tm

dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow

A

Convert the timestamp to datetime.
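Because tm.time() changes on every call, the conversion is easier to check with fixed timestamps. A small sketch, assuming UTC is acceptable for the example (passing tz keeps the result independent of the machine's local time zone):

```python
import datetime as dt

# Timestamp 0 is the Epoch itself.
epoch = dt.datetime.fromtimestamp(0, tz=dt.timezone.utc)

# 86400 seconds later is exactly one day after the Epoch.
next_day = dt.datetime.fromtimestamp(86400, tz=dt.timezone.utc)

print(epoch.isoformat())
print(next_day - epoch)
```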

15
Q

dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second

A

Gets the year, month, day, hour, minute, and second from a datetime.

16
Q
delta = dt.timedelta(days = 100)
# create a timedelta of 100 days

delta

A

timedelta is a duration expressing the difference between two dates.

17
Q
delta = dt.timedelta(days = 100)
today = dt.date.today()

today - delta

A

Returns the date 100 days ago. The result depends on the day the code is run, for example:

datetime.date(2016, 8, 13)
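Since today() moves every day, the output shown cannot be reproduced later. A sketch with a fixed date instead: the date below is an assumption, chosen only because it gives the same result as the card.

```python
import datetime as dt

delta = dt.timedelta(days=100)
fixed_day = dt.date(2016, 11, 21)   # assumed date, chosen for illustration

earlier = fixed_day - delta         # date - timedelta gives a date
gap = fixed_day - earlier           # date - date gives a timedelta

print(earlier, gap.days)
```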

18
Q

today > today-delta

A

Dates can be compared with the usual comparison operators.

Returns True.

19
Q

store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]
cheapest = map(min, store1, store2)
cheapest

A

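map applies min to each pair of items from store1 and store2. In Python 3, cheapest is a lazy map object, not a list; list(cheapest) gives the lowest price for each item.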
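In Python 3, map returns an iterator that computes values only when asked. A short sketch of what that means in practice: the values appear when the iterator is consumed, and it can be consumed only once.

```python
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

cheapest = map(min, store1, store2)   # lazy: nothing is computed yet
first_pass = list(cheapest)           # consumes the iterator
second_pass = list(cheapest)          # already exhausted, so empty

print(first_pass, second_pass)
```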

20
Q

my_function = lambda a, b, c : a + b

my_function(1, 2, 3)

A

Here’s an example of lambda that takes in three parameters and adds the first two. The third parameter is ignored, so the call returns 3.

21
Q
my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
my_list
A

Builds a list of the even numbers from 0 to 998 by looping over the range and appending each number that is divisible by 2.

22
Q

my_list = [number for number in range(0,1000) if number % 2 == 0]
my_list

A

A list comprehension, which is a shorthand version of:

my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
my_list
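To check that the two forms really agree, the sketch below builds the list both ways and compares them. The third variant, range with a step, is an extra not shown on the cards.

```python
# Loop version
loop_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        loop_list.append(number)

# Comprehension version: [expression for item in iterable if condition]
comp_list = [number for number in range(0, 1000) if number % 2 == 0]

# range with a step of 2 produces the same values with no test at all
step_list = list(range(0, 1000, 2))

print(loop_list == comp_list == step_list, len(comp_list))
```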
23
Q

import numpy as np

m = np.array([[7, 8, 9], [10, 11, 12]]) # create a 2-D array with numpy

m.shape

A

Use the shape attribute to find the dimensions of the array. (rows, columns)

(2, 3)

24
Q

n = np.arange(0, 30, 2)

n

A

arange returns evenly spaced values within a given interval.

start at 0 count up by 2, stop before 30

array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

25
Q

n = np.arange(0, 30, 2)
n = n.reshape(3, 5)
n

A

reshape returns an array with the same data in a new shape. Here the 15 values are reshaped into a 3x5 array.

array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])
26
Q

o = np.linspace(0, 4, 9)
o

A

linspace returns evenly spaced numbers over a specified interval.

Return 9 evenly spaced values from 0 to 4.

array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])
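arange and linspace are easy to mix up. A minimal sketch of the difference, assuming numpy is installed: arange takes a step and stops before the end value, while linspace takes a count and includes the end value by default.

```python
import numpy as np

a = np.arange(0, 4, 0.5)     # step given, stop (4) excluded
b = np.linspace(0, 4, 9)     # count given, stop (4) included

print(a)
print(b)
```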
27
Q

o = np.linspace(0, 4, 9)
o.resize(3, 3)
o

A

resize changes the shape and size of the array in place.

array([[ 0. , 0.5, 1. ],
       [ 1.5, 2. , 2.5],
       [ 3. , 3.5, 4. ]])
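The difference between reshape and resize can be seen side by side. A minimal sketch, assuming numpy is installed; separate arrays are used for each call so the two do not interfere.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)      # returns a reshaped array; a keeps its shape

c = np.arange(6)
result = c.resize(2, 3)  # changes c itself and returns None

print(a.shape, b.shape, c.shape, result)
```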
28
Q

np.ones((3, 2))

A

ones returns a new array of given shape and type, filled with ones.

array([[ 1., 1.],
       [ 1., 1.],
       [ 1., 1.]])
29
Q

np.zeros((2, 3))

A

zeros returns a new array of given shape and type, filled with zeros.

array([[ 0., 0., 0.],
       [ 0., 0., 0.]])
30
Q

np.eye(3)

A

eye returns a 2-D array with ones on the diagonal and zeros elsewhere.

array([[ 1., 0., 0.],
       [ 0., 1., 0.],
       [ 0., 0., 1.]])
31
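Q

y = np.array([4, 5, 6])
np.diag(y)

A

diag extracts a diagonal or constructs a diagonal array. Given a 1-D array, it builds a 2-D array with those values on the diagonal.

array([[4, 0, 0],
       [0, 5, 0],
       [0, 0, 6]])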
32
Q

np.array([1, 2, 3] * 3)

A

Create an array using a repeating list (or see np.tile).

array([1, 2, 3, 1, 2, 3, 1, 2, 3])
33
Q

np.repeat([1, 2, 3], 3)

A

Repeat elements of an array using repeat.

array([1, 1, 1, 2, 2, 2, 3, 3, 3])
34
Q

p = np.array([[1, 1, 1], [1, 1, 1]])
np.vstack([p, 2*p])

A

Use vstack to stack arrays in sequence vertically (row wise).

array([[1, 1, 1],
       [1, 1, 1],
       [2, 2, 2],
       [2, 2, 2]])
35
Q

p = np.array([[1, 1, 1], [1, 1, 1]])
np.hstack([p, 2*p])

A

Use hstack to stack arrays in sequence horizontally (column wise).

array([[1, 1, 1, 2, 2, 2],
       [1, 1, 1, 2, 2, 2]])
36
Q

```
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(x + y)
print(x - y)
```

A

Element-wise addition and subtraction.

[5 7 9]
[-3 -3 -3]
37
Q

```
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(x * y)
print(x / y)
```

A

Element-wise multiplication and division.

[ 4 10 18]
[ 0.25 0.4 0.5 ]
38
Q

x = np.array([1, 2, 3])
print(x**2)

A

Raises every element to the power of 2.

[1 4 9]
39
Q

x.dot(y) # dot product 1*4 + 2*5 + 3*6

A

The dot product of x = [1 2 3] and y = [4 5 6]:

x[0]*y[0] + x[1]*y[1] + x[2]*y[2] = 1*4 + 2*5 + 3*6 = 32
40
Q

z = np.array([y, y**2])
print(len(z))

A

Prints the number of rows in the 2-D array, which is 2 here.
41
Q

z = np.array([[4, 5, 6], [16, 25, 36]])
z.shape # (2, 3)
z.T

A

Transposing swaps rows and columns, so the shape changes from (2, 3) to (3, 2).

array([[ 4, 16],
       [ 5, 25],
       [ 6, 36]])
42
Q

z.dtype

A

Use .dtype to see the data type of the elements in the array.

dtype('int64')
43
Q

# z starts off with an integer dtype
z = z.astype('f')
z.dtype

A

Use .astype to cast to a specific type.

dtype('float32')
44
Q

a.argmax()
a.argmin()

A

argmax and argmin return the index of the maximum and minimum values in the array.
45
Q

s = np.arange(13)**2
s

A

Creates an array of the squares of the integers 0 through 12.

array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])
46
Q

s = np.array([0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])
s[0], s[4], s[-1]

A

Use bracket notation to get the value at a specific index. Remember that indexing starts at 0.

(0, 16, 144)
47
Q

s = np.array([0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])
s[1:5]

A

Use : to indicate a range: array[start:stop]. Leaving start or stop empty will default to the beginning/end of the array.

array([ 1, 4, 9, 16])
48
Q

s = np.array([0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])
s[-4:]

A

Use negatives to count from the back.

array([ 81, 100, 121, 144])
49
Q

s = np.array([0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144])
s[-5::-2]

A

A second : can be used to indicate step-size: array[start:stop:stepsize]. Here we are starting at the 5th element from the end and counting backwards by 2 until the beginning of the array is reached.

array([64, 36, 16, 4, 0])
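One-dimensional slicing follows the same start:stop:step rules as built-in Python sequences, so the slicing cards can be practiced without numpy. A sketch using a plain list of the same squares (the variable names are illustrative):

```python
s = [n ** 2 for n in range(13)]   # plain list: 0, 1, 4, ..., 144

middle = s[1:5]            # start at index 1, stop before index 5
last_four = s[-4:]         # negative start counts from the back
backwards = s[-5::-2]      # from 5th-from-last, stepping back by 2
reversed_head = s[::-1][:3]  # reversed copy, then its first three items

print(middle, last_four, backwards, reversed_head)
```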
50
Q

```
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])
r[3, 3:6]
```

A

Use : to select a range of rows or columns. Here we select columns 3 to 5 of row 3.

array([21, 22, 23])
51
Q

```
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])
r[:2, :-1]
```

A

Here we are selecting all the rows up to (and not including) row 2, and all the columns up to (and not including) the last column.

array([[ 0, 1, 2, 3, 4],
       [ 6, 7, 8, 9, 10]])
52
Q

```
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])
r[-1, ::2]
```

A

This is a slice of the last row, and only every other element.

array([30, 32, 34])
53
Q

```
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])
r[r > 30]
```

A

We can also perform conditional indexing. Here we are selecting values from the array that are greater than 30. (Also see np.where)

array([31, 32, 33, 34, 35])
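Conditional indexing works in two steps: the comparison builds a boolean array, and that array then picks the values. A minimal sketch, assuming numpy is installed, that also shows np.where as a way to replace values without modifying the original array:

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

mask = r > 30                    # boolean array, same shape as r
selected = r[mask]               # 1-D array of the values where mask is True
capped = np.where(mask, 30, r)   # new array: 30 where mask is True, else r

print(mask.sum(), selected, capped.max(), r.max())
```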
54
Q

```
r = np.array([[ 0,  1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10, 11],
              [12, 13, 14, 15, 16, 17],
              [18, 19, 20, 21, 22, 23],
              [24, 25, 26, 27, 28, 29],
              [30, 31, 32, 33, 34, 35]])
r[r > 30] = 30
r
```

A

Here we are assigning all values in the array that are greater than 30 to the value of 30.

```
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])
```
55
Q

r2[:] = 0
r2

A

Sets every value in r2 to zero ([:] selects the entire array). If r2 was created by slicing r, it is a view of r's data, so r will also change in the sliced positions. To keep r unchanged, make a copy with .copy() before modifying.
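The view-versus-copy behaviour can be checked directly. In this sketch the 3x3 corner slice and the variable names view and r_copy are assumptions made for illustration; the card does not say how r2 was created.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

view = r[:3, :3]          # a slice is a view onto r's data
view[:] = 0               # so this also zeroes r's top-left 3x3 block

r_copy = r.copy()         # independent copy
r_copy[:] = 10            # r is not affected

print(r[0, 0], r[0, 3], r[5, 5])
```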
56
Q

test = np.random.randint(0, 10, (4,3))
test

A

Create a new 4 by 3 array of random integers from 0 to 9. The values differ on every run, for example:

array([[0, 8, 0],
       [0, 5, 7],
       [3, 7, 4],
       [3, 4, 9]])
57
Q

test = np.array([[0, 8, 0], [0, 5, 7], [3, 7, 4], [3, 4, 9]])
for row in test:
    print(row)

A

Iterate by row: looping over a 2-D array yields one row at a time.

[0 8 0]
[0 5 7]
[3 7 4]
[3 4 9]
58
Q

test = np.array([[0, 8, 0], [0, 5, 7], [3, 7, 4], [3, 4, 9]])
for i in range(len(test)):
    print(test[i])

A

Iterate by index:

[0 8 0]
[0 5 7]
[3 7 4]
[3 4 9]
59
Q

test = np.array([[0, 8, 0], [0, 5, 7], [3, 7, 4], [3, 4, 9]])
for i, row in enumerate(test):
    print('row', i, 'is', row)

A

Iterate by row and index:

row 0 is [0 8 0]
row 1 is [0 5 7]
row 2 is [3 7 4]
row 3 is [3 4 9]
60
Q

```
test = np.array([[0, 8, 0], [0, 5, 7], [3, 7, 4], [3, 4, 9]])
test2 = test**2
for i, j in zip(test, test2):
    print(i, '+', j, '=', i+j)
```

A

Use zip to iterate over multiple iterables. Here test2 holds the squares of test, so each row of test is paired with the matching row of test2.

[0 8 0] + [ 0 64  0] = [ 0 72  0]
[0 5 7] + [ 0 25 49] = [ 0 30 56]
[3 7 4] + [ 9 49 16] = [12 56 20]
[3 4 9] + [ 9 16 81] = [12 20 90]
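zip is a built-in and works on any iterables, not only array rows. A short sketch with plain lists (the names and values are made up for illustration) showing that zip stops at the shorter input and combines naturally with enumerate:

```python
names = ['a', 'b', 'c']
values = [1, 2]

pairs = list(zip(names, values))   # stops at the shorter input
print(pairs)

indexed = []
for i, (name, value) in enumerate(zip(names, values)):
    indexed.append((i, name, value))   # index plus one item from each list
print(indexed)
```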