Midterm 1 Flashcards

1
Q

Data Science Lifecycle Step 1

A

Frame the problem

2
Q

Data Science Lifecycle Step 2

A

Collect raw data

3
Q

Data Science Lifecycle Step 3

A

Process the data

4
Q

Data Science Lifecycle Step 4

A

Explore the data

5
Q

Data Science Lifecycle Step 5

A

Perform in-depth analysis

6
Q

Data Science Lifecycle Step 6

A

Communicate results

7
Q

Association

A

any relation or link

8
Q

Causality

A

One thing causes the other

9
Q

Categorical

A

Each value is from a fixed inventory

10
Q

Numerical

A

Each value is a number (not a code)

11
Q

Values

A

can be numerical or categorical, with many subtypes within each

12
Q

Distribution

A

For each different value of the variable, the frequency of individuals that have that value
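A minimal sketch of a distribution as a table of frequencies, using Python's standard-library `collections.Counter`. The eye-color list is hypothetical illustration data, not from the course.

```python
from collections import Counter

# Hypothetical categorical variable: one value per individual
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# For each distinct value, count how many individuals have that value
distribution = Counter(eye_colors)

print(distribution["brown"])  # 3
print(distribution["green"])  # 1
```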

13
Q

Randomize

A

If you assign individuals to treatment and control at random, then the two groups are likely to be similar apart from the treatment
Random ≠ haphazard (in probability theory), regardless of what the dictionary says
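One way to sketch random assignment, assuming individuals are identified by a list of IDs and using Python's `random` module; `random_assignment` is a hypothetical helper, not a course function. Shuffling and splitting gives every individual the same chance of landing in either group, which is what distinguishes this from a haphazard choice.

```python
import random

def random_assignment(individuals, seed=None):
    """Hypothetical helper: randomly split individuals into treatment and control."""
    rng = random.Random(seed)       # seeded generator so results can be reproduced
    shuffled = list(individuals)    # copy, so the caller's sequence is unchanged
    rng.shuffle(shuffled)           # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

treatment, control = random_assignment(range(10), seed=0)
print("treatment:", treatment)
print("control:  ", control)
```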

14
Q

Assignment statements

A

Statements don't have a value; they perform an action (an assignment binds a name to a value)
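A short Python illustration: the expressions on the right of `=` have values, while the assignment itself only stores a value under a name. The names `x`, `y`, and `z` are arbitrary.

```python
x = 3 * 4        # the expression 3 * 4 has the value 12; the statement binds x to it
y = x + 1        # x + 1 evaluates to 13, which is stored in y
print(x, y)      # 12 13

# An assignment cannot be used where a value is expected
try:
    compile("(z = 5)", "<example>", "eval")
    assignment_is_expression = True
except SyntaxError:
    assignment_is_expression = False
print(assignment_is_expression)  # False
```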

15
Q

f(27)

A

f is the function to call; 27 is the argument passed to the function. Read it as "call f on 27"
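A small example of a call expression. The body of `f` is a hypothetical choice (doubling its input) made only so the call has something to compute; built-in functions such as `abs` and `max` are called the same way.

```python
def f(n):
    """Hypothetical function for illustration: doubles its argument."""
    return 2 * n

result = f(27)        # call f on the argument 27; the call has a value
print(result)         # 54

print(abs(-27))       # 27
print(max(2, 27, 5))  # 27: a call can take several arguments
```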

16
Q

t.select(label)

A

constructs a new table with just the specified columns

17
Q

t.drop(label)

A

constructs a new table in which the specified columns are omitted

18
Q

t.sort(label)

A

constructs a new table with rows sorted by the specified column

19
Q

t.where(label, condition)

A

constructs a new table with just the rows whose value in the labeled column satisfies the condition
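The four Table methods above all return a new table and leave the original alone. This plain-Python stand-in is NOT the datascience library's API: it assumes a table is a dict mapping column labels to lists, and the helper functions are hypothetical, written only to mirror what `select`, `drop`, `sort`, and `where` produce. (In datascience itself the condition is usually written with `are`, such as `are.above(75)`.)

```python
# Hypothetical stand-in: a "table" is a dict of column label -> list of values.

def select(table, *labels):
    """New table with just the specified columns."""
    return {label: list(table[label]) for label in labels}

def drop(table, *labels):
    """New table in which the specified columns are omitted."""
    return {label: list(col) for label, col in table.items() if label not in labels}

def sort(table, label, descending=False):
    """New table with rows reordered by the values in one column."""
    order = sorted(range(len(table[label])), key=lambda i: table[label][i], reverse=descending)
    return {lab: [col[i] for i in order] for lab, col in table.items()}

def where(table, label, condition):
    """New table with just the rows whose value in `label` satisfies `condition`."""
    keep = [i for i, value in enumerate(table[label]) if condition(value)]
    return {lab: [col[i] for i in keep] for lab, col in table.items()}

scores = {"name": ["a", "b", "c"], "score": [70, 95, 82]}
print(where(scores, "score", lambda s: s > 75))  # {'name': ['b', 'c'], 'score': [95, 82]}
print(sort(scores, "score", descending=True)["name"])  # ['b', 'c', 'a']
print(scores["name"])  # original unchanged: ['a', 'b', 'c']
```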

20
Q

Integers

A

an integer of any size
an int never has a decimal point

21
Q

Floats

A

has an optional fractional part
always has a decimal point
may use scientific notation
they have limited size (but the limit is huge)
they have limited precision of about 15-16 significant digits
after arithmetic the final few decimal places can be wrong
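A quick Python check of the int and float facts above. The specific numbers are arbitrary; `0.1 + 0.2` is a standard example of the rounding error in the last few decimal places.

```python
# ints: any size, never a decimal point
big = 2 ** 100
print(type(big))          # <class 'int'>

# floats: written with a decimal point or in scientific notation
print(type(7.0))          # <class 'float'>
print(2.5e3)              # 2500.0

# floats have limited size, but the limit is huge
print(2.0 ** 1000)        # still representable (about 1.07e301)
try:
    2.0 ** 1100
    overflowed = False
except OverflowError:
    overflowed = True     # too large for a float
print(overflowed)         # True

# limited precision: the last few decimal places can be wrong after arithmetic
print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False
```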

22
Q

string

A

a sequence of characters of any length, e.g. 'A'
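A few string operations in Python showing that a string is an ordered sequence of characters of any length, including length 0. The variable names are arbitrary.

```python
letter = 'A'
word = "hello"
empty = ""                                   # a string can have length 0

print(len(letter), len(word), len(empty))    # 1 5 0
print(word + " " + letter)                   # hello A
print(word[0])                               # h: characters keep their order
```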

23
Q

Arrays

A

A collection of values
sequence of values of the same type (Arrays -> Columns: each column of a table is an array)
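A sketch with NumPy arrays, assuming `numpy` is installed as `np` as in the course. Arithmetic applies to every element at once, and mixing ints with floats converts everything to one shared type.

```python
import numpy as np

heights_m = np.array([1.5, 1.8, 1.6])
heights_cm = heights_m * 100      # arithmetic applies element by element
print(heights_cm)

mixed = np.array([1, 2.5])
print(mixed.dtype)                # float64: the int 1 was converted to a float
```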

24
Q

Ranges

A

A range is an array of consecutive (or evenly spaced) numbers

25
np.arange(end)
An array of increasing integers from 0 up to end
26
np.arange(start,end)
An array of increasing integers from start up to end
27
np.arange(start,end,step)
A range with step between consecutive values
28
NOTE: The range always includes start but excludes end
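Some `np.arange` calls showing that start is included and end is excluded, assuming `numpy` is available as `np`. The negative step in the last line is an extra case not covered by the cards.

```python
import numpy as np

print(np.arange(5))          # 0, 1, 2, 3, 4
print(np.arange(3, 9))       # 3, 4, 5, 6, 7, 8
print(np.arange(3, 30, 5))   # 3, 8, 13, 18, 23, 28
print(np.arange(10, 0, -3))  # 10, 7, 4, 1 (a negative step counts down)
```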
29
Table.read_table(filename)
reads a table from a spreadsheet
30
Numerical Attribute types
Each value is from a numerical scale
Numerical measurements are ordered
Differences are meaningful
31
Categorical Attribute types
Each value is from a fixed inventory
May or may not have an ordering
Categories are the same or different
32
Use line plots for sequential data if
Your x-axis has an order
Sequential differences in y-values are meaningful
There's only one y-value for each x-value
Usually the x-axis is time or distance
33
Use scatter plots for non-sequential data
When you're looking for associations
34
Binning
counting the number of numerical values that lie within ranges, called bins
35
Bins
defined by their lower bounds (inclusive)
The upper bound is the lower bound of the next bin
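A plain-Python sketch of binning with inclusive lower bounds, where each bin ends at the next bin's lower bound. `bin_counts` is a hypothetical helper and the ages are made-up values; this sketch simply ignores values at or beyond the last edge. NumPy's `np.histogram` differs slightly: its last bin also includes its upper edge.

```python
def bin_counts(values, bin_starts):
    """Hypothetical helper: count values in each bin [lower, next lower).

    The last entry of bin_starts only closes the final bin; values outside
    [bin_starts[0], bin_starts[-1]) are not counted in this sketch.
    """
    counts = []
    for lower, upper in zip(bin_starts, bin_starts[1:]):
        counts.append(sum(1 for v in values if lower <= v < upper))
    return counts

ages = [21, 25, 30, 30, 34, 40, 45]
print(bin_counts(ages, [20, 30, 40, 50]))  # [2, 3, 2]: 30 lands in [30, 40)
```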
36
Histogram
Chart that displays the distribution of a numerical variable (attribute)
Uses bins; there is one bar corresponding to each bin
Uses the area principle: the area of each bar is the percent of individuals in the corresponding bin
37
Height formula
Height = (% in bin) / (width of bin), in percent per unit of the horizontal axis
38
Area of bar formula
% in bin = Height x width of bin
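A worked example of the two formulas with hypothetical bins of unequal width. Dividing by width gives the height, and multiplying back recovers the percent, so the bar areas add up to 100%. The tallest bar belongs to the narrow 30% bin rather than the 50% bin.

```python
# Hypothetical bins: (lower bound, upper bound, percent of individuals in the bin)
bins = [(0, 10, 20.0), (10, 30, 50.0), (30, 35, 30.0)]

heights = []
areas = []
for lower, upper, percent in bins:
    width = upper - lower
    height = percent / width   # height = % in bin / width of bin
    area = height * width      # % in bin = height x width of bin
    heights.append(height)
    areas.append(area)
    print(f"[{lower}, {upper}): height {height} % per unit, area {area} %")

print(max(heights))  # 6.0: the narrow 30% bin has the tallest bar
print(sum(areas))    # 100.0
```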
39
Scatter plot
relation between numerical variables
40
Line graph
sequential data (over time)
41
Bar chart
distribution of categorical data
42
Histogram
distribution of numerical data
43
Grouped Table
One combo of grouping variables per row
Any number of grouping variables
Aggregate values of all other columns in table
Missing combos absent
44
Pivot table
One combo of grouping variables per entry
Two grouping variables: columns and rows
Aggregate values of values column
Missing combos = 0 (or empty string)
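A plain-Python sketch contrasting the two layouts, using `collections.Counter` on hypothetical (color, size) rows. It is not the datascience `group` or `pivot` method; it only mimics their shapes. The grouped form lists only combos that occur, while the pivot grid has an entry for every combo and fills missing ones with 0.

```python
from collections import Counter

# Hypothetical rows: (color, size) for each individual
rows = [("red", "S"), ("red", "L"), ("blue", "S"), ("red", "S")]

# Grouped-table style: one row per combo that actually occurs
grouped = Counter(rows)
for (color, size), count in sorted(grouped.items()):
    print(color, size, count)
# blue S 1
# red L 1
# red S 2

# Pivot-table style: rows = color, columns = size; missing combos become 0
colors = sorted({c for c, _ in rows})
sizes = sorted({s for _, s in rows})
grid = {c: {s: grouped.get((c, s), 0) for s in sizes} for c in colors}
print(grid)  # {'blue': {'L': 0, 'S': 1}, 'red': {'L': 1, 'S': 2}}
```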
45
Probability
Lowest value: 0 (chance of an event that is impossible)
Highest value: 1 (or 100%) (chance of an event that is certain)
Complement: if an event has chance 70%, then the chance that it doesn't happen is 100% - 70% = 30%, i.e. 1 - 0.7 = 0.3
46
Equally likely outcomes
P(A) = (number of outcomes that make A happen) / (total number of outcomes)
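A sketch of the equally-likely-outcomes formula and the complement rule for one roll of a fair die, using `fractions.Fraction` to keep the answers exact. `chance` is a hypothetical helper name.

```python
from fractions import Fraction

die = [1, 2, 3, 4, 5, 6]   # six equally likely outcomes

def chance(event, outcomes):
    """Hypothetical helper: P(event) = favorable outcomes / total outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

p_even = chance(lambda x: x % 2 == 0, die)
print(p_even)              # 1/2

p_six = chance(lambda x: x == 6, die)
print(1 - p_six)           # 5/6: complement, chance of not rolling a six
```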
47
Multiplication Rule
Chance that two events A and B both happen:
P(A and B) = P(A happens) x P(B happens given that A has happened)
48
Addition Rule
If event A can happen in exactly one of two ways, then P(A) = P(first way) + P(second way)
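Two worked examples with exact fractions. For the multiplication rule, drawing two cards without replacement from a standard 52-card deck shows the "given that A has happened" part: after one ace is drawn, 3 aces remain among 51 cards. For the addition rule, exactly one head in two fair coin tosses happens in two ways (HT or TH) that cannot both occur.

```python
from fractions import Fraction

# Multiplication rule: both cards are aces
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)        # 1/221

# Addition rule: exactly one head in two tosses = P(HT) + P(TH)
p_ht = Fraction(1, 2) * Fraction(1, 2)
p_th = Fraction(1, 2) * Fraction(1, 2)
p_one_head = p_ht + p_th
print(p_one_head)         # 1/2
```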
49
Pivot
Cross-classifies according to two categorical variables
Produces a grid of counts or aggregated values
Two required arguments:
  First (A): variable that forms the column labels of the grid
  Second (B): variable that forms the row labels of the grid
Two optional arguments (include both or neither):
  values = column_label_to_aggregate
  collect = function_to_aggregate_with
50
Lists
sequence of values that can be of different types (Lists -> Rows)
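A Python list holding one hypothetical row of mixed-type values, which is why lists (unlike arrays) fit table rows.

```python
row = ["Alice", 31, 1.68, True]   # hypothetical row: str, int, float, bool
print(len(row))                           # 4
print([type(v).__name__ for v in row])    # ['str', 'int', 'float', 'bool']
```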
51
Groups
collect rows by some column