Exam 1 Concepts Review Flashcards

(20 cards)

1
Q

What are the two main approaches of statistics?

A

Frequentist: treats probability as the long-run frequency of an outcome over many repeated trials, and uses data to test hypotheses without incorporating prior beliefs.
Bayesian: incorporates prior beliefs and knowledge into decision making, and updates those beliefs as new data comes in.
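
The contrast can be sketched with a coin-flip example. This is a minimal illustration, not part of the original card: the data (7 heads in 10 flips) and the Beta(2, 2) prior are assumptions chosen only to show the difference.

```python
# Hypothetical data: 7 heads observed in 10 coin flips.
heads, flips = 7, 10

# Frequentist estimate: the observed relative frequency of heads.
frequentist_estimate = heads / flips

# Bayesian estimate: start from a prior belief and update it with the data.
# Assumed prior: Beta(2, 2), a mild belief that the coin is roughly fair.
prior_alpha, prior_beta = 2, 2
post_alpha = prior_alpha + heads            # prior "heads" weight plus observed heads
post_beta = prior_beta + (flips - heads)    # prior "tails" weight plus observed tails
bayesian_estimate = post_alpha / (post_alpha + post_beta)  # posterior mean

print(frequentist_estimate)
print(round(bayesian_estimate, 3))
```

With more data, the prior matters less and the two estimates move closer together.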

2
Q

What are the three subfields of data science?

A

Statistics
Machine learning
Data mining/analytics

3
Q

What are the five sources of raw data?

A

Public data
Data from an existing product
Human-in-the-loop
Brute force
Buying data

4
Q

What are three ways to prepare data?

A
  1. Filtering impurities out of the data.
  2. Merging data.
  3. Labeling data.
5
Q

Name three ways to speed up data labeling.

A

External annotation services, internal annotation teams,
and tools such as supervised prediction or active learning.

6
Q

Define the purpose of each of the following plot types:
1. Histogram
2. Bar chart
3. Pie chart
4. Scatter plot

A
  1. Shows the distribution of one quantitative, typically continuous, variable. Each bar covers a range of values, and its height is the frequency of values in that range.
  2. Shows the frequency (count) of each discrete category in one field.
  3. Shows the relative frequency of each discrete category in one field, as a share of the whole.
  4. Shows the relationship between two variables, for example to look for a correlation between them.
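
The numbers behind the first three plot types can be sketched without a plotting library. The data and the bin width below are made up for illustration; the scatter plot is left out because it has no summary counts, only raw pairs of values.

```python
from collections import Counter

# Hypothetical data, made up for illustration.
heights_cm = [152, 158, 161, 163, 167, 170, 171, 174, 179, 186]  # quantitative
majors = ["bio", "cs", "cs", "math", "bio", "cs"]                # categorical

# Histogram: each bar covers a range of values; its height is the frequency.
bin_width = 10
histogram = Counter((h // bin_width) * bin_width for h in heights_cm)
# Keys are the lower edge of each bin, e.g. 150 stands for the range 150-159.

# Bar chart: one bar per category; its height is the count.
bar_counts = Counter(majors)

# Pie chart: each slice is a category's share of the whole.
pie_shares = {major: n / len(majors) for major, n in bar_counts.items()}

print(dict(histogram))
print(dict(bar_counts))
print(pie_shares)
```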
7
Q

What are bar charts used for?

A

They allow easy comparison between categories and show trends and patterns among those categories.

8
Q

Define Relative Frequency:

A

The number of times an outcome occurs divided by the total number of observations.
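
As a small worked example, assuming eight made-up die rolls:

```python
from collections import Counter

# Hypothetical outcomes of 8 die rolls, made up for illustration.
rolls = [3, 6, 3, 1, 3, 6, 2, 3]

counts = Counter(rolls)
# Relative frequency = count of an outcome / total number of observations.
relative_frequency = {outcome: n / len(rolls) for outcome, n in counts.items()}

print(relative_frequency[3])  # 4 of the 8 rolls were a 3
```

The relative frequencies of all outcomes always add up to 1.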

9
Q

What does each of the following spreadsheet functions do?
  1. =AVERAGE(range)
  2. =SUM(range)
  3. =MIN(range)
  4. =MAX(range)
  5. =COUNT(range)
A
  1. Calculates the mean (average) of the numbers in the range.
  2. Adds up all the numbers in the range.
  3. Returns the smallest number in the range.
  4. Returns the largest number in the range.
  5. Counts how many numeric values are in the range.
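
For comparison, a rough Python analogue of the five functions. The cell values are made up, and the list `cells` is only a stand-in for a spreadsheet range; filtering to numbers mirrors how these functions skip text and empty cells in a range.

```python
from statistics import mean

# Hypothetical cell values for a range (made up for illustration).
# The text cell and the empty cell (None) stand in for non-numeric cells.
cells = [4, 9, "n/a", 2, None, 5]

# Keep only the numeric cells, as the spreadsheet functions do for a range.
numbers = [c for c in cells if isinstance(c, (int, float))]

average = mean(numbers)   # like =AVERAGE(range)
total = sum(numbers)      # like =SUM(range)
smallest = min(numbers)   # like =MIN(range)
largest = max(numbers)    # like =MAX(range)
count = len(numbers)      # like =COUNT(range)

print(average, total, smallest, largest, count)
```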
10
Q

Define the following spreadsheet functions:
1. AVERAGEIF
2. COUNTIF
3. COUNTIFS

A
  1. Calculates the average of the cells in a range that meet a single condition.
  2. Counts the cells in a range that meet a single condition.
  3. Counts the cells that meet multiple conditions at once.
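
A rough Python analogue of the three conditional functions. The rows and the range names in the comments (`dept_range`, `region_range`, `score_range`) are hypothetical placeholders, not real cell references.

```python
from statistics import mean

# Hypothetical rows (made up for illustration): one dict per spreadsheet row.
rows = [
    {"dept": "sales", "region": "east", "score": 80},
    {"dept": "sales", "region": "west", "score": 90},
    {"dept": "ops",   "region": "east", "score": 70},
    {"dept": "sales", "region": "east", "score": 100},
]

# Like =AVERAGEIF(dept_range, "sales", score_range): average under one condition.
sales_average = mean(r["score"] for r in rows if r["dept"] == "sales")

# Like =COUNTIF(dept_range, "sales"): count cells that meet one condition.
sales_count = sum(1 for r in rows if r["dept"] == "sales")

# Like =COUNTIFS(dept_range, "sales", region_range, "east"): all conditions must hold.
sales_east_count = sum(
    1 for r in rows if r["dept"] == "sales" and r["region"] == "east"
)

print(sales_average, sales_count, sales_east_count)
```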
11
Q

What is a confounding variable?

A

An outside variable that influences both the independent and the dependent variable, which makes it hard to attribute the results to the treatment being tested.

12
Q

Define the following terms:
1. Randomizing:
2. Blind RCT:
3. Double Blind:
4. Sampling Bias:

A
  1. Randomly assigning individuals to either a control group or a treatment group, so that the two groups are likely to be similar except for the treatment.
  2. A randomized controlled trial in which the participants don’t know whether they are in the control or treatment group.
  3. When neither the researchers nor the participants know who is in which group.
  4. When the data used to form a conclusion is not representative of the entire population.
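
Random assignment can be sketched as a shuffle followed by a split. The function name `randomize`, the participant IDs, and the seed are assumptions made for illustration only.

```python
import random

def randomize(participants, seed=None):
    """Randomly split participants into a treatment and a control group."""
    rng = random.Random(seed)      # the seed only makes this sketch repeatable
    shuffled = list(participants)  # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs 1..20, made up for illustration.
treatment, control = randomize(range(1, 21), seed=0)
print(len(treatment), len(control))
```

Because chance alone decides who lands in which group, any confounding variable is likely to be spread evenly across both groups.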

13
Q

What theory did John Snow challenge in his cholera experiment?

A

Miasma theory (belief that disease was caused by bad air).

14
Q

What did John Snow believe was the cause of cholera?

A

Contaminated food and water.

15
Q

What did John Snow create to investigate the cholera outbreak?

A

A map of London showing the locations of cholera cases.

16
Q

What was significant about the Broad Street pump in Snow’s investigation?

A

It was the location where most cholera cases were concentrated.

17
Q

What were the two groups in Snow’s second experiment?

A

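Households supplied by the Southwark and Vauxhall (S&V) water company (treatment group) and households supplied by the Lambeth water company (control group).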

18
Q

Why did the experiment work with the treatment and control groups?

A

Because the populations in both areas were similar apart from the water source.

19
Q

What was the outcome after the Broad Street pump handle was removed?

A

The number of cholera cases decreased significantly.

20
Q

How did John Snow upgrade his study to establish a causal connection between cholera and contaminated water?

A

John Snow compared two groups: households supplied with contaminated water by S&V (treatment group) and households supplied with cleaner water by Lambeth (control group). Because the two groups were similar apart from their water source, the higher cholera rate in the treatment group helped establish the water source as the cause of the disease.
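
The comparison between the two water suppliers (S&V and Lambeth, as in card 17) comes down to a rate calculation. The house and death counts below are the figures commonly reported from Snow's table for the 1854 epidemic; they are not in this deck, so treat them as an assumption and check them against the course materials.

```python
# Figures as commonly reported from Snow's table (an assumption, see above).
houses = {"S&V": 40046, "Lambeth": 26107}
deaths = {"S&V": 1263, "Lambeth": 98}

# Deaths per 10,000 houses puts the two suppliers on a comparable scale.
rate_per_10k = {name: deaths[name] / houses[name] * 10_000 for name in houses}

# How many times higher the rate was for S&V households.
ratio = rate_per_10k["S&V"] / rate_per_10k["Lambeth"]

for name, rate in rate_per_10k.items():
    print(f"{name}: {rate:.1f} deaths per 10,000 houses")
print(f"ratio: {ratio:.1f}")
```

Using a rate rather than a raw count matters here because the two companies supplied different numbers of houses.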