Exam 1 Real Flashcards

1
Q

What is statistics?

A

Study of methods for measuring aspects of populations from samples and for quantifying the uncertainty of the measurements

2
Q

What is a population versus a sample?

A

A population is all of the individual units of interest and a sample is a subset of the population

3
Q

What are variables?

A

Characteristics that differ among individuals

4
Q

What is a parameter?

A

A quantity describing a population; it is the real (true) value, which is usually unknown

5
Q

What is an estimate or statistic?

A

A related quantity calculated from a sample (a subset of the population)

6
Q

What does the error of an estimate or statistic depend on?

A

Depends on the variability within the population

7
Q

What is estimation?

A

The process of inferring an unknown quantity of a population using sample data

8
Q

What is a random sample?

A

In a random sample each member of the population has an equal and independent chance of being selected

9
Q

What do random samples achieve?

A

They minimize bias and make it possible to measure the amount of sampling error

10
Q

What is a sample of convenience?

A

A collection of individuals that are easily available to the researcher

11
Q

What is the parameter?

A

The truth

12
Q

What is sampling error?

A

The difference between an estimate and the population parameter that is caused by chance

13
Q

What is bias?

A
  • Bias is a systematic discrepancy between the estimates we would obtain if we could sample a population again and again and the true population characteristic
  • The error would be in the same direction each time the sampling was repeated
14
Q

What is volunteer bias?

A

Bias resulting from systematic differences between the pool of volunteers and the population to which they belong

15
Q

What does accurate mean?

A

The closer the statistic or estimate is to the truth, the more accurate it is

16
Q

What does precise mean?

A

Precision describes how repeatable an estimate is; high precision can result from low variability in the population

17
Q

Data can be ________ or ________

A

Categorical or numerical

18
Q

Categorical data can be ________ or ________

A

Nominal - no inherent order
Ordinal - inherent order

19
Q

Numerical data can be ________ or ______

A

Continuous - any real number
Discrete - indivisible units (e.g., number of children)

20
Q

What is a frequency distribution?

A

The number of times each value of a variable occurs in a sample
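A minimal Python sketch of a frequency distribution, assuming a made-up sample of a categorical variable (the values in `eye_colors` are hypothetical, not from the course):

```python
from collections import Counter

# Hypothetical sample of a categorical variable (made-up values).
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Frequency distribution: how many times each value occurs in the sample.
frequency = Counter(eye_colors)

# Relative frequency: each count divided by the sample size.
relative_frequency = {
    value: count / len(eye_colors) for value, count in frequency.items()
}

for value, count in frequency.most_common():
    print(value, count, round(relative_frequency[value], 2))
```

The relative frequencies always sum to 1, which is what a bar graph of proportions would display.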

21
Q

What are two types of studies?

A

Experimental and observational

22
Q

What are two types of variables?

A

Explanatory and response variables

23
Q

How are variables graphed?

A

Explanatory variable on the x axis and response variable on the y axis

24
Q

What is a lurking/confounding variable?

A

A variable that masks or distorts the causal relationship between measured variables in a study

25
What are 3 problems with 3D bar graphs?
Takes the average, difficult to make comparisons because of the way data is displayed, magnitudes are distorted making the differences out of proportion
26
What is good about graphs?
Good when you want to show trends or patterns in values
27
When are tables good?
When you want to report/compare specific values with precision
28
What is a bar graph used for?
Uses the height of rectangular bars to display the frequency distribution of a categorical variable
29
What is a grouped bar graph?
Uses the height of rectangular bars to display the frequency distributions of two or more categorical variables
30
Which is better a bar graph or a pie chart?
The textbook prefers a bar graph to a pie chart; use a pie chart only when there are just two categories
31
What is a histogram?
Like a bar graph, but the x axis shows a numerical variable (split into bins) rather than categories
32
Describe the aspects of a histogram shape.
  • The mode is the highest peak in the frequency distribution
  • Skew refers to asymmetry in the shape
  • An outlier is an observation far from the rest of the data
33
What is a plot with area of rectangles?
mosaic plot
34
What does a mosaic plot display?
Uses the area of rectangles to display the relative frequency of occurrence of two categorical variables
35
What is a scatter plot?
graphical display of two numerical variables, each observation a point on a graph of two axes
36
What is a strip plot?
a graphical display of a numerical variable and a categorical variable in which each observation is represented as a dot
37
What is useful about a strip plot?
Gives a good idea of sample size
38
What is a box plot?
a graph that uses lines and a rectangle box to display the median, quartiles, range, and extreme measurements of the data
39
What is a violin plot?
a graph that shows an approximation of the frequency distribution of a numerical variable in each group and its mirror image
40
What does the width of a violin plot indicate?
The distribution of the data: the width is proportional to the density of data points
41
What is a good tip for multiple histograms?
  • Better to stack them vertically rather than side by side because it is easier to compare groups
  • Use the same scale for the x axis
42
What is interquartile range?
upper quartile - lower quartile
43
What describes the spread of a distribution?
standard deviation and variance
44
What is deviation?
difference between a data point and the mean
45
What is the sum of squares?
the sum of the squared deviations
46
What is variance?
the standard deviation squared; for a sample, the sum of squares divided by n - 1
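A small Python sketch that chains deviation, sum of squares, variance, and standard deviation together. The observations are made up, and dividing by n - 1 is the usual sample convention, assumed here rather than stated on the cards:

```python
import math

# Hypothetical observations (made-up values).
observations = [2.0, 4.0, 4.0, 6.0, 9.0]
n = len(observations)

mean = sum(observations) / n

# Deviation: difference between each data point and the mean.
deviations = [y - mean for y in observations]

# Sum of squares: the sum of the squared deviations.
sum_of_squares = sum(d ** 2 for d in deviations)

# Sample variance: sum of squares divided by n - 1.
variance = sum_of_squares / (n - 1)

# Standard deviation: square root of the variance, in the data's own units.
standard_deviation = math.sqrt(variance)

print(mean, sum_of_squares, variance, standard_deviation)
```

Because the deviations are squared before summing, neither the variance nor the standard deviation can be negative.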
47
What is standard deviation?
the square root of the variance
48
Why is there a preference for standard deviation?
  • Never negative
  • In the same units as the observations
  • Helpful rule of thumb properties
49
What are the rule of thumb properties of standard deviation?
50
What is the issue with comparing the spread of distributions in different populations?
Consider mouse vs. elephant weights: a larger standard deviation value doesn't mean there is a bigger spread relative to the mean
51
What is useful for comparing the spread of distributions in different populations?
coefficient of variation
52
What is the coefficient of variation?
  • The standard deviation expressed as a percentage of the mean
  • CV = (s / mean) × 100%
  • Larger CV = wider relative spread
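A hedged Python sketch of the coefficient of variation using the mouse vs. elephant idea. The weights are invented for illustration, and `coefficient_of_variation` is a hypothetical helper name:

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Made-up weights; CV has no units, so grams and kilograms can be compared.
mouse_weights_g = [18.0, 20.0, 22.0]
elephant_weights_kg = [4800.0, 5000.0, 5200.0]

cv_mouse = coefficient_of_variation(mouse_weights_g)
cv_elephant = coefficient_of_variation(elephant_weights_kg)

print(cv_mouse, cv_elephant)
```

In this made-up example the elephants have the larger standard deviation but the smaller CV, so their spread relative to the mean is narrower.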
53
What is the median?
the middle measurement of a set of observations
54
What do percentiles indicate?
The xth percentile is the value below which x percent of the observations lie
55
What is the line in the middle of a box plot?
the median
56
Explain the box and whiskers in a box plot.
  • The box covers the entire IQR (first quartile to third quartile)
  • The upper whisker extends to the highest data point within quartile 3 + 1.5*IQR
  • The lower whisker extends to the lowest data point within quartile 1 - 1.5*IQR
  • Data points beyond these limits are drawn as individual dots (outliers)
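A Python sketch of the box and whisker rules, assuming made-up data. Quartile conventions differ between textbooks and software; this uses the "inclusive" method of `statistics.quantiles`, which is an assumption and may not match the course's hand calculation exactly:

```python
import statistics

# Hypothetical data with one extreme value (made-up).
data = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# Quartiles (convention assumed: "inclusive" interpolation).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Limits beyond which points are drawn as outliers.
upper_limit = q3 + 1.5 * iqr
lower_limit = q1 - 1.5 * iqr

# Whiskers stop at the most extreme data points still inside the limits.
upper_whisker = max(y for y in data if y <= upper_limit)
lower_whisker = min(y for y in data if y >= lower_limit)
outliers = [y for y in data if y < lower_limit or y > upper_limit]

print(q1, median, q3, iqr, lower_whisker, upper_whisker, outliers)
```

Note that the whiskers end at actual data points, not at the limits themselves.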
57
Where should the median be in a bell shaped curve?
right in the middle of the box
58
What is the plot with frequency lines?
A cumulative frequency distribution plot
  • The cumulative relative frequency at a given measurement is the fraction of observations less than or equal to that measurement
  • A steep jump indicates a cluster of many data points
  • A horizontal line indicates a gap in the data points
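A short Python sketch of cumulative relative frequency, assuming made-up values; `cumulative_relative_frequency` is a hypothetical helper name:

```python
def cumulative_relative_frequency(values, measurement):
    """Fraction of observations less than or equal to the measurement."""
    return sum(1 for v in values if v <= measurement) / len(values)


# Made-up data: a cluster at 2, then a gap until 7.
values = [1, 2, 2, 2, 7, 8]

for x in [0, 1, 2, 6, 7, 8]:
    print(x, cumulative_relative_frequency(values, x))
```

The curve jumps steeply at 2 (three observations share that value) and stays flat between 2 and 7, where there are no data points.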
59
What is the IQR?
the difference between the third and first quartiles of the data. It is the span of the middle 50% of the data
60
Median is ____ mean is_____
Median is the middle value, while the mean is the center of gravity
61
What is proportion?
  • The proportion of observations in a given category
  • p = (number in category) / n
  • The p has a little hat on it (p̂) when you are estimating the proportion from a sample
62
Describe how sampling distributions change with different numbers of samples.
  • The spread of the sampling distribution depends on the sample size (the number of observations per sample)
  • As the number of observations per sample increases, the spread (standard deviation) of the sampling distribution decreases
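A simulation sketch in Python of how sample size affects the sampling distribution of the mean. The population (normal, mean 50, standard deviation 10), the seed, and the helper name `spread_of_sample_means` are all assumptions chosen for illustration:

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable


def spread_of_sample_means(sample_size, n_samples=2000):
    """Standard deviation of the means of many random samples."""
    means = []
    for _ in range(n_samples):
        # Draw one random sample from the assumed population.
        sample = [rng.gauss(50, 10) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)


spread_n10 = spread_of_sample_means(10)
spread_n100 = spread_of_sample_means(100)

print(spread_n10, spread_n100)
```

With more observations per sample the sample means cluster more tightly, so the second spread should come out clearly smaller than the first.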
63
What is the standard error of an estimate?
  • The standard error of an estimate is the standard deviation of the estimate's sampling distribution
  • For a sample mean, SE_Ȳ = s/√n
  • Reflects the precision of the estimate
  • The smaller the standard error, the less uncertainty there is in the estimate of the target parameter
64
What is the standard error of the mean?
σ_Ȳ = σ/√n. We usually don't know the actual population standard deviation σ, so we use the sample standard deviation s as an estimate of σ, giving SE_Ȳ = s/√n
65
σ
population standard deviation
66
s
sample standard deviation
67
What is a confidence interval?
a range of values surrounding the sample estimate that is likely to contain the population parameter
68
What is the normal confidence interval?
The 95% confidence interval provides a most plausible range for a parameter.
69
How do you describe confidence interval certainty?
  • Right: We are 95% confident that the true mean lies between ___ and ___
  • Wrong: There is a 95% probability that the true mean falls between 2827.8 and 3828.4 (the parameter is fixed; it is the interval that varies from sample to sample)
70
What are error bars?
  • Lines on a graph extending outward from the sample estimate to illustrate uncertainty about the value of the parameter being estimated
  • Used to display the uncertainty, not the spread of the data
71
What is the 2SE rule?
A rough approximation of the 95% confidence interval for a mean can be calculated as the sample mean plus and minus two standard errors
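A minimal Python sketch of the standard error of the mean and the 2SE rule, assuming a made-up sample. This gives only the rough approximation described on the card, not an exact confidence interval:

```python
import math
import statistics

# Hypothetical sample (made-up values).
sample = [4.0, 6.0, 8.0, 10.0]
n = len(sample)

sample_mean = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation

# Standard error of the mean: s / sqrt(n).
standard_error = s / math.sqrt(n)

# 2SE rule: sample mean plus and minus two standard errors.
lower = sample_mean - 2 * standard_error
upper = sample_mean + 2 * standard_error

print(sample_mean, standard_error, (lower, upper))
```

The interval is centered on the sample mean and is four standard errors wide in total.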
72
What is a random trial?
  • A process or experiment that has two or more possible outcomes
  • Examples: rolling a die, flipping a coin
73
What is an event in a random trial?
  • Event (of interest): any potential subset of all the possible outcomes
  • Flipping a coin: heads
  • Rolling a die: 3
74
What is probability?
the proportion of times the event would occur if we repeated a random trial over and over again under the same conditions
75
How do you abbreviate probability?
Pr[A] means “the probability of event A”
76
What does mutually exclusive mean?
Two events are mutually exclusive if they cannot occur at the same time
77
What is probability distribution?
a list of the probabilities of all mutually exclusive outcomes of a random trial
78
How do you represent the probability distribution of different variables?
  • A discrete variable is measured in indivisible units: all categorical variables (e.g., present or absent) and many numerical variables (e.g., number of mates)
  • A continuous variable can take on any real-number value within some range; the probability of Y being in some range is indicated by the area under the curve
79
What is the addition rule?
if two events A and B are mutually exclusive then Pr[A or B] = Pr[A] + Pr[B]
80
What is the general addition rule?
  • Not all events are mutually exclusive, so an extra term is needed so you don't double count outcomes
  • Pr[A or B] = Pr[A] + Pr[B] - Pr[A and B]
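A Python sketch checking the general addition rule on one roll of a fair die. The events A (even number) and B (greater than 3) are chosen here for illustration, and `pr` is a hypothetical helper:

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # one roll of a fair die


def pr(event):
    """Probability of an event (a set of equally likely outcomes)."""
    return Fraction(len(event & outcomes), len(outcomes))


A = {2, 4, 6}  # even number
B = {4, 5, 6}  # greater than 3

# A and B overlap (4 and 6), so they are not mutually exclusive.
direct = pr(A | B)
by_rule = pr(A) + pr(B) - pr(A & B)

print(direct, by_rule)
```

Without the subtracted term, the outcomes 4 and 6 would be counted twice and the sum would equal 1 instead of 2/3.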
81
What are independent events?
  • Two events are independent if the occurrence of one does not inform us about the probability that the second will occur
  • Examples: two flips of a coin or two rolls of a die
82
What is the multiplication rule?
If two events are independent then the probability that they both occur is the probability of the first event multiplied by the probability of the second event
83
What are dependent events?
the probability of a particular event in the second trial depends on what happened in the first trial
84
What is the general multiplication rule?
  • Finds the probability that both of two events occur, even if the two are dependent
  • Pr[A and B] = Pr[A] Pr[B|A]
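A Python sketch of the general multiplication rule for dependent events, using an example chosen here (not from the cards): drawing two aces in a row from a standard 52-card deck without replacement. The rule is checked against a brute-force count of all ordered pairs of cards:

```python
from fractions import Fraction
from itertools import permutations

# Event A: first card is an ace. Event B: second card is an ace.
pr_a = Fraction(4, 52)
pr_b_given_a = Fraction(3, 51)  # one ace is already gone, so B depends on A

# General multiplication rule: Pr[A and B] = Pr[A] * Pr[B|A].
pr_both = pr_a * pr_b_given_a

# Brute-force check: count ordered pairs of distinct cards that are both aces.
deck = ["ace"] * 4 + ["other"] * 48
pairs = list(permutations(range(52), 2))
both_aces = sum(1 for i, j in pairs if deck[i] == "ace" and deck[j] == "ace")
pr_counted = Fraction(both_aces, len(pairs))

print(pr_both, pr_counted)
```

If the first card were put back before the second draw, the events would be independent and the simpler rule Pr[A] × Pr[B] would apply.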
85
Rank from largest to smallest: standard deviation, standard error, 95% confidence interval
Usually SD > 95% confidence interval > SE
86
Explain the difference between a bar plot and a histogram.
Bar graphs are used to show the frequency distribution of a categorical variable whereas histograms are used to show the frequency distribution of a numerical variable.
87
How do you identify a skew?
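By the direction of the tail: the skew is toward whichever side the long tail is on (e.g., right-skewed if the tail extends to the right)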
88
sd
89
The standard error of a sample mean is ___.
the standard deviation of the means of randomly drawn samples from the population
90
Select the proper interpretation of a confidence interval for a mean at a confidence level of C%. A range of values _____.
produced by a method such that C% of confidence intervals produced by the same method contain the population mean
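A simulation sketch in Python of this interpretation. It assumes a normal population with a known mean (known only because it is simulated), samples of 30, a fixed seed, and the rough 2SE interval as the "method", so C is approximately 95. All of these choices are illustrative assumptions:

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_MEAN = 100.0       # hypothetical population mean


def approx_ci(sample):
    """Rough 95% interval: sample mean plus and minus two standard errors."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 2 * se, mean + 2 * se


n_intervals = 1000
hits = 0
for _ in range(n_intervals):
    # Each new sample produces a different interval.
    sample = [rng.gauss(TRUE_MEAN, 15.0) for _ in range(30)]
    lower, upper = approx_ci(sample)
    if lower <= TRUE_MEAN <= upper:
        hits += 1

# Fraction of intervals that contain the population mean.
coverage = hits / n_intervals
print(coverage)
```

The population mean never moves; what varies from sample to sample is the interval, and roughly 95% of the intervals built this way should contain the mean.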