Glossary Flashcards

(198 cards)

1
Q

The increase or decrease from a reference value to a new value

Absolute change = new value - reference value

A

Absolute Change

2
Q

The difference between a compared value and a reference value

Absolute difference = compared value - reference value

A

Absolute Difference

3
Q

The amount by which a claimed or measured value differs from the true value

Absolute error = claimed or measured value - true value

A

Absolute Error

4
Q

The number of accidents due to some particular cause, expressed as a proportion of all people at risk for that cause
- For example, an accident rate of “5 per 1000 people” means that an average of 5 in 1000 people suffer an accident from this particular cause

A

Accident Rate

5
Q

How closely a measurement approximates a true value
- An accurate measurement is very close to the true value

A

Accuracy

6
Q

A statement or claim that can be supported only if the null hypothesis is rejected

A

Alternative Hypothesis (Ha)

7
Q

A method of testing the equality of three or more population means by analyzing sample variances

A

Analysis of Variance (ANOVA)

8
Q

The probability that event A and event B will both occur
- How it is calculated depends on whether the events are independent or dependent
- Also called joint probability

A

And Probability
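
A minimal sketch of the two cases, using the standard multiplication rules: P(A and B) = P(A) x P(B) for independent events, and P(A and B) = P(A) x P(B given A) for dependent events. The coin and card scenarios are illustrative examples, not part of the original card.

```python
from fractions import Fraction

# Independent events: P(A and B) = P(A) * P(B)
# Example: two tosses of a fair coin both land heads.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

# Dependent events: P(A and B) = P(A) * P(B given A)
# Example: drawing two aces in a row from a standard 52-card deck
# without replacement (the first draw changes the odds of the second).
p_two_aces = Fraction(4, 52) * Fraction(3, 51)

print(p_two_heads)  # 1/4
print(p_two_aces)   # 1/221
```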

9
Q

A diagram consisting of bars representing the frequencies (or relative frequencies) for particular categories of qualitative data
- The bar lengths are proportional to the frequencies

A

Bar Graph

10
Q

The line on a scatter diagram that lies closer to the data points than all other possible lines (according to a standard statistical measure of closeness)
- Also called regression line

A

Best-fit Line

11
Q

Any problem in the design or conduct of a statistical study that tends to favor certain results

A

Bias

12
Q

A distribution with two peaks, or modes

A

Bimodal Distribution

13
Q

To group data into categories (bins), each of which covers a range of possible data values

A

Bin

14
Q

The practice of keeping experimental subjects and/or experimenters in the dark about who is in the treatment group and who is in the control group

A

Blinding

15
Q

A graphical display presenting a five number summary
- A number line is used for reference, the values from the lower to the upper quartiles are enclosed in a box, a line is drawn through the box at the median, and two “whiskers” are extended to the low and high data values
- Also called box and whisker plot

A

Boxplot

16
Q

A relationship present when one variable is a cause of another

A

Causality

17
Q

The collection of data from every member of a population

A

Census

18
Q

Theorem stating that the distribution of the means of numerous random samples (all of the same size) of a variable with any distribution (not necessarily a normal distribution) will, as the sample size increases, tend to be approximately a normal distribution

A

Central Limit Theorem

19
Q

A number used to determine the statistical significance of a hypothesis test presented in a contingency table (or two way table)
- If this statistic is less than a critical value (which depends on the table size and the desired significance level), the differences between the observed frequencies and the expected frequencies are not significant

A

Chi-square Statistic (χ²)
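
A minimal sketch of how the statistic can be computed from a two way table, assuming the usual formula: the sum over all cells of (observed - expected)² / expected, where each expected frequency is (row total x column total) / grand total. The function name and the table values are hypothetical.

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a two way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Hypothetical 2 x 2 table of observed frequencies
table = [[20, 30],
         [30, 20]]
print(chi_square_statistic(table))  # 4.0
```

The result would then be compared with a critical value for the table size and significance level, which this sketch does not look up.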

20
Q

Dividing a population into groups, or clusters; selecting some of these clusters at random; and then obtaining the sample by choosing all the members within each of the selected clusters

A

Cluster Sampling

21
Q

A number that describes how well data fit a best-fit equation found through multiple regression

A

Coefficient of Determination (R²)

22
Q

A number that is compared to a reference value in computing a relative difference

A

Compared Value

23
Q

For an event A, all outcomes in which A does not occur, expressed as Ā ("not A")
- Its probability is P(Ā) = 1 - P(A)

A

Complement

24
Q

The probability of one event given the occurrence of another event, written P(B given A) or P(B|A)

A

Conditional Probability

25
A range of values associated with a confidence level, such as 95%, that is likely to contain the true value of a population parameter
Confidence Interval
26
Confusion in the interpretation of statistical results that occurs when the effects of different variables are mixed such that the effects of the individual variables being studied cannot be determined
Confounding
27
Any variables in a statistical study that can lead to confounding
Confounding Variables
28
An index number designed to measure the rate of inflation - It is computed and reported monthly, based on a sample of prices of more than 60,000 goods, services, and housing costs
Consumer Price Index (CPI)
29
Quantitative data that can take on any value in a given interval
Continuous Data
30
A map that uses curves (contours) to connect geographical points that have the same data values
Contour Map
31
The group of subjects in an experiment who do not receive the treatment being tested
Control Group
32
A sample chosen for convenience, because it is readily available, rather than by using a more effective procedure
Convenience Sample
33
A statistical relationship between two variables
Correlation
34
A measure of the strength of the relationship between two variables - Its value is always between -1 and 1 inclusive (that is, -1 ≤ r ≤ 1)
Correlation Coefficient (r)
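A minimal sketch of one way to compute r, assuming the card refers to the usual (Pearson) linear correlation coefficient; the card itself does not give a formula. The function name and data are hypothetical.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the means
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sum((x - mean_x) ** 2 for x in xs)
    dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(dev_x * dev_y)

# Perfectly linear, increasing data gives r = 1
print(correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```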
35
Values of the test statistic (such as the standard z score, t test statistic, or chi square statistic) that separate unlikely values from those that are likely to occur
Critical Values (for Statistical Significance in Hypothesis Tests)
36
For any data category, the number of data values in that category and all preceding categories
Cumulative Frequency
37
The number of deaths due to some particular cause, expressed as a fraction of all people at risk for that cause - For example, a death rate of "5 per 1000 people" means that an average of 5 in 1000 people die from this particular cause
Death Rate
38
The sample size minus one (n - 1)
Degrees of Freedom (for a t Distribution used with Inferences about a Single Population Mean)
39
Two or more events for which the outcome of one affects the probability of the other(s)
Dependent Events
40
A branch of statistics that deals with describing raw data using graphics and sample statistics
Descriptive Statistics
41
How far a particular data value lies from the mean of a data set, used to compute the standard deviation
Deviation
42
Quantitative data that can take on only particular values and not other values in between (for example, the whole numbers 0, 1, 2, 3, 4, 5…)
Discrete Data
43
The way the values of a variable are spread over all possible values - This can be displayed with a table or with a graph
Distribution
44
The distribution that results when the means (x̄) of all possible samples of a given size are found
Distribution of Sample Means
45
The distribution of the proportions (p̂) from all possible samples of a given size
Distribution of Sample Proportions
46
A diagram similar to a bar graph except that each individual data value is represented with a dot
Dotplot
47
An experiment in which neither the participants nor the experimenters know who belongs to the treatment group and who belongs to the control group
Double Blind Experiment
48
The probability that either event A or event B will occur - How it is calculated depends on whether the events are overlapping or non overlapping
Either/Or Probability
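A minimal sketch of the two cases, using the standard addition rules: P(A or B) = P(A) + P(B) for non-overlapping events, and P(A or B) = P(A) + P(B) - P(A and B) for overlapping events. The die and card scenarios are illustrative examples, not part of the original card.

```python
from fractions import Fraction

# Non-overlapping events: P(A or B) = P(A) + P(B)
# Example: rolling a 1 or a 2 on a fair six-sided die.
p_one_or_two = Fraction(1, 6) + Fraction(1, 6)

# Overlapping events: P(A or B) = P(A) + P(B) - P(A and B)
# Example: drawing a king or a heart from a standard 52-card deck;
# the king of hearts would otherwise be counted twice.
p_king_or_heart = Fraction(4, 52) + Fraction(13, 52) - Fraction(1, 52)

print(p_one_or_two)     # 1/3
print(p_king_or_heart)  # 4/13
```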
49
In probability, a collection of one or more outcomes that share a property of interest
Event
50
The frequency one would expect to occur by chance in a given cell of a two way table if the row and column variables were independent of each other
Expected Frequency
51
The mean value of the outcomes of a random variable
Expected Value
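A minimal sketch, assuming a discrete random variable, where the expected value is the sum of each outcome value times its probability. The function name and the die example are illustrative.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Expected value of a list of (value, probability) pairs."""
    return sum(value * probability for value, probability in outcomes)

# One roll of a fair six-sided die: each face has probability 1/6
die = [(face, Fraction(1, 6)) for face in range(1, 7)]
print(expected_value(die))  # 7/2
```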
52
A study in which researchers apply a treatment to some or all of a group of subjects and then observe its effects
Experiment
53
An effect that occurs when a researcher or experimenter somehow influences subjects through such factors as facial expression, tone of voice, and/or attitude
Experimenter Effect
54
In cases where cause and effect may be involved, the variable that represents the cause; that is, a change in the explanatory variable causes or explains a change in a second (response) variable - In regression, it is the independent or x variable
Explanatory Variable
55
A type of scale in which increments grow by powers of values (for example, powers of 10), or exponents - Also called logarithmic scale
Exponential Scale
56
A number used to determine the statistical significance of a hypothesis test conducted through analysis of variance (ANOVA)
F Test Statistic
57
A test result that incorrectly indicates that the condition being tested for is absent when in fact it is present
False Negative
58
A test result that incorrectly indicates that the condition being tested for is present when in fact it is not
False Positive
59
A description of the variation of a data distribution in terms of the minimum value, lower quartile, median, upper quartile, and maximum value
Five Number Summary
60
The number of times data values fall within a particular category
Frequency
61
A table that lists all the categories of data in one column and the frequency for each category in another column
Frequency Table
62
The mistaken belief that a streak of bad luck makes a person "due" for a streak of good luck or the mistaken belief that a streak of good luck will continue
Gambler's Fallacy
63
Data that can be assigned to different geographical locations
Geographical Data
64
A number used to describe the level of equality or inequality in an income distribution
Gini Index
65
The mechanisms by which carbon dioxide and other greenhouse gases warm a planet
Greenhouse Effect
66
Certain atmospheric gases, such as carbon dioxide and water vapor, that can trap heat and warm a planet
Greenhouse Gases
67
A bar graph showing a distribution for quantitative data (at the interval or ratio level of measurement) - The bars have a natural order, and the bar widths have specific meaning
Histogram
68
In statistics, a claim about a population parameter, such as a population proportion, p, or a population mean, μ
Hypothesis
69
A standard statistical procedure for testing a claim about a population parameter
Hypothesis Test
70
Two or more events for which the outcome of one does not affect the probability of the other(s)
Independent Events
71
A number that provides a simple way to compare measurements made at different times or in different places - The value at one particular time (or place) must be chosen to be the reference value (or base value) - The index number for any other time or place is: index number = (value / reference value) x 100
Index Number
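A minimal sketch of the index number formula; the function name and the prices are hypothetical.

```python
def index_number(value, reference_value):
    """Index number of a value relative to a chosen reference (base) value."""
    return value / reference_value * 100

# Hypothetical prices: 2.00 in the reference year, 3.00 in a later year
print(index_number(3.00, 2.00))  # 150.0
print(index_number(2.00, 2.00))  # 100.0 (the reference value itself)
```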
72
A branch of statistics that deals with using sample data to make inferences about populations
Inferential Statistics
73
The increase over time of prices and wages - Its overall rate is measured by the CPI
Inflation
74
A complex graphic that presents a large, interrelated set of information in a visual way
Infographic
75
A level of measurement for quantitative data in which differences, or intervals, are meaningful but ratios are not - Data at this level have an arbitrary zero point
Interval Level of Measurement
76
An important result in probability that applies to a process for which the probability of an event A is P(A) and the results of repeated trials are independent - It is stated as follows: if the process is repeated through many trials, the larger the number of trials, the closer the proportion of trials in which A occurs should be to P(A) - Also called Law of Averages
Law of Large Numbers
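A minimal simulation sketch, assuming a fair coin with P(heads) = 0.5. The function name, seed, and trial counts are arbitrary choices; the printed proportions depend on the seed, so none are quoted here.

```python
import random

def proportion_of_heads(num_trials, rng):
    """Simulate independent fair coin tosses; return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_trials))
    return heads / num_trials

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    # The proportion should tend to settle near 0.5 as n grows
    print(n, proportion_of_heads(n, rng))
```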
77
A distribution in which the values are more spread out on the left side; a negatively skewed distribution
Left Skewed Distribution
78
A hypothesis test that involves testing whether a population parameter lies to the left (lower values) of a claimed value
Left Tailed Test
79
The number of additional years a person of a given age can be expected to live, on average - It is based on current health and medical statistics and does not take into account future changes in medical science or public health
Life Expectancy
80
A graph showing the distribution of quantitative data as a series of dots connected by lines - The horizontal position of each dot corresponds to the center of the bin it represents and the vertical position corresponds to the frequency value for the bin
Line Chart
81
A value used to determine a range of values that is likely to contain a population parameter - Its size depends on the desired level of confidence
Margin of Error
82
The sum of all values in a data set divided by the total number of values - It is what is most commonly called the average value
Mean
83
The middle value in a sorted data set (or the value halfway between the two middle values if the number of values is even)
Median
84
For binned data, the bin into which the median data value falls
Median Class
85
A study in which researchers analyze many individual studies (on a particular topic) as a combined group, with the aim of finding trends that were not evident in the individual studies
Meta Analysis
86
The most common value (or group of values) in a data set
Mode
87
A simple extension of a regular bar graph, in which two or more sets of bars allow comparison of two or more data sets
Multiple Bar Graph
88
A simple extension of a regular line chart, in which two or more lines allow comparison of two or more data sets
Multiple Line Chart
89
A technique that allows the calculation of an equation that represents the best fit between one response variable (such as price) and a combination of two or more explanatory variables (such as weight and volume)
Multiple Regression
90
A correlation in which two variables tend to change in opposite directions, with one increasing while the other decreases
Negative Correlation
91
Absence of any apparent relationship between two variables
No Correlation
92
A level of measurement for qualitative data that consist of names, labels, or categories only and that cannot be ranked or ordered
Nominal Level of Measurement
93
A relationship between two variables that cannot be expressed with a linear (straight line) equation
Nonlinear Relationship
94
Two events for which the occurrence of one precludes the occurrence of the other
Non-overlapping Events
95
A symmetric, bell shaped distribution with a single peak that corresponds to the mean, median, and mode of the distribution - Its variation can be characterized by the standard deviation
Normal Distribution
96
The value in a data set that divides the bottom n% of data values from the top (100 - n)% - A data value that lies between two percentiles is often said to lie in the lower percentile
Nth Percentile
97
A specific claim (such as a claim that a population parameter has a specific value) against which an alternative hypothesis is tested
Null Hypothesis (H0)
98
A study in which researchers observe or measure characteristics of the members of a sample, but do not attempt to influence or modify these characteristics
Observational Study
99
A rule for comparisons that is stated as follows: if the compared value is P% more than the reference value, then it is (100 + P)% of the reference value - If the compared value is P% less than the reference value, then it is (100 - P)% of the reference value
Of Versus More Than (Less Than) Rule
100
A level of measurement for qualitative data that can be arranged in some order - It generally does not make sense to do computations with such data
Ordinal Level of Measurement
101
In probability, the most basic possible result of an observation or experiment
Outcome
102
A value in a data set that is much higher or much lower than almost all other values
Outlier
103
Two events that could possibly both occur at the same time
Overlapping Events
104
In a hypothesis test, the probability of selecting a sample at least as extreme as the observed sample, assuming that the null hypothesis is true
P Value
105
A bar graph of data at the nominal level of measurement, with the bars arranged in descending order
Pareto Chart
106
People (as opposed to animals or objects) who are the subjects of a study
Participants
107
Bias that occurs any time participation in a study is voluntary
Participation Bias
108
A process by which several experts in a field evaluate a research report before it is published
Peer Review
109
Values that divide a data distribution into 100 segments, each representing about 1% of the data values
Percentiles
110
A graph embellished with added artwork
Pictograph
111
A graph in the form of a circle divided into wedges, each of which represents the relative frequency of a particular category - The wedge size is proportional to the relative frequency, and the entire pie represents the total relative frequency of 100%
Pie Chart
112
Something that lacks the active ingredients of a treatment but looks or feels like the treatment - Thus, participants in a study cannot distinguish the placebo from the real treatment
Placebo
113
The effect whereby participants in a study improve simply because they believe they are receiving a useful treatment, when in fact they are receiving only a placebo
Placebo Effect
114
The complete set of people or things being studied
Population
115
The true mean of a population, denoted by the Greek letter μ (pronounced "mew")
Population Mean
116
Specific numbers describing characteristics of a population that a statistical study is designed to estimate
Population Parameters
117
The true proportion of some characteristic in a population, denoted by p
Population Proportion
118
The true standard deviation for an entire population, denoted by the Greek letter σ ("sigma")
Population Standard Deviation
119
A type of correlation in which two variables tend to increase (or decrease) together
Positive Correlation
120
The amount of detail in a measurement
Precision
121
The likelihood that a particular event will occur - The probability of an event, written as P(event), is always between 0 and 1 inclusive - A probability of 0 means the event is impossible and a probability of 1 means the event is certain
Probability
122
The complete distribution of the probabilities of all possible events associated with a particular variable - It may be shown as a table, graph, or formula
Probability Distribution
123
A study intended to collect data in the future from groups that share common characteristics
Prospective Study
124
Data consisting of values that describe qualities or nonnumerical categories - Also called Categorical Data
Qualitative Data
125
Data consisting of values representing counts or measurements
Quantitative Data
126
The median of the data values in the lower half of a data set - Also called First Quartile
Lower Quartile
127
The overall median of a data set - Also called Second Quartile
Middle Quartile
128
The median of the data values in the upper half of a data set - Also called Third Quartile
Upper Quartile
129
Values that divide a data distribution into four parts with approximately 25% of the data values in each part
Quartiles
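A minimal sketch of the five number summary built from the quartile definitions in this deck (lower and upper quartiles as medians of the lower and upper halves). It assumes one common convention: when the number of values is odd, the overall median is left out of both halves. The function name is hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]
    # For an odd count, skip the middle value so it is in neither half
    upper_half = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))  # (1, 2.5, 4.5, 6.5, 8)
print(five_number_summary([7, 1, 5, 3, 2, 6, 4]))     # (1, 2, 4, 6, 7)
```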
130
Errors that occur because of random and inherently unpredictable events in the measurement process
Random Errors
131
A sample in which every member of a population has an equal chance of being selected to be part of the sample
Random Sample
132
The process of ensuring that the subjects of an experiment are assigned to the treatment or control group at random and in such a way that each subject has an equal chance of being assigned to either group
Randomization
133
The difference between the lowest and highest values in a data set
Range
134
A guideline stipulating that, for a data set with no outliers, the standard deviation is approximately equal to the range divided by 4
Range Rule of Thumb
135
A level of measurement for quantitative data in which both intervals and ratios are meaningful - Data at this level have a true zero point
Ratio Level of Measurement
136
The actual measurements or observations collected from a sample
Raw Data
137
The number that is used as the basis for a comparison
Reference Value
138
The size of an absolute change in comparison to the reference value, expressed as a percentage - Relative change = (new value - reference value) / reference value x 100%
Relative Change
139
The size of an absolute difference in comparison to a reference value, expressed as a percentage - Relative difference = (compared value - reference value) / reference value x 100%
Relative Difference
140
The relative amount by which a measured value differs from the true value, expressed as a percentage - Relative error = (absolute error / true value) x 100%
Relative Error
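Relative change, relative difference, and relative error share one pattern: an absolute quantity divided by a reference (or true) value, times 100%. A minimal sketch, with hypothetical function names and numbers:

```python
def relative_change(new_value, reference_value):
    """Relative change (also usable for relative difference), in percent."""
    return (new_value - reference_value) / reference_value * 100

def relative_error(measured_value, true_value):
    """Relative error of a claimed or measured value, in percent."""
    absolute_error = measured_value - true_value
    return absolute_error / true_value * 100

# Hypothetical: a value rises from 120 to 150
print(relative_change(150, 120))  # 25.0
# Hypothetical: a scale reads 55 for an object that truly weighs 50
print(relative_error(55, 50))     # 10.0
```

A negative result indicates a decrease, or a measured value below the true value.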
141
The fraction or percentage of the total number of data values that fall in a particular category - Relative frequency = frequency in category divided by the total of all frequencies
Relative Frequency
142
A probability estimate based on observations or experiments that allow us to observe or measure the relative frequency of the event of interest - Also called Empirical Probability
Relative Frequency Probability
143
A sample in which the relevant characteristics of the members are generally the same as the characteristics of the population
Representative Sample
144
In cases where cause and effect may be involved, the variable that represents the effect; that is, the variable that changes in response to a change in another (explanatory) variable - In regression, it is the dependent variable, or y
Response Variable
145
A study that uses data from the past, such as official records or past interviews
Retrospective Study
146
A distribution in which the values are more spread out on the right side; a positively skewed distribution
Right Skewed Distribution
147
A hypothesis test that involves testing whether a population parameter lies to the right of (has a higher value than) a claimed value
Right Tailed Test
148
For statistical calculations, the practice of stating answers with one more decimal place of precision than is found in the raw data - For example, the mean of 2, 3, and 5 is 3.3333..., which is rounded to 3.3
Rounding Rule
149
A subset of the population from which data are actually obtained
Sample
150
The mean of a sample, denoted x̄ (“x-bar”)
Sample Mean
151
The proportion of some characteristic in a sample, denoted p̂ (“p-hat”)
Sample Proportion
152
Numbers that describe characteristics of a sample and that are found by consolidating or summarizing the raw data
Sample Statistics
153
The process of choosing some data from a population
Sampling
154
The distribution of a sample statistic, such as a mean or proportion, taken from all possible samples of a particular size
Sampling Distribution
155
Error introduced when a random sample is used to estimate a population parameter; the difference between a sample result and a population parameter
Sampling Error
156
A graph, often used to investigate correlations, in which each point corresponds to the values of two variables - Also called Scatter Diagram
Scatterplot
157
Bias that occurs whenever researchers select their sample in a biased way - Also called Selection Effect
Selection Bias
158
A survey in which people decide for themselves whether to participate - Also called Voluntary Response Survey
Self Selected Survey
159
Sampling done in such a way that every possible sample of a particular size has an equal chance of being selected
Simple Random Sampling
160
A statistical paradox that arises when the results for a whole group seem inconsistent with those for its subgroups; it can occur whenever the subgroups are unequal in size
Simpson's Paradox
161
An experiment in which the participants do not know whether they are members of the treatment group or the control group but the experimenters do know - Or, conversely, the participants do know but the experimenters do not
Single Blind Experiment
162
A distribution with a single mode - Also called Unimodal Distribution
Single Peaked Distribution
163
Guideline stating that, for a normal distribution, about 68% (actually, 68.3%) of the data values fall within 1 standard deviation of the mean, about 95% (actually, 95.4%) of the data values fall within 2 standard deviations of the mean, and about 99.7% of the data values fall within 3 standard deviations of the mean
68-95-99.7 Rule
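The percentages in this rule can be checked numerically. A minimal sketch, assuming a normal distribution and using the identity that the fraction of values within k standard deviations of the mean equals erf(k / sqrt(2)); the function name is hypothetical.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(fraction_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```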
164
A type of bar graph or line chart in which two or more different data sets are stacked vertically
Stack Plot
165
A single number commonly used to describe the variation in a sample of data, calculated as: Standard deviation = square root of [ sum of (deviations from the mean)² / (total number of data values - 1) ]
Standard Deviation
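A minimal sketch of the sample standard deviation, taken as the square root of the sum of squared deviations from the mean divided by (n - 1), with the range rule of thumb shown alongside for comparison. The function name and data are hypothetical.

```python
import math

def sample_standard_deviation(data):
    """Sample standard deviation with the (n - 1) divisor."""
    n = len(data)
    mean = sum(data) / n
    sum_squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(sum_squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(round(sample_standard_deviation(data), 2))  # 2.14

# Range rule of thumb: standard deviation is roughly range / 4
print((max(data) - min(data)) / 4)  # 1.75
```

The rule of thumb is only a rough estimate, as the gap between the two numbers shows.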
166
A score for a particular data value, usually denoted by z, that indicates the number of standard deviations between that value and the mean of the distribution - z = standard score = (data value - mean) / standard deviation
Standard Score
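A minimal sketch of the standard score calculation; the function name and the numbers are hypothetical.

```python
def standard_score(data_value, mean, standard_deviation):
    """Number of standard deviations between a data value and the mean."""
    return (data_value - mean) / standard_deviation

# Hypothetical distribution with mean 100 and standard deviation 15
print(standard_score(130, 100, 15))  # 2.0
print(standard_score(85, 100, 15))   # -1.0
```

A negative score means the value lies below the mean.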
167
A measure of the likelihood that a result is meaningful
Statistical Significance
168
In a statistical study, a set of measurements or observations that is unlikely to have occurred by chance - The most commonly quoted levels of statistical significance are the 0.05 level (the probability of the results having occurred by chance is 5% or less, or less than 1 in 20) and the 0.01 level (the probability of the results having occurred by chance is 1% or less, or less than 1 in 100)
Statistically Significant Result
169
The data that describe or summarize something
Statistics (plural)
170
The science of collecting, organizing, and interpreting data
Statistics (singular)
171
A graph that looks much like a histogram turned sideways, with lists of the individual data values in place of bars - Also called stem and leaf plot
Stemplot
172
Subgroups of a population
Strata
173
A sampling method that addresses differences among subgroups, or strata, within a population - First the strata are identified, and then a random sample is drawn from each stratum - The total sample consists of all the samples from the individual strata
Stratified Sampling
174
An estimate of a probability based on experience or intuition
Subjective Probability
175
In a statistical study, the people, animals, or objects chosen for the sample
Subjects
176
A distribution in which the left half is a mirror image of the right half
Symmetric Distribution
177
Errors that occur when there is a problem in the measurement system that affects all measurements in the same way
Systematic Errors
178
Using a simple system to choose a sample, such as selecting every 10th or every 50th member of the population
Systematic Sampling
179
A distribution that is very similar in shape and symmetry to the normal distribution but that accounts for the greater variability expected with small samples - It approaches the normal distribution for large sample sizes
T Distribution
180
A number used to determine the statistical significance of a hypothesis test with the t distribution - For hypothesis tests involving a population mean, it is calculated with the formula t = (x̄ - μ) / (s / √n)
T Test Statistic
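A minimal sketch of the t test statistic for a claim about a single population mean, t = (sample mean - claimed mean) / (s / sqrt(n)), where s is the sample standard deviation and n is the sample size. The function name and numbers are hypothetical.

```python
import math

def t_test_statistic(sample_mean, claimed_mean, sample_sd, n):
    """t statistic for testing a claimed population mean."""
    return (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))

# Hypothetical sample: mean 52, standard deviation 4, size 16;
# the null hypothesis claims a population mean of 50.
print(t_test_statistic(52, 50, 4, 16))  # 2.0
# Degrees of freedom for this test: n - 1 = 15
```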
181
A probability estimate based on a theory, or set of assumptions, about the process in question - Assuming that all outcomes are equally likely, the theoretical probability of a particular event is found by dividing the number of ways the event can occur by the total number of possible outcomes - Also called a priori probability
Theoretical Probability
182
A histogram or line chart in which the horizontal axis represents time
Time Series Graph
183
Something given or applied to the members of the treatment group in an experiment
Treatment
184
The group of subjects in an experiment that receive the treatment being tested
Treatment Group
185
A distribution with three modes
Trimodal Distribution
186
A test result that correctly indicates that the condition being tested for is absent
True Negative
187
A test result that correctly indicates that the condition being tested for is present
True Positive
188
A hypothesis test that involves testing whether a population parameter lies to either side of a claimed value
Two Tailed Test
189
A table showing the relationship between two variables by listing the values of one variable in its rows and the values of the other variable in its columns - Also called Contingency Table
Two Way Table
190
In a hypothesis test, the mistake of rejecting the null hypothesis, H0, when it is true
Type I Error
191
In a hypothesis test, the mistake of failing to reject the null hypothesis, H0, when it is false
Type II Error
192
A distribution in which all data values have the same frequency
Uniform Distribution
193
In a data distribution, values that are not likely to occur by chance, such as those values that are more than 2 standard deviations away from the mean
Unusual Values
194
Any item or quantity that can vary, or take on different values
Variable
195
In a statistical study, the items or quantities that the study seeks to measure
Variables of Interest
196
A measure of how widely data values are spread out about the center of a distribution
Variation
197
Data concerning births and deaths of people
Vital Statistics
198
A mean that accounts for differences in the relative importance of data values - Each data value is assigned a weight, and then this formula gives the weighted mean: Weighted mean = sum of (each data value x its weight) / sum of all weights
Weighted Mean
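A minimal sketch of the weighted mean formula; the function name, scores, and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical course grade: homework score 80 (weight 1), exam score 90 (weight 3)
print(weighted_mean([80, 90], [1, 3]))  # 87.5
```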