Chap I Flashcards

1
Q

Individuals

A

The objects described by a set of data - can be people, animals, or things

2
Q

Variable

A

Any characteristic of an individual - can take different values for different individuals

3
Q

Categorical Variable

A

Places an individual into one of several groups or categories - values are names or labels

4
Q

Quantitative Variable

A

Takes numerical values for which it makes sense to find an average - represents a measurable quantity

5
Q

Discrete Variables

A

A variable that can take only certain separated values, not every value between its minimum and maximum - for example, when flipping a coin repeatedly, the number of heads can be any whole number from 0 upward, but it cannot be a value in between, because you could not get 2.5 heads.

6
Q

Continuous Variable

A

A variable that can take on any value between its minimum and maximum value - for example, the weight of a firefighter between 150-250 pounds, because the firefighter’s weight could be any value between 150-250 pounds.

7
Q

Univariate Data

A

Data from a study that looks at only one variable - e.g. a study that looks at the weight of high school students

8
Q

Bivariate Data

A

Data from a study that examines the relationship between two variables - e.g. a study looking at the relationship between the height and weight of high school students.

9
Q

Population

A

The total set of observations that can be made

10
Q

Sample

A

A set of observations drawn from a population

11
Q

Census

A

A study that obtains data from every member of a population - often not practical because of the time/cost involved.

12
Q

Distribution

A

Tells us what values the variable takes and how often it takes those values

13
Q

Inference

A

Drawing conclusions that go beyond the data at hand, though it depends on how the data is produced

14
Q

Frequency Table

A

Displays the counts (frequencies) of individuals in each category of a variable

15
Q

Relative Frequency Table

A

Displays the percentages (relative frequencies) of individuals in each category of a variable
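
As a rough illustration of the two table types, the sketch below builds a frequency table and a relative frequency table from a small made-up sample. The `colors` list and all variable names are assumptions for the example, not data from the deck.

```python
from collections import Counter

# Hypothetical sample: favorite color reported by 10 individuals.
colors = ["red", "blue", "blue", "green", "red",
          "blue", "red", "blue", "green", "blue"]

# Frequency table: count of individuals in each category.
frequency = Counter(colors)

# Relative frequency table: each count divided by the total, as a percentage.
total = sum(frequency.values())
relative_frequency = {
    category: 100 * count / total for category, count in frequency.items()
}

for category in frequency:
    print(f"{category:>6}: count={frequency[category]}, "
          f"percent={relative_frequency[category]:.1f}%")
```

The percentages in a relative frequency table should add up to 100% apart from rounding.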

16
Q

Interquartile Range (IQR)

A

Measures the range of the middle 50% of the data - a measure of variability, equal to Q3 - Q1.

17
Q

Five-Number Summary

A

Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation & divides each distribution roughly into quarters.
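
A minimal sketch of how the five-number summary and the IQR might be computed. It assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data (excluding the middle observation when the count is odd); software packages often use other quartile conventions and can give slightly different values. The function name and the `scores` data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # observations below the median position
    upper = data[half + n % 2:]    # observations above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

scores = [4, 7, 8, 10, 12, 15, 16, 21, 30]   # hypothetical data
minimum, q1, med, q3, maximum = five_number_summary(scores)
iqr = q3 - q1                                # range of the middle 50%
print(minimum, q1, med, q3, maximum, iqr)
```

For this made-up data the summary is 4, 7.5, 12, 18.5, 30, so the IQR is 11.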

18
Q

Boxplot

A

A type of graph used to display patterns of quantitative data & splits the data into quartiles, consisting of a box that spans from Q1 to Q3, with a line inside the box marking the median and lines, or whiskers, extending from the box to the largest and smallest observations that aren’t outliers.

19
Q

Standard Deviation

A

A numerical value used to indicate how widely individuals in a group vary - measures the typical deviation from the mean, and the formula differs depending on whether the data are a population or a sample. Standard deviation for a population is found using $\sigma = \sqrt{\sum (X_i - \bar{X})^2 / N}$ and standard deviation for a sample is found using $s = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$, where $\bar{X}$ and $\bar{x}$ are the population mean and the sample mean.

20
Q

Variance

A

A numerical value used to indicate how widely objects in a group vary and is equal to the square of the standard deviation. Variance of a population is found using $\sigma^2 = \sum (X_i - \bar{X})^2 / N$ & variance of a sample is found using $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$, where $\bar{X}$ and $\bar{x}$ are the population mean and the sample mean.
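
The sketch below works through the population and sample formulas for variance and standard deviation on a small made-up data set. The `data` values and variable names are assumptions for illustration; the standard-library `statistics` functions (`pvariance`, `variance`, `pstdev`, `stdev`) are used only as a cross-check.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean) ** 2 for x in data)

population_variance = squared_deviations / n        # divide by N
sample_variance = squared_deviations / (n - 1)      # divide by n - 1
population_sd = math.sqrt(population_variance)      # square root of the variance
sample_sd = math.sqrt(sample_variance)

# Cross-check against the standard library.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))

print(population_variance, population_sd)
print(sample_variance, sample_sd)
```

Dividing by n - 1 instead of N makes the sample values slightly larger than the population values for the same numbers.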

21
Q

Roundoff Error

A

When the exact percentages add up to 100%, but the rounded percentages only come close - does not indicate mistakes in work
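
A small sketch of roundoff error, assuming three hypothetical categories with equal counts. The exact percentages total 100%, but after rounding each to one decimal place the total comes out slightly short.

```python
# Hypothetical counts for three categories.
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())

# Exact percentages (each one third of 100).
exact = {k: 100 * v / total for k, v in counts.items()}

# Percentages rounded to one decimal place, as they would appear in a table.
rounded = {k: round(p, 1) for k, p in exact.items()}

print(sum(exact.values()))     # essentially 100
print(sum(rounded.values()))   # close to 99.9, not 100
```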

22
Q

Pie Chart

A

Shows the distribution of a categorical variable as a pie, with slices sized by the count or percentage in each category - must include all the categories that make up the whole

23
Q

Bar Chart

A

Represents each category as a bar, where heights show the count or percentage - can be more flexible than a pie chart, since it can display the distribution of a categorical variable or compare quantities.

24
Q

Two-Way Table

A

Examines relationships between categorical variables - contains a row variable and a column variable

25
Marginal Distribution
The distribution of values of one of the categorical variables in a two-way table of counts among all individuals described by the table, though a percentage is often more informative. Divide each row/column total by the table total and convert to a percentage to get the marginal distribution.
26
Conditional Distribution
Describes the values of one variable among individuals who have a specific value of another variable - there is a separate conditional distribution for each value of the other variable, and it is often given in relative frequencies
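
To make the marginal and conditional distribution definitions concrete, here is a rough sketch on a made-up two-way table of counts. The table contents (gender by preferred subject) and every name in the code are assumptions for the example.

```python
# Hypothetical two-way table of counts:
# row variable = gender, column variable = preferred subject.
table = {
    "male":   {"math": 30, "art": 10},
    "female": {"math": 20, "art": 40},
}

grand_total = sum(sum(cols.values()) for cols in table.values())

# Marginal distribution of the row variable: row total / table total.
row_marginal = {
    row: 100 * sum(cols.values()) / grand_total for row, cols in table.items()
}

# Marginal distribution of the column variable: column total / table total.
column_names = list(next(iter(table.values())))
col_marginal = {
    col: 100 * sum(table[row][col] for row in table) / grand_total
    for col in column_names
}

# Conditional distribution of subject for each gender: cell count / row total.
conditional = {
    row: {col: 100 * count / sum(cols.values()) for col, count in cols.items()}
    for row, cols in table.items()
}

print(row_marginal)
print(col_marginal)
print(conditional)
```

Here 75% of males but only about 33% of females prefer math, so the conditional distributions differ, which suggests an association between the two variables in this made-up table.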
27
Segmented Bar Graph
A bar graph in which one categorical variable determines the bars (ex: male/female) and a second variable divides each bar into connected segments, with the segments of each bar adding up to 100%.
28
Side-by-Side Bar Graph
A bar graph in which the values of one categorical variable (ex: male/female) are drawn as two (or more) separate bars placed next to each other, with the group of bars repeated for each category of the other variable
29
Association
When knowing the value of one variable helps to predict the value of the other
30
Simpson's Paradox
An effect where the marginal association between two categorical variables is qualitatively different from the partial association between the same two variables once a third variable is taken into account - tldr - averages can be misleading
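
A sketch of Simpson's paradox using illustrative counts chosen to show the effect. The two treatments ("A", "B"), the two subgroups ("small", "large"), and all names are assumptions for the example: treatment A has the higher success rate inside each subgroup, yet B looks better once the subgroups are pooled.

```python
# Illustrative counts: (successes, trials) for two treatments in two subgroups.
results = {
    "small": {"A": (81, 87),   "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

# Partial association: success rate of each treatment within each subgroup.
within = {
    group: {t: s / n for t, (s, n) in arms.items()}
    for group, arms in results.items()
}

# Marginal association: pool the subgroups, then compare the treatments.
pooled = {}
for treatment in ("A", "B"):
    successes = sum(results[g][treatment][0] for g in results)
    trials = sum(results[g][treatment][1] for g in results)
    pooled[treatment] = successes / trials

for group, rates in within.items():
    print(group, {t: round(r, 3) for t, r in rates.items()})
print("pooled", {t: round(r, 3) for t, r in pooled.items()})
```

The reversal happens because the treatments are applied to the subgroups in very different proportions, so the pooled rates mix unlike groups.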
31
Dotplot
A plot where each data value is shown as a dot above its location on a number line.
32
Shape
Describes the way a graph looks - focus on the main features, such as major peaks, clusters, obvious gaps, and potential outliers
33
Mode
the most common value
34
Center
the midpoint of the data
35
Spread
similar to range, but not a single value - describes how the data varies from __ to __
36
Range
A measure of variability that shows the full spread of the data - a single value found by subtracting the smallest value from the largest value
37
Outlier
Any observation that falls more than 1.5 x IQR above the third quartile or below the first quartile
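
The 1.5 x IQR rule can be sketched as below. This assumes the median-of-halves convention for Q1 and Q3 (other quartile conventions may flag slightly different points); the function names and the `times` data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    return median(data[:half]), median(data[half + n % 2:])

def find_outliers(values):
    """Return observations more than 1.5 x IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

times = [5, 7, 8, 9, 10, 10, 11, 12, 13, 40]   # hypothetical data
print(quartiles(times))       # Q1 = 8, Q3 = 12, so IQR = 4
print(find_outliers(times))   # fences at 2 and 18, so only 40 is flagged
```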
38
Symmetric Distribution
When the right and left sides of a graph are approximately mirror images of each other
39
Skewed Right
When the right side of the graph is much longer than the left - the skew is named for the direction of the long tail
40
Skewed Left
When the left side of the graph is much longer than the right - the skew is named for the direction of the long tail
41
Unimodal
having a single peak
42
Bimodal
having two clear peaks
43
Multimodal
having more than two clear peaks
44
Stemplot
A plot used to display quantitative data, usually from smaller data sets, consisting of a stem (including all but the final digit of an observation) and leaves (the final digit of an observation)
45
Splitting Stems
Dividing a stem into further pieces - e.g. a stem with leaves 0-9 becomes two copies of that stem, one holding leaves 0-4 and the other holding leaves 5-9
46
Back-to-Back Stem
A stemplot where leaves are on either side of a shared stem, often to represent two different categories of data
47
Plots
A graphing technique used to represent a data set, often showing the relationship between two or more variables
48
Histogram
A graph of the distribution of a quantitative variable where nearby values are grouped together into intervals, with the height of each bar showing how many values fall in that interval
49
Mean
An average score that shows how large each data value would be if the total were split equally amongst the observations & found by finding the sum of the individual scores and dividing it by the number of individuals. Not a resistant measure of center.
50
Median
The midpoint of a distribution, where about half of the observations are smaller than the value and about half are larger. A resistant measure of center.
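
A short sketch of what "resistant" means for the mean and median. The `salaries` values are made up for the example: adding one extreme observation pulls the mean far from the bulk of the data, while the median barely moves.

```python
from statistics import mean, median

salaries = [30, 32, 35, 38, 40]      # hypothetical values, in thousands
with_outlier = salaries + [400]      # add one extreme observation

print(mean(salaries), median(salaries))          # both are 35
print(mean(with_outlier), median(with_outlier))  # mean jumps to about 95.8, median is 36.5
```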