Chpt 3 - Numerical Descriptive Measures Flashcards

(129 cards)

1
Q

How can we organize numerical data?

A

Graphical Methods

Numerical Methods

2
Q

How does a histogram compare to a bar chart in what data they are representing?

A

They are similar but bar charts are for categorical data and histograms are for numerical data

3
Q

How does a histogram compare to a bar chart in how close the bars are to each other?

A

Bars are touching in a histogram, but not a bar chart

4
Q

How does a histogram compare to a bar chart in what each bar represents?

A

In a bar chart, each bar represents a different category; in a histogram, each bar represents a group of values that the variable can take

5
Q

How does a histogram compare to a bar chart in the height of each bar?

A

In a bar chart, the height of a bar is determined by frequency or relative frequency.

In a histogram, the height of the bar is the frequency or relative frequency of the group of values that the bar represents

6
Q

How should we group the values when making a histogram for discrete data with only a small number of distinct values?

A

Single value grouping

7
Q

When should single value grouping be applied to a histogram?

A

When using discrete data with only a small number of distinct values

8
Q

What is single value grouping for a histogram?

A

Each bar represents a distinct value (similar to bar charts)

The height of the bar is determined by the frequency or relative frequency of the corresponding values in the sample

These would be called a frequency histogram or a relative frequency histogram respectively.
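
Single value grouping can be sketched in Python as below. This is only an illustration: the sample data and the function name `single_value_table` are made up for the example and are not from the course.

```python
from collections import Counter

def single_value_table(values):
    """Return {value: (frequency, relative frequency)} for each distinct value."""
    counts = Counter(values)
    n = len(values)
    # One row (one histogram bar) per distinct value, in increasing order.
    return {v: (counts[v], counts[v] / n) for v in sorted(counts)}

# Hypothetical sample: number of siblings reported by 10 students.
siblings = [0, 1, 1, 2, 0, 1, 3, 2, 1, 0]
table = single_value_table(siblings)
for value, (f, rel) in table.items():
    print(value, f, rel)
```

The first number in each pair would be the bar height in a frequency histogram, the second in a relative frequency histogram.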

9
Q

What type of histogram uses the height of the bar to represent relative frequency?

A

relative frequency histogram :)

10
Q

How should we group the values when making a histogram for discrete data with many distinct values?

A

Limit grouping

11
Q

What are the steps to making a histogram using limit grouping?

A
  1. Choose an appropriate range which includes all the distinct values
  2. Divide the range into sub-intervals of equal length
  3. Summarize the data in a frequency (f) or relative frequency (f/n) table. Here a frequency is the number of individuals falling into a sub-interval
12
Q

When should limit grouping be applied to a histogram?

A

When using discrete data with many distinct values

13
Q

What is the number of sub-intervals that work best for limit grouping? Explain

A

Should be between 5-20

Otherwise the histogram won’t tell us much about the data. Imagine if there were only one bar in the histogram, or a separate bar for each of 100 distinct values. Gross lol

14
Q

Let’s say we want to analyze how many hours per week students are studying. A survey of 20 people gave answers ranging from 5 hours to 96 hours. How would you choose sub-intervals to make the limit grouping histogram?

A

Option A:
0-19
20-39
40-59
60-79
80-99

Option B:
0-9
10-19
etc. (would give 10 sub-intervals)
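
The two options can be compared with a small Python sketch. The 20 survey answers below are invented (only the minimum of 5 and maximum of 96 come from the card), and `limit_grouping` is a hypothetical helper name.

```python
def limit_grouping(values, width, start=0):
    """Count whole-number values in classes start..start+width-1, and so on."""
    top = max(values)
    table = {}
    lower = start
    while lower <= top:
        upper = lower + width - 1  # upper class limit, e.g. 0-19
        table[(lower, upper)] = sum(lower <= v <= upper for v in values)
        lower += width
    return table

# Hypothetical survey answers (hours studied per week), 20 students.
hours = [5, 8, 12, 15, 18, 20, 22, 25, 25, 30,
         33, 35, 38, 40, 42, 45, 50, 61, 74, 96]

option_a = limit_grouping(hours, width=20)  # 0-19, 20-39, ..., 80-99
option_b = limit_grouping(hours, width=10)  # 0-9, 10-19, ..., 90-99
print(option_a)
print(len(option_b), "sub-intervals in option B")
```

Both options stay inside the 5-20 sub-interval guideline; which one reads better depends on the data.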

15
Q

What grouping is applied to continuous data when making a histogram?

A

Cutpoint grouping

16
Q

When is cutpoint grouping used in a histogram?

A

When using continuous data

17
Q

What is cutpoint grouping?

A

Used for continuous data. It defines sub-intervals such that any value (decimal or whole number) in the range is assigned to one, and only one, sub-interval. This is needed because a continuous variable can take any number in an interval

18
Q

What are the steps to creating a histogram using cutpoint grouping?

A
  1. Choose the whole interval which includes all of the data values
  2. Divide this whole interval into 6 sub-intervals of equal length (i.e. 0-under 10, 10-under 20 etc.)
  3. Count the number of individuals falling into each sub-interval and summarize in a frequency or relative frequency table
  4. Plot the histogram with 1 bar corresponding to a sub-interval and the height of the bar = frequency or relative frequency as desired
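
A rough Python sketch of the counting step, assuming made-up continuous data and a hypothetical helper `cutpoint_grouping`. Each sub-interval runs from its lower cutpoint up to, but not including, the next one ("10-under 20"), so every value lands in exactly one sub-interval.

```python
def cutpoint_grouping(values, cutpoints):
    """Count values in [c0, c1), [c1, c2), ... for increasing cutpoints."""
    counts = [0] * (len(cutpoints) - 1)
    for v in values:
        for i in range(len(counts)):
            if cutpoints[i] <= v < cutpoints[i + 1]:
                counts[i] += 1
                break
    return counts

# Hypothetical continuous data, e.g. commute times in minutes.
times = [3.5, 9.9, 10.0, 12.4, 19.99, 20.0, 27.3, 31.0, 44.8, 59.2]
cutpoints = [0, 10, 20, 30, 40, 50, 60]  # 6 sub-intervals of equal length

counts = cutpoint_grouping(times, cutpoints)
relative = [c / len(times) for c in counts]
print(counts)
print(relative)
```

Note that 10.0 is counted in "10-under 20" and not in "0-under 10".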
19
Q

What is the purpose of organizing data?

A

To analyze the distribution of the data

20
Q

What is a distribution and what are its 2 important features?

A

Distribution of a variable is a table, graph, or formula that provides

  1. All the possible values that this variable can take
  2. How often these values occur
21
Q

Why is it important to determine the shape of the distribution of a variable?

Give an example

A

Plays a role in determining the appropriate inferential methods to analyze its data

If the distribution of a variable is bell shaped, a lot of inferential methods can be applied to analyze its data

22
Q

What are the 3 important aspects when describing the shape of a distribution?

A

Symmetry

Skewness

Modality

23
Q

What is symmetry in regards to distribution shape?

A

The left side of the distribution mirrors the right side, such as a bell-shape

24
Q

What is skewness in regards to distribution shape?

A

Describes an asymmetric shape: the distribution has a longer tail on one side

25
If a distribution has a longer left tail, what is this called?
Left skewed, or negatively skewed
26
What is it called when the distribution has a longer right tail?
Right skewed or positively skewed
27
What is left skewed distribution?
When the left has a longer tail (so the peak is to the right)
28
What is right skewed distribution?
When the right has a longer tail (so the peak is to the left)
29
What is modality in regards to distribution shape?
It's the number of peaks in a distribution. May have one (unimodal), two (bimodal), or many (multimodal)
30
What is a unimodal distribution?
There is only one peak in the distribution
31
What is it called when there are many peaks in the distribution?
multimodal
32
What is bimodal distribution?
When there are 2 peaks in the distribution
33
What are 2 well-known distribution shapes?
Bell-shaped; uniform
34
What are the features of a bell-shaped distribution?
Unimodal; symmetric
35
What is another name for a bell-shaped distribution?
Normal distribution
36
What are the features of a uniform model of distribution?
1. If all the possible values that a variable can take have an equal chance of occurring, the distribution of this variable is a uniform distribution. 2. Uniform distributions have no mode and are symmetric.
37
Give examples of graphical methods for organizing numerical data (4)
Histogram; stem-and-leaf diagram; dot plot; boxplot
38
Give examples of numerical methods for organizing numerical data (2)
Calculating the center of the data (mode, mean, median); calculating the spread (range, IQR, standard deviation)
39
What is the leaf?
The rightmost digit of the data value. Examples: for 2005 the leaf is 5; for 34 the leaf is 4
40
What is a stem?
All digits of the data value except the rightmost digit. Examples: for 2005 the stem is 200; for 34 the stem is 3
41
What are the stem and leaf values of 15?
Leaf - 5; Stem - 1
42
What are the stem and leaf values of 183?
Leaf - 3; Stem - 18
43
What are the steps to creating a stem-and-leaf diagram?
1. Identify the stem and leaf of each data value. 2. Draw a vertical line and write the stems from smallest to largest in a column to the left of it. 3. Write each leaf to the right of the vertical line, in the same row as its corresponding stem. 4. Arrange the leaves in each row from smallest to largest.
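The steps can be sketched in Python as follows. The data and the function name `stem_and_leaf` are made up for illustration, and the sketch assumes non-negative whole numbers. It also leaves out stems that have no leaves, which a hand-drawn diagram would usually still list.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves} for non-negative whole numbers."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # leaf = rightmost digit, stem = the rest
        rows[stem].append(leaf)
    # Stems from smallest to largest; leaves sorted within each row.
    return {stem: sorted(rows[stem]) for stem in sorted(rows)}

data = [34, 15, 21, 38, 15, 27, 30, 12]  # hypothetical sample
diagram = stem_and_leaf(data)
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```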
44
How is a dot plot read?
Each point corresponds to a data value. Points of the same value are stacked
45
What are descriptive measures?
Numerical methods for summarizing numerical data, which include finding the center of a data set and describing its spread
46
What is the center of a data set?
The most typical value of the data set
47
What is the most typical value of a data set called?
Center
48
What are the 3 options for the center of a data set?
Mode, mean, median
49
20 students are asked who they are going to vote for in the next election. These are the results: UCP - 8, Liberal - 5, NDP - 3, Green - 4. What is the mode?
UCP
50
What is the mode of a data set?
The value that occurs most frequently
51
20 students are asked who they are going to vote for in the next election. These are the results: UCP - 8, Liberal - 5, NDP - 3, Green - 4. What type of data is this?
Categorical data
52
What value occurs most frequently in a data set?
The mode
53
16 students were asked how many email addresses they had. These are the results: 1 email - 3 students, 2 emails - 4 students, 3 emails - 7 students, 4 emails - 2 students. What is the mode?
3 emails
54
What is the mode in this data set? {2, 4, 1, 6, 5, 7}
There is no mode in this example as no value occurs more than once
55
What is the mode in this data set? {2, 4, 1, 2, 4, 6, 5}
Two modes: 2, 4
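A small Python sketch of the mode as used in these cards. The function name `modes` is hypothetical, and the "no mode when nothing repeats" rule is the convention from the cards rather than a universal one (Python's built-in `statistics.multimode` would return every value in that case).

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values; an empty list if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # convention used in the cards: nothing repeats, so no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 4, 1, 6, 5, 7]))     # no mode
print(modes([2, 4, 1, 2, 4, 6, 5]))  # two modes
votes = ["UCP"] * 8 + ["Liberal"] * 5 + ["NDP"] * 3 + ["Green"] * 4
print(modes(votes))                  # works for categorical data too
```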
56
What does this symbol mean? x̄
Pronounced "x bar". Denotes the mean of a data set
57
What does this symbol mean? ∑
Summation (or add up the included values)
58
What is the mean for the following data set? {5, 7, 10, 13, 15}
x̄ = ∑x / n; ∑x = 5 + 7 + 10 + 13 + 15 = 50; n = 5; x̄ = 50 / 5 = 10
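The same calculation as a minimal Python sketch (the function name `mean` is just an illustrative choice):

```python
def mean(values):
    """Sum of the values (∑x) divided by how many there are (n)."""
    return sum(values) / len(values)

x_bar = mean([5, 7, 10, 13, 15])
print(x_bar)
```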
59
How do we denote a sample mean?
x̄ (pronounced "x bar")
60
How do we denote a population mean?
μ (pronounced "mu")
61
How do you find the mean of the population?
μ = ∑x / N. Add up the individual values of the whole population, then divide by the number of individuals in the entire population (N)
62
Is the sample mean the same as the population mean?
No, the sample mean is only an estimate of the population mean
63
Because the sample mean is only an estimate of the population mean, what do we introduce?
Sampling error
64
How can we measure a sampling error?
By using statistical inferential methods (if we learn this later, I don't know it yet lol)
65
What are the steps to finding the median?
Sort the data values from smallest to largest. If the number of data values is odd, the median is the middle value of the sorted data. If the number of data values is even, the median is the average of the two middle values of the sorted data.
66
What is the median?
A numerical value separating the higher half of values in a data set from the lower half
67
What is the numerical value separating the higher half of values in a data set from the lower half?
Median
68
How do you determine the median if the number of data values in a set is odd?
It is the middle value of the sorted data
69
How do you determine the median if the number of data values in a set is even?
It is the average of the two values in the middle of the sorted data
70
Find the median in the data set {4, 7, 9, 12, 101}
9. It's just the middle number in the ordered data set
71
Find the median of the following data set {1, 5, 2, 7, 9}
Reorder to 1, 2, 5, 7, 9. Median is 5
72
Find the median of the following data set {3, 6, 2, 8, 4, 7}
Reorder to 2, 3, 4, 6, 7, 8. Median is the average of 4 and 6: (4 + 6) / 2 = 5
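The odd and even cases can be sketched in Python like this (the function name `median` is an illustrative choice; the data sets are the ones from the cards):

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle pair

print(median([4, 7, 9, 12, 101]))
print(median([1, 5, 2, 7, 9]))
print(median([3, 6, 2, 8, 4, 7]))
```

In the first data set the value 101 has no effect on the median, which is one way to see why the median is preferred when there are outliers.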
73
What can be used to describe the center of a data set?
mode, mean, median
74
How do we describe the center of categorical data?
Mode
75
What are the most common ways to find the center of a data set?
The mean and median are used more commonly than the mode
76
If a data set does not have outliers and its distribution is symmetric, what method should be used for describing the center of the data?
Mean
77
If a data set has outliers, what method should be used for describing the center of the data?
Median
78
How do we determine which method should be used for describing the center of data?
Mode - used for categorical data; Mean - used for numerical data sets that have a symmetric distribution and no outliers; Median - used for numerical data sets that have outliers
79
What is an outlier in a data set?
Observations very far away from most data values
80
What can be used to describe the spread of a numerical data set?
Range; interquartile range (IQR); standard deviation
81
How do we calculate range?
Range = maximum-minimum
82
Determine the range of the following data set: {2, 8, 12, 38, 58}
Range = max - min = 58 - 2 = 56
83
Determine the range of the following data set: {38, 12, 39, 24, 24, 5}
Range = max - min = 39 - 5 = 34
84
What equation is used to determine the IQR?
IQR = Q3 - Q1
85
What does IQR stand for?
Interquartile Range
86
How much of the data set is included in the IQR?
The middle 50% of the data values
87
What does a small/large IQR tell us about the data?
Small IQR - small spread of the middle data values; Large IQR - large spread of the middle data values
88
What is Q2 equivalent to?
The median
89
What are the steps to determining the IQR?
1. Arrange the data values in increasing order and determine the median (Q2). 2. Find the lower half and the upper half of the data set. 3. Find Q1, which is the median of the lower half, and Q3, which is the median of the upper half. 4. IQR = Q3 - Q1
90
Determine the IQR of the following data set: {13, 15, 21, 25, 26, 27, 30, 32, 34, 35, 38, 41, 43, 236}
Q2 (median) = 31; Q1 = 25; Q3 = 38; IQR = Q3 - Q1 = 38 - 25 = 13
91
Determine the IQR of the following data set: {13, 15, 16, 20, 21, 25, 26, 27, 30, 31, 32, 32, 34, 35, 38, 38, 41, 43, 46}
Q2 (median) = 31; Q1 = 23; Q3 = 36.5 (with 19 values, the median 31 is counted in both the lower and the upper half); IQR = Q3 - Q1 = 36.5 - 23 = 13.5
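A Python sketch that reproduces both IQR answers. It rests on one assumption inferred from the 19-value answer: when the number of values is odd, the median is included in both halves. Other textbooks and software use different quartile rules and can give slightly different numbers. The function names are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 0:
        lower, upper = ordered[:half], ordered[half:]
    else:
        # Assumption: the middle value belongs to both halves.
        lower, upper = ordered[:half + 1], ordered[half:]
    return median(lower), median(ordered), median(upper)

def iqr(values):
    q1, _, q3 = quartiles(values)
    return q3 - q1

a = [13, 15, 21, 25, 26, 27, 30, 32, 34, 35, 38, 41, 43, 236]
b = [13, 15, 16, 20, 21, 25, 26, 27, 30, 31, 32, 32, 34, 35, 38, 38, 41, 43, 46]
print(quartiles(a), iqr(a))
print(quartiles(b), iqr(b))
```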
92
What is the best way to describe the spread of a data set when there are outliers?
IQR
93
While the range of a data set is easy to find, what is it very sensitive to?
Extreme values or outliers
94
What does xi mean?
Data values in a set
95
What is a standard deviation?
The "average" distance between data values and the sample mean
96
What value determines the "average" distance between data values and the sample mean?
Standard deviation
97
What is the notation for sample standard deviation?
s
98
What does s stand for?
Standard deviation in a sample
99
Which standard deviation equation needs to be used if you only have the sums but not the individual values?
The computing formula
100
Which standard deviation equation should be used if you have all the individual values?
Either the defining formula or computing formula
101
What is the difference in outcome (or answer) between the defining and computing formulas of standard deviation?
Nothing, the answers are the same, they just get you there a different way
102
What is the defining formula for standard deviation?
$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$ (the square root of: the sum of the squared deviations from the mean, divided by n - 1)
103
What is the computing formula for standard deviation?
$s = \sqrt{\dfrac{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}}{n - 1}}$ (the square root of: (∑xi squared) minus (∑xi) squared over n, all divided by n - 1)
104
What is the difference between (∑xi) squared and (∑xi squared)
(∑xi) squared = the values are added and then the total is squared; (∑xi squared) = the values are squared and then added
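A Python sketch comparing the defining and computing formulas on the data set {5, 7, 10, 13, 15} from the mean card. The function names `sd_defining` and `sd_computing` are made up for the example.

```python
import math

def sd_defining(values):
    """Defining formula: needs every individual value."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

def sd_computing(sum_x, sum_x_squared, n):
    """Computing formula: needs only the sum of x, the sum of x squared, and n."""
    return math.sqrt((sum_x_squared - sum_x ** 2 / n) / (n - 1))

data = [5, 7, 10, 13, 15]
sum_x = sum(data)                        # values added: 50
sum_x_squared = sum(x * x for x in data) # values squared, then added: 568

s1 = sd_defining(data)
s2 = sd_computing(sum_x, sum_x_squared, len(data))
print(s1, s2)
```

Both routes reduce to the square root of 68 / 4 = 17 here, so the two results agree (up to floating-point rounding).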
105
What does the value of s tell us about the spread of a set of data values?
It tells us the "average" distance between data values and the sample mean. If the s value is large, the spread is large; if the s value is small, the spread is small
106
Generally speaking, if a data set has no outliers and is not skewed, what methods should be used to describe its center and spread?
Mean and standard deviation, respectively
107
What is standard deviation sensitive to?
Outliers
108
If a data set has outliers and is skewed, what methods should be used to describe its center and spread?
Median and IQR, respectively
109
What is μ?
The population mean (pronounced "mu")
110
What is σ?
The population standard deviation (pronounced "sigma")
111
What is the population standard deviation denoted by?
σ (pronounced "sigma")
112
Why are the population mean (μ) and population standard deviation (σ) usually unknown?
Because ALL of the population values are needed, but this is often impossible to obtain
113
What is a parameter?
A descriptive measure for a population, such as the population mean (μ) or the population standard deviation (σ)
114
Is a parameter fixed or variable?
It is fixed, for example, a population has only one mean (μ)
115
What are statistics?
Descriptive measures for a sample such as sample mean (x̄) and sample standard deviation (s)
116
Are statistics fixed or variable?
They are variable; each sample is going to have slightly different values and therefore slightly different sample means (x̄) and sample standard deviations (s)
117
What are the properties of parameters?
Fixed; usually unknown
118
What are the properties of statistics?
Easily calculated from a given sample; vary from sample to sample
119
What is the five-number summary of a data set?
Minimum; Q1; Q2 (median); Q3; Maximum
120
What is a boxplot used for?
Provide a graphical display of the center and variation of a numerical data set
121
What is a boxplot based on?
The five-number summary
122
What are the steps to creating a box plot?
1. Draw short horizontal lines at Q1, Q2, and Q3, then connect them with vertical lines to form a box. 2. Find potential outliers, which are data values < lower limit or > upper limit, and mark these outliers with dots in the boxplot. 3. Find the max and min of the data values that are NOT outliers and draw short horizontal lines at these values; draw a "whisker" from the box to each of these lines.
123
How do you find the upper and lower limits of a box plot?
Lower limit = Q1 - 1.5 × IQR; Upper limit = Q3 + 1.5 × IQR
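A Python sketch of the limits, using the 14-value data set from the IQR cards (Q1 = 25, Q3 = 38). It takes the lower limit as Q1 - 1.5 × IQR and the upper limit as Q3 + 1.5 × IQR; the function name `outlier_limits` is hypothetical.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) for flagging potential outliers."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [13, 15, 21, 25, 26, 27, 30, 32, 34, 35, 38, 41, 43, 236]
lower, upper = outlier_limits(25, 38)  # Q1 and Q3 taken from the IQR card

outliers = [x for x in data if x < lower or x > upper]
inside = [x for x in data if lower <= x <= upper]
whisker_ends = (min(inside), max(inside))  # where the whiskers stop

print(lower, upper)
print(outliers, whisker_ends)
```

Here 236 falls above the upper limit, so it would be drawn as a dot and the upper whisker would stop at 43.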
124
What can we tell about the data set distribution when a boxplot has an upper whisker that is longer than the lower whisker and there is a large distance between the Q2-Q3 with a small distance between Q1-Q2?
It is right skewed
125
How can you tell that a data set has a right skewed distribution when looking at a boxplot?
Upper whisker is longer than lower whisker; large distance between Q2 and Q3, small distance between Q1 and Q2
126
How can you tell that a data set has a left skewed distribution when looking at a boxplot?
Lower whisker is longer than upper whisker; large distance between Q1 and Q2, small distance between Q2 and Q3
127
How can you tell that a data set has a bell shaped distribution when looking at a boxplot?
Upper and lower whiskers have equal lengths; the box in the middle is divided into 2 equal parts
128
What can we tell about the data set distribution when a boxplot has a lower whisker that is longer than the upper whisker and there is a large distance between the Q1-Q2 with a small distance between Q2-Q3?
Left skewed distribution
129
What can we tell about the data set distribution when the upper and lower whiskers of a boxplot have equal lengths and the box in the middle is divided into 2 equal parts?
It has a bell shaped distribution