Chance and data Flashcards
Bar graphs
(Always support statements with statistical data from graphs)
- Shape
(Eg. Both graphs have a similar shape because both dot plots are unimodal)
- Symmetry
(Eg. Both dot plots are reasonably symmetrical, but both have a few older competitors, which skews the distributions slightly to the right)
- Shift
(Eg. The peak on the athletics graph is located higher up the age scale than that for swimmers)
- Overlap
(Eg. The ages of the middle 50% of the competitors are much the same)
- Centre
(Eg. The median age of swimmers is younger than the median age of athletics competitors)
- Spread
(Eg. The age range for athletics is larger than that for swimmers)
Writing probabilities
- Probabilities can be written as fractions, decimals or percentages
- Probabilities cannot be less than 0 or greater than 1
Converting probabilities
• fraction > decimal (Divide numerator by denominator)
• decimal > percentage (Multiply decimal by 100)
• percentage > fraction (Write the percentage as a fraction over 100 and simplify)
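The three conversions above can be sketched in Python. This is a minimal illustration only; the function names and the example value 3/4 are assumptions, not part of the original cards.

```python
from fractions import Fraction

def fraction_to_decimal(numerator, denominator):
    # Divide the numerator by the denominator.
    return numerator / denominator

def decimal_to_percentage(decimal):
    # Multiply the decimal by 100.
    return decimal * 100

def percentage_to_fraction(percentage):
    # Write the (whole-number) percentage over 100; Fraction simplifies it.
    return Fraction(percentage, 100)

decimal = fraction_to_decimal(3, 4)          # 0.75
percentage = decimal_to_percentage(decimal)  # 75.0
fraction = percentage_to_fraction(75)        # 3/4
print(decimal, percentage, fraction)
```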
Probability equation
Theoretical probability
Probability (event) = Number of favorable outcomes / Total possible number of outcomes
(Number of favorable outcomes is how many times the result should occur)
Probability equation
Experimental probability
Probability (event) = Number of times the event occurred / Total number of trials
(Number of times the event occurred is how many times the result did occur)
• This is used when the probability of an event is difficult or impossible to calculate. Many trials are done and the number of times the event occurs is recorded. The true value of the probability will not be known, but the greater the number of trials, the closer the estimated probability is likely to be to the actual probability.
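As a rough sketch of the difference between the two kinds of probability, the Python below simulates rolling a fair die and compares the experimental estimate for "rolling a six" with the theoretical value. The die, the seed and the number of trials are assumptions chosen for illustration.

```python
import random

def theoretical_probability(favorable, total):
    # Number of favorable outcomes / total possible number of outcomes.
    return favorable / total

def experimental_probability(times_occurred, trials):
    # Number of times the event occurred / total number of trials.
    return times_occurred / trials

rng = random.Random(1)   # fixed seed so the run is repeatable
trials = 60000           # many trials give a closer estimate
sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)

theoretical = theoretical_probability(1, 6)
experimental = experimental_probability(sixes, trials)
print(theoretical, experimental)
```

With fewer trials (say 20) the experimental value would usually sit further from 1/6.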
Expected number of outcomes equation
Expected number of outcomes = Probability (event) x Number of trials
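A small sketch of this equation in Python, assuming an event with probability 0.25 (for example, drawing a heart from a full deck) repeated 80 times. Both numbers are made up for illustration.

```python
def expected_outcomes(probability, trials):
    # Expected number of outcomes = Probability (event) x Number of trials.
    return probability * trials

expected_hearts = expected_outcomes(0.25, 80)
print(expected_hearts)  # 20.0
```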
Combining probabilities
(Probability tree)
For calculating probabilities where several events occur
- Make a probability tree by deciding what the events are, and in what order they occur. Write the events at ends of the branches.
- Write the probability of each event on the middle of its branch, and check that the probabilities on each set of branches from the same point add to 1.
- Calculate the probabilities at each end by multiplying the probabilities along each branch, and write the probability at the end of each branch.
- To find the overall probability for a question…
If one event occurs OR another event occurs > Add the probabilities at the ends of the relevant branches
If one event occurs AND another event occurs > Multiply the probabilities along the branch.
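The tree method can be sketched in Python as follows. The example is hypothetical: a bag holds 3 red and 2 blue counters, and two counters are drawn with replacement, so both stages use the same probabilities. Fractions are used so the answers stay exact.

```python
from fractions import Fraction
from itertools import product

# Probabilities on one set of branches; they must add to 1.
stage = {"red": Fraction(3, 5), "blue": Fraction(2, 5)}
assert sum(stage.values()) == 1

# AND: multiply along each branch to get the probability at its end.
tree = {}
for first, second in product(stage, repeat=2):
    tree[(first, second)] = stage[first] * stage[second]

# The probabilities at the ends of all branches also add to 1.
assert sum(tree.values()) == 1

# OR: add the ends of the branches that answer the question.
# "One of each colour" is red then blue OR blue then red.
p_one_of_each = tree[("red", "blue")] + tree[("blue", "red")]
print(p_one_of_each)  # 12/25
```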
Combining probabilities
(Two-way frequency table)
For calculating probabilities where several events occur
• When given a table with probabilities, first use the size of the entire population given to calculate the actual numbers, and then fill them in.
• To find the overall probability for a question…
If one event occurs OR another event occurs > Add the probabilities
If one event occurs AND another event occurs > Multiply the probabilities.
(Tick the boxes to which they apply to help)
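One possible way to work with a two-way table in Python is shown below. The categories, proportions and population size of 200 are all invented for illustration. The sketch first turns the proportions into actual numbers, then "ticks" the cells a question applies to and divides their total by the population.

```python
from fractions import Fraction

total = 200  # hypothetical size of the entire population

# Hypothetical table of proportions: (year level, activity).
proportions = {
    ("Year 9", "sport"): 0.30,
    ("Year 9", "no sport"): 0.20,
    ("Year 10", "sport"): 0.35,
    ("Year 10", "no sport"): 0.15,
}

# Calculate the actual numbers for each cell of the table.
counts = {cell: round(p * total) for cell, p in proportions.items()}

def probability(condition):
    # Add up the cells the condition applies to, then divide by the total.
    matching = sum(n for cell, n in counts.items() if condition(*cell))
    return Fraction(matching, total)

p_and = probability(lambda year, activity: year == "Year 9" and activity == "sport")
p_or = probability(lambda year, activity: year == "Year 9" or activity == "sport")
print(p_and, p_or)  # 3/10 17/20
```

Counting ticked cells means a cell that satisfies both parts of an OR question is only counted once.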
Data handling
When collecting data we take a sample from a population
• A sample of 30 is considered to be sufficient for most purposes. A larger sample means you can have more confidence in findings.
• Bias occurs when some members of the population are more likely than others to be selected for the sample so that it does not accurately represent the population.
(Eg. ‘self-selected’ samples)
• To avoid bias, every member of the population should have an equal chance of being sampled
(Eg. ‘random’ samples)
‘Self selected’ samples
‘Self selected’ samples occur if a member of the population decides whether they will be selected or not.
Eg. Ringing a radio station, filling in a form, going to a website to give feedback, completing a survey
(People may choose not to respond, and only those with an interest in the topic of the survey will be in the sample)
Eg. Surveying in a particular location
(Only those who go to that location and have time to stop and answer will be in the sample)
‘Random’ samples
‘Random’ samples occur if every member of the population has an equal chance of being selected.
Eg. Writing names on equal-sized pieces of paper and drawing them from a hat, giving every member of the population a number and using random numbers to decide who is selected, using random numbers to decide who will be selected from the electoral roll, or selecting every (5th) person as the (____).
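Two of these sampling methods can be sketched in Python. The population of 100 numbered people, the sample size of 30 and the random starting point for the "every 5th person" method are assumptions made for this example.

```python
import random

# Hypothetical population: 100 people, each given a number.
population = [f"Person {i}" for i in range(1, 101)]
rng = random.Random(42)  # fixed seed so the run is repeatable

# Like drawing names from a hat: every member has an equal chance.
random_sample = rng.sample(population, 30)

# Selecting every 5th person, starting from a randomly chosen position
# (the random start is an assumption of this sketch).
start = rng.randrange(5)
systematic_sample = population[start::5]

print(len(random_sample), len(systematic_sample))  # 30 20
```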
Measures of center
Measures of center give a measure of where the middle of a distribution lies.
Mean, median, mode
• The median is the middle data value and is usually the best measure of centre, as it is not distorted by very large or small values and can always be calculated for a set of data. The mean is the sum of all data values / the total number of data values; it represents the average data value and is therefore distorted by very large or small values. The mode is the data value which occurs most frequently; it is unreliable as a measure of centre because there are often two modes or no mode (if there are more than 2 modes, there is no mode).
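A short Python sketch using the standard `statistics` module shows how one very large value pulls the mean much more than the median. The ages are made-up data.

```python
from statistics import mean, median, multimode

ages = [12, 13, 13, 14, 15, 16, 40]   # 40 is a very large value
ages_without_40 = [12, 13, 13, 14, 15, 16]

print(mean(ages), mean(ages_without_40))      # the mean shifts by several years
print(median(ages), median(ages_without_40))  # 14 13.5
print(multimode(ages))                        # [13]

mean_shift = abs(mean(ages) - mean(ages_without_40))
median_shift = abs(median(ages) - median(ages_without_40))
```

`multimode` returns every value tied for most frequent, so a list with more than one entry signals more than one mode.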
Measures of spread
Measures of spread give a measure of how widely spread the data is.
Upper quartile, lower quartile, inter-quartile range, range
• The inter-quartile range is the difference between the upper and lower quartiles (IQR = UQ - LQ) and is usually the best measure of spread, as it is not distorted by very large or small values. The range is the difference between the maximum and minimum values and is therefore distorted by a very large maximum or a very small minimum.
Upper quartile and lower quartile
(UQ) The upper quartile is the middle data value of the top half of the data
(LQ) The lower quartile is the middle data value of the bottom half of the data
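The quartile definitions above can be sketched in Python as "the median of each half". One assumption is labeled here: when there is an odd number of values, the overall median is left out of both halves. This is a common classroom convention, but other conventions exist and give slightly different quartiles.

```python
from statistics import median

def five_number_summary(data):
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]
    # Assumption: with an odd count, the middle value is in neither half.
    upper_half = values[half + 1:] if n % 2 else values[half:]
    return {
        "min": values[0],
        "LQ": median(lower_half),
        "median": median(values),
        "UQ": median(upper_half),
        "max": values[-1],
    }

ages = [12, 13, 13, 14, 15, 16, 40]  # made-up data
summary = five_number_summary(ages)
iqr = summary["UQ"] - summary["LQ"]            # IQR = UQ - LQ
data_range = summary["max"] - summary["min"]   # range = max - min
print(summary, iqr, data_range)
```

Here the single value 40 makes the range 28, while the IQR is only 3.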
Displaying data
- Dot plots
A visual representation of each data point
- Box plots
A visual representation of each 25% of the data
(Useful for representing data and comparing sets of data, but it does not show the distribution of all the data points and it is affected by very large or small values)
How to describe dot plots/box plots
Comment on…
- Mean and median (data distribution)
- Whisker length/skew OR Modes > 1
- Gaps and clusters in data (if relevant)
- This shows… (in context to data, eg. women are better than men)
Describing dot plots/box plots
(If mean and median are close) (mean, skew)
- The mean (Mean) and the median (Median) are very close because the data is distributed fairly symmetrically around the mean.
- The length of the right whisker is much longer than the left because of just one value (Maximum value).
- The data is clustered in the center because the gaps within the box are smaller than the lengths of the whiskers, which means the lower quartile, median and upper quartile are close together (small gaps).
Describing dot plots/box plots
(If mean and median are close) (median, even)
- The mean (Mean) and the median (Median) are very close because the data is distributed fairly symmetrically around the median.
- The dot plot shows the data is bimodal (has 2 modes). This is not shown by the box and whisker plot.
- The data is fairly evenly spread because there is a gap in the middle of the data, which means the minimum, lower quartile, median, upper quartile, and maximum are fairly evenly spread.
Describing dot plots/box plots
If mean and median are not close
- The mean (Mean) is larger than the median (Median) because the ‘tail’ of bigger numbers to the right increases the mean, but not the median.
- The data is skewed to the right.
- The data is mostly clustered to the left, because the length of the right whisker is larger than the left, which means the minimum, lower quartile and median are close together (small gaps).
Drawing conclusions from box plots
(for the ‘lengths of individuals’)
A. |——-| | |————|
B. |———-| | |———|
- If the boxes of each sample overlap with each other and both medians lie within the box of the other sample.
(comment on center)
The (lengths) in sample B tend to be greater than those in sample A, because the box in B extends further to the right than that of A. This is supported by the median in B (__) being bigger than the median in A (__).
(comment on spread)
Although the range is the same for both samples (__), the IQR for B (__) is larger than that of A (__), so the (lengths) of the middle 50% in B are more spread than those in A.
For the population :
We cannot conclude that (individuals) in population B tend to be (longer) than (individuals) in population A because both of the medians lie within the middle 50% of the other sample, and the boxes overlap.
Drawing conclusions from box plots
(for the ‘lengths of individuals’)
A. |——-| | |————|
B. |————–| | |———|
- If the boxes of each sample overlap with each other and one or both medians lie outside the box of the other sample.
(comment on center)
The (lengths) in sample B tend to be greater than those in sample A, because the box in B extends further to the right than that of A. This is supported by the median in B (__) being bigger than the median in A (__).
(comment on spread)
Although the range is the same for both samples (__), the IQR for B (__) is larger than that of A (__), so the (lengths) of the middle 50% in B are slightly more spread than those in A.
For the population :
We can conclude that (individuals) in population B tend to be (longer) than (individuals) in population A on average because, although the middle 50% of each sample overlap, one or both medians lie outside the box of the other sample.
Drawing conclusions from box plots
(for the ‘lengths of individuals’)
A. |——-| | |————–|
B. |——————-| | |———|
- If the boxes of each sample do not overlap with each other and both medians lie outside the box of the other sample.
(comment on center)
The (lengths) in sample B tend to be greater than those in sample A, because the box in B extends further to the right than that of A. This is supported by the median in B (__) being bigger than the median in A (__).
(comment on spread)
The IQR for B (__) is larger than that of A (__), so the (lengths) of the middle 50% in B are more spread than those in A.
For the population :
We can conclude that (individuals) in population B tend to be (longer) than (individuals) in population A on average because the boxes do not overlap and both medians lie outside the boxes of the other group.
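The decision rule running through these three cards can be sketched in Python. This is only one reading of the cards: a conclusion about the populations is made when at least one median lies outside the box of the other sample. The quartile and median values below are hypothetical.

```python
def boxes_overlap(a, b):
    # The boxes (middle 50%) overlap if neither box ends before the other starts.
    return a["LQ"] <= b["UQ"] and b["LQ"] <= a["UQ"]

def median_outside_box(sample, other):
    # True if the sample's median is not within the other sample's box.
    return not (other["LQ"] <= sample["median"] <= other["UQ"])

def can_conclude(a, b):
    # Conclude about the populations if one or both medians lie outside
    # the other sample's box.
    return median_outside_box(a, b) or median_outside_box(b, a)

sample_a = {"LQ": 10, "median": 12, "UQ": 15}
b_both_inside = {"LQ": 11, "median": 14, "UQ": 18}  # first card
b_one_outside = {"LQ": 13, "median": 16, "UQ": 19}  # second card
b_no_overlap = {"LQ": 16, "median": 18, "UQ": 21}   # third card

for sample_b in (b_both_inside, b_one_outside, b_no_overlap):
    print(boxes_overlap(sample_a, sample_b), can_conclude(sample_a, sample_b))
```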
Time series
- Draw a trend line
- State the overall trend
- Describe seasonal variation
• State the variation around the trend line (remains consistent, increases/decreases)
- Unusual features
- Limitations of data
(If two graphs are given, make sure comments are comparative).
Time series
- State the overall trend
(If trend is positive) For both (graph 1) and (graph 2), the overall trend is increasing over the time frame. This is because the (y axis value) increases by approximately (__) in (graph 1) and by approximately (__) in (graph 2) over the (__) years shown. This increase is greater for (y axis value) in (graph 1).
(If trend is constant) For both (graph 1) and (graph 2), the overall trend is steady over the time frame. This is because the (y axis value) increases and decreases by approximately (__) over the (__) years shown.