Statistics Flashcards
(31 cards)
discrete vs continuous data
discrete:
- can only take certain fixed values, eg shoe size
continuous:
- can take any value within a range, eg height
definition:
population
total set of possible values that could be selected for the sample
definition
sampling unit
a single member of the population
definition
sample
a selection of sampling units observed to make conclusions about population as a whole
definition
sampling frame
a list of all members of the population
advantages and disadvantages:
sample
advantages
- less time consuming/ expensive
- fewer people to respond
- less data to process than census
disadvantages:
* data may not be as accurate as census
* may not be large enough to give info about small sub groups of population
dis/advantages
census
pros
* should give accurate results
cons
* time consuming / expensive
* can’t be used when testing process destroys the item
* hard to process large quantity of data
Systematic sampling definition
A sample is formed by choosing members of a population at regular intervals using a list
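A minimal sketch of taking every k-th member from a list. The function name, the interval k = population size // sample size and the random starting point within the first k members are assumptions for illustration, not part of the card.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick n members of the list `frame` at a regular interval k."""
    k = len(frame) // n            # sampling interval (assumes n <= len(frame))
    rng = random.Random(seed)
    start = rng.randrange(k)       # assumed: random start among the first k members
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))        # made-up numbered list of 100 members
sample = systematic_sample(frame, 10, seed=1)
```

Each chosen member is exactly k places after the previous one in the list.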
stratified sampling
- population divided into specific groups & random sample taken from each group
- number chosen from each group = (group size / N) x n, so each group's share of the sample matches its share of the population (n = sample size, N = total population)
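A rough sketch of how many members to take from each group, assuming the usual rule (group size / N) x n. The group names and sizes are made up, and rounding to the nearest whole number is an assumption.

```python
def stratum_sizes(group_sizes, n):
    """Number to sample from each group: group size * n / N, rounded."""
    N = sum(group_sizes.values())  # total population
    # Rounding can make the total differ slightly from n for awkward sizes.
    return {name: round(size * n / N) for name, size in group_sizes.items()}

groups = {"year 12": 120, "year 13": 80}   # made-up population of 200
sizes = stratum_sizes(groups, 50)          # sample of 50
```

Here year 12 is 60% of the population, so it gets 60% of the sample.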
pros and cons of stratified sampling
PROS
* useful when very diff groups in population
* sample representative of population structure
* members selected randomly
CONS
* can’t be used if not possible to split population into specific groups
* same cons as simple random
opportunity sampling
sample is formed using available members of population who fit criteria
Pros and cons of opportunity sampling
PROS
* Quick and easy
* useful when list of population not possible
CONS
* unlikely to be representative of population structure
* likely to produce biased results
pros and cons of quota sampling
PROS
* useful when sampling frame not available
* sample will be representative of population structure
CONS
* may introduce bias as some members of the population may choose not to be sampled
in a data set
outliers are
any data points more than 2 standard deviations above or below the mean
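A small sketch of the 2 standard deviation rule. It assumes the standard deviation is the divide-by-n version (`statistics.pstdev`); the data values are made up.

```python
import statistics

def outliers_2sd(data):
    """Return values more than 2 standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)   # assumed: population SD (divide by n)
    return [x for x in data if abs(x - mean) > 2 * sd]

data = [10, 12, 11, 13, 12, 11, 30]   # made-up values
flagged = outliers_2sd(data)
```

With these values only 30 lies more than 2 standard deviations from the mean.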
in a box plot
outliers are
any data point that is more than 1.5 x IQR above the upper quartile or below the lower quartile
how to work out estimated mean in a frequency table
- mid interval value (x)
- frequency (f)
- Σfx / Σf
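The calculation (sum of f times x, divided by sum of f, with x the mid interval value) could be sketched like this. The table values are made up for illustration.

```python
def estimated_mean(classes):
    """classes: list of (lower bound, upper bound, frequency)."""
    total_fx = sum((lo + hi) / 2 * f for lo, hi, f in classes)  # sum of f * midpoint
    total_f = sum(f for _, _, f in classes)                     # sum of frequencies
    return total_fx / total_f

table = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]   # made-up grouped data
mean = estimated_mean(table)
```

Midpoints are 5, 15 and 25, so the sum of fx is 320 and the sum of f is 20.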
coding
measure of location is affected by:
measure of spread is affected by:
measure of location is affected by: all operations
measure of spread is affected by: only multiplication or division
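A quick check of the coding rule, assuming a coding of the form y = (x - a) / b with made-up values a = 3, b = 2 and made-up data. The standard deviation here is the divide-by-n version.

```python
import statistics

x = [2, 4, 6, 8]                      # made-up data
y = [(xi - 3) / 2 for xi in x]        # coded data: y = (x - 3) / 2

mean_x, sd_x = statistics.mean(x), statistics.pstdev(x)
mean_y, sd_y = statistics.mean(y), statistics.pstdev(y)

# Location follows every operation: mean_y = (mean_x - 3) / 2
# Spread ignores the subtraction:   sd_y = sd_x / 2
```

The subtraction shifts the mean but leaves the standard deviation alone; the division changes both.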
linear interpolation
what do you do to the position (eg n/4 for Q1) when finding quartiles / percentiles for discrete data?
- decimal number: round up and take that data value
- whole number: take average of that data value and the next one
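A sketch of the round up / average rule, assuming the position is n x percent / 100 counted from 1 in the ordered data. The function name and data are made up; percent is a whole number from 1 to 99.

```python
def value_at_percent(data, percent):
    """percent = 25 for Q1, 50 for the median, 75 for Q3."""
    ordered = sorted(data)
    whole, remainder = divmod(len(ordered) * percent, 100)
    if remainder == 0:
        # Position is a whole number: average that value and the next one.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Position is a decimal: round up (1-based position whole + 1).
    return ordered[whole]

data = [3, 5, 7, 8, 9, 11, 13, 15, 16, 20]   # made-up, n = 10
q1 = value_at_percent(data, 25)      # position 2.5, round up to 3rd value
median = value_at_percent(data, 50)  # position 5, average 5th and 6th
q3 = value_at_percent(data, 75)      # position 7.5, round up to 8th value
```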
How to work out outliers?
a value is an outlier if it is not in the range:
[Q1 - 1.5(IQR), Q3 + 1.5(IQR)]
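The range check could be sketched as follows. The function names and the quartile values are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return (lower limit, upper limit) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(x, q1, q3):
    low, high = outlier_fences(q1, q3)
    return x < low or x > high     # values exactly on a limit are not outliers

fences = outlier_fences(7, 15)     # made-up quartiles, IQR = 8
```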
2 events CANNOT be both:
independent and mutually exclusive
because
- when mutually exclusive: P(A n B) = 0
- when independent: P(A n B) = P(A) x P(B), which is not 0 when P(A) and P(B) are both non-zero, so the 2 conditions cannot both hold
to work out P(A | B’):
P(A n B’) / P(B’)
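A worked sketch of this formula, assuming P(A), P(B) and P(A n B) are known and using P(A n B') = P(A) - P(A n B). The probabilities are made up.

```python
def prob_a_given_not_b(p_a, p_b, p_a_and_b):
    """P(A | B') = P(A n B') / P(B')."""
    p_a_and_not_b = p_a - p_a_and_b   # part of A that lies outside B
    p_not_b = 1 - p_b
    return p_a_and_not_b / p_not_b

# Made-up values: P(A) = 0.5, P(B) = 0.4, P(A n B) = 0.2
result = prob_a_given_not_b(0.5, 0.4, 0.2)   # 0.3 / 0.6
```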
probability
condition for independence:
P(A n B) = P(A) x P(B)
condition for mutually exclusive:
P (A n B) = 0
What is a histogram?
- A histogram is for grouped continuous data, whereas a bar chart is for discrete or qualitative data
- no gaps between bars
- Whilst in a bar chart the frequency is read from the height of the bar, in a histogram the height of the bar is the frequency density
- On a histogram frequency density is plotted on the y-axis. This allows a histogram to be plotted for unequal class intervals
- It is particularly useful if data is spread out at either or both ends
- The area of each bar on a histogram will be proportional to the frequency in that class
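A short sketch of bar heights for unequal class widths, using frequency density = frequency / class width. The table values are made up.

```python
def frequency_densities(classes):
    """classes: list of (lower bound, upper bound, frequency)."""
    return [f / (hi - lo) for lo, hi, f in classes]

table = [(0, 10, 4), (10, 20, 10), (20, 40, 6)]   # made-up, last class is wider
heights = frequency_densities(table)

# Area of each bar = height x class width, which recovers the frequency.
areas = [h * (hi - lo) for h, (lo, hi, _) in zip(heights, table)]
```

The wider 20 to 40 class has the lowest bar even though its frequency is higher than the first class.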