Biostatistics lecture 5: variables and scales (flashcards)

what is a variable

a condition/ characteristic that can change or have different values 

defining characteristics

  • attribute that describes a person/ place/ thing
  • value can vary between different entities


qualitative vs quantitative variables 


qualitative:  values that are names/ labels 

quantitative: numeric variables that measure quantity


discrete vs continuous variables

continuous: a variable that can take any value between its minimum and maximum values

discrete: a variable that can take only certain separate values (e.g. whole-number counts), not every value between its minimum and maximum


univariate vs bivariate data 

Univariate data: when a study consists of only one variable

Bivariate data: when a study examines the relationship between 2 variables


what is a nominal scale 

  • lowest  statistical measurement level
  • this scale is given to items that are divided into categories without any order or structure
    • e.g.
      • gender
      • eye colour
      • blood type


what is the ordinal scale 

consists of variables that have an inherent order among the different categories

  • a ranking of responses that may have different meanings among individuals
  • shows the rank order of the categories but not the relative distance between them, as the distances are not equal
  • properties of ordinal scale:
    1. identity: the quality being measured
    2. magnitude: the amount of the quality being measured, which lets values be ranked


what is the interval scale 

variables that have constant, equal distances between values, but the zero point is arbitrary


  1. identity
  2. magnitude
  3. equal distance: the difference between adjacent points is the same all along the scale

e.g. IQ score, numeric pain scale



what is a ratio scale 

top level of measurement, with all the properties of an abstract number system plus an absolute zero


  1. identity
  2. magnitude
  3. equal distance
  4. absolute zero: allows statements about how many times greater one case is than another
  • allows use of all mathematical operations
    • e.g.
      • weight
      • pulse rate
      • respiratory rate


what is a measure of central tendency / central location 

a single value that attempts to describe a set of data by identifying the central position within the data set

  • mean 
  • median 
  • mode


describe the mean

  • most familiar measure of central tendency
  • acts as a model of the data set: it is often not one of the observed values, but it is the single value that minimises the total squared error when used to stand in for every value in the set
  • used with discrete and continuous data, most often continuous
  • sum of all the values divided by the number of values in the data set
  • includes every value of the data set
  • the only measure of central tendency for which the sum of the deviations of each value from the mean is always 0
  • sample mean = x̄ (x bar)
  • population mean = µ
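
A minimal sketch in Python (standard library only; the scores are made up for illustration) of the mean and of the property that the deviations from the mean sum to 0:

```python
from statistics import mean

scores = [2, 4, 4, 5, 10]          # hypothetical data set

x_bar = mean(scores)               # sum of the values / number of values
deviations = [x - x_bar for x in scores]
total_deviation = sum(deviations)  # positive and negative deviations cancel

print("mean:", x_bar)
print("deviations:", deviations)
print("sum of deviations:", total_deviation)
```

In this made-up set the mean happens to equal one of the scores; with real data it often does not.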


what is the main disadvantage of the mean 

very susceptible to outliers (values that are unusually small or large compared with the rest of the data set)

the mean can be skewed by these values

if so, the median is a better measure of central tendency
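
A rough illustration in Python (standard library only; the scores and the outlier are invented) of how one extreme value drags the mean much more than the median:

```python
from statistics import mean, median

scores = [14, 15, 16, 17, 18]   # hypothetical scores, no outlier
with_outlier = scores + [90]    # same scores plus one extreme value

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("without outlier: mean", mean_before, "median", median_before)
print("with outlier:    mean", round(mean_after, 2), "median", median_after)
```

Here the mean moves from 16 to about 28.3, while the median only moves from 16 to 16.5.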


when not to use the mean and use the median instead

presence of outliers

skewed distribution: the mean is pulled away from the centre, while the median stays closer to the centre and is less influenced

  • in a normal distribution: mean = median = mode



what is the median 

the middle score for a set of data that has been arranged in order of magnitude

  • less affected by outliers and skewed data than the mean
  • order the values and take the middle one; if there is an even number of values, take the mean of the two middle values


what is the mode 

most frequent score in the data set. 

  • highest bar on histogram
  • used for categorical data when the most popular option is sought after 


problems with the mode

  • non-unique
    • causes problems when 2 values are equally popular
    • even more problematic for continuous data, as finding an exact mode is unlikely => the mode is rarely used with continuous data
  • if the mode is far from the rest of the data in the set, it does not represent the central location well
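
A small sketch of the non-uniqueness problem, assuming Python 3.8+ for `statistics.multimode`; both data sets are invented for illustration:

```python
from statistics import multimode

# Categorical data with two equally popular values: the mode is not unique.
blood_types = ["A", "O", "B", "O", "A", "AB"]
modes = multimode(blood_types)

# Continuous data: every reading is distinct, so every value ties as "the mode".
temperatures = [36.6, 36.9, 37.1, 36.4]
continuous_modes = multimode(temperatures)

print("modes of blood types:", modes)
print("modes of temperatures:", continuous_modes)
```

`multimode` returns every value that shares the highest count, which makes the tie visible instead of silently picking one.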


which measures are best used in normal and skewed distributions

normal distribution

  • mean or median can be used, but the mean is ideal because:
    • it has the least amount of error, since it includes all values in the data set
    • any change in any score changes the value of the mean, which is not the case for the median or mode


skewed distribution

  • the mean is dragged in the direction of the skew, so the median is best
  • the greater the skew, the greater the difference between the mean and median


data expected to be normal but flagged as non-normal by normality tests

  • as a rule of thumb, use the median rather than the mean, unless there is no real difference between the median and mean


match the variable to the type of central tendency preferred

nominal = Mode

ordinal = Median

interval/ratio (not skewed) = Mean

interval/ratio (skewed) = Median


what is a measure of spread / measure of dispersion

describes the variability in a sample/ population

  • used alongside measures of central tendency to give a description of the overall data


what is the purpose of measuring a data spread

  • shows how well the central tendency represents the data
  • a large spread suggests large differences between individual scores, and vice versa for a small spread
  • measures of spread include:
    • range
    • quartiles 
    • absolute deviation
    • standard deviation 



what is the range

the difference between the highest and lowest scores in a data set; it is the simplest measure of spread

  • range = max value - min value
  • sets the boundaries for scores
    • useful for measuring critically high or low thresholds
  • helps detect errors when inputting data
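
A quick sketch in Python of the range, and of how an implausibly large range can flag a data-entry error; the pulse rates and the mistyped value are hypothetical:

```python
pulse_rates = [72, 65, 80, 58, 91]                 # hypothetical, beats per minute
data_range = max(pulse_rates) - min(pulse_rates)   # max value - min value

# Suppose a reading of 72 is mistyped as 720: the range jumps to an implausible size.
with_typo = pulse_rates + [720]
typo_range = max(with_typo) - min(with_typo)

print("range:", data_range)
print("range with typo:", typo_range)
```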


what are quartiles and the interquartile range

quartiles: break the ordered data into quarters

even number of scores: each quartile is the mean of the 2 scores either side of the quarter position in the data set

odd number of scores: the values at the 25%, 50% and 75% positions are the quartiles

Q2 = median


benefits of quartiles and what is the interquartile range

  • like the median, less affected by outliers and skewed data, so they are the best choice for measuring the spread of such data sets
  • interquartile range = the difference between Q3 and Q1, which shows the range of the middle half of the scores in the distribution
    • Q3 - Q1 = interquartile range
  • semi-interquartile range: half the interquartile range = (Q3 - Q1) / 2
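
A sketch of these quantities using `statistics.quantiles` (Python 3.8+) on made-up scores. Several quartile conventions exist, so other software may give slightly different cut points; this uses the function's default "exclusive" method:

```python
from statistics import median, quantiles

scores = [2, 4, 4, 5, 7, 9, 10]        # hypothetical data set (7 scores)

q1, q2, q3 = quantiles(scores, n=4)    # three cut points splitting the data into quarters
iqr = q3 - q1                          # interquartile range
semi_iqr = iqr / 2                     # semi-interquartile range

print("Q1, Q2, Q3:", q1, q2, q3)
print("Q2 equals the median:", q2 == median(scores))
print("IQR:", iqr, "semi-IQR:", semi_iqr)
```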


Drawback of quartiles

they don't take into account every score in the data set


what are absolute deviation, variance and standard deviation

measures that show the amount of deviation/variation that occurs around the mean score

the choice between absolute deviation, variance and standard deviation depends on the type of statistic


how to calculate absolute & mean absolute deviation

  • easiest way to calculate a deviation = individual score minus mean score
  • values above the mean give positive deviations and values below give negative ones
  • the signed deviations always total 0 because the positives and negatives cancel, so the signs are ignored and only absolute values are used = absolute deviation
  • total variability: the sum of the absolute deviations of all scores
  • mean absolute deviation = total variability divided by the number of scores
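
The steps above can be sketched in Python as follows (standard library only; the scores are invented for illustration):

```python
from statistics import mean

scores = [2, 4, 4, 5, 10]                    # hypothetical data set
x_bar = mean(scores)

signed = [x - x_bar for x in scores]         # individual score minus mean score
absolute = [abs(d) for d in signed]          # ignore the signs

total_variability = sum(absolute)            # sum of the absolute deviations
mean_abs_deviation = total_variability / len(scores)

print("signed deviations sum to:", sum(signed))
print("total variability:", total_variability)
print("mean absolute deviation:", mean_abs_deviation)
```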


how to calc variance

makes the deviations from the mean positive by squaring them

adding up the squared deviations gives the sum of squares

the sum of squares is divided by n (the number of scores) to give the variance


  • if the values in the data are spread out from the mean then the variance is a large number 
  • if the values are closer to the mean then the variance is small 
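
A minimal sketch of the variance calculation, dividing the sum of squares by n as described above. The helper name `population_variance` and both data sets are made up for illustration; `statistics.pvariance` is used only as a cross-check:

```python
from statistics import mean, pvariance

def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    x_bar = mean(values)
    sum_of_squares = sum((x - x_bar) ** 2 for x in values)
    return sum_of_squares / len(values)

spread_out = [2, 4, 4, 5, 10]   # hypothetical: values far from the mean
tight = [4, 5, 5, 5, 6]         # hypothetical: values close to the mean

var_spread = population_variance(spread_out)
var_tight = population_variance(tight)

print("spread-out data:", var_spread, "(library:", pvariance(spread_out), ")")
print("tight data:", var_tight)
```

Both sets have the same mean (5), but the spread-out set gives a much larger variance.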


problems with variance

  • squaring gives more weight to extreme scores, so the variance is susceptible to outliers
  • the units of variance are squared, so they differ from the units of the data set and can't be directly related to the data set values
  • calculating the standard deviation solves the units problem


what is the standard deviation 

a measure of the spread of scores within a data set

sample SDs differ from population SDs in how they are calculated


when to calculate the pop SD

  1. if data on the entire population is available
  2. if the sample is all you're interested in and you don't want to generalize your result

when to use the sample SD: if you have sample data and wish to generalize to the population


NB: the sample SD is not the deviation of the sample itself but an estimate of the population SD based on sample data
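
A short comparison of the two calculations in Python, using `statistics.pstdev` (divides the sum of squares by n) and `statistics.stdev` (divides by n - 1). The scores are invented for illustration:

```python
from statistics import pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set

population_sd = pstdev(scores)      # treats the scores as the whole population
sample_sd = stdev(scores)           # treats the scores as a sample; estimates the population SD

print("population SD:", population_sd)
print("sample SD:", round(sample_sd, 3))
```

The sample SD comes out slightly larger because it divides by n - 1 rather than n.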


which type of data should be used to calculate SD

  • SD is used along with the mean to summarize continuous data, NOT CATEGORICAL DATA
  • only appropriate if the data is normally distributed / not skewed



empirical rule: for a normal distribution, nearly all of the data will fall within three standard deviations of the mean



what are the three parts of the empirical rule

  1. about 68% of data fall within 1 SD of the mean: µ ± 1×SD
  2. about 95% fall within 2 SDs: µ ± 2×SD
  3. about 99.7% fall within 3 SDs: µ ± 3×SD


aka the 3 sigma rule
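
The three percentages can be checked with `statistics.NormalDist` (Python 3.8+). This is a sketch only: the mean of 100 and SD of 15 are arbitrary illustrative values, and `share_within` is a hypothetical helper name:

```python
from statistics import NormalDist

mu, sigma = 100, 15            # arbitrary illustrative mean and SD
dist = NormalDist(mu, sigma)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The proportions do not depend on the values chosen for µ and SD, only on the number of SDs from the mean.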