lec 5 variables and scales Flashcards
what is a variable
a condition/ characteristic that can change or have different values
defining characteristics
- attribute that describes a person/ place/ thing
- value can vary between different entities
qualitative vs quantitative variables
qualitative: values that are names/ labels
quantitative: numeric variables that measure quantity
discrete vs continuous variables
continuous variable: a variable that can take any value between its minimum and maximum values
discrete: a variable that can take only certain separate values, not every value between its minimum and maximum
univariate vs bivariate data
Univariate data: when a study consists of only one variable
Bivariate data: when a study examines the relationship between 2 variables
what is a nominal scale
- lowest statistical measurement level
- this scale is given to items that are divided into categories without any order or structure
- e.g.
- gender
- eye colour
- blood type
what is the ordinal scale
consists of variables that have an inherent order among the different categories
- a ranking of responses that may have different meanings among individuals
- allows gross ordering but not the relative distance between categories, as the distances are not equal
- properties of ordinal scale:
- 1) Identity: the quality being measured
- 2) Magnitude: amount of the quality being measured, so values can be ranked as more or less than one another
what is the interval scale
variables that have constant, equal distances between values, but the zero point is arbitrary
properties:
- identity
- magnitude
- equal distance: the difference between adjacent points is the same along the whole scale
e.g. IQ score, pain scale w/ no,
what is a ratio scale
top level of measurement, with all the properties of an abstract number system and an absolute zero
properties
- identity
- magnitude
- equal distance
- absolute zero: allows statements about how many times greater one case is than another
- allows use of all mathematical operations
- e.g.
- weight
- pulse rate
- respiratory rate
what is a measure of central tendency / central location
a single value that attempts to describe a set of data by identifying the central position within the data set
- mean
- median
- mode
describe the mean
- most familiar measure of central tendency
- acts as a model of the data set: often not one of the observed values, but it minimises error when predicting any one value
- used with discrete and continuous data, the latter most commonly
- sum of all the values divided by the number of values in the data set
- includes every value of the data set
- the only measure of central tendency where the sum of the deviations of each value from the mean is always 0
- sample mean = x̄ (x-bar)
- population mean = µ
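A quick check of these properties, as a minimal Python sketch. The scores and the helper name `mean_of` are made up for illustration and are not from the lecture.

```python
# Hypothetical sample of scores (illustrative only)
scores = [2, 4, 4, 6, 9]

def mean_of(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)

x_bar = mean_of(scores)                     # sample mean
deviations = [x - x_bar for x in scores]    # each value minus the mean
total_deviation = sum(deviations)           # deviations cancel out

print(x_bar)            # 5.0 (not one of the observed scores)
print(total_deviation)  # 0.0
```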
what is the main disadvantage of the mean
very susceptible to outliers (values that are unusually small or large compared with the rest of the data set)
mean can be skewed by these values
if so the median is a better measure of central tendency
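To see the effect of an outlier, here is a small sketch using Python's standard `statistics` module. The salary figures are invented for illustration only.

```python
from statistics import mean, median

# Hypothetical salaries in thousands (illustrative only)
typical = [15, 18, 16, 14, 17]
with_outlier = typical + [95]       # one unusually large value added

mean_before = mean(typical)
median_before = median(typical)
mean_after = mean(with_outlier)     # pulled towards the outlier
median_after = median(with_outlier) # barely moves

print(mean_before, median_before)
print(mean_after, median_after)
```

With these numbers the mean roughly doubles while the median shifts by half a unit.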
when not to use the mean and use the median instead
presence of outliers
skewed distribution: the mean moves away from the centre, but the median stays central and is least influenced
- in a normal distribution: mean = median = mode
what is the median
the middle score for a set of data that has been arranged in order of magnitude
- least affected by outliers and skewed data
- order the values and find the middle; if there is an even number of values, take the mean of the two middle values
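The steps for finding the median can be sketched in Python as follows. The function name `median_of` and the scores are hypothetical, chosen only to show the odd and even cases.

```python
def median_of(values):
    """Middle score of the values once arranged in order of magnitude."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values

odd_scores = [65, 55, 89, 56, 35]        # ordered: 35, 55, 56, 65, 89
even_scores = [65, 55, 89, 56, 35, 14]   # ordered: 14, 35, 55, 56, 65, 89

print(median_of(odd_scores))   # 56
print(median_of(even_scores))  # 55.5
```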
what is the mode
most frequent score in the data set.
- highest bar on histogram
- used for categorical data when the most popular option is sought after
problems with the mode
- non unique: causes problems when 2 values are equally popular
- even more problematic for continuous data, as finding an exact mode is unlikely => mode is rarely used with continuous data
- if the mode is far from the rest of the data in the set then it is a poor measure of central tendency
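Both problems show up in a short Python sketch using `statistics.multimode` (Python 3.8+), which returns every value tied for most frequent. The transport answers and weights are invented for illustration.

```python
from statistics import multimode

# Hypothetical categorical answers: two options are equally popular
transport = ["bus", "car", "bus", "train", "car", "walk"]
modes = multimode(transport)
print(modes)  # ['bus', 'car'] -> no single mode

# Hypothetical continuous measurements: no value repeats,
# so every value ties for "most frequent" and the mode tells us nothing
weights = [61.2, 58.7, 63.1, 59.4]
continuous_modes = multimode(weights)
print(len(continuous_modes))  # 4
```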
which measures are best used in normal and skewed distributions
normal distribution
- mean or median can be used, but the mean is ideal because:
- it has the least amount of error, since it includes all values in the data set
- any change in the scores affects the value of the mean but not necessarily the mode/median
skewed distribution
- the mean is dragged in the direction of the skew so the median is best
- increased skew increases the difference between the mean and median
data expected to be normal but shown to be NON-NORMAL by normality tests
- median preferred over mean as a rule of thumb, unless there is no real difference between the median and mean
match the variable to the type of central tendency preferred
nominal = mode
ordinal = median
interval/ratio (not skewed) = mean
interval/ratio (skewed) = median
what is a measure of spread / measure of dispersion
describes the variability in a sample/population
- used alongside measures of central tendency to give a description of the overall data
what is the purpose of measuring a data spread
- shows how well the central tendency represents the data
- large spread suggests large differences between individual scores, and vice versa for a small spread
- measures of spread include:
- range
- quartiles
- absolute deviation
- standard deviation
what is the range
the difference between the highest and lowest scores in a data set and is the simplest measure of spread
- range =max value-min value
- sets the boundaries for scores
- useful for measuring critically high or low thresholds
- detects errors when inputting data
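A minimal Python sketch of the range, including how it can flag a data-entry slip. The pulse rates and the mistyped value are hypothetical.

```python
# Hypothetical pulse rates in beats per minute (illustrative only)
pulse_rates = [72, 80, 65, 110, 58]
data_range = max(pulse_rates) - min(pulse_rates)   # 110 - 58
print(data_range)  # 52

# Same data with 65 mistyped as 650: the range becomes implausibly large
with_typo = [72, 80, 650, 110, 58]
typo_range = max(with_typo) - min(with_typo)       # 650 - 58
print(typo_range)  # 592
```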
what are quartiles and interquartile ranges
quartiles: break ordered data into quarters
even number of scores: find the mean of the 2 scores either side of each quarterly place in the data set
odd number of scores: the values at the quarterly positions are the quartiles (e.g. the 25th, 50th and 75th of 99 scores)
Q2 = median
benefits of quartiles and what is interquartile range
- like the median, less affected by outliers and skewed data, so are the best choice for measuring the spread of these data sets
- interquartile range = the difference between Q3 & Q1, which shows the range of the middle half of the distribution of scores
- Q3 - Q1 = interquartile range
- semi interquartile range: half the interquartile range = (Q3 - Q1) / 2
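The quartiles and interquartile range can be sketched with `statistics.quantiles` from Python's standard library. The scores are made up, and the default "exclusive" method is only one of several quartile conventions, so other tools may give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical ordered scores, 11 values (illustrative only)
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into quarters and returns the 3 cut points
q1, q2, q3 = quantiles(scores, n=4)

iqr = q3 - q1        # range of the middle half of the scores
semi_iqr = iqr / 2   # half the interquartile range

print(q1, q2, q3)     # 3.0 6.0 9.0 (Q2 is the median)
print(iqr, semi_iqr)  # 6.0 3.0
```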
Drawback of quartiles
they don’t take into account every score in the data set
what is the absolute deviation/ variance/ standard deviation
how to calculate absolute & mean absolute deviation
shows the amount of deviation/variation that occurs around the mean score
total variability: the sum of the deviations of each score, divided by the number of scores
the choice of absolute deviation, variance and standard deviation depends on the type of statistic
- easiest way to calc deviation = individual score minus mean score
- values above mean are +ve and below are -ve
- total variability would be 0 because the positive and negative deviations cancel, so the signs are ignored and only absolute values are used = absolute deviation => summed and divided by the number of scores = mean absolute deviation
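The calculation can be followed step by step in a short Python sketch. The scores are hypothetical and the variable names are chosen for illustration only.

```python
# Hypothetical scores (illustrative only)
scores = [2, 4, 4, 6, 9]

mean_score = sum(scores) / len(scores)            # 5.0
deviations = [x - mean_score for x in scores]     # above mean: +ve, below mean: -ve
signed_total = sum(deviations)                    # positives and negatives cancel

absolute_deviations = [abs(d) for d in deviations]            # ignore the signs
mean_absolute_deviation = sum(absolute_deviations) / len(scores)

print(deviations)               # [-3.0, -1.0, -1.0, 1.0, 4.0]
print(signed_total)             # 0.0
print(mean_absolute_deviation)  # 2.0
```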