Data analysis Flashcards
(5 cards)
DESCRIPTIVE STATISTICS
help to summarise data sets and make it easy to see any obvious patterns or trends
MEASURE OF CENTRAL TENDENCY
mathematical way to find a midpoint/average score from a data set
MEAN: measure of central tendency calculated by adding up all the data points and dividing by the number of items in data set
+ uses all the data points in a set to calculate average - most informative measure
- can only be used with certain types of data (quantitative)
- outliers can skew the data
MEDIAN: measure of central tendency that identifies the middle score of data set
+ not affected by extreme scores
- doesn’t reflect outliers well
- can only be used with certain types of data (quantitative)
MODE: measure of central tendency that identifies the most frequent data point(s) in a data set
+ can be used with qualitative and quantitative data
+ provides info about frequency
+ not affected by extreme scores
- data may have several modes (bi-modal = 2 modes)
- doesn’t reflect outliers well
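Worked example (a minimal sketch, not part of the original cards): the scores below are made-up data with one outlier, and Python's standard `statistics` module is assumed as the tool. It shows how the outlier pulls the mean upwards while the median and mode stay put.

```python
import statistics

# Hypothetical data set: six typical scores plus one outlier (40)
scores = [2, 3, 3, 4, 5, 6, 40]

mean_score = statistics.mean(scores)      # adds all scores, divides by how many there are
median_score = statistics.median(scores)  # middle score once the data are ordered
modes = statistics.multimode(scores)      # list of the most frequent score(s)

print("mean:", mean_score)      # dragged upwards by the outlier
print("median:", median_score)  # unaffected by the outlier
print("mode(s):", modes)

# The mode also works with qualitative data (hypothetical categories)
favourite_colour = ["red", "blue", "red", "green"]
colour_modes = statistics.multimode(favourite_colour)
print("mode of colours:", colour_modes)
```

`multimode` returns a list, so a bi-modal data set simply gives back two values.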
OUTLIER
data point that differs significantly from other data points in the set - may be due to variability in measurement
MEASURE OF SPREAD
mathematical way to describe the variation or dispersion within a data set
STANDARD DEVIATION: measure of how far, on average, the scores in a data set lie from the mean - the higher the value the more variation in your scores - it is the square root of the variance, and the variance is “the mean of the squares minus the square of the mean”
+ all values are taken into account - more precise and sensitive measure of spread
+ looks at the difference between each data point and the mean - not just the extremes
- time consuming to calculate
- may hide some characteristics of the data (e.g. doesn’t tell us if the data is positively or negatively skewed)
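Worked example (a sketch with made-up scores, assuming Python's standard `statistics` and `math` modules): the "mean of the squares minus the square of the mean" shortcut gives the variance, and taking its square root gives the standard deviation. The result is checked against `statistics.pstdev`, which treats the data as a whole population.

```python
import math
import statistics

# Hypothetical scores chosen so the arithmetic is easy to follow
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean_score = statistics.mean(scores)                      # 5
mean_of_squares = statistics.mean([x ** 2 for x in scores])  # 29

# Variance: mean of the squares minus the square of the mean
variance = mean_of_squares - mean_score ** 2

# Standard deviation: square root of the variance
sd_shortcut = math.sqrt(variance)

# Library check (population standard deviation, divides by n)
sd_library = statistics.pstdev(scores)

print("variance:", variance)
print("standard deviation (shortcut):", sd_shortcut)
print("standard deviation (library):", sd_library)
```

If the scores are a sample rather than the whole population, `statistics.stdev` (which divides by n - 1) gives a slightly larger value.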
RANGE: the difference between the biggest and smallest values in the data set
+ simple measure of spread
+ easy to calculate
+ shows outliers
- doesn’t take into account the number of values in data set
- doesn’t tell us if data is clustered or spread out
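Worked example (a sketch with two made-up data sets): both sets have the same range, but one is clustered around the middle and the other is spread out, which the range alone cannot show. The standard deviation from Python's `statistics` module is used here only as a comparison.

```python
import statistics

# Hypothetical data sets with identical smallest and biggest values
clustered = [1, 5, 5, 5, 5, 5, 9]
spread_out = [1, 2, 4, 5, 6, 8, 9]

# Range: biggest value minus smallest value
range_clustered = max(clustered) - min(clustered)
range_spread_out = max(spread_out) - min(spread_out)

# Standard deviation uses every score, so it tells the two sets apart
sd_clustered = statistics.pstdev(clustered)
sd_spread_out = statistics.pstdev(spread_out)

print("ranges:", range_clustered, range_spread_out)
print("standard deviations:", round(sd_clustered, 2), round(sd_spread_out, 2))
```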
GRAPHS
visual representation to help researchers quickly communicate their results
BAR CHART:
- used for discrete data
- has gaps between the bars because the categories are separate, not points on a continuous scale
- often compare mean/median/mode of different levels of IV
- bars can also represent totals/frequencies etc.
- DV on y axis, levels of IV on x axis
- can’t check distribution of data
HISTOGRAM:
- used for continuous data (can take any value along a scale, e.g. time or height)
- shows distribution of items in a data set
- freq. of DV (percentage or number count) on y axis, DV on x axis
- allows us to check if the distribution of data is skewed
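Worked example (a rough sketch: the reaction times and the 25 ms bin width are made up for illustration): a histogram groups a continuous DV into equal-width intervals and counts how many scores fall in each. The text version below prints one `#` per score, so the shape of the distribution is visible without a plotting library.

```python
from collections import Counter

# Hypothetical reaction times in milliseconds (continuous DV)
reaction_times = [212, 225, 231, 238, 244, 247, 251, 263, 270, 318]
bin_width = 25  # assumed interval width

# Assign each score to the lower edge of its interval and count frequencies
bins = Counter((t // bin_width) * bin_width for t in reaction_times)

# Print every interval from lowest to highest, including empty ones
for start in range(min(bins), max(bins) + bin_width, bin_width):
    print(f"{start}-{start + bin_width - 1} ms: {'#' * bins[start]}")
```

Here most scores pile up at the lower end with a tail towards the higher values, which is the pattern of a positive skew.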
SCATTER GRAPH:
- used to display the results of a correlational study
- each point on the graph represents one participant - it is plotted where that participant’s scores on the two co-variables meet
- shows relationship between two co-variables
- helps to interpret relationship between co-variables and correlation coefficients
- a regression line (line of best fit) may be added to show the trend in data
- strong/weak positive, strong/weak negative, none
- no causation in correlation - correlations are not experiments
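Worked example (a sketch only: the cards don't name a particular coefficient, so Pearson's r is assumed here, and the data and the helper name `pearson_r` are made up): the coefficient runs from -1 (perfect negative) through 0 (none) to +1 (perfect positive), and the closer it is to either end, the stronger the relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two co-variables vary together
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each co-variable varies on its own
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

# Hypothetical co-variables: one pair of scores per participant
hours_revised = [1, 2, 3, 4, 5]
test_score = [52, 55, 61, 64, 70]

r = pearson_r(hours_revised, test_score)
print("r =", round(r, 2))  # close to +1, so a strong positive correlation
```

A strong coefficient still only shows that the two co-variables go together, not that one causes the other.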