lecture 3 - sem 1 Flashcards
(28 cards)
what measures location (central tendency)
mean
median
mode
what measures dispersion (variability)
range and percentile
quartiles and interquartile range
mean deviation
variance
standard deviation
what is range
the difference between the maximum and the minimum values in a data set
range=max-min
what is percentile
provides information about how the data are spread over the interval from the smallest value to the largest value
what is the pth percentile
a value such that at least p per cent of the observations are less than or equal to this value and at least (100-p) per cent of the observations are greater than or equal to this value
how to calculate the pth percentile
arrange the data in ascending order
compute an index i
i= (p/100)n
p=percentile of interest
n=number of observations
if i is not an integer, round up
the next integer greater than i denotes the position of the pth percentile
if i is an integer, the pth percentile is the average of the values in positions i and i+1
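The steps above can be sketched in Python. This is a minimal illustration rather than a standard library routine: the function name `percentile` and the sample data are made up for the example, and it assumes the common textbook convention that an integer index i means averaging the values in positions i and i+1.

```python
import math

def percentile(data, p):
    """Return the pth percentile (0 < p < 100) using the index method."""
    if not data or not 0 < p < 100:
        raise ValueError("need non-empty data and 0 < p < 100")
    values = sorted(data)          # step 1: arrange in ascending order
    n = len(values)
    i = p * n / 100                # step 2: index i = (p/100)n
    if i != int(i):
        # i is not an integer: round up to get the 1-based position
        position = math.ceil(i)
        return values[position - 1]
    # i is an integer: average positions i and i+1 (assumed convention)
    position = int(i)
    return (values[position - 1] + values[position]) / 2

scores = [50, 15, 70, 35, 20, 60, 40, 55]   # hypothetical data, n = 8
print(percentile(scores, 30))   # i = 2.4 -> position 3 -> 35
print(percentile(scores, 50))   # i = 4   -> (40 + 50) / 2 = 45.0
```

Other software may interpolate between positions differently, so results for small data sets can differ slightly from this index method.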
which percentile does the median represent
the 50th percentile
what are quartiles
values that divide the data into 4 parts, each containing approximately 25% of the observations
Q1=25th percentile
Q2=50th percentile
Q3=75th percentile
what is an interquartile range
a measure of variability that overcomes the dependency on extreme values
IQR = Q3 - Q1 (the difference between Q3 and Q1)
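A short sketch of how the quartiles and the interquartile range could be computed, and why the IQR is less sensitive to extreme values than the range. The helper names and the data are hypothetical. The percentile helper assumes the index method, i = (p/100)n, with values in positions i and i+1 averaged when i is an integer.

```python
import math

def percentile(data, p):
    """pth percentile by the index method (assumed convention, 0 < p < 100)."""
    values = sorted(data)
    i = p * len(values) / 100
    if i != int(i):
        return values[math.ceil(i) - 1]       # round up to the position
    k = int(i)
    return (values[k - 1] + values[k]) / 2    # average positions i and i+1

def quartiles_and_iqr(data):
    """Return (Q1, Q2, Q3, IQR) where IQR = Q3 - Q1."""
    q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
    return q1, q2, q3, q3 - q1

scores = [15, 20, 35, 40, 50, 55, 60, 70]         # hypothetical data
with_outlier = [15, 20, 35, 40, 50, 55, 60, 700]  # same data, one extreme value

print(quartiles_and_iqr(scores))                  # (27.5, 45.0, 57.5, 30.0)
print(max(scores) - min(scores))                  # range = 55
print(quartiles_and_iqr(with_outlier)[3])         # IQR is still 30.0
print(max(with_outlier) - min(with_outlier))      # range jumps to 685
```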
what is a box plot/ whisker plot
chart often used in exploratory data analysis
shows the distribution of numerical data and skewness through displaying the data quartiles and averages
what 5 key points does the box plot show
minimum, Q1, median (Q2), Q3, maximum
what is central tendency
provides a single value that represents the centre or typical value of a dataset
mean
median
mode
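As a quick illustration, the three measures can be computed with Python's built-in `statistics` module. The data values are hypothetical.

```python
import statistics

values = [2, 3, 3, 5, 7]   # hypothetical data

mean_value = statistics.mean(values)      # sum of values / number of values
median_value = statistics.median(values)  # middle value of the sorted data
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)   # 4 3 3
```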
what are distributional measures
how the data is spread out or dispersed across the range of values
range
variance
standard deviation
skewness
kurtosis
what do scatter plots represent
uses dots to represent values obtained for 2 different variables
shows relationship between the 2 variables
what are line graphs used for
showing how the value of something changes over time
what is covariance
mean of the product of the deviations of x and y from their respective means
its size depends on the units of measurement, so it cannot be used to indicate how strong the relationship is
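A minimal sketch of the covariance calculation, with made-up data and a hypothetical function name. It assumes the sample version (dividing by n-1) by default; passing `sample=False` divides by n instead, which matches the "mean of the products" wording literally. The second call shows that changing the units of y changes the covariance even though the relationship itself is the same.

```python
def covariance(x, y, sample=True):
    """Covariance of paired observations x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # sum of the products of the deviations from the respective means
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return total / (n - 1 if sample else n)

hours = [1, 2, 3, 4, 5]          # hypothetical data
marks = [2, 4, 6, 8, 10]

print(covariance(hours, marks))                       # 5.0 (positive relationship)
print(covariance(hours, [m * 100 for m in marks]))    # 500.0 (same data, rescaled y)
print(covariance(hours, [10, 8, 6, 4, 2]))            # -5.0 (negative relationship)
```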
what does a covariance larger than 0 show
positive relationship
what does a covariance smaller than 0 show
negative relationship
what does a covariance equal to 0 show
no linear relationship
what does the coefficient of correlation show
the strength of the relationship between 2 variables
what does a coefficient of correlation close to 1 show
strong positive relationship between 2 variables
positive slope
what does a coefficient of correlation close to -1 show
strong negative relationship between 2 variables
negative slope
what does a coefficient of correlation close to 0 show
no linear relationship
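The three cases can be illustrated with a small sketch. The cards do not spell out a formula, so this assumes the usual Pearson coefficient: the covariance divided by the product of the two standard deviations. The function name and data are hypothetical. The last example shows why a value near 0 only rules out a linear relationship: y depends exactly on x, but not along a straight line.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient of paired observations x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    # the (n - 1) divisors of the covariance and standard deviations cancel
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                          # hypothetical data
r_positive = correlation(x, [2, 4, 6, 8, 10])   # perfectly increasing line
r_negative = correlation(x, [10, 8, 6, 4, 2])   # perfectly decreasing line
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])   # y = x squared

print(r_positive, r_negative, r_curve)
```

Unlike the covariance, this value always lies between -1 and 1 and does not change when either variable is rescaled by a positive factor.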