Theory Flashcards
(36 cards)
What is nominal data?
Assigns observations to unordered categories. Examples include hair colour and political party. Being discrete, nominal data can be graphed with a pie or bar chart and summarised with counts/percentages
What is ordinal data?
Assigns observations to ordered categories. Examples include level of education and meaningfulness categories. Being discrete, ordinal data can be graphed with a pie or bar chart and summarised with counts/percentages
What is interval/ratio data?
Assigns scores on a scale with quantitative information, so that calculations on the scores give sensible outcomes. A ratio scale also has a true, meaningful zero point. Examples include IQ scores (interval) and reaction time (ratio). They can be graphed with a histogram or boxplot and summarised with a five-number summary and mean + stdev
What are the special features of mode?
It is not sensitive to outliers, making it a resistant measure. There can also be multiple modes in a data set
What are the special features of median?
It is a resistant measure
What are the special features of mean?
It is not a resistant measure
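A small sketch of what "resistant" means in practice, using made-up scores (the data are hypothetical, not from the course material): replacing one value with an extreme one moves the mean a lot and leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical scores; the second list swaps the last value for an extreme one.
scores = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

# The median stays at 4 in both lists (resistant),
# while the mean is pulled from 4 to 14.8 (not resistant).
print("median:", median(scores), "->", median(with_outlier))
print("mean:  ", mean(scores), "->", mean(with_outlier))
```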
What are the special features of range?
It is not a resistant measure
What are the special features of variance?
It is not a resistant measure, and it is expressed in the square of the unit of the variable (e.g. cm^2)
What are the special features of standard deviation?
It is not a resistant measure and is expressed in the same unit as the variable
What is the 1.5 x IQR rule?
If a value is higher than Q3 + 1.5 x IQR or lower than Q1 - 1.5 x IQR, the value is an outlier (where IQR = Q3 - Q1)
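As a minimal sketch of the rule, the fences Q1 - 1.5 x IQR and Q3 + 1.5 x IQR can be computed as follows. The data are hypothetical, and the quartiles come from Python's `statistics.quantiles` with its default method; textbooks and software use slightly different quartile conventions, so the fences may differ a little from a hand calculation.

```python
from statistics import quantiles

# Hypothetical data set with one suspiciously large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]

# n=4 returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value outside the fences is flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```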
What are some characteristics of a density curve?
- The curve is always above the horizontal axis
- The total area under the curve is equal to 1
- The area under the curve and above any range of values is the proportion of all observations that fall in that range
When does the empirical rule apply?
When the distribution is normal
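The empirical rule is the 68-95-99.7 rule: in a normal distribution roughly 68%, 95% and 99.7% of observations fall within 1, 2 and 3 standard deviations of the mean. A quick way to check those numbers, sketched here with the standard library's `NormalDist`:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

# Proportion of observations within k standard deviations of the mean.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} stdev: {share:.4f}")
# within 1 stdev: 0.6827
# within 2 stdev: 0.9545
# within 3 stdev: 0.9973
```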
What is the z-score?
The number of stdevs that an observation is from the mean
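In formula form, z = (x - mean) / stdev. A short sketch with hypothetical values, assuming the sample standard deviation (dividing by n - 1) is the one intended:

```python
from statistics import mean, stdev

# Hypothetical observations.
values = [4, 8, 6, 5, 7]

m = mean(values)   # 6
s = stdev(values)  # sample standard deviation, about 1.58

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - m) / s for x in values]

for x, z in zip(values, z_scores):
    print(x, round(z, 2))
```

A value equal to the mean (here 6) gets z = 0; values above the mean get positive z-scores and values below it negative ones.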
What is a correlation as a measure?
It is a measure that indicates how strong a linear relationship is between 2 quantitative variables. It is sensitive to outliers.
Why is it always beneficial to create a scatterplot before calculating a correlation?
To check whether the association is linear and whether there are outliers present in the data set.
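To see why outliers matter, here is a rough sketch that computes Pearson's r by hand for a perfectly linear, hypothetical data set and then again after adding one outlying point. The helper `pearson_r` is written for this illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y is exactly 2 * x, so the correlation is 1.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = pearson_r(x, y)

# One extra point far from the line drags r down to about 0.14.
r_outlier = pearson_r(x + [6], y + [0])

print(round(r_clean, 2), round(r_outlier, 2))
```

A scatterplot would show the single stray point immediately, which the correlation coefficient alone does not.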
What is an independent variable?
A variable that explains or causes changes in the response variable
What is a dependent variable?
A variable for which a prediction is made
What is a regression line?
A straight line in a scatterplot that is as close as possible to the points in the graph
What is the meaning of the slope and intercept in a regression line?
Slope is the amount by which y changes when x increases by 1 unit
Intercept is the value of predicted y when x = 0
What is a residual and what is its correct notation?
The difference between the observed score yi and the predicted score y(hat)i that reflects the “error” you make in the prediction of observation i:
yi - y(hat)i
What is the principle of ordinary least squares regression?
The best line is the line with the smallest sum of the squared residuals (vertical distances from the points to the regression line):
min∑(yi - y(hat)i)^2
What is the meaning of correlation coefficient in relation to the regression line?
A measure of the linear association between x and y that indicates how closely the points in a scatterplot follow a straight line.
How can you use R^2 to examine the fit of the regression model?
The square of the correlation gives the proportion of the variance in y that is explained by x (percentage explained variance). The higher this percentage, the better the prediction of y
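A brief sketch, on hypothetical data, showing that for a regression with one predictor the squared correlation equals 1 - (sum of squared residuals / total sum of squares around the mean of y). The variable names are illustrative.

```python
# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)  # total variation in y

# Route 1: square the correlation coefficient.
r = sxy / (sxx * syy) ** 0.5
r_squared = r ** 2

# Route 2: compare residual variation with total variation.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
explained = 1 - sse / syy

# Both routes give 0.6: 60% of the variance in y is explained by x.
print(round(r_squared, 2), round(explained, 2))
```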
What is a residual plot and how can you use it to investigate the fit of the regression model?
A graphical representation (scatterplot) of the residuals, with the value of the independent variable x on the x-axis and the value of the residuals on the y-axis. In case of a good model fit, the residuals are small and scattered randomly around 0 without a clear pattern; in case of a perfect model fit, all residuals are equal to 0