L2 Multiple Regression Flashcards
(46 cards)
What is multivariate analysis?
Analysing the impact of multiple variables for predicting changes in Y.
What are the two types of multivariate analysis?
Dependent and independent
What are two analysis of dependence?
multiple regression and discrimination
What are three analysis of independence?
cluster analysis, principal component analysis, factor analysis
What is the purpose of multiple regression?
It is designed to isolate the effects of each x-variable (predictor) upon the y-variable
What is a coefficient?
A numerical description of an x-variable’s relationship upon the y-variable
What is a regressor/predictor?
an independent variable
What is the model?
term used to describe the collection of regressors involved in predicting y
Given an example of a multiple regression case and 5 potential regressors?
Plant growth rate (y) is dependent upon 1) temperature, 2) radiation, 3) carbon dioxide, 4) nutrient supply, 5) water
How and WHY does the line of best fit changes for multiple regression?
Because there are more x-variables, their relationship is now collectively non-linear so the line becomes a plane of best fit
How does the r^2 statistics differ in multiple regression?
it becomes r^2v(adj) [adjusted r squared value].
What is the adjusted r squared metric and why does it differ from its linear counterpart?
The linear r squared considered how much of y was explained by a singular predictor. Whereas the adjusted version measures how well the whole model explains variance in Y.
What happens to the adjusted r squared value if you add more regressors to the model and why?
The adjusted r squared can either increase or decrease. This is because the other regressors may either contribute an increase to the understanding/explanation of y, or they may not be relevant and so cloud out/decrease the model’s overall explanation of Y.
What is crucial before we compare adjusted r squared values?
Standardizaiton
Why do we need to standardize in order to compare adjusted r squared values?
because of the different data, scales and models between different samples that make it inappropriate to compare
What does scale dependency mean and why is it important?
In multiple regression because there are different regressors inputted in to the model, of which they have different units of measurement, it means that when you add another regressor, the initial ones change i.e. their scale is dependent upon the others. Importantly however the way they change is not necessarily correct because of the different units of measurement between the different variables
What is an example of scale dependency being important ?
Plant growth rate - water is measured in millilitres, carbon dioxide is measured in ppm usually. This means that they respond in incorrect ways
What technique allows us to overcome the scale dependency?
standardization
What happens to the name of the coefficients as they are standardised?
they transform from partial regression coefficients in to beta regression coefficients
To overcome the different units of the different regressors what happens to the units?
They become z units
What is the t value?
the significance of the coefficient in explaining the variance
What is multicollinearity?
When the regressors in a model correlate with each other
Why is multicollinearity a problem for multiple regression?
Because multiple regression seeks to determine the specific effect of each regressor separately upon Y.
If multicollinearity is present, then how would the finding of one regressor impacting Y in a certain way be hampered?
because a regressor correlates with another regressor then the impact of the initial regressor upon Y would have to be shared between both regressors, and there would be no way of determining which one is a more important regressor. Furthermore, they may both be controlled by an underlying factor