Behavioural Analytics Flashcards
(202 cards)
What does GAM stand for?
Generalised Additive Model
Why are GAMs useful in looking at emotion?
Emotions change over time, and when we observe people interacting we often have time series data. GAMs let us model how these emotions rise and fall.
What are often used to look at emotional states?
Valence (how positive or negative the emotion is) and arousal (how activated or calm the person is) are often used to describe the emotional state.
What is trace annotation?
An annotator watches a video and continuously rates the valence and arousal at every point in time.
These traces typically go up and down. The classical statistical techniques we use (e.g. linear models) aren’t great for this, so we need GAMs.
What do GAMs allow?
Allows you to analyse trace data in a way that is similar to the ideas within regression.
There are advanced modelling abilities.
What command in R is used for linear models?
lm()
What is the equation for a straight line?
y = mx + c
y = a + Bx + E
where a is the intercept, B (beta) is the slope and E (epsilon) represents the errors around the line
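A minimal sketch of the straight-line model fitted with lm(). The data are simulated purely for illustration, and the true values a = 2 and B = 0.5 are assumptions chosen for this example.

```r
# Simulated data: y = a + Bx + E with a = 2, B = 0.5 (illustrative values)
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 0.3)  # rnorm() plays the role of E

fit <- lm(y ~ x)
coef(fit)  # "(Intercept)" estimates a, "x" estimates B
```

The estimates should land close to the values used in the simulation, with the leftover scatter absorbed by the error term.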
In regression, what should we always be checking?
Our assumptions of linearity
Check the partial plots (in a multiple regression) or regression plots and investigate to see if the data has some curved nature.
E.g. curvature in a plot of the residuals vs the fitted values suggests a straight line is not a good way to capture the data.
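A sketch of the residuals vs fitted check, assuming simulated data that follow a sine curve rather than a line. The lowess() line is added only to make any curvature easier to see.

```r
# Simulated curved data fitted with a straight line (illustrative only)
set.seed(2)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.2)
fit <- lm(y ~ x)

# Residuals vs fitted values: a flat, patternless band would support linearity
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
lines(lowess(fitted(fit), resid(fit)), col = "red")  # a bending red line signals curvature
```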
What is Anscombe’s Quartet?
A visual warning that you should always visualise the data and do some EDA.
Four datasets have nearly identical summary statistics but look very different when plotted.
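The quartet ships with R as the built-in anscombe data frame (columns x1 to x4 and y1 to y4), so the point can be checked directly. A short sketch:

```r
# Summary statistics for each of the four x/y pairs in the built-in dataset
data(anscombe)
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y), cor_xy = cor(x, y))
})
round(stats, 2)  # one column per dataset; the columns are almost the same

# The plots tell a different story
op <- par(mfrow = c(2, 2))
for (i in 1:4) plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
                    xlab = paste0("x", i), ylab = paste0("y", i))
par(op)
```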
What is a more modern version of Anscombe’s Quartet?
The Datasaurus dozen
What is Simpson’s Paradox?
A statistical phenomenon where an association between two variables in a population emerges, disappears or reverses when the population is divided into subpopulations.
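A small simulated sketch of the reversal. The three groups and their offsets are invented for this example: within each group the relationship is negative, but pooling the groups makes it look positive.

```r
# Three hypothetical subpopulations with different baselines
set.seed(3)
group  <- rep(c("A", "B", "C"), each = 50)
offset <- rep(c(0, 5, 10), each = 50)
x <- offset + rnorm(150)
y <- 2 * offset - x + rnorm(150, sd = 0.5)  # slope of -1 inside every group

pooled <- cor(x, y)                                  # whole population
within <- sapply(split(data.frame(x, y), group),
                 function(d) cor(d$x, d$y))          # each subpopulation
pooled
within
```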
What are other non-linear models like GAMs?
LOESS - locally estimated scatterplot smoothing
ARIMA - autoregressive integrated moving average
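For comparison, a minimal LOESS sketch using loess() from base R on simulated data. The span of 0.5 is an arbitrary choice for illustration; smaller spans give wigglier curves.

```r
# LOESS on simulated sine-shaped data (illustrative only)
set.seed(4)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.2)

lo <- loess(y ~ x, span = 0.5)  # span = share of the data used in each local fit
plot(x, y)
lines(x, predict(lo), col = "blue")
```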
What is the function for a GAM?
y = a + f(x) + E
It is the regression equation as before, but we swap the single beta coefficient for a function.
This function is built from splines (basis functions) that combine to make a smooth, non-linear representation of the data.
Each basis function has its own coefficient; you add the weighted basis functions together to give an overall “wiggly” line.
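The idea of adding weighted basis functions can be checked by hand. This sketch assumes the mgcv package and simulated data; predict(..., type = "lpmatrix") returns the intercept column plus the evaluated basis functions, and multiplying by the coefficients reproduces the fitted curve for this Gaussian model.

```r
library(mgcv)

# Simulated data (illustrative only)
set.seed(5)
dat <- data.frame(x = seq(0, 2 * pi, length.out = 200))
dat$y <- sin(dat$x) + rnorm(200, sd = 0.2)

model <- gam(y ~ s(x), data = dat)

X <- predict(model, type = "lpmatrix")  # one column per coefficient
b <- coef(model)                        # intercept + one weight per basis function
manual_fit <- as.numeric(X %*% b)       # weighted sum of the columns

all.equal(manual_fit, as.numeric(fitted(model)))
```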
What do we call the combination of basis functions?
A smooth
What is the line of code to produce a GAM model?
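model <- gam(dependent ~ s(predictor), data = data)
gam() comes from the mgcv package, so load it first with library(mgcv). Name the data argument: the second positional argument of gam() is family, not data.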
What is different about gam() compared to lm()?
The predictor variable is wrapped in s() which instructs R to come up with a function which best fits this data.
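A side-by-side sketch of lm() and gam() on the same simulated curved data, assuming the mgcv package. The variable names are placeholders for this example.

```r
library(mgcv)

# Simulated curved data (illustrative only)
set.seed(6)
dat <- data.frame(x = seq(0, 2 * pi, length.out = 200))
dat$y <- sin(dat$x) + rnorm(200, sd = 0.2)

linear_fit <- lm(y ~ x, data = dat)      # one slope for x
gam_fit    <- gam(y ~ s(x), data = dat)  # a smooth function of x

AIC(linear_fit, gam_fit)  # lower AIC indicates the better-supported model
```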
Once we have created a GAM model, what functions do we call on the model?
- summary(model)
- coef(model)
- plot(model, pages = 1)
- gam.check(model)
In the output, what is the EDF?
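Effective degrees of freedom
An EDF close to 1 means the smooth is close to a straight line; higher values mean a wigglier curve.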
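What does a very small p value for a smooth term in summary(model) represent?
The smooth is significantly different from a flat line, i.e. there is evidence of a relationship between the predictor and the response.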
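A sketch of where the EDF and p value appear, assuming mgcv and simulated data. The s.table element of the summary holds one row per smooth term.

```r
library(mgcv)

# Simulated curved data (illustrative only)
set.seed(7)
dat <- data.frame(x = seq(0, 2 * pi, length.out = 200))
dat$y <- sin(dat$x) + rnorm(200, sd = 0.2)

model <- gam(y ~ s(x), data = dat)

smooth_table <- summary(model)$s.table
smooth_table  # columns include edf and p-value for s(x)
```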
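What does gam.check() do?
Reports whether the model converged, tests whether each smooth has enough basis functions (k) to capture the curve, and draws residual diagnostic plots.
We don’t want a very small p value in the GAM check: it suggests k is too low and there is still pattern left in the residuals.
Higher p-values are preferred because they suggest the model residuals are well-behaved.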
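A sketch of gam.check() flagging a problem, assuming mgcv and simulated data. The smooth is deliberately given too few basis functions (k = 3) for a curve with several peaks, so the basis dimension check should report a low k-index and a very small p value.

```r
library(mgcv)

# Simulated data with three full waves (illustrative only)
set.seed(8)
dat <- data.frame(x = seq(0, 2 * pi, length.out = 300))
dat$y <- sin(3 * dat$x) + rnorm(300, sd = 0.2)

low_k <- gam(y ~ s(x, k = 3), data = dat)  # too few basis functions on purpose

gam.check(low_k)  # prints convergence info and the k check, draws 4 residual plots
```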
What should we change about the GAM model if gam.check() shows a very small p value?
Increase the number of basis functions (knots) by adjusting the k argument within the smooth function. k sets the maximum wiggliness of the line.
model <- gam(response ~ s(predictor, k = 15), data = data)
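A sketch comparing a low and a higher k on the same simulated data, assuming mgcv. The values 3 and 15 are illustrative; the total EDF and the deviance explained show how much flexibility each fit actually used.

```r
library(mgcv)

# Simulated data with three full waves (illustrative only)
set.seed(9)
dat <- data.frame(x = seq(0, 2 * pi, length.out = 300))
dat$y <- sin(3 * dat$x) + rnorm(300, sd = 0.2)

low_k  <- gam(y ~ s(x, k = 3),  data = dat)
high_k <- gam(y ~ s(x, k = 15), data = dat)

c(low_k = sum(low_k$edf), high_k = sum(high_k$edf))                    # total EDF
c(low_k = summary(low_k)$dev.expl, high_k = summary(high_k)$dev.expl)  # deviance explained
```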
What is concurvity?
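The smooth equivalent of collinearity: one smooth term can be approximated by the other smooth terms in the model. It needs dealing with because it makes the estimates unstable and hard to interpret.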
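mgcv provides concurvity() to measure this. The sketch below uses simulated predictors where x2 is almost a smooth function of x1, an assumption made only to force a high value. Values run from 0 to 1, with values near 1 indicating a problem.

```r
library(mgcv)

# Two predictors that are strongly (non-linearly) related (illustrative only)
set.seed(10)
n  <- 300
x1 <- runif(n)
x2 <- x1^2 + rnorm(n, sd = 0.02)
y  <- sin(2 * pi * x1) + x2 + rnorm(n, sd = 0.2)
dat <- data.frame(y, x1, x2)

model <- gam(y ~ s(x1) + s(x2), data = dat)

conc <- concurvity(model, full = TRUE)  # each term against all the others
round(conc, 2)                          # rows: worst, observed, estimate
```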
What determines the wiggliness of a smooth?
- The number of knots / basis functions
- The smoothing parameter - lambda
How do we change the lambda smoothing parameter?
Use the sp argument within the GAM specification, either for the whole model or inside s() for a single smooth.
model <- gam(response ~ s(predictor, k = 15), sp = 0.1, data = data)
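A sketch of the effect of lambda, assuming mgcv and simulated data. The two sp values are arbitrary extremes picked for illustration: a small value lets the curve wiggle, a large value pushes it towards a straight line.

```r
library(mgcv)

# Simulated data with three full waves (illustrative only)
set.seed(11)
dat <- data.frame(x = seq(0, 2 * pi, length.out = 300))
dat$y <- sin(3 * dat$x) + rnorm(300, sd = 0.2)

wiggly <- gam(y ~ s(x, k = 15), sp = 0.001, data = dat)  # light penalty
stiff  <- gam(y ~ s(x, k = 15), sp = 1000,  data = dat)  # heavy penalty

c(wiggly = sum(wiggly$edf), stiff = sum(stiff$edf))  # heavier penalty, lower EDF
```

Leaving sp out lets gam() estimate the smoothing parameter from the data, which is usually the sensible starting point.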