Intro Flashcards
(47 cards)
Load libraries into R
library(caret)  # package names are case-sensitive, so "Caret" fails
Edit data in R
fix(DataFrame)  # opens a spreadsheet-style editor
View names in data frame
names()
Attach a data frame to the search path so you don't have to type the column names
attach(DataFrame)
basic linear model function in R
lm(y ~ x, data = DataFrame)
display statistics on model
First save the model output to a variable: lm.output = lm(y ~ x, data = DataFrame)
Then, summary(lm.output)
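A minimal worked sketch, using R's built-in mtcars data set purely as an illustration:
lm.output = lm(mpg ~ wt, data = mtcars)  # regress miles-per-gallon on car weight
summary(lm.output)                       # coefficients, R-squared, F-statistic, residual summary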
Show information after fitting a model
names(model.output)
summary(model.output)
Show model coefficients and confidence intervals
coef(model.output) #shows the coeff
confint(model.output) #shows the 95% conf interval for the coefficients
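For example, continuing the illustrative mtcars fit from above:
lm.output = lm(mpg ~ wt, data = mtcars)
coef(lm.output)                    # intercept and slope estimates
confint(lm.output)                 # 95% confidence intervals for the coefficients
confint(lm.output, level = 0.99)   # a different confidence level if needed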
Use model to predict new values
predict()
predict(model.output, newdata = NewXDataFrame, interval = "confidence")
Prediction vs. Confidence Interval
When predicting a new data point, you want a prediction interval. A confidence interval is about where the average (mean) response lies, so the prediction interval for a single new observation is wider.
To get a PI: predict(model.output, newdata = NewXDataFrame, interval = "prediction")
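A small sketch contrasting the two intervals, again assuming the illustrative mtcars fit; the prediction interval should come out wider:
lm.output = lm(mpg ~ wt, data = mtcars)
new.x = data.frame(wt = c(2.5, 3.5))                          # new weights to predict at
predict(lm.output, newdata = new.x, interval = "confidence")  # interval for the mean response
predict(lm.output, newdata = new.x, interval = "prediction")  # wider interval for a single new car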
Scatter Plot with Regression Line
plot(x, y)
abline(model.output, lwd = 3, col = "red")  # adds the fitted line to the scatterplot
lwd sets the line width
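For example, with the illustrative mtcars data:
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "MPG")  # scatterplot of the raw data
lm.output = lm(mpg ~ wt, data = mtcars)
abline(lm.output, lwd = 3, col = "red")                     # overlay the fitted regression line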
See diagnostic plots of linear regression
plot(model.output)  # automatically does it, because the model object contains what it needs
Since this produces 4 graphs, first split the plotting window into a 2 x 2 grid of panels:
par(mfrow = c(2, 2))
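Putting the two steps together, as a sketch with the same illustrative mtcars model:
lm.output = lm(mpg ~ wt, data = mtcars)
par(mfrow = c(2, 2))   # 2 x 2 grid so all four diagnostic plots fit on one screen
plot(lm.output)        # residuals vs fitted, Q-Q, scale-location, residuals vs leverage
par(mfrow = c(1, 1))   # restore the default single-panel layout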
how does predict() work
predict(model.output) returns a vector of predicted Y values for the training data
predict(model.output, newdata = NewDataFrame) returns predictions for new x values
inspect functions
Type the function name without parentheses, e.g. predict, to print its body
If it only dispatches via UseMethod(), use methods(functionname) to list the available methods
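For example (the exact output depends on your R version):
predict            # prints the body; for a generic it is just UseMethod("predict")
methods(predict)   # lists the methods, e.g. predict.lm, predict.glm
predict.lm         # many (but not all) methods can be printed directly by name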
Get max of vector
which.max(vector) returns the index of the maximum element
Shorthand formula for regression in R
lm.fit = lm(Yvariable ~ ., data = DataFrame)
Instead of writing x1 + x2 + etc. you can just put a dot, meaning "all other columns".
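A quick sketch with the illustrative mtcars data, where the dot expands to every column except the response:
lm.fit = lm(mpg ~ ., data = mtcars)         # mpg regressed on all other columns
lm.fit2 = lm(mpg ~ . - wt, data = mtcars)   # the dot can be combined with "-" to drop a predictor
summary(lm.fit)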
Function to use when there is collinearity
You need to see the Variance Inflation Factor (VIF), part of the car package.
library(car)
vif(lm.fit) #use on model output
remember VIF > 5 indicates collinearity
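A minimal sketch, assuming the car package is installed and using mtcars predictors that are known to be correlated:
library(car)
lm.fit = lm(mpg ~ wt + disp + hp, data = mtcars)  # weight, displacement and horsepower overlap a lot
vif(lm.fit)                                       # values above ~5 suggest problematic collinearity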
How to see a correlation matrix
cor(DataFrame). All columns must be numeric; if a column isn't numeric, drop it with matrix notation, e.g. cor(DataFrame[, -9])
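For example, on the Smarket data from the ISLR package (assuming it is installed), where column 9 is the qualitative Direction variable:
library(ISLR)
cor(Smarket[, -9])   # drop the Direction column before computing the correlation matrix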
Logistic Regression in R
logreg = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = Smarket, family = binomial)
The key is family = binomial, using the glm() generalized linear model function
When doing logistic regression on a factor variable with two levels, how do we handle the dummy coding?
We don't have to do anything; the glm() function dummy codes it automatically. However, you can retrieve the dummy coding values by using contrasts()
attach(Smarket)
contrasts(Direction)
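Putting the pieces together as a sketch on the ISLR Smarket data:
library(ISLR)
logreg = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
             data = Smarket, family = binomial)
summary(logreg)                # coefficients and their p-values
contrasts(Smarket$Direction)   # shows the dummy coding: Up = 1, Down = 0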
How to use logistic regression output to predict values in your dataset
predict(logreg, type = "response")
type = "response" tells R to output probabilities on the response scale instead of the default log-odds (link) scale
How to convert logistic regression probabilities into actual predictions
- Create a vector filled with the default class ("Down"), one entry per observation:
logreg.predict = rep("Down", 1250)
- Relabel the entries whose predicted probability exceeds 0.5:
logreg.predict[logreg.probs > .5] = "Up"
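As a sketch on the Smarket data (1250 observations), continuing from the logreg fit above:
logreg.probs = predict(logreg, type = "response")  # fitted probabilities of the market going Up
logreg.predict = rep("Down", 1250)                 # start every observation at the default class
logreg.predict[logreg.probs > .5] = "Up"           # flip to "Up" where the probability exceeds 0.5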
How to create a confusion matrix
use the table function
table(VectorOfPredictions, VectorOfTrueValues)
table(logreg.predict, Direction)
The two vectors have to use the same labels, like "Up"/"Down", so make sure they are coded the same way.
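Continuing the same sketch:
table(logreg.predict, Smarket$Direction)    # rows = predictions, columns = true values
mean(logreg.predict == Smarket$Direction)   # overall accuracy (fraction predicted correctly)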
subselect training data in a time series
train = (Year < 2005)                # logical vector: TRUE for observations before 2005
Smarket.2005 = Smarket[!train, ]     # held-out data from 2005
Direction.2005 = Direction[!train]   # true responses for the 2005 observations
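A fuller sketch of the split and a refit on the training years only, assuming the ISLR Smarket data as in the earlier cards:
library(ISLR)
attach(Smarket)
train = (Year < 2005)                # TRUE for the 2001-2004 training observations
Smarket.2005 = Smarket[!train, ]     # held-out 2005 data
Direction.2005 = Direction[!train]   # true 2005 responses
logreg = glm(Direction ~ Lag1 + Lag2, data = Smarket, family = binomial, subset = train)
logreg.probs = predict(logreg, Smarket.2005, type = "response")  # probabilities for 2005 only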