module 11 logistic regression Flashcards

(30 cards)

1
Q

what is logistic regression?

A

It is about predicting group membership. For example, consider the question, ‘Is someone a narcissist or not? Yes or no?’. What we’re interested in predicting is whether someone will be classified as a narcissist or not, and we do that based on our independent variables: is someone likely to be a narcissist depending on how many selfies they take over a period of a week, for example? In this example, we have independent variables and a dependent variable, just as in regression; the difference is that we’re predicting the probability that a case will belong to one group as opposed to another.
You want to be able to predict a person’s group membership, or category membership, from their scores on one or more independent variables. As with multiple regression, you can have one independent variable or several. But in binary logistic regression, the DV is always two groups.
There is multinomial logistic regression, where you can have more than two groups as a DV, but for this unit, we’re only looking at binary logistic regression.
We know the DV has to be dichotomous, but the IVs can be a range of things: continuous (as in multiple regression), dichotomous (male or female, for example), or categorical with more than two categories.

We’re always predicting through questions such as, ‘What’s the probability or chance that someone is in one group compared to another?’. The following questions are example hypotheses. We can ask:
-Can we predict whether a person will survive an operation based on their pre-operational health status, their age, their gender, or their condition type?
-Can we predict the likelihood that a person will be found guilty or not of rape by knowing the jurors’ level of conservatism, gender, and age?
-Can we predict success or failure on the quantitative methods exam by knowing a person’s IQ, their past marks, and the number of lectures they attended?

2
Q

logistic regression compared to linear regression

A

Both logistic regression and linear regression aim to predict a DV from one or more IVs together. In both, you get a B value (coefficient) for each predictor, but the values are calculated differently. The B value indicates the size and direction of the relationship.
In logistic regression, the DV is always dichotomous, whereas in linear regression it must be continuous. The relationship between the DV and IV in logistic regression is not linear. Linear regression predicts scores on the DV, whereas logistic regression predicts probabilities of being in one group or another.
In logistic regression, we are asking: ‘How well do the independent variables classify people into different groups?’ and ‘How much does the probability of being in one group as opposed to another increase or decrease with the independent variables?’

3
Q

logistic regression compared to discriminant function analysis

A

Both aim to predict group membership from a combination of independent variables.
The group variable (i.e. the DV) in logistic regression can only have 2 categories. Discriminant function analysis can have more than two groups or categories.
All predictors in logistic regression can be categorical.
Fewer assumptions in logistic regression.

4
Q

Differences between logistic regression and all other general linear models (GLM)

A

The general linear model (GLM) covers a whole family of statistics that look for linear relationships, such as multiple regression, correlation, and ANOVA. These techniques tend to have similar assumptions.
But logistic regression is not based on a linear relationship. It is based on what could be described as a curvilinear relationship, by the nature of the equation it uses to calculate the estimates we need. The assumptions are therefore different in logistic regression. In logistic regression, there are no assumptions about the following:
-equal sample size in each group—one of the benefits of logistic regression is that you can compare groups that might be rare
-linear relationships because that’s not what we’re testing
-equality of variance
-normality.

5
Q

How to use logistic regression?

A

You use logistic regression when you want to predict group membership, when you have only two groups as the dependent variable, and when you have categorical independent variables. Additionally, when the assumptions of other techniques are violated, you might choose logistic regression instead (but make sure you adjust the research question so that logistic regression is the appropriate analysis).
Logistic regression is also used in medical and health research via a function called an odds ratio, but that use is not part of this module.

6
Q

What does logistic regression do? Worked example.

A

The research question: What is the probability that a psychiatrist will breach confidentiality with their client?
This example is not based on any real data.

As psychology students, you are aware that confidentiality with your clients is sacred. However, there are times psychologists are required to breach confidentiality: if there’s a risk that the client will harm either themselves or someone else, or if they’ve committed a serious crime. What is the chance of a psychiatrist actually breaching confidentiality?

For this exercise, our total sample size of psychiatrists is 144. If we base the chance of breaching confidentiality on no information at all, then we could say there’s a 50% probability that psychiatrists will breach confidentiality. There are going to be factors it depends on, but for now, all we know is that it is 50–50. The chance of breaching confidentiality versus not breaching would be 50% compared to 50%. This can be described as a ratio, which you will often find in logistic regression, and for this example, the ratio would be 1:1.

Let us now assume we have information about the psychiatrists, a variable where we’ve measured the psychiatrist’s perception of the likelihood that a client would harm others. It has been measured on a 10-point scale, where the psychiatrist rates from 0% when they think the client is not at all likely to harm others, through to 100% where they believe the client will harm others. This exercise provides us with a rating scale of how harmful the psychiatrist perceives the client to be.

We’re also interested in the probability of breaching confidentiality, compared to not-breaching, based on how harmful the psychiatrist has rated the client.

Logistic regression first calculates the predicted probability of breaching confidentiality for each value of the independent variable. What is the probability of breaching confidentiality for a 0% harm score? What is the probability of breaching confidentiality for a score of 10%, for a score of 20%, and so on? This is calculated by a formula that contains the linear regression equation within it: the constant ‘a’ plus ‘b1’ times the actual value of the variable. You don’t need to memorise the formula, but you do need to understand that it is different from the linear regression equation, although it contains that linear element within it.
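For reference, the formula referred to here is the standard logistic function, which wraps that linear part (the constant a plus b1 times the predictor) inside an exponential:

\[
P(\text{breach}) = \frac{e^{a + b_1 X}}{1 + e^{a + b_1 X}}
\]

where a is the constant and b1 is the coefficient for the harm score.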

7
Q

What does logistic regression do? Worked example 2.

A

There’s the linear part of the formula and then the whole formula. What the formula does is map out what’s called a binomial distribution. This means we’re no longer looking at a linear relationship; we’re looking at a binomial relationship, one where the outcome has two categories. On the y-axis, you’ve got the predicted probability of a person being in a certain group. In our example, it’s the predicted probability that a psychiatrist will breach confidentiality. On the x-axis, we’ve got the actual IV values, the harm score, which goes from 0 up to 100. The formula plots this binomial distribution as a vaguely S-shaped curve, so we can work out the probability that a psychiatrist will breach confidentiality for any score on the harm scale that we want.
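The following is a minimal Python sketch of that curve. The coefficients are invented purely for illustration: the slope b is hypothetical, and the intercept a is set to −60b so that the 50% point falls at a harm score of 60, matching the worked example in the next card.

```python
import math

# Hypothetical coefficients, for illustration only. The intercept 'a' is
# set to -60 * b so the predicted probability crosses 50% at harm = 60.
b = 0.1
a = -60 * b

def predicted_probability(harm_score):
    """Logistic function: P(breach) = e^(a + b*x) / (1 + e^(a + b*x))."""
    linear_part = a + b * harm_score
    return math.exp(linear_part) / (1 + math.exp(linear_part))

# The printed probabilities trace out the S-shaped curve.
for score in range(0, 101, 20):
    print(f"harm = {score:3d} -> P(breach) = {predicted_probability(score):.3f}")
```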

8
Q

What does logistic regression do? Worked example 3.

A

Remember, this graph shows the probability of being in the group that breaches confidentiality, and people often like to work out the median effective level. With logistic regression, the probability doesn’t increase by the same amount for every unit you move up the harm scale. There is, though, a point where you reach 50% probability. That 50% probability, often called the median effective level, is like a threshold: the point where you flip from one category to the other, from not breaching to breaching.
People often want to know which score on the harm scale corresponds to this threshold, or median effective level. Using the curve, we can find it: locate 50% probability on the y-axis (the cut-off where you could be in either group), trace across to the curve, and read down to see that it corresponds to a harm score of 60. Once the psychiatrist’s rating on the harm scale reaches 60, that’s the point where they become likely to shift into breaching confidentiality.
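The threshold can also be found algebraically: the predicted probability is exactly 0.5 when the linear part of the logistic function equals zero, so

\[
a + b_1 X = 0 \quad\Longrightarrow\quad X_{50} = -\frac{a}{b_1}
\]

and a median effective level of 60 on the harm scale implies a = −60b1.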

9
Q

how to calculate logistic regression

A

SPSS » Analyse » Regression » Binary Logistic.
Add in the dependent variable and covariates (independent variables).
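SPSS is menu-driven here, but the same model can be fitted in Python if you prefer code. This is a minimal sketch using statsmodels, assuming a hypothetical CSV file with a 0/1 ‘breach’ column and a continuous ‘harm’ column; the file and column names are placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data file with 0/1 'breach' and continuous 'harm' columns.
df = pd.read_csv("psychiatrists.csv")

X = sm.add_constant(df["harm"])          # adds the intercept (the constant)
model = sm.Logit(df["breach"], X).fit()  # binary logistic regression

print(model.summary())                   # B, SE, significance tests, CIs
print(np.exp(model.params))              # exponentiate B for odds ratios
```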

10
Q

outputs

A

There are numerous outputs:
-chi squared statistic
-more fit statistics (pseudo R squared)
-classification of cases
-the equations table
-the odds ratio
-confidence intervals

11
Q

chi squared statistic

A

It can be broadly described as a fit statistic, which indicates how well a model with predictors in it is explaining or predicting the probabilities in the dependent variable compared to a model with no independent variables.
A significant chi-square functions in a similar way to the F test in multiple regression. If the F test in multiple regression is significant, it indicates that the R squared is significant and the IVs are explaining significant variance in the DV. A significant chi-square shows that the independent variables are successfully predicting the probability of being in one group or the other on the dependent variable. These are similar concepts that differ in some ways; both are important sources of information.
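The chi-square reported here is a likelihood-ratio test comparing the model without predictors (likelihood L0) to the model with them (L1):

\[
\chi^2 = -2\ln L_0 - (-2\ln L_1)
\]

with degrees of freedom equal to the number of effects added to the model.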

12
Q

more fit statistics -pseudo Rsquared

A

The accompanying image shows a model summary table, which provides further fit statistics alongside the chi-square. The Cox and Snell R square and the Nagelkerke R square are called ‘pseudo R squares’: many people treat them as equivalent to R square in regression, but this is not strictly true.

The Nagelkerke R square is a variation of the Cox and Snell R square, adjusted so that it ranges from 0 to 1, with a higher value denoting a better fit. Both values indicate whether the model with the IVs is better than the model without them.
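For reference, the usual definitions are below, where L0 and L1 are the likelihoods of the models without and with the IVs, and n is the sample size. Nagelkerke simply rescales Cox and Snell so that the maximum possible value is 1:

\[
R^2_{CS} = 1 - \left(\frac{L_0}{L_1}\right)^{2/n}, \qquad R^2_{N} = \frac{R^2_{CS}}{1 - L_0^{2/n}}
\]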

13
Q

classification of cases

A

Rather than try to interpret R squares and similar measures, it can be more useful to look at how successful a model or equation is at classifying cases. The image here shows the option to select Classification plots, which results in the following classification table of results. Based on the independent variables in the equation, which for this example is just ‘harm’, how well does the model actually classify people into the two groups of breaching or not breaching confidentiality?

14
Q

classification of cases2

A

The data output looks like the accompanying table. ‘Observed’ indicates each psychiatrist’s actual value in the data file: where they are coded ‘1’, they breached confidentiality; where they are coded ‘0’, they didn’t. Those codes represent the two observed categories in our file. The two columns at the top of the table are the predicted classifications based on the model, that is, on the harm scores.
The output shows whether each case was predicted not to breach confidentiality (0) or predicted by the equation to breach confidentiality (1). That’s how SPSS codes the predicted breach and not-breach categories. As the table shows, 103 psychiatrists didn’t breach confidentiality and were predicted not to breach, and 14 actually did breach and were also predicted, based on their harm scores, to breach. Those two cells represent agreement.
There is also some disagreement. Some psychiatrists who didn’t breach were predicted by the model to breach confidentiality, and the inverse also occurs: 21 psychiatrists did breach but, based on their harm scores, were predicted not to. Working out the percentages gives us an overall correct classification score. In this case, 81.25% of the psychiatrists were classified correctly by this logistic regression equation, where we have only one predictor. That’s quite high for one predictor, and it is a much more meaningful measure of confidence in the logistic regression than the pseudo R squares provide.
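The overall figure is simple arithmetic on the table: the two agreement cells divided by the total sample,

\[
\frac{103 + 14}{144} = \frac{117}{144} = 81.25\%
\]

which also means the remaining cell (non-breachers predicted to breach) must contain 144 − 103 − 14 − 21 = 6 psychiatrists.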

15
Q

the equations table

A

Variables in the equation table indicate whether our independent variables are actually predicting the probability of being in one group compared to another. Reading across the following table shows a regression coefficient (B), a standard error (SE) for that coefficient, a Wald statistic, degrees of freedom (df), significance (sig.), and, in the last column, the exponential of B (Exp(B)), which is the odds ratio; in SPSS output it is labelled Exp(B). Each of these results indicates the following:
-B gives us an idea of the direction of the relationship.
-The standard error of B provides us with an idea of the amount of error involved in obtaining the B value.
-The Wald statistic is a test of significance for the B value.
-The value with the label Exp(B) is called the ‘odds ratio’. It provides an indication of how much the odds of breaching confidentiality increase or decrease with a one-unit increase in the IV.
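Two of these quantities have simple formulas worth knowing:

\[
\text{Wald} = \left(\frac{B}{SE}\right)^2, \qquad \text{Exp}(B) = e^{B}
\]

The Wald statistic is evaluated against a chi-square distribution with the listed df.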

B indicates the direction of the relationship; here, you can see that it’s positive. It’s important to take note of how categories are coded, because the coding can flip the direction of a relationship and categorical variables can be coded however we like. Here, harm is a continuous variable, so coding matters less, and we know that as the scores on harm go up, so does the rated chance that the client will harm someone else. The coding of the DV, however, is important.

Remember, 1 was a breach, and 0 was not a breach. With a positive B, as the scores on the harm scale go up, so does the value on the DV, towards 1 (breach). If the codes were the other way around, breach as 0 and not breach as 1, that B would be negative. So B tells us the direction of the relationship; the Wald test tells us whether it is significant; and Exp(B) gives the odds ratio.

16
Q

the odds ratio (Exp [B])

A

From the previous table, the odds ratio is 1.053. This figure is the exponential of B (so B is the natural log of the odds ratio); SPSS calculates it for you.

It is expressed as a ratio 1:1.053.

Take off the 1, and you have 0.053. For this exercise, we can round it off to 0.05, which translates to 5% when converted into a percentage. That represents how much the odds of breaching confidentiality, compared to not breaching, go up with a one-unit increase in the harm score.

The scale here is a representation of this. If you selected someone who rated the harm as 10 and someone who rated the harm as 20, then as it moves up from 10 to 20, the chance of breaching confidentiality increases by 5% for that one unit. If you compared people who chose 30 with people who chose 70, it would go up by roughly four times that (5% for each unit increase), increasing the odds by about 20%. That’s how you interpret the odds ratio: the odds of breaching confidentiality increase by 5% as the harm scale goes up by one unit.
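Strictly speaking, odds ratios compound multiplicatively across units rather than adding, so 4 × 5% ≈ 20% is a linear approximation; the exact figure over four units is

\[
1.053^{4} = e^{4B} \approx 1.23
\]

an increase of about 23% in the odds.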

17
Q

confidence interval (CI)

A

If you want to get an idea of the error involved in the odds ratio, you can select confidence intervals (CIs). To do this in SPSS, select Options and then tick the CI for exp(B) option, as shown in the accompanying image. The default value for the confidence interval is 95%—generally, this is an acceptable level. If you want to be more strict, you could choose 99%, which will become clearer when we interpret the CIs.

The confidence intervals for our example, as can be seen in the accompanying image, show that the lower bound value is 1.033 and the upper bound value is 1.073. This means that we can be 95% confident that the odds ratio is between 1.033 and 1.073 in the population.

18
Q

confidence interval2

A

The following image shows the output from SPSS with the confidence interval data.
This is a small range, which suggests not much error in the odds ratio; a large range would suggest more error in its estimation. The CI is calculated by exponentiating B plus or minus a multiple of its standard error (SE), so a small SE gives a narrow range and a larger SE a wider CI. Our SE is quite small (0.01), suggesting higher confidence in the B and the odds ratio. If we wanted to be more strict, we could ask for 99% confidence intervals, which would widen the range because we are demanding stronger confidence: 99% compared to 95%. Usually, 95% is acceptable, as it corresponds to the p < 0.05 level (and 99% to the p < 0.01 level).
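The bounds in the output can be reproduced from B and its SE; here B = ln(1.053) ≈ 0.052 and SE = 0.01, so

\[
95\%\ \text{CI} = e^{B \pm 1.96 \times SE} = e^{0.052 \pm 0.0196} \approx (1.033,\ 1.073)
\]

matching the lower and upper bounds reported above.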

19
Q

what happens if you have more than 1 iv?

A

Previously we considered the example of the likelihood of a psychiatrist breaching confidentiality increasing if the psychiatrist thinks there is a high chance the client will harm others (using a 10-point scale). What happens when we have more than one independent variable? What if we also thought that the likelihood of breaching confidentiality increases with other variables, such as:

the psychiatrist considers the client to be dangerous (0 = not dangerous, 1 = dangerous)
when the client has had a past history of assault (0 = no past history, 1 = past history)
when the psychiatrist is older rather than younger (1 = young, 2 = middle, 3 = older).
You now have one DV (breach of confidentiality) and 4 independent variables with their own scales: the harm scale (continuous on a scale of 0–10); dangerous classification (dichotomous); past history (dichotomous); and age of the psychiatrist (categorical). We now have the equation:
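In standard form, with one coefficient per predictor, it is:

\[
P(\text{breach}) = \frac{e^{a + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4}}{1 + e^{a + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4}}
\]

where X1 to X4 are the harm score, danger, history and age group (in the SPSS output, age group contributes two contrast terms, giving the five effects mentioned later).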

20
Q

what happens if you have more than 1 iv?2

A

There is now a B value for each of the independent variables.
To enter them in SPSS, you include the extra three IVs in the Covariates box. Because the three IVs are dichotomous or categorical, we need to tell SPSS this; otherwise SPSS will treat them as continuous, and we would get different results. You do this by selecting the Categorical option and then using the arrow to move each of the three IVs into the Categorical Covariates panel.
The variables ‘danger’ and ‘history’ have only two categories, so we don’t really need any special contrasts: the analysis will simply compare how being in one category rather than the other (on the IV) increases or decreases the probability of being in the breach group. The variable age group, however, consists of three age groups: young, middle and older. We therefore have to decide which age categories we want to compare, and of course this depends on the research question. Imagine that the hypothesis is that more experienced psychiatrists are more likely to breach confidentiality than inexperienced psychiatrists. We would then be particularly interested in comparing the younger group with the older group, so we could ask SPSS for polynomial contrasts, which test for a linear relationship across categories.

Notice in the following image that we have kept the default contrast for ‘danger’ and ‘history’ as Indicator because there are only two categories to compare. Indicator just means it will compare the last category (as Last is selected as the Reference Category) with the other categories. If we had kept the indicator contrast for ‘age’, it would have compared the last age category (older) with the other two categories. If you wanted to change the indicator to the first category, you would simply click the First button. We have, however, asked for a polynomial contrast for the three age groups, as our hypothesis is that the chance of breaching will increase (or perhaps decrease) as age increases.

21
Q

output (more than 1 iv)

A
1. DEPENDENT VARIABLE ENCODING

This tells us how SPSS has treated the codes for the variables (how they are coded affects the direction of the relationship and the odds ratio).
Note that ‘no history’ was coded 0 and ‘history’ was coded 1, but SPSS has coded them the other way around. This will affect your analysis.
The following image tells you how the categories of the independent variables are being compared. For the variables ‘history’ and ‘danger’, it is simply a comparison between the two groups. The 1 corresponds to 0 (e.g., ‘no history’) and the –1 corresponds to 1 (‘history’). For the polynomial contrasts for age groups, the contrasts are testing the hypothesis that the probability of breaching will increase with each age group starting from younger to older.

22
Q

output (more than 1 i/v)2

A

You get two comparisons for age groups, labelled (1) and (2): (1) is the linear comparison and (2) is the quadratic comparison. The quadratic contrast tests a U-shape, the hypothesis that the chance of breaching confidentiality is higher for both the younger and older psychiatrists and lower for the middle age group. The following image shows how these compare when graphed, where the linear contrast (1) is represented by the weights –0.707, 0 and +0.707, and the quadratic contrast (2) by 0.408, –0.816 and 0.408. Essentially, you are testing those two relationships: a linear contrast and a quadratic contrast. If the B coefficient for the linear contrast were significant, it would suggest that as age goes up, the probability of breaching goes up as well.
If the quadratic contrast were significant, it would mean the younger and older groups are more likely to breach than the middle group. You also get the coding that SPSS has given to your dependent variable; often it is the same as yours, but you need to check, because sometimes SPSS flips the 1 and 0 around from what you coded.
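The quoted weights are the standard normalised orthogonal polynomial coefficients for three groups:

\[
\text{linear: } \left(-\tfrac{1}{\sqrt{2}},\ 0,\ \tfrac{1}{\sqrt{2}}\right) \approx (-0.707,\ 0,\ 0.707), \qquad
\text{quadratic: } \left(\tfrac{1}{\sqrt{6}},\ -\tfrac{2}{\sqrt{6}},\ \tfrac{1}{\sqrt{6}}\right) \approx (0.408,\ -0.816,\ 0.408)
\]

Each set of weights sums to zero, and the two sets are orthogonal, so the linear and quadratic effects are tested independently.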

23
Q

output (more than 1 i/v)3

A
2. CHI-SQUARED
The next output you get is a chi-square. For the example, the result is χ²(5) = 41.61, p < 0.001, which is significant. Together, the 4 variables, or 5 effects (remember there are two contrasts for age), significantly predict the probability of breaching confidentiality. That means our model with the four predictors in it is a better fit than a model with no predictors.
We also get a classification table, which you can see in the following image: with all the predictors together, 83.3% of cases are correctly classified. It hasn’t gone up much from when ‘harm’ was the only IV, so you could interpret that as the additional independent variables not adding very much.
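As with the one-predictor model, the percentage is just correct classifications over the total:

\[
\frac{120}{144} \approx 83.3\%
\]

so the three extra predictors classify only 3 more cases correctly than the 117 (81.25%) classified by ‘harm’ alone.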
24
Q

output (more than 1 i/v)4

A

3. EQUATION TABLE
From the coefficients in the table in the following image, you can see that the only IV with a significant Wald value was the variable ‘harm’. That is, the psychiatrist’s score on the ‘harm’ scale was the only variable to predict the probability of breaching confidentiality significantly. There is no need to look at the odds ratio if the Wald statistics are not significant, even though the odds ratio for the first age group comparison is around 14% (i.e. 1:1.1356). Just for the purposes of interpretation, if Wald was significant, then the odds ratio for the age group variable would suggest that with a change from one group to another (e.g. from the younger to middle group or the middle to the older group), the chance of breaching confidentiality would increase by 14%. Because this is a linear comparison (i.e. polynomial), we could conclude that the older group were 28% more likely to breach confidentiality than the younger group. The second comparison we get for the age group category is a test for a quadratic relationship between the age group and the predicted probability of breaching confidentiality. That is a U-shape relationship where the predicted probability of breaching confidentiality for the younger and older age groups would be significantly higher than the middle age group. Since the Wald statistics are not significant, there is no quadratic relationship between age group and breaching confidentiality.

25
Q

output (more than 1 iv)5

A
4. CORRELATION MATRIX
If we look back at the equation table, we see only the independent variable ‘harm’ is really significant; none of the other IVs are significant at all. You will also notice that the odds ratio for ‘harm’ is slightly down: it was 1.05, and now it’s 1.04, so it has reduced from 5% to 4%. This is because, as in multiple regression, the equation is controlling for the overlap, or inter-correlation, between the IVs. To further understand what is happening in logistic regression, you can use the correlations of estimates function, which gives you a correlation matrix of all of your effects correlated with the other effects. You can see in the following image that ‘harm’ actually relates quite highly to the ‘danger’ variable. That might be why ‘danger’ didn’t predict whether they’d breach confidentiality or not: it’s highly related to the ‘harm’ scale. This is a similar concept to multiple regression, where each IV must uniquely predict the probability of being in one group versus the other.
26
Q

output (more than 1 iv)6

A
5. ODDS RATIO
Sometimes you get an odds ratio that goes under 1, as indicated in the following image. You get odds ratios under 1 when you have a negative relationship, that is, a negative B. If the effect is not significant, you probably wouldn’t interpret it, but suppose it were: the odds ratio for ‘danger’ is 0.7290. To rescale that into something more interpretable, you can invert it, because an odds ratio under 1 is just the reciprocal of the relationship in the other direction. Here, 1 divided by 0.7290 gives 1.37, which would mean a 37% increase in the odds of breaching for clients rated as dangerous (with the direction of comparison flipped). You can see how to convert an odds ratio that goes under 1, which is often confusing when you first do logistic regressions; it will happen whenever you have a negative B.
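As a quick check of the inversion, the negative B and the reciprocal agree:

\[
B = \ln(0.7290) \approx -0.316, \qquad \frac{1}{0.7290} = e^{0.316} \approx 1.37
\]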
27
Q

logistic regression methods

A
There are several methods of logistic regression, just as there are several methods of multiple regression:
-Enter method (default): all variables are entered at once. How well does the set of IVs predict the probability?
-Sequential logistic regression (like hierarchical regression): IVs are separated into blocks. Does one IV, or set of IVs, predict group membership over and above the others?
-Stepwise methods: statistical criteria are used to add or remove independent variables so you end up with the best model that explains your probabilities.
a) Forward: variables are entered one at a time based on some criterion. What is the most parsimonious model?
b) Backward: variables are removed one at a time until further removal harms the model. What are all of the significant predictors?
Inclusion/exclusion can be based on three criteria: Conditional (the probability of the chi-square change), Likelihood Ratio (the change in the model likelihood), and Wald (the significance of the Wald statistic).
28
Q

multicollinearity

A
Multicollinearity needs to be addressed because, as in multiple regression, each IV’s effect is estimated while the other IVs are held constant. If the independent variables are very highly correlated with each other, you will experience the same issues as in multiple regression.
29
Q

outliers

A
You can get a list of outliers in your output by selecting the standardised residuals. Because they’re standardised, we can define an outlier as a residual greater than +2 or less than −2, similar to regression. They are derived differently, though: from the difference between a case’s predicted group membership (the predicted probability) and its actual membership.
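The standardised residual here is essentially a Pearson residual; the usual definition (not shown in the module’s output) is

\[
z_i = \frac{y_i - \hat{p}_i}{\sqrt{\hat{p}_i (1 - \hat{p}_i)}}
\]

where y_i is the observed membership (0 or 1) and \hat{p}_i is the predicted probability.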
30
Q

sample size

A
It is important to have at least five times as many cases as cells in the design. The number of cells depends on how many categories are in your IVs and your DV, as if it were a large contingency table. In the example, there is breach: yes or no; danger: yes or no; history: yes or no; and three age groups, which makes a 24-cell table. No cell should have 0 people in it; for example, you don’t want there to be no psychiatrists at all in the combination of ‘no history’ and the younger age group.
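As a quick check of the arithmetic, using the sample from the earlier worked example:

\[
2 \times 2 \times 2 \times 3 = 24 \text{ cells}, \qquad 5 \times 24 = 120 \leq 144
\]

so the sample of 144 psychiatrists clears the rule of thumb.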