L9 - Regression analysis Flashcards
(37 cards)
Cross-tabulation: Chi-square test definition (Malhotra, 2013)
A statistical technique that describes two or more variables simultaneously and produces a table reflecting their joint distribution, where each variable has a limited number of categories or distinct values.
When to use cross-tabulation
- To test the difference or association between variables.
- To compare behaviour and intentions across categories of predictor variables such as income, sex, and marital status.
Role of Cross-tabulation (Malhotra, 2013)
(1) Simple to conduct analysis and appealing to less sophisticated researchers.
(2) Results can be easily interpreted and understood.
(3) Clear interpretation provides a stronger link between research results and managerial action.
(4) A series of cross-tabulations may give greater insight into a complex phenomenon than a single multivariate analysis.
(5) Alleviates the problem of sparse cells in discrete multivariate analysis.
Four possibilities of cross-tabulation for three or more variables (Malhotra, 2013)
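- The association between the two original variables is refined.
- No association between the two original variables, although an association was initially observed (the initial relationship was spurious).
- Some association between the two original variables, although no association was initially observed (the association was suppressed).
- No change in the initial association.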
Process in cross-tabulation (Malhotra, 2013)
1) Test H0 (no association between the variables) using the chi-square statistic.
2) If H0 is rejected, determine the strength of association with the phi coefficient, contingency coefficient, etc.
3) Interpret the pattern of the relationship by computing the percentages in the direction of the independent variable.
4) Conclude.
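Step 3 can be sketched in Python as follows. This is a minimal illustration, not part of the original cards: the counts are hypothetical, and the sketch assumes the independent variable is laid out in the columns, so percentages are computed down each column.

```python
def column_percentages(observed):
    """Percentages computed within each column (assumed independent variable)."""
    col_totals = [sum(col) for col in zip(*observed)]
    return [
        [100.0 * count / col_totals[j] for j, count in enumerate(row)]
        for row in observed
    ]

# Hypothetical counts: rows = usage (light, heavy), columns = sex (male, female)
observed = [[5, 10], [10, 5]]
percentages = column_percentages(observed)
for row in percentages:
    print([round(p, 1) for p in row])
```

Each column of the result sums to 100, so the categories of the independent variable can be compared directly.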
Cons of Cross-tabulation (Malhotra, 2013)
1) Can produce an endless variety of cross-tabulation tables.
2) Becomes complex and inefficient when several variables are involved, and examines only the association between variables, not causation.
Expected count (expected frequency) calculation
fe = (nr * nc) / n, where nr = total count in the row, nc = total count in the column, n = total sample size
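As an illustration, the expected counts for every cell of a table can be computed from the row and column totals. The table below is hypothetical example data, and the function name `expected_counts` is an assumption made for this sketch.

```python
def expected_counts(observed):
    """Return the table of expected counts fe = nr * nc / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[nr * nc / n for nc in col_totals] for nr in row_totals]

# Hypothetical 2x2 table of observed counts
observed = [[5, 10], [10, 5]]
expected = expected_counts(observed)
print(expected)  # [[7.5, 7.5], [7.5, 7.5]]
```

Here every row and column total is 15 and n is 30, so each expected count is 15 * 15 / 30 = 7.5.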
Chi-square calculation
X^2 = Σ [ (observed frequency - expected frequency)^2 / expected frequency ], summed over all cells of the table
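A minimal Python sketch of this sum over all cells, using hypothetical counts. The degrees of freedom, (rows - 1) * (columns - 1), are not stated on the card and are included here as a standard fact for contingency tables.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency
            stat += (fo - fe) ** 2 / fe
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Hypothetical 2x2 table of observed counts
stat, df = chi_square([[5, 10], [10, 5]])
print(round(stat, 3), df)  # 3.333 1
```

Each of the four cells contributes (2.5)^2 / 7.5, giving 10/3 in total.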
Chi-square analysis definition (slide)
- Assesses how closely the observed frequencies fit the pattern of the expected frequencies, which is why it is referred to as a “goodness-of-fit” test (poor fit: reject H0).
- Used to analyze relationships between nominal-nominal and nominal-ordinal scaled variables.
Chi-square distribution definition
A skewed distribution whose shape depends solely on the number of df. As the number of df increases, the chi-square distribution becomes more symmetrical.
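The shrinking skew can be illustrated numerically. The sketch below relies on the standard result that a chi-square distribution with df degrees of freedom has skewness sqrt(8 / df); that formula is not on the card and is brought in here only for illustration.

```python
import math

def chi_square_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    return math.sqrt(8 / df)

skews = {df: chi_square_skewness(df) for df in (1, 5, 30, 100)}
for df, skew in skews.items():
    print(df, round(skew, 3))
```

The skewness falls toward zero as df grows, which matches the distribution becoming more symmetrical.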
Measures for the strength of association
Phi coefficient (Ф), contingency coefficient, Cramer’s V, lambda coefficient, and other statistics (tau b, tau c, gamma)
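Two of these measures can be sketched directly from the chi-square statistic. The formulas used are the standard textbook ones, C = sqrt(X^2 / (X^2 + n)) and V = sqrt(X^2 / (n * min(r - 1, c - 1))); they are not given on the cards, and the input values are hypothetical.

```python
import math

def contingency_coefficient(chi2, n):
    """Contingency coefficient C for a table of any size."""
    return math.sqrt(chi2 / (chi2 + n))

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V, which rescales phi for tables larger than 2x2."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# Hypothetical values: chi-square of 10/3 from a 2x2 table with n = 30
chi2, n = 10 / 3, 30
c = contingency_coefficient(chi2, n)
v = cramers_v(chi2, n, 2, 2)
print(round(c, 3), round(v, 3))  # 0.316 0.333
```

For a 2x2 table, Cramer's V equals the magnitude of the phi coefficient.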
Phi coefficient definition
Measures the strength of association in the special case of a table with two rows and two columns (a 2 x 2 table).
Phi coefficient (Ф) calculation
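Ф = √ ( X^2 / n )
+ Ф = 0: no association
+ Ф = 1: perfectly positive association
+ Ф = -1: perfectly negative association (the square-root formula gives only the magnitude; the sign comes from the direction of the association)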
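A short sketch of both views of phi for a 2 x 2 table. The signed form, (a*d - b*c) / sqrt((a+b)(c+d)(a+c)(b+d)), is a standard identity not shown on the card, included here to illustrate where a negative value comes from. The table is hypothetical.

```python
import math

def phi_from_chi_square(chi2, n):
    """Magnitude of phi from the chi-square statistic."""
    return math.sqrt(chi2 / n)

def signed_phi(table):
    """Signed phi for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

table = [[5, 10], [10, 5]]            # hypothetical counts, n = 30
phi_signed = signed_phi(table)
phi_mag = phi_from_chi_square(10 / 3, 30)  # chi-square of this table is 10/3
print(round(phi_signed, 3), round(phi_mag, 3))  # -0.333 0.333
```

The two functions agree in magnitude; only the signed form distinguishes a positive from a negative association.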
Relationships between variables can be described in several ways:
Presence, direction, strength of association, and type of relationship (linear or curvilinear).
Covariation definition
The amount of change in one variable that is consistently related to the change in another variable of interest. Put simply, it is the degree of association between two variables.
Scatter diagram definition
A graphic plot of the relative position of two variables using a horizontal and a vertical axis to represent the values of the respective variables.
Pearson correlation coefficient (Product Moment correlation)
- A statistical measure of the strength of a linear relationship between two metric variables.
- r varies between -1.00 and 1.00.
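A minimal sketch of the calculation, using the usual product-moment formula (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical metric data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

Here the cross-product sum is 6 and the sums of squares are 10 and 6, so r = 6 / sqrt(60).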
Assumptions of the Pearson correlation coefficient:
- The two variables are measured on interval or ratio scales.
- The relationship between the variables of interest is linear.
- The variables being analyzed come from a normally distributed population.
When the correlation coefficient is weak, there are two possibilities:
(1) there is not a consistent, systematic relationship between the two variables
(2) the association exists, but it is not linear, and other types of relationships must be investigated further.
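Possibility (2) can be illustrated with a hypothetical curvilinear example: y is fully determined by x (y = x^2), yet the linear correlation is zero. The helper `pearson_r` is defined here only to keep the sketch self-contained.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a perfect but non-linear relationship
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
r_curved = pearson_r(x, y)
print(r_curved)  # 0.0
```

A scatter diagram of such data would show the U-shaped pattern that the coefficient alone misses.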
Coefficient of determination (R^2) definition
- Measures the proportion of variation in the dependent variable that is explained by the independent variable.
- The larger the R^2, the stronger the linear relationship.
R^2 calculation
R^2 = SSR / TSS (explained variation / total variation), where SSR is the regression (explained) sum of squares and TSS is the total sum of squares
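A sketch of this ratio for a simple least-squares line, with SSR taken as the explained (regression) sum of squares as the card's parenthesis indicates. The data and the function name `r_squared` are assumptions made for illustration.

```python
def r_squared(x, y):
    """R^2 = SSR / TSS for a least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    ssr = sum((p - mean_y) ** 2 for p in predicted)  # explained variation
    tss = sum((b - mean_y) ** 2 for b in y)          # total variation
    return ssr / tss

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y)
print(round(r2, 3))  # 0.6
```

For this data SSR is 3.6 and TSS is 6, so 60% of the variation in y is explained by x.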
Adjusted R^2
- R^2 adjusted for the number of independent variables and the sample size, to account for the diminishing returns of adding more independent variables.
- It indicates how well the model generalizes.
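The adjustment can be sketched with the commonly used formula 1 - (1 - R^2)(n - 1) / (n - k - 1), where n is the sample size and k the number of independent variables. The formula is not given on the card and the input values are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model: R^2 = 0.6 from 5 observations
adj_one = adjusted_r2(0.6, n=5, k=1)
adj_two = adjusted_r2(0.6, n=5, k=2)
print(round(adj_one, 3), round(adj_two, 3))  # 0.467 0.2
```

With the same R^2, a model using more independent variables receives a lower adjusted value.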
Role of the Regression analysis
- Predict the values of the dependent variables.
- Determine the structure or form of the relationship.
- Indicate the relative importance of the independent variables.
Bivariate / Multivariate regression analysis definition
Analyzes the linear relationship between two / multiple variables by estimating the coefficients of a linear equation (a straight line in the bivariate case).
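For the bivariate case, the coefficients can be estimated by ordinary least squares. This is a minimal sketch with hypothetical data; the names `fit_line` and `predict` are assumptions introduced for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, value):
    """Predicted value of the dependent variable."""
    return intercept + slope * value

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = fit_line(x, y)
print(round(intercept, 2), round(slope, 2))  # 2.2 0.6
print(round(predict(intercept, slope, 6), 2))  # 5.8
```

The slope indicates how much the dependent variable changes per unit change in the independent variable, and the fitted equation can then be used for prediction.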