module 5 Flashcards by Menuka Rangani

Variables and observations:

In research studies, information is collected from specific subjects, such as consumers or firms.

These data provide insight into how these subjects score on various variables of interest.

For every subject, data is collected for each variable.

Hence, we observe the scores on the variables for each subject.

Data sets:

In a data set:

Rows capture observations (on, e.g., consumers or firms).

In cross-sectional studies (studies at one point in time), the number of rows equals the number of subjects (consumers, firms, …) in the data set.

Columns display variables. A variable can take on different values for different subjects.

Why and how do we convert textual variables into numerical codes?

Even though variables can take on textual values, manually entering textual data in a data set is often a nuisance.

It takes a lot of time
The probability of making mistakes increases

Therefore, researchers typically recode textual values: they replace text-based answers with a numerical code.

For example, in a survey, respondents may have to indicate whether they are male or female. Males may be coded as 1, whereas females may be coded as 0.

Other numbers could also be used to replace the textual values (e.g., male = 1, female = 2; or male = 6, female = 9).
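
The recoding step can be sketched in a few lines of Python. This is only an illustration: the list of answers is made up, and the coding scheme is the male = 1, female = 0 example from this card.

```python
# Hypothetical survey answers (illustrative data, not from any real study).
answers = ["male", "female", "female", "male", "female"]

# Coding scheme from the example on this card: male = 1, female = 0.
gender_codes = {"male": 1, "female": 0}

# Replace each text-based answer with its numerical code.
coded = [gender_codes[answer] for answer in answers]

# With 0/1 coding, the mean of the variable is the share of subjects coded 1.
share_male = sum(coded) / len(coded)

print(coded)
print(share_male)
```

One practical reason 0/1 coding is convenient: the mean of the coded variable can be read directly as a proportion.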

What is a dummy variable?

Variables that only take on the values 0 or 1 are called dummy variables.

Researchers typically use 0 and 1 because this simplifies the interpretation of statistical analyses.

Ensuring your variables match your unit of analysis

The variables in your data set must match the study’s unit of analysis. Specifically:

The dependent variable is measured at the level of the unit of analysis. So are the mediator variables.

Independent and moderator variables are measured at the level of the unit of analysis or a more aggregate level.

What sampling does:

The rows in a dataset consist of subjects (firms, customers, products, …). But how do you determine which subjects will be part of your dataset? This is where sampling comes in.

(For example:

In March 2006, the American Medical Association reported disturbing rates of binge drinking and unprotected sex among college students during spring break. The report was based on what the researchers initially claimed was a survey of “a random sample” of 644 students.
The survey results were breathlessly reported on the Today Show, the CBS Early Show, and hundreds of reports followed on local television and radio newscasts. The findings also were reported in Time magazine and the New York Times.
One problem in the spring break study: the sample was not random but self-selected: the results were based on students who volunteered to answer the question as part of an online survey panel. In addition, only about a quarter of these students had ever even gone on a spring break trip. The Times eventually published a correction explaining the misrepresentation.
Statistics are certainly useful in finding answers to research questions, but they can do more harm than good if the sample is not correct. That is, if data are not collected from the right subjects, the research outcomes might be highly misleading, as the spring break example demonstrates.
The lesson here: sampling is critical!)

What is a sample?

A subset of the population of interest.

Care is needed when drawing a sample: with low representativeness, properties of the population are over- or underrepresented in the sample, which leads to high sampling error.

What is a population?

The entire group of people, firms, events, or things of interest about which you would like to make inferences.

Sample size: does it matter?

Sample size and representativeness are two related but different issues. The size of a sample is not a guarantee of its ability to accurately represent a population. Large unrepresentative samples can perform as badly as small unrepresentative samples.

That said, a larger sample size can decrease the sampling error between your sample and the population of interest. This only holds if you use an appropriate sampling design.
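
A small deterministic sketch can make the difference between size and representativeness concrete. The population below is entirely hypothetical (10,000 subjects whose score equals their index). It compares a large sample that covers only part of the population with a much smaller sample spread evenly across it.

```python
from statistics import mean

# Hypothetical population: 10,000 subjects whose score equals their index.
population = list(range(10_000))
true_mean = mean(population)  # 4999.5

# Large but unrepresentative sample: only the top half of the population.
large_biased = population[5_000:]      # 5,000 subjects

# Small but evenly spread sample: every 100th subject, starting at index 50.
small_spread = population[50::100]     # 100 subjects

# Sampling error here = absolute gap between sample mean and population mean.
error_large_biased = abs(mean(large_biased) - true_mean)
error_small_spread = abs(mean(small_spread) - true_mean)

print(len(large_biased), error_large_biased)
print(len(small_spread), error_small_spread)
```

In this constructed case the sample that is fifty times larger is off by 2500, while the small, evenly spread sample is off by 0.5.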

Steps in the sampling process:

1) Define the population you are interested in.

2) Determine the sampling frame. The sampling frame is the physical representation of the population through which one can reach out to that population.

3) Decide on the sampling design.

Errors in the sampling frame and their solutions

When defining the sampling frame, errors can occur:

Undercoverage: true population members are excluded.

Miscoverage: non-population members are included.

Solution:

If the error is small, recognize it but ignore it.
If the error is large, redefine the population in terms of the sampling frame.

When deciding on the sampling design, what are the classifications?

Sampling designs can be broken down into 2 subcategories:
1) Probability sampling and 2) Non-probability sampling

They can be further broken down as follows:

1) Probability sampling

i) Simple random sampling
ii) Systematic sampling
iii) Stratified sampling
iv) Cluster sampling

2) Non-probability sampling
i) Convenience sampling
ii) Quota sampling
iii) Judgement sampling
iv) Snowball sampling
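
As a rough illustration of the first three probability designs, the sketch below draws samples from a made-up sampling frame of 100 customers. The frame, the region labels, and the sample size of 10 are all assumptions chosen for the example. Cluster sampling and the non-probability designs are left out.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 100 customers, each tagged with a region (stratum).
frame = [{"id": i, "region": "north" if i < 60 else "south"} for i in range(100)]

# i) Simple random sampling: every subject has the same chance of selection.
simple_random = rng.sample(frame, 10)

# ii) Systematic sampling: random start, then every k-th subject in the frame.
k = len(frame) // 10
start = rng.randrange(k)
systematic = frame[start::k]

# iii) Stratified sampling: draw randomly within each stratum,
# here in proportion to stratum size (60% north, 40% south).
north = [s for s in frame if s["region"] == "north"]
south = [s for s in frame if s["region"] == "south"]
stratified = rng.sample(north, 6) + rng.sample(south, 4)

print([s["id"] for s in systematic])
```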

Factors that guide the selection of a suitable statistical test/technique

The number of independent variables in the conceptual model: one vs. multiple.
The measurement levels of these variables: metric vs. categorical.

Statistical tests in the case of a conceptual model with one independent variable: what is the most suitable statistical test when both the dependent variable and the (single!) independent variable in the conceptual model are metric (interval or ratio)?

Pearson’s correlation coefficient is the most suitable statistical test to uncover whether the two variables are related.

Statistical tests in the case of a conceptual model with one independent variable, when the dependent variable and the independent variable are both categorical (nominal or ordinal):

A chi-square test is the appropriate statistical test.

Statistical tests in the case of a conceptual model with one independent variable, when the dependent variable is metric but the (single) independent variable is categorical:

Either a t-test or a one-way analysis of variance (one-way ANOVA) is suitable.

The selection between these two hinges on the number of levels of the independent variable.

When the independent variable has just two levels, a t-test is the appropriate choice.

When the independent variable comprises three or more levels, a one-way ANOVA is the appropriate test.

Statistical techniques in the case of a conceptual model with multiple independent variables, when the dependent variable is metric and the independent variables are typically categorical:

ANOVA is the appropriate technique.

(ANOVA: the independent variables are typically categorical, but it can also handle metric variables.)

Experimental studies typically use (variations of) ANOVA.

Statistical techniques in the case of a conceptual model with multiple independent variables, when the dependent variable is metric and the independent variables are typically metric:

Linear regression analysis is the appropriate technique.

(Regression: the independent variables are typically metric, but it can also handle categorical variables.)

Archival and survey studies typically use (variations of) regression analysis.

Statistical techniques in the case of a conceptual model with multiple independent variables, when the dependent variable is categorical and the independent variable(s) can encompass metric and/or categorical variables:

The appropriate technique is a logit analysis (logistic regression).
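
The selection rules on the cards above can be summarised as a small decision function. This is only a study aid: the function name `choose_test` and its parameters are invented for this sketch, and it encodes nothing beyond the rules stated on these cards.

```python
def choose_test(dv_level: str, iv_level: str, n_ivs: int = 1, iv_groups: int = 2) -> str:
    """Suggest a test/technique following the rules on these cards.

    dv_level, iv_level: "metric" or "categorical"
    n_ivs: number of independent variables in the conceptual model
    iv_groups: number of levels of a single categorical IV
    """
    if dv_level == "categorical":
        # One categorical IV: chi-square test; otherwise logit analysis.
        if n_ivs == 1 and iv_level == "categorical":
            return "chi-square test"
        return "logit analysis (logistic regression)"

    # From here on the dependent variable is metric.
    if n_ivs == 1:
        if iv_level == "metric":
            return "Pearson's correlation coefficient"
        return "t-test" if iv_groups == 2 else "one-way ANOVA"

    # Multiple IVs: ANOVA for typically categorical IVs, regression otherwise.
    return "ANOVA" if iv_level == "categorical" else "linear regression"


print(choose_test("metric", "categorical", iv_groups=3))
```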

i) Pearson’s correlation coefficient: a metric DV and IV

Pearson’s correlation coefficient measures the strength of the linear relationship between two metric (interval or ratio) variables.

The possible range of values for the correlation coefficient is -1.0 to 1.0. In other words, a correlation cannot exceed 1.0 or be less than -1.0. A correlation of -1.0 indicates a perfect negative correlation, while a correlation of 1.0 indicates a perfect positive correlation.

A positive relationship exists between two variables if the correlation coefficient is greater than zero.

A positive correlation means that when one variable increases, the other one also increases.

Conversely, if the correlation coefficient is less than zero, it is a negative relationship. A negative correlation means that when one variable increases, the other one decreases.

A correlation of zero indicates that there is no linear relationship between the two variables.

Note that a correlation of zero between two variables does not mean that there is no relationship between the two variables at all. It means that there is no linear relationship. The relationship could still be non-linear.

To calculate a correlation coefficient in SPSS, click Analyze –> Correlate –> Bivariate. In the pop-up window, specify the two variables to be used in the analysis by clicking the blue arrow button.
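
Outside SPSS, the coefficient can also be computed by hand. The sketch below uses plain Python and three tiny made-up data sets to show the three cases described on this card: a perfect positive relationship, a perfect negative relationship, and a non-linear relationship with a correlation of zero.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Illustrative data (made up for this sketch).
x = [-2, -1, 0, 1, 2]
rising = [2 * v + 1 for v in x]    # y increases linearly with x
falling = [-3 * v for v in x]      # y decreases linearly with x
curved = [v ** 2 for v in x]       # y depends on x, but not linearly

print(pearson_r(x, rising), pearson_r(x, falling), pearson_r(x, curved))
```

The third data set is the important one: y is fully determined by x, yet the correlation is zero because the relationship is not linear.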

ii) Chi-square tests: a categorical DV and IV

A chi-square test tests whether there is a relationship between two categorical variables (nominal or ordinal).

Basically, it checks whether the frequencies occurring in the sample differ significantly from the frequencies one would expect. Thus, the observed frequencies are compared with the expected frequencies and their deviations are examined.

Suppose we want to examine whether there is a relationship between gender and the highest level of education.

The frequency distribution of the two variables is displayed in a contingency table.

A chi-square test tests whether gender and education are related by comparing the numbers in this table with the numbers we would expect if the two variables were independent.

How to run it in SPSS:
In SPSS, click Analyze –> Descriptive Statistics –> Crosstabs. Crosstabs creates a contingency table with the distribution of the two variables. Enter one categorical variable as the row variable and the other categorical variable as the column variable. To run the chi-square test, check the Chi-square box in the Statistics window.

Note that you are only allowed to perform a χ² test if you have collected enough data.

But what constitutes enough? Ideally, you want all the cells in your cross table to have an expected value of 1 or higher. Additionally, the percentage of cells with an expected count lower than 5 should not be more than 20% of the total number of cells. You can find this information just under the table in the SPSS output.
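
To make the comparison of observed and expected frequencies concrete, here is a minimal sketch in plain Python. The contingency table is invented for illustration (two genders by three education levels). The expected count of a cell is its row total times its column total, divided by the overall sample size.

```python
# Hypothetical contingency table: rows = gender (male, female),
# columns = highest education level (three illustrative categories).
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Data requirement from this card: all expected counts >= 1,
# and at most 20% of the cells with an expected count below 5.
cells = [e for row in expected for e in row]
enough_data = min(cells) >= 1 and sum(e < 5 for e in cells) / len(cells) <= 0.20

print(expected, round(chi_square, 3), df, enough_data)
```

For this made-up table the statistic is about 3.11 with 2 degrees of freedom, which is below the 5% critical value of roughly 5.99, so gender and education would not be judged significantly related.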

iii) T-test: a metric DV and a categorical IV with 2 levels

When the dependent variable is metric and the independent variable is categorical with two levels (just two, no more), a t-test is appropriate.

A t-test is used to establish whether the difference between the means of the 2 groups is significant.

To compare independent samples you need an independent samples t-test, also called an unpaired samples t-test.

The requirements for an independent samples t-test are as follows:

one nominal variable with 2 levels (e.g., gender), and

one metric variable (e.g., salary) on which to calculate the means.

To test 2 dependent groups you need a dependent samples t-test, e.g., when we sample the same group at 2 different points in time.

For this you need 2 metric variables, one per time period.

When the p-value is less than 0.05, we say there is a statistically significant difference between the 2 groups.

A p-value less than 0.05 is typically considered to be statistically significant, in which case the null hypothesis should be rejected.

A p-value greater than 0.05 means that the deviation from the null hypothesis is not statistically significant, and the null hypothesis is not rejected.

Null hypothesis: no mean difference exists between the groups.

Alternative hypothesis:

1) Non-directional: a mean difference exists between the groups.

2) Directional: the mean of group 1 is larger than the mean of group 2 (or vice versa).
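
The t-value behind an independent samples t-test can be sketched with the standard library alone. The salary figures below are invented. The sketch computes the t-value both with equal variances assumed (pooled variance) and without that assumption (Welch's version). It stops at the t-value, because turning it into a p-value requires the t distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical salaries (in thousands) for two independent groups.
group_1 = [30, 32, 34, 36, 38]
group_2 = [24, 26, 28, 30, 32]

n1, n2 = len(group_1), len(group_2)
mean_diff = mean(group_1) - mean(group_2)

# Equal variances assumed: pool the two sample variances.
pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
t_equal = mean_diff / sqrt(pooled_var * (1 / n1 + 1 / n2))
df_equal = n1 + n2 - 2

# Equal variances not assumed (Welch): use each group's own variance.
t_unequal = mean_diff / sqrt(variance(group_1) / n1 + variance(group_2) / n2)

print(mean_diff, t_equal, df_equal, t_unequal)
```

Here both versions give t = 3 because the two groups happen to have identical variances. With 8 degrees of freedom the two-sided 5% critical value is about 2.31, so this made-up difference would count as significant.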

iv) One-way ANOVA: a metric DV and a categorical IV with 3 or more levels

When the dependent variable is metric and the independent variable is categorical with three levels (or more), a one-way ANOVA is appropriate.

The goal of a one-way ANOVA is to establish whether the difference between the means of 3 or more groups is significant.

A one-way ANOVA is an extension of the t-test that can go beyond 2 groups (levels) to more groups.

Null hypothesis: no mean difference exists between the groups.

Alternative hypothesis:

At least 2 group means are significantly different from each other.

A post hoc test is used to test which group means differ.

In SPSS, a one-way analysis of variance with repeated measures can be run by clicking Analyze –> General Linear Model –> Repeated Measures.
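
For an ordinary (between-groups) one-way ANOVA, the F-value compares the variation between group means with the variation within groups. The sketch below works through that arithmetic on three small made-up groups. It does not cover the repeated-measures variant or post hoc tests.

```python
from statistics import mean

# Hypothetical scores for three independent groups.
groups = {
    "A": [4, 5, 6],
    "B": [6, 7, 8],
    "C": [8, 9, 10],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Between-groups sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())

# Within-groups sum of squares: how far each score is from its own group mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

# F = mean square between / mean square within.
f_value = (ss_between / df_between) / (ss_within / df_within)

print(ss_between, ss_within, f_value)
```

With these numbers F = 12 on (2, 6) degrees of freedom, above the 5% critical value of about 5.14, so at least two of the made-up group means would be judged different.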

v) ANOVA or regression analysis

When the dependent variable is metric, an ANOVA or a regression analysis can be used.

Regression analysis [The dependent variable is metric, but the independent variables are typically metric,] focuses on how changes in continuous independent variables affect the dependent variable (but can also deal with categorical IVs),

while ANOVA [The dependent variable is metric, but the independent variables are typically categorical,] focuses on uncovering group differences (but can also deal with continuous IVs).

v.1) ANOVA (one-way ANCOVA)

When a study contains one manipulated IV and one metric IV, it is called a one-way ANCOVA (analysis of covariance). You can think of a one-way ANCOVA as an extension of the one-way ANOVA that incorporates a covariate. Like the one-way ANOVA, the one-way ANCOVA is used to determine whether there are any significant differences between two or more independent (unrelated) groups on a dependent variable. However, whereas the ANOVA looks for differences in the group means, the ANCOVA looks for differences in adjusted means (i.e., adjusted for the covariate). As such, compared to the one-way ANOVA, the one-way ANCOVA has the additional benefit of allowing you to "statistically control" for a third variable (also known as a "confounding variable" or "control variable"), which you believe will affect your results. This third variable that could be confounding your results is called the covariate; hence the name analysis of covariance. You can have more than one covariate. Although covariates are traditionally measured on a metric scale, they can also be categorical.

v.2) ANOVA (two-way ANOVA or ANCOVA)

The "one-way" part of one-way ANOVA refers to the number of manipulated independent variables. If you have two independent categorical variables rather than one, you run a two-way ANOVA. Again, you can add one or more control variables, in which case we would refer to a two-way ANCOVA.

Regression analysis

When the dependent variable is metric, a linear regression analysis can be used. The independent variables in a linear regression can be metric and/or categorical.

(A t-test compares the means between two groups (e.g., the housing price in urban vs. rural neighborhoods), but does not predict outcomes based on two independent variables. For example: the dependent variable is metric, one independent variable is metric, and the other is categorical with two levels. Linear regression analysis can deal with both types of independent variables.)

The goal of regression analysis is to explain the relationship between the DV and one or more IVs: it shows how the DV changes if one IV changes.

Furthermore, a regression analysis can also be used to predict what the DV would be, based on one or more IVs.

In essence, a simple linear regression uses one straight line to summarise a bunch of data. The stronger the relationship between the DV and the IV, the closer the data points lie to the straight line.

Since multiple linear regression analysis can accommodate multiple independent variables, it is highly suitable to test moderating effects.

Logit analysis (logistic regression)

When the dependent variable is categorical, the appropriate test is a logit analysis. Again, the independent variable(s) in a logit analysis can also be metric and/or categorical.

Explain the general formula of the simple linear regression

Y_i = b_0 + b_1 X_i + e_i

Y is the dependent variable
i indexes the observations
b_0 is the intercept
b_1 is the slope (by how much does Y change if X increases by one unit?)
X is the independent variable
e is the error
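
To see where the intercept and slope come from, the sketch below fits a simple linear regression by ordinary least squares on five made-up observations. The slope is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the intercept makes the line pass through the two means.

```python
# Hypothetical observations (illustrative data only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope b1: covariation of X and Y divided by the variation of X.
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)

# Intercept b0: the fitted line passes through (mean_x, mean_y).
b0 = mean_y - b1 * mean_x

# Errors e_i: the part of each Y_i the line does not explain.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(b0, b1, residuals)
```

For this data the fitted line is roughly Y = 2.2 + 0.6 X: each one-unit increase in X goes with an increase of about 0.6 in Y, and the errors sum to (approximately) zero.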

Explain the general formula of the multiple linear regression

Since multiple linear regression analysis can accommodate multiple independent variables, it is highly suitable to test moderating effects. It can also determine the relative importance of each IV, from the most to the least important.

Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + ... + b_k X_{ki} + e_i

Y is the dependent variable
i indexes the observations
b_0 is the intercept
b_1 is the slope of X_1 (by how much does Y change if X_1 increases by one unit, when controlling for X_2 through X_k?)
X_1, ..., X_k are the independent variables
e is the error

Categorical IVs with 2 levels in a multiple linear regression analysis

Include the IV as a dummy variable: a variable that takes the value 0 or 1 (e.g., male and female can be recoded as 1 and 0).

Moderating effects in linear regression

Since linear regression analysis can accommodate multiple independent variables, it is highly suitable to test moderating effects.

Y_i = b_0 + b_1 X_i + b_2 MOD_i + b_3 X_i * MOD_i + e_i

b_3 X_i * MOD_i: the interaction effect between the IV (X) and the moderator (MOD)
b_2 MOD_i: controls for the main effect of the moderator
b_3: to what extent does MOD change the effect of X on Y?

Why does an interaction effect capture a moderating effect? Because X appears in 2 terms, the formula can be rewritten as Y_i = b_0 + (b_1 + b_3 MOD_i) X_i + b_2 MOD_i + e_i, so the effect of X on Y depends on MOD_i.
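
A short numeric sketch shows how the interaction term makes the effect of X depend on the moderator. The coefficients below are invented for illustration, and the helper names `predicted_y` and `effect_of_x` are hypothetical.

```python
def predicted_y(x, mod, b0, b1, b2, b3):
    """Predicted Y from the moderation model (error term left out)."""
    return b0 + b1 * x + b2 * mod + b3 * x * mod


def effect_of_x(mod, b1, b3):
    """Change in predicted Y for a one-unit increase in X at a given MOD value."""
    return b1 + b3 * mod


# Hypothetical coefficients (not estimated from any data).
b0, b1, b2, b3 = 1.0, 2.0, 0.5, 1.5

# Effect of X when the moderator is 0 versus when it is 1.
print(effect_of_x(0, b1, b3))  # equals b1
print(effect_of_x(1, b1, b3))  # equals b1 + b3

# Same result obtained by comparing two predictions one unit of X apart.
step = predicted_y(3, 1, b0, b1, b2, b3) - predicted_y(2, 1, b0, b1, b2, b3)
print(step)
```

If b3 were zero, the effect of X would be the same at every value of the moderator, which is exactly the no-moderation case.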

T-test in SPSS (independent samples)

Analyze -> Compare Means -> Independent-Samples T Test -> put the DV (metric) in Test Variable(s) and the IV in Grouping Variable.

The first table holds standard descriptive statistics, giving the mean and sample size of both groups.

The bottom table reports whether the difference between the two groups is significant. It contains 2 t-tests:
i) for 2 groups with equal variances
ii) for 2 groups with unequal variances

If the significance of the test for equality of variances (Levene's test) is > 0.05, the groups have equal variances, so the t-test is interpreted on line i).
If that significance is < 0.05, the groups have unequal variances, so the t-test is interpreted on line ii).

The t column gives the t-value. If the hypothesis is non-directional we use the two-sided p column; otherwise we use the one-sided p column.

T-test in SPSS (paired samples)

Analyze -> Compare Means -> Paired-Samples T Test -> put the first variable in Variable1 and the second in Variable2.

The first table gives the mean of both groups.

The third table is the paired samples test:
- the mean is the mean difference (group 1 - group 2)
- the t column gives the t-value
- if the hypothesis is non-directional we use the two-sided p column; otherwise we use the one-sided p column

One-way ANOVA in SPSS:

Analyze -> Compare Means -> One-Way ANOVA -> select the IV as Factor and the DV in the Dependent List -> Options -> Descriptive -> Continue.

Next -> click the Post Hoc box -> pick “Tukey” -> then OK.

The first table provides descriptive statistics (the mean is in the Mean column) for each group.

To see whether these differences are significant, look at the ANOVA table: if the p-value is < 0.05, at least 2 group means differ significantly.

Then use the Multiple Comparisons table to see where those differences are: if p < 0.05, the 2 group means differ significantly.

The moderating effect is captured by the last line in the output: the interaction between the independent variable and the moderator.

module 5 Flashcards

(35 cards)