Business Analytics Module 3 Flashcards

Flashcards in Business Analytics Module 3 Deck (13)
1
Q

Hypothesis Testing

A

We use hypothesis tests to substantiate a claim about a population mean (or other population parameter).

2
Q

Null Hypothesis

A

A null hypothesis is a statement about a topic of interest, typically based on historical information or conventional wisdom. We start a hypothesis test by assuming that the null hypothesis is true and then test to see if we can nullify it, which is why it’s called the “null” hypothesis. The null hypothesis is the opposite of the hypothesis we are trying to substantiate (the alternative hypothesis).

3
Q

Alternative Hypothesis

A

An alternative hypothesis is the theory or claim we are trying to substantiate, and is stated as the opposite of a null hypothesis. When our data allow us to nullify the null hypothesis, we substantiate the alternative hypothesis.

4
Q

Single-Population Hypothesis Test

A

A test in which a single population is sampled to test whether a parameter’s value is different from a specific value (often a historical average).
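A minimal sketch of such a test in Python using scipy (the sample values and the historical mean are hypothetical):

# One-sample (single-population) t-test: compare a sample mean to a historical mean.
from scipy import stats

sample = [6.2, 7.1, 6.8, 5.9, 7.4, 6.5, 7.0, 6.3]   # hypothetical sample
historical_mean = 6.7                                # value assumed under the null hypothesis

t_stat, p_value = stats.ttest_1samp(sample, popmean=historical_mean)
print(t_stat, p_value)   # reject the null if p_value is below the significance level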

5
Q

Two-Population Hypothesis Test

A

A test in which samples from two different populations are compared to see if the parameter of interest is different between the two populations.
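A minimal sketch in Python using scipy, with two hypothetical samples:

# Two-sample (two-population) t-test: compare the means of two independent samples.
from scipy import stats

group_a = [12.1, 11.8, 12.5, 13.0, 12.2]
group_b = [11.2, 11.5, 10.9, 11.8, 11.1]

# equal_var=False does not assume the two populations have equal variances.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_stat, p_value)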

6
Q

One-Sided Hypothesis Test

A

A hypothesis test that tests for a difference in a parameter in only one direction (e.g., if the mean of one group is greater than the mean of another group). This test should be used only if the researcher has strong convictions about the direction of the change, for example, that the mean of Group A cannot be less than the mean of Group B. In such a case, the null hypothesis might be that the mean of Group A is less than or equal to the mean of Group B, and the alternative hypothesis is that the mean of Group A is greater than the mean of Group B. The rejection region for a one-sided hypothesis test appears in only one tail of the distribution.
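A minimal one-sided sketch in Python using scipy (hypothetical data; the alternative hypothesis is that Group A's mean is greater than Group B's):

# One-sided two-sample t-test: the rejection region lies in one tail only.
from scipy import stats

group_a = [5.4, 5.9, 6.1, 5.7, 6.0]
group_b = [5.1, 5.3, 5.0, 5.4, 5.2]

t_stat, p_value = stats.ttest_ind(group_a, group_b, alternative='greater')
print(p_value)   # one-tailed p-value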

7
Q

Two-Sided Hypothesis Test

A

A hypothesis test that tests for any difference in a parameter (e.g., if the mean of one group is different from – either greater than or less than – the mean of another group). In a two-sided test, the null hypothesis is that the parameter is the same (e.g., that the means of two groups are the same), whereas the alternative hypothesis is that the parameter is different (e.g., that the means of the two groups are different). The rejection region for a two-sided hypothesis test is divided into two parts in the tails of the distribution.
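The same sketch run as a two-sided test (hypothetical data):

# Two-sided two-sample t-test: the rejection region is split between both tails.
from scipy import stats

group_a = [5.4, 5.9, 6.1, 5.7, 6.0]
group_b = [5.1, 5.3, 5.0, 5.4, 5.2]

t_stat, p_value = stats.ttest_ind(group_a, group_b, alternative='two-sided')
print(p_value)   # two-tailed p-value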

8
Q

Range of Likely Sample Means

A

A confidence interval around the population mean assumed under the null hypothesis. The width of the range is determined by the sample size, the sample standard deviation, and the desired confidence level. When a sample mean falls outside the range of likely sample means, we reject the null hypothesis at the stated confidence level.
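A minimal sketch of computing this range in Python using scipy (the null mean, sample standard deviation, and sample size are hypothetical):

# Range of likely sample means at a 95% confidence level.
import math
from scipy import stats

null_mean = 100.0   # population mean assumed under the null hypothesis
sample_sd = 15.0    # sample standard deviation
n = 36              # sample size

t_crit = stats.t.ppf(0.975, df=n - 1)          # critical value for 95% confidence
margin = t_crit * sample_sd / math.sqrt(n)
print(null_mean - margin, null_mean + margin)  # reject the null if the sample mean falls outside this range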

9
Q

P-value

A

A p-value can be interpreted as the probability, assuming the null hypothesis is true, of obtaining an outcome that is equal to or more extreme than the result obtained from a data sample. The lower the p-value, the greater the strength of statistical evidence against the null hypothesis.
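A minimal sketch of the calculation in Python using scipy (the t-statistic and degrees of freedom are hypothetical):

# Two-sided p-value from a t-statistic: the probability, under the null hypothesis,
# of a result at least this extreme in either tail.
from scipy import stats

t_stat = 2.3   # hypothetical test statistic
df = 35        # degrees of freedom (n - 1)

p_value = 2 * stats.t.sf(abs(t_stat), df)
print(p_value)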

10
Q

Confidence Level

A

The percentage of all possible samples that can be expected to include the true population parameter. For example, for a 95% confidence level, the intervals should be constructed so that, on average, for 95 out of 100 samples, the confidence interval will contain the true population mean. Note that this does not mean that for any given sample, there is a 95% chance that the population mean is in the interval; each confidence interval either contains the true mean or it does not.
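A minimal simulation sketch in Python (numpy and scipy; all parameter values are hypothetical) showing that roughly 95 out of 100 such intervals contain the true mean:

# Repeatedly sample from a known population, build a 95% confidence interval each time,
# and count how often the interval covers the true mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, sd, n, trials = 50.0, 10.0, 40, 1000

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mean, sd, n)
    margin = stats.t.ppf(0.975, df=n - 1) * sample.std(ddof=1) / np.sqrt(n)
    if sample.mean() - margin <= true_mean <= sample.mean() + margin:
        covered += 1
print(covered / trials)   # close to 0.95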

11
Q

Type I Error

A

Incorrectly rejecting a true null hypothesis (e.g., concluding that there is a difference between two groups when there is not one); also called a false positive.

The probability of a type I error is equal to the significance level, which is 1 minus the confidence level. A 90% confidence level indicates that the significance level is 10%; therefore, there is a 10% chance of making a type I error.

12
Q

Type II error

A

Incorrectly failing to reject a false null hypothesis (e.g., concluding that there is no difference between two groups, when there is, in fact, a difference); also called a false negative.

Calculating the chances of making a type II error is quite complex and beyond the scope of this course.

13
Q

T-Test

A

A hypothesis test that uses a t-distribution rather than a normal distribution. When the sample size is small, a t-test is the most appropriate test to use. The exact form of the t-test is based on the sample’s degrees of freedom, which is equal to the sample size minus 1, that is, n-1. As the sample size increases, the t-distribution, which forms the basis of the t-test, more closely approximates a normal distribution.

=T.TEST(array1, array2, tails, type)
• Returns the p-value associated with a given t-test.
• array1 is a set of numerical values or cell references.
• array2 is a set of numerical values or cell references. If we only have one set of data, for the second data set we create a column for which every entry is the historical mean.
• tails is the number of tails for the distribution. It should be set to 1 to perform a one-sided test, or to 2 to perform a two-sided test.
• type can be 1, 2, or 3.
→ Type 1 is a paired test and is used when the same group is tested twice to provide paired “before and after” data for each member of the group.
→ Type 2 is an unpaired test in which the samples are assumed to have equal variances.
→ Type 3 is an unpaired test in which the samples are assumed to have unequal variances. Unless we have a good reason to believe two samples have equal variances, we typically use type 3 when conducting an unpaired test.
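For work outside Excel, roughly equivalent calls are available in Python's scipy (a sketch with hypothetical data; the comments note the correspondence to the tails and type arguments):

# Rough scipy counterparts of Excel's =T.TEST(array1, array2, tails, type);
# each call returns a p-value, as T.TEST does. Data values are hypothetical.
from scipy import stats

before = [7.0, 6.5, 8.1, 7.4, 6.9]
after  = [7.6, 6.9, 8.4, 7.9, 7.2]

p_paired    = stats.ttest_rel(before, after).pvalue                   # type 1: paired test
p_equal_var = stats.ttest_ind(before, after, equal_var=True).pvalue   # type 2: unpaired, equal variances
p_unequal   = stats.ttest_ind(before, after, equal_var=False).pvalue  # type 3: unpaired, unequal variances
# These are two-tailed p-values (tails = 2); pass alternative='greater' or 'less' for a one-tailed test.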