BSNS112 Flashcards

1
Q

What is the trimmed mean?

A

The mean recalculated after cutting out the most extreme 5% of observations at each end (the top and bottom 5%)
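A minimal sketch of a trimmed mean, assuming the given proportion is cut from EACH end (the common "5% trimmed mean" convention; some texts trim the proportion in total instead). The function name `trimmed_mean` and the sales figures are hypothetical.

```python
def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the observations from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)  # number of observations cut from each tail
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

# Hypothetical data with one extreme value (95)
sales = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
print(trimmed_mean(sales, 0.10))  # 16.75, versus an ordinary mean of 24.1
```

Trimming makes the mean less sensitive to outliers while still using most of the data.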

2
Q

What is the mean?

A

The arithmetic average: the sum of the values divided by the number of values

3
Q

What is variance?

A

The average squared deviation around the mean

4
Q

What is the standard deviation? (2)

A
  • A measure of the typical deviation around the mean
  • Square root of the variance
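A short sketch of the variance and standard deviation, assuming the usual convention that the sample variance divides by n - 1 and the population variance divides by n. The function names and data are illustrative only.

```python
import math

def variance(values, sample=True):
    """Average squared deviation from the mean; divides by n - 1 for a sample."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return squared_devs / (n - 1 if sample else n)

def std_dev(values, sample=True):
    """Standard deviation = square root of the variance."""
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data, sample=False), std_dev(data, sample=False))  # 4.0 2.0
```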
5
Q

What is kurtosis?

A

The extent to which observations cluster around a central point, and how fat the tails of the distribution are

6
Q

What is right skew?

A

Positively skewed: the long tail is on the right, so the mean is typically greater than the median

7
Q

What is left skew?

A

Negatively skewed: the long tail is on the left, so the mean is typically less than the median

8
Q

What is >?

A

Greater than

9
Q

What is <?

A

Less than

10
Q

What is the sign for greater than?

A

>

11
Q

What is the sign for less than?

A

<

12
Q

What is the test statistic?

A

(sample statistic - null (H0) value) / standard error

13
Q

What is the rule for rejecting H0?

A

If the p-value is less than or equal to the specified significance level α, the null hypothesis is rejected; otherwise, the null hypothesis is not rejected. In other words, if p≤α, reject H0; otherwise, if p>α do not reject H0.
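A sketch of the p ≤ α rule, assuming a two-sided test whose statistic follows the standard normal distribution under H0. The helper names are hypothetical; `statistics.NormalDist` is in the Python standard library (3.8+).

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) under H0, using the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Reject H0 when p <= alpha; otherwise do not reject."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

p = two_sided_p_value(2.0)
print(round(p, 4), decide(p))  # 0.0455 reject H0
```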

14
Q

What is a type 2 error?

A

Failing to reject the null hypothesis when it is false (i.e. when you should have rejected it)

15
Q

What is a type 1 error?

A

Rejecting the null hypothesis when you shouldn’t have, i.e. when the null hypothesis is true

16
Q

What conditions need to be met for a hypothesis test to be valid?

A

The data are normally distributed, or the sample size is greater than 30
For proportions, the expected number of successes (np) and the expected number of failures (nq) must be five or more
The data are a random sample

17
Q

How do you check if a sample has come from a normal distribution?

A
18
Q

A 95% confidence interval is an interval calculated from ______ data and will cover the true _____ in 95% of all samples of the same size randomly drawn from the same population.

A
19
Q

What is the symbol for the mean?

A

μ (the Greek letter mu) for a population mean; x̄ (x-bar) for a sample mean

20
Q

What is the p-value?

A

The probability of getting a test statistic at least as extreme as the observed test statistic from the sample(s) given that the null hypothesis is true

21
Q

What does the test statistic measure?

A

the number of standard errors that the sample statistic is away from the value of the population parameter in the null hypothesis

22
Q

When is a one sample t-test valid?

A

Data values must be independent. …
Data must be obtained via a random sample from the population.
Data are approximately normally distributed (or the sample is large).
Data values are continuous.
(For an independent two-sample t-test, the variances of the two groups are also assumed equal.)

23
Q

What are the symbols for the population mean, sample mean, population standard deviation and sample standard deviation?

A

μ refers to a population mean and x̄ (x-bar) to a sample mean; σ refers to the standard deviation of a population and s to the standard deviation of a sample.

24
Q

A 90% confidence interval in relation to the 95% confidence interval above would be…?

A

Narrower and less likely to contain the unknown population proportion

25
Q

Which conditions should hold for this confidence interval to be valid?

A

The sample size must be bigger than 30 so that the sample mean is approximately normally distributed according to the CLT

26
Q

What does the term sample statistic mean?

A

A numerical summary calculated from a sample, e.g. the sample mean, median, standard deviation or a percentage

27
Q

What does the term standard error refer to?

A

The standard deviation of a sample statistic's sampling distribution (e.g. the standard error of the mean is σ/√n)

28
Q

What is the alternative hypothesis (Ha)?

A

The opposite of the null hypothesis; it challenges the null. It never contains the = sign; it contains ≠, > or <.

29
Q

When do you reject null hypothesis?

A

If the population is normally distributed, the sample mean is normally distributed.
Otherwise, for large samples from any distribution, the sample mean is approximately normally distributed by the central limit theorem.

If there is sufficient evidence from a sample against the status quo (H0), i.e. p ≤ α.

30
Q

What is the null hypothesis (H0)?

A

The status quo. Begin with the assumption that the null hypothesis is true and try to disprove it.

31
Q

How are degrees of freedom for these tests calculated?

A

Chi-square test of independence: (the number of rows - 1) x (the number of columns - 1)
ANOVA: V1 = k - 1 and V2 = n - k
k = number of groups (categories)
n = total sample size
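A small sketch of both degrees-of-freedom rules. The function names are hypothetical; the 3 x 4 table and the 3-group, 30-observation ANOVA are made-up sizes.

```python
def chi_square_independence_df(rows, cols):
    """df for a chi-square test of independence on an r x c table."""
    return (rows - 1) * (cols - 1)

def anova_df(k, n):
    """(V1, V2) for a one-way ANOVA with k groups and n observations in total."""
    return k - 1, n - k

print(chi_square_independence_df(3, 4))  # 6
print(anova_df(k=3, n=30))              # (2, 27)
```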

32
Q

Which of the following three statements about p-values is not correct?

A

The p-value can be interpreted as the probability of making a Type 1 Error in repeated sampling

33
Q

What is a chi-square goodness of fit test?

A

A statistical hypothesis test used to determine whether a variable is likely to come from a specified distribution or not. It is often used to evaluate whether sample data is representative of the full population.
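A minimal sketch of the goodness-of-fit statistic, χ² = Σ (O - E)² / E, assuming the expected counts come from hypothesised category proportions and that no parameters are estimated from the data (so df = number of categories - 1). The function name and counts are hypothetical.

```python
def chi_square_gof(observed, expected_props):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in expected_props]  # expected counts under H0
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Hypothetical: 50 coin tosses, 30 heads and 20 tails, H0: fair coin
print(chi_square_gof([30, 20], [0.5, 0.5]))  # (2.0, 1)
```

The statistic is then compared with a chi-square critical value for that df; if SciPy is installed, `scipy.stats.chi2.sf(stat, df)` gives the p-value.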

34
Q

What is a chi-square test of independence?

A

A test of whether two categorical variables are independent in the population, based on a contingency table of sample counts
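A sketch of the test of independence, assuming the table is given as a list of rows of observed counts. Each expected count is (row total x column total) / grand total, which is what the counts would look like if the two variables were independent. The function name and tables are hypothetical.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

stat, df = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), df)  # 6.667 1
```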

35
Q

What is an ANOVA test?

A

Determines whether differences exist among two or more population means
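A minimal sketch of the one-way ANOVA F statistic, F = MS_between / MS_within, with df (k - 1, n - k). The function name and groups are hypothetical.

```python
def one_way_anova_f(groups):
    """F = MS_between / MS_within for a one-way ANOVA; also returns (df1, df2)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation between group means and within each group
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, (2, 6))
```

If SciPy is available, `scipy.stats.f_oneway(*groups)` should report the same F together with its p-value.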

36
Q

What is a correlation analysis?

A

Is primarily concerned with finding out whether a relationship exists between variables and then determining the magnitude and direction of that relationship
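A sketch of the Pearson correlation coefficient r, assuming a linear relationship between two numeric variables: the sign of r gives the direction and its size (between -1 and 1) gives the strength. The function name and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: sign gives direction, size gives strength (-1 to 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive)
```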

37
Q

What is an independent samples t-test for the difference in means?

A

Used to compare two sample means from unrelated groups

38
Q

What is a dependent samples t-test for the difference in means?

A

Used to compare the sample means from two related groups.
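A sketch of the dependent (paired) samples t statistic, assuming each "before" value is paired with the "after" value at the same position. The test works on the differences: t = mean difference / (sd of differences / √n), with n - 1 degrees of freedom. The function name and data are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic and df for a dependent (paired) samples t-test on the differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

t, df = paired_t(before=[10, 12, 14], after=[11, 14, 17])
print(round(t, 3), df)  # 3.464 2
```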

39
Q

What is Levene's test?

A

Tells us whether the population variances are equal
Decision rule: if p-value < α* = 0.10, reject H0 (equal variances)

40
Q

What is a non-parametric test?

A

Not distribution dependent
Not as powerful as parametric tests
Should be used when distribution requirements are not met

41
Q

What is a Z-test?

A
42
Q

What is the Wilcoxon Rank Sum test?

A

Requires 2 independent random samples
Assumes equal variance for both populations
The test uses the sums of the ranks of the observations across both samples
The test statistic is the sum of the ranks from one of the samples
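A sketch of the rank sum statistic: pool both samples, rank them, and add up the ranks belonging to one sample. It assumes tied values get the average of the ranks they span (a common convention). The function name `rank_sum` is hypothetical; turning W into a p-value needs tables or a normal approximation, which is not shown.

```python
def rank_sum(sample_a, sample_b):
    """Wilcoxon rank sum W: sum of the ranks of sample_a in the pooled data.
    Tied values receive the average of the ranks they span."""
    pooled = sorted(list(sample_a) + list(sample_b))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j + 1) / 2  # average of positions i+1 .. j+1
        i = j + 1
    return sum(ranks[x] for x in sample_a)

print(rank_sum([1, 2], [2, 3]))  # 3.5 (ranks 1 and 2.5)
```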

43
Q

What is true when comparing the Wilcoxon Rank Sum test to the more commonly used parametric methods for testing differences in means?

A
44
Q

What is the ANOVA F-test?

A
45
Q

What decreases the p-value in an ANOVA F-test?

A
46
Q

Given a continuous random variable x, what is Pr(x=0.5)?

A
47
Q

What is a linear relationship?

A

straight line relationship between two variables

48
Q

What is the difference between a significant and insignificant linear association?

A

A significant relationship is one that’s large enough to be unlikely to have occurred in the sample if there were no relationship in the population

49
Q

What is a regression?

A

Sorting out which of the independent variables actually has an impact on the dependent variable

50
Q

What does a high correlation tell us?

A

Two or more variables have a strong relationship with each other

51
Q

What does a low correlation tell us?

A

means that the variables are hardly related.

52
Q

What is a seasonal trend?

A

a characteristic of a time series in which the data experiences regular and predictable changes that recur every calendar year

53
Q

What is a seasonal random trend?

A

assumes that the expected values of all future seasonal differences are equal to the most recently observed seasonal difference.

54
Q

What is a cyclical trend?

A

A recurring wave-like pattern of rises and falls over periods longer than a year (unlike seasonal patterns, not of fixed length)

55
Q

What is the best method to obtain a smooth trend and the monthly seasonal components from the time series data above?

A
56
Q

What is a population proportion?

A

the share of a population that belongs to a particular category

57
Q

What is a point estimate?

A

a single value estimate of a parameter

58
Q

What is an interval estimate?

A

a kind of statistical inference in which we search for an interval of values that contains the true parameter with high probability

59
Q

What is the outcome of a Poisson experiment?

A

gives the probability of a number of events occurring in a fixed interval of time or space if these events happen with a known average rate and independently of the time since the last event.

60
Q

What is the population proportion?

A

the share of a population that belongs to a particular category.

61
Q

What is the null hypothesis?

A

a type of statistical hypothesis that proposes that no statistical significance exists in a set of given observations.

62
Q

What is central location?

A

a summary measure that attempts to describe a whole set of data with a single value that represents the middle or centre of its distribution.

63
Q

What is dispersion?

A

the spread of the data around the centre (e.g. range, variance, standard deviation); in finance, a statistical measure of the range of potential outcomes for an investment based on its historical volatility or returns.

64
Q

What is the shape of sample distributions?

A
65
Q

What are the necessary conditions for the confidence interval to be valid?

A
66
Q

What is a post-hoc test?

A

a statistical analysis specified after a study has been concluded and the data collected. A post-hoc test is done to identify exactly which groups differ from each other. Therefore, such tests are also called multiple comparison tests.

67
Q

What are the types of Post-Hoc tests?

A

Tukey's honestly significant difference (HSD) test, Scheffé's test and the Bonferroni test

68
Q

When should you use a post-hoc test?

A

used only after we find a statistically significant result and need to determine where our differences truly came from.

69
Q

What is Tukey's honestly significant difference (HSD) test?

A

used to test differences among sample means for significance. The Tukey’s HSD tests all pairwise differences while controlling the probability of making one or more Type I errors.

70
Q

What is Scheffé's test?

A

used to make unplanned comparisons, rather than pre-planned comparisons, among group means in an analysis of variance (ANOVA) experiment. The Scheffé test has the advantage of giving the experimenter the flexibility to test any comparisons that appear interesting.

71
Q

What is the Bonferroni test?

A

a type of multiple comparison test used in statistical analysis. When performing a hypothesis test with multiple comparisons, eventually a result could occur that appears to demonstrate statistical significance in the dependent variable, even when there is none. The Bonferroni correction guards against this by testing each of the m comparisons at significance level α/m.
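A minimal sketch of the Bonferroni rule, assuming m pairwise p-values have already been computed: each comparison is only significant if its p-value is at most α/m. The function name and p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 for a comparison only if its p-value <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# Hypothetical p-values from three pairwise comparisons
print(bonferroni_reject([0.01, 0.02, 0.04]))  # [True, False, False]
```

Without the correction all three would count as significant at α = 0.05; the stricter threshold (about 0.0167 here) keeps the overall Type I error rate in check.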

72
Q

What is the estimated regression equation?

A

For simple linear regression, the least squares estimates of the model parameters β0 and β1 are denoted b0 and b1. Using these estimates, an estimated regression equation is constructed: ŷ = b0 + b1x
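A sketch of the least squares estimates for ŷ = b0 + b1x, using the standard formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄. The function name and data are hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y-hat = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx  # the fitted line passes through (x-bar, y-bar)
    return b0, b1

print(least_squares([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```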

73
Q

What are the coefficients of the estimated regression equation?

A
74
Q

What is the value of R^2 and what does it mean?

A
75
Q

When do you use the confidence interval for one proportion equation?

A

We use p̂ (p-hat) to calculate the margin of error.
The 95% normal-distribution Z-score can be used provided np ≥ 5 and nq ≥ 5 and the four binomial conditions hold.
Z is used, not t, because we are using the normal distribution to approximate the binomial distribution, and n and p fully determine that distribution.
The skewness of the binomial distribution depends on the probability of success and the number of trials.
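A sketch of the one-proportion interval p̂ ± z·√(p̂q̂/n). It assumes the np ≥ 5 and nq ≥ 5 check is done with p̂ in place of the unknown p, which is a common practical choice. The function name and the 40-out-of-100 example are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation CI: p-hat +/- z * sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation not appropriate: need np and nq >= 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95%
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

ci = proportion_ci(40, 100)
print(tuple(round(v, 3) for v in ci))  # (0.304, 0.496)
```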

76
Q

When do you use the hypothesis test for one proportion?

A

when you are comparing one group to a known or hypothesized population proportion value. In other words, you have one sample with one categorical variable.

77
Q

When do you use the sample size for estimating mean equation?

A
78
Q

When do you use the sample size for proportion equation?

A

If you intend to ask more than one question, then use the largest sample size across all questions. Note that if the questions do not all have just two valid answers (eg. yes or no), but include one or more additional responses (eg.

79
Q

When do you use the confidence interval for the difference in two proportions equation?

A

If the confidence interval for the difference does not contain zero, we can conclude that there is a statistically significant difference in the two population values at the given level of confidence.

80
Q

When do you use the hypothesis test for the difference in two proportions equation?

A

A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the population proportions. The difference of two proportions follows an approximate normal distribution. Generally, the null hypothesis states that the two proportions are the same. That is, H0:pA=pB.

81
Q

When do you use the pooled sample proportion equation?

A

If the null hypothesis is true then the population proportions are equal. When computing the standard error for the difference between the two proportions a pooled proportion is used as opposed to the two proportions separately (i.e., unpooled).
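A sketch of the two-proportion z test of H0: pA = pB, using the pooled proportion (x1 + x2) / (n1 + n2) in the standard error as described above. The function name and counts (60/100 versus 40/100) are hypothetical, and the p-value assumes a two-sided alternative.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z test of H0: pA = pB using the pooled sample proportion for the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value

z, p = two_proportion_z(60, 100, 40, 100)
print(round(z, 3))  # 2.828
```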

82
Q

When do you use the standard normal transformation equation?

A
83
Q

What is sampling error?

A

chance differences from sample to sample:
- difference b/w sample and population that exists only because of the observations that happened to be selected in the sample
- error we expect to occur when we make a statement about the population based on a sample

84
Q

What are the characteristics of a binomial distribution? (4)

A
  • Two outcomes to every trial, success or failure
  • A fixed number of trials
  • The probability of success stays the same for each trial
  • The trials are independent, meaning that the outcome of one trial does not affect the outcome of another trial (such trials are known as Bernoulli trials)
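Under these four conditions the number of successes X in n trials follows the binomial formula P(X = k) = C(n, k)·p^k·(1 - p)^(n - k). A minimal sketch (the function name is hypothetical; `math.comb` needs Python 3.8+):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent trials, each with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(binomial_pmf(2, 4, 0.5))  # 0.375
```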
85
Q

What is the Poisson distribution?

A

It models the number of successes in an interval, where:
1. The number of successes in an interval is independent of the number of successes in any other interval
2. The probability of success is the same for all equal-sized intervals
3. The probability of success in an interval is proportional to the size of the interval
4. The probability of more than one success in an interval approaches zero as the interval becomes smaller
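When these conditions hold and λ is the average number of successes per interval, P(X = k) = e^(-λ)·λ^k / k!. A minimal sketch (the function name and λ = 2 are hypothetical):

```python
import math

def poisson_pmf(k, rate):
    """P(X = k) events in an interval when the average number per interval is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

print(round(poisson_pmf(0, 2), 4))  # 0.1353
```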

86
Q

What are the characteristics of a normal distribution? (5)

A
  • A continuous random variable
  • Bell shaped or mound shaped
  • symmetrical
  • mean and median are equal
  • Empirical rule applies
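The empirical rule says roughly 68%, 95% and 99.7% of observations lie within 1, 2 and 3 standard deviations of the mean. A short check using the standard library's `statistics.NormalDist` (the helper name is hypothetical):

```python
from statistics import NormalDist

def within_k_sd(k, mu=0.0, sigma=1.0):
    """Probability a normal observation lies within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))  # 0.6827, 0.9545, 0.9973
```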
87
Q

What is a confidence interval?

A

All two-sided confidence intervals for the population mean or population proportion follow a standard form: point estimate ± (critical value × standard error)

88
Q

Values of Z: 90%, 95% and 99%

A

90% confidence: +- 1.645
95% confidence: +- 1.96
99% confidence: +- 2.576
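These critical values can be recovered from the standard normal distribution: a two-sided interval leaves (1 - confidence)/2 in each tail. A sketch with the standard library (the helper name is hypothetical); it also shows why a higher confidence level gives a larger Z and therefore a wider interval.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: +-{z_critical(level):.3f}")
# 90%: +-1.645
# 95%: +-1.960
# 99%: +-2.576
```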

89
Q

How do the values of t behave?

A
  • They vary with sample size (n)
  • t tables only give approximate values for large samples
  • The larger the sample, the closer the t value is to the Z value
90
Q

What does the 95% interval estimate?

A

95% of the confidence intervals calculated from samples of the same size drawn from the same population will contain the true population parameter
95% of intervals are correct and 5% are wrong
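A small simulation of the "95% of intervals are correct" idea. It assumes a hypothetical normal population (μ = 50, σ = 10) with σ known, so each interval is x̄ ± 1.96·σ/√n; the seed only makes the run repeatable. The fraction of intervals that capture μ should come out close to 0.95, though any single run will differ slightly.

```python
import math
import random
from statistics import NormalDist

def coverage(mu=50.0, sigma=10.0, n=25, confidence=0.95, reps=2000, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps

print(coverage())  # should be close to 0.95
```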

91
Q

What are the confidence interval conditions for a mean to be valid? (2)

A
  1. need a random sample
  2. The sample means must be approx. normally distributed
92
Q

What is the symbol for the sample proportion?

A

p̂ (p with a hat)

93
Q

What is the symbol for population proportion?

A

p

94
Q

What is a one-tailed level of significance and rejection region?

A

The rejection region is in either the left or the right tail, with area α

95
Q

What is a two-tailed level of significance and rejection region?

A

The rejection region is split between the left and right tails, each with area α/2

96
Q

What is the one sample hypothesis test?

A
  1. State null and alt hypothesis
  2. Select the level of significance
  3. Select the test statistic
  4. Formulate the decision rule
  5. Collect data, make a decision
97
Q

What is the decision rule?

A

-1.96 < Z < 1.96: do not reject H0 (two-tailed test at α = 0.05); otherwise, reject H0

98
Q

What is case 1?

A

Z test for difference in two means (variance known)

99
Q

What are case 1 assumptions?

A

The 2 samples are randomly and independently drawn
If the sample sizes are small, the samples are drawn from normal populations; otherwise large samples are required (CLT)
The population variances are known

100
Q

What is case 2?

A

t test for differences in two means (variance unknown but assumed equal)
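A sketch of the Case 2 statistic, assuming the two population variances are equal so the sample variances are pooled: sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2), t = (x̄1 - x̄2) / √(sp²(1/n1 + 1/n2)), df = n1 + n2 - 2. The function name and data are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t(x1, x2):
    """Case 2: t for mu1 - mu2 when population variances are unknown but assumed equal."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x1) - mean(x2)) / se, n1 + n2 - 2

t, df = pooled_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), df)  # -3.674 4
```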

101
Q

What is case 3?

A

t test for differences in two means (unknown population variances, not assumed equal)
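A sketch of the Case 3 statistic, where each sample keeps its own variance in the standard error. The degrees of freedom use the Welch-Satterthwaite formula, which is one common choice; a course may instead use a conservative rule such as the smaller of n1 - 1 and n2 - 1. The function name and data are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(x1, x2):
    """Case 3: t for mu1 - mu2 with unknown, unequal population variances (Welch)."""
    n1, n2 = len(x1), len(x2)
    a, b = variance(x1) / n1, variance(x2) / n2
    t = (mean(x1) - mean(x2)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))  # Welch-Satterthwaite
    return t, df

t, df = welch_t([1, 2, 3], [4, 6, 8])
print(round(t, 3), round(df, 2))  # -3.098 2.94
```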

102
Q

What is the central limit theorem?

A

If the population you are sampling from is normally distributed, the means of samples drawn from the population will also be normally distributed
If the population is not normally distributed, the sample means will be approximately normally distributed if n is large
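A small simulation of the second point, assuming a uniform(0, 1) population (clearly not normal, mean 0.5, standard deviation √(1/12) ≈ 0.289) and samples of size n = 30. The sample means should centre near 0.5 with a spread near the standard error 0.289/√30 ≈ 0.053; the seed only makes the run repeatable, and the function name is hypothetical.

```python
import math
import random

def sample_means(n, reps=5000, seed=42):
    """Means of `reps` samples of size n drawn from a uniform(0, 1) population."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]

means = sample_means(n=30)
centre = sum(means) / len(means)
spread = math.sqrt(sum((m - centre) ** 2 for m in means) / (len(means) - 1))
print(round(centre, 3), round(spread, 3))  # should be near 0.5 and near 0.053
```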