10 Introduction to Inferential Statistics Flashcards

1
Q

What is the main focus of the chapter?

A

The chapter focuses on data analysis, specifically t-tests, chi-square tests, correlations, and simple linear regression.

Includes hands-on practice to reinforce understanding.

2
Q

What are the three main types of t-tests?

A
  • Independent t-test
  • Dependent t-test
  • One-sample t-test
3
Q

What does an independent t-test compare?

A

Two groups that are independent of each other

Samples include different people or observations.

4
Q

What is a dependent t-test?

A

A t-test that compares two groups that are inherently related

Example: collecting data from the same group at two different times.
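
As a rough illustration of the card above, a dependent (paired) t-test can be sketched as a one-sample test on the per-subject differences. The function name `paired_t_score` and the scores are made up for this example; they are not taken from the chapter.

```python
import math
import statistics


def paired_t_score(before, after):
    """t statistic for a dependent (paired) t-test on two matched lists."""
    # Work on per-subject differences, so each subject acts as its own control.
    differences = [a - b for a, b in zip(after, before)]
    mean_difference = statistics.mean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return mean_difference / standard_error


# Hypothetical scores for the same four people at two points in time.
before = [10.0, 12.0, 9.0, 11.0]
after = [12.0, 13.0, 12.0, 11.0]
t_score = paired_t_score(before, after)
print(f"paired t-score: {t_score:.3f}")
```

The sample here is far smaller than the n ≥ 30 guideline; it is kept tiny only so the arithmetic is easy to follow.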

5
Q

What does a one-sample t-test compare?

A

One group against a single value

Example: comparing class test scores against the mean of all test scores.

6
Q

What does a t-test generate that is used to determine statistical significance?

A

A t-score

Analytics tools typically convert the t-score into a p-value and report both.
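
To make the t-score less abstract, here is a minimal sketch of how it could be computed by hand for two independent groups, assuming equal variances (the pooled-variance form). The helper `pooled_t_score` and the weights are hypothetical and chosen only for illustration.

```python
import math
import statistics


def pooled_t_score(group_a, group_b):
    """Student's t statistic for two independent groups (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_variance = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error


# Made-up weights in pounds, not the chapter's data.
group_a = [5.0, 6.0, 7.0]
group_b = [2.0, 3.0, 4.0]
t_score = pooled_t_score(group_a, group_b)
print(f"t-score: {t_score:.3f}")
```

Turning this t-score into a p-value requires the t distribution, which is the step a statistics library normally handles.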

7
Q

What are the key assumptions for an independent t-test?

A
  • Independence
  • Normality
  • Homogeneity of variance
  • n ≥ 30 in each group
  • The independent variable is categorical
  • The dependent variable is numerical
8
Q

What is the assumption of normality in a t-test?

A

The dependent variable is approximately normally distributed within each group.

9
Q

What does homogeneity of variance mean in the context of a t-test?

A

The variance of both groups is approximately the same.
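
One informal way to look at this assumption is to compare the two sample variances directly. The sketch below uses made-up data and only reports the ratio; it does not apply any cutoff, and a formal check such as Levene's test (`scipy.stats.levene`) would be the more rigorous route.

```python
import statistics

# Hypothetical measurements for two independent groups.
group_a = [5.0, 6.0, 7.0]
group_b = [2.0, 4.0, 6.0]

# statistics.variance returns the sample variance (n - 1 in the denominator).
variance_a = statistics.variance(group_a)
variance_b = statistics.variance(group_b)

# A ratio close to 1 is consistent with roughly equal variances.
variance_ratio = max(variance_a, variance_b) / min(variance_a, variance_b)
print(f"variances: {variance_a:.2f} vs {variance_b:.2f}, ratio {variance_ratio:.2f}")
```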

10
Q

What is the significance of having at least 30 observations in each sample for a t-test?

A

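With at least 30 observations per sample, the sampling distribution of the mean is close to normal (central limit theorem), so the test is more robust to departures from normality and generally gives more reliable results.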

11
Q

What is the independent variable in a t-test?

A

The categorical variable that is controlled or changed and that separates the observations into the two groups.

12
Q

What is the dependent variable in a t-test?

A

A numerical variable being compared between the two groups.

13
Q

What is a chi-square goodness of fit test used for?

A

To test whether the category frequencies observed in a sample match the frequencies expected from a population, that is, whether the sample is a good representation of it.

14
Q

What does a chi-square test for independence compare?

A

Two categorical variables to see whether there is a relationship between them.

15
Q

What is the null hypothesis in a chi-square test for independence?

A

There is no relation between the two variables.

16
Q

What are the assumptions for a chi-square test for independence?

A
  • Both variables are categorical
  • Independence of observations
17
Q

What is a contingency table?

A

A frequency table that looks at more than one variable.

18
Q

What programming environment is used for running the t-test in this chapter?

A

Jupyter Notebooks

19
Q

What Python library is commonly used for statistical tests like t-tests?

A

SciPy
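
For reference, the three t-test types map onto three functions in `scipy.stats`. The sketch below assumes SciPy is installed and uses made-up numbers rather than the chapter's dataset; the variable names are illustrative only.

```python
from scipy import stats

# Hypothetical data, for illustration only.
group_a = [5.0, 6.0, 7.0]
group_b = [2.0, 3.0, 4.0]
before = [10.0, 12.0, 9.0, 11.0]
after = [12.0, 13.0, 12.0, 11.0]

# Independent t-test: two unrelated groups.
independent = stats.ttest_ind(group_a, group_b)

# Dependent (paired) t-test: the same subjects measured twice.
paired = stats.ttest_rel(after, before)

# One-sample t-test: one group against a single reference value.
one_sample = stats.ttest_1samp(group_a, popmean=5.0)

for name, result in [("independent", independent),
                     ("paired", paired),
                     ("one-sample", one_sample)]:
    print(f"{name}: t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

Each result carries both the t-score (`statistic`) and the p-value (`pvalue`), so the significance decision can be read straight off the output.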

20
Q

Fill in the blank: A t-test compares two groups that contain _______.

A

Quantitative data

21
Q

True or False: You need to perform a t-test for the exam.

A

False

You need to know the definition and application of a t-test.

22
Q

What is the purpose of running a t-test on Yorkshire Terriers and Singapura weights?

A

To determine if there is a significant difference in weight between the two breeds.

23
Q

What does the p-value indicate in a t-test result?

A

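The statistical significance of the results: the probability of seeing a difference at least this large if the null hypothesis were true. A value below the chosen threshold (commonly 0.05) is treated as statistically significant.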

24
Q

What is the mean weight of Singapura cats according to the example?

A

6.1 lb

25
What is the mean weight of Yorkshire Terriers according to the example?
5.5 lb
26
What is the difference in mean weight between Singapura cats and Yorkshire Terriers?
0.6 lb
27
What is the main takeaway from performing t-tests according to the chapter?
Understanding how to differentiate t-tests from other analyses.
28
What is the next analysis covered after t-tests in this chapter?
Chi-square analysis.
29
What does a chi-square test for independence compare?
It compares two categorical variables by analyzing a contingency table.
30
Define a contingency table.
A contingency table is another name for a frequency table that looks at more than one variable.
31
List the assumptions of a chi-square test.
  • Both variables are categorical
  • Independence of observations
  • Contingency cell exclusivity
  • 80% of cells should have a value of at least 5
  • n ≥ 50
32
What is meant by 'independence of observations' in a chi-square test?
Each observation needs to be independent of every other observation.
33
Explain contingency cell exclusivity.
Each observation is only counted once in the contingency table.
34
What is the minimum requirement for cell values in a chi-square test?
80% of the cells should have a count of 5 or more.
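
The cell-count rule can be checked mechanically. The sketch below assumes the rule refers to expected counts, which is how it is usually stated; the card itself does not say whether observed or expected counts are meant. The function names and the table are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


def share_of_cells_at_least(table, minimum=5):
    """Fraction of cells whose expected count reaches the minimum."""
    cells = [value for row in expected_counts(table) for value in row]
    return sum(value >= minimum for value in cells) / len(cells)


# Made-up 2x2 contingency table of observed counts.
table = [[12, 3], [30, 15]]
expected = expected_counts(table)
share = share_of_cells_at_least(table)
print(expected)
print(f"{share:.0%} of cells have an expected count of at least 5")
```

In this made-up table only three of the four cells reach the minimum, so the 80% guideline would not be met.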
35
What is the minimum sample size recommended for a chi-square test?
n ≥ 50.
36
How do you calculate the number of cells in a contingency table?
Multiply the number of possible outcomes of one variable by the number of possible outcomes of the second variable.
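
A small sketch of building a contingency table from raw observations and counting its cells, using only the standard library. The survey data and variable names are invented for the example.

```python
from collections import Counter

# Hypothetical observations: (pet type, preferred food).
observations = [
    ("dog", "dry"), ("dog", "wet"), ("cat", "dry"),
    ("cat", "wet"), ("cat", "wet"), ("dog", "dry"),
]

# Each observation is counted exactly once (contingency cell exclusivity).
counts = Counter(observations)
row_levels = sorted({pet for pet, _ in observations})
column_levels = sorted({food for _, food in observations})

# Rows are pet types, columns are food preferences.
table = [[counts[(pet, food)] for food in column_levels] for pet in row_levels]

# Number of cells = outcomes of variable 1 * outcomes of variable 2.
n_cells = len(row_levels) * len(column_levels)
print(row_levels, column_levels, table, n_cells)
```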
37
What does the p-value represent in a chi-square test?
It indicates whether there is a statistically significant relationship between the variables.
38
What conclusion is drawn if the p-value is larger than 0.05 in a chi-square test?
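Fail to reject the null hypothesis: the data do not provide sufficient evidence of a relationship between the variables. (Strictly speaking, the null hypothesis is never "accepted".)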
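
The decision rule can be sketched for the simplest case, a 2x2 table with one degree of freedom, where the p-value has a closed form via `math.erfc`. This is an illustrative hand calculation with made-up counts and hypothetical function names. It applies no continuity correction, so it can differ from library output (SciPy's `chi2_contingency`, for instance, applies Yates' correction to 2x2 tables by default).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (no correction)."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def decide(p_value, alpha=0.05):
    """Compare the p-value with the significance threshold."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


table = [[20, 30], [30, 20]]  # made-up observed counts
chi2, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}: {decide(p_value)}")
```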
39
Define correlation in statistical terms.
A correlation is a relationship between two variables that can be positive or negative.
40
What does a positive correlation indicate?
As one variable increases, so does the other.
41
What does a negative correlation indicate?
As one variable increases, the other variable decreases.
42
What does it mean when there is no correlation between two variables?
Changes in one variable are not associated with any consistent change in the other.
43
True or False: Correlation implies causation.
False.
44
What is the correlation coefficient?
A number ranging from -1 to 1 that indicates the strength and direction of the relationship between two variables.
45
What is the significance of the correlation coefficient being close to 1 or -1?
The closer the coefficient is to 1 or -1, the stronger the relationship.
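
To show how the coefficient behaves, here is a minimal hand-rolled Pearson correlation. The function `pearson_r` and all data are illustrative assumptions; in practice a library routine such as `scipy.stats.pearsonr` would normally be used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance_sum / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # perfectly positive relationship
falling = [10, 8, 6, 4, 2]   # perfectly negative relationship
noisy = [1, 3, 2, 5, 4]      # positive but weaker relationship

for label, ys in [("rising", rising), ("falling", falling), ("noisy", noisy)]:
    print(f"{label}: r = {pearson_r(xs, ys):.2f}")
```

The first two series sit at the extremes of the range, while the noisy series lands in between, matching the idea that values nearer 1 or -1 mean a stronger relationship.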
46
What type of analysis tests the strength of the relationship between two numerical variables?
Correlation analysis.
47
List the assumptions for Pearson’s correlation analysis.
  • Level of measurement
  • Linearity
  • Normality
  • Related pairs
  • Lack of outliers
  • n ≥ 30
  • Two continuous variables
48
What is simple linear regression primarily used for?
Prediction of one variable based on another.
49
In simple linear regression, what does the x-axis represent?
The independent variable (predictor variable).
50
In simple linear regression, what does the y-axis represent?
The dependent variable (criterion variable).
51
What does the R² value indicate in regression analysis?
It tells how much variance in the dependent variable is explained by the independent variable.
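
A minimal least-squares sketch may help connect the fitted line to R². The function `fit_line` and the data are hypothetical (loosely styled after the squirrel example, but not the chapter's numbers), and the tiny sample ignores the sample-size guidance.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns slope, intercept, R-squared."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (unexplained variation / total variation).
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_residual / ss_total


# Made-up data: nuts fed (x, independent) and weight (y, dependent).
nuts_fed = [1, 2, 3, 4, 5]
weight = [1, 3, 2, 5, 4]
slope, intercept, r_squared = fit_line(nuts_fed, weight)
predicted = intercept + slope * 6  # prediction for a new x value
print(f"y = {intercept:.2f} + {slope:.2f}x, R^2 = {r_squared:.2f}, prediction = {predicted:.2f}")
```

This hand calculation gives the line and R² only; the p-value that tests the null hypothesis would come from a statistics library.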
52
What is the null hypothesis in regression analysis?
The independent variable is NOT a predictor of the dependent variable.
53
What is the alternative hypothesis in regression analysis?
The independent variable IS a predictor of the dependent variable.
54
What is the minimum sample size recommended for correlation analysis?
n ≥ 30.
55
What statistical method is used to determine if the number of nuts fed to squirrels can predict their weight?
Simple linear regression
Footnote: The p-value from the regression analysis was 0.043, indicating a statistically significant prediction.
56
What is the p-value threshold commonly used to determine statistical significance in regression analysis?
0.05
Footnote: A p-value below this threshold suggests that the independent variable can predict the dependent variable.
57
In simple linear regression, which variable is considered the independent variable when predicting height?
Weight
Footnote: Height is the dependent variable in this context.
58
What are the assumptions of simple linear regression? List them.
  • Linearity
  • Normality
  • Independence
  • Homoscedasticity
  • The dependent variable is numeric
  • The independent variable is numeric
  • n ≥ 100
Footnote: Each assumption must be checked to ensure valid results.
59
What does the term 'homoscedasticity' refer to in regression analysis?
The variance of the residuals remains constant across different levels of the independent variable.
Footnote: It means that the residuals are evenly spread around the regression line.
60
What is the minimum sample size recommended for simple linear regression?
10 observations
Footnote: However, best practice suggests using at least 100 observations.
61
What does the R-squared value represent in regression analysis?
The proportion of variance in the dependent variable explained by the independent variable.
Footnote: An R-squared value of 0.92 means that 92% of the variance in height can be explained by weight.
62
What function is used in Python to create a simple linear regression model?
sm.OLS()
Footnote: This function is part of the statsmodels library in Python, where `sm` is the conventional alias for `statsmodels.api`.
63
Fill in the blank: If you want to see whether there is a statistically significant difference between two groups using numeric variables, you would use a _______.
t-test
64
Which analysis would be appropriate to determine if there is a relationship between two categorical variables?
Chi-square test
Footnote: Specifically, the chi-square test for independence, which applies to two categorical variables.
65
True or False: Correlation can be used to see whether two numeric variables are related and how strongly they are related.
True
66
What is the appropriate analysis to predict one numeric variable using another numeric variable?
Simple linear regression