Stats Flashcards

Question 1

Q

What are the 2 names of the Tukey test for comparing group means

Answer

A

Tukey’s Range Test
Tukey’s Honestly Significant Difference (HSD)

Question 2

Q

What are the 5 steps in the Tukey test

Answer

A

1 - Conduct ANOVA
2 - Calculate the Critical Value
3 - Compute the Honestly Significant Difference (HSD)
4 - Compare Mean Differences
5 - Interpretation

Question 3

Q

What does ANOVA stand for

Answer

A

Analysis of variance: a statistical method used to compare means across multiple groups or treatments.

Question 4

Q

What is an F-statistic

Answer

A

The F-statistic is a ratio of two variances: the variance between group means and the variance within groups. It quantifies how much larger the variation among group means is than the variation within individual groups.
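As a rough illustration of that ratio, the sketch below computes a one-way ANOVA F-statistic by hand. The data and the function name `f_statistic` are made up for the example; it assumes every group has at least two observations.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)                               # number of groups
    n_total = sum(len(g) for g in groups)         # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)             # variance between group means
    ms_within = ss_within / (n_total - k)         # variance within groups
    return ms_between / ms_within

# Hypothetical data: three treatments, three observations each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(groups)
print(f_value)
```

A value well above 1 suggests the group means differ by more than the within-group noise would explain.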

Question 5

Q

What is a p-value

Answer

A

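The p-value is the probability of observing differences at least as extreme as those in the data, assuming the null hypothesis is true. A small p-value (commonly below 0.05) indicates that the observed differences are statistically significant.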

Question 6

Q

What are the 7 outputs of the Tukey test

Answer

A

1 - Comparison Matrix
2 - Mean Differences
3 - Tukey’s HSD
4 - Confidence Intervals
5 - Adjusted p-values
6 - Significance Indicators
7 - Summary Statistics

Question 7

Q

Define Tukey’s HSD

Answer

A

The Honestly Significant Difference (HSD) is a threshold calculated from the critical value and the standard error of the means. A difference between two group means that exceeds this value is considered statistically significant.

Question 8

Q

What Python module supports the Tukey test

Answer

A

statsmodels.stats.multicomp
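A minimal usage sketch, assuming statsmodels and NumPy are installed. The measurements and group labels are hypothetical; `pairwise_tukeyhsd` is the function in this module that runs the pairwise comparisons.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical measurements for three groups of three observations each
values = np.array([1, 2, 3, 2, 3, 4, 5, 6, 7], dtype=float)
labels = np.array(["a"] * 3 + ["b"] * 3 + ["c"] * 3)

# Compare every pair of groups at the 5% family-wise significance level
result = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)

print(result.summary())   # table of pairs, mean differences, adjusted p-values, intervals
print(result.reject)      # one True/False per pair: is the difference significant?
```

The summary table covers most of the outputs listed in the earlier card: mean differences, confidence intervals, adjusted p-values, and significance indicators.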

Question 9

Q

Explain Z-scores

Answer

A

A Z-score indicates how many standard deviations a data point is from the mean of the dataset.

Question 10

Q

What are the 4 steps to calculating a Z-Score

Answer

A

1 - Calculate the Mean (μ): find the average of the dataset.
2 - Calculate the Standard Deviation (σ): measure the spread of the dataset around the mean.
3 - Subtract the Mean from the Data Point: x - μ.
4 - Divide by the Standard Deviation: z = (x - μ) / σ.
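The four steps can be sketched with the standard library. The dataset is hypothetical, and the sketch assumes the population standard deviation (σ); for a sample, `statistics.stdev` would be used instead.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical dataset
x = 9                              # the data point to standardize

mu = mean(data)          # step 1: mean of the dataset
sigma = pstdev(data)     # step 2: population standard deviation
z = (x - mu) / sigma     # steps 3 and 4: subtract the mean, divide by sigma
print(z)                 # 2.0
```

Here the mean is 5 and the standard deviation is 2, so the value 9 sits two standard deviations above the mean.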

Question 11

Q

What does a Z-score of 0 indicate

Answer

A

The data point is exactly at the mean.

Question 12

Q

What does a Z-score of +1 indicate

Answer

A

A Z-score of +1 (or -1) indicates that the data point is 1 standard deviation above (or below) the mean.

Question 13

Q

What is R-squared

Answer

A

R-squared represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s) in a regression model. In simpler terms, it indicates how well the independent variable(s) explain the variability of the dependent variable.

Question 14

Q

What is the difference between R-squared and correlation

Answer

A

Correlation measures the strength and direction of the linear relationship between two continuous variables.

R-squared represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s).

In summary, correlation tells you how closely related two variables are in a linear sense, while R-squared tells you how much of the variability in the dependent variable is explained by the independent variable(s).
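To make the distinction concrete, the sketch below computes both quantities on a small hypothetical dataset. With a single independent variable and a least-squares line, R-squared equals the square of the Pearson correlation, but it loses the sign that correlation carries.

```python
from math import sqrt
from statistics import mean

x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-variation of x and y
sxx = sum((a - mx) ** 2 for a in x)                    # variation of x
syy = sum((b - my) ** 2 for b in y)                    # total variation of y

# Pearson correlation: strength and direction, between -1 and +1
r = sxy / sqrt(sxx * syy)

# Least-squares line y = intercept + slope * x
slope = sxy / sxx
intercept = my - slope * mx

# R-squared: 1 - (unexplained variation / total variation)
ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
r_squared = 1 - ss_res / syy

print(r, r_squared)
```

For this data R-squared is 0.6: the fitted line explains about 60% of the variability in `y`.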

Question 15

Q

Name 8 methods that a data analyst may use to test equity signals

Answer

A

1 - Correlation coefficient
2 - Scatter plots
3 - Covariance
4 - Cross-correlation analysis
5 - Granger causality test
6 - Regression analysis
7 - Correlation heatmaps
8 - Principal Component Analysis (PCA)

Question 16

Q

What is the Correlation coefficient

Answer

A

The Pearson correlation coefficient measures the strength and direction of the linear relationship between two signals, on a scale from -1 to +1. Analysts calculate it to understand how closely the movements of two signals are related.

Question 17

Q

What are scatter plots

Answer

A

A scatter plot places the paired data points from both signals on two axes so that analysts can visually inspect the relationship between them. Patterns such as clustering around a line or curve can indicate a relationship.

Question 18

Q

What is Covariance

Answer

A

Covariance measures how much two signals vary together. Positive covariance indicates that the signals move in the same direction, while negative covariance indicates they move in opposite directions.
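A small sketch of the sign behaviour described above, using the sample covariance (dividing by n - 1). The three series and the helper name `sample_covariance` are hypothetical.

```python
from statistics import mean

def sample_covariance(a, b):
    """Sample covariance: average co-movement of two series around their means."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

signal = [1, 2, 3, 4, 5]            # hypothetical signal
same_direction = [2, 4, 6, 8, 10]   # rises when the signal rises
opposite = [10, 8, 6, 4, 2]         # falls when the signal rises

print(sample_covariance(signal, same_direction))  # 5.0
print(sample_covariance(signal, opposite))        # -5.0
```

Unlike correlation, the magnitude of covariance depends on the units of the signals, so only its sign is directly comparable across pairs.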

Question 19

Q

What is Cross-correlation analysis

Answer

A

This involves analyzing the correlation between the signals at different time lags. It helps in understanding whether one signal leads or lags the other and the strength of this relationship at different time intervals.
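One simple way to sketch this is to shift one series against the other and compute the Pearson correlation at each lag. The signals below are hypothetical, with `follower` built to repeat `leader` two steps later; the helper names are illustrative only.

```python
from math import sqrt
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length series."""
    ma, mb = mean(a), mean(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den

def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])

# Hypothetical signals: `follower` repeats `leader` two steps later
leader = [1, 3, 2, 5, 4, 6, 3, 7, 5, 8]
follower = [0, 0] + leader[:-2]

# The lag with the highest correlation suggests how far one signal leads the other
best_lag = max(range(4), key=lambda k: lagged_correlation(leader, follower, k))
print(best_lag)   # 2
```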

Question 20

Q

What is the Granger causality test

Answer

A

This statistical hypothesis test is used to determine whether one time series is useful in forecasting another. It helps analysts understand if one signal is a leading indicator of the other.

Question 21

Q

What is Regression analysis

Answer

A

Regression analysis quantifies the relationship between the two signals. Analysts may use simple linear regression or more complex models to estimate how changes in one signal affect the other.

Question 22

Q

What are Correlation heatmaps

Answer

A

These are visual representations of correlation coefficients between multiple signals. Heatmaps help in identifying patterns of correlation among multiple signals simultaneously.

Question 23

Q

What is Principal Component Analysis (PCA)

Answer

A

PCA is a dimensionality reduction technique that can be used to identify underlying patterns and relationships between multiple signals. It helps in understanding the dominant sources of variability in the data.
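A minimal sketch of the idea, assuming NumPy is available: PCA here is done by eigen-decomposing the covariance matrix of mean-centered data. The three signals are hypothetical, with the second built to be roughly twice the first.

```python
import numpy as np

# Hypothetical data: 3 signals (columns) observed over 6 periods (rows)
X = np.array([
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.1],
    [3.0, 6.2, 0.4],
    [4.0, 7.8, 0.2],
    [5.0, 10.1, 0.6],
    [6.0, 12.0, 0.3],
])

centered = X - X.mean(axis=0)                     # PCA works on mean-centered data
cov = np.cov(centered, rowvar=False)              # 3x3 covariance matrix of the signals
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order

order = np.argsort(eigenvalues)[::-1]             # reorder components, largest variance first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

explained_ratio = eigenvalues / eigenvalues.sum() # share of total variance per component
scores = centered @ eigenvectors[:, :1]           # project data onto the first component
print(explained_ratio)
```

Because the first two signals move almost in lockstep, the first component captures nearly all of the variability, which is the "dominant source" the card refers to.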

(23 cards)