Stats Flashcards
What are the 2 names of the Tukey test for pairwise comparisons of group means
Tukey’s Range Test
Tukey’s Honestly Significant Difference (HSD)
What are the 5 steps in the Tukey test
1 - Conduct ANOVA
2 - Calculate the Critical Value
3 - Compute the Honestly Significant Difference (HSD)
4 - Compare Mean Differences
5 - Interpretation
What does ANOVA stand for
Analysis of variance test - a method used to compare means across multiple groups or treatments.
What is an F-statistic
The ratio of two variances: the variance between group means and the variance within groups. It quantifies how much greater the variation among group means is than the variation within individual groups.
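A minimal sketch of the ratio for a one-way ANOVA, using only the standard library. The function name `f_statistic` and the three small groups are made up for illustration.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)                               # number of groups
    n_total = sum(len(g) for g in groups)         # total observations
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: the third group sits well above the other two
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(groups)
print(f_value)
```

A large F means the group means differ by more than the within-group scatter would suggest.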
What is a p-value
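The probability of observing a result at least as extreme as the one in the data, assuming the null hypothesis (no real difference) is true. A small p-value (commonly below 0.05) suggests the observed differences are statistically significant.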
What are the 7 outputs of the Tukey test
1 - Comparison Matrix
2 - Mean Differences
3 - Tukey’s HSD
4 - Confidence Intervals
5 - Adjusted p-values
6 - Significance Indicators
7 - Summary Statistics
Define Tukey’s HSD
The Honestly Significant Difference (HSD) is a threshold calculated from the critical value and the standard error of the group means. If the difference between two group means is larger than the HSD, that difference is considered statistically significant.
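For reference, the usual textbook form of the HSD, assuming every group has the same sample size n (unequal sizes need the Tukey-Kramer adjustment, not shown here):

```latex
\mathrm{HSD} = q_{\alpha,\,k,\,N-k}\,\sqrt{\frac{MS_{\text{within}}}{n}}
```

Here q is the critical value of the studentized range distribution for k groups and N - k error degrees of freedom, and MS_within is the within-group mean square from the ANOVA.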
What Python library supports Tukey
statsmodels, through the statsmodels.stats.multicomp module (for example, pairwise_tukeyhsd)
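A short sketch of running the test with `pairwise_tukeyhsd`, assuming statsmodels and NumPy are installed. The scores and group labels are invented for illustration.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical scores for three groups of five observations each
scores = np.array([
    5.1, 4.9, 5.0, 5.2, 4.8,   # group A
    5.0, 5.2, 4.9, 5.1, 5.3,   # group B
    7.0, 7.2, 6.9, 7.1, 6.8,   # group C
])
groups = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5)

# Compare every pair of groups at a 5% family-wise error rate
result = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)

# Summary table: mean differences, adjusted p-values, confidence intervals
print(result.summary())

# One True/False flag per pair, in the order A-B, A-C, B-C
print(result.reject)
```

With this data, A and B are close together while C is far from both, so only the pairs involving C should be flagged as significantly different.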
Explain Z-scores
A Z-score indicates how many standard deviations a data point is from the mean of the dataset.
What are the 4 steps to calculating a Z-Score
1 - Calculate the Mean (μ): Find the average of the dataset.
2 - Calculate the Standard Deviation (σ)
3 - Subtract the Mean from the Data Point
4 - Divide by the Standard Deviation.
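The four steps above can be sketched with the standard library. The dataset is made up, and the population standard deviation (`pstdev`) is used to match the σ notation.

```python
from statistics import mean, pstdev

# Hypothetical dataset
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)        # step 1: the mean
sigma = pstdev(data)   # step 2: the population standard deviation

# Steps 3 and 4: subtract the mean, then divide by the standard deviation
z_scores = [(x - mu) / sigma for x in data]
print(z_scores)
```

For a sample rather than a full population, `statistics.stdev` (which divides by n - 1) would be the usual substitute.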
What does a Z-score of 0 indicate
The data point is exactly at the mean
What does a Z-score of +1 indicate
A Z-score of +1 (or -1) indicates that the data point is 1 standard deviation above (or below) the mean
What is R-squared
R-squared represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s) in a regression model. In simpler terms, it indicates how well the independent variable(s) explain the variability of the dependent variable.
What is the difference between R-squared and correlation
Correlation measures the strength and direction of the linear relationship between two continuous variables.
R-squared represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
In summary, correlation tells you how closely related two variables are in a linear sense, while R-squared tells you how much of the variability in the dependent variable is explained by the independent variable(s).
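In simple linear regression with one independent variable, R-squared equals the square of the Pearson correlation. A small sketch with made-up data shows the two side by side.

```python
from statistics import mean

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

# Pearson correlation: strength and direction, between -1 and +1
r = sxy / (sxx * syy) ** 0.5

# Least-squares line, then R-squared = 1 - (residual SS / total SS)
slope = sxy / sxx
intercept = my - slope * mx
ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
r_squared = 1 - ss_res / syy

print(r, r_squared)
```

Correlation keeps the sign of the relationship, while R-squared does not.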
Name 8 methods that a data analyst may use to test the relationship between equity signals
1 - Correlation coefficient
2 - Scatter plots
3 - Covariance
4 - Cross-correlation analysis
5 - Granger causality test
6 - Regression analysis
7 - Correlation heatmaps
8 - Principal Component Analysis (PCA)
What is the Correlation coefficient
The Pearson correlation coefficient measures the strength and direction of the linear relationship between two signals, on a scale from -1 to +1. Analysts calculate it to understand how closely the movements of two signals are related.
What are scatter plots
A scatter plot places the paired values of two signals on the x and y axes so analysts can inspect the relationship visually. Patterns such as clustering around a line or curve can indicate a relationship.
What is Covariance
Covariance measures how much two signals vary together. Positive covariance indicates that the signals move in the same direction, while negative covariance indicates they move in opposite directions.
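A minimal sketch of the sign behaviour, using the sample covariance (dividing by n - 1). The helper name `covariance` and the three signals are hypothetical.

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical signals
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]   # rises when a rises
c = [10, 8, 6, 4, 2]   # falls when a rises

print(covariance(a, b))   # positive: same direction
print(covariance(a, c))   # negative: opposite directions
```

The magnitude depends on the units of the signals, which is why correlation (covariance rescaled to the range -1 to +1) is easier to compare across pairs.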
What is Cross-correlation analysis
This involves analyzing the correlation between the signals at different time lags. It helps in understanding whether one signal leads or lags the other and the strength of this relationship at different time intervals.
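One simple way to sketch this is to compute the Pearson correlation at several lags and see where it peaks. The helper names and the data are illustrative: `y` is built as `x` delayed by two steps, padded with zeros.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def lagged_correlation(leader, follower, lag):
    """Correlation between leader[t] and follower[t + lag], for lag >= 0."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])

# Hypothetical signals: y repeats x two steps later
x = [1, 3, 2, 5, 4, 6, 5, 8, 7, 9]
y = [0, 0] + x[:-2]

# The lag with the highest correlation suggests how far x leads y
best_lag = max(range(0, 4), key=lambda k: lagged_correlation(x, y, k))
print(best_lag)
```

On real series, each lag shortens the overlapping window, so correlations at long lags rest on fewer points and are less reliable.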
What is the Granger causality test
This statistical hypothesis test is used to determine whether one time series is useful in forecasting another. It helps analysts understand if one signal is a leading indicator of the other.
What is Regression analysis
Regression analysis quantifies the relationship between the two signals. Analysts may use simple linear regression or more complex models to estimate how changes in one signal affect the other.
What are Correlation heatmaps
These are visual representations of correlation coefficients between multiple signals. Heatmaps help in identifying patterns of correlation among multiple signals simultaneously.
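The grid that a heatmap colors is just the matrix of pairwise correlations. A sketch of building that matrix with NumPy, on simulated signals (the plotting step is left out, since the choice of plotting library is not specified here):

```python
import numpy as np

# Four simulated signals: two track a common base, one mirrors it,
# and one is independent noise
rng = np.random.default_rng(1)
base = rng.normal(size=100)
signals = np.vstack([
    base,
    base + rng.normal(scale=0.1, size=100),    # close copy of base
    -base + rng.normal(scale=0.1, size=100),   # mirror image of base
    rng.normal(size=100),                      # unrelated
])

# Each row is one signal; corr[i, j] is the correlation of signal i with j
corr = np.corrcoef(signals)
print(np.round(corr, 2))
```

A heatmap then maps each cell to a color, so blocks of strongly positive or strongly negative correlation stand out at a glance.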
What is Principal Component Analysis (PCA)
PCA is a dimensionality reduction technique that can be used to identify underlying patterns and relationships between multiple signals. It helps in understanding the dominant sources of variability in the data.
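A minimal PCA sketch using NumPy's singular value decomposition on mean-centered data. The three signals are simulated to share one common driver, so a single component should capture almost all the variability.

```python
import numpy as np

# Three simulated signals driven by one common factor plus small noise
rng = np.random.default_rng(42)
common = rng.normal(size=500)
signals = np.column_stack([
    common + rng.normal(scale=0.1, size=500),
    2 * common + rng.normal(scale=0.1, size=500),
    -common + rng.normal(scale=0.1, size=500),
])

# Center each column, then decompose; rows of `components` are the
# principal directions, ordered from most to least variance explained
centered = signals - signals.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

# Share of total variance captured by each component
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)
print(np.round(explained_variance_ratio, 3))
```

Signals on very different scales are usually standardized first, otherwise the largest-scale signal dominates the first component.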