CAP definitions Flashcards
(167 cards)
What is cross correlation?
A measure of similarity between two different sequences, computed as a function of the lag of one relative to the other
What is autocorrelation?
Degree of similarity between values of the same variable, for example between a series and a lagged copy of itself
What is spatial autocorrelation?
Degree of similarity - when error terms across cross-sectional data are correlated
What is serial autocorrelation?
Degree of similarity - when error terms across time series data are correlated
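A minimal sketch of how autocorrelation can be measured, assuming the usual sample autocorrelation formula at a fixed lag. The function name `autocorrelation` and both example series are illustrative and not part of the original cards.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag (illustrative helper)."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: co-movement between the series and its lagged copy.
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom

trend = list(range(1, 21))   # steadily rising series
alternating = [1, -1] * 10   # flips sign at every step

r_trend = autocorrelation(trend, 1)      # strongly positive
r_alt = autocorrelation(alternating, 1)  # strongly negative
```

A value near +1 means neighbouring observations move together, and a value near -1 means they tend to move in opposite directions.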
What is OLAP?
Online Analytical Processing - uses complex queries to analyze aggregated historical data from OLTP systems; associated with data warehouses. Operations include roll-up, drill-down, slice and dice, pivoting, drill-through, drill-across, etc. OLAP data cubes can have any number of dimensions.
What is OLTP?
Online Transaction Processing - captures, stores, and processes data from transactions in real time. Optimized for many short transactions, so individual operations are faster than OLAP queries.
Monte Carlo simulation
Technique that uses repeated random sampling to build a cumulative probability distribution of possible outcomes. It allows people to account for risk in quantitative analysis and decision making.
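A minimal sketch of a Monte Carlo simulation, assuming a hypothetical two-task project whose task durations are uniformly distributed. The ranges, the 12-day deadline, and the function names are assumptions made for illustration only.

```python
import random

def simulate_total_days(trials, seed=42):
    """Draw `trials` random durations for a hypothetical two-task project."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = []
    for _ in range(trials):
        task_a = rng.uniform(2, 6)  # assumed range for task A, in days
        task_b = rng.uniform(3, 9)  # assumed range for task B, in days
        totals.append(task_a + task_b)
    return sorted(totals)

def empirical_cdf(sorted_totals, x):
    """Fraction of simulated outcomes that are less than or equal to x."""
    return sum(1 for t in sorted_totals if t <= x) / len(sorted_totals)

totals = simulate_total_days(20_000)
# Risk estimate: probability that the project takes longer than 12 days.
p_late = 1 - empirical_cdf(totals, 12)
```

For these assumed ranges the exact probability of exceeding 12 days is 0.1875, so the simulated estimate should land close to that value.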
snowball sampling
Survey sampling where subjects are recruited by referral from other survey respondents
quota sampling
Method for selecting survey participants that is a non-probabilistic version of stratified sampling. A quota of participants is drawn from each specific group.
judgement sampling
Participants are selected based on the researcher's judgement
strata sampling
Stratified random sampling: the population is divided into subgroups (strata) and a random sample is drawn from each stratum
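A minimal sketch of stratified random sampling, assuming proportional allocation (each stratum contributes the same fraction of its members). The `stratified_sample` helper and the north/south population are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Randomly sample the same fraction from every stratum (illustrative helper)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)  # group records by stratum label
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))       # simple random sample within stratum
    return sample

# Hypothetical population: 60 records in "north", 40 in "south".
population = [("north", i) for i in range(60)] + [("south", i) for i in range(40)]
sample = stratified_sample(population, key=lambda rec: rec[0], fraction=0.1)
north_count = sum(1 for region, _ in sample if region == "north")
```

Quota sampling would fill the same per-group counts, but without the random draw inside each group.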
Central limit theorem
The central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution as the sample size gets larger, regardless of the population's distribution. Sample sizes equal to or greater than 30 are often considered sufficient for the CLT to hold.
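A minimal sketch of the CLT in action, assuming a skewed exponential population with mean 1 and standard deviation 1. The sample size of 30 follows the rule of thumb on the card; the number of repeated samples and the seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
sample_size = 30         # rule-of-thumb size from the card
num_samples = 5000       # arbitrary number of repeated samples

sample_means = []
for _ in range(num_samples):
    # Each sample comes from a clearly non-normal (exponential) population.
    draws = [rng.expovariate(1.0) for _ in range(sample_size)]
    sample_means.append(statistics.fmean(draws))

mean_of_means = statistics.fmean(sample_means)     # should be close to 1.0
spread_of_means = statistics.stdev(sample_means)   # should be close to 1 / sqrt(30)
expected_spread = 1.0 / sample_size ** 0.5
```

Even though the population is skewed, the sample means cluster around the population mean with a spread close to sigma divided by the square root of the sample size.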
When do you reject null hypothesis?
When the p-value (probability) is less than alpha (the level of significance); alpha is typically 0.05. Equivalently, the null hypothesis is rejected when the calculated test statistic is more extreme than the critical (tabulated) value.
What is alpha in statistics?
The level of significance
When do you fail to reject null hypothesis?
When p-value is above alpha
p-value
A p-value, or probability value, is a number describing how likely it is that data at least as extreme as yours would have occurred by random chance if the null hypothesis were true
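A minimal sketch of turning a test statistic into a p-value and then into a decision, assuming a two-sided z-test against the standard normal distribution. The function names and the example z values are illustrative.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject the null hypothesis only when p < alpha."""
    return "reject" if p_value < alpha else "fail to reject"

p_strong = two_sided_p_value(2.5)  # far from zero, so a small p-value
p_weak = two_sided_p_value(1.0)    # close to zero, so a large p-value

decision_strong = decide(p_strong)
decision_weak = decide(p_weak)
```

With alpha at 0.05, the first statistic leads to rejecting the null hypothesis and the second does not.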
r squared
R-squared is a statistical measure of fit that indicates the percentage of the variance in the dependent variable that the independent variable(s) collectively explain in a regression model. It evaluates the scatter of the data points around the fitted regression line.
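A minimal sketch of computing R-squared, assuming a simple one-variable least-squares line and the formula 1 - SS_res / SS_tot. The `r_squared` helper and the data points are made up for illustration.

```python
def r_squared(xs, ys):
    """R-squared of a least-squares line fitted to (xs, ys); illustrative helper."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept for one independent variable.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation (residuals) versus total variation around the mean.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # roughly y = 2x with small noise
r2 = r_squared(xs, ys)
```

Because the points lie almost exactly on a line, nearly all of the variance in `ys` is explained and `r2` is close to 1.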
paired t-test
Used when we are interested in the difference between two variables measured on the same subject. Not applicable to two independent samples. Data is in the form of matched pairs. Parametric test.
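A minimal sketch of the paired t statistic, assuming the standard formula: the mean of the pairwise differences divided by its standard error. The before/after measurements are invented for illustration.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """t statistic and degrees of freedom for matched pairs (illustrative helper)."""
    diffs = [a - b for a, b in zip(after, before)]  # one difference per subject
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1

# Hypothetical scores for the same five subjects before and after a treatment.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 16]
t_stat, dof = paired_t_statistic(before, after)
```

The statistic is then compared with a critical value from a t-table for `dof` degrees of freedom, for example 2.776 for a two-sided test at alpha 0.05 with 4 degrees of freedom.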
one-sample t-test
Used to determine if there is a significant difference between a sample mean and a known or hypothesized population mean. If the sample size is less than 30 and the population variance is unknown, the one-sample t-test is the best statistic to test the hypothesis.
two sample t-test
Used to determine if two population means are equal. Used to test if a new process is superior to a current process.
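A minimal sketch of a two-sample t statistic, assuming Welch's form, which does not require equal variances. The card does not say which variant is meant, so this choice is an assumption. The process measurements are invented.

```python
import math
import statistics

def welch_t_statistic(sample_a, sample_b):
    """Welch's t statistic for two independent samples (illustrative helper)."""
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # Standard error combines each sample's variance scaled by its size.
    std_err = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return (mean_a - mean_b) / std_err

new_process = [5, 6, 7, 8, 9]      # hypothetical measurements
current_process = [1, 2, 3, 4, 5]  # hypothetical measurements
t_stat = welch_t_statistic(new_process, current_process)
```

A large absolute value of `t_stat` is evidence against the hypothesis that the two population means are equal.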
one sample z-test
Used when we want to know whether our sample comes from a particular population. Applies when the population variance is known; if N is greater than 30, the z-test is also commonly used with the variance still unknown, substituting the sample variance.
f-test
Used to compare the variances of two populations. An "F test" is a catch-all term for any test that uses the F-distribution. In most cases, when people talk about the F-test, what they are actually talking about is the F-test to compare two variances.
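A minimal sketch of the F statistic for comparing two variances, assuming the common convention of placing the larger sample variance in the numerator. The helper name and the data are illustrative.

```python
import statistics

def f_statistic(sample_a, sample_b):
    """Ratio of sample variances with the larger one on top, plus degrees of freedom."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, (len(sample_a) - 1, len(sample_b) - 1)
    return var_b / var_a, (len(sample_b) - 1, len(sample_a) - 1)

wide = [2, 4, 6, 8, 10]   # sample variance 10
narrow = [4, 5, 6, 7, 8]  # sample variance 2.5
f_stat, dofs = f_statistic(wide, narrow)
```

The ratio is compared with a critical value from the F-distribution for the two degrees of freedom in `dofs`.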
chi-squared test
Measures the difference between observed and expected values. Used on categorical data. Chi-square test uses the observed and expected frequency of categorical data from the contingency table. Other tests are based on mean and variance of the data.
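A minimal sketch of the chi-squared statistic for a contingency table, assuming expected frequencies are computed from the row and column totals (the test of independence). The 2x2 table of counts is hypothetical.

```python
def chi_squared_statistic(table):
    """Chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

observed = [[30, 10], [20, 40]]  # hypothetical categorical counts
chi2, dof = chi_squared_statistic(observed)
```

With 1 degree of freedom the critical value at alpha 0.05 is 3.841, so a statistic well above that points to a difference between observed and expected frequencies.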
Mann-Whitney test (U test)
Used to test if two samples came from same population. Involves the calculation of a statistic, called U, whose distribution under the null hypothesis is known. Non-parametric test to compare outcomes between two independent groups. Alternative to two sample t-test.
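A minimal sketch of the U statistic, assuming the direct pairwise-counting definition in which ties count as one half. The helper name and the two small samples are illustrative.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) by counting pairwise wins; ties count as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two U values always sum to the number of pairs.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

u_a, u_b = mann_whitney_u([3, 5, 8], [1, 4, 9, 10])
u = min(u_a, u_b)  # the smaller value is usually the reported statistic
```

Because only the ordering of the values matters, the statistic makes no assumption about the shape of the underlying distributions.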