# Continuous Distributions through ANOVA (p-values) Flashcards

1
Q

What is the Normal Distribution?

A

The normal distribution, also known as the Gaussian distribution, is a symmetric probability distribution frequently encountered in nature. It is defined by its mean and standard deviation and forms a characteristic bell-shaped curve. Example: IQ scores in a population often follow a normal distribution. For instance, if the mean IQ score is 100 and the standard deviation is 15, the distribution captures the variation in IQ scores.
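Under the card's illustrative parameters (mean 100, standard deviation 15), a quick sketch with Python's standard library recovers the familiar 68% rule:

```python
from statistics import NormalDist

# Hypothetical IQ model from the card: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Probability that an IQ falls within one standard deviation of the mean.
within_one_sd = iq.cdf(115) - iq.cdf(85)
print(round(within_one_sd, 4))  # 0.6827
```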

2
Q

What is the Uniform Distribution?

A

The uniform distribution assigns equal probabilities to all outcomes within a specified interval. It models situations where each outcome has the same likelihood. Example: Rolling a fair six-sided die illustrates a discrete uniform distribution, where each number (1 to 6) has an equal probability of 1/6.
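A minimal sketch of the die example, using exact fractions so the probabilities stated on the card can be checked directly:

```python
from fractions import Fraction

# Fair six-sided die: a discrete uniform distribution over {1, ..., 6}.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

print(pmf[3])             # 1/6 for every face
print(sum(pmf.values()))  # probabilities sum to 1
```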

3
Q

What is the Exponential Distribution?

A

The exponential distribution models the time between events in a Poisson process. It is often used for modeling waiting times or lifetimes and possesses a memoryless property. Example: Consider customers arriving at a fast-food drive-thru with an average of one arrival every 3 minutes. The exponential distribution can model the time between successive arrivals.
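The memoryless property can be checked numerically. A mean of one arrival every 3 minutes corresponds to rate λ = 1/3 per minute (values assumed from the card's example):

```python
import math

# Hypothetical drive-thru: mean time between arrivals is 3 minutes,
# so the exponential rate parameter is lam = 1/3 per minute.
lam = 1 / 3

def survival(t, lam):
    """P(T > t) for an exponential random variable."""
    return math.exp(-lam * t)

# Memoryless property: P(T > s + t | T > s) equals P(T > t).
s, t = 2.0, 4.0
conditional = survival(s + t, lam) / survival(s, lam)
print(math.isclose(conditional, survival(t, lam)))  # True
```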

4
Q

What is the Gamma Distribution?

A

The gamma distribution is a versatile distribution used to model various types of continuous random variables. It encompasses the exponential distribution as a special case. Example: To model the time it takes for a machine to produce a certain number of parts, the gamma distribution can be employed. If the machine produces an average of 100 parts per hour, the distribution captures production times.

5
Q

What is the Beta Distribution?

A

The beta distribution models probabilities or proportions and exhibits diverse shapes. It is commonly used in Bayesian analysis and quality control. Example: In a clinical trial, the proportion of patients responding positively to a new drug can be represented using a beta distribution, aiding in estimating a range of possible response rates.
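A hedged sketch of the clinical-trial example via the conjugate Beta-Binomial update; the uniform prior and the counts (14 responders out of 20 patients) are made-up illustration values:

```python
# Uniform Beta(1, 1) prior on the response rate (an assumption),
# updated with 14 responders out of 20 hypothetical patients.
alpha_prior, beta_prior = 1, 1
responders, non_responders = 14, 6

# Conjugate update: posterior is Beta(alpha + successes, beta + failures).
alpha_post = alpha_prior + responders
beta_post = beta_prior + non_responders

posterior_mean = alpha_post / (alpha_post + beta_post)
print(alpha_post, beta_post, round(posterior_mean, 3))  # 15 7 0.682
```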

6
Q

What is the Chi-Square Distribution?

A

The chi-square distribution is commonly used in statistical hypothesis testing and arises when summing the squares of independent standard normal random variables. Example: When testing the independence of two categorical variables, such as smoking habits and lung cancer incidence, the chi-square distribution is utilized to assess the significance of the association.

7
Q

What is the Student’s t-Distribution?

A

The t-distribution is employed when estimating the mean of a normally distributed population from a small sample or when the population standard deviation is unknown. Example: Suppose you wish to estimate the average time spent on a task from a small sample of 12 observations. The t-distribution is used to construct a confidence interval for the population mean.
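A sketch of that scenario with a hypothetical sample of 12 task times; the two-tailed 95% critical value 2.201 for df = 11 is taken from a standard t table:

```python
from statistics import mean, stdev
import math

# Hypothetical task times in minutes (n = 12).
times = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5, 13.9, 14.7, 15.8, 14.4, 15.2, 14.6]

n = len(times)
xbar = mean(times)
sem = stdev(times) / math.sqrt(n)  # standard error of the mean

t_crit = 2.201  # two-tailed 95% critical value for df = n - 1 = 11
lower, upper = xbar - t_crit * sem, xbar + t_crit * sem
print(round(lower, 2), "to", round(upper, 2))
```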

8
Q

What is the Log-Normal Distribution?

A

The log-normal distribution models data that are positively skewed and cannot take negative values. It is often used in financial modeling and describes multiplicative growth. Example: The distribution of housing prices in a city can often be described using a log-normal distribution, accounting for positive skewness and preventing negative prices.

9
Q

What is the Weibull Distribution?

A

The Weibull distribution models the distribution of lifetimes or failure times of objects. It can take different shapes to describe various failure patterns. Example: The lifetime of electronic components, such as light bulbs, can be modeled using a Weibull distribution, with different shape parameters corresponding to failure rates that decrease, stay constant, or increase over time.

10
Q

What is the Cauchy Distribution?

A

The Cauchy distribution is characterized by its heavy tails and lack of finite moments. It is used to describe certain types of distributions in physics, engineering, and other fields. Example: In a physics experiment involving interference patterns, the distribution of phase differences between waves can be modeled using a Cauchy distribution.

11
Q

What is the Pareto Distribution?

A

The Pareto distribution is used to model distributions where a small number of observations account for the majority of occurrences. It is often used in economics and finance. Example: In economics, the distribution of income or wealth often follows a Pareto distribution, where a small percentage of individuals hold the majority of resources.

12
Q

What is the Exponential Power Distribution?

A

The exponential power distribution is a flexible distribution capable of modeling a wide range of shapes and tail behaviors. It is used in economics, finance, and engineering to handle diverse datasets. Example: The distribution of rainfall intensity during heavy storms can be modeled using an exponential power distribution to capture different patterns of intensity variation.

13
Q

What is Bayes’ Theorem, and how does it relate to machine learning?

A

Bayes’ Theorem is a fundamental concept in probability theory and statistics. It provides a way to update predictions based on new evidence. In machine learning, it’s used for classification tasks like spam detection or medical diagnosis.

14
Q

Can you provide an example of Bayes’ Theorem in spam email detection?

A

Certainly! Consider a scenario where you’re building a spam filter. Given prior probabilities and keyword occurrence probabilities, Bayes’ Theorem helps calculate the chance an email is spam based on keywords.
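A worked sketch with made-up numbers: suppose 20% of mail is spam, and the word “offer” appears in 60% of spam but only 5% of legitimate mail:

```python
# Hypothetical probabilities for illustration only.
p_spam = 0.20
p_word_given_spam = 0.60
p_word_given_ham = 0.05

# Bayes' Theorem: P(spam | word) = P(word | spam) * P(spam) / P(word),
# where P(word) comes from the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(round(p_spam_given_word, 2))  # 0.75
```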

15
Q

How does Bayes’ Theorem enhance decision-making in machine learning?

A

Bayes’ Theorem improves decision-making by incorporating prior knowledge and new evidence. It adjusts probabilities to update beliefs, leading to more accurate classifications and informed decisions.

16
Q

What is Prior Probability (Prior)?

A

Prior Probability: The initial belief or probability of an event occurring before considering new evidence. Example: In a medical test for a rare disease, the prior probability of a person having the disease might be 0.001 (0.1%).

17
Q

What is Posterior Probability (Posterior)?

A

Posterior Probability: The updated probability of an event occurring after considering new evidence using Bayes’ Theorem. Example: After a positive test result, the posterior probability of a person having the disease is recalculated based on the test.

18
Q

What is Likelihood?

A

Likelihood: The probability of observing the evidence (data) given a specific hypothesis or event. Example: In a coin flip, the likelihood of getting heads given that the coin is fair is 0.5.

19
Q

What is Evidence (Data)?

A

Evidence (Data): The observed information that is used to update probabilities. Example: In spam email detection, the evidence could be the presence of specific keywords in an email.

20
Q

What is Marginal Probability?

A

Marginal Probability: The probability of a single event occurring, disregarding any other events. Example: The probability of rolling a 4 on a fair six-sided die is a marginal probability.

21
Q

What is Conditional Probability?

A

Conditional Probability: The probability of one event occurring given that another event has already occurred. Example: The probability of a patient having a disease given that they exhibit certain symptoms.

22
Q

What is Joint Probability?

A

Joint Probability: The probability of two or more events occurring together. Example: The joint probability of rolling a 3 on a die and flipping heads on a coin, two independent events.

23
Q

What is the Law of Total Probability?

A

Law of Total Probability: A formula that computes the probability of an event by summing over all the distinct ways it can occur. Example: Calculating the overall probability that a student passes a course by combining the probability of passing at each level of study time, weighted by how likely each study level is.

24
Q

What is Bayes’ Factor?

A

Bayes’ Factor: A measure of the strength of evidence for one hypothesis relative to another, computed as the ratio of the likelihoods of the data under each hypothesis. Example: Comparing the hypothesis that a medical treatment is effective against the hypothesis that it is not, based on patient outcomes.

25
Q

What is Prior Distribution?

A

Prior Distribution: The probability distribution representing our uncertainty about a parameter before observing data. Example: In Bayesian statistics, the initial distribution representing our beliefs about the success rate of a new drug.

26
Q

What is Posterior Distribution?

A

Posterior Distribution: The updated probability distribution of a parameter after observing data. Example: The distribution of possible values for a patient’s blood pressure after incorporating measurements and prior knowledge.

27
Q

What is Probability Density Estimation (PDE)?

A

Probability Density Estimation (PDE) is a statistical technique used to estimate the probability distribution of a continuous random variable.

28
Q

How does Probability Density Estimation work?

A

PDE involves creating a smooth curve, called a probability density function, that approximates the underlying pattern in the data.
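One common way to build that smooth curve is kernel density estimation; here is a minimal sketch with made-up ages and a hand-picked bandwidth:

```python
import math

# Hypothetical ages; bandwidth h chosen by hand for illustration.
ages = [23, 25, 31, 35, 35, 40, 47, 52, 60, 68]
h = 5.0

def kde(x, data, h):
    """Gaussian kernel density estimate: average of bumps centred on the data."""
    norm = 1 / (len(data) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

# The estimated density is higher near typical ages than far outside the data.
print(kde(35, ages, h) > kde(90, ages, h))  # True
```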

29
Q

Can you provide a simple example of PDE?

A

Certainly! For instance, PDE can help us understand the distribution of ages in a town by creating a curve showing how likely different ages are.

30
Q

What’s the benefit of using Probability Density Estimation?

A

PDE helps us see common trends and variations in data, allowing us to make informed decisions about the overall pattern.

31
Q

In what fields is Probability Density Estimation applied?

A

PDE is used in finance, biology, and machine learning, among others, to analyze data distributions and make predictions based on patterns.

32
Q

How is the probability density function (PDF) created in PDE?

A

The PDF is created by smoothing out data points using mathematical techniques, providing insights into the likelihood of different values.

33
Q

Is Probability Density Estimation useful only for large datasets?

A

PDE is useful for both large and small datasets, helping us understand data patterns regardless of the data’s size.

34
Q

What’s the main goal of Probability Density Estimation?

A

The main goal of PDE is to provide a representation of the underlying probability distribution, allowing us to understand data likelihoods.

35
Q

What is Hypothesis Testing?

A

Hypothesis testing is a statistical method used to make decisions about population parameters based on sample data. It involves formulating competing hypotheses and assessing evidence.

36
Q

Can you provide an example scenario for Hypothesis Testing?

A

Certainly! Imagine a company claims a new marketing campaign increased daily website visitors. Hypothesis testing helps us systematically assess whether this claim is supported by data.

37
Q

What are the steps in Hypothesis Testing?

A

The steps include: Formulating Hypotheses, Choosing Significance Level (α), Collecting and Analyzing Data, Calculating Test Statistic, Determining Critical Region/Critical Value, Making a Decision, Drawing a Conclusion.

38
Q

How do you formulate hypotheses in Hypothesis Testing?

A

Formulate a Null Hypothesis (H0) and an Alternative Hypothesis (H1 or Ha) that represent the default assumption and the statement being tested.

39
Q

What is the significance level (α) in Hypothesis Testing?

A

The significance level (α) is the acceptable risk for making a Type I error (rejecting H0 when it is true). Common choices are 0.05 or 0.01.

40
Q

How do you analyze data in Hypothesis Testing?

A

Collect a representative sample, perform statistical analysis, and calculate a test statistic.

41
Q

What is the test statistic in Hypothesis Testing?

A

The test statistic quantifies the difference between sample data and what’s expected under H0. It varies based on the hypothesis and data (e.g., t-test, z-test, chi-square test).

42
Q

How do you make a decision in Hypothesis Testing?

A

Compare the calculated test statistic with the critical value or region. If the test statistic falls in the critical region, reject H0; otherwise, fail to reject H0.

43
Q

Could you provide an example application of Hypothesis Testing?

A

Certainly! Suppose we compare website visitors before and after a campaign. If the t-statistic falls in the critical region and the p-value is below α, we reject H0 and conclude the campaign had an effect.

44
Q

What should you consider when conducting Hypothesis Testing?

A

Consider sample size, assumptions, and the chosen significance level. A low p-value suggests evidence against H0, but failure to reject H0 doesn’t prove it’s true.

45
Q

What is a p-value?

A

The p-value is a probability value in hypothesis testing. It measures the strength of evidence against the null hypothesis: the probability of obtaining the observed result (or a more extreme one) if the null hypothesis were true.

46
Q

How does the p-value conceptually relate to hypothesis testing?

A

Think of the null hypothesis as a “default assumption” and the p-value as quantifying how unusual the observed data is under this assumption. A small p-value suggests evidence against the null hypothesis.

47
Q

How is the p-value calculated?

A

The calculation depends on the test statistic and specific hypothesis test. It involves finding the probability of obtaining a test statistic as extreme as, or more extreme than, the one from the sample data.

48
Q

How do you interpret the p-value?

A

Interpreting p-values: a small p-value (p < α) indicates strong evidence against the null hypothesis; a large p-value (p ≥ α) suggests weak evidence against it.

49
Q

Could you provide an example scenario involving p-values?

A

Certainly! In a pharmaceutical example, a drug’s effect on blood pressure is tested. If the calculated p-value is 0.03 and α is 0.05, we reject H0 at the 0.05 level and conclude the drug likely affects blood pressure.

50
Q

What should be considered when interpreting p-values?

A

Consider that p-values don’t prove hypotheses true or false. Context, effect size, and domain knowledge are vital. A chosen significance level (α) sets the threshold for decision-making.

51
Q

What’s the significance of the chosen significance level (α)?

A

The significance level (α) determines the threshold for decision-making. A smaller α makes it harder to reject H0. If p < α, we may reject H0.

52
Q

What are some cautions and potential misinterpretations of p-values?

A

Be cautious of misinterpretations: a small p-value doesn’t prove the alternative hypothesis true, and a large p-value doesn’t prove H0 true. A p-value alone doesn’t indicate effect size or practical significance.

53
Q

How do you calculate p-values?

A

Calculating p-values involves determining the probability of obtaining a test statistic as extreme as, or more extreme than, the one calculated from the sample data, assuming the null hypothesis is true.

54
Q

What’s the first step in calculating p-values?

A

Begin by formulating hypotheses: a null hypothesis (H0) and an alternative hypothesis (H1 or Ha).

55
Q

What’s the significance of choosing a significance level (α)?

A

Significance level (α) determines the acceptable risk for making a Type I error. Common choices for α are 0.05 or 0.01.

56
Q

What’s the role of data analysis in calculating p-values?

A

Collect a representative sample and perform appropriate statistical analysis to calculate the test statistic.

57
Q

How is the test statistic calculated for p-value calculation?

A

Calculate the specific test statistic depending on the hypothesis test being conducted. Examples include t-statistic, z-score, F-statistic, or chi-square statistic.

58
Q

What’s the critical region or critical value in p-value calculation?

A

Determine the critical region (range of extreme values) or calculate the critical value(s) based on chosen α and test statistic distribution.

59
Q

How is the p-value itself calculated?

A

Calculate the probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample data under the null hypothesis.

60
Q

How do you interpret the calculated p-value?

A

Compare the calculated p-value to the chosen significance level α. If p < α, you have evidence to reject H0; if p ≥ α, you fail to reject H0.

61
Q

What’s the significance of using statistical software in p-value calculations?

A

For complex tests, statistical software (e.g., R, Python, dedicated software) is commonly used. It simplifies calculations and ensures accuracy.

62
Q

Examples of p-value Calculations

A

One-Sample z-test: Calculate z-score and find area under standard normal curve. Two-Sample t-test: Calculate t-statistic, degrees of freedom, and find p-value from t-distribution. Chi-Square Test: Calculate chi-square test statistic and find p-value from chi-square distribution. ANOVA (Analysis of Variance): Calculate F-statistic and find p-value from F-distribution.

63
Q

What are Degrees of Freedom (df) and why are they important?

A

Degrees of Freedom (df) represent the number of values in a statistic that can vary while satisfying specific constraints. They are crucial in statistical calculations to ensure accurate estimations and meaningful tests.

64
Q

How do Degrees of Freedom relate to statistical calculations?

A

Degrees of Freedom account for the fact that not all values are completely independent due to constraints in the data. They determine the variability present in data when estimating parameters or conducting tests.

65
Q

Can you provide an example scenario involving Degrees of Freedom?

A

Certainly! Consider the calculation of sample variance: s² = Σ(xᵢ − x̄)² / (n − 1), where the divisor n − 1 is the degrees of freedom.

66
Q

How are Degrees of Freedom calculated for sample variance?

A

For sample variance, df is calculated by subtracting 1 from the sample size: df = n − 1.

67
Q

Could you provide a detailed example of Degrees of Freedom calculation?

A

Of course! Let’s calculate df for sample variance using test scores: 85, 90, 88, 92, and 86. In this example, n = 5, so df = 5 − 1 = 4.
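The card's numbers can be checked directly; dividing by df = n − 1 reproduces Python's built-in sample variance:

```python
from statistics import variance

# Test scores from the card's example.
scores = [85, 90, 88, 92, 86]
n = len(scores)
xbar = sum(scores) / n

# Sum of squared deviations divided by the degrees of freedom (n - 1 = 4).
s2 = sum((x - xbar) ** 2 for x in scores) / (n - 1)

print(n - 1, round(s2, 1))         # 4 8.2
print(round(variance(scores), 1))  # 8.2 (stdlib agrees)
```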

68
Q

Why are Degrees of Freedom important in statistical tests?

A

Degrees of Freedom are crucial in statistical tests as they determine critical values and probabilities from distribution tables. They ensure accurate results by accounting for variability and constraints.

69
Q

What considerations are important when dealing with Degrees of Freedom?

A

Choosing the correct df is essential for accurate analysis. The specific context and statistical procedure being used influence the calculation and interpretation of df.

70
Q

What is a One-Sample z-Test, and what is its purpose?

A

A One-Sample z-Test is a hypothesis test comparing a sample mean to a known population mean when the population standard deviation is known. It assesses whether the observed sample mean represents a significant deviation from the population mean.

71
Q

How does the One-Sample z-Test work conceptually?

A

The test compares the sample mean to the population mean, considering the known population standard deviation. It helps determine if the observed difference is statistically meaningful or due to chance.

72
Q

What assumptions are associated with the One-Sample z-Test?

A

The test assumes: 1) Random sample, 2) Known population standard deviation (σ), and 3) Normality of data or large sample size for the central limit theorem.

73
Q

What are the steps involved in conducting a One-Sample z-Test?

A

Steps: 1) Formulate hypotheses, 2) Calculate Test Statistic (z), 3) Determine Critical Region, 4) Calculate p-value, and 5) Make a Decision based on comparison to significance level (α).
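The steps above can be sketched with the standard library; the population parameters and sample values are made up for illustration:

```python
from statistics import NormalDist
import math

# Hypothetical setup: H0 says mu = 50, sigma = 10 is known,
# and a sample of n = 36 yields a sample mean of 53.
mu0, sigma, n, xbar = 50, 10, 36, 53

# Step 2: test statistic.
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Step 4: two-tailed p-value (area in both tails beyond |z|).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: decision at alpha = 0.05.
print(round(z, 2), round(p_value, 4), p_value < 0.05)  # 1.8 0.0719 False
```

Here the p-value exceeds 0.05, so under these assumed numbers we would fail to reject H0.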

74
Q

Could you provide a comprehensive example of the One-Sample z-Test?

A

Certainly! Two examples with detailed calculations and decisions are provided to illustrate the application and interpretation of the One-Sample z-Test.

75
Q

What is the significance and application of the One-Sample z-Test?

A

The test helps assess the significance of sample mean differences from a known population mean, aiding in decisions about process or measurement accuracy and consistency.

76
Q

What cautions and considerations should be noted for the One-Sample z-Test?

A

Considerations include the data assumptions (normality, known σ); when σ is unknown, use the One-Sample t-Test instead.

77
Q

What is a Two-Sample t-Test, and why is it used?

A

A Two-Sample t-Test is a hypothesis test comparing means of two independent samples to determine if the observed difference is statistically significant or could have occurred by chance.

78
Q

How does the Two-Sample t-Test conceptually work?

A

The test compares means of two samples while considering sample variability. It assesses if the observed difference is statistically significant, accounting for inherent variability in the data.

79
Q

What assumptions underlie the Two-Sample t-Test?

A

The assumptions include: 1) Independence of data, 2) Approximate normality in each sample, and 3) Equal variances between populations.

80
Q

Could you outline the steps of conducting a Two-Sample t-Test?

A

Certainly! Steps include: 1) Formulate hypotheses, 2) Calculate Test Statistic (t), 3) Determine Degrees of Freedom (df), 4) Find Critical Region, 5) Calculate p-value, and 6) Make a Decision.
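A sketch of the pooled (equal-variance) version with made-up samples; the critical value 2.228 for df = 10 comes from a standard t table:

```python
from statistics import mean, variance
import math

# Hypothetical task times (minutes) for two independent groups.
group_a = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5]
group_b = [14.2, 13.9, 14.8, 13.5, 14.1, 14.6]

na, nb = len(group_a), len(group_b)

# Pooled variance assumes the two populations share a common variance.
sp2 = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
t = (mean(group_a) - mean(group_b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
df = na + nb - 2

t_crit = 2.228  # two-tailed 5% critical value for df = 10
print(df, abs(t) > t_crit)  # 10 True
```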

81
Q

Could you provide comprehensive examples of the Two-Sample t-Test?

A

Certainly! Two examples with detailed calculations and decisions illustrate the application and interpretation of the Two-Sample t-Test.

82
Q

What is the significance and application of the Two-Sample t-Test?

A

The test is crucial in various fields to compare group means, aiding decisions about observed differences’ statistical significance.

83
Q

What cautions and considerations should be noted for the Two-Sample t-Test?

A

Pay attention to assumptions (independence, normality, equal variances), and consider Welch’s t-test for unequal variances and small sample sizes.

84
Q

What is the Chi-Square Test, and why is it used?

A

The Chi-Square Test is a statistical hypothesis test that examines whether there is a significant association between categorical variables. It helps determine if observed differences in frequencies are statistically significant or likely due to chance.

85
Q

How does the Chi-Square Test conceptually work?

A

The test assesses whether the observed frequencies in a contingency table significantly differ from the expected frequencies under the assumption of independence. It helps determine if there is a meaningful relationship between categorical variables.

86
Q

What assumptions are associated with the Chi-Square Test?

A

The assumptions include: 1) Random Sampling: The data is collected using a random sampling technique. 2) Categorical Variables: The variables being studied are categorical (nominal or ordinal). 3) Expected Frequencies: The expected frequency in each cell of the contingency table is at least 5.

87
Q

Could you outline the steps involved in conducting a Chi-Square Test?

A

Certainly! Steps include: 1) Formulate hypotheses, 2) Set significance level (α), 3) Create contingency table, 4) Calculate expected frequencies, 5) Calculate test statistic (χ2), 6) Determine degrees of freedom (df), 7) Determine critical value or calculate p-value, and 8) Make a decision based on critical value or p-value.
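Steps 3 through 8 can be sketched for a hypothetical 2×2 table (smoking by disease), computing expected frequencies and the chi-square statistic by hand:

```python
# Hypothetical observed counts: rows = smoker / non-smoker,
# columns = disease / no disease.
observed = [[30, 70],
            [20, 130]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected frequency under independence: row total * column total / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2_crit = 3.841  # 5% critical value for df = 1, from a chi-square table

print(df, round(chi2, 2), chi2 > chi2_crit)  # 1 10.42 True
```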

88
Q

Could you provide comprehensive examples of the Chi-Square Test?

A

Certainly! Two examples with detailed calculations and decisions illustrate the application and interpretation of the Chi-Square Test.

89
Q

What is the significance and application of the Chi-Square Test?

A

The Chi-Square Test is essential for analyzing associations between categorical variables, helping researchers draw conclusions about significant relationships.

90
Q

What cautions and considerations should be noted for the Chi-Square Test?

A

Assumptions (random sampling, expected frequencies) should be met for valid results. Fisher’s Exact Test is an option for small expected frequencies.

91
Q

What is ANOVA (Analysis of Variance), and why is it used?

A

ANOVA is a statistical technique used to compare means of two or more groups to determine if there is a significant difference among them. It assesses variability within and between groups to draw conclusions about population means.

92
Q

How does ANOVA conceptually work?

A

ANOVA evaluates whether the observed variability between group means is statistically significant compared to variability within each group. It helps determine if group differences are likely due to a real effect or if they could have occurred by chance.

93
Q

What assumptions are associated with ANOVA?

A

The assumptions include: 1) Independence: Data in each group are independent. 2) Normality: Data in each group is approximately normally distributed. 3) Homoscedasticity: Groups have equal variances.

94
Q

Could you outline the steps involved in conducting ANOVA?

A

Certainly! Steps include: 1) Formulate hypotheses, 2) Set significance level (α), 3) Calculate the grand mean (X̄), 4) Calculate sums of squares, 5) Calculate degrees of freedom, 6) Calculate mean squares, 7) Calculate the F-statistic, 8) Determine critical value or calculate p-value, and 9) Make a decision based on the critical value or p-value.
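The sums of squares, mean squares, and F-statistic from the steps above, sketched for three small hypothetical groups:

```python
from statistics import mean

# Hypothetical scores for three groups (e.g., three teaching methods).
groups = [[80, 85, 88, 90],
          [75, 78, 80, 82],
          [88, 90, 92, 95]]

grand_mean = mean(x for g in groups for x in g)
n_total = sum(len(g) for g in groups)
k = len(groups)

# Step 4: between-group and within-group sums of squares.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# Steps 5-7: degrees of freedom, mean squares, F-statistic.
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within

print(k - 1, n_total - k, round(f_stat, 2))  # 2 9 12.82
```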

95
Q

Could you provide comprehensive examples of ANOVA?

A

Certainly! Two examples with detailed calculations and decisions illustrate the application and interpretation of ANOVA.

96
Q

What is the significance and application of ANOVA?

A

ANOVA is crucial for comparing means of multiple groups, helping researchers determine if observed differences are statistically significant.

97
Q

What cautions and considerations should be noted for ANOVA?

A

Assumptions (independence, normality, homoscedasticity) should be met for valid results. If assumptions are violated, consider non-parametric alternatives.