Collecting Data 1 Flashcards

Question

Normal Distribution

Answer 1

* Continuous Distribution
* The normal distribution is also known as the Gaussian distribution.
* It is a symmetrical, bell-shaped curve with the highest frequency (probability density) around the center value.
* The frequency decreases sharply as values move away from the center value on either side.
* In other words, values are distributed symmetrically and lie mostly around the mean.

Answer 2

* Continuous Distribution
* A continuous random variable x follows a lognormal distribution if its natural logarithm, ln(x), follows a normal distribution.
* When you sum random variables, the distribution of the sum approaches a normal distribution as the sample size increases, regardless of the distribution of the individuals. The same reasoning applies to multiplication: the product of many positive random variables approaches a lognormal distribution, because the logarithm of a product is a sum of logarithms.
* The location parameter is the mean of the data set after taking the logarithm, and the scale parameter is the standard deviation of the data set after the same transformation.
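The link between the lognormal parameters and the log-transformed data can be checked with a small simulation. This is a minimal sketch using only the Python standard library; the values of `mu` and `sigma` are arbitrary assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed (illustrative) location and scale parameters.
mu, sigma = 1.0, 0.5

# Draw lognormal samples; by definition ln(x) should be Normal(mu, sigma).
samples = [random.lognormvariate(mu, sigma) for _ in range(20_000)]
logs = [math.log(x) for x in samples]

# Mean and standard deviation of the log-transformed data
# estimate the location and scale parameters.
log_mean = statistics.fmean(logs)
log_stdev = statistics.stdev(logs)

print(f"mean of ln(x):  {log_mean:.3f}")
print(f"stdev of ln(x): {log_stdev:.3f}")
```

The printed values should land close to the assumed `mu` and `sigma`, which is what the location and scale bullets describe.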

Answer 3

* Continuous Distribution
* The F distribution is used extensively to test for equality of variances from two normal populations.
* The F distribution is asymmetric, with a minimum value of 0 but no maximum value.
* Notably, the curve approaches zero but never quite touches the horizontal axis.
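The properties listed above can be seen by simulating the ratio of two sample variances, which is the statistic the F distribution describes. This is only a sketch under assumed settings: two normal populations with equal variance and 10 observations each (the helper name `variance_ratio` is hypothetical).

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is reproducible

def variance_ratio(n1, n2, sigma=1.0):
    """Ratio of two sample variances from normal populations with equal variance."""
    a = [random.gauss(0.0, sigma) for _ in range(n1)]
    b = [random.gauss(0.0, sigma) for _ in range(n2)]
    return statistics.variance(a) / statistics.variance(b)

# Each ratio is one draw from an F distribution with (9, 9) degrees of freedom.
ratios = [variance_ratio(10, 10) for _ in range(10_000)]

ratio_min = min(ratios)
ratio_median = statistics.median(ratios)
ratio_mean = statistics.fmean(ratios)

print(f"min:    {ratio_min:.4f}")
print(f"median: {ratio_median:.4f}")
print(f"mean:   {ratio_mean:.4f}")
```

The ratios never drop below 0, and the mean sits above the median, which reflects the long right tail of the asymmetric curve.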

Answer 4

* Continuous Distribution
* The chi-square distribution results when independent variables with a standard normal distribution are squared and summed.
* A chi-square distribution is a continuous distribution characterized by its degrees of freedom. It is used to describe **the distribution of a sum of squared random variables**.
* Ex: if Z₁, Z₂, …, Z_n are independent standard normal random variables, then y = Z₁² + Z₂² + Z₃² + Z₄² + … + Z_n² follows a chi-square distribution with n degrees of freedom.
* The chi-square distribution is asymmetrical (skewed to the right), bounded below by zero, and approaches the shape of the normal distribution as the degrees of freedom increase.
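The sum-of-squares construction above can be sketched directly. The example below assumes 4 degrees of freedom and uses only the standard library; the helper name `chi_square_draw` is hypothetical. For a chi-square variable with k degrees of freedom, the mean is k and the variance is 2k, so the simulation should land near those values.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def chi_square_draw(df):
    """One chi-square draw: the sum of df squared standard normal variables."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))

k = 4  # assumed degrees of freedom for this illustration
draws = [chi_square_draw(k) for _ in range(20_000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
sample_median = statistics.median(draws)

print(f"mean:     {sample_mean:.3f} (theory: {k})")
print(f"variance: {sample_var:.3f} (theory: {2 * k})")
print(f"median:   {sample_median:.3f}")
```

The median falling below the mean is a sign of the right skew, and every draw is non-negative because it is a sum of squares.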

Answer 5

* Continuous Distribution
* The exponential distribution is one of the most widely used continuous probability distributions. It is often used to model items with a constant failure rate.
* Closely related to the Poisson distribution.
* Has a constant failure rate, and its curve always has the same shape; only the rate changes.
* Example: the lifetime of a bulb, or the time between fires in a city.
* The exponential distribution is defined as **the probability distribution of the time \*between\* the events in a Poisson process**. The waiting time until the next event exceeds t exactly when not a single event has happened during that period. In other words, P(T > t) equals the Poisson probability P(X = 0) for that period.
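The connection between the waiting time and the Poisson probability of zero events can be checked numerically. This is a minimal sketch; the rate of 0.5 events per unit time and the period of 3 time units are assumed values, and `poisson_pmf` is a hypothetical helper written for this example.

```python
import math
import random

random.seed(2)  # fixed seed so the sketch is reproducible

rate = 0.5  # assumed events per unit time
t = 3.0     # assumed length of the waiting period

def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

# Poisson probability of zero events during a period of length t.
p_no_event = poisson_pmf(0, rate * t)

# Exponential survival function: probability the waiting time exceeds t.
survival = math.exp(-rate * t)

# Empirical check with simulated exponential waiting times.
waits = [random.expovariate(rate) for _ in range(50_000)]
empirical = sum(w > t for w in waits) / len(waits)

print(f"Poisson P(X = 0):        {p_no_event:.4f}")
print(f"Exponential P(T > t):    {survival:.4f}")
print(f"Simulated share of T > t: {empirical:.4f}")
```

All three numbers should agree closely, since they describe the same event from two points of view.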

Answer 6

* Continuous Distribution
* The t distribution, or Student's t distribution, is a bell-shaped probability distribution that is symmetrical about its mean.
* Commonly used for hypothesis testing and constructing confidence intervals for means.
* Used in place of the normal distribution when the population standard deviation is unknown.
* As with the normal distribution, when the random variables are averages, the distribution of the average tends to be normal, regardless of the distribution of the individuals.
* The t distribution is used **to estimate population parameters when the sample size is small and/or when the population variance is unknown**.
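Why the t distribution replaces the normal distribution for small samples can be illustrated with a simulation. The sketch below assumes samples of size 5 drawn from a standard normal population and computes the t statistic with the sample standard deviation; the helper name `t_statistic` is hypothetical. If the statistic were normally distributed, about 5% of values would fall beyond ±1.96.

```python
import math
import random
import statistics

random.seed(5)  # fixed seed so the sketch is reproducible

def t_statistic(n):
    """t statistic for a sample of size n from a population with true mean 0."""
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return statistics.fmean(sample) / standard_error

n = 5            # assumed small sample size
trials = 20_000
t_values = [t_statistic(n) for _ in range(trials)]

# Share of t statistics beyond the normal 95% cutoff of 1.96.
tail_fraction = sum(abs(t) > 1.96 for t in t_values) / trials

print(f"share of |t| > 1.96: {tail_fraction:.3f}")
```

The share comes out well above 5%, which shows the heavier tails that arise when the standard deviation is estimated from a small sample.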

Answer 7

* Continuous Distribution
* The basic purpose of the Weibull distribution is to model time-to-failure data.
* Widely used in reliability, medical research, and statistical applications.
* Assumes many shapes depending on the shape, scale, and location parameters.

Effect of the shape parameter β on the Weibull distribution:

* If β is 1, it becomes identical to the [exponential distribution](https://sixsigmastudyguide.com/exponential-distribution/).
* If β is 2, it becomes the Rayleigh distribution.
* If β is between 3 and 4, it approximates the [Normal distribution](https://sixsigmastudyguide.com/normal-probability-plot/).
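The first two special cases of the shape parameter can be verified by comparing density functions. This is a sketch of the two-parameter Weibull density (location fixed at 0, an assumption of this example); the function names and the evaluation points are chosen only for illustration.

```python
import math

def weibull_pdf(x, shape, scale=1.0):
    """Two-parameter Weibull density; this sketch only covers x > 0."""
    if x <= 0:
        return 0.0  # simplification: boundary behaviour at x = 0 is ignored
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def exponential_pdf(x, mean):
    """Exponential density with the given mean (rate = 1 / mean)."""
    return math.exp(-x / mean) / mean

def rayleigh_pdf(x, sigma):
    """Rayleigh density with scale parameter sigma."""
    return (x / sigma ** 2) * math.exp(-(x ** 2) / (2 * sigma ** 2))

x = 2.0

# Shape 1: Weibull with scale 3 matches an exponential with mean 3.
print(weibull_pdf(x, shape=1.0, scale=3.0), exponential_pdf(x, mean=3.0))

# Shape 2: Weibull with scale sigma * sqrt(2) matches a Rayleigh with sigma.
sigma = 1.5
print(weibull_pdf(x, shape=2.0, scale=sigma * math.sqrt(2)), rayleigh_pdf(x, sigma))
```

Each printed pair should match to floating-point precision. The normal case for β between 3 and 4 is only an approximation, so it is not checked here.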

Answer 8

* Continuous Distribution
* The basic purpose of the Weibull distribution is to model time-to-failure data.
* Widely used in reliability, medical research, and statistical applications.
* Assumes many shapes depending on the shape, scale, and location parameters.

Effect of the shape parameter β on the Weibull distribution:

* If β is 1, it becomes identical to the [exponential distribution](https://sixsigmastudyguide.com/exponential-distribution/).
* If β is 2, it becomes the Rayleigh distribution.
* If β is between 3 and 4, it approximates the [Normal distribution](https://sixsigmastudyguide.com/normal-probability-plot/).

Answer 9

A hypothesis test generally assumes that the data are a sample from a certain distribution, commonly the normal distribution. That is not always the case: the data may not follow a normal distribution. Hence, [nonparametric tests](https://sixsigmastudyguide.com/non-parametric/) are used when no specific distribution is assumed for the population. In particular, nonparametric test results are more robust against violations of the assumptions. Types of nonparametric tests include the [Sign test](https://sixsigmastudyguide.com/1-sample-sign-non-parametric-hypothesis-test/), [**Mood’s Median Test (for two samples)**](https://sixsigmastudyguide.com/moods-median-non-parametric-hypothesis-test/), [**Mann-Whitney Test for Independent Samples**](https://sixsigmastudyguide.com/mann-whitney-non-parametric-hypothesis-test/), [**Wilcoxon Signed-Rank Test for a Single Sample**](https://sixsigmastudyguide.com/1-sample-wilcoxon-non-parametric-hypothesis-test/), and [**Wilcoxon Signed-Rank Test for Paired Samples**](https://sixsigmastudyguide.com/1-sample-wilcoxon-non-parametric-hypothesis-test/).

Answer 10

* Continuous distributions (such as normal, chi-square, exponential) and discrete distributions (such as binomial, geometric) are probability distributions of one random variable.
* A bivariate distribution, by contrast, gives the probability of events involving two random variables. It may be a continuous or a discrete distribution.
* A bivariate distribution is unique because it is the joint distribution of two variables.

[**Bi-modal**](https://sixsigmastudyguide.com/bimodal-distribution/):

* A bi-modal distribution has two modes, in other words two outcomes that are more likely than the other outcomes in their region.
* It can result from 2 sources of data coming into a single process screen.

Answer 11

A distribution is said to be skewed to the right if it has a long tail that trails toward the right side. The skewness value of a positively skewed distribution is greater than zero.

**Example:** Income details of manufacturing employees in Chicago indicate that the majority earn between $20K and $50K per annum. Very few earn less than $10K, and very few earn $100K. The center value is $50K. The graph clearly shows a long tail on the right side of the center value. Because the tail is on the positive side of the center value, the distribution is positively skewed. Unlike a symmetric distribution, it is not equally distributed on both sides of the center value. The graph also shows that the mean is the highest value, followed by the median and then the mode. Because the distribution is skewed toward the right, the mean is pulled toward the right and is greater than the median. The mode, the value with the highest frequency, lies to the left of the median. Hence, **mode \< median \< mean.**
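The ordering of mode, median, and mean can be checked on a small data set. The incomes below are made-up values (in $K) chosen only to produce a long right tail; they are not the Chicago data from the example. The `skewness` helper is a hypothetical implementation of the moment-based skewness coefficient.

```python
import statistics

# Hypothetical incomes in $K with a long right tail (illustrative data only).
incomes_k = [20, 25, 25, 30, 30, 30, 35, 40, 45, 50, 60, 80, 120]

def skewness(data):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    mean = statistics.fmean(data)
    n = len(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

mode = statistics.mode(incomes_k)
median = statistics.median(incomes_k)
mean = statistics.fmean(incomes_k)
skew = skewness(incomes_k)

print(f"mode = {mode}, median = {median}, mean = {mean:.2f}")
print(f"skewness = {skew:.3f}")
```

For this data the skewness is positive and the mode, median, and mean appear in increasing order. Negating every value would flip the sign of the skewness and reverse the ordering, which is the left-skewed case.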

Answer 12

Generally, a symmetrical distribution appears as a bell curve. A perfect normal distribution is a probability distribution with zero skewness. However, a perfect normal distribution is practically impossible in the real world, so the skewness is not exactly zero; it is almost zero. A symmetrical distribution occurs when the mean, median, and mode occur at the same point and the values occur at matching frequencies on either side. Both sides of the mean match and mirror each other.

**Example:** The weights of high school students are reported between 80 lb and 100 lb, while the majority of students weigh around 90 lb. The weights are equally distributed on both sides of 90 lb, which is the center value. This type of distribution is called a [Normal Distribution](https://sixsigmastudyguide.com/normal-distribution-aka-gaussian-distribution/).

Answer 13

A distribution is said to be skewed to the left if it has a long tail that trails toward the left side. The skewness value of a negatively skewed distribution is less than zero.

**Example:** A professor collected students’ marks in a science subject. The majority of students scored between 50 and 80, while the center value is 50 marks. The long tail is on the left side of the center value because the distribution is skewed toward the left-hand side. So the data follow a negatively skewed distribution. Here, **mean \< median \< mode**.

Answer 14

There are different methods to [test the normality](https://sixsigmastudyguide.com/how-do-you-know-if-a-process-is-normal/) of data, including visual (graphical) methods and quantifiable (numerical) methods.

**Visual method:** Visual inspection may be used to assess the normality of a data distribution, although this approach is unreliable and does not guarantee that the distribution is normal. Still, visual methods help the user judge normality to some extent. Ex: histogram, boxplot, stem-and-leaf plot, probability-probability plot, and quantile-quantile plot.

**Quantifiable method:** Quantifiable methods are supplementary to the visual methods. In particular, these tests compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation. Ex: Anderson-Darling test, Shapiro-Wilk W test, Kolmogorov-Smirnov test, etc.
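The idea behind the quantifiable methods, comparing the sample to a normal distribution with the same mean and standard deviation, can be sketched with a Kolmogorov-Smirnov style distance. This is a simplified illustration using only the standard library: it computes the distance statistic but no p-value, because a valid p-value with estimated parameters needs corrected critical values that are not covered here. The function name and the sample data are assumptions of this example.

```python
import random
import statistics
from statistics import NormalDist

def ks_statistic_vs_normal(data):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    xs = sorted(data)
    n = len(xs)
    reference = NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    largest_gap = 0.0
    for i, x in enumerate(xs):
        cdf = reference.cdf(x)
        # Compare against the empirical CDF just after and just before x.
        largest_gap = max(largest_gap, (i + 1) / n - cdf, cdf - i / n)
    return largest_gap

random.seed(3)  # fixed seed so the sketch is reproducible

normal_sample = [random.gauss(50, 5) for _ in range(500)]
skewed_sample = [random.expovariate(1 / 50) for _ in range(500)]

d_normal = ks_statistic_vs_normal(normal_sample)
d_skewed = ks_statistic_vs_normal(skewed_sample)

print(f"distance for normal data: {d_normal:.3f}")
print(f"distance for skewed data: {d_skewed:.3f}")
```

A small distance means the sample tracks the fitted normal curve closely; the skewed sample should show a clearly larger distance. In practice a statistics package would be used to obtain a proper test result.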

Answer 15

**How to Make a Process Follow a Normal Distribution by Using Transforms**

Sometimes you will analyze a process and the data will come out in a non-normal shape. Since [normal distributions](https://sixsigmastudyguide.com/normal-distribution-aka-gaussian-distribution/) have wonderful mathematical properties that make analysis and control so much easier, try to transform the data to a normal distribution if possible. The approach to addressing a non-normal distribution is to apply a transformation that “normalizes” the data. Some typical data transformation methods are [Box Cox](https://sixsigmastudyguide.com/box-cox-transformation/), log transformation, square root or power transformation, exponential, and reciprocal.

[**Box Cox transform**](https://sixsigmastudyguide.com/box-cox-transformation/)

* A Box Cox transformation is a useful power transformation technique for transforming non-normal dependent variables into a normal shape.
* George Box and Sir D. R. Cox are the authors of this method.
* The applicable formula is y′ = y^λ, where λ is the power (parameter) used to transform the data.
* For instance, if λ = 2 the data is squared, and if λ = 0.5 a square root is taken.

[**Z transform**](https://sixsigmastudyguide.com/z-scores-z-table-z-transformations/)

* The Z transform is an analysis tool in signal processing.
* It is a generalization of the Discrete-Time Fourier Transform (DTFT). In particular, it applies to signals for which the DTFT doesn’t exist, making it possible to analyze those signals.
* It also gives insight into a system with respect to stability and causality.
* The Z transform is the discrete-time counterpart of the Laplace transform.
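The Box Cox power transformation described above can be sketched in a few lines. Note one assumption that goes beyond the card: this example uses the commonly cited normalized form, (y^λ − 1)/λ for λ ≠ 0 and ln(y) for λ = 0, rather than the plain power y^λ. The function name `box_cox` and the sample values are hypothetical.

```python
import math

def box_cox(y, lam):
    """Box Cox transform of a single positive value y with power parameter lam.

    Assumed normalized form: (y**lam - 1) / lam, with ln(y) when lam is 0.
    """
    if y <= 0:
        raise ValueError("Box Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

# Illustrative right-skewed values (made up for this sketch).
data = [1.0, 2.0, 4.0, 8.0, 16.0, 64.0]

for lam in (2, 0.5, 0):
    transformed = [box_cox(y, lam) for y in data]
    print(f"lambda = {lam}: {[round(v, 3) for v in transformed]}")
```

With λ = 0.5 or λ = 0 the large values are pulled in much more than the small ones, which is how the transform reduces a long right tail. Choosing the best λ for a data set is a separate estimation step that this sketch does not attempt.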

(39 cards)