Six Sigma Statistics and Graphical Presentation Flashcards
By Ron Crabtree (39 cards)
Sample
Subset of the overall population.
Make sure samples are representative of the population.
Three most common descriptive (characteristic) statistics
Mean (arithmetic average)
Standard deviation
Variance
What are the symbols for the three most common data characteristics for both population parameters and sample statistics?
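MEAN
Population parameter: μ (mu)
Sample statistic: x̄ (x-bar)
STANDARD DEVIATION
Population parameter: σ (sigma)
Sample statistic: s
VARIANCE
Population parameter: σ² (sigma squared)
Sample statistic: s² (s squared)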
Descriptive statistics
Used to describe the process itself
One of the most common tools is the histogram, which shows the variation and centering of the data.
Inferential statistics
Making inferences about the population from your sample
It’s possible to learn meaningful information with as few as 30 measurements.
Compare descriptive vs. inferential statistics
DESCRIPTIVE
Approach: More inductive (induce information)
Goal: Summarize the data to make decisions
Tools/Techniques: Histograms, interrelationship diagrams, process maps, fishbone diagrams
Interpretations: Fairly straightforward; not as difficult to create
INFERENTIAL
Approach: Deductive (deduce information)
Goal: Infer population characteristics to predict future outcomes
Tools/Techniques: More advanced/complex: chi-squared, binomial, and Poisson distributions, hypothesis testing, confidence intervals, correlation, regression analysis
Interpretations: Complex
Normal distribution
Symmetric about the mean, with most of the values in the data set close to the average. It also allows for easy inference.
AKA The bell-shaped curve.
The 68-95-99.7 percent rule:
± one standard deviation: 68.26%
± two standard deviations: 95.44%
± three standard deviations: 99.73%
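The percentages in the rule can be checked directly from the standard normal distribution. This sketch uses Python's standard-library `statistics.NormalDist`; `coverage_within` is a hypothetical helper name. The exact values round to 68.27% and 95.45%; the textbook figures 68.26% and 95.44% come from doubling four-digit table values (0.3413 and 0.4772).

```python
from statistics import NormalDist

def coverage_within(k: float) -> float:
    """Fraction of a normal distribution lying within +/- k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mu = 0, sigma = 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"+/- {k} sigma: {coverage_within(k):.2%}")
```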
What are the basic tenets of the central limit theorem?
Basic tenets:
The sampling distribution of the mean approaches a normal distribution as the sample size increases.
As you increase the number of samples, the plot of their means gets closer to a perfect bell curve:
- 100 samples: a curve appears
- 500 samples: a peak appears
- 1,000 samples: a normal distribution appears
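Effect of sample size on the central limit theorem (n = the size of each sample used to compute a sample mean):
- n = 4: for many populations, the sampling distribution is already near normal
- n = 30: usually enough for the sampling distribution to be approximately normal, even when the population is not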
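A small simulation can make the sample-size effect visible. This is a sketch under stated assumptions: the population is an exponential distribution (mean 1, standard deviation 1, strongly right-skewed), chosen only because it is clearly non-normal, and `sample_means` and `skewness` are hypothetical helper names. As n grows, the spread of the sample means shrinks toward σ/√n and their skewness falls toward 0 (the normal distribution's value); for this particular population the theoretical skewness of the mean is 2/√n.

```python
import random
import statistics

def sample_means(num_samples: int, n: int, seed: int = 1) -> list[float]:
    """Draw num_samples samples of size n from a skewed population
    (exponential, mean 1, standard deviation 1) and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(num_samples)]

def skewness(data: list[float]) -> float:
    """Skewness of the data: 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean(((x - m) / s) ** 3 for x in data)

for n in (1, 4, 30):
    means = sample_means(num_samples=2000, n=n)
    print(f"n={n:2d}  mean={statistics.fmean(means):.3f}  "
          f"std={statistics.stdev(means):.3f} (theory {1 / n ** 0.5:.3f})  "
          f"skew={skewness(means):.2f} (theory {2 / n ** 0.5:.2f})")
```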
Basic tenets of confidence intervals
Used to state, with a given level of confidence (e.g., 95%), that the population mean falls within a certain range
- Collect data for a sample
- Calculate the mean and standard deviation of the sample
- Then make the inference
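The three steps above can be sketched as follows. Assumptions: the interval uses the normal (z) critical value, which is a reasonable approximation for roughly 30 or more measurements; small samples would normally use a t critical value instead. `mean_confidence_interval` is a hypothetical helper and the data are made up.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(data: list[float], confidence: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for the population mean (z-based, assumes n >= ~30)."""
    n = len(data)
    x_bar = statistics.fmean(data)                   # step 2: sample mean
    s = statistics.stdev(data)                       # step 2: sample std dev (n - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * s / math.sqrt(n)                    # step 3: margin of error
    return x_bar - margin, x_bar + margin

sample = [9.0, 11.0] * 15   # hypothetical: 30 measurements (step 1)
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: {low:.3f} to {high:.3f}")
```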
Hypothesis testing
Tests a null hypothesis: a statement about a state of nature whose true outcome you do not know.
H-naught (H0): typically states that two values are equal, or that one is greater than or equal to (or less than or equal to) the other
H-sub-a (Ha): the alternative to the null hypothesis
Use data to infer the true state of the population
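One common instance of this idea is a two-sided one-sample z-test. The sketch below assumes the population standard deviation is known (if it is not, a t-test is the usual choice); `one_sample_z_test` and the data are hypothetical. H0 is "the population mean equals mu0" and Ha is "it does not"; H0 is rejected when the p-value falls below the chosen significance level alpha.

```python
import math
import statistics
from statistics import NormalDist

def one_sample_z_test(data: list[float], mu0: float, sigma: float,
                      alpha: float = 0.05) -> tuple[float, float, bool]:
    """Two-sided one-sample z-test.
    H0: population mean = mu0    Ha: population mean != mu0
    Assumes the population standard deviation sigma is known."""
    n = len(data)
    x_bar = statistics.fmean(data)
    z = (x_bar - mu0) / (sigma / math.sqrt(n))     # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # chance of a result this extreme if H0 is true
    return z, p_value, p_value < alpha             # True means "reject H0"

z, p, reject = one_sample_z_test([10.5] * 25, mu0=10.0, sigma=1.0)  # hypothetical data
print(f"z={z:.2f}  p={p:.4f}  reject H0: {reject}")
```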
Control chart
Typically plots data pulled at a consistent rate, often pooled into subgroups (samples)
X axis: the order of the plotted values (e.g., 20 values); each value might be the mean of 5 parts pulled every hour
Center line (mean of the process)
UCL (upper control limit)
LCL (lower control limit)
Used to infer information about the entire population (the process)
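The card does not say how the control limits are calculated. As an assumption, this sketch uses the common X-bar and R method: center line = grand mean of the subgroup means, limits = center ± A2 × average range, with A2 = 0.577 being the tabulated constant for subgroups of 5 (matching the "5 parts every hour" example). `xbar_r_limits` and the data are hypothetical.

```python
import statistics

# Assumption: X-bar & R method; A2 = 0.577 is the tabulated constant for subgroups of 5.
A2_N5 = 0.577

def xbar_r_limits(subgroups: list[list[float]]) -> tuple[float, float, float]:
    """Return (LCL, center line, UCL) for an X-bar chart of size-5 subgroups."""
    if any(len(g) != 5 for g in subgroups):
        raise ValueError("A2 = 0.577 applies only to subgroups of size 5")
    means = [statistics.fmean(g) for g in subgroups]   # one plotted point per subgroup
    ranges = [max(g) - min(g) for g in subgroups]      # within-subgroup range
    center = statistics.fmean(means)                   # grand mean = center line
    r_bar = statistics.fmean(ranges)                   # average range
    return center - A2_N5 * r_bar, center, center + A2_N5 * r_bar

# Hypothetical data: 5 parts measured each hour for 4 hours
hourly = [[10.1, 9.9, 10.0, 10.2, 9.8],
          [10.0, 10.1, 9.9, 10.0, 10.1],
          [9.8, 10.2, 10.0, 9.9, 10.1],
          [10.3, 10.1, 10.2, 10.0, 10.4]]
lcl, cl, ucl = xbar_r_limits(hourly)
for hour, g in enumerate(hourly, start=1):
    m = statistics.fmean(g)
    print(f"hour {hour}: mean={m:.3f}", "" if lcl <= m <= ucl else "OUT OF CONTROL")
```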
Measures of central tendency
Whether the center of your process falls close to your target; i.e., how well your process is centered
Measures of dispersion
How much variation there is within your process
What performance does six sigma aim for?
On target performance
As little variation as possible
Do you need to look at both central tendency & dispersion?
Yes. Need to look at BOTH measures to understand your data
The three measures of central tendency
Mean - arithmetic average
x̄ = (sum of values)/n
Median - middle data point based on ordering
Position of the median in the ordered data = (n+1)/2
Ex. n = 7: use the 4th data point. For even n, average the two middle data points.
Mode - most frequently occurring value
A data set can have more than one mode; with two modes, the distribution is bimodal.
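The three measures above can be computed in a few lines. This is a sketch with a hypothetical helper name (`central_tendency`); it uses the standard-library `statistics.multimode`, which returns every most-frequent value, so a list of two values signals a bimodal data set.

```python
import statistics

def central_tendency(data: list[float]) -> tuple[float, float, list[float]]:
    """Return (mean, median, modes) for the data."""
    ordered = sorted(data)
    n = len(ordered)
    mean = sum(ordered) / n                   # x-bar = (sum of values) / n
    mid = n // 2
    # Median position is (n + 1) / 2 in the ordered data; for even n, average the two middle values.
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    modes = statistics.multimode(ordered)     # two or more values -> multimodal (two = bimodal)
    return mean, median, modes

print(central_tendency([3, 1, 4, 1, 5, 9, 2]))   # n = 7, so the median is the 4th ordered value
print(central_tendency([1, 2, 2, 3, 3, 4]))      # bimodal example
```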
The three measures of dispersion
RANGE
= Maximum value - minimum value
Lets us ask: Which process is more tightly grouped around the mean?
STANDARD DEVIATION
Gives a little more information about how much each data point varies from the mean
AKA sigma (σ) values
Calculation: s = square root of [ Σ(Xi - x̄)² / (n - 1) ]
Xi = a score in the distribution
The smaller the number, the less variation.
VARIANCE
The average of the squared differences from the mean.
Central to Six Sigma projects: the goal is to reduce variation around the mean.
Because the square root is not taken, variance is expressed in squared units of the data rather than the data's own units.
Calculation: same as standard deviation but WITHOUT the square root
s² = Σ(Xi - x̄)² / (n - 1)
Xi = a score in the distribution
The smaller the number, the less variation.
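The three dispersion formulas translate directly into code. This sketch (hypothetical helper `dispersion`) follows the cards' sample formulas with the n - 1 denominator; its results match the standard-library `statistics.stdev` and `statistics.variance`.

```python
import math

def dispersion(data: list[float]) -> tuple[float, float, float]:
    """Return (range, sample standard deviation s, sample variance s^2)."""
    n = len(data)
    x_bar = sum(data) / n
    data_range = max(data) - min(data)                # maximum value - minimum value
    sum_sq = sum((x - x_bar) ** 2 for x in data)      # sum of (Xi - x-bar)^2
    variance = sum_sq / (n - 1)                       # s^2: no square root
    std_dev = math.sqrt(variance)                     # s: square root of the variance
    return data_range, std_dev, variance

print(dispersion([2, 4, 4, 4, 5, 5, 7, 9]))   # hypothetical data
```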
Frequency distribution table
TWO COLUMNS
1st - Classes
2nd - Frequency of classes in data
(Optional 3rd) - Frequency as percentage
Usually collected with a check sheet
Example: determining how often a park is visited over time by different classes of visitor.
Histograms are the most common illustrations of frequency distributions; a pie chart is another option.
How to make a frequency distribution table
- Organize the data into class intervals (e.g., 0-9, 10-19, etc.)
- Remember, intervals must be mutually exclusive (a value cannot fall into two classes)
- Record the data in the tally column
- Calculate the frequency of each class (and, optionally, its percentage)
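The steps above can be sketched for integer data grouped into classes such as 0-9, 10-19, and so on. Assumptions: the values are non-negative integers, classes have equal width, and empty classes between the smallest and largest are kept (so a histogram has no gaps). `frequency_table` and the data are hypothetical.

```python
from collections import Counter

def frequency_table(values: list[int], width: int = 10) -> list[tuple[str, int, float]]:
    """Tally non-negative integers into equal-width, mutually exclusive classes
    (0-9, 10-19, ... when width=10) and return (class, frequency, percent) rows."""
    tally = Counter((v // width) * width for v in values)   # key = lower bound of the value's class
    total = len(values)
    rows = []
    for lower in range(min(tally), max(tally) + width, width):   # include empty classes
        freq = tally[lower]   # Counter returns 0 for classes with no data
        rows.append((f"{lower}-{lower + width - 1}", freq, 100 * freq / total))
    return rows

# Hypothetical data: minutes visitors spent at a park
for row in frequency_table([3, 7, 12, 15, 18, 25, 41]):
    print(row)
```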
Tips for class intervals
Class intervals should be based on the number of data points.
- Fewer than 100 data points: 5-10 classes
- More than 100 data points: 10-25 classes
- OR number of classes = square root of the number of data points
To determine class interval
- Class interval = range / number of classes = (maximum value - minimum value) / number of classes
- Make sure classes are mutually exclusive
- Include all data points
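The class-interval rules above can be combined as follows. This sketch picks the square-root rule for the number of classes (one of the options on the card) and makes classes half-open, [lower, upper), so they are mutually exclusive; the maximum value is placed in the last class so every data point is included. `build_classes` is a hypothetical helper.

```python
import math

def build_classes(data: list[float]) -> tuple[float, list[tuple[float, float, int]]]:
    """Return (class width, [(lower, upper, count), ...]) using the square-root rule."""
    k = max(1, round(math.sqrt(len(data))))   # number of classes ~ sqrt(number of data points)
    lo, hi = min(data), max(data)
    width = (hi - lo) / k                     # class interval = (max - min) / number of classes
    counts = [0] * k
    for x in data:
        # Half-open classes keep them mutually exclusive; min() puts the maximum in the last class.
        i = k - 1 if width == 0 else min(int((x - lo) / width), k - 1)
        counts[i] += 1
    return width, [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

width, classes = build_classes(list(range(1, 101)))   # hypothetical: 100 data points -> 10 classes
print(f"width = {width:.2f}")
for lower, upper, count in classes:
    print(f"{lower:6.2f} to {upper:6.2f}: {count}")
```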
Cumulative frequency
Builds on the frequency distribution table but adds information on the cumulative data.
Used to determine the number of observations above or below a particular value of the data set.
Helpful in understanding the behavior of the data.
Adds columns for:
- Cumulative frequency
- Cumulative percentage
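A minimal sketch of the two extra columns, assuming the frequency table is already a list of (class, frequency) pairs in class order; `cumulative_table` and the data are hypothetical. Reading the cumulative frequency of a class gives the number of observations at or below that class's upper limit.

```python
from itertools import accumulate

def cumulative_table(rows: list[tuple[str, int]]) -> list[tuple[str, int, int, float]]:
    """Return (class, frequency, cumulative frequency, cumulative percent) rows."""
    freqs = [f for _, f in rows]
    total = sum(freqs)
    return [(label, f, cum, 100 * cum / total)
            for (label, f), cum in zip(rows, accumulate(freqs))]   # running total of frequencies

for row in cumulative_table([("0-9", 2), ("10-19", 3), ("20-29", 5)]):   # hypothetical counts
    print(row)
```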
Scatter diagrams
A way to graphically understand if there is a relationship between two variables.
X - Independent variable (Causal variable)
Y - Dependent variable (result)
A straight line through the dots:
- Calculated by determining the “best fit” line
- Based on the slope, you can judge whether the variables appear correlated (positive slope: positive correlation; negative slope: negative correlation)
What questions do scatter plots answer?
- Is there a relationship?
- Is there a common pattern?
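The card does not name a method for the "best fit" line; ordinary least squares is the usual choice and is assumed here. The sketch also returns Pearson's correlation coefficient r, a standard companion measure (not mentioned on the card) that runs from -1 to 1: its sign matches the slope, and values near 0 mean little linear relationship. `best_fit_line` and the data are hypothetical.

```python
import math

def best_fit_line(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares line y = intercept + slope * x, plus Pearson correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx                    # X = independent (causal) variable, Y = dependent (result)
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)       # -1..1; sign matches the slope
    return slope, intercept, r

# Hypothetical data: oven temperature (X) vs. defects per batch (Y)
print(best_fit_line([150, 160, 170, 180, 190], [12, 10, 9, 6, 4]))
```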