Collecting Data 1 Flashcards
Scales of Measurement
- In increasing order of desirability (least to most information)
- Nominal
- Ordinal (Ranking)
- Interval
- Ratio
Nominal Scale of Measurement
- Data that consists of names or categories only
- Allows us to classify the object
- E.g. Is a famous beach or not
- Does not allow rank
- E.g. Doesn’t rank how famous the beach is
- Cannot determine the interval
- No ordering scheme is possible
- E.g. Sorting the M&Ms in a bag by color
Ordinal (Ranking) Scale of Measurement
- Data arranged in order
- Differences between the values cannot be determined or are meaningless
- A ranking scale
- E.g. Likert Customer satisfaction scale
- A rating of 4 does not mean the customer is twice as satisfied as one who gave a rating of 2.
- E.g. Software defect categories
- 3 UI, 4 data, 1 browser compatibility
Interval Scale of Measurement
- Data type which is measured along a scale, in which each point is placed at equal distance from one another
- Always appears in the form of numbers or numerical values where the distance between the two points is standardized and equal
- Has an interval
- Data is arranged in order and differences can be found
- No true zero starting point (the zero is arbitrary)
- Cannot be multiplied or divided, can be added or subtracted
- Ratios are meaningless
- E.g. Pizza temperatures: if one pizza is 100 degrees, an object at 300 degrees is not 3 times as hot
- Examples:
- Temperature (in Celsius or Fahrenheit)
- IQ test
- Grade level, 1st, 2nd, 3rd grade
- Dates
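A minimal Python sketch of why ratios fail on an interval scale, assuming Celsius and Fahrenheit readings (the helper name `c_to_f` is illustrative, not from the source). Converting the same two temperatures to another interval scale changes their ratio but keeps their difference consistent, which is why only addition and subtraction are meaningful here.

```python
def c_to_f(celsius: float) -> float:
    """Convert a Celsius reading to Fahrenheit (both are interval scales)."""
    return celsius * 9 / 5 + 32

low_c, high_c = 10.0, 20.0
low_f, high_f = c_to_f(low_c), c_to_f(high_c)  # 50.0, 68.0

# Ratios depend on where the arbitrary zero sits, so they change with the unit.
ratio_c = high_c / low_c  # 2.0
ratio_f = high_f / low_f  # 1.36
print(f"Celsius ratio: {ratio_c:.2f}, Fahrenheit ratio: {ratio_f:.2f}")

# Differences survive the conversion: a 10 C gap is always an 18 F gap.
diff_c = high_c - low_c
diff_f = high_f - low_f
print(f"Difference: {diff_c} C = {diff_f} F")
```

"20 C is twice as hot as 10 C" is not a meaningful statement, because the same two readings in Fahrenheit give a ratio of 1.36 instead of 2.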
Ratio Scale of Measurement
- Extension of interval level that includes a zero starting point
- The highest level of variable data
- There is an inherent zero starting point
- Both differences and ratios are meaningful
- Classify objects
- Rank Objects
- Has equal intervals
- Has a true zero point
- E.g. Watches that cost $200 and $400. The 2nd one is 2 times as expensive as the first
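The four scales can be summarized as a lookup of which operations each one supports. This is a hypothetical Python sketch: `SCALE_OPERATIONS`, `supports`, and the operation labels are illustrative names chosen here, derived from the classify / rank / equal intervals / true zero properties listed above.

```python
# Operations each scale supports, following the properties in the cards above.
SCALE_OPERATIONS = {
    "nominal": {"classify"},
    "ordinal": {"classify", "rank"},
    "interval": {"classify", "rank", "add_subtract"},
    "ratio": {"classify", "rank", "add_subtract", "multiply_divide"},
}

def supports(scale: str, operation: str) -> bool:
    """Return True if the operation is meaningful on the given scale."""
    return operation in SCALE_OPERATIONS[scale]

# Watch prices are ratio data: $400 / $200 = 2 is a meaningful statement.
print(supports("ratio", "multiply_divide"))     # True
# Celsius temperatures are interval data: ratios are not meaningful.
print(supports("interval", "multiply_divide"))  # False
```

Each scale's set contains the previous scale's set, which mirrors the "in increasing order of desirability" ordering: every step up adds one more kind of meaningful operation.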
Types of Data
The type of data you have will dictate what you can do and the tools you can use.
- Discrete Data
- Qualitative Data
- Attribute Data
- Continuous Data/Variable Data
- Location Data
Discrete Data
- Best at discerning whether or not we have a defective product or service
- “Pass/Fail” data is better for failure analysis
- Counted data is discrete
- E.g. Number of dimples on a golf ball
- Number of people in a stadium
- A score of 80/100 is discrete: it comes from a finite set of possible values
- Whole numbers (counts)
Qualitative Data
- An example of qualitative data is color. It cannot be expressed as a number
Attribute Data
- Anything that can be classified as either/or
- Very binary
- Pass/Fail, go/no-go, good/bad
- Example:
- Paint chips per unit, percent of defective units in a lot, audit points
- Attribute charts
- A kind of control chart to display information about defects and defectives. Helps you visualize variation
Continuous Data/Variable Data
- Anything that can be measured on a continuous basis
- Can always be divided into smaller increments
- Exists on a continuum
- Preferred over Discrete
- Use continuous data where possible because it tells us the magnitude of the issue
- Helpful for controlling the process and providing enough discrimination
- Examples:
- Length (inches, half inch, hundredths of an inch…)
- Weight
- Temperature
- Time
- Anything you can measure: torque, tension, length, volume
Teaching Discrete and Continuous Data
Imagine your young child says he is sick. As a parent, the first thing you do is touch his forehead to see if he feels warm: that is collecting discrete data.
If he feels like he has a fever, you’re likely to use a thermometer to take his temperature, which is another type of data collection. You need to know the magnitude of the fever because that determines the course of action: 105 means a trip to the ER, 101 means Tylenol. That temperature reading is continuous data, data that exists on a continuum.
Location Data
- Can be recorded on a measles diagram
- Example:
- Determining root cause of paint blemishes occurring on a car production line
Measles Diagram/Chart
- Used specifically to analyze the problem’s location and density, not just to count how often the problem occurs
- Helps determine where the common defects on parts are located
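A measles diagram marks each defect where it occurs, so tallying those marks by location shows density rather than just a total count. A minimal Python sketch, assuming each paint blemish has already been tagged with the panel where it was found (the panel names and data are hypothetical):

```python
from collections import Counter

# Hypothetical paint-blemish observations, each tagged with the car panel
# where it was found (the location a measles diagram would mark).
blemish_locations = [
    "hood", "roof", "hood", "left door", "hood",
    "roof", "trunk", "hood", "left door", "hood",
]

density = Counter(blemish_locations)

# Rank locations from most to least blemishes to show where defects cluster.
for location, count in density.most_common():
    share = count / len(blemish_locations)
    print(f"{location:<10} {count:>2}  ({share:.0%})")
```

A plain defect count would only report 10 blemishes; grouping by location shows that half of them are on the hood, which points the root cause search toward that part of the line.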
Converting Types of Data
- It is difficult to convert attribute (go/no-go) data to variable data after the fact. In most cases, though, you can find a way to record variable data at the time of measurement instead
- Example: record how far out of tolerance a part is, not just whether it passed
- It is always easy to convert variable data to attribute data if you have a standard
- Example: water below 75 degrees is too cold to swim in. Label every reading below 75 as “no go” and every reading of 75 or above as “go”
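The swimming example as a minimal Python sketch. It assumes Fahrenheit readings and treats exactly 75 as “go”, following the “no go < 75” rule; the function name `to_attribute` and the readings are hypothetical.

```python
GO_THRESHOLD_F = 75  # the standard: readings below this are "no go" (too cold)

def to_attribute(temps_f: list[float]) -> list[str]:
    """Convert variable (temperature) readings into go/no-go attribute data."""
    return ["no go" if t < GO_THRESHOLD_F else "go" for t in temps_f]

readings = [71.5, 74.9, 75.0, 78.2, 80.1]  # hypothetical water temperatures
print(to_attribute(readings))  # ['no go', 'no go', 'go', 'go', 'go']
```

The conversion only goes one way: once a reading is stored as “go”, there is no way to tell whether it was 75.0 or 80.1, which is why converting attribute data back to variable data after the fact is so difficult.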
Data Distribution
- A data distribution is a function that specifies all possible values of a variable and quantifies their relative frequency (the probability of how often each occurs)
- Any population whose data is scattered across a range of values has a distribution
- It’s important to determine the kind of distribution that population has so we can apply the correct statistical methods when analyzing it
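To make the definition concrete, here is a minimal Python sketch that builds an empirical distribution from a sample by pairing each observed value with its relative frequency. The defect counts are made-up illustration data.

```python
from collections import Counter

# Hypothetical sample: number of defects found on each of 20 inspected units.
defects_per_unit = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0, 1, 0, 2, 0, 1, 0, 0, 1, 0, 2]

counts = Counter(defects_per_unit)
n = len(defects_per_unit)

# The empirical distribution: each observed value with its relative frequency.
distribution = {value: counts[value] / n for value in sorted(counts)}
for value, freq in distribution.items():
    print(f"{value} defects: {freq:.2f}")

# Relative frequencies across all observed values add up to 1.
print(f"Total: {sum(distribution.values()):.2f}")
```

Looking at the shape of this table (many zeros, a few larger counts) is the kind of check that helps decide which theoretical distribution, and therefore which statistical method, fits the data.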
Types of Continuous Distributions
- Normal Distribution
- Lognormal Distribution
- F Distribution
- Chi-Square Distribution
- Exponential Distribution
- Student’s t Distribution
- Weibull Distribution
- Non-Normal Distributions
- Odd Distributions
- Bivariate Distribution
- Bi-Modal
Continuous Distribution
- A continuous distribution contains an infinite number of (variable) data points that can be displayed on a continuous measurement scale.
- A continuous variable is a random variable with a set of possible values that is infinite and uncountable.
- It measures something rather than just counting, and is typically described by a probability density function (pdf)
- Simply put: continuous = can take any value within a range
Types of Discrete Distributions
- Binomial Distribution
- Poisson Distribution
- Hypergeometric Distribution
- Geometric Distribution
Discrete Distributions
- A discrete distribution results from countable data, which has a finite (or countably infinite) number of possible values.
- Discrete distributions can be reported in tables, and the respective values of the random variable are countable
- Example: rolling dice, counting the number of heads in a series of coin tosses, etc.
- Simply put: discrete = counted
Probability Mass Function (pmf)
- Discrete Distributions
- A probability mass function is a frequency function that gives the probability that a discrete random variable equals each of its possible values
- Also known as the discrete density function
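A minimal Python sketch of a pmf for a fair six-sided die, using exact fractions (the function name `die_pmf` is illustrative): each countable outcome gets a probability, impossible outcomes get 0, and the probabilities sum to 1.

```python
from fractions import Fraction

def die_pmf(face: int) -> Fraction:
    """pmf of a fair six-sided die: P(X = face)."""
    return Fraction(1, 6) if 1 <= face <= 6 else Fraction(0)

# A pmf assigns a probability to each countable value...
print(die_pmf(3))                            # 1/6
# ...and the probabilities over all possible values sum to 1.
print(sum(die_pmf(k) for k in range(1, 7)))  # 1
```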
Binomial Distribution
- Discrete Distribution
- The binomial distribution gives the probability of a given number of successes (or failures) in a fixed number of independent trials, each with the same probability of success
- Characteristics that are classified into two mutually exclusive and exhaustive classes, such as number of successes/failures or number accepted/rejected, follow a binomial distribution
- Example: Tossing a coin: the probability of the coin landing heads is ½ and the probability of it landing tails is ½
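The binomial probability of exactly k successes in n trials is C(n, k) p^k (1 - p)^(n - k). A minimal Python sketch of that formula (the function name `binomial_pmf` is illustrative), applied to the coin example:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 4 tosses of a fair coin.
print(binomial_pmf(2, 4, 0.5))  # 0.375
```

With 4 tosses there are 16 equally likely head/tail sequences, and 6 of them contain exactly 2 heads, hence 6/16 = 0.375.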
Poisson Distribution
- Discrete Distribution
- The Poisson distribution is the discrete probability distribution that measures the likelihood of a number of events occurring in a given time period, when the events occur independently, one after another, at a constant average rate
- Characteristics that could theoretically take large values, but in practice take small values, follow a Poisson distribution
- Example: Number of defects, errors, accidents, absentees, etc.
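The Poisson probability of exactly k events when the average count is λ is λ^k e^(-λ) / k!. A minimal Python sketch, assuming a hypothetical process that averages 2 defects per unit (the function name `poisson_pmf` and the rate are illustrative):

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events in an interval where the average count is lam)."""
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical process averaging 2 defects per unit.
for k in range(5):
    print(f"P({k} defects) = {poisson_pmf(k, 2.0):.4f}")
```

This matches the card's description: the count could in theory be very large, but with an average of 2 the probabilities fall off quickly after a handful of defects.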
Hypergeometric Distribution
- Discrete Distribution
- Hypergeometric distribution is a discrete distribution that measures the probability of a specified number of successes in (n) trials, without replacement, from a finite population (N).
- In other words, sampling without replacement
- Similar to the binomial distribution
- For the binomial distribution, the probability is the same for every trial.
- For the hypergeometric distribution, each trial changes the probability for each subsequent trial because there is no replacement.
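A minimal Python sketch comparing the two distributions on a hypothetical lot of 20 units containing 5 defectives, inspecting 4 units and asking for exactly 1 defective (the lot numbers and function names are illustrative). The hypergeometric version accounts for the changing probability as units are removed; the binomial version assumes the defective rate stays at 5/20 on every draw.

```python
from math import comb

def hypergeometric_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k successes in n draws without replacement from N items, K of them successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Same question with replacement: the success probability stays p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical lot: 20 units, 5 defective; inspect 4 units.
N, K, n = 20, 5, 4
print(f"Without replacement: {hypergeometric_pmf(1, N, K, n):.4f}")
print(f"With replacement:    {binomial_pmf(1, n, K / N):.4f}")
```

The two results differ noticeably here because the sample (4) is a large fraction of the lot (20). As the lot grows relative to the sample, removing a unit barely changes the odds and the hypergeometric result approaches the binomial one.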
Geometric Distribution
- Geometric distribution is a discrete distribution that measures the likelihood of when the first success will occur
- Discrete probability distribution that represents the probability of getting the first success after a run of consecutive failures
- Can have an indefinite number of trials until the first success is obtained
- The negative binomial distribution can be considered an extension of it (waiting for the r-th success rather than the first)
- Example:
- You ask people outside a polling station who they voted for until you find someone who voted for the independent candidate in a local election. The geometric distribution represents the number of people you had to poll before finding someone who voted independent.
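The probability that the first success occurs on trial k is (1 - p)^(k - 1) p: k - 1 failures followed by one success. A minimal Python sketch for the polling example, assuming (hypothetically) that 10% of voters chose the independent candidate; the function name `geometric_pmf` is illustrative. This version counts trials including the successful one; some texts instead count only the failures before it.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success occurs on trial k): k - 1 failures, then a success."""
    return (1 - p) ** (k - 1) * p

# Hypothetical: 10% of voters chose the independent candidate.
p_independent = 0.10
for k in range(1, 6):
    prob = geometric_pmf(k, p_independent)
    print(f"First independent voter is person #{k}: {prob:.4f}")
```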
Probability Density Function (pdf)
- Continuous Distributions
- The probability density function describes the behavior of a continuous random variable.
- It can be thought of as a grouped frequency distribution with very narrow groups.
- Hence, the probability density function is seen as the “shape” of the distribution
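Because a pdf gives density rather than probability, probabilities for a continuous variable come from the area under the curve. A minimal Python sketch using the standard normal distribution as an assumed example; the function names are illustrative, and the area is approximated numerically with the midpoint rule rather than computed from an exact formula.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density (height of the curve) of a normal distribution at x."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

def probability_between(a: float, b: float, steps: int = 10_000) -> float:
    """Approximate P(a <= X <= b) for a standard normal X (midpoint rule)."""
    width = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# The pdf gives the shape; probabilities come from areas under it.
print(f"Height at x = 0: {normal_pdf(0.0):.4f}")
print(f"P(-1 <= X <= 1): {probability_between(-1.0, 1.0):.4f}")
```

The height at a single point is not itself a probability; only areas over an interval are, and the total area under the curve is 1.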