Data Analysis Flashcards by Zadrian Huang

What is the difference between a bad data point and an outlier?

Bad data point: An observation that contains invalid or inaccurate data and may or may not be a statistical outlier.
Outlier: An observation that lies outside the overall pattern of a distribution.
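
The distinction above can be sketched in code. This is a minimal illustration, not a prescribed method: the 1.5 x IQR rule for flagging outliers, the valid range used to flag bad data, and the delivery-time values are all assumptions made for the example.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values more than k * IQR outside the quartiles (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def bad_points(values, valid_min, valid_max):
    """Return values that violate an assumed domain rule (a valid range)."""
    return [v for v in values if not (valid_min <= v <= valid_max)]

# Hypothetical delivery times in hours: -3 is impossible, 40 is only unusual.
times = [10, 11, 12, 12, 13, 13, 14, 15, 40, -3]

outliers = iqr_outliers(times)        # statistically unusual values
invalid = bad_points(times, 0, 72)    # values that cannot be correct
```

Here 40 is an outlier but still valid data, while -3 is a bad data point that also happens to be an outlier.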

What are some causes of poor data quality?

Manual entry errors
Duplicate entries
Query issues (nulls, blanks, white space, data formats)
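
A rough sketch of how the causes listed above might be handled before analysis. The record layout and the choice to treat blank strings as missing values are assumptions for illustration only.

```python
def clean_records(rows):
    """Trim white space, treat blanks as missing, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        normalized = tuple(
            (field.strip() or None) if isinstance(field, str) else field
            for field in row
        )
        if normalized in seen:
            continue  # duplicate entry after normalization
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned

# Hypothetical (id, reading) records with white space, a duplicate, and blanks.
rows = [("A100", " 12.5 "), ("A100", "12.5"), ("A101", "   "), ("A102", None)]
result = clean_records(rows)
```

Note that the two "A100" rows only become duplicates once the white space is stripped, which is why normalization runs before the duplicate check.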

What are the benefits of using experimental data rather than historical data?

Better define the measurement system
Create the factor settings to analyze
Establish cause and effect
Structured to reduce variation and enable analysis

What are some issues with historical data?

Often lacks precision
Includes Project Y data but not the X settings needed
Does not establish cause and effect

When does it make sense to use historical data?

Access to sufficient data
Cost prohibitive to conduct an experiment

When does it make sense to use experimental data?

No historical data exists
Need to define the relationship between inputs and outputs
Identify and measure interactions between the ‘Vital Few’ sources of variation
Determine the best set-up conditions of X’s for improved Y performance
Cost-justified

What is Data Analysis?

Using statistical methodology and tools to discover useful information and make better business decisions.

When does it make sense to use data analysis?

Establish whether there is a relationship between an output and a suspected variable
Determine the optimal setting for a confirmed variable
Quantify the impact of controlling a confirmed variable

What are some key outcomes of the Data Analysis skillset?

4S/GAP Methodology
Identifying Transfer Function(s)
Basic Statistical Analyses
Recommendations to the business based on data

What is the difference between discrete and continuous data?

Discrete (attribute) data: Takes one of a countable set of possible values, such as counts or categories.
Continuous (variable) data: Can be measured along a continuous scale of values.

Why do we care what type of data we have?

Different analysis tools apply to discrete and continuous data
Continuous data typically requires fewer data points
Continuous data provides a more complete picture of variation in a process

What does y = f(x) represent and mean?

Transfer function. The relationship that explains y in terms of x(s).
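
As a minimal sketch, a transfer function with a single x can be estimated by fitting a straight line with least squares. The linear form and the data values are assumptions for illustration; a real transfer function may involve several x's or a non-linear relationship.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical x settings and the y observed at each one.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)

def predict(x):
    """Estimated transfer function y = f(x) from the fitted line."""
    return slope * x + intercept
```

With this made-up data the fitted line is y = 2x + 1, so `predict(6)` gives 13.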

Describe the approach we use to analyze data at Penske.

4S (Stability, Shape, Spread, ‘S’enter) and GAP (Graphical, Analytical, Practical)

What is the difference between descriptive statistics and inferential statistics?

Descriptive: Analyzing data without drawing conclusions about a larger group.
Inferential: Analyzing data from a sample group to draw conclusions about the population.

What is sampling?

The process of collecting a subset of the data and drawing conclusions about the total population from the subset.

What statistics do we use to look at variation and when do we use each?

Standard deviation - normal
IQR - non-normal

What statistics do we use to look at central tendency and when do we use each?

Mean (average) - normal
Median (middle value) - non-normal
Mode (most frequently occurring) - not used in hypothesis testing
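
The pairing of statistics with distribution shape in the two cards above can be illustrated with a small sketch. The data values are made up, and the quartile method ("inclusive") is an assumption, since different tools compute quartiles slightly differently.

```python
import statistics

def summarize(values):
    """Return center and spread statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
    }

baseline = [2, 4, 4, 4, 5, 5, 7, 9]
skewed = [2, 4, 4, 4, 5, 5, 7, 90]  # same data with one extreme value

base = summarize(baseline)
skew = summarize(skewed)
```

In this example the single extreme value moves the mean and standard deviation substantially, while the median and IQR stay the same, which is why the median and IQR are preferred for non-normal data.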

What core question are we answering when using a hypothesis test for Spread or ‘S’enter?

Is there a statistical difference?

What is a p-value?

Informally, the risk of being wrong when claiming a difference.
More precisely, the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true.
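
One way to make this definition concrete is an exact permutation test, sketched below. This is an illustrative technique chosen for the example, not necessarily the test used in practice, and the two groups are hypothetical. Under the null hypothesis the group labels are interchangeable, so the p-value is the share of relabelings that produce a difference in means at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(a, b):
    """Two-sided exact permutation p-value for a difference in means."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n = len(pooled)
    at_least_as_extreme = 0
    total = 0
    # Enumerate every way to relabel the pooled values into two groups.
    for idx in combinations(range(n), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total

# Hypothetical measurements from two groups.
p_separated = permutation_p_value([1, 2, 3, 4], [5, 6, 7, 8])
p_identical = permutation_p_value([1, 2, 3, 4], [1, 2, 3, 4])
```

For the fully separated groups only 2 of the 70 possible relabelings are as extreme as the observed split, so the p-value is about 0.029. For identical groups every relabeling qualifies and the p-value is 1.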

What is power?

Ability to see a difference if there is one.
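
Power can be estimated by simulation, as in the rough sketch below. The assumptions are labeled in the code: normally distributed data, a known standard deviation, a two-sample z-test, and a true difference of 0.5. None of these come from the flashcards.

```python
import random
from statistics import NormalDist, mean

def simulated_power(effect, n, sigma=1.0, alpha=0.05, trials=2000, seed=0):
    """Share of simulated experiments in which a real difference is detected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sigma * (2 / n) ** 0.5  # assumes sigma is known
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sigma) for _ in range(n)]
        b = [rng.gauss(effect, sigma) for _ in range(n)]  # true shift = effect
        z = (mean(b) - mean(a)) / std_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

power_small = simulated_power(effect=0.5, n=5)
power_large = simulated_power(effect=0.5, n=50)
```

The same true difference is detected far more often with 50 observations per group than with 5, which is why a test that shows no difference should be followed by a check of its power.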

What are confidence intervals and why do we use them?

Amount of variation we can expect in our estimates; used because of sampling.

What does a 95% confidence interval mean?

If the same sampling procedure were repeated many times, about 95% of the confidence intervals calculated this way would contain the true population value being estimated.
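
The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under stated assumptions: normally distributed data, a known standard deviation, and hypothetical values for the population mean and sample size.

```python
import random
from statistics import NormalDist, mean

def ci_coverage(true_mean=10.0, sigma=2.0, n=30, level=0.95,
                trials=2000, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5  # assumes sigma is known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        center = mean(sample)
        if center - half_width <= true_mean <= center + half_width:
            hits += 1
    return hits / trials

coverage = ci_coverage()
```

The resulting coverage should land close to 0.95. Each individual interval either contains the true mean or does not; the 95% describes the procedure over many repetitions.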

Does correlation imply causation?

No.

When referring to Project Y’s, what statements are typically true?

Based on a sample of the population
Focused on establishing cause and effect relationships
Aim to create continuous Project Y’s.

What does a Stability test practically tell you?

Whether you can trust your measure of 'S'enter to be representative of the population over time.

What does GAP help us do?

Provides a standard way to look at data and answers the hypothesis test question of 'Is there a Difference?'.

What main thing do we have to take into account when collecting sample data?

The sample must be representative of the population.

When should you sample?

Collecting all data is impractical
Too costly or time-consuming
Measuring a high-volume process

When should you not sample?

A subset of data can't accurately depict the process.

What is the Central Limit Theorem?

If you take repeated samples of sufficient size from a population with finite variance, the means of those samples will be approximately normally distributed regardless of the shape of the original distribution.
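
A short simulation can illustrate the theorem. The sketch below assumes an exponential population (strongly skewed, with mean 1 and standard deviation 1) and a sample size of 40; both choices are arbitrary and made only for the example.

```python
import random
from statistics import mean, stdev

def sample_means(n, num_samples=2000, seed=42):
    """Means of repeated samples of size n from a skewed population."""
    rng = random.Random(seed)
    return [
        mean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

means = sample_means(n=40)
center = mean(means)   # expected to be near the population mean of 1
spread = stdev(means)  # expected to be near 1 / sqrt(40)
```

Even though individual exponential values are far from normal, the sample means cluster symmetrically around the population mean, with a spread that shrinks as the sample size grows.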

Practically, what does a p-value tell us?

It indicates whether an observed difference is statistically significant, meaning unlikely to be due to chance alone. Whether that difference truly matters to the business is a separate, practical judgment.

If your p-value is 0.05, would you fail to reject the null or reject the null?

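Fail to reject the null, assuming a significance level of 0.05 and the convention that the null is rejected only when the p-value is less than the significance level.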

If I do not see a difference when completing my hypothesis test, what should I do?

Run a power test to determine how likely I would have been to see a difference.

What is the difference between common cause and special cause variation?

Common: Always present, expected, predictable, usual
Special: Not always present, unexpected, unpredictable, unusual

Why do we conduct MSAs?

To ensure that variation in the data is from process variation and not measurement error.

What are residuals and why do we check them?

Residuals are the differences between actual response values and fitted values; they estimate the error the model cannot predict, so checking them shows how well the model fits the data.
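
A minimal sketch of the calculation, assuming a model has already been fitted. The line y = 2x + 1 and the data values are hypothetical and used only to show the arithmetic.

```python
def residuals(xs, ys, slope, intercept):
    """Actual response minus fitted value for each observation."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical observations and an assumed fitted line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3.1, 4.9, 7.2, 8.8]

res = residuals(xs, ys, slope=2.0, intercept=1.0)
largest = max(abs(r) for r in res)  # worst single prediction error
```

Residuals that are small and scattered around zero with no pattern suggest the model predicts well; large or patterned residuals suggest it is missing something.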

What is r and what does it tell us?

Correlation coefficient. Measures strength of linear relationship between variables.

What is r²?

Coefficient of determination; the amount of variation in the output explained by the input(s).
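
Both quantities can be computed directly, as in this sketch. It assumes a single input, in which case r² is simply the square of the correlation coefficient; the data values are made up.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient: strength of the linear relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations of an input x and an output y.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2  # share of variation in y explained by x (one input only)
```

For this data r is about 0.77 and r² is 0.6, so the input explains roughly 60% of the variation in the output.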

Data Analysis Flashcards

(38 cards)