
Principles of Statistics Flashcards by Beth Turner

What does analysing data with statistics do?

Framework to uncover hidden patterns 🏗️
Objective Perspective 🎯
Test Hypotheses🧪
Confident Decisions: Rely on Data > Assumptions e.g. lead time changes💪


How have you applied statistical testing when analysing data?

Descriptive stats: mean, median etc.
Inferential stats: Hypothesis Testing: Pearson’s correlation coefficient or Regression
Assess Model: RMSE, MAE


What is Hypothesis Testing?

Inferential stats method 📈
Assess a hypothesis about a larger population based on a sample 👥 🎛️
2 competing hypotheses - null (no sig correlation) and alternative (a sig correlation)❌🔀
See whether the observed result could plausibly be due to chance alone🔭🍀


What is Inferential Statistics?

Field of Statistics🌾
Analytical tools to draw conclusions about a whole population 🌍 based on a sample 🔬


What is Pearson’s correlation test?

Type of hypothesis test that determines whether a linear relationship exists between 2 variables (e.g. lead time and stock holding)
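
As a minimal sketch of the coefficient behind this test, the snippet below computes Pearson's r by hand. The `lead_time` and `stock_holding` values are made-up illustration data, not figures from the original analysis, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and of squared deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: lead time in days and stock holding in units
lead_time = [7, 14, 21, 28, 35, 42]
stock_holding = [120, 150, 130, 180, 160, 200]

r = pearson_r(lead_time, stock_holding)
print(f"r = {r:.3f}")
```

A value near 1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak one.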


What is a t test?

Hypothesis test that compares the means of 2 groups
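
A rough sketch of the statistic a two-sample t test is built on, using Welch's version (which does not assume equal variances). The two groups are invented example values and `welch_t` is a hypothetical helper. It returns only the t statistic; turning that into a p-value needs the t distribution, which a statistics library would normally supply.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    # Sample variances (n - 1 in the denominator)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical data: weekly stock holding (thousands of units) for two groups
group_a = [12, 15, 14, 16, 13]
group_b = [10, 11, 9, 12, 10]

t_statistic = welch_t(group_a, group_b)
print(f"t = {t_statistic:.2f}")
```

The further the t statistic is from 0, the less likely the difference in means is due to chance alone.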


What was the significance level that the P value was tested against?

5% significance level (p < 0.05)


What is a p-value?

Statistical Measure 📏
Probability of seeing results at least as extreme as those observed if the null hypothesis were true; used to judge whether results are statistically significant⭐⭐⭐⭐⭐⭐⭐
A low p value (< 5%) = reject the null hypothesis and conclude the alternative: there is an effect/relationship/difference
A high p value (> 5%) = fail to reject the null hypothesis: not enough evidence of an effect/relationship/difference between 2 variables
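
One way to see what a p-value means is a permutation test: shuffle one variable many times and count how often the shuffled correlation is at least as strong as the observed one. This is a swapped-in illustration, not the t-distribution method a standard Pearson test uses, and the data and function names are assumptions made for the example.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero
    return (extreme + 1) / (n_permutations + 1)

def decide(p_value, alpha=0.05):
    """Apply the 5% significance rule."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Hypothetical data: lead time in days and stock holding in units
lead_time = [7, 14, 21, 28, 35, 42]
stock_holding = [120, 150, 130, 180, 160, 200]

p_value = permutation_p_value(lead_time, stock_holding)
print(f"p = {p_value:.4f}: {decide(p_value)}")
```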


Interpret the P value results of the Pearson’s Correlation Test

P Value < 0.05
Reject Null
Conclude Alternative
WAS a significant relationship between Lead Time & Stock Holding


Interpret the correlation coefficient of the Pearson’s correlation test

Strength and direction of the relationship
Ranges from -1 to 1
Positive Value, far from 1
Weak Positive relationship
Could infer from the sample: a relationship did exist between lead time and stock holding in the Frozen Warehouse (Inferential Stats example)


Have you encountered a situation where stats method did not yield the desired results? How did you rectify it?

Regression = high error & poor fit
Due to small sample size, DQ issues or weak relationship
Frozen suppliers did not adhere to lead times
Summer build stock (irrespective of lead time)
Customer demand, supplier shortages, warehouse space (not considered by model)
External factors: historical data may be better
Time series: identify patterns


What is linear regression?

Stats method
Predicts an outcome based on another
By fitting a line of best fit to the data
The equation of the line allows the model to make predictions
E.g. if the lead time was 30 days, you could find 30 on the x axis, go up to the line and read off the corresponding y value (stock holding) as the prediction
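
The idea can be sketched with an ordinary least squares fit done by hand. The data points are invented for illustration and `fit_line` is a hypothetical helper; the 30-day prediction simply plugs x = 30 into the fitted equation.

```python
def fit_line(xs, ys):
    """Least squares line of best fit: returns (gradient, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Gradient = covariance of x and y divided by variance of x
    gradient = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

# Hypothetical data: lead time in days (x) and stock holding in units (y)
lead_time = [7, 14, 21, 28, 35, 42]
stock_holding = [120, 150, 130, 180, 160, 200]

gradient, intercept = fit_line(lead_time, stock_holding)
predicted = gradient * 30 + intercept  # prediction for a 30-day lead time
print(f"stock holding = {gradient:.2f} * lead time + {intercept:.2f}")
print(f"predicted stock holding at 30 days: {predicted:.1f}")
```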


When did you use linear regression?

To predict stock holding from lead time
Lead Time as the independent variable (x axis)
Stock Holding as the dependent variable (y axis)


What was the independent variable in your regression model?

Lead time on the x axis


What was the dependent variable on your regression model?

Stock holding on the y axis


What evaluation metrics did you use to determine the accuracy and effectiveness of your models?

ROOT MEAN SQUARED ERROR - measures the typical size of the difference between actual and predicted values, penalising large errors more (lower value is better)
MEAN ABSOLUTE ERROR - the average absolute difference between actual and predicted values (lower value is better)
R SQUARED - most common - shows how much of the variation in the data is explained by the model. Ranges 0 - 1. 1 = 100% of the variation is explained by the model. Closer to 1 = better fit💯✅
Plotted predicted stock/lead time - not a straight 45° line, so the model was not performing well
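
A small sketch of how the three metrics are calculated. The `actual` and `predicted` lists are made-up numbers chosen for easy arithmetic, not results from the original model.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square the errors, average, then take the root."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: average size of the errors, ignoring sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Share of the variation in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total

# Hypothetical actual and predicted stock holding values
actual = [100, 120, 140, 160]
predicted = [110, 115, 150, 155]

print(f"RMSE = {rmse(actual, predicted):.2f}")
print(f"MAE  = {mae(actual, predicted):.2f}")
print(f"R^2  = {r_squared(actual, predicted):.3f}")
```

Note that on unseen test data R squared calculated this way can fall below 0 when the model predicts worse than simply using the mean.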

What is R SQUARED and interpret your results

Number that shows how well the line (LR Model) fits the data🔢
Tells me how much of a difference in stock holding can be explained by lead time⏱️
My R-squared was no bigger than 0.05, which means only 5% of the differences in stock holding can be explained by lead time⚄
Additionally, training and test scores were both low, which could suggest the model was **too simple to capture the patterns** in the data (underfitting)🤺🧪⚪️

What does over fitting mean?

Model is too complex
Fits the training data too well
Cannot generalise to new data that differs from the training data

What is a limitation of R squared?

Sensitive to outliers; my data had a few that could have influenced the score

Why did you choose those error metrics?

Used MAE and RMSE together: RMSE is more sensitive to outliers, so comparing the two gives more insight. E.g. RMSE much bigger than MAE = a few large errors (outliers) exist that could throw the model off
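
To illustrate the point, here is a minimal sketch working directly from prediction errors. Both error lists are invented; the second differs only by one large error.

```python
import math

def rmse_from_errors(errors):
    """Root mean squared error from a list of prediction errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

def mae_from_errors(errors):
    """Mean absolute error from a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)

# Hypothetical prediction errors: all similar vs one large outlier
even_errors = [2, 2, 2, 2, 2]
outlier_errors = [2, 2, 2, 2, 20]

for label, errors in [("even", even_errors), ("with outlier", outlier_errors)]:
    gap = rmse_from_errors(errors) - mae_from_errors(errors)
    print(f"{label}: RMSE={rmse_from_errors(errors):.2f}, "
          f"MAE={mae_from_errors(errors):.2f}, gap={gap:.2f}")
```

When all errors are the same size the two metrics match; a single large error pushes RMSE up much faster than MAE because the error is squared before averaging.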

What is a time series forecast?

Type of predictive analysis that predicts future values based on historical data collected at specific intervals. It analyses past trends, patterns and seasonal variations to make these predictions.

What tool did you use for the time series forecast model and why?

Python

flexibility: exponential smoothing levels
experiment with different models

How do you know if your forecast is accurate?

Root Mean Sq Error - margin of error between actual and predicted values

Mean Absolute Error - also measures error between actual/predicted values

Also use the confidence intervals on the chart to see how uncertain the forecast is

What do the 4 plots on the decomposition plot show?

Observed - actual data
trend - the long term upward or downward direction of the data
seasonality - repeating patterns within specific time periods
Residual (noise): random fluctuations that cannot be explained by trend or seasonality
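
The four components can be sketched with a classical additive decomposition written by hand. This is an illustration under assumptions: the quarterly series is invented, `decompose_additive` is a hypothetical helper, and a library routine (for example `seasonal_decompose` in statsmodels) would normally be used instead.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal and residual parts (observed = sum of all three)."""
    n = len(series)
    half = period // 2
    trend = [None] * n  # undefined at the edges where the window does not fit
    for t in range(half, n - half):
        if period % 2 == 0:
            # Centered moving average: the two end points get half weight
            window = series[t - half : t + half + 1]
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(series[t - half : t + half + 1])
        trend[t] = total / period

    # Seasonal effect: average detrended value at each position in the cycle
    detrended = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            detrended[t % period].append(series[t] - trend[t])
    means = [sum(values) / len(values) for values in detrended]
    offset = sum(means) / period  # centre the effects so they sum to zero
    seasonal = [means[t % period] - offset for t in range(n)]

    residual = [
        series[t] - trend[t] - seasonal[t] if trend[t] is not None else None
        for t in range(n)
    ]
    return trend, seasonal, residual

# Hypothetical quarterly stock data: steady upward trend plus a repeating pattern
pattern = [5, -3, -4, 2]
observed = [100 + 2 * t + pattern[t % 4] for t in range(12)]

trend, seasonal, residual = decompose_additive(observed, period=4)
print("trend:   ", trend)
print("seasonal:", seasonal[:4])
print("residual:", residual)
```

Because the example series has no noise, the residuals come out at (approximately) zero; real data would leave random fluctuations there.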

What does decomposition plot do?

Breaks down the data into underlying components

What does my decomposition plot show specifically?

Observed: peaks and troughs show fluctuation in stock levels over time
Trend: downward trend up to 2020, steep upward trend until 2021, then levels off a little
Seasonal: annual pattern - stock levels rise in the second quarter and gradually decrease throughout the year (working capital management at year-end and ice cream stock building)
Residual: random fluctuations do exist that could be due to supplier issues, manual adjustments to orders, changes in space allocated to shelves in shops

What time series forecast model did you use to forecast?

* Naive - assumes future values equal the most recently observed value (establishes **baseline** for comparison)
* Holt Linear - trend - identifies the underlying trend in the data to make future predictions (does not capture autocorrelation, which is the relationship between a variable's current and past values)
* Holt Winters - trend and seasonality
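
The first two models can be sketched in a few lines. This is a hand-rolled illustration, not the code used for the original forecast: the `alpha` and `beta` defaults and the short series are assumptions, and Holt Winters (which adds a seasonal term) is left out for brevity.

```python
def naive_forecast(series, horizon):
    """Every future value equals the most recent observation."""
    return [series[-1]] * horizon

def holt_linear_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    level = series[0]
    trend = series[1] - series[0]  # simple initial trend estimate
    for value in series[1:]:
        previous_level = level
        # New level blends the latest value with the previous one-step forecast
        level = alpha * value + (1 - alpha) * (level + trend)
        # New trend blends the latest change in level with the previous trend
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]

# Hypothetical stock series with a steady upward trend
stock = [10, 12, 14, 16]

print("naive:", naive_forecast(stock, 3))
print("holt: ", holt_linear_forecast(stock, 3))
```

The naive forecast stays flat at the last value, while Holt's method carries the trend forward, which is why the naive model works as a baseline to beat.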

What parameters did you use to customise the holt winters model?

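* Seasonal periods: 52 for weekly data
* Trend/Seasonality: additive ("add")
* Smoothing level (0.10): sets how the **weight** is split between the most recent observation and past observations when forecasting; a low value such as 0.10 leans more on past observations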
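
The effect of the smoothing level can be seen with simple exponential smoothing, which is the level component only. This is a simplified sketch, not Holt Winters itself, and the series is invented to show a sudden jump.

```python
def exponential_smoothing(series, smoothing_level):
    """Return the final smoothed level after passing through the series."""
    level = series[0]
    for value in series[1:]:
        # smoothing_level is the weight on the newest value;
        # the remainder stays on the level built from past values
        level = smoothing_level * value + (1 - smoothing_level) * level
    return level

# Hypothetical series: stable at 100, then a sudden jump to 200
stock = [100, 100, 100, 100, 200]

slow = exponential_smoothing(stock, smoothing_level=0.10)
fast = exponential_smoothing(stock, smoothing_level=0.90)
print(f"smoothing level 0.10 -> {slow:.1f}")
print(f"smoothing level 0.90 -> {fast:.1f}")
```

With a smoothing level of 0.10 the estimate barely moves after the jump because most of the weight stays on past observations; with 0.90 it follows the latest value almost entirely.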

What was the outcome of the time series forecast?

* Holt winters 🏆
* Forecast: higher stock than previous years
* BUT below the 4.3 million unit maximum
* Winter Stock Build (not considered by the model)
* 5 - 7% error: < 5% would be better, but still better than "finger in the air"

What could your stakeholders use the time series forecast model for?

Stock prediction used to:
* 🎯 Optimise Targets and KPIs
* ⚠️ Foresee potential issues and correct them
* ⚡ Maintain efficient warehouse operations

Explain the Linear Regression Equation

y = mx + c
y = predicted stock holding
m = gradient: measures the slope of the line
x = value we change, i.e. lead time in days
c = intercept: the value of y where the line crosses the y axis (x = 0)

What is Descriptive Statistics?

* **Summarise & Describe** a dataset that represents a population
* Overview of **characteristics** of the data e.g. **central tendency** (mean, mode, median) or **variability** (SD, min, max)
* **Understanding** of the data - **foundation for inferential statistics**

What are Limitations of Linear Regression?

* Need a linear relationship between variables (strong relationship = better model)
* Predictions are limited to the model's trained range (e.g. 100 days would not produce reliable values as the model has not been trained on such values)
* Predictions are not fact

Why did you want to forecast stock holding?

* Action needed before year end for **W/C metrics**?
* Reduce risk of reaching **capacity** and impacting operations
* We **pay for space**, so need to use it efficiently; predicting stock holding helps

In Time Series Forecasting, why do you Resample the Data?

* Adjust frequency
* Smooth Gaps
* Reduce Noise
* Easier to identify patterns
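
As a rough sketch of resampling, the snippet below turns daily readings into weekly averages using only the standard library. The dates and values are invented, and `resample_weekly_mean` is a hypothetical helper; with pandas the same step is usually done with its `resample` method.

```python
from collections import defaultdict
from datetime import date, timedelta

def resample_weekly_mean(daily):
    """Group (date, value) pairs by ISO week and average each group."""
    buckets = defaultdict(list)
    for day, value in daily:
        iso_year, iso_week, _ = day.isocalendar()
        buckets[(iso_year, iso_week)].append(value)
    return {week: sum(values) / len(values) for week, values in sorted(buckets.items())}

# Hypothetical daily stock readings for two full weeks starting Monday 4 Jan 2021
start = date(2021, 1, 4)
daily = [(start + timedelta(days=i), 100 + i) for i in range(14)]

weekly = resample_weekly_mean(daily)
for week, mean_value in weekly.items():
    print(week, mean_value)
```

Fourteen daily points become two weekly points, which lowers the frequency and averages out day-to-day noise.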

Principles of Statistics Flashcards

(35 cards)