WEEK 9 Flashcards
(19 cards)
time series (time-dependent variables)
data collected on one subject over many time periods, e.g. ice cream sales each week, daily number of flights or monthly job applications; we care about trends, patterns and changes over time
the data we usually analyse is called cross-sectional data, which is data that is collected
at a single point in time
e.g. a survey of 1000 people about their income
predictions about the future
forecasting
fluctuations can happen due to
seasonality (over year)
time series characteristics (decomposition) - because time series data evolves over time, we assume that our data is subject to the most common characteristics
trend - the overall long run average movement in the data, increasing, decreasing or constant
seasonality - periodic fluctuations that repeat themselves at fixed intervals of time (seasons); use a graph or theory to identify it
cyclic behaviour - represents the general ups and downs of the economy or industry that occur over longer intervals of time
randomness - in the real world there is always variation in the data that cannot be explained
once we assume that the data is subject to these 4 factors we can try to identify them and use them to create a statistical model for forecasting. time series models which incorporate these characteristics are known as decomposition models; the steps to carry this out are
1 identify the trend
2 identify seasonality
3 identify cyclical behaviour
4 combine these 3 in a certain way to create a model
5 assess model error
6 use the model to make forecasts
randomness is the only
non systematic term
we need to keep refining the model until we achieve
a low error model, so that as little variation as possible is left to randomness
low error model is a
model that leaves very little unexplained variation (randomness), so most of the variation in the time series is explained by the systematic part (the first 3 characteristics), which gives more accurate forecasts
trend can be hard to see due to
the fluctuation caused by seasonality and/or cyclical behaviour
to identify general trend of our data we need to first
smooth or remove the effect of the seasonal fluctuations
the method used to smooth is called the moving average
which takes the average value of our data at different points in time (hence moving)
there are 2 ways to use the moving average
simple moving average and centred moving average
simple moving average
a k-period SMA is an average taken over the previous k periods and plotted at the period that follows them; we then shift everything along by 1 period to take the next average, and so on. the idea is to smooth out the seasonal effect or spikes in the data, making the trend much easier to identify
and then use to forecast future values
k = 2: take the first 2 periods, calculate their average and use it as the SMA for period 3; then average periods 2 and 3 and put the result at period 4
k = 3: average the first 3 periods to get the SMA for period 4; then average periods 2, 3 and 4 to get the SMA for period 5
disadvantage - it uses only past values
we take the average by adding all the values, then
dividing by the number of data points
in Excel we use =AVERAGE(D2:D4)
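The SMA rule in the cards above (average the previous k periods and place the result at the next period) could be sketched in Python roughly as follows. The function name `simple_moving_average` and the sales figures are made up for illustration; they are not from the course material.

```python
def simple_moving_average(values, k):
    """Return a list aligned with `values`.

    Entry t holds the mean of the k values before period t,
    or None when fewer than k earlier values exist.
    """
    sma = [None] * len(values)
    for t in range(k, len(values)):
        window = values[t - k:t]  # the k periods just before period t
        sma[t] = sum(window) / k
    return sma


# Made-up weekly sales for periods 1 to 5.
sales = [10, 20, 30, 40, 50]

print(simple_moving_average(sales, 2))  # first SMA appears at period 3
print(simple_moving_average(sales, 3))  # first SMA appears at period 4
```

With k = 2 the first two periods have no SMA, which matches the card: the average of periods 1 and 2 is placed at period 3.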
centred moving average
average taken over k periods which is then plotted in the centre of those values, we then shift everything along by 1 period to take the next average and so on
the advantage is that it removes the ‘lag’ of the simple moving average, but it has an issue that the SMA doesn’t: the calculation differs depending on whether k is odd or even
so basically
CMA odd k
odd k such as 3, 5, 7 is more straightforward
if k = 3 we take the average of the first 3 observations and plot it at period 2, then take the average of periods 2, 3, 4 and plot it at period 3
if k = 7 the first average goes under period 4, which is the middle because there are 3 periods before it and 3 after
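For odd k, the centred moving average described above could be sketched in Python like this. The function name `centred_moving_average_odd` and the data are illustrative assumptions, not part of the course notes.

```python
def centred_moving_average_odd(values, k):
    """Centred moving average for odd k.

    Entry t holds the mean of the k values centred on period t,
    or None near the ends where a full window does not fit.
    """
    if k % 2 == 0:
        raise ValueError("this sketch only handles odd k")
    half = k // 2  # number of periods on each side of the centre
    cma = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        cma[t] = sum(window) / k
    return cma


# Made-up observations for periods 1 to 5.
data = [12, 15, 9, 18, 21]

print(centred_moving_average_odd(data, 3))  # first CMA appears at period 2
```

With k = 3 the first and last periods have no CMA, because there is no observation on one side of them.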
CMA even k = 4
in Excel: =(D2+2*SUM(D3:D5)+D6)/8
even k such as 2, 4, 6 is more complicated: if there are 4 periods, which one is the middle?
so we take an average of averages: the average of the first 4 observations is centred at period 2.5, the average of observations 2-5 is centred at period 3.5, and the average of these two averages is centred at period 3
so basically: take the average of the first 4 periods, THEN take the average of periods 2 to 5 (4 periods again), and then take the average of these 2 averages (add them and divide by 2)
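The "average of averages" for even k could be sketched in Python as below. The function name `centred_moving_average_even` and the data are assumptions made for illustration. For k = 4 the result equals (first value + 2 × the three middle values + last value) / 8, which is the same weighting as the Excel formula for a 5-cell range.

```python
def centred_moving_average_even(values, k):
    """Centred moving average for even k, as an average of two averages.

    Entry t holds the mean of two k-period averages: one centred half a
    period before t and one centred half a period after t.
    """
    if k % 2 != 0:
        raise ValueError("this sketch only handles even k")
    half = k // 2
    cma = [None] * len(values)
    for t in range(half, len(values) - half):
        first = sum(values[t - half:t + half]) / k           # centred at t - 0.5
        second = sum(values[t - half + 1:t + half + 1]) / k  # centred at t + 0.5
        cma[t] = (first + second) / 2
    return cma


# Made-up observations for periods 1 to 6.
data = [10, 20, 30, 40, 50, 60]

print(centred_moving_average_even(data, 4))  # first CMA appears at period 3
```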