Week 9: Time Series, Imbalanced Data & Fairness Flashcards by Annie Clarnette

REVERSED

F1 = (2*R*P)/(R+P)

What is the F1 measure?

REVERSED

Sampling data from a stream
Queries over sliding windows
Counting distinct elements

What are 3 problems with a data stream?

REVERSED

AEO(diff) = [(P1-P2) + (P3-P4)] /2

What is AEO(diff)?

REVERSED

Maintain a count of the number of distinct elements seen so far

What is counting distinct elements?

REVERSED

A single sensitive (protected) attribute defining demographic groups
Find privileged and unprivileged groups based on the sensitive attributes and the decision label
Checking parity between demographic groups
Cannot always identify hidden unfairness

What is statistical fairness?

REVERSED

Store the first s elements of the stream in the sample S
Suppose we have seen n-1 elements and the nth element now arrives (n > s)
With probability s/n, keep the nth element; otherwise discard it
If the nth element is kept, it replaces one of the s elements in S, picked uniformly at random

What is reservoir sampling?
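
The steps on this card can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function name `reservoir_sample` and the optional `rng` parameter are assumptions made for the example.

```python
import random


def reservoir_sample(stream, s, rng=None):
    """Return a uniform random sample of at most s items from an iterable."""
    rng = rng or random.Random()  # hypothetical hook so results can be seeded
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= s:
            # Store the first s elements of the stream.
            sample.append(item)
        elif rng.random() < s / n:
            # Keep the nth element with probability s/n and let it
            # replace a slot of S chosen uniformly at random.
            sample[rng.randrange(s)] = item
    return sample
```

For example, `reservoir_sample(range(1000), 10)` returns 10 distinct values from the stream while only ever holding 10 items in memory.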

REVERSED

might introduce artificial minority class examples too deeply in the majority class space

What is a problem with SMOTE?

REVERSED

Cost is the penalty associated with an incorrect prediction; the goal is to minimise the total cost
Based on the classifier's predicted probabilities
Traditional binary case: predict positive if the probability is > 0.5
The probability threshold can be changed using a cost matrix
Classify as positive if: probability of positive > FP/(FP+FN), where FP and FN are the costs of a false positive and a false negative

What is cost sensitive classification?
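
A small sketch of the threshold rule on this card, assuming FP and FN denote the costs of a false positive and a false negative taken from the cost matrix. The function names are illustrative only.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability threshold FP / (FP + FN) derived from the two error costs."""
    return cost_fp / (cost_fp + cost_fn)


def classify(prob_positive, cost_fp, cost_fn):
    """Predict 1 (positive) when the probability exceeds the cost-based threshold."""
    return 1 if prob_positive > cost_threshold(cost_fp, cost_fn) else 0
```

With equal costs the threshold is the traditional 0.5. If a false negative is four times as costly as a false positive, the threshold drops to 0.2, so more examples are classified as positive.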

REVERSED

Define multiple subgroups in a dataset, check parity between these subgroups
A statistical constraint is needed

What is group fairness?

REVERSED

D = {X,S,Y} is a dataset
* X: the set of attributes that do not contain sensitive information regarding individuals
* S: the set of sensitive attributes containing sensitive information
* Y/Y*: the original/predicted class label of an individual (either 0 or 1), which indicates the decision outcome
* G/G’: the values of the unprivileged/privileged group

What are the symbols used for defining fairness metrics?

REVERSED

Divide the data into two equal time ranges
Calculate the average of the observations in each of the two time ranges. Plot the average at the mid-point of each time range.
Draw a straight line between the two points

How does the semi average method work for finding the trend?
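
A rough sketch of the semi-average steps. It assumes the observations are equally spaced with time positions 0, 1, 2, ..., and that the middle observation is dropped when the series has odd length (a common convention that this card does not state). The function name is hypothetical.

```python
def semi_average_trend(values):
    """Return (intercept, slope) of the line through the two half-averages."""
    n = len(values)
    half = n // 2
    # Two equal time ranges; an odd-length series drops its middle value (assumption).
    first, second = values[:half], values[n - half:]
    y1 = sum(first) / half
    y2 = sum(second) / half
    # Mid-point (in time index) of each range.
    x1 = (half - 1) / 2
    x2 = (n - half) + (half - 1) / 2
    # Straight line between the two plotted points.
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope
```

For the series 2, 4, 6, 8 the half-averages are 3 (at time 0.5) and 7 (at time 2.5), giving a trend line with intercept 2 and slope 2.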

REVERSED

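Predictive parity (PP) and equalised odds (EO) need both the original dataset and the model

Demographic parity (DP), disparate impact (DI) and consistency can be computed from either the original dataset or the model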

Which fairness metrics need the original dataset and the model?

REVERSED

Naive forecasting
Simple mean
Moving average
Weighted moving average
Exponential smoothing

What are 5 methods for forecasting the trend?

REVERSED

A smaller a makes the forecast more stable
A larger a makes the forecast more responsive

What do different values of a do for an exponential smoothing forecast?

REVERSED

bias in the training datasets

Where does bias in algorithms come from?

REVERSED

Forecasts are more accurate for aggregated data than for individual items
Forecasts are more accurate for shorter than longer time periods

What makes demand forecasts more accurate?

REVERSED

Series which measure activity over a period up to a specific date, e.g. retail, balance of payments

What is a flow series?

REVERSED

Sensitive attributes should not affect the outcome labels
Identify “proxy” attributes that are related to the protected attributes

What is causal fairness?

REVERSED

Collect more data - difficult in many domains
Delete data from the majority class
Create synthetic data
Adapt your learning algorithm (cost sensitive classification)
Random over/under sampling

What are 5 options for handling imbalanced data?

REVERSED

Take the difference between a sample point and one of its nearest neighbours
Multiply the difference by a random number between 0 and 1 and add it to the feature vector

What are the steps of creating data with SMOTE?
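
The two steps above amount to picking a random point on the line segment between a minority sample and one of its neighbours. A minimal sketch, assuming the neighbour has already been chosen from the sample's nearest minority-class neighbours (the neighbour search is not shown, and the names are illustrative):

```python
import random


def smote_point(x, neighbour, rng=None):
    """Create one synthetic point between feature vectors x and neighbour."""
    rng = rng or random.Random()  # hypothetical hook so results can be seeded
    gap = rng.random()            # random number between 0 and 1
    # Difference to the neighbour, scaled by gap, added to the original vector.
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)]
```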

REVERSED

Balanced accuracy = (sensitivity + specificity)/2

What is the balanced accuracy measure?
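
As a worked illustration, balanced accuracy can be computed from confusion-matrix counts. The function and argument names are assumptions for this sketch.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

With 10 positives (5 found) and 90 negatives (all correct), plain accuracy is 0.95 but balanced accuracy is only 0.75, which exposes the weak performance on the minority class.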

REVERSED

Pick a hash function h that maps each of the N elements to at least log2(N) bits
For each stream element a, let r(a) be the number of trailing 0s in h(a)
r(a) = position of the first 1 bit counting from the right, starting at position 0
Record R = the maximum r(a) seen
Estimated number of distinct elements = 2^R

What is the Flajolet-Martin approach?
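
A simplified sketch of the approach with a single hash function. Using the first 32 bits of SHA-256 as h is an assumption made for the example; any well-mixing hash with enough bits would do, and practical versions combine many hash functions to reduce the variance of the estimate.

```python
import hashlib

HASH_BITS = 32  # assumed hash width for this sketch


def trailing_zeros(x):
    """Number of trailing 0 bits in x (HASH_BITS if x is 0)."""
    if x == 0:
        return HASH_BITS
    return (x & -x).bit_length() - 1


def fm_estimate(stream):
    """Estimate the number of distinct elements as 2^R."""
    max_r = 0
    for a in stream:
        digest = hashlib.sha256(str(a).encode()).digest()
        h = int.from_bytes(digest[:4], "big")  # 32-bit hash value h(a)
        max_r = max(max_r, trailing_zeros(h))  # R = maximum r(a) seen
    return 2 ** max_r
```

Only the single integer R is stored, so repeated elements never change the estimate and memory use does not grow with the stream.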

REVERSED

a is small -> more weight for the past parameters
a is large -> more weight for the present trend

What do a high and low alpha represent in exponential smoothing?

REVERSED

Synthetic Minority Over-sampling Technique (SMOTE)
Creates new data points from the minority class

What is SMOTE?

# REVERSED

Equalised odds (EO) states that instances from protected and unprotected groups should have equal true positive rate (TPR) and false positive rate (FPR)

- P1 = P[Y\*(x) = 1 | S(x) = G’, Y(x) = 1]
- P2 = P[Y\*(x) = 1 | S(x) = G, Y(x) = 1]
- P3 = P[Y\*(x) = 1 | S(x) = G’, Y(x) = 0]
- P4 = P[Y\*(x) = 1 | S(x) = G, Y(x) = 0]
- For a classifier to be fair: P1 = P2 and P3 = P4

What is equalised odds difference?
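
The four probabilities can be estimated from data as simple conditional rates. The sketch below is illustrative: it assumes binary predictions and labels held in parallel lists, and the function names are made up for the example. The last returned value is the average difference AEO(diff) = [(P1-P2) + (P3-P4)]/2.

```python
def rate(preds, labels, groups, group, label):
    """Estimate P[Y* = 1 | S = group, Y = label] from parallel lists."""
    selected = [p for p, y, g in zip(preds, labels, groups)
                if g == group and y == label]
    return sum(selected) / len(selected)  # fails if the subgroup is empty


def equalised_odds(preds, labels, groups, privileged, unprivileged):
    """Return (TPR difference, FPR difference, average of the two)."""
    p1 = rate(preds, labels, groups, privileged, 1)
    p2 = rate(preds, labels, groups, unprivileged, 1)
    p3 = rate(preds, labels, groups, privileged, 0)
    p4 = rate(preds, labels, groups, unprivileged, 0)
    return p1 - p2, p3 - p4, ((p1 - p2) + (p3 - p4)) / 2
```

A classifier that satisfies equalised odds would return values close to 0 for all three.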

# REVERSED

Measures of activity at a point in time, e.g. employment

What is a stock series?

# REVERSED

- Eliminate the discrimination from the final predictions
- Change the predicted outcomes of classifiers by accessing a hold out set that was not involved in the training of the model

What is post-processing for mitigation?

# REVERSED

- The instances in both the protected (unprivileged) and unprotected (privileged) groups should have an equal probability of being predicted as a positive outcome
- DP(diff) = P[Y(x) = 1 | S(x) = G’] - P[Y(x) = 1 | S(x) = G] ≈ 0
- This metric takes values between 0 and 1, where 0 is optimal

What is Demographic Parity (DP) Difference?
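
A minimal sketch of the DP difference, assuming binary outcomes (original labels or model predictions) and a parallel list of group memberships. The names are illustrative.

```python
def positive_rate(outcomes, groups, group):
    """Estimate P[Y = 1 | S = group]."""
    selected = [y for y, g in zip(outcomes, groups) if g == group]
    return sum(selected) / len(selected)


def dp_difference(outcomes, groups, privileged, unprivileged):
    """Positive rate of the privileged group minus that of the unprivileged group."""
    return (positive_rate(outcomes, groups, privileged)
            - positive_rate(outcomes, groups, unprivileged))
```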

# REVERSED

The long term growth or decline of the series

What is trend?

# REVERSED

Inductive hypothesis: after n elements, the sample S contains each element seen so far with probability s/n

Inductive step: for an element already in S, the probability that the algorithm keeps it in S when element n+1 arrives is: ... n/(n+1)

So at time n each tuple in S was there with probability s/n, and at time n+1 it stays in S with probability n/(n+1), so the probability that a tuple is in S at time n+1 is (s/n)\*(n/(n+1)) = s/(n+1)

How do you prove that each element is picked with equal probability in reservoir sampling using mathematical induction?
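
One standard way to obtain the n/(n+1) factor in the inductive step is sketched below. It is offered as a plausible reading of the argument rather than the exact derivation behind this card: an element already in S survives either because element n+1 is discarded, or because element n+1 is kept but overwrites a different slot.

```latex
\begin{align*}
P(\text{stays in } S)
  &= \underbrace{\left(1 - \frac{s}{n+1}\right)}_{\text{element } n+1 \text{ discarded}}
   + \underbrace{\frac{s}{n+1}\cdot\frac{s-1}{s}}_{\text{kept, but another slot replaced}} \\
  &= 1 - \frac{s}{n+1} + \frac{s-1}{n+1}
   = \frac{n}{n+1}, \\[4pt]
P(\text{in } S \text{ at time } n+1)
  &= \frac{s}{n}\cdot\frac{n}{n+1} = \frac{s}{n+1}.
\end{align*}
```

The newly arrived element is itself kept with probability s/(n+1), so every element seen so far is in S with the same probability.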

# REVERSED

- A classifier is fair in terms of predictive parity if the probability that an example is positive in the original dataset, given that it is predicted positive, is the same for both the protected and unprotected groups
- P[Y(x) = 1 | Y\*(x) = 1, S(x) = G] = P[Y(x) = 1 | Y\*(x) = 1, S(x) = G’]

What is predictive parity?

# REVERSED

- It is difficult to get the networks to converge
- The goal is for generator and discriminator to reach some desired equilibrium but this is rare
- GANs are yet to converge on large problems

What are 3 problems with GANs?

# REVERSED

MSE = sum((yt - y\*t)^2) / (T - T1 + 1), summed over t = T1, ..., T

- T = total number of samples in the time series
- T1 = index of the first value to be forecasted
- yt = actual value
- y\*t = predicted value

What is the formula for the MSE for testing forecast accuracy?
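
A short sketch of this MSE, assuming the actual and predicted series are lists of the same length and that T1 is a 1-based index as on the card. The function name is made up for the example.

```python
def forecast_mse(actual, predicted, t1):
    """Mean squared error over t = t1..T, where t1 is a 1-based index."""
    T = len(actual)
    squared_errors = [(actual[t - 1] - predicted[t - 1]) ** 2
                      for t in range(t1, T + 1)]
    return sum(squared_errors) / (T - t1 + 1)
```

For actual values 1, 2, 3, 4 and predictions 0, 0, 2, 6 with T1 = 3, only the last two periods count: (1 + 4) / 2 = 2.5.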

# REVERSED

Random under-sampling: randomly delete data points from the majority class - problem with loss of information

What is random undersampling and its problem?

# REVERSED

Classifiers try to reduce the overall error (increase the accuracy) so they can be biased towards the majority class

What is the imbalanced data problem?

# REVERSED

- Semi-average
- Moving average
- Least-square
- Exponential smoothing

What are 4 common methods for measuring the trend?

# REVERSED

- Data enters at a high rate
- The system cannot store the entire stream, but only a small fraction

What is a data streams model?

# REVERSED

Pre-process the dataset only - try to transform the data so the underlying discrimination is removed

What is pre-processing for mitigation?

# REVERSED

- An individual fairness metric that measures how similar the labels are for similar instances in a dataset, based on the k nearest neighbours of each instance
- Takes values between 0 and 1, where 1 is optimal

What is consistency?

# REVERSED

- Historical bias in the decision variable
- Less informative features
- Biased data collection
- Imbalanced representation of different demographic groups

What are reasons for biased data? (4)

# REVERSED

A pattern of change that recurs regularly over time

What is seasonal variation?

# REVERSED

- Trend
- Seasonal variation
- Cyclical variation
- Irregular variation

What are the 4 components of time series?

# REVERSED

Next period’s forecast = average of all previously observed data

Yt+1 = (Y1 + Y2 + … + Yt)/t

What is simple mean forecasting?

# REVERSED

Next period’s forecast = previous period’s actual

Yt+1 = Yt

What is naive forecasting?

# REVERSED

Given a set of points (xi, yi), find the best fitting line f(xi) = a + bxi such that SSE = sum((yi - f(xi))^2) is minimised

How does least squares linear regression work for finding the trend?

# REVERSED

Cyclical variations have recurring patterns but with a longer and more erratic time scale compared to seasonal variations

What is cyclical variation?

# REVERSED

- Huge volumes of continuous data, possibly infinite
- Fast changing and requires fast, real-time response
- Random access is expensive, so single-scan algorithms are needed

What are characteristics of a data streams model?

# REVERSED

- An irregular (or random) variation in a time series occurs over varying (usually short) periods
- It follows no pattern and is by nature unpredictable
- Irregular variation cannot be explained mathematically

What is irregular variation?

# REVERSED

Random oversampling: randomly duplicate data points from the minority class - problem with overfitting and fixed boundaries

What is random oversampling and its problem?

# REVERSED

Must split the data into train/test sets and perform preprocessing on just the training data

What do you have to do when performing SMOTE?

# REVERSED

- Keep the most recent k items
- Upon the arrival of a new item from the stream, discard the oldest item

What is the sliding window model for data streams?
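
In Python, one simple way to sketch this is with `collections.deque` and its `maxlen` argument, which discards the oldest item automatically once the window holds k items. The generator name is illustrative.

```python
from collections import deque


def sliding_window(stream, k):
    """Yield the current window (most recent k items) after each arrival."""
    window = deque(maxlen=k)  # the oldest item is dropped when full
    for item in stream:
        window.append(item)
        yield list(window)
```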

# REVERSED

- Past history is used to flatten out short term fluctuations
- Sx = ay + (1-a)Sx-1
- Sx = the smoothed value for observation x
- y = the actual observation at time x
- Sx-1 = the smoothed value previously calculated for the observation at time x-1
- a = the smoothing constant, where 0 \<= a \<= 1

What is the formula for exponential smoothing for finding the trend?

# REVERSED

- Adjust the time series
- Seasonally adjusted data = actual values / seasonal index \* 100

How do you remove the seasonal effect?

# REVERSED

- Root mean squared error (RMSE)
- Mean absolute error (MAE)
- Tracking signal = sum(yt - y\*t)/MAE

What are 3 other measures for testing forecast accuracy?
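
The three measures can be sketched as below, assuming the errors are taken over all supplied periods and that the tracking signal is the running sum of errors divided by the MAE, as written on this card. The function names are illustrative.

```python
import math


def forecast_errors(actual, predicted):
    """Per-period errors yt - y*t."""
    return [y - f for y, f in zip(actual, predicted)]


def rmse(actual, predicted):
    errors = forecast_errors(actual, predicted)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(actual, predicted):
    errors = forecast_errors(actual, predicted)
    return sum(abs(e) for e in errors) / len(errors)


def tracking_signal(actual, predicted):
    """Sum of signed errors divided by the MAE (undefined if the MAE is 0)."""
    return sum(forecast_errors(actual, predicted)) / mae(actual, predicted)
```

A tracking signal far from 0 suggests the forecast is consistently too low (positive) or too high (negative).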

# REVERSED

1. Determine the number of samples n
2. Allocate the mid point in time (x = 0) and replace the other time points by x values that increase and decrease by one unit from the mid point
3. The dependent variable is "y"
4. Compute sum(xi^2) and sum(xi\*yi), where sum(xi) is 0
5. Find y = a+bx where b = sum(xi\*yi)/sum(xi^2) and a = sum(yi)/n

What are the steps for finding the values of a and b for least squares linear regression?
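
A sketch of these steps in Python. It assumes equally spaced observations and codes time as x_i = i - (n-1)/2, so that sum(x_i) = 0. For an odd n this gives the integer steps described on the card; for an even n it gives half-unit offsets, which is one possible convention and not something the card specifies. The function name is hypothetical.

```python
def centred_trend(y):
    """Return (a, b) for the trend line y = a + b*x with x centred on the mid point."""
    n = len(y)
    mid = (n - 1) / 2
    x = [i - mid for i in range(n)]  # e.g. -2, -1, 0, 1, 2 for n = 5
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    a = sum(y) / n                   # valid because sum(x) is 0
    return a, b
```

For y = 1, 3, 5, 7, 9 this gives sum(xi*yi) = 20 and sum(xi^2) = 10, so b = 2 and a = 5.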

# REVERSED

Regularly spaced peaks and troughs

How can you identify seasonality in a time series?

# REVERSED

Fb = (1+B^2)(R\*P)/(B^2\*P + R)

What is the Fb measure?
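
A small sketch of the Fb formula, with F1 as the special case B = 1. The function name and default argument are assumptions for the example.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With precision 0.5 and recall 1.0, F1 is about 0.67 while F2 is about 0.83, because F2 rewards the high recall more strongly.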

# REVERSED

Estimate the count in an unbiased way. Accept that the count may have a little error, but limit the probability that the error is large

What if you do not have the space to maintain the set of elements?

# REVERSED

- The generator tries to mimic examples from a training dataset, which is sampled from the true data distribution. It does this by transforming a random source of noise received as input into a synthetic sample. The objective of the generative network is to increase the error rate of the discriminative network
- The discriminator receives a sample, but it is not told where the sample comes from. Its job is to predict whether it is a data sample or a synthetic sample. The objective of the discriminative network is to decrease the binary classification loss

What are the roles of the generator and discriminator in GANs?

# REVERSED

- Individuals with similar features except the sensitive (protected) attributes must have the same/similar outcomes
- A similarity/distance measure is needed
- Requires strong assumptions regarding the relationship between features and the decision label

What is individual fairness?

# REVERSED

Suffers from propagation error

What is a problem with exponential smoothing?

# REVERSED

- Adjust/tune the classification algorithm
- Applied during the model training

What is in-processing for mitigation?

# REVERSED

- Maintain a sample S of exactly s items
- Suppose at time n we have seen n items
- Each item is in the sample S with equal probability s/n

What is sampling a fixed sample size?

# REVERSED

A set of observations measured at specified, usually equal time intervals

What is a time series?

# REVERSED

- Also called rolling window
- Next period’s forecast = simple average of the last k periods
- Yt+1 = (Yt-k+1 + Yt-k+2 + … + Yt) / k

What is moving average forecasting? What is another name for it?
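
A minimal sketch of the moving average forecast, assuming the history is a list ordered from oldest to newest. The function name is illustrative.

```python
def moving_average_forecast(history, k):
    """Forecast the next period as the mean of the last k observations."""
    if k < 1 or k > len(history):
        raise ValueError("k must be between 1 and len(history)")
    return sum(history[-k:]) / k
```

Two special cases tie this to other forecasting methods: k = 1 reproduces the naive forecast (the last actual value), and k = len(history) reproduces the simple mean.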

# REVERSED

- Generative adversarial networks (GANs)
- System of two neural networks (generator and discriminator) competing against each other in a zero-sum framework: improvement in one model comes at a cost to the performance of the other model
- Can learn to draw samples from a model that is similar to the original data

What are GANs?

# REVERSED

- A smaller k makes the forecast more responsive
- A larger k makes the forecast more stable

What do different values of k do for a moving average forecast?

# REVERSED

Naive solution: generate a random integer in [0..9] for each query. Store the query if the integer is 0, otherwise discard it

Problem: as the stream grows, the sample size will also grow

What is sampling a fixed proportion? What is the problem with it?

# REVERSED

- Simple average method
- Take the average for each period (period mean) over at least 3 years
- Express each value as an index by comparing it to the average of all periods over the same period of time (divide actual value by period mean to get index)

How do you calculate the seasonal index?
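
One common reading of the simple average method is sketched below: the index for each period is its mean across the years, divided by the mean of all periods, times 100. That interpretation, the input layout (one list per year) and the function names are assumptions for this example. The second function applies the adjustment formula "actual value / seasonal index * 100".

```python
def seasonal_indices(years):
    """years: one list per year, each with the same number of periods."""
    periods = len(years[0])
    # Mean of each period (e.g. each quarter) across all years.
    period_means = [sum(year[p] for year in years) / len(years)
                    for p in range(periods)]
    grand_mean = sum(period_means) / periods
    return [100 * m / grand_mean for m in period_means]


def seasonally_adjust(value, index):
    """Remove the seasonal effect from one observation."""
    return value / index * 100
```

For three years of half-yearly data `[[10, 30], [10, 30], [10, 30]]` the indices are 50 and 150, and both observations adjust to roughly the same underlying level of 20.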

# REVERSED

A high degree of irregularity in the original or seasonally adjusted series, or an abrupt change in the time series characteristics of the original data

What can cause the usefulness of trend estimates to decline?

# REVERSED

- The discriminator becomes too strong too quickly and the generator ends up not learning anything
- The generator only learns very specific weaknesses of the discriminator
- The generator learns only a very small subset of the true data distribution

What are 3 of the ways that GANs can fail?

# REVERSED

- Fairness through unawareness: deletes the sensitive attributes in a dataset
- Preferential sampling (re-sampling): data objects are sampled with replacement
- Massaging (relabeling): changes the actual class labels of some of the instances in the training set
- Reweighing: assigns weights to each instance in the training set

What are 4 examples of pre-processing for mitigation?

# REVERSED

Next period’s forecast = weighted average of the last k periods

Yt+1 = c1Yt-k+1 + … + ckYt, with c1 + c2 + … + ck = 1

What is weighted moving average forecasting?
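
A brief sketch of the weighted moving average, assuming the weights are listed from the oldest of the last k periods (c1) to the most recent (ck). The function name is illustrative.

```python
def weighted_moving_average_forecast(history, weights):
    """Forecast the next period as a weighted average of the last k values."""
    k = len(weights)
    if k > len(history):
        raise ValueError("need at least as many observations as weights")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(c * y for c, y in zip(weights, history[-k:]))
```

For the history 10, 20, 30 and weights 0.2, 0.3, 0.5 the forecast is 2 + 6 + 15 = 23, which leans towards the most recent value.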

# REVERSED

- Based on the premise that if values in a time series are averaged over a sufficient period, the effect of short term variations will be reduced
- The degree of smoothing can be controlled by selecting the number of cases to be included in the average
- A 5-year moving average: for each year, take the average of the 2 previous years, the current year and the 2 following years. This is the moving average for that year. Compute it for each year and plot.

How does the moving average method work for finding the trend?

# REVERSED

The majority of the data comes from one class

What is imbalanced data?

# REVERSED

- The ratio between the probabilities of the protected and unprotected groups getting positive or desired outcomes
- DI(D) = P[Y(x) = 1 | S(x) = G] / P[Y(x) = 1 | S(x) = G’]
- A dataset or a classifier is considered fair (by law) if its DI ratio is between 0.8 and 1.25 (1 is optimal)

What is Disparate Impact (DI) ratio?
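
A minimal sketch of the DI ratio and the 0.8 to 1.25 range quoted on this card. It assumes binary outcomes and a parallel list of group memberships, and the function names are made up for the example.

```python
def positive_rate(outcomes, groups, group):
    """Estimate P[Y = 1 | S = group]."""
    selected = [y for y, g in zip(outcomes, groups) if g == group]
    return sum(selected) / len(selected)


def di_ratio(outcomes, groups, privileged, unprivileged):
    """Positive rate of the unprivileged group over that of the privileged group."""
    return (positive_rate(outcomes, groups, unprivileged)
            / positive_rate(outcomes, groups, privileged))


def within_di_range(ratio, low=0.8, high=1.25):
    """True when the ratio falls inside the range the card calls fair."""
    return low <= ratio <= high
```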

# REVERSED

Next period’s forecast = weighted average of the previous reading and the history

Yt+1 = aYt + (1-a)Y\*t

Y\*t is the prediction for Yt from exponential smoothing

What is exponential smoothing for trend forecasting?
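
A sketch of the recursive forecast. The card does not say how the first prediction is initialised; this example assumes it equals the first observation, which is a common but not universal choice. The function name is illustrative.

```python
def exp_smoothing_forecast(history, a):
    """One-step-ahead forecast using Y(t+1) = a*Y(t) + (1 - a)*Y*(t)."""
    if not 0 <= a <= 1:
        raise ValueError("a must be between 0 and 1")
    forecast = history[0]  # assumed initial prediction
    for y in history:
        # Blend the latest actual value with the previous prediction.
        forecast = a * y + (1 - a) * forecast
    return forecast
```

With a = 1 the result is the naive forecast (the last actual value), while a = 0 never moves away from the initial prediction, which matches the idea that a larger a is more responsive and a smaller a is more stable.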

(77 cards)