feature engineering: time domain
> what is lambda?
> what does lambda = 1 mean?
lambda is the window size: it expresses the number of discrete time steps considered
> lambda = 1: consider two instances, the current instance and the instance before it
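> a minimal Python sketch (hypothetical values): with lambda = 1, each feature is computed over the current instance and the one before it
```python
import numpy as np

values = np.array([3.0, 4.0, 6.0, 5.0, 7.0])  # example numerical series
lam = 1
window = lam + 1  # lambda + 1 = 2 instances per window

# mean over the current instance and the lam preceding ones
windowed_mean = [values[t - lam:t + 1].mean() for t in range(lam, len(values))]
print(windowed_mean)  # [3.5, 5.0, 5.5, 6.0]
```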
time domain: categorical data
> what are two types of temporal patterns?
1. succession - one value occurs before the other
2. co-occurrence - values occur at the same time point
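> a sketch of checking both pattern types within a window (hypothetical attributes "activity" and "location"):
```python
activity = ["sit", "walk", "walk", "run", "sit"]
location = ["home", "home", "work", "work", "home"]
lam = 2

def co_occurrence(t, a_val, b_val):
    # pattern occurs if both values appear at the same time point inside the window
    return any(activity[i] == a_val and location[i] == b_val
               for i in range(t - lam, t + 1))

def succession(t, a_val, b_val):
    # pattern occurs if a_val appears strictly before b_val inside the window
    return any(activity[i] == a_val and location[j] == b_val
               for i in range(t - lam, t + 1)
               for j in range(i + 1, t + 1))

print(co_occurrence(4, "walk", "work"))  # True: both at t = 2
print(succession(4, "run", "home"))      # True: run at t = 3, home at t = 4
```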
time domain: categorical features
> what is meant by the "support" of the temporal patterns in our data
> how to compute it?
support: the fraction of time points in our data at which the pattern occurs
> for all instances, check whether the pattern occurs within the selected window
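> a sketch of the support computation for one example pattern (hypothetical data):
```python
activity = ["sit", "walk", "walk", "run", "sit", "walk", "run", "sit"]
lam = 1

def pattern_in_window(t):
    # example pattern: "walk" followed by "run" within the window {t-lam, ..., t}
    window = activity[t - lam:t + 1]
    return any(window[i] == "walk" and "run" in window[i + 1:]
               for i in range(len(window)))

# only time points with a full history are considered
valid_points = range(lam, len(activity))
support = sum(pattern_in_window(t) for t in valid_points) / len(valid_points)
print(support)  # 2/7: the pattern occurs in the windows ending at t = 3 and t = 6
```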
time domain: categorical
> how to generate valid patterns?
> why is this useful/valid?
1. define minimal support threshold theta
2. generate all possible patterns of size 1 that meet theta
3. iteratively extend possible patterns by 1 until desired size k
> this is much more efficient than simply checking all possible combinations
> valid because the support of a new k-pattern can never be greater than the support of the least supported subpattern it includes, so extending only frequent patterns cannot miss any pattern that meets theta
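> a sketch of the generation loop; compute_support is an assumed helper that returns the support of a candidate pattern:
```python
def generate_patterns(basic_patterns, theta, k, compute_support):
    # step 2: all size-1 patterns that meet the minimal support threshold theta
    frequent = [[p] for p in basic_patterns if compute_support([p]) >= theta]
    all_frequent = list(frequent)
    # step 3: iteratively extend frequent patterns by one element up to size k
    for _ in range(k - 1):
        candidates = [pat + [p] for pat in frequent for p in basic_patterns]
        # pruning: a k-pattern can never have higher support than its subpatterns,
        # so only extensions of already-frequent patterns are considered
        frequent = [c for c in candidates if compute_support(c) >= theta]
        all_frequent.extend(frequent)
    return all_frequent
```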
calculating support: what to do with the first lambda instances?
ignore the first lambda instances, as they do not have sufficient history available
feature engineering: time domain
> how to handle mixed data?
derive categorical features from numerical features: two methods:
1. if meaningful ranges are known, map values to categories (e.g. low, normal, high)
2. if no such information is available, calculate the slope over the window
> if the slope is above/below a certain threshold: increasing/decreasing
> else: stable
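> a sketch of the slope-based approach (hypothetical threshold of 0.1), using a least-squares fit over the window:
```python
import numpy as np

def slope_category(window_values, threshold=0.1):
    x = np.arange(len(window_values))
    slope = np.polyfit(x, window_values, 1)[0]  # first-degree fit, slope coefficient
    if slope > threshold:
        return "increasing"
    if slope < -threshold:
        return "decreasing"
    return "stable"

print(slope_category([1.0, 1.2, 1.5, 1.9]))    # increasing
print(slope_category([2.0, 2.01, 1.99, 2.0]))  # stable
```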
FT: what is the base frequency?
base frequency:
f0 = 2*pi / (lambda + 1)
> lambda + 1 is the number of data points we consider
> 2*pi is one full sinusoid period
> base frequency is the lowest frequency that can fit a whole period into our window
FT: why do we need lambda +1 frequencies to represent our original sequence?
0 * f0
1 * f0
...
lambda * f0
>>> since the count starts at zero, this gives lambda + 1 frequencies in total
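> a small sketch of the base frequency and the resulting frequency list (example lambda = 9):
```python
import numpy as np

lam = 9                      # window covers lambda + 1 = 10 data points
f0 = 2 * np.pi / (lam + 1)   # lowest frequency fitting one full period in the window
frequencies = [k * f0 for k in range(lam + 1)]  # k = 0, 1, ..., lambda
print(len(frequencies))      # 10 frequencies, because the count starts at k = 0
```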
FT: which kinds of features can we derive from FT?
frequency domain features:
1. amplitude: the frequency with the highest amplitude is the most important frequency in the considered window
2. frequency weighted signal average: the average of the frequencies weighted by their amplitudes within the considered window
3. power spectral entropy: describes how much information is contained in the signal
> i.e. whether one or a few discrete frequencies stand out from all the others
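> a sketch computing all three features from the FFT of one window (hypothetical example signal):
```python
import numpy as np

window = np.sin(2 * np.pi * np.arange(40) / 8)  # example signal, dominant period of 8 samples
amplitudes = np.abs(np.fft.rfft(window))
freqs = np.fft.rfftfreq(len(window))

# 1. frequency with the highest amplitude
dominant_freq = freqs[np.argmax(amplitudes)]

# 2. frequency weighted signal average
weighted_avg_freq = np.sum(freqs * amplitudes) / np.sum(amplitudes)

# 3. power spectral entropy
psd = amplitudes ** 2
psd_norm = psd / np.sum(psd)
spectral_entropy = -np.sum(psd_norm * np.log(psd_norm + 1e-12))

print(dominant_freq, weighted_avg_freq, spectral_entropy)
```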
unstructured data: preprocessing steps
> 4 steps in order to extract attributes from words
1. tokenization
> identify sentences and words within sentences
2. lower case
> change uppercase to lowercase
3. stemming
> reduce each word to its stem
> map all inflected variations to a single term
4. stop word removal
> remove known stop words as they are not likely to be predictive
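> a sketch of the four steps using NLTK (assumes the required resources, e.g. 'punkt' and 'stopwords', have been downloaded via nltk.download()):
```python
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords

text = "The runner was running quickly. Runners often run daily."

# 1. tokenization: sentences, then words within sentences
tokens = [w for s in sent_tokenize(text) for w in word_tokenize(s)]
# 2. lower case
tokens = [w.lower() for w in tokens]
# 3. stemming: map variations (run, running, runner) to a single stem
stemmer = PorterStemmer()
tokens = [stemmer.stem(w) for w in tokens]
# 4. stop word removal (and dropping punctuation tokens)
stop_words = set(stopwords.words("english"))
tokens = [w for w in tokens if w.isalpha() and w not in stop_words]
print(tokens)
```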
explain bag of words
bag of words:
1. define n-grams of words (unigrams, bigrams, etc.)
2. count the number of occurrences of each n-gram in the text, irrespective of the order of appearance
3. the value of the new attribute is the number of occurrences in that text
> can also be binary: true (occurs in the text) or false (does not occur in the text)
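> a sketch of bag of words over unigrams and bigrams (hypothetical toy document):
```python
from collections import Counter

def bag_of_words(tokens, n=1):
    # build the n-grams and count them, irrespective of order of appearance
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(ngrams)

doc = ["run", "fast", "run", "slow"]
print(bag_of_words(doc, n=1))  # {'run': 2, 'fast': 1, 'slow': 1}
print(bag_of_words(doc, n=2))  # {'run fast': 1, 'fast run': 1, 'run slow': 1}
```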
explain TF-IDF
(term frequency inverse document frequency)
1. do bag of words >>> term frequency in the document = tf
2. normalize: divide the total number of instances by the number of instances that contain the n-gram = idf
> the higher the number, the more unique the n-gram is
3. compute tf * idf = tf_idf
> n-grams that are unique are weighted more
> this avoids very frequent words becoming dominant in our attributes
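> a sketch implementing TF-IDF exactly as described above (hypothetical toy corpus); note that most implementations take the logarithm of the idf ratio:
```python
docs = [["run", "fast", "run"],
        ["walk", "fast"],
        ["run", "walk", "walk"]]

def tf_idf(term, doc, docs):
    tf = doc.count(term)                            # term frequency in this document
    containing = sum(1 for d in docs if term in d)  # documents containing the term
    idf = len(docs) / containing                    # higher = more unique n-gram
    return tf * idf

print(tf_idf("run", docs[0], docs))   # 2 * (3/2) = 3.0
print(tf_idf("fast", docs[0], docs))  # 1 * (3/2) = 1.5
```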
explain topic modeling
topic modeling: extract higher-level topics from text
1. assume W words (number generated by a Poisson distribution) and a distribution over topics
2. for each word in W, select a topic based on the topic probabilities
3. for each word, assume its current topic assignment is wrong but all other assignments are correct
4. probabilistically reassign word w to a topic based on
> which topics are present in the document
> the number of times word w is assigned to a particular topic
5. repeat until the assignments stabilize
>>> create one attribute per topic and assign a value based on the observed frequencies of words and the weights assigned to the words for the topic
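> a sketch using scikit-learn's LDA implementation (hypothetical toy corpus); the per-document topic distribution gives one attribute per topic:
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["heart rate high during run",
        "slept badly woke up tired",
        "long run high heart rate",
        "tired after poor sleep"]

counts = CountVectorizer().fit_transform(docs)   # bag of words representation
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_values = lda.fit_transform(counts)         # one column (attribute) per topic
print(topic_values.shape)                        # (4, 2)
```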
why are overlapping windows an issue?
> solution?
overlapping windows are of course highly correlated
> adjacent windows differ in just a single time point
> this is likely to cause overfitting
solution: set a maximum overlap for windows and remove instances for which this criterion is not met
(typically 50% overlap is allowed)
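> a sketch of selecting window end points so that consecutive windows overlap by at most 50% (hypothetical series length):
```python
lam = 9                       # window size lambda + 1 = 10
n = 100                       # number of time points
max_overlap = 0.5
step = max(1, int((lam + 1) * (1 - max_overlap)))  # shift by at least half a window

selected = list(range(lam, n, step))
print(selected[:5])           # [9, 14, 19, 24, 29]: each window shares 5 of 10 points
```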