Chapter 4 Flashcards

1
Q

feature engineering time domain:

> what is lambda?

> what does lambda = 1 mean?

A

lambda is the window size: it expresses the number of discrete time steps considered

> lambda = 1: consider two instances, the current instance and the one before it
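
A minimal sketch of what the window means, assuming a plain Python list of sensor values (all names are illustrative):

```python
# a window size lambda means each instance is built from the current
# time step plus the lambda preceding ones (lambda + 1 points in total)
values = [3, 5, 4, 6, 8, 7]
lam = 1  # lambda = 1: the current instance and the one before it

for t in range(lam, len(values)):
    window = values[t - lam : t + 1]  # two points when lambda = 1
    print(t, window)
```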

2
Q

time domain: categorical data

> what are two types of temporal patterns?

A
  1. succession - one value occurs before the other
  2. co-occurrence - values occur at the same time point
3
Q

time domain: categorical features

> what is meant by the “support” of the temporal patterns in our data?

> how to compute it?

A

support: how often the pattern occurs in the data, relative to the total number of time points in our data

> for all instances, check whether the pattern occurs within the selected window size
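
A hedged sketch of this check for a co-occurrence pattern “a AND b”, assuming each time point carries a set of event labels (toy data, illustrative names):

```python
# per-time-point event sets, window size lambda = 1
labels = [{"a"}, {"a", "b"}, {"b"}, {"a", "b"}, {"a"}]
lam = 1

def pattern_in_window(window):
    # co-occurrence "a AND b": both occur at the same time point
    return any({"a", "b"} <= step for step in window)

# the first lambda instances are skipped (insufficient history)
windows = [labels[t - lam : t + 1] for t in range(lam, len(labels))]
support = sum(pattern_in_window(w) for w in windows) / len(windows)
print(support)  # fraction of windows containing the pattern
```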

4
Q

time domain: categorical

> how to generate valid patterns?

> why is this useful/valid?

A
  1. define a minimal support threshold theta
  2. generate all possible patterns of size 1 that meet theta
  3. iteratively extend the surviving patterns by 1 until the desired size k is reached

> this is much more efficient than simply checking all possible combinations

> the support of a new k-pattern can never be greater than the support value of the least supported subpattern it includes
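
A rough sketch of the generation loop under these assumptions; `support()` is a placeholder for the computation from the previous card, not a library function:

```python
def generate_patterns(items, support, theta, k):
    # keep only size-1 patterns that meet the minimal support theta
    patterns = [frozenset([i]) for i in items if support(frozenset([i])) >= theta]
    frequent = list(patterns)
    for _ in range(k - 1):
        extensions = set()
        for p in patterns:
            for i in items:
                candidate = p | {i}
                # pruning: a k-pattern can never beat its least supported sub-pattern,
                # so only already-frequent patterns are worth extending
                if len(candidate) == len(p) + 1 and support(candidate) >= theta:
                    extensions.add(candidate)
        patterns = list(extensions)
        frequent.extend(patterns)
    return frequent
```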

5
Q

calculating support: what to do with the first lambda instances?

A

ignore the first lambda instances, as they do not have sufficient history available

6
Q

feature engineering: time domain

> how to handle mixed data?

A

derive categorical features from numerical features: two methods:

  1. if certain value ranges are known, map values to them (low, normal, high)
  2. if no such information is available, calculate the slope over the window

> if the slope is above a certain threshold: increasing/decreasing

> else stable
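
A small sketch of the slope approach, assuming numpy; the threshold value is a free parameter chosen purely for illustration:

```python
import numpy as np

def categorize_trend(window, threshold=0.1):
    # least-squares slope of the values against the time index
    slope = np.polyfit(np.arange(len(window)), window, 1)[0]
    if slope > threshold:
        return "increasing"
    if slope < -threshold:
        return "decreasing"
    return "stable"

print(categorize_trend([1.0, 1.2, 1.4, 1.7]))  # -> increasing
```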

7
Q

FT: what is the base frequency?

A

base frequency:

f0 = 2*pi / (lambda + 1)

> lambda + 1 is the number of data points we consider

> 2*pi is one full sinusoid period

> base frequency is the lowest frequency that can fit a whole period into our window
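
A quick numeric illustration, with lambda = 9 chosen arbitrarily:

```python
import numpy as np

lam = 9                     # lambda + 1 = 10 data points in the window
f0 = 2 * np.pi / (lam + 1)  # base (angular) frequency

t = np.arange(lam + 1)
base = np.sin(f0 * t)       # completes exactly one full period over the window
```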

8
Q

FT: why do we need lambda + 1 frequencies to represent our original sequence?

A

0 * f0

1 * f0

…

lambda * f0

>>> that makes lambda + 1 frequencies in total, because the count starts at zero
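
A one-liner illustration of the counting argument:

```python
lam = 3
multiples = list(range(lam + 1))  # 0 * f0, 1 * f0, ..., lambda * f0
print(len(multiples))             # lambda + 1 = 4: the count starts at zero
```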

9
Q

FT: which kinds of features can we derive from FT?

A

frequency domain features:

  1. amplitude: the frequency with the highest amplitude is the most important frequency in the considered window
  2. frequency weighted signal average: the average of the frequencies in the considered window, weighted by their amplitudes
  3. power spectral entropy: describes how much information is contained within the signal

> whether there are one or a few discrete frequencies standing out of all others
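
A hedged numpy sketch of the three features for a single toy window:

```python
import numpy as np

window = np.array([1.0, 2.0, 1.5, 0.5, 1.0, 2.5, 1.5, 0.5])

spectrum = np.fft.rfft(window)
amps = np.abs(spectrum)
freqs = np.fft.rfftfreq(len(window))

# 1. frequency with the highest amplitude (skipping the DC component at index 0)
peak_frequency = freqs[1:][np.argmax(amps[1:])]

# 2. frequency weighted signal average
weighted_avg = np.sum(freqs * amps) / np.sum(amps)

# 3. power spectral entropy over the normalized power spectral density
psd = amps ** 2 / np.sum(amps ** 2)
entropy = -np.sum(psd * np.log2(psd + 1e-12))
```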

10
Q

unstructured data: preprocessing steps

> 4 steps in order to extract attributes from words

A
  1. tokenization

> identify sentences and words within sentences

  2. lower case

> change uppercase to lowercase

  3. stemming

> identify the stem of each word to reduce words to their stem

> map all different variations to a single term

  4. stop word removal

> remove known stop words, as they are not likely to be predictive
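
A toy end-to-end sketch of the four steps; the stop-word list and the suffix-stripping “stemmer” are deliberately naive stand-ins for what a library such as NLTK would provide:

```python
STOP_WORDS = {"the", "a", "is", "are", "and"}

def preprocess(text):
    tokens = text.split()                                        # 1. tokenization (naive)
    tokens = [t.lower() for t in tokens]                         # 2. lower case
    tokens = [t[:-1] if t.endswith("s") else t for t in tokens]  # 3. stemming (toy)
    return [t for t in tokens if t not in STOP_WORDS]            # 4. stop word removal

print(preprocess("The sensors are logging steps"))  # ['sensor', 'logging', 'step']
```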

11
Q

explain bag of words

A

bag of words:

  1. define n-grams of words (unigrams, bigrams, etc.)
  2. count the number of occurrences of each n-gram in the text, irrespective of the order of appearance
  3. the value for the new attribute is the number of occurrences for that text

> can be binary with only true (occurring in the text) and false (not occurring in the text)
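
A short sketch of the n-gram counting, assuming the text has already been preprocessed into tokens:

```python
from collections import Counter

def bag_of_words(tokens, n=1):
    # count n-grams irrespective of where in the text they appear
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(ngrams)

tokens = ["sensor", "logging", "step", "sensor"]
print(bag_of_words(tokens, n=1))  # unigram counts
print(bag_of_words(tokens, n=2))  # bigram counts
```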

12
Q

explain TF-IDF

(term frequency inverse document frequency)

A
  1. do bag of words >>> term frequency in the document = tf
  2. normalize: divide the total number of instances by the number of instances that contain the n-gram = idf

> the higher the number, the more unique the n-gram is

  3. compute tf * idf = tf_idf

> n-grams that are unique are weighted more

> this avoids very frequent words becoming dominant in our attributes
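
A sketch over a toy corpus, assuming the common variant in which idf is the logarithm of the ratio described above:

```python
import math

docs = [["sensor", "step"], ["sensor", "sleep"], ["step", "step"]]

def tf_idf(term, doc, docs):
    tf = doc.count(term)                    # bag-of-words count in this document
    containing = sum(term in d for d in docs)
    idf = math.log(len(docs) / containing)  # rarer n-gram -> higher idf
    return tf * idf

print(tf_idf("step", docs[2], docs))  # 2 * log(3/2)
```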

13
Q

explain topic modeling

A

topic modeling: extract more high-level topics from text

  1. assume W words (generated by a Poisson distribution) and a distribution over topics
  2. for each word in W, select a topic based on the probabilities
  3. for each word, assume its own topic assignment is wrong but all other assignments are correct
  4. probabilistically reassign word w to a topic based on

> which topics are present in the document

> the number of times word w is assigned to a particular topic

  5. repeat

>>> create one attribute per topic and assign a value based on the observed frequencies of words and the weights assigned to the words for the topic
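
In practice one would typically fit an off-the-shelf LDA implementation rather than code the sampling loop by hand; a hedged sketch using scikit-learn, with toy texts and arbitrary parameters:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

texts = ["sensor step walk", "sleep rest night", "walk step run"]

counts = CountVectorizer().fit_transform(texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_attributes = lda.fit_transform(counts)  # one attribute (probability) per topic
print(topic_attributes.shape)                 # (3 documents, 2 topics)
```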

14
Q

why are overlapping windows an issue?

> solution?

A

overlapping windows are highly correlated

> each window differs in just one time point from its adjacent instances

> this is likely to cause overfitting

solution: set a maximum overlap for windows and remove instances for which this criterion is not met

(typically 50% overlap is allowed)
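
A minimal sketch of the fix: stepping half a window ahead keeps at most 50% overlap between the retained windows:

```python
values = list(range(20))  # toy series
window_size = 4
step = window_size // 2   # advance half a window -> at most 50% overlap

windows = [values[t : t + window_size]
           for t in range(0, len(values) - window_size + 1, step)]
print(windows[:3])  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```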
