Final Prep COPY Flashcards

2nd Half Semester Information

1
Q

Unsupervised Learning

A

Make sense of unlabeled data.
Data description: find a more compact way to describe the data.

Versus the function approximation of supervised learning, which uses labeled data.

2
Q

Basic Clustering Problem

A

Given:
a set of objects X
inter-object distances D(x, y)
with D(x, y) = D(y, x)

Output:
a partition function P_D, with P_D(x) = P_D(y) when x and y belong to the same cluster
the partition function outputs a compact descriptor of all equivalent data

3
Q

Single Linkage Clustering (SLC)

A

Start with a space of n objects and all pairwise distances.

Connect the two closest points.
Define the distance between two clusters as the distance between their closest points (this "closest pair" rule is what makes it single linkage).
Repeatedly merge the closest clusters.

Stop when only k clusters remain.

Only distances between un-linked items matter. (See the sketch below.)
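
A minimal Python sketch of this merge loop, assuming 2-D points and Euclidean distance via math.dist; the function name and toy data are illustrative:

```python
# A naive O(n^3) single-linkage clustering sketch.
from itertools import combinations
import math

def slc(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]          # start: every point is its own cluster
    while len(clusters) > k:
        best = None
        for (i, a), (j, b) in combinations(enumerate(clusters), 2):
            # single linkage: cluster distance = closest pair across the clusters
            d = min(math.dist(p, q) for p in a for q in b)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)        # merge the closest pair of clusters
    return clusters

print(slc([(0, 0), (0, 1), (5, 5), (5, 6), (9, 9)], k=2))
```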

4
Q

Hierarchical Agglomerative Cluster Structure

A

The tree structure used to represent the linkage clustering derived from SLC.

5
Q

Running Time of SLC

A

For n points and k clusters: O(n^3).
Finding the closest pair to link costs O(n^2) in the distances, and the search repeats on the order of n/2 times (once per merge), giving O(n^3).
Similar to the minimum spanning tree problem.

6
Q

k-means clustering algorithm

A

- Pick k "centers" (at random)
- Each center claims its closest points
- Recompute the centers by averaging the clustered points
- Repeat until converged
- A center can be a point in the collection or any point in the space. Claiming the closest points is almost like kNN. (See the sketch below.)
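
A minimal NumPy sketch of the loop above; the initialization, the convergence test, and the lack of empty-cluster handling are simplifications:

```python
# A minimal k-means sketch; assumes no cluster ever goes empty.
import numpy as np

def kmeans(X, k, rng=np.random.default_rng(0)):
    centers = X[rng.choice(len(X), k, replace=False)]  # pick k centers at random
    while True:
        # each center claims its closest points
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # recompute the centers by averaging each cluster
        new = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new, centers):                  # repeat until converged
            return labels, centers
        centers = new

X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
print(kmeans(X, k=2))
```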

7
Q

Euclidean space

A

In geometry, a two- or three-dimensional space in which the axioms and postulates of Euclidean geometry apply; also, a space in any finite number of dimensions, in which points are designated by coordinates (one for each dimension) and the distance between two points is given by a distance formula.

8
Q

k-means Proof

A

P(x): partition/cluster of object x
C_i: the set of all points in cluster i
center_i = sum of y over all y in C_i, divided by |C_i|

P(x) -> centers -> P(x): the partition determines the centers, and the centers determine the partition.

9
Q

K-Means as Optimization

A

Configurations (inputs): the centers and the partition P

Scores: the further an object is from its center, the more error it contributes: E(P, center) = sum over x of ||center_P(x) - x||^2

Neighborhood: configurations where P is fixed and the centers change, or where the centers are fixed and P changes
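
As a sketch in common notation (writing mu_i for the center of cluster i, an assumption beyond the card's wording), the score and the two neighborhood moves are:

```latex
E(P, \mu) = \sum_{x} \lVert \mu_{P(x)} - x \rVert^{2},
\qquad
P(x) \leftarrow \arg\min_{i} \lVert \mu_{i} - x \rVert^{2},
\qquad
\mu_{i} \leftarrow \frac{1}{|C_{i}|} \sum_{x \in C_{i}} x
```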

10
Q

Randomized Optimization most like K-Means

A

Hill Climbing.
You take steps toward a configuration better than the one before: each move finds a neighbor slightly better than the current configuration.

11
Q

Properties of K-means clustering

A

- Each iteration is polynomial: O(kn)
- Finite (but exponential) number of iterations: there are O(k^n) different ways of assigning partitions; in practice convergence is quick because only a limited number of assignments are visited
- Error decreases (if ties are broken consistently)
- Can get stuck

12
Q

Mitigations for K-means getting stuck

A

- Random restarts
- An initial review of the data to pick starting centers that are far apart

13
Q

Equal Distance Points between Clusters

A

A point that is equally distant from two clusters may be assigned to either cluster, depending on how ties are broken.

14
Q

Soft Clustering

A

Assume the data were generated by:
1) Selecting one of k Gaussians (with known variance)
2) Sampling x_i from that Gaussian
3) Repeating n times

Task: find a hypothesis h = <mu_1, ..., mu_k> that maximizes the probability of the data (maximum likelihood)

15
Q

Maximum Likelihood Gaussian

A

- The ML mean of the Gaussian is the mean of the data
- This does not apply directly for k different means, just for one
- With k different means, each point is assigned hidden variables that determine which Gaussian it came from
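
For a single Gaussian, the maximum-likelihood mean is just the sample mean:

```latex
\hat{\mu}_{ML} = \frac{1}{n} \sum_{i=1}^{n} x_{i}
```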

16
Q

Expectation Maximization

A

Tick-tock between expectation (define z from mu: compute how likely each point is to belong to each Gaussian) and maximization (define mu from z: recompute the means from those soft assignments).
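
A minimal 1-D sketch, assuming two Gaussians with known equal variance so that only the means are learned; the data and variable names are illustrative:

```python
# Minimal EM for a 1-D mixture of two equal-variance Gaussians.
import numpy as np

def em(x, iters=50, var=1.0, rng=np.random.default_rng(0)):
    mu = rng.choice(x, 2, replace=False)            # initial guesses for the means
    for _ in range(iters):
        # E-step: define z from mu (soft assignment of each point to each Gaussian)
        lik = np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        z = lik / lik.sum(axis=1, keepdims=True)
        # M-step: define mu from z (weighted average of the points)
        mu = (z * x[:, None]).sum(axis=0) / z.sum(axis=0)
    return mu

x = np.concatenate([np.random.default_rng(1).normal(0, 1, 200),
                    np.random.default_rng(2).normal(5, 1, 200)])
print(em(x))   # should land near the true means 0 and 5
```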

17
Q

Properties of EM

A

- Monotonically non-decreasing likelihood
- It always moves to something at least as good
- Has a chance of not converging (in practice it does)
- Will not diverge; it may only approach the optimum asymptotically, whereas k-means is guaranteed to converge in finitely many steps
- Can get stuck in a local optimum that looks like a good assignment; fix with random restarts
- Works with any distribution; domain knowledge determines the E and M steps based on the data

18
Q

Clustering Properties

A

- Richness
for any desired clustering of the inputs, there is some distance matrix that produces it
- Scale invariance
doubling or halving all distances does not change how the clusters should be determined
- Consistency
shrinking point-to-point distances within clusters and expanding cluster-to-cluster distances does not change the clustering; this is an application of domain knowledge

19
Q

Impossibility Theorem

A

Richness, scale invariance, and consistency cannot all be true at the same time.

20
Q

Feature Selection

A

Two reasons:
1) Knowledge discovery
makes the features interpretable and provides insight
2) Curse of dimensionality
the amount of data you need grows exponentially in the number of features

21
Q

How hard is it to reduce a problem from n features to m features?

A

This is an exponentially hard problem: there are 2^n possible subsets, and choosing m of n features means C(n, m) ("n choose m") possibilities. It is NP-hard, matching known NP-hard problems such as 3-SAT.

22
Q

Filtering - Features

A

The features pass through some type of search algorithm and then on to the learning algorithm. The flow is forward only, with no feedback. The search can look at the labels, so you could use something like information gain; you could even use a decision tree as the filter.
Pro:
Speed
Cons:
Looks at features in isolation, though a feature may only matter when combined with another
Ignores the learner

23
Q

Wrapping - Features

A

Takes in the features, searches over a subset, passes it to the learning algorithm, and then updates the search with the learner's scores.
Pro:
Takes into account the learner and its model bias
Con:
Very slow, since the learner's running time is included in every evaluation

24
Q

Filtering Implementation

A

- Information gain
- Variance, entropy
- Gini index
- A neural net where we remove low weights
- "Useful" features
- Independent / non-redundant features

25
Q

Wrapping Implementation

A

Hill climbing, but taking the returned scores from the learner.
Randomized optimization in general can be used.
Forward search.

26
Q

Forward Search

A

Search through the entire set and find the best single feature, then try it in combination with each of the other features, and continue until the increase in score plateaus. (See the sketch below.)
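
A minimal forward-search sketch; score() is a hypothetical stand-in for "train the learner on this feature subset and return its score":

```python
# Minimal forward selection: greedily grow the subset while the score improves.
def forward_search(features, score, tol=1e-3):
    selected, best = [], float("-inf")
    while True:
        # try each remaining feature in combination with what we already have
        candidates = [(score(selected + [f]), f) for f in features
                      if f not in selected]
        if not candidates:
            return selected
        new_best, f = max(candidates)
        if new_best - best < tol:        # stop once the gain plateaus
            return selected
        selected.append(f)
        best = new_best

# toy usage: pretend only features "a" and "b" matter
print(forward_search(["a", "b", "c"], score=lambda s: len(set(s) & {"a", "b"})))
```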

27
Q

Backward Search

A

Try all combinations of n-1 features, then n-2, and so on. Stop at the point where the subset still does pretty well and beyond which the error dramatically increases.

28
Q

Relevance

A

x_i is strongly relevant if removing it degrades the Bayes Optimal Classifier.
x_i is weakly relevant if it is not strongly relevant and there is some subset of features S such that adding x_i to S improves the BOC.
Otherwise x_i is irrelevant.

29
Q

Bayes Optimal Classifier

A

Takes the weighted average of all hypotheses. The BOC is the best you can do on average, if you can find it.

30
Q

Relevance vs Usefulness

A

Relevance measures the effect on the BOC; usefulness measures the effect on a particular predictor. Relevance is about information; usefulness is about error given a model/learner.

31
Q

Feature Transformation

A

Pre-process a set of features to create a new set of features. The new set should be smaller or more compact, while retaining the relevant and useful information.

32
Q

Polysemy

A

One word can mean many things; this causes false positives.

33
Q

Synonymy

A

The same thing can be represented by different words; this causes false negatives.

34
Q

Principal Components Analysis

A

Transforms the data into a new space that can be used for feature selection by examining the eigenvalues; for instance, any component with an eigenvalue of 0 would be removed. It finds correlation, maximizes variance, and allows reconstruction.
Mutually orthogonal
Maximal variance
Ordered features
Bag of features (ordered features are a type of bag of features)

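A minimal sketch of PCA via the eigendecomposition of the covariance matrix (function name and toy data are illustrative, not a library API):

```python
# PCA sketch: keep the top-m eigenvectors of the covariance matrix.
# Components with (near-)zero eigenvalues carry no variance and can be dropped.
import numpy as np

def pca(X, m):
    Xc = X - X.mean(axis=0)                      # center the data
    cov = np.cov(Xc, rowvar=False)               # feature covariance matrix
    vals, vecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]               # reorder by maximal variance
    W = vecs[:, order[:m]]                       # top-m mutually orthogonal axes
    return Xc @ W, vals[order]                   # projected data + eigenvalues

X = np.array([[1., 1.], [2., 2.1], [3., 2.9], [4., 4.2]])
Z, eigenvalues = pca(X, m=1)
print(eigenvalues)   # second eigenvalue near 0 -> that direction can be removed
```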

35
Q

Independent Components Analysis

A

Aims for independence of features: creates new features from the base features through a linear transformation such that all the new features are statistically independent, with mutual information of zero between them.
Mutually independent
Maximal mutual information (between the new features and the original features)
Bag of features

36
Q

Random Components Analysis

A

Generates random directions and projects the data onto them. It works well if the next step is some type of classification. The projected dimension is lower than n, but not as low as PCA achieves. The big advantage is speed.

37
Q

Linear Discriminant Analysis

A

Finds a projection that discriminates based on the label, projecting the data into clumps or clusters by class.

38
Q

Three Types of Learning

A

Supervised learning: y = f(x). Like function approximation.
Unsupervised learning: f(x). Clustering/description.
Reinforcement learning: y = f(x) with an added z. A lot like function approximation with the added reinforcement signal z.

39
Q

Markov Decision Process

A

State: S
Model: T(s, a, s') ~ Pr(s' | s, a)
Actions: A(s), A
Reward: R(s), R(s, a), R(s, a, s')
-----------------------------------------
Policy: pi(s) -> a
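
As a concrete sketch of the definition above, a tiny two-state MDP written as plain Python dictionaries (the states, actions, and probabilities are made up for illustration):

```python
# A made-up two-state MDP: T[s][a] maps next-state s' -> Pr(s'|s,a),
# and R gives R(s), the reward for being in a state.
T = {
    "s0": {"stay": {"s0": 1.0},
           "go":   {"s1": 0.9, "s0": 0.1}},   # "go" usually reaches s1
    "s1": {"stay": {"s1": 1.0},
           "go":   {"s0": 1.0}},
}
R = {"s0": 0.0, "s1": 1.0}
policy = {"s0": "go", "s1": "stay"}           # pi(s) -> a

# Model check: Pr(s' = s1 | s = s0, a = go)
print(T["s0"]["go"]["s1"])   # 0.9
```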

40
Q

Markov State

A

The possible representations of where you can be inside the world.

41
Q

Markov Action

A

The things you can do in a state; the moves available to you when you are in that state.

42
Q

Markov Model

A

Given a state, an action, and a next state (state prime), it gives the probability of that next state given the state and action: T(s, a, s') = Pr(s' | s, a).

43
Q

Markov Property

A

Only the present matters: Pr(s' | s, a) depends only on the most recent state. You can work around this by having the current state remember everything it needs to know.

44
Q

Markov Reward

A

R(s), R(s, a), R(s, a, s'). There are several ways to define rewards, and one form may be more convenient than another. We will focus on R(s): a reward for being in a certain state.

45
Q

Markov Policy

A

pi(s) -> a: a function that takes in a state and gives you the action to take, a "command". pi* is the optimal policy, the one that maximizes reward.

46
Q

Temporal Credit Assignment Problem

A

The problem of assigning blame/credit to each move in a sequence when only the final state matters.

47
Q

Utility of Sequences

A

If one sequence has higher utility than another, the same ordering must hold for the sequences with their shared first state removed (stationarity of preferences).
Utility can be thought of as the sum of the rewards of each state; see the formula below.
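
In the standard discounted form (the discount factor gamma is the usual addition; the card itself only says "sum of rewards"):

```latex
U(s_0, s_1, s_2, \ldots) = \sum_{t=0}^{\infty} \gamma^{t} R(s_t), \qquad 0 \le \gamma < 1
```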

48
Q

Utility of State

A

The reward for that state, plus all the reward we can expect to receive for the rest of the sequence by following the policy from there (see the equation below).
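
Written as an equation (standard form; the discount factor gamma is an assumption beyond the card's wording):

```latex
U^{\pi}(s) = R(s) + \gamma \sum_{s'} T\big(s, \pi(s), s'\big)\, U^{\pi}(s')
```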

49
Q

Finding the Policy (Policy Iteration)

A

1) Start with a guess at the policy
2) Evaluate the policy by calculating the utility under that policy; because there is no max here, this is a linear equation solve
3) Improve the policy based on the utility function
(See the sketch below.)
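
A minimal NumPy sketch of these three steps on a tiny made-up two-state MDP, doing the evaluation with the exact linear solve the card mentions:

```python
# Policy iteration sketch. Evaluation solves U = R + gamma * T_pi U,
# i.e. (I - gamma * T_pi) U = R, exactly as a linear system.
import numpy as np

states, gamma = [0, 1], 0.9
R = np.array([0.0, 1.0])                       # R(s)
# T[s, a, s'] = Pr(s'|s,a): action 0 = "stay", action 1 = "go" (made up)
T = np.zeros((2, 2, 2))
T[0, 0, 0] = 1.0
T[0, 1] = [0.1, 0.9]
T[1, 0, 1] = 1.0
T[1, 1, 0] = 1.0

pi = np.array([0, 0])                          # 1) start with a guess at the policy
while True:
    T_pi = T[states, pi]                       # transitions under the current policy
    U = np.linalg.solve(np.eye(2) - gamma * T_pi, R)   # 2) evaluate: linear solve
    new_pi = np.argmax(T @ U, axis=1)          # 3) improve: greedy one-step lookahead
    if np.array_equal(new_pi, pi):
        break
    pi = new_pi

print(pi, U)   # expect pi = [go, stay] with higher utility in state 1
```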

50
Q

Modeler

A

Takes in transitions and puts out a model.

51
Q

Simulator

A

Takes in a model and produces transitions.

52
Q

Learner

A

Takes in transitions and puts out a policy.

53
Q

Planner

A

Takes in a model and puts out a policy.

54
Q

Model-Based Reinforcement Learning

A

Transitions -> Modeler -> Model -> Planner -> Policy

55
Q

RL-Based Planner

A

Model -> Simulator -> Transitions -> Learner -> Policy

56
Q

Three Main Approaches to RL

A

Policy search: states are given and the policy is derived directly.
Value function based: takes states and determines their utility; states are mapped to values.
Model based: (fairly direct learning) takes in states and actions and learns the model.