Exam 2016 Flashcards

1
Q

Explain independence

A

Two variables are independent if they do not affect each other; that is, knowing the value of one does not change the probability of the other:
P(x|y) = P(x)

2
Q

Explain conditional independence

A

Two variables, X and Y, are conditionally independent given a third variable Z if, once we know whether Z has occurred, knowing whether X has occurred gives no additional information about whether Y occurs. For example, the probe catching in a tooth is independent of whether the patient has a toothache, but both the toothache and the catch depend on whether the tooth has a cavity.

P(toothache, catch | cavity) = P(toothache | cavity) * P(catch | cavity)

3
Q

Explain why independence and conditional independence are useful when reasoning with uncertainty.

A

Independence and conditional independence are useful when reasoning with uncertainty because they reduce the complexity of both the inference and the representation of the domain. Both independence and conditional independence enable the full joint probability tables to be reduced from exponential to linear growth, albeit full independence reduces the growth more. Independence allows the complete set of variables to be divided into independent subsets, so that the full joint distribution can be factored into separate joint distributions on those subsets. For example, the full joint distribution on the outcome of n independent coin flips, P(C_1, …, C_n), has 2^n entries, but it can be represented as the product of n single-variable distributions P(C_i). Hence, when independence assertions are available, they can help in reducing the size of the domain representation and the complexity of the inference problem.
Conditional independence also allows the full joint distribution to be divided into smaller tables that contain the conditional probability distributions. Because each distribution sums to one, the stored probabilities can be reduced further, and instead of exponential growth, O(2^n), in the number of probabilities, linear growth, O(n·2^k), can be achieved, where k is the maximum number of parents (conditioning variables) any variable has.
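
As a sketch of the blow-up, the snippet below (with hypothetical bias values) builds the full 2^n-entry joint table for n independent coins and shows that any entry can be recovered from just the n single-variable distributions P(C_i):

```python
# Sketch: n independent coin flips. The full joint table has 2**n entries,
# but under independence it factors into n single numbers P(C_i = heads).
from itertools import product

p_heads = [0.5, 0.6, 0.7]  # hypothetical biases, one per coin

def p_outcome(outcome):
    """Joint probability as a product of single-variable factors."""
    prob = 1.0
    for p, o in zip(p_heads, outcome):
        prob *= p if o == "H" else 1 - p
    return prob

# The explicit joint table, built only to show the exponential size.
joint = {o: p_outcome(o) for o in product("HT", repeat=len(p_heads))}
print(len(joint))                  # 2**3 = 8 entries
print(joint[("H", "T", "H")])      # same number the 3 factors give directly
```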

4
Q

What is a transition model?

A

Transition model (1p): The transition model specifies the probability distribution over the latest state variables given the previous values: P(X_t | X_0:t-1).

In other words, we assume that a transition from one state to another is Markovian, i.e., that the probability distribution over outcomes of an action depends only on the agent's current state and not on its history. You can think of the transition model as a big three-dimensional table for the function:
P(s' | s, a)
Given that the agent is in a state, s, and performs an action, a, it will reach a new state, s', with probability P(s' | s, a).
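
A minimal sketch of this table view, with made-up grid-world states, actions, and probabilities (assumptions, not from the source):

```python
# Sketch: P(s' | s, a) as a nested table. All numbers are made up.
transition = {
    ("s1", "right"): {"s2": 0.8, "s1": 0.2},  # intended move succeeds w.p. 0.8
    ("s1", "up"):    {"s3": 0.8, "s1": 0.2},
}

def p_next(s, a, s_next):
    """P(s' | s, a): probability of landing in s_next after doing a in s."""
    return transition[(s, a)].get(s_next, 0.0)

print(p_next("s1", "right", "s2"))  # 0.8
```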

5
Q

What is a sensor model?

A

Sensor model (1p): The sensor model specifies the probability distribution over the evidence variables: P(E_t | X_0:t, E_0:t-1).

In other words, the probability of the evidence given the state, i.e., how likely the available sensors are to pick up a given state.

Also called an observation model!
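
As a sketch, the sensor model can be stored as a table P(e | x); the umbrella numbers below are the common textbook values, not something given in this answer:

```python
# Sketch: sensor model P(e | x) as a table (standard umbrella-world values).
sensor = {
    "rain":    {"umbrella": 0.9, "no umbrella": 0.1},
    "no rain": {"umbrella": 0.2, "no umbrella": 0.8},
}
print(sensor["rain"]["umbrella"])   # P(umbrella | rain) = 0.9
```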

6
Q

Explain the Markov assumption for a second-order Markov model

A

Markov assumption for a second-order Markov model (1p): The current state depends only on the two previous states: P(X_t | X_0:t-1) = P(X_t | X_t-2, X_t-1).

7
Q

Explain the sensor Markov assumption

A

Sensor Markov assumption (1p): The evidence variables, E_t, depend only on the current state variables and not on any of the history of state and evidence variables: P(E_t | X_0:t, E_0:t-1) = P(E_t | X_t)

8
Q

What is a hidden Markov model?

A

A temporal probabilistic model in which the state of the process is described by a single discrete random variable. The possible values of the variable are the possible states of the world.
E.g.: In the rain-and-umbrella example there is only one state variable, Rain, and therefore it is an HMM.
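
Because the state is a single discrete variable, an HMM can be written directly in matrix form; a sketch with the standard umbrella-world numbers:

```python
# Sketch: the umbrella HMM in matrix form. States: 0 = rain, 1 = no rain.
import numpy as np

# Transition model P(X_t | X_t-1): row = previous state, column = next state.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])

# Sensor model as diagonal matrices O_e with P(e | x) on the diagonal,
# the form used by the HMM matrix algorithms.
O = {True:  np.diag([0.9, 0.2]),    # umbrella observed
     False: np.diag([0.1, 0.8])}    # umbrella not observed
```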

9
Q

What are the four basic inference tasks for temporal models?

A
  • Filtering (The task of computing the belief state – the posterior distribution over the most recent state – given all evidence to date. This means computing P(X_t | e_1:t).)
  • Prediction (The task of computing the posterior distribution over a future state given all evidence to date. This means computing P(X_t+k | e_1:t) for some k > 0.)
  • Smoothing (The task of computing the posterior distribution over a past state, given all evidence up to the present. This means computing P(X_k | e_1:t) for some 0 ≤ k < t.)
  • Most likely explanation (The task of computing the sequence of states that is most likely to have generated a given sequence of observations. This means computing argmax_x_1:t P(x_1:t | e_1:t).)
10
Q

What is filtering?

A

Here we compute the belief state – the distribution over the most recent state X_t – given all the evidence to date. This is also called state estimation.

Ex: Computing the probability of rain today given all the previous observations.
P(X_t | e_1:t)
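
A minimal sketch of the forward (filtering) recursion for the umbrella HMM with the standard textbook numbers; the update is f_1:t+1 = α O_t+1 Tᵀ f_1:t:

```python
# Sketch: filtering in the umbrella HMM. States: 0 = rain, 1 = no rain.
import numpy as np

T = np.array([[0.7, 0.3],              # transition model P(X_t | X_t-1)
              [0.3, 0.7]])
O = {True:  np.diag([0.9, 0.2]),       # sensor model, umbrella observed
     False: np.diag([0.1, 0.8])}       # umbrella not observed

def forward(belief, evidence):
    """One filtering step: predict with T, update with O, then normalize."""
    unnormalized = O[evidence] @ T.T @ belief
    return unnormalized / unnormalized.sum()

belief = np.array([0.5, 0.5])          # prior P(X_0)
for e in [True, True]:                 # umbrella seen on days 1 and 2
    belief = forward(belief, e)
print(belief)                          # ~[0.883, 0.117]: P(rain on day 2)
```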

11
Q

What is smoothing?

A

With smoothing we compute the distribution over a past state X_k given all the evidence up to the present.

Ex: The probability that it rained last Wednesday given all the observations up to now.
P(X_k | e_1:t)
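
A sketch of the forward-backward computation of P(X_k | e_1:t) for the same umbrella model:

```python
# Sketch: forward-backward smoothing for the umbrella HMM.
import numpy as np

T = np.array([[0.7, 0.3], [0.3, 0.7]])
O = {True: np.diag([0.9, 0.2]), False: np.diag([0.1, 0.8])}

def smooth(prior, evidence, k):
    f = prior                              # forward message up to step k
    for e in evidence[:k]:
        f = O[e] @ T.T @ f
        f = f / f.sum()
    b = np.ones(2)                         # backward message from t down to k
    for e in reversed(evidence[k:]):
        b = T @ O[e] @ b
    s = f * b
    return s / s.sum()

# P(rain on day 1 | umbrella on days 1 and 2) ~ [0.883, 0.117]
print(smooth(np.array([0.5, 0.5]), [True, True], k=1))
```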

12
Q

What is prediction?

A

We compute the distribution over a future state X_t+k given all the evidence to date.

Ex: The probability that it rains three days from now given the evidence up to today.
P(X_t+k | e_1:t)
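
Prediction is just filtering without new evidence: repeatedly apply the transition model to the current belief state. A sketch with the umbrella transition matrix:

```python
# Sketch: k-step prediction = k applications of the transition model.
import numpy as np

T = np.array([[0.7, 0.3], [0.3, 0.7]])

def predict(belief, k):
    for _ in range(k):
        belief = T.T @ belief
    return belief

# Predictions drift toward the stationary distribution [0.5, 0.5].
print(predict(np.array([0.883, 0.117]), 3))
```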

13
Q

What is the most likely explanation?

A

Given the sequence of observations, we want to find the sequence of states X_1:t that most likely generated those observations.
Ex: If we observe the umbrella three days in a row but not on the fourth, the most likely explanation is that it rained the first three days and not on the fourth.
argmax_x_1:t P(x_1:t | e_1:t)
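
A minimal Viterbi sketch for the umbrella example, matching the four-day scenario above:

```python
# Sketch: Viterbi for the umbrella HMM — track the probability of the most
# likely path ending in each state, keep back pointers, then backtrack.
import numpy as np

T = np.array([[0.7, 0.3], [0.3, 0.7]])
O = {True: np.array([0.9, 0.2]), False: np.array([0.1, 0.8])}
STATES = ["rain", "no rain"]

def viterbi(evidence, prior=np.array([0.5, 0.5])):
    m = O[evidence[0]] * (T * prior[:, None]).max(axis=0)
    back = []
    for e in evidence[1:]:
        trans = T * m[:, None]             # trans[i, j] = m[i] * T[i, j]
        back.append(trans.argmax(axis=0))  # best predecessor of each state
        m = O[e] * trans.max(axis=0)
    path = [int(m.argmax())]               # best final state, then backtrack
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return [STATES[i] for i in reversed(path)]

print(viterbi([True, True, True, False]))  # rain, rain, rain, no rain
```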

14
Q

What are the steps in the CBR cycle?

A
  1. Input case (a description of the problem, without a solution)
  2. Retrieve (a process where previous cases similar to the input case are retrieved from the case base)
  3. Reuse (solutions from previous similar cases are adapted to the problem, either by copying the solution of the most similar case or by using more complex adaptation rules to produce a combined solution)
  4. Revise (the proposed solution is validated, e.g. by the user)
  5. Retain (the case is stored in the case base)

Case base = a database of previously solved problems, stored as cases with a problem description and a solution. A minimal sketch of the retrieve and reuse steps follows below.
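
The sketch below uses hypothetical cases and a made-up similarity function, purely for illustration:

```python
# Sketch: retrieve the most similar case from the case base, then reuse
# its solution by copying it. Cases and numbers are made up.
case_base = [
    {"problem": {"age": 20, "weight": 60}, "solution": "treatment A"},
    {"problem": {"age": 65, "weight": 85}, "solution": "treatment B"},
]

def similarity(p, q):
    """Simple inverse-distance similarity over shared numeric features."""
    return 1.0 / (1.0 + sum(abs(p[f] - q[f]) for f in p))

def retrieve_and_reuse(query):
    best = max(case_base, key=lambda c: similarity(query, c["problem"]))
    return best["solution"]               # reuse: copy the closest solution

print(retrieve_and_reuse({"age": 25, "weight": 62}))  # treatment A
```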

15
Q

A case is a central concept in CBR, and cases typically have two parts. Which parts are they?

A

Cases have a problem description and a solution.

16
Q

Similarity is important in CBR. Explain what a similarity measure is and provide one or more examples.

A

Similarity measures are real-valued functions that quantify the similarity between two objects.

Typically, the similarity is normalized between 0 and 1. The similarity can be computed between two cases (global similarity) or between two features of the cases (local similarity). Cases could describe patients whose features are age and weight. Patient A is 18 years old and weighs 60 kg, while patient B is 70 years old and weighs 80 kg. How similar are the patients, given that a similarity score of 0 is not similar at all and 1 is exactly the same? The cases could be represented as vectors, so that A = <18, 60> and B = <70, 80>. The global similarity between two cases could be computed as the weighted sum of the local similarity scores between all the features: sim(caseA, caseB) = Σ_i w_i · sim_i(featureA_i, featureB_i), where i specifies the i-th feature of the cases and w_i is the weight assigned to that feature. Weights specify the importance of a feature in the similarity calculation. When we have numeric attributes, we can use vector similarity measures, such as cosine similarity:
cos_sim(A, B) = A·B / (||A|| ||B||) = (18·70 + 60·80) / (√(18² + 60²) · √(70² + 80²)) = (1260 + 4800) / (62.64 · 106.30) = 6060 / 6659 ≈ 0.91
The similarity between objects can be calculated using numeric functions, the distance between objects in graphs or trees, using neural networks or rules among many other possibilities.
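
The two measures from the answer as a sketch; the weights and feature ranges in the global measure are assumptions for illustration:

```python
# Sketch: cosine similarity and a weighted global similarity for the
# patients A = <18, 60> and B = <70, 80> from the answer.
import math

A, B = [18, 60], [70, 80]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def global_similarity(a, b, weights, ranges):
    # Local similarity per feature: 1 - normalized absolute difference.
    local = [1 - abs(x - y) / r for x, y, r in zip(a, b, ranges)]
    return sum(w * s for w, s in zip(weights, local))

print(round(cosine_similarity(A, B), 2))   # 0.91, as computed above
# Assumed weights and feature ranges, purely for illustration:
print(round(global_similarity(A, B, [0.5, 0.5], [100, 150]), 2))  # 0.67
```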

17
Q

Explain the difference between training set, validation set and test set.

A

Training set: The data set used for training in supervised learning, where each sample consists of an input, x_i, and an output, y_i, and y_i is generated by an unknown function y = f(x) which the machine learning method is to approximate.

Validation set: A part of the training set that can be set aside to evaluate candidate hypotheses while they are being developed.

Test set: Data set that is distinct from the training and validation set and is used to test the accuracy of the hypothesis generated by the machine learning method.
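
A sketch of the three-way split (the 60/20/20 proportions are an assumption, not from the source):

```python
# Sketch: splitting labelled data into training, validation, and test sets.
import random

data = list(range(100))      # stand-in for (x_i, y_i) samples
random.shuffle(data)

train = data[:60]            # fit the hypothesis h
val   = data[60:80]          # compare candidate hypotheses while developing
test  = data[80:]            # final, untouched accuracy estimate for h
```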

18
Q

Explain the difference between hold out cross-validation, k-fold cross-validation and leave-one-out cross-validation.

A

Hold-out cross-validation: Randomly splitting the available data into a training set, from which the learning algorithm produces the hypothesis h, and a test set, on which the accuracy of h is evaluated.

K-fold cross-validation: All examples can be used both for training the hypothesis h and for testing it. Split the data into k subsets. Then perform k rounds of learning where 1/k of the data is held out as a test set and the remaining data is used for training. The average test score of the k rounds should give a better estimate than a single score.

Leave-one-out cross-validation: The extreme version of k-fold cross-validation where k = n, and n is the number of data samples.
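
A minimal k-fold sketch; `train` and `accuracy` are hypothetical stand-ins for a real learner and metric:

```python
# Sketch: k rounds of learning, each holding out a different 1/k slice.
def k_fold_scores(data, k, train, accuracy):
    fold = len(data) // k
    scores = []
    for i in range(k):
        test = data[i * fold:(i + 1) * fold]            # held-out slice
        training = data[:i * fold] + data[(i + 1) * fold:]
        h = train(training)
        scores.append(accuracy(h, test))
    return sum(scores) / k        # average of the k test scores
```

Setting k = len(data) gives leave-one-out cross-validation.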

19
Q

Explain the Chinese room thought experiment and what it is meant to illustrate. Please do not use more than two pages.

A

The Chinese room experiment: System: a human who only understands English, equipped with a rule book written in English, and various stacks of paper (some blank and some with indecipherable inscriptions). Human = CPU, rule book = program, and stacks of paper = memory.

The system is inside a room with a small opening to the outside, through which indecipherable symbols written on paper appear. The human finds matching symbols in the rule book and follows the instructions in the rule book, such as writing symbols on paper, rearranging the paper stacks, and so on. Eventually, the instructions will cause one or more symbols to be transcribed onto paper and sent out through the opening.

The system is capable of answering questions intelligently in Chinese without understanding the answers it gives.

What it is meant to illustrate: A system representing a program can pass the Turing Test without understanding anything of its inputs and outputs.