Computing As Experiment Flashcards

1
Q

Reasoning

A

Sometimes experimentation works better than exact mathematical analysis; machine learning is a good example.

2
Q

Population

A

The set of items from which the objects we test are drawn. We can denote this as Ω.

3
Q

Population examples

A
  • a set of permutations of <1,2,3,…,n>
  • a set of graphs with n nodes and m edges
  • the set of students attending a school
    We can focus on particular subsets of these, such as male students, or permutations with a given property.
4
Q

Sampling

A

Taking a random subset of the population.
There can be many reasons for this, such as limiting misleading results, or because the population is too large to test in full.
How do we then choose in an unbiased way?

5
Q

P[X] and Unbiased Selection

A

The probability of each element being chosen. Each element's probability satisfies:
0 <= P[X] <= 1
For unbiased (uniform) selection, P[X] = 1/|Ω|.
Let's say we had 100 students: each is given probability 0.01,
so any one student would be chosen with probability 1/100.
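A minimal sketch of uniform selection using Python's standard `random` module (the student names are hypothetical):

```python
import random

# Hypothetical population of 100 students; random.choice selects each
# element with equal probability 1/100 = 0.01.
students = [f"student_{i}" for i in range(100)]
chosen = random.choice(students)
```

`random.choice` gives every list element the same probability, which is exactly the unbiased P[X] = 1/|Ω| above.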

7
Q

Random Variable

A

r : Ω -> R
Maps each member of the population to a real value.
This means every member of the population has a numeric value attached to it, and we can work with these values instead of the raw members. The random variable for students, for example, could be a class test score.
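As a sketch, the mapping r : Ω -> R can be modelled as a dictionary from population members to values (the students and scores here are hypothetical):

```python
import random

# r maps each population member (a student) to a real value (a test score).
r = {"alice": 72.0, "bob": 55.0, "carol": 90.0}

# Choosing a member uniformly and applying r yields a random value.
member = random.choice(list(r))
value = r[member]
```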

8
Q

Expected Value E[X]

A

Denoted as E[X], the expected value is:
Σ P[X] . r(X)
Essentially, the sum over all outcomes of the probability of each outcome multiplied by its random-variable value. This tells us the value we expect on average when choosing a random entry from the population, i.e. the average we expect over a large number of draws.
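As a sketch, the definition translates directly into code (the fair-die distribution is just an illustration):

```python
def expected_value(dist):
    # E[X] = sum over outcomes x of P[x] * r(x).
    # dist maps each value r(x) to its probability P[x].
    return sum(p * x for x, p in dist.items())

# Fair six-sided die: each face occurs with probability 1/6.
fair_die = {x: 1 / 6 for x in range(1, 7)}
expected_value(fair_die)  # 3.5 (up to floating-point rounding)
```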

9
Q

E[X] of a fair die

A

We have the sum over:
1 . 1/6, with 1 being the number on the die (the random variable) and 1/6 being the probability of rolling it
2 . 1/6
3 . 1/6
4 . 1/6
5 . 1/6
6 . 1/6
The summation of all of these gives us 3.5, which is the expected value of rolling a fair die.

10
Q

Biased E[X] of an unfair die

A

What if we had different probabilities?
We simply incorporate them into our summation:
P[X] = 1/4 if X ∈ {1,2,3}
P[X] = 1/12 if X ∈ {4,5,6}
The summation over the first group gives us (1 . 1/4) + (2 . 1/4) + (3 . 1/4) = 6/4 = 1.5.
The summation over the second group gives us (4 . 1/12) + (5 . 1/12) + (6 . 1/12) = 15/12 = 1.25.
This gives us the expected value 1.5 + 1.25 = 2.75.
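The same summation for the biased die, as a sketch in Python:

```python
# Biased die: faces 1-3 each have probability 1/4, faces 4-6 each 1/12.
probs = {1: 1/4, 2: 1/4, 3: 1/4, 4: 1/12, 5: 1/12, 6: 1/12}

# E[X] = sum of P[x] * x over all faces:
e = sum(p * x for x, p in probs.items())
# 1.5 (from faces 1-3) + 1.25 (from faces 4-6) = 2.75
```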

11
Q

Problems with Mean, Median and Mode

A

Normally, when we have all of these statistics in front of us, they look similar. For example, take these scores from students:
- 10 score 20%
- 35 score 50%
- 25 score 60%
- 30 score 90%
The mode is 50%, the median is 60% and the mean is 61.5%.
However, skewed data can open a large gap between mode/median and mean:
Let's say we have 61 students who score 25%, and 39 who score 100%.
The mean is 54.25%, but the mode and median are far lower, at 25% for both.
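The skewed example can be checked with Python's standard `statistics` module:

```python
from statistics import mean, median, mode

# 61 students score 25%, 39 score 100%.
scores = [25] * 61 + [100] * 39

mean(scores)    # 54.25
median(scores)  # 25 (the 50th and 51st sorted values are both 25)
mode(scores)    # 25 (25 occurs 61 times, more than any other value)
```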

12
Q

Variance

A

Variance, or Var(X), is the average of the squared deviations of the random variable from the expected value. Essentially:
Σ (r(x_i) - E[X])^2 / N
The higher the variance, the more spread out the data is and the more the random variable tends to deviate from the expected value, which demonstrates how strongly individual values can pull on the mean.
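A sketch of this computation in Python (dividing by N, matching the exact population variance; the sample values are just an illustration):

```python
def variance(values):
    # Mean squared deviation from the expected value
    # (population variance: divide by N, not N - 1).
    n = len(values)
    ex = sum(values) / n  # E[X] under uniform probabilities
    return sum((v - ex) ** 2 for v in values) / n

variance([2, 4, 4, 4, 5, 5, 7, 9])  # 4.0
```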

13
Q

Exact Standard Deviation

A

This is the square root of the variance, and quantifies how much the individual data points in a dataset deviate, on average, from the expected value (the same quantity the variance measures), but expressed in the same units as the data itself.

14
Q

Estimated Standard Deviation

A

Since the full variance/exact SD can involve a large number of values, we can instead take a sample <y_1, y_2, …, y_n>, whose values correspond to the random variables <x_1, x_2, …, x_n> of the original population, giving us the equation:

sqrt( (Σ (r(y_i) - E[Y])^2) / N )

This can be hard to read, so let's break it down.
We first take r(y_1) and subtract the expected value of Y from it; E[Y] is fixed throughout, since it is the expected value over the whole sample.
We square this answer, and repeat for y_2, y_3 and so on…
We then take the sum and divide it by N (changed later to N-1 by Bessel's correction). Averaging this way stops a few extreme results from dominating, so we get a good idea of the typical deviation rather than a chaotic number.
Finally, we take the square root, just as in the exact standard deviation.
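Python's `statistics` module exposes both versions: `pstdev` divides by N (the exact/population form) and `stdev` divides by N - 1 (the sample estimate with Bessel's correction). The sample values here are just an illustration:

```python
from statistics import pstdev, stdev

sample = [2, 4, 4, 4, 5, 5, 7, 9]

pstdev(sample)  # divides by N     -> 2.0
stdev(sample)   # divides by N - 1 -> slightly larger
```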

15
Q

Bessel’s Correction

A

Essentially, instead of dividing by N, we divide by N-1. This compensates for the fact that the mean is estimated from the same sample, which makes the naive (divide-by-N) estimate slightly too small. This is explained further in data science.

16
Q

Significance Testing

A

Let's say we have a predicted outcome X and an actual outcome Y.
We want to know whether the actual outcome is consistent with our prediction.
So we count the number of standard deviations by which Y differs from X. If there are too many, the hypothesis is not tenable.
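A minimal sketch of this count in Python (the numbers are made up for illustration):

```python
def deviations_from_prediction(actual, predicted, std_dev):
    # How many standard deviations separate the actual outcome
    # from the predicted one.
    return abs(actual - predicted) / std_dev

# Predicted mean 50, observed 62, standard deviation 5:
deviations_from_prediction(62, 50, 5)  # 2.4
```

If this count exceeds the threshold we consider acceptable, the prediction is not tenable.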

17
Q

How do we determine a best fit for a certain line to represent points plotted on a graph?

A

The standard interpretation of “best fit” is “least squares”.
This essentially means that for a function f : R -> R and plotted points (x_k, y_k), we want to minimise this expression:
Σ (y_k - f(x_k))^2
Minimising this sum gives us the curve that most closely represents the plotted points.

18
Q

Using standard interpretation with line functions

A

Substituting the line function f(x) = mx + c into our original expression gives us:
Σ (y_k - (mx_k + c))^2
Which can be simplified to:
F(m, c) = Σ (y_k - mx_k - c)^2
We then choose m and c to minimise F.
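A sketch of minimising F(m, c) directly, using the standard closed-form least-squares formulas for a straight line (the sample points are just an illustration):

```python
def least_squares_line(xs, ys):
    # Minimise F(m, c) = sum((y_k - m*x_k - c)**2) via the
    # closed-form normal-equation solution for a line.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c = (sy - m * sx) / n
    return m, c

# Points lying exactly on y = 2x + 1 recover m = 2, c = 1:
least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])  # (2.0, 1.0)
```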