Data Science Flashcards

(64 cards)

1
Q

What are the 2 types of data?

A
  • Ordinal data has a natural order
  • Nominal data cannot be ranked or measured in any way
2
Q

What is big data?

A
  • Big data is data that is too large to fit on a single computer at once
  • Data with large volume, variety and velocity
  • Big data science addresses issues with big data
3
Q

What is a hypothesis?

A

A hypothesis is a statement which is either true or false and must be disprovable

4
Q

What is the 7-step data science process?

A
  1. Frame the problem
  2. Get the raw data
  3. Pre-process and clean data
  4. Data exploration
  5. Analyse and model data
  6. Validate/evaluate results
  7. Use and communicate results
5
Q

What does pre-processing and cleaning data consist of?

A
  • Handling missing data (interpolation)
  • Deleting incomplete data
  • Data wrangling
6
Q

What 3 things does descriptive statistics do?

A
  • Summarises data (makes it manageable)
  • Extracts insights from data (underlying trends)
  • Gathers knowledge (make targeted decisions about data)
7
Q

What are the 4 measures of central tendency?

A
  • Arithmetic mean
  • Weighted arithmetic mean
  • Median
  • Mode
8
Q

When can you use mean and what is it mathematically?

A
  • Can only use when data is symmetrical with no outliers
  • The mean is the point closest to all data in squared-Euclidean distance (gives larger values higher weight)
9
Q

What is weighted arithmetic mean and what are its benefits?

A
  • Values have different weightings to them
  • More representative
10
Q

When is it best to use the median?

A

When data has outliers or is skewed

11
Q

What is mode best for?

A
  • Skewed distribution
  • Best for categorical data
12
Q

What are the 2 types of measures of spread?

A
  • Empirical (sample) measures for when there is a subset of data
  • True (population) measures for when there is data for entire population
13
Q

What is the variance and what does it mean to be biased?

A
  • Variance is spread around the mean
  • Biased when we don’t have the entire data set (empirical mean)
  • Dividing by n − 1 instead of n (Bessel’s correction) removes the bias
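The n − 1 correction above can be sketched in a few lines of Python (a minimal illustration; the function name and sample values are invented):

```python
# Illustrative sketch (made-up sample): empirical variance with and
# without Bessel's correction (dividing by n - 1 instead of n).
def variance(data, unbiased=True):
    n = len(data)
    mean = sum(data) / n
    squared_devs = sum((x - mean) ** 2 for x in data)
    return squared_devs / (n - 1) if unbiased else squared_devs / n

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # mean = 5.0
biased = variance(sample, unbiased=False)    # 32 / 8 = 4.0
unbiased = variance(sample)                  # 32 / 7 ≈ 4.571
```

The unbiased estimate is always the larger of the two, because dividing by a smaller denominator compensates for the spread lost by measuring deviations from the sample mean rather than the true mean.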
14
Q

Give the definition of experiment, sample space and event

A
  • Experiment: A procedure which yields one of a set of possible outcomes
  • Sample space: The set of possible outcomes of an experiment
  • Event: A specified subset of the set of outcomes of an experiment
15
Q

What is the probability and complement of an event?

A
  • Probability: The sum of the probabilities of the outcomes that make up the event
  • Complement: 1-P(E)
16
Q

Give the definition of random variable and expected value

A
  • Random variable: A numerical function of the outcomes of a probability space
  • Expected value: The sum of the values of the random variable multiplied by their probabilities
17
Q

What is the probability mass function?

A
  • Used for discrete random variables
  • Sums over the specific values of the variable and gives exact probabilities
18
Q

What is the probability density function?

A
  • Used for continuous random variables
  • Integrated to get probabilities over intervals
19
Q

What is the cumulative distribution function?

A
  • Used for cumulative probability
  • Gives the probability that the variable takes a value less than or equal to a given point
20
Q

Explain objective vs subjective probability

A

Objective probability:
- Repeatable events
Subjective probability:
- Unrepeatable events
- Used in the Bayesian interpretation
- Degree of plausibility

21
Q

What is the central limit theorem?

A
  • States that the sampling distribution of a sample mean is well-approximated by a Gaussian/normal distribution as the sample size gets large
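The theorem can be illustrated with a short simulation (not part of the original cards; sample size, trial count and seed are arbitrary): individual draws come from a flat uniform distribution, yet the sample means cluster tightly and symmetrically around the true mean of 0.5.

```python
import random

random.seed(0)  # arbitrary seed for reproducibility

# Draw many sample means from a (non-normal) uniform distribution;
# by the CLT their distribution is approximately normal around 0.5.
n = 100         # size of each sample
trials = 2000   # number of sample means to draw
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]
grand_mean = sum(means) / trials
```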
22
Q

Give 4 assumptions of the central limit theorem

A
  • Variables are independent
  • **Identical distribution** (same mean and var)
  • Finite mean and variance
  • Sufficiently large sample size
23
Q

What are the 3 types of uncertainty?

A
  • Epistemic uncertainty
  • Aleatoric uncertainty
  • Ontological uncertainty
24
Q

Explain epistemic uncertainty

A
  • Predictable randomness
  • Reducible
  • Reduced by taking more measurements
25
Q

Explain aleatoric uncertainty

A
  • **Inherent/intrinsic** randomness
  • **Irreducible**
26
Q

Explain ontological uncertainty

A
  • Model is **incomplete**
  • Unaware of **factors** affecting the system
27
Q

What are the 2 main **sources** of uncertainty we deal with in data science, and what do they mean?

A
  • **Measurement uncertainty**: related to the accuracy of the tool or method used
  • **Sampling uncertainty**: arises from the representativeness and size of the subset
28
Q

What are the 3 different types of behaviours? Explain them

A
  • **Deterministic**: behaviours are **pre-determined** and always the same
  • **Stochastic**: **largely the same** with random components
  • **Random**: usually **pseudo-random**, as computers cannot produce true randomness
29
Q

Explain accuracy and the term for the lack thereof

A
  • Degree to which values are **arranged around the true value**
  • Lack of accuracy = **bias** (systematic error)
30
Q

Explain precision and the term for the lack thereof

A
  • Degree to which values are **close to each other** (high repeatability)
  • Lack of precision = **variability**
31
Q

Give the 5 sampling techniques with short explanations

A
  • **Random sampling**: random subset
  • **Systematic sampling**: select values at regular intervals
  • **Stratified sampling**: takes relative samples from different strata
  • **Cluster sampling**: uses all data in a random cluster
  • **Weighted sampling**: assigns probabilities based on volume
32
Q

Give the definition of and 4 advantages of bootstrap sampling

A
  • Sample with replacement to estimate a distribution
  **Advantages**:
  • Can be used for small datasets
  • No distributional assumptions needed
  • Deals with non-normal data
  • Can be applied to any measurable quantity
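A minimal bootstrap sketch (not part of the original cards; the dataset and function name are invented), estimating the sampling distribution of the median, a quantity with no simple closed-form standard error:

```python
import random

random.seed(1)  # arbitrary seed for reproducibility

# Bootstrap: resample with replacement, recompute the statistic each time.
def bootstrap_medians(data, n_resamples=1000):
    medians = []
    for _ in range(n_resamples):
        resample = [random.choice(data) for _ in data]
        medians.append(sorted(resample)[len(resample) // 2])
    return medians

data = [3, 7, 8, 12, 13, 14, 18, 21, 22, 30]
medians = sorted(bootstrap_medians(data))
ci_low, ci_high = medians[25], medians[974]   # rough 95% interval
```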
33
Q

Classical statistics vs computational statistics

A
  • **Classical**: asymptotic distributions and frequentist probability
  • **Computational**: uses computation to make decisions about data
34
Q

What is the null hypothesis, alpha value and p-value?

A
  • **Null hypothesis**: there are no differences, and any observed differences are due to chance
  • **Alpha level**: probability level at which you consider a difference to be real
  • **P-value**: probability of observing a difference at least this large if both means came from the null distribution
35
Q

What does the z-score show?

A
  • How many standard deviations an observation lies from the expected value
  • The z-score is **negative** if the observed proportion is **less** than the expected proportion
36
Q

If we want to compare 3 means, what do we need to assume?

A
  • Independently sampled
  • Free from outliers
  • CLT applies, so empirical means are approx. normal
37
Q

What is joint probability and conditional probability?

A
  • **Joint probability**: intersection between x and y
  • **Conditional probability**: probability of x given y
38
Q

What is the Posterior, Likelihood, Prior and Normalisation?

A
  θ is the parameter
  • **Posterior**: P(θ|Data), updated belief about θ after seeing the data
  • **Likelihood**: P(Data|θ), probability of seeing the data given θ
  • **Prior**: P(θ), our belief about θ before we acquired the data
  • **Normalisation**: P(Data), the evidence/probability of observing the data
39
Q

What is Bayes' rule?

A

Posterior = (likelihood × prior) / evidence
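The rule can be applied with a short worked example (the numbers are made up for illustration): a diagnostic test that is 99% sensitive and 95% specific, for a condition with 1% prevalence.

```python
# Bayes' rule: posterior = (likelihood x prior) / evidence
prior = 0.01                # P(condition)
sensitivity = 0.99          # likelihood: P(positive | condition)
false_positive = 0.05       # P(positive | no condition)

# evidence: total probability of observing a positive result
evidence = sensitivity * prior + false_positive * (1 - prior)

# posterior: P(condition | positive)
posterior = sensitivity * prior / evidence
```

Despite the accurate test, the posterior is only about 1/6, because the prior is so low: most positives come from the large healthy population.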
40
Q

What is a parameter?

A
  • A parameter is a number that defines how a probability distribution behaves
  • Parameters of the **normal**: mean and variance
  • Parameters of the **binomial**: success probability and number of trials
41
Q

What is the MLE and its properties?

A
  • Maximum likelihood estimation
  • The parameter that maximises the likelihood
  **Properties**:
  • Not Bayesian
  • Widely used
  • Returns a single best estimate
  • Can fail on small datasets (e.g. no observed successes gives an estimate of 0)
42
Q

Give the method for MLE

A
  1. Take the log of the likelihood
  2. Differentiate with respect to theta
  3. Set the derivative to zero and solve
  4. Check the second derivative is negative (maximum likelihood)
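The steps above can be sketched on a coin-flip example (illustrative only; the data and function name are invented). Setting the derivative of the log-likelihood to zero gives the closed form θ* = k/n, which a numerical grid search confirms:

```python
import math

# Log-likelihood for k heads in n flips of a coin with success
# probability theta (the constant n-choose-k term is dropped).
def log_likelihood(theta, k, n):
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

k, n = 7, 10
analytic = k / n                               # closed-form MLE
grid = [i / 1000 for i in range(1, 1000)]      # avoid theta = 0 and 1
numeric = max(grid, key=lambda t: log_likelihood(t, k, n))
```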
43
Q

What is the MAP and its properties?

A
  • Maximum a Posteriori estimation
  • Find the parameter that maximises the posterior
  **Properties**:
  • Bayesian version of MLE
  • Incorporates a prior
  • Gives the most probable parameter after seeing the data
  • Uses Laplace smoothing to ensure the estimate is never 0 or 1
44
Q

Give 2 methods for approximating the full/normalised posterior

A
  • **Laplace's method**: approximates a **sharply peaked posterior** by a normal centred at the MAP estimate
  • **Markov chain Monte Carlo (MCMC)**: sample from a complex posterior distribution using the **Metropolis-Hastings algorithm**
45
Q

What are the steps for Laplace's method?

A
  1. Compute the **unnormalised** posterior (likelihood × prior)
  2. Take the log: L(θ) = log(likelihood) + log(prior)
  3. Find the MAP estimate θ* such that L′(θ*) = 0
  4. Compute L″(θ*)
  5. Use the **Taylor expansion** formula for the normal approximation (mean and variance)
46
Q

What are the mean and variance in Laplace's method?

A
  • **Mean**: peak θ* = MAP estimate = mode
  • **Variance**: negative inverse of the second derivative of log(posterior) = −1/L″(θ*), which is positive because L″(θ*) < 0 at a maximum
47
Q

Explain the general steps behind MCMC

A
  • Generates a Markov chain that eventually converges to the posterior
  • Uses the **Metropolis-Hastings algorithm**
48
Q

Why is MCMC used in Bayesian inference?

A
  • To generate samples from the posterior where exact computation is difficult
49
Q

In a Markov chain, what determines the next state?

A
  • Only the current state
  • Transitions are memoryless
50
Q

What components are needed for Metropolis-Hastings?

A
  1. Parameter space θ
  2. Unnormalised posterior
  3. Proposal distribution T(θ′∣θ)
  4. Accept/reject based on the posterior ratio
51
Q

What are the exact steps for Metropolis-Hastings?

A
  • Propose a new θ′ using T(θ′∣θ)
  • Accept θ′ with probability based on the posterior ratio
  • If accepted, the next θ is θ′; if not, θ stays the same
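These steps can be sketched on the coin-flip posterior (a minimal illustration, not a reference implementation; the data, seed and step size are invented). With 7 heads in 10 flips and a uniform prior, the true posterior is Beta(8, 4) with mean 8/12 ≈ 0.667:

```python
import random

random.seed(2)  # arbitrary seed for reproducibility

# Unnormalised posterior: binomial likelihood x uniform prior.
def unnorm_posterior(theta):
    if not 0.0 < theta < 1.0:
        return 0.0          # outside the parameter space
    return theta ** 7 * (1.0 - theta) ** 3

theta = 0.5                 # initial state
samples = []
for _ in range(20000):
    proposal = theta + random.gauss(0.0, 0.1)        # symmetric proposal T(θ'|θ)
    ratio = unnorm_posterior(proposal) / unnorm_posterior(theta)
    if random.random() < ratio:                      # accept with prob min(1, ratio)
        theta = proposal
    samples.append(theta)                            # if rejected, theta repeats

burned = samples[2000:]     # discard burn-in
posterior_mean = sum(burned) / len(burned)
```

Note that the normalising constant cancels in the ratio, which is exactly why Metropolis-Hastings only needs the unnormalised posterior.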
52
Q

In machine learning, explain **examples**, **features**, **class/labels** and **inputs/outputs**

A
  • **Examples**: observations, data points/entries
  • **Features**: independent variables or predictors
  • **Class/label**: dependent variables/outcomes being predicted
  • **Inputs/outputs**: features are inputs, results are outputs
53
Q

What are the 3 dominant learning paradigms in ML?

A
  • **Supervised learning**: uses labelled input-output pairs for training
  • **Reinforcement learning**: agent learns by receiving rewards/penalties
  • **Unsupervised learning**: finds patterns in unlabelled data with no outputs
54
Q

What is supervised learning?

A
  • Uses examples with **known inputs and outputs** for training
  • The algorithm learns to map inputs to the correct outputs
  • Used for **classification** and **regression**
  • Outputs could be labels, probabilities or predictions
55
Q

What is reinforcement learning?

A
  • Agent learns by **interacting** with an environment
  • Agent receives **rewards** or **penalties** for actions
  • Learns a policy mapping **states to actions**
  • **Maximises** cumulative reward over time
56
Q

What is unsupervised learning?

A
  • Works with **unlabelled** data (no pre-defined outputs)
  • Discovers hidden patterns
  • **No prior knowledge** or labelled data is used for learning
57
Q

What are the two main problem types in ML?

A
  • **Classification**: assigns an input to a specific category
  • **Regression**: predicts a continuous value based on the input
58
Q

What is generalisation in ML?

A
  • The ability of a model to perform well on new data
  • Tested using a separate test set
  • Helps **avoid overfitting** to the training set
  • Ensures **unbiased** evaluation
59
Q

What is the difference between the training set and the testing set?

A
  Training set:
  • Used to **train** the model and learn patterns
  • Performance on it can be overly **optimistic**
  Testing set:
  • **Evaluates** the model's performance on unseen data
  • Provides an **unbiased** estimate of accuracy
60
Q

What are the 2 measures of performance in classification?

A
  • **Classification accuracy**: percentage of correct predictions
  • **Misclassification error**: percentage of incorrect predictions
  • A **confusion matrix** helps visualise and evaluate both
61
Q

What is the measure of performance in regression?

A
  • Mean squared error (**MSE**)
  • Average **squared difference** between the predicted and actual values
  • Always computed on the **test set**
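MSE is a one-liner in practice (a minimal sketch; the predictions and function name are invented):

```python
# MSE: average squared difference between predicted and actual values,
# computed on a held-out test set.
def mse(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
error = mse(y_pred, y_true)   # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375
```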
62
Q

What are the two approaches to classification in ML?

A
  • **Discriminative approach**: focuses on decision boundaries between classes
  • **Generative approach**: models how the data is generated for each class
63
Q

What is Bayes' rule in classification?

A
  • **Posterior**: updated probability of a class after observing the data
  • **Likelihood**: probability of observing the data given a class
  • **Prior**: initial belief about the probability of each class
  • **Evidence**: overall probability of the data across all classes
64
Q

What are the steps in Bayesian classification?

A
  1. Estimate **priors** based on class frequency
  2. Model feature **distributions**
  3. Compute the **likelihood** for the new data point
  4. Apply Bayes' rule to compute the **posterior**
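The four steps above can be sketched on a made-up 1-D, two-class problem (illustrative only; the dataset, the test point x and the helper names are invented), modelling each class's feature distribution as a normal:

```python
import math

data = {"A": [1.0, 1.2, 0.8, 1.1], "B": [3.0, 3.2, 2.9, 3.1]}

def fit(values):
    # sample mean and (Bessel-corrected) variance of one class
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var

def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

total = sum(len(v) for v in data.values())
priors = {c: len(v) / total for c, v in data.items()}             # step 1
params = {c: fit(v) for c, v in data.items()}                     # step 2

x = 1.05                                                          # new data point
joint = {c: normal_pdf(x, *params[c]) * priors[c] for c in data}  # step 3
evidence = sum(joint.values())
posterior = {c: joint[c] / evidence for c in data}                # step 4: Bayes' rule
prediction = max(posterior, key=posterior.get)
```

The point x = 1.05 sits well inside class A's distribution, so its posterior dominates and the classifier predicts "A".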