Past paper Flashcards
(19 cards)
What are supervised tasks?
Supervised tasks use labels/targets
What are unsupervised tasks?
Unsupervised tasks do not use labels/targets
What is a silhouette coefficient?
It is a performance evaluation metric for clustering, used when we don't have labelled data
How is the silhouette coefficient interpreted?
Values range from -1 to +1
-1 indicates incorrect clustering, 0 indicates overlapping clusters, +1 indicates dense, well-separated clusters
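Worked example (a minimal sketch, not from the past paper): for each point, a is the mean distance to the other points in its own cluster and b is the mean distance to the points of the nearest other cluster, and the point's score is (b - a) / max(a, b). The function name `silhouette_score`, the use of Euclidean distance, and the toy points are assumptions made for illustration.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other points in the same cluster
        # (the distance from p to itself is 0, so divide by len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Toy data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])  # sensible clustering, close to +1
bad = silhouette_score(points, [0, 1, 0, 1])   # mixed-up clustering, below 0
```

The sensible labelling scores close to +1 and the mixed-up labelling scores below 0, matching the interpretation on the card.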
What is the name of the phenomenon of a model becoming less accurate over time?
Data/concept drift
Performance decay
What are the pros and cons of reducing dataset dimensionality?
+ Removes noisy information
+ Lower dimensionality allows some algorithms to run faster
- Removing information can affect model performance
What is normalisation?
Process of rescaling features to a common range without changing the shape of their distribution
Min-max normalisation scales each feature to the range 0 to 1
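Worked example (a minimal sketch, assuming min-max scaling is what the card means by scaling from 0 to 1): each value v becomes (v - min) / (max - min). The function name `min_max_normalise` and the sample heights are made up for illustration.

```python
def min_max_normalise(values):
    """Rescale one feature to the range 0 to 1 using min-max scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

heights_cm = [150, 160, 170, 190]
scaled = min_max_normalise(heights_cm)
# scaled == [0.0, 0.25, 0.5, 1.0]
```

The relative spacing of the values is preserved; only the scale changes.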
What is a sample?
A single observation
What is a feature?
A single measured attribute
What is top-down development?
Applies previous knowledge via rules/choices
What is bottom-up development?
Builds knowledge from observed data; the model learns the rules
What are the first 2 steps of PPDAC?
Problem and plan
What are the final 3 steps of PPDAC?
Data
Analysis
Conclusion
What is a non-directional hypothesis?
States that the independent variable affects the dependent variable, but not in which direction
“If a car is red, its speed is affected”
What is a null hypothesis?
States there is no relationship between the independent and dependent variables
Which distribution is symmetric, with the mean and median close together?
Normal distribution
What is optimization?
Finding the best option from a set of alternatives, often by searching for the global optimum
What are the steps of the K-means clustering algorithm?
1. Scale the data
2. Choose a K
3. Initialise centroids
4. Assign each data point to its nearest centroid
5. Update each centroid to the mean of its allocated points
6. Repeat the assignment and update steps until the convergence rule is met
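Worked example (a minimal sketch, not a reference implementation): the K-means steps above in plain Python. It assumes the data has already been scaled, picks the initial centroids at random from the data points, and uses "assignments stopped changing" as the convergence rule; the function name `k_means`, the `seed` and `max_iter` parameters, and the toy points are all illustrative choices.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples into k groups."""
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct data points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Association step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Convergence rule (assumed): stop when no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, labels

# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = k_means(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. On harder data the result can depend on the initial centroids, so running with several seeds is a common precaution.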