3. Visual Healthcare Analytics Flashcards

1
Q

what is a parallel coordinates plot

A
  • a visualisation of multidimensional data
  • given an N×M table with N patients and M clinical variables, M equally spaced vertical axes (each with its own range) are drawn, and each patient becomes a polyline crossing every axis at that variable's value
  • one can then filter and explore correlations between variables across patients (see the sketch below)
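A minimal sketch using pandas' built-in parallel_coordinates helper; the patient data and column names here are made up purely for illustration. Note that this helper plots every variable on one shared y-scale, so variables with very different ranges may need normalising first.

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Illustrative N x M table: 4 patients, 4 clinical variables + a class column
df = pd.DataFrame({
    "age":         [34, 70, 55, 62],
    "bp_sys":      [118, 165, 140, 150],
    "glucose":     [5.1, 9.8, 6.4, 7.9],
    "cholesterol": [4.2, 6.5, 5.0, 5.8],
    "diabetic":    ["no", "yes", "no", "yes"],
})

# One polyline per patient, one vertical axis per variable
parallel_coordinates(df, class_column="diabetic", colormap="coolwarm")
plt.title("Parallel coordinates: patients across clinical variables")
plt.show()
```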
2
Q

draw a parallel coordinates plot

A
3
Q

what is a chord visualisation plot

A

a circular diagram in which the variables are arranged around a circle and connecting arcs (chords) illustrate the relationships between them, with chord width often encoding the strength of the connection
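Matplotlib has no built-in chord diagram, so the following is a hand-rolled minimal sketch under that assumption: variables sit on a circle and each chord is a Bezier curve whose line width encodes a made-up connection strength.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches

labels = ["diabetes", "hypertension", "obesity", "CKD"]
# (i, j, strength) triples: illustrative connection strengths only
links = [(0, 1, 5), (0, 2, 3), (1, 3, 4), (2, 3, 1)]

angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)
pts = np.c_[np.cos(angles), np.sin(angles)]  # variable positions on a circle

fig, ax = plt.subplots(figsize=(5, 5))
for i, j, w in links:
    # a quadratic Bezier curve bent through the centre approximates a chord
    path = Path([pts[i], (0, 0), pts[j]],
                [Path.MOVETO, Path.CURVE3, Path.CURVE3])
    ax.add_patch(patches.PathPatch(path, fill=False, lw=w, alpha=0.5))
for (x, y), name in zip(pts, labels):
    ax.text(x * 1.1, y * 1.1, name, ha="center", va="center")
ax.set_xlim(-1.3, 1.3); ax.set_ylim(-1.3, 1.3)
ax.set_aspect("equal"); ax.axis("off")
plt.show()
```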

4
Q

draw a chord visualisation plot

A
5
Q

what are 2 dimensionality reduction algorithms

A
  • PCA (principal component analysis)
  • t-SNE (t-distributed stochastic neighbour embedding)
6
Q

what is t-SNE

A

an algorithm that measures similarity in both the high-dimensional and the low-dimensional space: it converts the distances between instances in each space into probabilities, then optimises the low-dimensional embedding so that the two sets of similarities match, using a cost function

7
Q

what is pca

A
  • an unsupervised, linear dimensionality reduction technique
  • commonly used to visualise high-dimensional data
8
Q

how is pca calculated at a high level (variance)

A

the direction along which the data has the greatest variance under scalar projection becomes the first principal component (coordinate); the orthogonal direction with the next greatest variance becomes the second PC, and so on.
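A quick scikit-learn sketch of this ordering: after fitting, the per-component variances come out sorted in decreasing order. The data here is random and purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features

pca = PCA().fit(X)
print(pca.explained_variance_)        # decreasing: PC1 captures the most
print(pca.explained_variance_ratio_)  # the same, as fractions of the total
```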

9
Q

how can we determine the amount of preserved information after pca is applied

A

variance: compare the total variance of the original dimensions with the variance retained by the reduced dimensions; the ratio gives the proportion of information preserved

10
Q

after performing pca, two principal components have variances of 1.46 (A) and 0.2 (B), while the entire dataset has a variance of 2.06. what can be inferred from this

A

component A alone explains most of the dataset's variance (1.46 / 2.06 ≈ 71%), whereas B explains only about 10%, so A carries most of the information
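Worked check of those numbers:

```python
total = 2.06
print(1.46 / total)  # ~0.71: component A explains about 71% of the variance
print(0.20 / total)  # ~0.10: component B explains only about 10%
```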

11
Q

how is pca calculated

A
  • standardise the data (zero mean, typically unit variance)
  • compute the covariance matrix of the variables
  • compute the eigenvectors and eigenvalues of the covariance matrix
  • sort the eigenvectors by decreasing eigenvalue; these are the principal components
  • project the data onto the top k eigenvectors to reduce to k dimensions
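A minimal from-scratch numpy sketch of those steps, on random illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

Xc = X - X.mean(axis=0)           # 1. centre the data
C = np.cov(Xc, rowvar=False)      # 2. covariance matrix
vals, vecs = np.linalg.eigh(C)    # 3. eigendecomposition (C is symmetric)
order = np.argsort(vals)[::-1]    # 4. sort by decreasing eigenvalue
vals, vecs = vals[order], vecs[:, order]
Z = Xc @ vecs[:, :2]              # 5. project onto the top-2 PCs
print(vals)                       # eigenvalues = variance along each PC
```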
12
Q

how is the number of pca dimensions chosen

A
  • consider using a scree plot: it shows the variance explained by each PC, plotted against the number of PCs used (see the sketch below)
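A minimal matplotlib sketch of a scree plot, again on random illustrative data:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))  # correlated data

pca = PCA().fit(X)
ks = np.arange(1, len(pca.explained_variance_ratio_) + 1)
plt.plot(ks, pca.explained_variance_ratio_, "o-")
plt.xlabel("Principal component")
plt.ylabel("Proportion of variance explained")
plt.title("Scree plot: look for the 'elbow'")
plt.show()
```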
13
Q

draw a scree plot

A
14
Q

when should we not use PCA

A

pca is linear: each principal component is a linear projection (a weighted sum) of the original variables, so it cannot capture non-linear structure. avoid PCA when the relationships between variables are non-linear, i.e. when the data lies on a curved manifold

15
Q

what to use instead of pca

A

if the structure is not captured by a linear transformation, consider t-distributed stochastic neighbour embedding (t-SNE); a toy comparison follows below
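A toy sketch of the contrast: concentric circles have non-linear structure, so a linear PCA projection (here just a rotation of the 2D data) cannot unfold them, while t-SNE typically pulls the two rings apart.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

Z_pca = PCA(n_components=2).fit_transform(X)   # still two nested rings
Z_tsne = TSNE(n_components=2, perplexity=30,   # rings usually separate
              random_state=0).fit_transform(X)
```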

16
Q

how is t-sne calculated

A
  1. for every point in the high-dimensional space, centre a Gaussian distribution over that point and turn the distances to the other points into probabilities (these denote the similarity of pairs of points)
  2. repeat step 1 in the low-dimensional space, this time with a Cauchy distribution (a Student's t with one degree of freedom); these are the set of probabilities for the low-dimensional space
  3. now match the high-dimensional and low-dimensional probabilities using a loss function: the Kullback-Leibler (KL) divergence between them
  4. finally, use gradient descent to minimise the KL cost function (a hedged sketch follows below)
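A hedged scikit-learn sketch of these steps: perplexity sets the effective bandwidth of the per-point Gaussians from step 1, and the fitted kl_divergence_ attribute exposes the final KL cost that gradient descent minimised (steps 3-4). The data is random and illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 20))  # illustrative high-dimensional data

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
Z = tsne.fit_transform(X)       # the low-dimensional embedding
print(tsne.kl_divergence_)      # final value of the KL cost function
```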
17
Q

what is the advantage of the student t over Gaussian

A

the Student's t has heavier tails than the normal distribution, so it still assigns meaningful probability to less similar points (greater distances)
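A numerical check with scipy: at distance 3, the Student's t density (df=1, i.e. the Cauchy that t-SNE uses) is roughly seven times the Gaussian's.

```python
from scipy.stats import norm, t

print(norm.pdf(3))     # ~0.0044
print(t.pdf(3, df=1))  # ~0.0318: far points still get noticeable probability
```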

18
Q

compare t-SNE & PCA

A
  • linearity: PCA is linear, t-SNE is non-linear
  • t-SNE doesn't preserve the global structure of the original data (it favours local neighbourhoods), while PCA does
  • t-SNE produces well separated clusters
  • t-SNE is computationally expensive for large datasets
  • PCA is deterministic; t-SNE is stochastic and can produce varying embeddings across runs (see the check below)
  • PCA requires no hyperparameter optimisation; t-SNE does (e.g. perplexity)
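A small sketch of the determinism point: PCA returns the same projection on every run, while t-SNE (with random initialisation forced, so the seed matters in every scikit-learn version) varies with its seed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

print(np.allclose(PCA(n_components=2).fit_transform(X),
                  PCA(n_components=2).fit_transform(X)))  # True: deterministic

Z1 = TSNE(n_components=2, init="random", random_state=1).fit_transform(X)
Z2 = TSNE(n_components=2, init="random", random_state=2).fit_transform(X)
print(np.allclose(Z1, Z2))  # False: different seeds, different embeddings
```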