Lecture 8 - Non-linear plots Flashcards
change in expression is also known as
variation in expression
Based on this image, assume that the distance (variation) between the naive and transplant 2H on the y-axis is 24% and the x-axis distance (variation) is 15%. What would the overall variation between the naive and transplant 2H be?
By the Pythagorean theorem, sqrt(24^2 + 15^2) = sqrt(801), or about 28.3%
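The arithmetic on this card can be checked with a short sketch. The variable names are illustrative only; the two percentages are the ones given in the question.

```python
import math

# Per-axis distances (percent variation) taken from the card above.
x_distance = 15.0  # distance along the x-axis
y_distance = 24.0  # distance along the y-axis

# Straight-line (Euclidean) distance between the two samples on the plot.
overall = math.hypot(x_distance, y_distance)
print(f"{overall:.1f}%")  # prints 28.3%
```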
Here, we see that PC1 and PC2 account for _____% of all of the variation. The 3rd, 4th, etc. PCs would have to add up to be less than _____% so we would be safe to assume that the differences we see in this plot would be _______ (likely to/unlikely to) reflect the differences between the tissue types
a) 20.3+68.1 = 88.4%
b) 100 - 88.4 = 11.6%
c) likely
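A minimal sketch of the same bookkeeping, assuming the two axis percentages from the answer above (which one belongs to PC1 is not stated on the card, so they are kept as an unlabeled list):

```python
# Percent variation shown on the two plot axes, as given in the answer above.
axis_percentages = [20.3, 68.1]

shown = sum(axis_percentages)  # variation captured by the 2D plot
remaining = 100.0 - shown      # upper bound on PC3 + PC4 + ... combined

print(f"shown: {shown:.1f}%, remaining: {remaining:.1f}%")
```

Because the remaining PCs share at most the leftover percentage, none of them can outweigh what the plot already shows.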
Match the following with the colours shown based on the PCA results
1) 75%
2) 39%
3) 63%
yellow = 2
green = 1
pink = 3
*based on which clusters appear the most distinct from each other
If a PCA has 3 dimensions, the plot will contain a y-axis, x-axis, and a ______. Each axis represents a different _____ (gene/cell)
z-axis, cell
answer the following pertaining to PCA
a) PCA stands for?
b) what type of approach does it have
c) Reduces dimensions by _____ on the variance in each dimension (minimizes/maximizes)
d) identifies key ____ that influence tissue types (genes/cells)
e) What type of biological processes does it identify?
a) principal component analysis
b) linear + unsupervised
c) maximizes
d) genes
e) differentiation
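The "maximizes variance" idea in (c) can be illustrated with a small sketch. This is not the lecture's code: the expression values are hypothetical toy data for 6 genes measured in 2 cells, and the 2x2 case is used only because its eigenvalues can be written out by hand with the standard library.

```python
import math

# Hypothetical toy data: expression of 6 genes measured in 2 cells.
cell_1 = [10.0, 11.0, 8.0, 3.0, 2.0, 1.0]
cell_2 = [6.0, 4.0, 5.0, 3.0, 2.8, 1.0]

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

# Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
sxx = covariance(cell_1, cell_1)
syy = covariance(cell_2, cell_2)
sxy = covariance(cell_1, cell_2)

# Eigenvalues of a symmetric 2x2 matrix: the variance along PC1 and PC2.
half_trace = (sxx + syy) / 2.0
spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
var_pc1 = half_trace + spread  # PC1 is the direction of maximum variance
var_pc2 = half_trace - spread  # PC2 takes whatever variance is left

total = var_pc1 + var_pc2
pct_pc1 = 100.0 * var_pc1 / total
pct_pc2 = 100.0 * var_pc2 / total
print(f"PC1: {pct_pc1:.1f}%  PC2: {pct_pc2:.1f}%")
```

The percentages printed here are the same kind of numbers that label the axes of a PCA plot.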
PCA is an unsupervised linear approach.
a) unsupervised?
b) linear?
a) no labels (such as known tissue types) are given to the method; it finds the structure from the expression data alone
b) each PC is a straight-line (linear) combination of gene expression values, so distance/variance is measured along lines, not curves
why does scRNA-seq not just use PCA?
it requires comparing multiple different cell types and gene expression levels at different times, which is too complex for a linear method like PCA
match the following to
a) linear
b) nonlinear
a) A
b) B + D
which of the following are similar
I. PC
II. tSNE
III. UMAP
a) I and III
b) II and III
c) I and II
d) I, II, and III
e) none, they are all distinct
b
match the following
a) PC
b) tSNE
a) A
b) B
Non-linear diffusion models
a) what does it emphasize in the data?
b) useful for ______ of continuous processes such as ________
c) “Each dimension highlights the heterogeneity of a different cell population” –> what does this mean?
d) used for ______ and _______
e) typically rely on the ____ of dimensions first (addition/reduction)
a) transitions –> seeing a big difference in the spaces between the clusters of cells
b) visualization, differentiation
c) each dimension shows the variation (heterogeneity) between the different subpopulations (clusters)
d) exploration and visualization
e) reduction
T or F - the number used on the axis of tSNE plots are arbitrary
T - This plot is just meant for visualization of differences in expression
which plot uses percentage variation as its axis?
PC
T or F - Non-linear diffusion models such as tSNE and UMAP are used to help with exploration, visualization, and for determining events
F - not used for determining events, it cannot state whether one population is derived from another population on the plot
non-linear diffusion models are not used for determining events - what does this mean?
it means that it cannot tell you whether one of the populations (clusters) shown is derived from another population or not
t-SNE
t-distributed stochastic neighbour embedding
T or F - while PCA is unsupervised and linear, t-SNE is unsupervised and nonlinear
T
T or F - while PCA is unsupervised and linear, t-SNE is supervised and nonlinear
F - tSNE is also unsupervised
t-SNE calculates a similarity measure between a pair of instances in the high-dimensional space and in the low-dimensional space
a) high dimensional space?
b) low dimensional space?
a) gene by gene comparison
b) PCA
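The two similarity measures can be sketched as follows. This is a simplified illustration, not a full t-SNE: the data, the helper names `sq_dist` and `pair_similarities`, and the single fixed bandwidth `sigma` are all assumptions (standard t-SNE tunes the bandwidth per point from the perplexity setting). The sketch uses a Gaussian kernel in the high-dimensional space and a Student-t kernel in the low-dimensional space.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pair_similarities(points, kernel):
    """Normalized similarity for every ordered pair i != j (values sum to 1)."""
    n = len(points)
    raw = {(i, j): kernel(sq_dist(points[i], points[j]))
           for i in range(n) for j in range(n) if i != j}
    total = sum(raw.values())
    return {pair: value / total for pair, value in raw.items()}

# Hypothetical data: 3 cells described by 4 genes (high-dimensional space)...
high_dim = [[5.0, 1.0, 0.0, 2.0],
            [5.2, 0.9, 0.1, 2.1],
            [0.0, 4.0, 6.0, 0.5]]
# ...and a candidate 2D placement of the same cells (low-dimensional space).
low_dim = [[0.0, 0.0], [0.3, 0.1], [4.0, 3.0]]

sigma = 1.0  # assumed fixed bandwidth; real t-SNE sets this per point
p = pair_similarities(high_dim, lambda d2: math.exp(-d2 / (2 * sigma ** 2)))
q = pair_similarities(low_dim, lambda d2: 1.0 / (1.0 + d2))

# Cells 0 and 1 are close in both spaces, so both similarities rank them
# above the distant pair (0, 2).
print(p[(0, 1)] > p[(0, 2)], q[(0, 1)] > q[(0, 2)])
```

t-SNE then moves the low-dimensional points so that `q` matches `p` as closely as possible.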
T or F - genes that are found to be similar to each other have a higher cost
T
Which of the following would result in a negative cost
a) similar genes
b) distinct genes
b
What allows tSNE to exaggerate differences between cell populations and overlook potential connections between populations?
the cost function
T or F - in tSNE you will never get the same image twice
T - tSNE is stochastic, so repeated runs give different layouts unless the random seed is fixed
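One source of this run-to-run variation is the random starting layout. The sketch below shows only that initialization step, not a full t-SNE; `random_layout` is a hypothetical helper and the tiny standard deviation is an assumed value. In implementations that expose a seed, fixing it is the usual way to get a repeatable plot.

```python
import random

def random_layout(n_cells, seed=None):
    """Random 2D starting positions, one (x, y) pair per cell."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1e-4), rng.gauss(0.0, 1e-4)) for _ in range(n_cells)]

same_a = random_layout(5, seed=42)
same_b = random_layout(5, seed=42)
other = random_layout(5, seed=7)

print(same_a == same_b)  # same seed, same starting layout
print(same_a == other)   # different seed, different starting layout
```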