Questions based on my summaries and notes Flashcards

1
Q

What is a type 1 error?

A

False positive: rejecting the null hypothesis when it is actually true

2
Q

What is a type 2 error?

A

False negative: failing to reject the null hypothesis when it is actually false

3
Q

What does the power say?

A

Power = 1 - β (beta), the probability of getting a true positive, i.e. of rejecting the null hypothesis when the alternative hypothesis is true. β is the probability of a false negative (type 2 error): we should have rejected the null hypothesis and we didn’t. If β is small the power is high, which means a real effect is likely to be detected.
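
As a rough illustration of power = 1 - beta, here is a minimal sketch for a one-sided one-sample z-test. The helper name and its parameters (effect, sigma, n, alpha) are assumptions made up for this example, not something from the notes.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided one-sample z-test (hypothetical helper).

    effect: true difference between the real mean and the H0 mean
    sigma:  known population standard deviation
    n:      sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold under H0
    shift = effect * n ** 0.5 / sigma        # how far H1 moves the test statistic
    beta = std_normal.cdf(z_crit - shift)    # probability of a type 2 error
    return 1 - beta

# A larger sample gives a smaller beta and therefore higher power.
print(z_test_power(effect=0.5, sigma=1.0, n=10))
print(z_test_power(effect=0.5, sigma=1.0, n=25))
```

With no real effect (effect=0) the power equals alpha, which is just the type 1 error rate.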

4
Q

What is a parametric test and when can it be used?

A

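For example the t-test and linear regression analysis (the Chi2 test is usually classed as non-parametric, since it makes no assumption about the distribution of the underlying data). A parametric test can be used when the data is approximately normally distributed and the groups have homogeneous variance.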

5
Q

What is a chi2 test and when can it be used?

A

There are 2 different types of Chi2 test, the goodness of fit test and the test of independence. For the Chi2 test of independence:

H0=The two variables are independent
H1=The two variables are not independent(they are associated with each other)

This could for example be used to test whether two genes are linked or unlinked by looking at the frequency distribution of potential phenotypes.
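
A minimal sketch of the Chi2 test of independence on a 2x2 table of counts, using only the standard library. The function name and the example counts are made up for illustration, and no continuity correction is applied.

```python
import math

def chi2_independence_2x2(table):
    """Chi2 test of independence for a 2x2 table of observed counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the chi2 tail probability reduces to erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Made-up phenotype counts: rows = trait A present/absent, columns = trait B present/absent.
chi2, p = chi2_independence_2x2([[30, 10], [10, 30]])
print(chi2, p)
```

A small p value means we reject H0 (independence), which in the gene example would point towards linkage.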

6
Q

What is a t test and what does it analyze?

A

There are two main types of t-test, one sample and two sample. The basis of a t-test is comparing means. For example, with a two sample t-test you can compare mean gene expression between a control group and a patient group and test whether the two groups have the same mean. With a one sample t-test you can test whether the patient group has x as its mean, i.e. you compare the group mean to a set value.
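
A minimal sketch of the two t statistics, assuming the Welch (unequal variance) form for the two sample case. The function names and expression values are made up; turning the statistic into a p value needs the t distribution, which is left out here.

```python
from statistics import mean, variance

def one_sample_t(sample, mu0):
    """t statistic comparing the sample mean with a set value mu0."""
    n = len(sample)
    return (mean(sample) - mu0) / (variance(sample) / n) ** 0.5

def welch_t(a, b):
    """t statistic comparing the means of two independent groups (Welch form)."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / standard_error

# Made-up gene expression values.
control = [5.1, 4.8, 5.3, 5.0, 4.9]
patients = [6.2, 5.9, 6.4, 6.1, 6.0]
print(welch_t(control, patients))     # two sample: control vs patient group
print(one_sample_t(patients, 5.0))    # one sample: patient group vs set value 5.0
```

The further the t statistic is from 0, the stronger the evidence against equal means.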

7
Q

What is a regression analysis and what does it analyze?

A

Simple linear regression is a statistical method you can use to understand the relationship between two variables, x and y.

One variable, x, is known as the predictor variable.

The other variable, y, is known as the response variable.

For example weight and height. You put the values of height and weight for a set of individuals into a scatterplot and find a regression line that fits best with these values. This regression line can then be used to predict the weight or height of a certain individual.

One way to measure how well the least squares regression line “fits” the data is the coefficient of determination, denoted R2. It takes a value between 0 and 1 and shows what proportion of the variation in the response variable the model explains. For example, R2 = 0.77 means the model explains 77% of the variation in y.
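
A minimal sketch of a least squares fit and R2 in plain Python. The function name and the height/weight values are made up for illustration.

```python
def fit_line(x, y):
    """Least squares line y = intercept + slope * x, plus R2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R2 = 1 - (unexplained variation / total variation)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

heights = [150, 160, 170, 180, 190]   # predictor x (cm), made-up values
weights = [52, 58, 69, 77, 85]        # response y (kg), made-up values
slope, intercept, r2 = fit_line(heights, weights)
print("predicted weight at 175 cm:", intercept + slope * 175)
print("R2:", r2)
```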

8
Q

What are some examples of non-parametric analysis and when can they be used?

A

Mann-Whitney U test, Spearman correlation etc. These can be used when the data is not normally distributed.

9
Q

What does the Mann-Whitney U test analyze and when can it be used?

A

For small samples that are not normally distributed.

H0=The two populations are equal (same distribution)
H1=The two populations are not equal

This can for example be used to analyze if two different patient groups on different diets lose the same amount of weight or if they lose a different amount of weight.
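
A minimal sketch of how the U statistic is counted, using pairwise comparisons. The function name and the weight loss values are made up; a p value would additionally need U tables or a normal approximation.

```python
def mann_whitney_u(a, b):
    """U statistic: the smaller of the two pairwise 'win' counts."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0      # value from group a beats value from group b
            elif x == y:
                u_a += 0.5      # ties count as half
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)

# Made-up weight loss (kg) for two diet groups.
diet_1 = [2.1, 3.4, 1.8, 2.9]
diet_2 = [4.0, 5.2, 3.9, 4.6]
print(mann_whitney_u(diet_1, diet_2))
```

A U close to 0 means the groups barely overlap, which counts against H0.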

10
Q

What is the difference between Spearman and Pearson?

A

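Both are correlation analyses, used to see whether two variables are correlated or not. Both can be visualized with a scatterplot, but Pearson measures linear relationships in continuous data, while Spearman works on ranks and is used for ranked data, monotonic but non-linear relationships, or when there are extreme outliers.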

H0=There is no correlation
H1=There is correlation

Correlation coefficient (Pearson r or Spearman rho): -1 is perfect negative correlation, 0 is no correlation at all, 1 is perfect positive correlation
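
A minimal sketch comparing the two coefficients on made-up data with one extreme outlier. Spearman is computed here as Pearson on the ranks; the function names are assumptions for this example.

```python
def pearson(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]   # always increasing, but the last value is an outlier
print("Pearson :", pearson(x, y))
print("Spearman:", spearman(x, y))
```

Because y always increases with x, the ranks agree perfectly and Spearman is 1, while the outlier pulls Pearson below 1.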

11
Q

What are some examples of descriptive analysis and what does it do?

A

It summarizes and visualizes the data, for example with a scatterplot or a PCA plot

12
Q

What is a bivariate analysis and what are some examples?

A

It shows the relationship between two variables, for example the Chi2 test and regression analysis

13
Q

What is multivariate analysis?

A

It shows multiple variables in relation to each other, for example PCA and cluster analysis

14
Q

When are generalized linear models used?

A

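When the response variable is not normally distributed (for example counts or binary outcomes) or when the relationship between the predictors and the response is not linear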

15
Q

What are the E-value and score significance, and what are acceptable values for these?

A

The E-value is the number of matches with an equally good or better score that we would expect to get by random chance in a database of that size. An E-value below 0.01 is considered good for homology and below 1e-50 is a very good match. Score significance shows how trustworthy the match is.

16
Q

What is multiple sequence alignment and what is it used for?

A

Aligning at least three sequences with each other to find homology through evolutionarily conserved regions, i.e. sequences preserved by natural selection

17
Q

How would the histogram of p values look if the null hypothesis vs the alternative hypothesis is true?

A

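If the null hypothesis is true: flat (uniform) between 0 and 1. If the alternative hypothesis is true: a peak of small p values near 0 (many below 0.05), so the histogram is skewed to the right.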
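
A minimal simulation sketch of the two histogram shapes, assuming a two-sided z-test of H0: mean = 0 with known standard deviation 1. The function names, sample size, effect size and seed are all made up for this example.

```python
import random
from statistics import NormalDist, mean

def simulate_p_values(true_mean, n_tests=2000, n=20, seed=0):
    """Run many z-tests of H0: mean = 0 on samples drawn with the given true mean."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_tests):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = mean(sample) * n ** 0.5                      # z statistic, sigma = 1
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values

def histogram(p_values, bins=10):
    """Count p values in equal-width bins between 0 and 1."""
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

print("H0 true:", histogram(simulate_p_values(0.0)))   # roughly equal bins
print("H1 true:", histogram(simulate_p_values(0.5)))   # first bin dominates
```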

18
Q

Steps and tools of RNA sequencing analysis pipeline:

A

Preprocessing raw reads, which are FASTQ sequences

Filtering, adapter removal and trimming with fastp

Quality control with FastQC

Back to preprocessing and filtering etc. until we’re happy

Alignment against a reference genome using HISAT2

Quantification: gene counts

19
Q

Downstream analysis steps and tools (done in R)

A

Normalization- log counts

Do PCA plot to see if we need to do batch correction

Visualization such as volcano plot and pathway analysis

Differential expression analysis, for example a t test on each gene, to get a specific list of genes that differ between groups

Gene set enrichment to identify overrepresented gene sets

20
Q

False positive rate formula:

A

FPR = FP / (FP + TN)
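
A minimal sketch of the false positive rate, FP / (FP + TN), as a function. The function name and counts are made up for illustration.

```python
def false_positive_rate(fp, tn):
    """Share of all true negatives that were wrongly called positive."""
    return fp / (fp + tn)

# Made-up counts: 5 false positives among 100 truly negative cases.
print(false_positive_rate(fp=5, tn=95))
```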

21
Q

What is the FWER?

A

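Family-wise error rate: the probability of getting at least one false positive among all the tests

Controlled with the Bonferroni correction, which is more strict than FDR

new significance threshold = original alpha / number of statistical tests (equivalently, adjusted p value = original p value x number of tests, capped at 1)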
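
A minimal sketch of the Bonferroni correction, returning both the stricter threshold (alpha divided by the number of tests) and the equivalent adjusted p values. The function name and p values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni threshold and adjusted p values for a family of tests."""
    m = len(p_values)
    threshold = alpha / m                            # compare raw p values with this
    adjusted = [min(p * m, 1.0) for p in p_values]   # or compare these with alpha
    return threshold, adjusted

threshold, adjusted = bonferroni([0.01, 0.02, 0.5])
print(threshold)
print(adjusted)
```

Both views give the same decisions: a raw p value is below alpha / m exactly when its adjusted value is below alpha.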

22
Q

What is FDR?

A

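False discovery rate: the expected proportion of false positives among the results called significant

Benjamini-Hochberg, adjusted p value (q-value), less strict than FWER

1. Rank the p values from smallest to largest
2. The largest p value keeps its value as its adjusted p value
3. Working down from the largest, each next p value gets the smaller of the following two:

a) The previous adjusted p value
b) Current p value x (total number of p values / p value rank)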
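
The ranking steps above can be sketched as follows, assuming the Benjamini-Hochberg procedure. The function name and p values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices, smallest p first
    adjusted = [0.0] * m
    previous = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p value down
        i = order[rank - 1]
        # Smaller of the previous adjusted value and p * (total / rank).
        previous = min(previous, p_values[i] * m / rank)
        adjusted[i] = previous
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Taking the minimum with the previous value keeps the adjusted p values in the same order as the raw ones.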

23
Q

Examples of methods for analyzing high dimensional data:

A

PCA plot, MDS, SVI, k-means, t-SNE, hierarchical clustering

24
Q

When to use pathway enrichment and gene set enrichment:

A

Gene set enrichment is good for comparing predefined pathways or gene sets against a normal control group.

Pathway enrichment is good for identifying over or under expressed genes when we have no previous set of genes.

25
Q

Examples of useful databases:

A

KEGG: database of pathway maps
GO: function of genes
DisGeNET: gene-disease association database
Reactome: biological pathways

26
Q

Measures of enrichment:

A

Effect size: strength of the relationship between two variables, for example Spearman correlation

Odds ratio: odds of the event in group A / odds of the event in group B

Fold enrichment: comparing the frequency with a control group or representative dataset to find over or under expressed genes
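
A minimal sketch of the odds ratio and fold enrichment calculations. The function names, argument names and counts are made up for illustration.

```python
def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds of the event in group A divided by the odds in group B."""
    return (events_a / non_events_a) / (events_b / non_events_b)

def fold_enrichment(hits_in_list, list_size, hits_in_background, background_size):
    """Frequency in our gene list divided by the frequency in the background."""
    return (hits_in_list / list_size) / (hits_in_background / background_size)

# Made-up counts: 20 of 100 affected in group A, 10 of 100 in group B.
print(odds_ratio(20, 80, 10, 90))
# Made-up counts: 10 of 100 listed genes are in a pathway, 50 of 5000 in the background.
print(fold_enrichment(10, 100, 50, 5000))
```

Values above 1 mean over-representation and values below 1 mean under-representation.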

27
Q

Fisher's exact test

A

2x2 contingency table

Similar to a Chi2 test of independence, but exact, so it can be used when one or more categories have small counts. It determines whether there is a significant association between the groups.

             Group 1    Group 2
Category 1
Category 2
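
A minimal sketch of a one-sided Fisher's exact test on a 2x2 table, computed from the hypergeometric distribution with the standard library. The function name and counts are made up, and the two-sided version is left out.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of seeing a or more in the top-left cell
    when the row and column totals are fixed and there is no association.
    """
    row_1, col_1, n = a + b, a + c, a + b + c + d
    total_tables = comb(n, col_1)
    p_value = 0.0
    for k in range(a, min(row_1, col_1) + 1):
        # Hypergeometric probability of exactly k in the top-left cell.
        p_value += comb(row_1, k) * comb(n - row_1, col_1 - k) / total_tables
    return p_value

# Made-up counts: rows = Category 1 / Category 2, columns = Group 1 / Group 2.
print(fisher_exact_greater(3, 0, 0, 3))
```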