Bioinformatics block test 2 Flashcards

1
Q

What is variant calling?

A

Variant calling is the step after read alignment in which we identify positions in a sample that differ from the reference genome used in the alignment.

2
Q

What must be done before variant calling can be carried out?

A

Duplicates must be marked and removed from the SAM/BAM files.

3
Q

What are the 4 major types of duplicates?

A

PCR duplicates.
Clustering duplicates.
Optical duplicates.
Sister duplicates.

4
Q

What are PCR duplicates?

A

PCR duplicates arise when more than one copy of the same original fragment, produced by PCR amplification during library preparation, anneals to the surface of the flow cell.

5
Q

What are Clustering duplicates?

A

Clustering duplicates are unique to Illumina and arise during cluster generation when a single library fragment spreads across 2 adjacent tiles on the flow cell.

6
Q

What are optical duplicates?

A

Optical duplicates also arise during cluster generation: the base-caller software reads a single cluster as two, across two adjacent tiles on the flow cell.

7
Q

What are sister duplicates?

A

Sister duplicates occur when both strands of the same library fragment anneal to the surface of the flow cell.

8
Q

Why is it important to mark and remove duplicates for variant calling?

A

It is important to remove duplicates before variant calling because they artificially inflate sequencing depth, which can result in homozygous positions being called heterozygous and heterozygous positions being called homozygous.
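
The effect on depth can be illustrated with a toy example. This is a minimal sketch, assuming each read is reduced to a hypothetical (chromosome, start, strand) tuple and that reads sharing all three values are duplicates. Real tools such as Picard MarkDuplicates or samtools markdup use more information than this.

```python
def mark_duplicates(reads):
    """Return (unique_reads, duplicate_count) for a list of (chrom, start, strand) tuples.

    Simplifying assumption: two reads are duplicates if they share chromosome,
    start position, and strand. The first read seen is the one kept.
    """
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique, len(reads) - len(unique)


# Invented reads: three copies of one fragment plus one distinct read.
reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 105, "-")]
unique, n_dup = mark_duplicates(reads)
print(len(reads), len(unique), n_dup)  # 4 2 2
```

Before duplicate removal the position appears to be covered by four reads, but only two independent fragments support it.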

9
Q

What are the 4 types of variant callers?

A
  1. Copy Number Variant (CNV) callers.
  2. Structural Variant (SV) callers.
  3. Somatic callers (somatic mutations).
  4. Germline callers (inherited traits).
10
Q

What are the two ways in which variant calling can be performed?

A
  1. Single sample calling: Per individual.
  2. Joint calling: Using information from multiple samples at a time.
11
Q

What is the variant caller pipeline?

A

Variant caller pipeline:

  1. Take SAM/BAM file as input.
  2. Mark and remove duplicates.
  3. Apply statistical methods to identify variants and assign genotypes.
  4. Produce a gVCF file as output.
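
Step 3 can be sketched with a simple binomial model. This is an illustrative example only, assuming a diploid sample, hypothetical reference/alternate read counts at one position, and a fixed per-read error rate. Production callers use considerably richer models.

```python
from math import comb


def genotype_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Binomial likelihood of the observed read counts under 0/0, 0/1 and 1/1."""
    n = ref_count + alt_count
    # Probability that a single read shows the ALT allele under each genotype.
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    return {
        gt: comb(n, alt_count) * p ** alt_count * (1.0 - p) ** ref_count
        for gt, p in p_alt.items()
    }


def call_genotype(ref_count, alt_count, error_rate=0.01):
    """Pick the genotype with the highest likelihood."""
    likelihoods = genotype_likelihoods(ref_count, alt_count, error_rate)
    return max(likelihoods, key=likelihoods.get)


print(call_genotype(20, 0), call_genotype(10, 10), call_genotype(0, 20))
```

Because the likelihoods depend directly on read counts, duplicated reads that were not removed in step 2 would shift the call.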
12
Q

What is variant annotation?

A

Variant annotation is the process of describing the nature and effect of the DNA alterations produced by a variant.

13
Q

In variant annotation, what does nature and effect refer to?

A

Nature: Type of sequence alteration (indel, substitution, CNV, etc.).

Effect: How the variant changes the annotated reference sequence it occurs in.

A variant always has a single nature but can have multiple effects, because the effect depends on the context of each transcript.

14
Q

Some common variant annotation tools are SnpEff, ANNOVAR, and Variant Effect Predictor (VEP). What do they all have in common?

A

All variant annotation tools take a VCF file as input and annotate each variant in it, returning an annotated VCF file with the annotations in the INFO field.
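
Reading annotations back out of the INFO field can be sketched as below. The record is invented for illustration, and the `ANN` key with `|`-separated subfields follows the SnpEff convention in simplified form (other tools use different keys, and real `ANN` entries carry more subfields).

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


# Hypothetical annotated record (tab-separated VCF columns), invented for illustration.
line = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30;ANN=G|missense_variant|MODERATE|GENE1"
columns = line.split("\t")
info = parse_info(columns[7])          # INFO is the 8th VCF column
annotation = info["ANN"].split("|")
print(info["DP"], annotation[1])       # 30 missense_variant
```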

15
Q

What tools are used to predict variant impact on the structure and function of human proteins?

A

SIFT (Sorting Intolerant From Tolerant) creates a position-specific weight matrix that corresponds to the profiles of specific protein domains. Position-specific weight matrices can also be used to identify transcription factor binding sites.
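
The idea of a position-specific weight matrix can be sketched on a DNA binding-site example. This is not SIFT's implementation. It is a minimal log-odds matrix, assuming invented aligned sites of equal length, a uniform background, and a pseudocount of 1.

```python
from collections import Counter
from math import log2


def build_pwm(sequences, alphabet="ACGT", pseudocount=1.0):
    """Log-odds position weight matrix from equal-length aligned sequences.

    Assumes a uniform background frequency over the alphabet.
    """
    background = 1.0 / len(alphabet)
    total = len(sequences) + pseudocount * len(alphabet)
    pwm = []
    for i in range(len(sequences[0])):
        counts = Counter(seq[i] for seq in sequences)
        pwm.append({
            letter: log2(((counts[letter] + pseudocount) / total) / background)
            for letter in alphabet
        })
    return pwm


def score(pwm, sequence):
    """Sum the per-position log-odds weights for a candidate sequence."""
    return sum(column[letter] for column, letter in zip(pwm, sequence))


sites = ["TATA", "TATA", "TACA", "TATT"]  # invented example sites
pwm = build_pwm(sites)
print(score(pwm, "TATA") > score(pwm, "GCGC"))  # True
```

A sequence that matches the profile scores higher than one that does not, which is how such a matrix flags candidate sites or poorly tolerated substitutions.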

16
Q

What two factors determine if variation will be retained from generation to generation?

A

  1. Whether the mutation is heritable: variation is caused by mutation, and a mutation in a somatic cell will not be passed on, whereas a mutation in a germline cell will be.
  2. Whether the mutation positively or negatively affects reproductive fitness.
17
Q

What is the difference between a single nucleotide variant (SNV) and a single nucleotide polymorphism (SNP)?

A

SNVs occur in less than 1% of the population.

SNPs occur in more than 1% of the population.
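
The 1% cutoff can be applied to an allele frequency estimated from genotype counts. This is a minimal sketch, assuming hypothetical genotype counts for a diploid population and treating a frequency of exactly 1% as a SNP (an arbitrary choice, since the card does not say which side the boundary falls on).

```python
def alt_allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """Alternate allele frequency from diploid genotype counts."""
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    return (n_het + 2 * n_alt_hom) / total_alleles


def classify(frequency, threshold=0.01):
    """Label a single-nucleotide change by population frequency."""
    return "SNP" if frequency >= threshold else "SNV"


rare = alt_allele_frequency(990, 10, 0)      # 10 / 2000
common = alt_allele_frequency(900, 90, 10)   # 110 / 2000
print(classify(rare), classify(common))      # SNV SNP
```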

18
Q

What is the basis of HWE?

A

In the absence of evolutionary forces, allele and genotype frequencies remain the same from generation to generation.
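
For a locus with two alleles at frequencies p and q = 1 - p, the expected genotype frequencies under HWE are p², 2pq and q². A short sketch, assuming a biallelic locus with hypothetical allele labels A and a, shows that the allele frequency recovered from those genotypes is unchanged.

```python
def hwe_expected(p):
    """Expected genotype frequencies under HWE for allele frequency p of allele A."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def next_generation_p(genotype_freqs):
    """Frequency of allele A among gametes produced by these genotypes."""
    return genotype_freqs["AA"] + genotype_freqs["Aa"] / 2


freqs = hwe_expected(0.7)
print(freqs)                     # approximately AA 0.49, Aa 0.42, aa 0.09
print(next_generation_p(freqs))  # approximately 0.7, i.e. unchanged
```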

19
Q

What are the requirements for HWE?

A

Large population.
Diploid organisms.
Sexual reproduction.
Allele frequencies equal in both sexes.
Random mating.
No cross-generational mating.
No selection.

20
Q

What explains deviations from HWE?

A
  1. Genetic drift.
  2. Natural selection.
  3. Mutation.
  4. Gene flow.
  5. Inbreeding.
  6. Non-random mating.
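
A deviation from HWE is commonly checked with a chi-square goodness-of-fit test on genotype counts. This is a minimal sketch, assuming hypothetical counts at a biallelic locus where both alleles are present. With one degree of freedom, a statistic above 3.841 corresponds to p < 0.05.

```python
def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic comparing observed genotype counts with HWE expectations.

    Assumes both alleles are present, so no expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # observed frequency of allele A
    q = 1.0 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_AA, n_Aa, n_aa]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(hwe_chi_square(49, 42, 9) < 3.841)    # True: consistent with HWE
print(hwe_chi_square(60, 20, 20) > 3.841)   # True: heterozygote deficit
```

The second example has the same allele frequency as the first but far fewer heterozygotes than expected, the pattern produced by inbreeding or other non-random mating.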
21
Q

What is linkage disequilibrium?

A

Linkage disequilibrium is the non-random association of alleles at different loci in a given population.

The forces that disrupt HWE also disrupt LD, because LD is calculated from allele frequencies.
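
LD between two biallelic loci is usually summarised by D = p_AB - p_A * p_B and by the normalised measures D' and r². This is a minimal sketch, assuming hypothetical frequencies for alleles A and B and the AB haplotype, with both loci polymorphic. D' is reported here as an absolute value.

```python
def linkage_disequilibrium(p_AB, p_A, p_B):
    """Return (D, |D'|, r^2) from haplotype frequency p_AB and allele frequencies p_A, p_B."""
    d = p_AB - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


print(linkage_disequilibrium(0.5, 0.5, 0.5))    # (0.25, 1.0, 1.0): complete LD
print(linkage_disequilibrium(0.25, 0.5, 0.5))   # (0.0, 0.0, 0.0): independent loci
```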

22
Q

What is a haplotype?

A

A haplotype is a set of alleles that are close together on a chromosome and are therefore inherited together as a block.

23
Q

What are tag SNPs?

A

Tag SNPs are SNPs that, because of linkage disequilibrium, can be used as markers for specific haplotypes.
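
One way to choose tag SNPs is a greedy pass over pairwise r² values. This sketch is only illustrative: the SNP names, the r² values and the 0.8 threshold are all assumptions, and dedicated tools use more careful procedures.

```python
def pick_tag_snps(snps, r2, threshold=0.8):
    """Greedy selection: each chosen tag covers the remaining SNPs with r^2 >= threshold.

    `r2` maps frozenset({snp_a, snp_b}) to a pairwise r^2; missing pairs count as 0.
    """
    def covered_by(snp, pool):
        return [t for t in pool
                if t != snp and r2.get(frozenset((snp, t)), 0.0) >= threshold]

    remaining = list(snps)
    tags = {}
    while remaining:
        # Choose the SNP that covers the most SNPs still unassigned.
        best = max(remaining, key=lambda s: len(covered_by(s, remaining)))
        tagged = covered_by(best, remaining)
        tags[best] = tagged
        remaining = [t for t in remaining if t != best and t not in tagged]
    return tags


# Hypothetical SNPs and pairwise r^2 values.
snps = ["snp1", "snp2", "snp3", "snp4"]
r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp1", "snp3")): 0.85,
    frozenset(("snp2", "snp3")): 0.90,
    frozenset(("snp3", "snp4")): 0.10,
}
print(pick_tag_snps(snps, r2))  # {'snp1': ['snp2', 'snp3'], 'snp4': []}
```

Here genotyping snp1 and snp4 is enough to capture the variation at all four positions.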

24
Q

What is population structure?

A

Population structure refers to differences in patterns of genetic variants between populations (with respect to allele frequencies, LD, and haplotype structure).

25
Q

What is the point of a GWAS?

A

A GWAS identifies correlations between alleles at a particular locus and a disease or phenotypic trait of interest.
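
At a single SNP, the correlation can be tested with a chi-square test on a 2x2 table of allele counts in cases and controls. This is a minimal sketch with invented counts. A real GWAS repeats the test across many SNPs and must correct for multiple testing and population structure.

```python
def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds of carrying the alternate allele in cases relative to controls."""
    return (case_alt * control_ref) / (case_ref * control_alt)


print(allelic_chi_square(120, 80, 80, 120))  # 16.0
print(odds_ratio(120, 80, 80, 120))          # 2.25
```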

26
Q

What is the common disease, common variant hypothesis?

A

Variants associated with common traits/diseases will have small effect sizes and be common within populations.

27
Q

SNP chips rely on the principle of indirect association. What is that?

A

The principle of indirect association states that when an association is found between an SNP and a disease, the variant in question is likely not the true causal variant, but rather one in strong LD with the responsible variant.

This is because tag SNPs are used to represent allelic variability at multiple positions.

28
Q

What is the purpose of differential gene expression analysis?

A

The purpose of differential gene expression analysis is to transcriptionally profile and compare two conditions in order to identify genes that are differentially expressed between them.
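
The comparison is often summarised per gene as a log2 fold change. This is a minimal sketch, assuming hypothetical normalised counts for one gene in three replicates per condition and a pseudocount of 1 to avoid dividing by zero. Dedicated packages such as DESeq2 or edgeR additionally model count variability to decide whether a change is significant.

```python
from math import log2
from statistics import mean


def log2_fold_change(condition_a, condition_b, pseudocount=1.0):
    """log2 ratio of mean expression in condition B relative to condition A."""
    return log2((mean(condition_b) + pseudocount) / (mean(condition_a) + pseudocount))


control = [9, 11, 10]    # invented normalised counts
treated = [41, 43, 45]
print(log2_fold_change(control, treated))  # 2.0, i.e. about four-fold higher in treated
```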

29
Q

What are technical replicates?

A

Technical replicates are repeated measurements of the same sample. They measure the random noise associated with the protocols or equipment.

30
Q

What are biological replicates?

A

Biological replicates are parallel measurements of distinct samples that capture random biological variation.

31
Q

What are eQTLs?

A

eQTL = Expression Quantitative Trait Loci.

eQTLs: Genomic loci statistically associated with variations in expression.
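
The statistical association is typically tested by regressing expression on genotype dosage (0, 1 or 2 copies of the alternate allele). This is a minimal least-squares sketch with invented data for one SNP-gene pair. Real analyses also include covariates and significance testing.

```python
def fit_eqtl(genotypes, expression):
    """Least-squares slope and intercept of expression on genotype dosage (0, 1, 2)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    cov = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    var = sum((g - mean_g) ** 2 for g in genotypes)
    slope = cov / var
    return slope, mean_e - slope * mean_g


genotypes = [0, 0, 1, 1, 2, 2]                # hypothetical dosages
expression = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]   # hypothetical expression values
slope, intercept = fit_eqtl(genotypes, expression)
print(slope, intercept)  # 1.0 1.0
```

A non-zero slope means each extra copy of the allele shifts expression, which is what marks the locus as an eQTL.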

32
Q

What are local eQTLs?

A

Local eQTLs are located in close proximity to the genes they influence and can act in cis or in trans.

33
Q

What are distant eQTLs?

A

Distant eQTLs are located far away from the genes they influence and usually act in trans. They tend to have smaller effect sizes, be more tissue-specific, and be harder to detect.