Week 11 (1000 Genomes Project) Flashcards

Question

The evidence for a single genotype is typically weak in low-coverage regions. Why is calling more difficult for heterozygous genotypes?

Answer 1

a single discordant read could be a sequencing error or could reflect a true heterozygous site; with so few reads you cannot tell the two apart, so your confidence in the call is low

Answer 2

sequence deeper (increase coverage)

Answer 3

variant quality score calibration

Answer 4

1.1 billion

Answer 5

structural variation

Answer 6

repetitive sequence

Answer 7

long read sequencers, so we can span across the repeat

Answer 8

- true positive
- false positive
- false negative

Answer 9

false discovery rate

Answer 10

FDR = FP / (FP + TP), where FDR = false discovery rate, FP = false positives, TP = true positives
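
As a quick check on the formula, here is a minimal Python sketch. The function name and the counts (5 false positives, 95 true positives) are made up for illustration and do not come from the 1000 Genomes data.

```python
def false_discovery_rate(fp: int, tp: int) -> float:
    """Return FDR = FP / (FP + TP) for a set of positive calls."""
    total_calls = fp + tp
    if total_calls == 0:
        # With no positive calls the ratio is undefined.
        raise ValueError("FDR is undefined when there are no positive calls")
    return fp / total_calls


# Hypothetical validation result: 100 variant calls, 5 of them wrong.
fdr = false_discovery_rate(fp=5, tp=95)
print(f"FDR = {fdr:.2%}")
```

Note that false negatives do not appear in the FDR: it only describes how trustworthy the calls you made are, not how many real variants you missed.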

Answer 11

the fraction of the reference genome in which short-read data can lead to reliable variant discovery

Answer 12

- biallelic indels
- large deletions

Answer 13

we have two chromosomes, so if the other chromosome is functioning it can mask the bad variant

Answer 14

Sequencing depth, which is the number of times each nucleotide position is read during sequencing, significantly impacts genotype accuracy because it directly influences the ability to accurately infer the genetic makeup of a sample
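
To make the effect of depth concrete, here is a rough sketch of how often a truly heterozygous site would show reads from only one allele. It assumes a simplified model that is not from the lecture: each read samples either allele with probability 0.5, reads are independent, and there is no sequencing error. The function name is hypothetical.

```python
def prob_het_shows_one_allele(depth: int) -> float:
    """Probability that all reads at a heterozygous site carry the same allele.

    Simplified model (an assumption): each read independently samples one of
    the two alleles with probability 0.5, and reads are error-free.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # All reads from allele 1, or all reads from allele 2.
    return 2 * 0.5 ** depth


for depth in (1, 2, 4, 8, 30):
    p = prob_het_shows_one_allele(depth)
    print(f"depth {depth:>2}: P(only one allele seen) = {p:.3g}")
```

Under this model a single read always shows only one allele, while at 30x the chance of missing the second allele is negligible, which is why low-coverage heterozygous calls are the least certain.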

Answer 15

the mutation cannot kill the person it is affecting, it must allow the individual to survive and reproduce

Answer 16

Human population expansion, while increasing overall genetic diversity, actually leads to more low-frequency variants rather than common variants. This is because rapid growth creates a "load" of new mutations, many of which are rare.

Answer 17

Old mutations tend to be found on short haplotypes while new mutations are more likely on long haplotypes due to the process of recombination and mutation breaking down ancestral haplotypes over time

Answer 18

old; short; recombination

Answer 19

redundancy (we have 2 chromosomes, a copy of everything)

Answer 20

genetic changes, or mutations, that negatively impact an individual's fitness and reproductive success

Answer 21

redundancy (we have another chromosome that, if functional, can support the organism)

Answer 22

redundancy; also, some deleterious variants have a smaller impact than others

Answer 23

long read sequencing

Answer 24

it depends! It depends on whether you can obtain better data or whether this is the best you can do with the resources you have (e.g., ancient DNA)

Answer 25

the implication that the interpretation of rare variants in individuals with a particular disease should be within the context of the local (geographic or ancestral) genetic background. variation can be different between populations so it is important to sequence individuals from diverse populations

Answer 26

the method of choice to study gene expression and identify novel RNA species

Answer 27

the study of transcriptomes (the complete set of RNA transcripts, both coding and non-coding, within a cell, tissue, or organism) and their functions

Answer 28

- uracil
- ribose

Answer 29

cDNA library preparation (converting RNA to DNA)

Answer 30

polyadenylated RNA

Answer 31

ribosomal RNA

Answer 32

poly(A) purification

Answer 33

ribosomal RNA

Answer 34

1. ribosomal depletion (more expensive)
2. oligo-dT (most common)

Answer 35

eliminating ribosomal RNA

Answer 36

the size limitation of most current sequencing platforms (e.g., Illumina needs fragments <600 bp)

Answer 37

long read sequencing (but it is expensive so short read is more common)

Answer 38

without strand information, antisense RNA, which is complementary to the sense RNA, is incorrectly counted as sense RNA, leading to an overestimation of sense RNA expression and the inability to accurately quantify antisense RNA

Answer 39

incorporate dUTP into the second strand of cDNA, then enzymatically degrade the strand that contains uracil

Answer 40

- ligation of the 3' preadenylated and 5' adapters
- labeling the second strand with dUTP followed by enzymatic degradation (MOST COMMON)
- the Peregrine method

Answer 41

alternative splicing

Answer 42

sequence each transcript from beginning to end

Answer 43

high quality (we want to see the tall ribosomal RNA peaks)

Answer 44

medium quality

Answer 45

low quality

Answer 46

what, when, where, how much

Answer 47

aspects of discovery and quantification can be combined

Answer 48

experimental design: ensuring that the data generated have the potential to answer the biological questions of interest

Answer 49

how much; which

Answer 50

RNA extraction

Answer 51

the analysis goals

Answer 52

short read; long read

Answer 53

different versions of the same gene, arising from variations in alternative splicing or different transcription initiation and termination sites

Answer 54

3 (more is always better)

Answer 55

technical variability; biological variability

Answer 56

technical variability, biological variability, and the desired statistical power

Answer 57

you collected DNA from the bacteria or virus or fungi from the tissue

Answer 58

NO, it will give you information on the tissue that you collected and indicate info about expression

Answer 59

- genome mapping
- transcriptome mapping
- reference-free assembly

Answer 60

to estimate the gene and transcript expression (the how much)

Answer 61

normalization methods are used to address technical biases and ensure accurate comparisons of gene expression levels across samples

Answer 62

In this scenario, RPKM (Reads Per Kilobase per Million mapped reads) or similar normalization methods would be crucial. Since Gene B is 10 times longer than Gene A, raw read counts for Gene B would inherently be higher than for Gene A, even if the true expression levels are identical. Normalization is needed to account for this length bias and ensure that comparisons reflect the true expression differences, not simply the length of the gene.
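
The length correction can be sketched in a few lines of Python. The gene lengths, read counts, and library size below are hypothetical numbers chosen so that Gene B is 10 times longer than Gene A but equally expressed; the function name is also an assumption.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count / (gene_length_kb * total_mapped_reads_in_millions)
         = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical library with 1,000,000 mapped reads.
total_reads = 1_000_000
gene_a = rpkm(read_count=100, gene_length_bp=1_000, total_mapped_reads=total_reads)
gene_b = rpkm(read_count=1_000, gene_length_bp=10_000, total_mapped_reads=total_reads)

print(f"Gene A: {gene_a:.1f} RPKM")
print(f"Gene B: {gene_b:.1f} RPKM")
```

Gene B collects 10 times more raw reads only because it is 10 times longer; after dividing by length, both genes come out with the same RPKM.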

Answer 63

- within the sample, to account for the fact that longer genes accumulate more reads
- NOT necessary when comparing changes in gene expression within the same gene across samples

Answer 64

COVID: collect samples from patients and you can tell the amount of exposure to COVID, and it can give you information on the level of expression

Answer 65

biological replication

Answer 66

DNA sequencing

(99 cards)