module 3: beyond genome sequencing Flashcards
Why bother with NGS if the human genome has already been sequenced?
1) Clinical setting - gives info about potential disease-causing mutations
2) Phylogenetic studies
3) Compare sequences between populations to detect variability
4) Track how genomes change (e.g., new mutations arising over time)
1) What is whole-exome and targeted sequencing?
2) How does it work?
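1) Whole-exome sequencing sequences only the exonic regions of the genome (sequences retained in mature mRNA); targeted sequencing does the same for a chosen subset of genes or regions. Both are important for CLINICAL RESEARCH.
2) Same as whole-genome sequencing but differs in library preparation - after the DNA is fragmented, probes that hybridize to exonic sequences capture those fragments (pulled down with beads), so only exonic fragments are sequenced.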
Whole-exome and targeted sequencing _______ sequencing power by reducing the _______ area covered.
It lays the framework for causes of _________.
increase; genomic
autism (complex disease with many factors contributing to it)
*Most disease-causing mutations affect coding regions and disrupt gene expression.
_________ platforms can detect modified bases.
Nanopore
*has lots of potential for epigenetic studies
(T/F) NGS is not important after sequencing one genome as a single genome can represent the genetic diversity of our species.
False!
NGS helps CATALOGUE human genome diversity: NO SINGLE GENOME CAN REPRESENT THE GENETIC DIVERSITY OF OUR SPECIES.
Which statements are true?
1) The reference genome for humans is mosaic (assembled from many different genomes).
2) About 0.6% of nucleotides differ between any 2 individuals.
Both are true!
1) What was the first project to catalog human diversity?
2) What is the “All of Us” project?
1) The 1000 Genomes Project, which ran from 2008 to 2015. It sequenced the genomes of 2,504 individuals from 26 different populations: POPULATION-level sequencing.
2) It aims to sequence the genomes of 1 million or more people living in the US to accelerate research and improve health.
Why is cataloging human genetic diversity important?
1) helps us see where our genomes differ and how these can affect our phenotypes.
2) can help us learn how our genetics can influence our response to certain drugs and our susceptibility to different diseases.
Human Pangenome Reference Consortium looked at 47 phased diploid genomes.
1) What does phased diploid genome mean?
2) Why is this important?
1) The maternal and paternal copies of each chromosome are resolved separately.
2) A reference sequence gives one consensus sequence for each pair of homologous chromosomes, so it does not capture the differences between the maternal and paternal copies.
This will better represent the diversity of the human genome. Each genome carries a certain number of DNA variants.
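A minimal sketch of what "phased" buys us, assuming a simplified VCF-style notation where "A|G" means the two alleles are assigned to specific chromosome copies (phased) and "A/G" means they are not. Real VCF files store allele indices (0|1) rather than bases; using bases here is an assumption for readability, and `split_haplotypes` is a hypothetical helper.

```python
def split_haplotypes(genotypes):
    """Rebuild the two chromosome copies (haplotypes) from phased genotypes.

    Each genotype is written "X|Y": X belongs to copy 1, Y to copy 2.
    An unphased "X/Y" genotype raises an error, because we cannot tell
    which copy each allele sits on.
    """
    hap1, hap2 = [], []
    for gt in genotypes:
        if "|" not in gt:
            raise ValueError(f"unphased genotype: {gt}")
        a, b = gt.split("|")
        hap1.append(a)
        hap2.append(b)
    return "".join(hap1), "".join(hap2)


# Three variable sites along one chromosome pair (hypothetical data)
print(split_haplotypes(["A|G", "C|C", "T|A"]))  # ('ACT', 'GCA')
```

A single consensus sequence would collapse these two copies into one string and lose the fact that "ACT" and "GCA" travel together on separate chromosomes.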
1) Define the term mutation.
2) Mutations are caused by:
1) Permanent change in the DNA sequence compared to what is predominant in the population
2) Endogenous sources (e.g., unrepaired spontaneous DNA damage, replication errors) and exogenous (environmental) sources
Give examples of exogenous and endogenous sources of mutations.
Exogenous:
- ionizing radiation (DNA breaks)
- UV rays (thymine dimers, issues with DNA replication)
- chemicals (deamination, oxidation of bases)
Endogenous:
- oxidation of bases (G -> T)
- DNA polymerase errors
- mis-repaired ds/ss DNA breaks
- loss of a purine/pyrimidine (abasic site)
- cytosine deamination (gives uracil)
*Most endogenous damage is normally repaired, but when it is not repaired, it can lead to mutations!
(T/F) Mutations can range from a single bp to millions of bp. They are also inevitable.
True!
We cannot stop mutations. They can be beneficial, harmful, or neutral depending on where they occur and their nature.
(T/F) Each cell loses 500 purines per day and 10,000 pyrimidines per day.
False!
Each cell loses about 10,000 purines per day and about 500 pyrimidines per day.
These are normally repaired but if not, errors arise during DNA replication.
1) Define genetic VARIATION.
2) What are the two types of variations? Briefly describe each.
Variation: mutations that result in ALTERNATIVE forms of DNA (established in a population).
Common variation: minor allele (least common allele) frequency of at least 1% in the population
Rare variation: minor allele frequency of <1% in the population
*The 1% cutoff is a convention, not a strict rule.
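A small sketch of the 1% minor-allele-frequency cutoff applied to allele counts at one site. The counts are hypothetical (1,000 people, so 2,000 alleles, since each diploid person contributes two), and `classify_variant` is a made-up helper, not a standard tool.

```python
def classify_variant(allele_counts, threshold=0.01):
    """Return (minor allele frequency, 'common' or 'rare') for one variant site.

    allele_counts maps each allele to how many times it was observed.
    The minor allele is the least common one at the site.
    """
    total = sum(allele_counts.values())
    maf = min(allele_counts.values()) / total  # frequency of the least common allele
    label = "common" if maf >= threshold else "rare"
    return maf, label


# Hypothetical counts from 1,000 people (2,000 alleles per site)
print(classify_variant({"A": 1900, "G": 100}))  # (0.05, 'common')
print(classify_variant({"C": 1996, "T": 4}))    # (0.002, 'rare')
```

The `threshold` parameter is left adjustable because, as the card notes, 1% is a convention rather than a fixed rule.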
Define allele.
Allele refers to one of two or more versions of a DNA sequence at a given location.
For any autosomal location, we carry two alleles (one maternal, one paternal).
You can be HOMOZYGOUS or HETEROZYGOUS for alleles.
(T/F) If you are homozygous at one locus, chances are you are homozygous at all.
False!
You can be homozygous for some loci but heterozygous for others.
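A minimal sketch showing that zygosity is decided locus by locus: each genotype is checked on its own, so one person can be homozygous at some loci and heterozygous at others. The genotype strings and locus names are hypothetical.

```python
def zygosity(genotype):
    """Classify a diploid genotype such as 'A/G' (or phased 'A|G')."""
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got {genotype!r}")
    return "homozygous" if alleles[0] == alleles[1] else "heterozygous"


# One hypothetical person, three loci
person = {"locus1": "A/A", "locus2": "C/T", "locus3": "G/G"}
for locus, gt in person.items():
    print(locus, zygosity(gt))
# locus1 homozygous
# locus2 heterozygous
# locus3 homozygous
```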
What are the four types of genomic variants?
1) Single nucleotide polymorphisms (SNPs)
2) Insertion-Deletions (INDELs or DIPs)
3) Simple sequence repeats (SSRs)
4) Copy number variants (CNVs)
Single nucleotide polymorphisms, also known as SNPs, are the ____ common genomic variants (1 in every ____ NTs).
There are about _______ SNPs in the human genome.
What are the causes of SNPs?
Most; 300
10 million
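The two answers are consistent with each other: dividing the genome size by the SNP spacing gives the total. The sketch below assumes a round 3 billion bp haploid genome.

```python
GENOME_SIZE_BP = 3.0e9   # assumed round figure for the haploid human genome
SNP_SPACING_BP = 300     # ~1 SNP every 300 nucleotides

expected_snps = GENOME_SIZE_BP / SNP_SPACING_BP
print(f"{expected_snps:,.0f}")  # 10,000,000
```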
Causes of SNPs are the same as the causes of mutations: replication errors, radiation, oxidation, and other endogenous and exogenous sources.
*SNPs arise from point mutations (single-base substitutions).
Briefly describe where SNPs can occur within the genome.
Which location has the most visible impact?
SNPs can occur in the:
1) Coding
2) Non-coding (introns and UTRs - could affect mRNA splicing, TF binding, and mRNA stability, especially in the 3’ UTR)
3) Intergenic (regulatory regions between genes - can affect transcription)
The most visible impacts are seen when SNPs are in coding regions or in intergenic regulatory regions.
There are two types of coding SNPs.
Differentiate them.
1) Synonymous: NO CHANGE in the amino acid, thus no change to the protein sequence.
2) Non-synonymous: CHANGES the amino acid.
There are two types of non-synonymous SNPs: MISSENSE (a different amino acid) or NONSENSE (introduction of a premature STOP codon).
*An mRNA carrying a nonsense SNP is often degraded (nonsense-mediated decay) before it can be translated.
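To make the synonymous / missense / nonsense distinction concrete, here is a sketch that applies a single-base change to one codon and compares the amino acids before and after, using the standard genetic code. It assumes the codon is written on the coding (sense) DNA strand and is not itself a STOP codon; `classify_coding_snp` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG; '*' marks STOP
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base change at `position` (0, 1 or 2) of `codon`."""
    ref_aa = CODON_TABLE[codon]
    if ref_aa == "*":
        raise ValueError("sketch assumes the reference codon is not a STOP codon")
    mutant = codon[:position] + alt_base + codon[position + 1:]
    alt_aa = CODON_TABLE[mutant]
    if alt_aa == ref_aa:
        return "synonymous"   # same amino acid
    if alt_aa == "*":
        return "nonsense"     # premature STOP codon
    return "missense"         # different amino acid


print(classify_coding_snp("GAA", 2, "G"))  # GAA -> GAG, Glu -> Glu: synonymous
print(classify_coding_snp("GAA", 0, "T"))  # GAA -> TAA, Glu -> STOP: nonsense
print(classify_coding_snp("GAG", 1, "T"))  # GAG -> GTG, Glu -> Val: missense
```

The last example is the Glu to Val change behind sickle-cell anemia, a classic missense SNP. Most synonymous changes fall at the third codon position, which is why the genetic code is said to be degenerate there.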
What is the difference between a causative and a correlated SNP?
Causative SNP: the SNP alters protein function, leading to dysfunction in the organism. The SNP causes the observed phenotype.
Correlated SNP: the SNP (typically not in a coding region) is inherited together with a nearby mutation that causes a disease. The SNP does not cause the observed phenotype.
(T/F) Most SNPs have a significant impact on the health and development of humans.
False!
Most SNPs have no observable effect unless they affect a coding or regulatory region.
What is the human germline mutation rate for SNPs?
Knowing the human germline mutation rate for SNPs and that we have a lot of SNPs in our genome, what does this tell us?
Roughly 1 in 100 million NTs is substituted per generation (~1.2x10^-8 per site per generation). Across a ~3 billion bp haploid genome, this means about 30 NEW POINT MUTATIONS arise in each egg or sperm per generation.
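The "~30 new mutations" figure is just sites times per-site rate. The sketch assumes a round 3 billion bp haploid genome and tries both rates quoted on the card; `expected_new_mutations` is a hypothetical helper.

```python
def expected_new_mutations(genome_bp, rate_per_site):
    """Expected de novo point mutations = number of sites x per-site rate."""
    return genome_bp * rate_per_site


HAPLOID_GENOME_BP = 3.0e9  # assumed round figure: bp in one egg or sperm genome

for rate in (1.0e-8, 1.2e-8):  # substitutions per site per generation
    n = expected_new_mutations(HAPLOID_GENOME_BP, rate)
    print(f"rate {rate:.1e}: ~{round(n)} new point mutations per gamete")
# rate 1.0e-08: ~30 new point mutations per gamete
# rate 1.2e-08: ~36 new point mutations per gamete
```

Since a child inherits one egg genome and one sperm genome, doubling these numbers gives a rough per-child estimate.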
We find a lot of SNPs in our genome, and we know that each event creating a given SNP is rare. This tells us that most SNPs are VERY OLD mutational events inherited from a COMMON ANCESTOR.
We can compare the SNPs across various genomes to trace the origins of the human species!
Most SNPs are __-allelic.
Bi-allelic
This means that most SNPs come in two varieties (alleles). For example, a locus can have either A or T but not G or C.
This is because the germline mutation rate is so LOW! To become tri-allelic (or more), that exact position has to be mutated independently more than once, which is very rare.
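A sketch of how one might check whether a site is bi-allelic: pool the alleles from every person's genotype at that position and count how many distinct bases appear. The genotypes are hypothetical and `allele_summary` is a made-up helper.

```python
from collections import Counter


def allele_summary(genotypes):
    """Count alleles at one site across people and name the site type."""
    counts = Counter()
    for gt in genotypes:
        counts.update(gt.replace("|", "/").split("/"))  # two alleles per person
    kinds = {1: "monomorphic", 2: "bi-allelic", 3: "tri-allelic"}
    return dict(counts), kinds.get(len(counts), "multi-allelic")


# Four hypothetical people genotyped at the same position
print(allele_summary(["A/A", "A/T", "T/T", "A/A"]))
# ({'A': 5, 'T': 3}, 'bi-allelic')
```

Given how rarely a single position mutates, a typical SNP site in such a sample shows exactly two alleles, as in the example.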