Genomics Flashcards
What is genomics?
The study of whole genomes including their mapping, structure, function and evolution.
This typically refers to changes in DNA sequence.
What are regulatory and/or functional genomics?
These typically refer to changes in the structure, packaging and function of DNA sequence and DNA products.
What does the term Omics mean?
Describes the characterisation and quantification of biological molecules in an organism e.g. proteome, microbiome.
What is the 100K genomes project?
A project aiming to sequence the genomes of patients to help improve disease diagnosis and treatment. Participants include people with cancer, rare diseases and infections, plus controls. 100k+ genomes have been sequenced so far. It is an NHS transformation project.
What are the characteristics of X-linked recessive inheritance?
- A mutation in the single copy of the gene is sufficient to cause the disease in males (who have only one X chromosome), whereas females need mutations in both copies.
- No male to male transmission.
What is genetic linkage?
Because recombination can occur at any location along the chromosome, the frequency of recombination between two locations depends on the distance between them. As a consequence, DNA regions that are located in close proximity are more likely to be co-inherited (not split apart) than DNA regions located further apart.
How can you determine how close genetic loci are to each other?
By observing recombinant events. The fewer the recombinant events between loci across generations, the closer they are to each other.
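A minimal sketch of this idea, assuming you have simply counted recombinant and non-recombinant meioses between two loci in a pedigree. The function name and the counts are hypothetical. For small values, a recombination fraction of 0.01 corresponds to roughly 1 centimorgan.

```python
def recombination_fraction(recombinants: int, total_meioses: int) -> float:
    """Fraction of observed meioses in which the two loci were split apart."""
    if total_meioses <= 0:
        raise ValueError("total_meioses must be positive")
    if not 0 <= recombinants <= total_meioses:
        raise ValueError("recombinants must be between 0 and total_meioses")
    return recombinants / total_meioses


# Hypothetical counts for two pairs of loci.
theta_near = recombination_fraction(2, 100)   # few recombinants: loci close together
theta_far = recombination_fraction(35, 100)   # many recombinants: loci far apart
print(theta_near, theta_far)
```

The pair with the smaller fraction is the pair that sits closer together on the chromosome.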
What information does genetic linkage provide?
- Helps determine the mode of inheritance of a disease-causing mutation (is it autosomal or sex-linked, recessive or dominant?) without knowing the location of the gene.
- By observing the gene segregation (which alleles are passed on to which child) you can narrow down the location of a mutation to a certain locus - linkage interval.
What is used to detect the presence of linkage?
A statistical test called the LOD (logarithm of the odds) score.
A LOD score >3 is consistent with linkage.
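A rough sketch of how a LOD score can be computed in the simplest case: phase-known meioses, where each meiosis is scored as recombinant or non-recombinant. The score is the log10 of the likelihood of the data at a proposed recombination fraction theta, divided by the likelihood under no linkage (theta = 0.5). The function name and counts are assumptions for illustration; real linkage software handles unknown phase, penetrance and multiple markers.

```python
import math


def lod_score(recombinants: int, total_meioses: int, theta: float) -> float:
    """LOD = log10( L(theta) / L(0.5) ) for phase-known meioses."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= total_meioses:
        raise ValueError("recombinants must be between 0 and total_meioses")
    non_recombinants = total_meioses - recombinants
    # Log-likelihood of the data if the loci are linked at distance theta.
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    # Log-likelihood if the loci segregate independently (theta = 0.5).
    log_unlinked = total_meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical family data: 1 recombinant in 20 informative meioses.
lod = lod_score(recombinants=1, total_meioses=20, theta=0.05)
print(round(lod, 2), "consistent with linkage" if lod > 3 else "not conclusive")
```

With these made-up counts the score comes out at about 4.3, above the threshold of 3.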
What happens when you have determined a linkage interval?
- Identify the genes within the interval.
- Assess the genes as potential candidates based on their biological function.
- Sequence the genes in the affected individuals to try and identify the causative mutation.
What can confound linkage analysis?
- Non-penetrance
- Phenocopies
How can linkage analysis overcome the issue of genetic heterogeneity?
By performing linkage analysis in multiple pedigrees (families) and combining the results.
Why is sequencing the human exome (all protein-coding regions) a cost-effective strategy for monogenic diseases?
The causative mutations in monogenic diseases have consistently been identified in the protein-coding regions of the genome, so there is no need to sequence the entire genome at a higher cost.
What do you gain from sequencing the exome?
You are able to catalogue all protein altering variations in an individual.
If the individual has a monogenic disease you can filter through the variation to identify which variant is causing the disease.
If a person has a rare, autosomal dominant disease how would you filter through the variants in protein coding regions after sequencing their exome?
1) Because the disease is dominant, you ignore the homozygous variants and look at the heterozygous.
2) You look at the heterozygous mutations and see which ones are predicted to alter protein structure and which are silent.
3) Out of the variants predicted to alter protein structure, you ignore the variants that have been previously observed, which leaves you with a number of novel candidate variants.
4) You then investigate these variants and dismiss the ones that do not match the phenotype of the disease.
5) For more accuracy, you compare the candidate variants of multiple unrelated individuals with the same disease.
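The filtering steps above can be sketched as follows. This is an illustration only: the field names (`zygosity`, `consequence`, `previously_observed`), the gene names and the variant records are all made up, and step 4 (matching candidates to the phenotype) is a manual judgement that is not modelled here.

```python
from collections import Counter

# Consequence labels treated as protein-altering (illustrative list).
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


def candidate_variants(variants):
    """Steps 1-3: keep heterozygous, protein-altering, previously unseen variants."""
    return [
        v for v in variants
        if v["zygosity"] == "het"
        and v["consequence"] in PROTEIN_ALTERING
        and not v["previously_observed"]
    ]


def shared_candidate_genes(patients):
    """Step 5: genes carrying a candidate variant in every unrelated patient."""
    counts = Counter()
    for variants in patients:
        counts.update({v["gene"] for v in candidate_variants(variants)})
    return sorted(g for g, n in counts.items() if n == len(patients))


# Made-up exome calls for two unrelated patients with the same disease.
patient_1 = [
    {"gene": "GENE_A", "zygosity": "het", "consequence": "missense", "previously_observed": False},
    {"gene": "GENE_B", "zygosity": "hom", "consequence": "nonsense", "previously_observed": False},
    {"gene": "GENE_C", "zygosity": "het", "consequence": "synonymous", "previously_observed": False},
    {"gene": "GENE_D", "zygosity": "het", "consequence": "missense", "previously_observed": True},
]
patient_2 = [
    {"gene": "GENE_A", "zygosity": "het", "consequence": "frameshift", "previously_observed": False},
    {"gene": "GENE_E", "zygosity": "het", "consequence": "missense", "previously_observed": False},
]

print(shared_candidate_genes([patient_1, patient_2]))
```

Only the gene that survives the filters in both patients is reported, which is why comparing unrelated individuals narrows the list so effectively.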
What is a de novo mutation?
Another term for a spontaneous mutation - when the mutation is not seen in the parents.
What is a major confounding factor in exome sequencing for disease?
Genetic heterogeneity.
What is a DNA variant/variation?
Where specific loci within the genome contain two or more alleles.
What are transitions and transversions?
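Transition = a point mutation that swaps a purine for a purine (A <-> G) or a pyrimidine for a pyrimidine (C <-> T), so the base chemistry class is unchanged.
Transversion = a point mutation that swaps a purine for a pyrimidine or vice versa, so the base chemistry class changes. Either type can be silent, missense or nonsense depending on where it falls in the codon.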
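A small sketch of the standard biochemical definition: a substitution within the same base class (purine to purine, or pyrimidine to pyrimidine) is a transition, and a substitution across classes is a transversion. The function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical: not a substitution")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # purine -> purine
print(classify_substitution("C", "T"))  # pyrimidine -> pyrimidine
print(classify_substitution("A", "T"))  # purine -> pyrimidine
```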
What do small rearrangements refer to?
Any changes in the DNA sequence whose effects are confined to a single exon of a gene.
What are the consequences of variants in splice sites?
- Exon skipping
- Use of cryptic splice sites/new splice sites
- Intron retention (small introns only)
- Combination of above
In which sites are DNA variants less commonly found?
- Promoters
- Untranslated regions
- Polyadenylation signals
What are amorphs and antimorphs?
Amorph = a variant that causes complete loss of gene function.
Antimorph = dominant alleles that act in opposition to normal gene activity.
What are hypomorphs and hypermorphs?
Hypomorph = a variant that causes a partial loss of gene function.
Hypermorph = a variant that causes an increase in normal gene activity.
What is a neomorphic mutation?
A variant that causes a dominant gain of gene function that is different from normal function.
What changes to proteins causes loss of function phenotypes?
- Little or no protein produced.
- Protein is unstable or inappropriately targeted for degradation.
- Residue or domain essential for protein function is missing or altered.
How can loss of function alleles (which are usually recessive) display dominant inheritance?
1) Haploinsufficiency = the organism is so sensitive to protein levels that 50% reduction causes a noticeable phenotype.
2) Dominant negative effect = the mutated protein disrupts the function of normal proteins in complexes.
3) Somatic second hits = the person inherits a defect and then has that defect amplified by another somatic mutation that affects the same protein.
What are gain of function mutations?
Where rather than lose its principal function, the protein may become less specific in its normal function or acquire a novel function. These are dominantly inherited.
What was the affected gene in infantile onset epilepsy in the Amish?
SIAT9 encoding the enzyme GM3 synthase which is used for the production of gangliosides which are vital for membrane stability. The mutation led to loss of function in this enzyme.
Who is the consultand in a pedigree?
The person who initially approached the doctors with concerns over the condition in their family.
What is genetic anticipation?
When a disease presents earlier and/or more severely as it is transmitted across generations. Seen in myotonic dystrophy and Huntington’s disease.
What is genetic penetrance?
The proportion of carriers who manifest phenotypic symptoms of the condition - not all individuals who inherit a dominant mutation will necessarily show signs of the condition.
Penetrance is a statistic and is usually expressed as a percentage.
What factor can increase the likelihood of a de novo mutation?
Paternal age, due to the large number of mitotic divisions of male gamete stem cells during a man’s lifetime.
What is germline mosaicism?
When a parent carries a small proportion of gametes that harbour the same mutation.
What is compound heterozygosity?
Where someone carries two different mutant alleles at the same locus, one on each copy of the chromosome.
What are lines of Blaschko?
Lines that represent territories of clonal cell populations on the skin.
What are APO-E alleles?
They are alleles associated with Alzheimer’s disease.
- e4 allele increases the risk of AD
- e3 allele causes a baseline risk
- e2 allele protects against AD
What are genome-wide association studies?
They are tests to see whether the presence of a specific gene variant (SNP) increases the risk of disease. This is done across many individuals and by comparing affected and unaffected individuals.
What diseases have absolute and relative risk?
Absolute = Mendelian diseases.
Relative = polygenic diseases.
What P-value is used in GWAS? Why is it not 0.05?
5 x 10^-8
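Because a GWAS tests on the order of a million independent SNPs at once, a threshold of 0.05 would produce tens of thousands of false positives by chance. 5 x 10^-8 is roughly 0.05 corrected for about one million independent tests. In addition, the chance that any single SNP has a causal effect on the phenotype being studied is very low.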
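A quick worked example of where the threshold comes from, assuming a Bonferroni-style correction and the commonly quoted figure of about one million independent common variants (both are stated assumptions here, not something derived in these cards).

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error near alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05
N_INDEPENDENT_SNPS = 1_000_000  # assumed number of independent tests

threshold = bonferroni_threshold(ALPHA, N_INDEPENDENT_SNPS)
# False positives expected if every SNP were null and 0.05 were used instead.
expected_false_positives = ALPHA * N_INDEPENDENT_SNPS

print(threshold)
print(expected_false_positives)
```

Using the uncorrected 0.05 cut-off would flag around 50,000 SNPs by chance alone, which is why the stricter value is used.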
What is one of the most important aspects of genetic counselling?
To provide a risk figure for the patient, their offspring and other family members.
Describe the process of next generation sequencing.
- DNA is broken into smaller pieces by sonication (ultrasound pulses).
- DNA fragments are denatured (split into two strands) and hybridised to a surface (bead or solid plate).
- The denatured DNA fragment is amplified in either an emulsion PCR reaction or an isothermal bridge amplification reaction. The result is a clonal cluster of DNA.
- The DNA is then sequenced together with fluorescent nucleotides (denatured parental strand has fluorescent nucleotides added).
How do you calculate the probability of a child being affected by a genetic disease?
You multiply the probability of the child inheriting the disease by the penetrance of the disease.
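A one-line worked example of this multiplication. The numbers are hypothetical: an autosomal dominant condition (0.5 chance of inheriting the mutation from an affected heterozygous parent) with an assumed penetrance of 80%.

```python
def risk_of_affected_child(p_inherit: float, penetrance: float) -> float:
    """Probability the child is affected = P(inherits mutation) x penetrance."""
    for name, value in (("p_inherit", p_inherit), ("penetrance", penetrance)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return p_inherit * penetrance


# Hypothetical case: autosomal dominant inheritance, 80% penetrance.
risk = risk_of_affected_child(p_inherit=0.5, penetrance=0.8)
print(f"{risk:.0%}")
```

Here the risk figure given in counselling would be 40%, not the 50% suggested by inheritance alone.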
What is an obligate carrier?
An individual who may be clinically unaffected but who must carry a gene mutation based on analysis of the family history.
What are most diseases and human traits caused by?
Many genetic variants each having a tiny effect.
What are genetic association studies?
They test the association between the alleles/genotypes of a SNP and the trait of interest.
What are inheritance models?
The pattern of effects that different genotypes have on a phenotype is shown in these models.
On the Y axis you have the phenotype and on the X you have the genotype (AA, AB, BB).
How are GWAS results usually displayed?
Manhattan Plot
X axis = association results across ordered chromosomes.
Y axis = -log10 of the P-values, so the most significant associations appear highest on the plot.
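A short sketch of the transformation conventionally used for the vertical axis of a Manhattan plot: each P-value is converted to -log10(P), so smaller P-values plot higher. The SNP names and P-values are made up.

```python
import math


def manhattan_y(p_value: float) -> float:
    """Height of a SNP on a Manhattan plot: -log10 of its P-value."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value)


# Hypothetical association results.
p_values = {"snp_1": 0.2, "snp_2": 1e-4, "snp_3": 3e-9}
heights = {snp: manhattan_y(p) for snp, p in p_values.items()}

# The genome-wide significance line sits at -log10(5e-8), about 7.3.
significance_line = manhattan_y(5e-8)
hits = [snp for snp, y in heights.items() if y > significance_line]
print(round(significance_line, 2), hits)
```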
What is the key idea underlying the GWAS design?
That disease risk alleles pass from generation to generation together with other alleles that are nearby on the genome. Therefore, even if the variants are not identified in the study, the SNPs that are might be nearby and correlated with the variants. This is indirect association.
What is a polygenic risk score?
They are constructed from GWAS results (the data displayed in Manhattan plots) and predict an individual’s risk of disease based on the combination of their genotype and the effect size estimates from GWAS results.
How do you calculate a polygenic risk score?
You put a horizontal line marking the P-value of 5x10^-8 on a Manhattan plot and take each variant that is above this line and put it into a score.
Often you take a whole series of scores using different P-value thresholds.
It is a sum of risk alleles from the individual’s genome-wide SNPs, each weighted by its GWAS-derived effect size.
This result is then compared to the disease state of the patient to see how accurate the score was (shown in a regression with a variety of P-value thresholds).
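The weighted sum described above can be sketched like this. It is a simplified illustration: the SNP identifiers, effect sizes, P-values and genotype counts are invented, and real pipelines also handle correlated SNPs, which is ignored here.

```python
def polygenic_risk_score(genotypes, gwas_results, p_threshold=5e-8):
    """Sum of risk-allele counts weighted by GWAS effect size.

    genotypes: SNP id -> number of risk alleles carried (0, 1 or 2)
    gwas_results: SNP id -> (effect_size, p_value)
    Only SNPs with p_value <= p_threshold contribute to the score.
    """
    score = 0.0
    for snp, (effect_size, p_value) in gwas_results.items():
        if p_value <= p_threshold and snp in genotypes:
            score += effect_size * genotypes[snp]
    return score


# Hypothetical GWAS summary statistics and one person's genotypes.
gwas_results = {
    "snp_1": (0.20, 1e-10),
    "snp_2": (0.05, 3e-6),
    "snp_3": (-0.10, 2e-9),
}
genotypes = {"snp_1": 2, "snp_2": 1, "snp_3": 1}

# Scores at a strict and a more relaxed P-value threshold.
for threshold in (5e-8, 1e-5):
    print(threshold, round(polygenic_risk_score(genotypes, gwas_results, threshold), 3))
```

Relaxing the threshold lets more SNPs into the score, which is why a series of thresholds is usually tried and compared against the known disease status.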
How are polygenic risk scores useful for medical research?
- Can assess shared genetic aetiology among phenotypes.
- Act as a biomarker for disease.
- Infer whether a biological factor is causally associated with a disorder.
- Screen subjects for clinical trials.
What is the future of polygenic risk scores?
Individuals can be placed on a spectrum of genetic burden to a disease/trait.
What are the advantages of sequencing the whole genome?
A person’s genome doesn’t change so there is the potential to collect once, store and then refer to repeatedly for clinical care.
What are the limitations of next generation sequencing?
- Short reads of NGS (only small fragments are duplicated) make accurate characterisation of large variants hard.
- NGS accuracy is currently lower than older, more expensive sequencing technology.
What can you do to identify causal variants in the genome?
- Filter out variants that are frequently observed.
- Look for variants identified as pathogenic.
- Look for variants in genes linked to a condition.
- Look for variants that affect functional elements.
- Look for variants at positions that are normally conserved across species.
What is common among pathogenic variants?
A large proportion of variants reported as pathogenic are false positives.
What is rare disease diagnostics?
Where you sequence the affected individual and other family members (affected or unaffected).
Who is whole-genome sequencing restricted to in clinical care?
- People with monogenic diseases, as they only have one causal variant.
- People with clear phenotypes - can focus on the genes associated with the phenotype.
- Patients who are ill.
What is reported in clinical use of whole-genome sequencing?
Protein coding sequences only as it is easier to predict the effect of mutation.
What were the objectives of the 100K genome project?
- Improve health of NHS patients.
- Stimulate wealth generation (economy).
- Create legacy of infrastructure, human capacity and capability.
- Enable large scale genomics research.
What are the responsibilities of NHS genomic medicine centres?
- Identifying and recruiting participants.
- Clinical care following results.
What is human phenotype ontology?
A system that provides a standardised vocabulary of phenotypic abnormalities encountered in human disease.
What information is fed back to the NHS from the 100k genome project?
- Information about the patient’s main condition.
- Information about additional ‘serious and actionable’ conditions (optional).
- Carrier status for non-affected parents of children with rare diseases (optional).
What are the three tiers of reported variants?
1) Variant is in the gene panel (in a list of genes known to be associated with a condition), there is clear loss-of-function evidence, and is a known pathogenic variant.
2) Variant is in the gene panel and is a missense mutation or a VUS (variant of unknown significance).
3) Gene is not in the panel (not in the known list of genes associated with a disease).
Why have there been difficulties surrounding cancer in the 100K genome project?
Standard practice is to fix tumours in formalin then embed them in paraffin for imaging, but this process means the DNA extracted for sequencing is damaged and broken.
What has been done to overcome the issues with tumour samples in the 100K genome project?
Tissue is now being frozen and effort is being made to ensure the sample is mainly tumour cells.
What proteins do mitochondrial DNA code for?
Proteins involved in oxidative phosphorylation. All other proteins involved in repair of machinery and mitochondrial transcription and translation are encoded by nuclear genes.