Genome Variation Flashcards
What proportion of the genome is the exome (the part that codes for protein)?
About 2%
How much of our genome do we share with someone else?
99.7% - only about 9 million bases out of 3 billion are different
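A quick check of the arithmetic behind the 99.7% figure, assuming a genome of 3 billion bases and 9 million differing bases (the round numbers on the card):

```python
GENOME_SIZE = 3_000_000_000   # approximate genome size in bases (assumed round figure)
DIFFERING_BASES = 9_000_000   # bases that differ between two people (figure from the card)

fraction_different = DIFFERING_BASES / GENOME_SIZE
percent_shared = 100 * (1 - fraction_different)

print(f"{fraction_different:.1%} different, {percent_shared:.1f}% shared")
```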
What kind of differences in the genome are associated with disease?
Macro-level (chromosomal) differences (e.g. trisomy 21) and micro-level, molecular differences (e.g. a single point mutation, as in SCA, or a 3 bp deletion in the CFTR gene, leading to CF)
What is special about monozygotic twins’ DNA?
They are identical at (virtually) every base
What in DNA terms is considered polymorphic?
Any base in the genome that varies between individuals is polymorphic
What is a reference sequence and a reference allele?
Reference sequence: a database sequence that records, at each position, the base present in the majority of people
Reference allele: the allele found in the reference sequence, usually the most common allele
How are variants/ polymorphic positions found?
By comparing someone’s sequence to a reference sequence and seeing that they are different
How was the reference sequence generated?
In the Human Genome Project, the genomes of 4 anonymous individuals were sequenced and combined into a single composite sequence
How often does an SNV occur in the reference sequence and in one individual?
Once every 300 nucleotides in the reference sequence; once every 1000 nucleotides in an individual
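A rough sketch of what these rates imply for a 3-billion-base genome. It assumes SNVs are evenly spaced, which is a simplification (real SNVs cluster), and `expected_snvs` is a hypothetical helper for illustration:

```python
GENOME_SIZE = 3_000_000_000  # approximate genome size in bases (assumed)

def expected_snvs(genome_size: int, bases_per_snv: int) -> int:
    """Expected SNV count if one SNV occurs every `bases_per_snv` bases (uniform spacing assumed)."""
    return genome_size // bases_per_snv

print(expected_snvs(GENOME_SIZE, 300))   # rate in the reference sequence -> 10000000
print(expected_snvs(GENOME_SIZE, 1000))  # rate in one individual -> 3000000
```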
Where are the majority of SNVs found?
Not in the exome
How are SNVs generated?
By errors made during DNA replication that are not correctly fixed by mismatch repair
What is a biallelic site?
A site in DNA where there could be 2 possible alleles (2 variants, one of which is the reference sequence base)
“A biallelic site is a specific locus in a genome that contains two observed alleles, counting the reference as one, and therefore allowing for one variant allele. In practical terms, this is what you would call a site where, across multiple samples in a cohort, you have evidence for a single non-reference allele.”
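A minimal sketch of the definition above, assuming each sample's allele at the site is given as a single-base string. `classify_site` is a hypothetical function; the reference base always counts as one allele, even if no sample carries it:

```python
def classify_site(ref: str, observed: list[str]) -> str:
    """Classify a site by the number of distinct alleles, counting the reference as one."""
    alleles = set(observed) | {ref}
    if len(alleles) == 1:
        return "monomorphic"   # only the reference allele
    if len(alleles) == 2:
        return "biallelic"     # reference plus one non-reference allele
    return "multiallelic"      # more than one non-reference allele

print(classify_site("A", ["A", "G", "A", "G"]))  # biallelic
print(classify_site("A", ["G", "T"]))            # multiallelic
```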
How is a SNV formed?
In DNA replication the two strands separate and are templates to synthesise complementary strands, forming identical copies.
However, when synthesising the new strand, a G has been incorporated opposite a T on the template strand instead of an A.
The mismatch repair mechanism identifies this mismatch and corrects it so that the bases form a standard Watson-Crick base pair.
However, in this instance it hasn’t corrected the G; it has replaced the T on the template strand with a C, so the original T:A pair has become a C:G pair.
If this change occurs in the germline (the cells that give rise to gametes) and isn’t deleterious, it can be passed on to the next generation and, over time, spread through the population
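A toy model of the steps above. It is a simplification: strands are compared position by position, ignoring their antiparallel 5'-to-3' orientation, and `mis_repair` is a hypothetical helper, not a model of the real repair machinery:

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def mis_repair(template: str, pos: int, wrong_base: str) -> tuple[str, str]:
    """Copy `template`, misincorporate `wrong_base` at `pos`, then 'repair' the wrong strand."""
    tmpl = list(template)
    new = [COMPLEMENT[b] for b in tmpl]      # correct Watson-Crick copy
    new[pos] = wrong_base                    # replication error, e.g. G opposite T
    tmpl[pos] = COMPLEMENT[wrong_base]       # repair changes the template base instead
    return "".join(tmpl), "".join(new)

template, new_strand = mis_repair("ATG", 1, "G")
print(template, new_strand)  # ACG TGC: the T:A pair is now a C:G pair
```

Both strands are again fully Watson-Crick paired, so nothing marks the change as an error, which is why it can persist.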
Where can SNVs be found?
In genes, promoters and non-coding regions
In genes, they can change an amino acid (non-synonymous/missense) or leave it unchanged (synonymous), or turn an amino acid codon into a stop codon (nonsense). They can also alter splice sites (where splicing occurs in a sequence), and can occur in a UTR, affecting gene expression
In promoters they can affect gene expression, and so the amount of protein made
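A sketch of how a coding SNV can be classified, using the standard genetic code on coding-strand DNA codons. `classify_coding_snv` is a hypothetical helper; it does not model splice-site, UTR or promoter effects. The example is the sickle cell change at codon 6 of the beta-globin gene (GAG, Glu, to GTG, Val):

```python
BASES = "TCAG"
# Standard genetic code, codons ordered by first, second, third base in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_coding_snv(codon: str, pos: int, alt: str) -> str:
    """Classify a single-base change at position `pos` (0-2) of `codon`."""
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[codon[:pos] + alt + codon[pos + 1:]]
    if alt_aa == ref_aa:
        return "synonymous"
    if ref_aa == "*":
        return "stop-loss"   # a stop codon changed into an amino acid codon
    if alt_aa == "*":
        return "nonsense"
    return "missense"

print(classify_coding_snv("GAG", 1, "T"))  # missense (Glu -> Val, as in SCA)
print(classify_coding_snv("GAG", 2, "A"))  # synonymous (GAA is also Glu)
print(classify_coding_snv("GAG", 0, "T"))  # nonsense (TAG is a stop codon)
```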
When do SNVs disappear from a population?
When they have a deleterious effect (causing harm/damage) and are removed by selection, or when the population carrying them dies out
What kind of mutation is SCA?
A point, missense mutation
How common is the SCA point mutation?
White European people: 0.02% (2 in every 10,000 chromosomes)
African people: 4.5% (roughly 1 in every 20 chromosomes)
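A small sketch for converting an allele frequency in percent into chromosome counts. Both helpers are hypothetical and only do the arithmetic:

```python
def per_10k_chromosomes(percent: float) -> float:
    """Expected number of chromosomes carrying the allele per 10,000 chromosomes."""
    return percent / 100 * 10_000

def one_in_n(percent: float) -> float:
    """Allele frequency expressed as '1 in N chromosomes'."""
    return 100 / percent

for group, pct in [("White European", 0.02), ("African", 4.5)]:
    print(f"{group}: {per_10k_chromosomes(pct):.0f} per 10,000, about 1 in {one_in_n(pct):.0f}")
```

With these inputs, 4.5% works out to about 1 in 22 chromosomes.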
What is heterozygote advantage?
When being heterozygous for a disease allele provides a benefit, e.g. sickle cell trait (carrying one SCA allele) protects heterozygotes against malaria
Minor allele frequency (MAF) figures:
- For a mutation: <1%
- For a polymorphism: >1% (this does not imply the variant is non-pathogenic, e.g. SCA)
- Rare polymorphism: 1-5%
- Common polymorphism: >5%
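A sketch of these thresholds as a function. `classify_maf` is hypothetical, and because the card does not say which side the boundaries fall on, it assumes exactly 1% and exactly 5% both count as a rare polymorphism:

```python
def classify_maf(maf_percent: float) -> str:
    """Label a variant by minor allele frequency (in percent).

    Boundary handling is an assumption: 1% and 5% are both treated as 'rare polymorphism'.
    """
    if maf_percent < 1:
        return "mutation"
    if maf_percent <= 5:
        return "rare polymorphism"
    return "common polymorphism"

print(classify_maf(0.02))  # mutation
print(classify_maf(4.5))   # rare polymorphism
```

The SCA allele at 4.5% in African populations falls in the rare polymorphism band, which is why MAF alone says nothing about pathogenicity.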
What determines whether a variant (that starts off rare) remains that way?
Evolutionary forces
How do SNVs spread?
- Migration introduces the variant into another population, known as gene flow
- Random change in variant allele frequency between generations, known as genetic drift
- Selection in favour of the allele (non random change in variant allele frequency)
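Genetic drift can be illustrated with the Wright-Fisher model, a standard textbook model swapped in here for illustration: each generation, a fixed number of chromosomes is drawn at random from the previous generation's allele pool. `simulate_drift` is a hypothetical helper, and selection, mutation and migration are deliberately left out:

```python
import random

def simulate_drift(p0: float, n_chromosomes: int, generations: int, seed: int = 1) -> list[float]:
    """Wright-Fisher drift: returns the variant allele frequency in each generation."""
    rng = random.Random(seed)
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Each chromosome in the next generation carries the variant with probability p
        carriers = sum(rng.random() < p for _ in range(n_chromosomes))
        p = carriers / n_chromosomes
        freqs.append(p)
    return freqs

trajectory = simulate_drift(p0=0.05, n_chromosomes=50, generations=200)
print(trajectory[-1])  # final frequency; once it hits 0 (lost) or 1 (fixed) it stays there
```

Re-running with different seeds shows the frequency wandering randomly, with larger swings when `n_chromosomes` is small.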
What is negative and positive selection?
Negative selection is the selective removal of rare alleles that are harmful, whilst positive selection is the selective increase in frequency of alleles that are beneficial, so the traits they produce are selected for
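A sketch contrasting the two, using a simple deterministic one-locus model (an assumption for illustration, ignoring dominance and drift): the variant has relative fitness 1 + s and the other allele has fitness 1, so s > 0 is positive selection and s < 0 is negative selection. `apply_selection` is a hypothetical helper:

```python
def apply_selection(p: float, s: float, generations: int) -> list[float]:
    """Track variant allele frequency under selection coefficient s (requires s > -1)."""
    freqs = [p]
    for _ in range(generations):
        # New frequency = variant's share of the population after weighting by fitness
        p = p * (1 + s) / (1 + p * s)
        freqs.append(p)
    return freqs

print(apply_selection(0.01, 0.1, 50)[-1])   # positive selection: frequency rises
print(apply_selection(0.5, -0.1, 50)[-1])   # negative selection: frequency falls
```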
What must be considered when determining whether a variant is pathogenic?
Whether it occurs in a gene or not, and if so, whether that gene is a key developmental gene (e.g. HOXD1, in which case the variant could be pathogenic) or not (e.g. the MC1R gene for pigmentation)
So pathogenicity is not easy to determine: it depends on both the type of variant and the environment.
Why isn’t every genome exactly 3,000 Mb long?
Every individual has a different (and highly variable) number of repeats at microsatellites, also called STRs (short tandem repeats) or SSRs (simple sequence repeats)
What are the types of microsatellites?
Di-, tri-, tetra-, penta- and hexanucleotide repeats (the prefix tells you how many bases are in one repeat unit)
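A sketch that names the repeat type of an exact, isolated tandem repeat by finding the shortest unit that tiles it. `describe_repeat` is hypothetical; real microsatellites often contain imperfect repeats and flanking sequence, which this ignores:

```python
REPEAT_NAMES = {2: "dinucleotide", 3: "trinucleotide", 4: "tetranucleotide",
                5: "pentanucleotide", 6: "hexanucleotide"}

def describe_repeat(seq: str) -> str:
    """Return '(unit)count: type' for the shortest 1-6 base unit repeated at least twice."""
    for k in range(1, 7):
        n, rem = divmod(len(seq), k)
        if rem == 0 and n >= 2 and seq[:k] * n == seq:
            kind = REPEAT_NAMES.get(k, "other repeat")  # e.g. a mononucleotide run
            return f"({seq[:k]}){n}: {kind}"
    return "no exact tandem repeat of a 1-6 base unit"

print(describe_repeat("CAGCAGCAGCAG"))  # (CAG)4: trinucleotide
print(describe_repeat("ATATATAT"))      # (AT)4: dinucleotide
```

Two people with (CAG)4 and (CAG)10 at the same locus differ by 18 bases there, which is how differing repeat counts change total genome length.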