Gene duplication & Exon shuffling Flashcards

1
Q

How can a genome acquire a new gene?

A
  • Horizontal gene transfer
  • Exon shuffling
  • Duplication and divergence
    o 1% chance for 1 gene to duplicate in 1 million years
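A quick worked calculation for the duplication rate on this card. This is only a sketch: it assumes the 1% per gene per million years figure is constant and independent between genes, and the 20,000-gene genome size is a hypothetical round number, not from the card.

```python
def prob_at_least_one_duplication(rate_per_myr: float, myr: float) -> float:
    """Probability that a given gene duplicates at least once in `myr` million years."""
    return 1.0 - (1.0 - rate_per_myr) ** myr


def expected_duplications(n_genes: int, rate_per_myr: float, myr: float) -> float:
    """Expected number of duplication events across `n_genes` genes."""
    return n_genes * rate_per_myr * myr


RATE = 0.01        # 1% per gene per million years (figure from the card)
N_GENES = 20_000   # hypothetical genome size, for illustration only

p100 = prob_at_least_one_duplication(RATE, 100)
genome_wide = expected_duplications(N_GENES, RATE, 1)

print(f"P(at least one duplication in 100 Myr) = {p100:.3f}")
print(f"Expected duplications genome-wide per Myr = {genome_wide:.0f}")
```

So even a low per-gene rate gives a steady supply of duplicates once it is multiplied over a whole genome and long time spans.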
2
Q

Function of genes

A
  • Promiscuous = Side reaction has no biological function
  • Bifunctional = both activities have a biological function
  • Over evolution (after duplication), the 2 functions diverge → copies pick up different mutations → specialise → each becomes better at catalysing either the main reaction or what was originally a side reaction
3
Q

How is DNA duplicated by recombination?

A
  1. Unequal crossing over (meiosis)
    o Only requires certain lengths of similar sequences
    o Can get recombination between sets of repeats that are inappropriately lined up
    o One chromosome has duplication; other has deletion → have different daughter gametes → if have selective advantage will survive through evolution
  2. Unequal sister chromatid exchange (mitosis)
    o Involves exchange between two chromatids
    o Paired up on repeat sequence → one chromatid duplication, one deletion
    o Occurs in somatic cells, so depending on species will not be passed on to progeny
  3. DNA amplification during replication
    o In haploid organisms (e.g. bacteria)
    o Unequal recombination during replication → ‘replication bubble’: DNA splits up in replication forks → homologous DNA but inappropriate lining up so one strand has duplication of region, other gets deletion
  4. Replication Slippage
    o For short repeated DNA sequences, e.g. microsatellites or CAG triplet repeats (poly-Q expansion in Huntington’s disease)
    o Not common for genes
    o DNA loops out one repeat and starts to re-pair-up downstream → added DNA repeat as part of replication cycle
    o Other end has looped out → priming in wrong place → deleting the sequence
    o Can get insertions or deletions
    o Partial duplication of genetic material that codes for protein
  5. Retrotransposition
    o Retrotransposons can reverse transcribe RNA copies back into DNA and spread across genomes over evolutionary time
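Unequal crossing over (mechanisms 1 and 2 above) can be sketched with lists. This is a toy model, not a simulation of real recombination: the chromosome layout, the marker names and the breakpoints are all made up for illustration, with "R" standing for a repeat that allows misaligned pairing.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Swap the segments to the right of each breakpoint.

    break_a / break_b are list indices; when they differ, the two
    chromosomes were paired out of register (e.g. on different repeats).
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Hypothetical chromosome: a gene flanked by two copies of a repeat "R".
chrom = ["X", "R", "geneA", "R", "Y"]

# Misalignment: the 2nd repeat of one copy pairs with the 1st repeat of the other,
# so the exchange happens after index 4 on one copy and after index 2 on the other.
duplicated, deleted = unequal_crossover(chrom, chrom, break_a=4, break_b=2)

print(duplicated)  # carries two copies of geneA
print(deleted)     # carries no copy of geneA
```

The two products are reciprocal: whatever one chromosome gains, the other loses, which is why a duplication from this mechanism always comes with a matching deletion.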
4
Q

Successful gene duplication

A
  • Successful = gene survives
  • Successful outcome #1 → gene originally w/one copy duplicated → hypothesis: 2 copies should double synthesis rate if everything else is equal
    o If beneficial → retain that
    o If second copy does not provide dosage advantage → can pick up random mutations → will eventually be inactivated by random mutation → over evolutionary time accumulates mutations → becomes a pseudogene (no longer fully functional gene)
  • Successful outcome #2 → getting new function
    o “Neofunctionalization” (new function) or “subfunctionalization” (parental functions split between copies)
  • If selection pressure just for dosage → genes stay similar
  • If no selection pressure for second copy → one copy either degrades entirely (pseudogene) or gets a new function if it provides advantage
5
Q

Gene neofunctionalization example

A
  • Trypsin vs chymotrypsin
    o Evolved to be different proteases
    o Trypsin → cuts at Arg & Lys
    o Chymotrypsin → cuts at Phe, Trp & Tyr
    o Not structurally identical but similarities in proportion of strand/helices and nature of active site
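The different specificities can be illustrated with a toy digest. This is a simplified sketch: it cuts after every listed residue and ignores real-world exceptions (for example, reduced cleavage before proline), and the peptide is a made-up sequence.

```python
TRYPSIN = set("KR")         # cuts after Lys (K) and Arg (R)
CHYMOTRYPSIN = set("FWY")   # cuts after Phe (F), Trp (W) and Tyr (Y)


def digest(peptide: str, cut_after: set) -> list:
    """Split a one-letter peptide sequence after each residue in `cut_after`."""
    fragments, start = [], 0
    for i, residue in enumerate(peptide):
        # No cut after the final residue: there is nothing left to release.
        if residue in cut_after and i < len(peptide) - 1:
            fragments.append(peptide[start:i + 1])
            start = i + 1
    fragments.append(peptide[start:])
    return fragments


peptide = "GASKFLMRWAY"  # hypothetical peptide, for illustration only

print(digest(peptide, TRYPSIN))       # ['GASK', 'FLMR', 'WAY']
print(digest(peptide, CHYMOTRYPSIN))  # ['GASKF', 'LMRW', 'AY']
```

Same substrate, same overall chemistry, but different fragments: the two paralogues differ mainly in which side chains their binding pocket accepts.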
6
Q

Pseudogenes

A
  • Copies of functional genes → altered/missing regions
  • Often have premature stop codons/frameshifts/missense mutations → disrupt reading frame or function of protein
  • May have regulatory role → often producing RNA
  • Increase genome size (cost/benefit)
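One way to see how a stop codon or frameshift disables a gene copy is to scan the reading frame for a premature stop. The sequences below are invented toy examples, and real pseudogene annotation relies on alignment to the parent gene rather than a scan this simple.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(cds: str):
    """Return the 0-based codon index of the first in-frame stop codon
    that occurs before the final codon, or None if there is none."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


functional = "ATGGCTGAACGTTAA"    # ATG GCT GAA CGT TAA
nonsense = "ATGGCTTAACGTTAA"      # GAA -> TAA point mutation
frameshift = "ATGCGCTGAACGTTAA"   # one C inserted after ATG shifts the frame

for name, seq in [("functional", functional),
                  ("nonsense", nonsense),
                  ("frameshift", frameshift)]:
    print(name, first_premature_stop(seq))
```

In the frameshift copy no stop codon was created directly; the inserted base simply moves an existing out-of-frame TGA into frame.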
7
Q

Types of pseudogenes

A
  • “non-processed” pseudogenes:
    o Tandem duplication of genomic region
    o Inactivating mutations/incomplete duplications
    o Part of genome missing regulatory regions → no promoter, enhancers in correct place but does have original intron/exon structure
  • “processed” pseudogenes:
    o Undergoes reverse transcriptase activity (LINE, retrovirus) → mRNA to cDNA → genome integration to make second duplicated gene copy
    o Lacks introns and upstream regulatory regions
    o Can have different combinations of exons
    o Loses most of promoter region except 5’ untranslated region at front of gene
    o Could contain poly(A) tail
    o Can integrate into same or different chromosome
8
Q

Examples of pseudogenes

A
  • Ribosomal proteins
    o Highly duplicated across different species and highly conserved (essential for protein synthesis machinery)
    o Associated w/ L1 retrotransposon
    o May have functional role as have high expression rate
  • Humans have ~20,000 pseudogenes → ribosomal protein pseudogenes are the largest group
    o 2/3 of these also in chimpanzee genome
    o Less than 12 shared w/mouse genome
    o Not clear what these genes are doing
9
Q

Multigene families

A
  • If duplication is beneficial, multigene family can be formed.
  • E.g. rRNA (v. important so highly conserved)
  • Tandem gene families = clustered on same chromosome
  • Dispersed gene families = on different chromosomes
10
Q

Globin superfamily

A

  • Example of duplication & divergence
  • Carry out different functions in different tissues
  • Mixture of co-localised genes in clusters and dispersal of these across the whole genome on different chromosomes → tandem & dispersed
  • Can trace evolution over different organisms → compare genes within/between species
  • Globins are v. common → present in all 3 domains of life
  • Haem-containing protein domain → v. diverse
  • Used for oxygen transport, storage, sensing & detoxification
  • Haemoglobin: tetramer (2α, 2β)
  • Myoglobin: monomer
  • Different structures change how readily they load/unload oxygen
  • Others include: neuroglobin, androglobin, cytoglobin, globin E, globin X, globin Y

11
Q

Haemoglobin

A
  • Cooperativity in binding:
    o Difficulty when oxygen initially tries to bind haem at low concentration
    o Each subsequent oxygen binding cooperatively helps the next one within tetramer → get non-linearity in binding curve → sigmoidal curve, as haemoglobin binds oxygen poorly at low oxygen levels and readily at high levels
12
Q

Myoglobin

A
  • Found in muscles
  • Has simpler binding curve → no cooperativity
  • Higher affinity for oxygen
  • Having different proteins for oxygen storage and transport w/different binding affinities is useful
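The difference between a cooperative (sigmoidal) and a non-cooperative binding curve can be sketched with the Hill equation. The P50 and Hill coefficient values below are approximate textbook figures brought in for illustration; they are assumptions, not values from these cards.

```python
def fractional_saturation(po2: float, p50: float, hill_n: float) -> float:
    """Hill equation: fraction of binding sites occupied at oxygen pressure po2."""
    return po2 ** hill_n / (p50 ** hill_n + po2 ** hill_n)


# Approximate illustrative parameters (assumed):
HAEMOGLOBIN = {"p50": 26.0, "hill_n": 2.8}   # cooperative tetramer
MYOGLOBIN = {"p50": 2.8, "hill_n": 1.0}      # monomer, no cooperativity

for po2 in (5, 20, 40, 100):  # oxygen partial pressure in mmHg
    hb = fractional_saturation(po2, **HAEMOGLOBIN)
    mb = fractional_saturation(po2, **MYOGLOBIN)
    print(f"pO2={po2:>3} mmHg  Hb={hb:.2f}  Mb={mb:.2f}")
```

At low oxygen pressure myoglobin stays largely saturated while haemoglobin has released most of its oxygen, which is what makes one suited to storage and the other to transport.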
13
Q

Genome duplication

A
  • Larger duplication than genes/segments is possible → can affect genome structure
  • Whole chromosome duplication → trisomy 21 → Down syndrome
    o Gene product imbalance
    o Reduced life expectancy
  • Genome sequencing suggested major metazoan lineages have undergone whole genome duplications (WGD)
14
Q

Polyploidy

A
  • Multiple complete sets of chromosomes
  • Useful in agriculture to make bigger cells → bigger fruit
  • ~80% of flowering plants: oats, cotton, potatoes, bananas, coffee, etc
  • Common in invertebrates, fish & amphibians; rare in mammals
15
Q

Autopolyploid

A
  • Multiplication of identical chromosome sets within single species
  • Meiosis error within single species
  • Fertilization of unreduced gametes
  • Accidental production of diploid gametes not v. rare (1-40%)
  • Can induce disease symptoms:
    o ‘Genomic shock’ → widespread activation of transposons, gene expression, recombination (short-term effect)
    o These can then stabilise over time → produce fertile gametes and pass down duplications
  • Need to have even/paired up number of chromosomes to align properly during metaphase
  • Autopolyploids can reproduce successfully but cannot breed with parent species → can lead to speciation
16
Q

Allopolyploidy

A
  • Hybridisation between 2 reproductively compatible species
  • One-step model:
    o Fertilization of unreduced gametes from 2 diploid species
  • Two-step model:
    o Hybridisation between haploid gametes followed by somatic doubling of chromosomes in zygote
    o In plants, pollen from 1 species germinates on stigma of 2nd → endoreduplication in zygote
  • Triploids:
    o Tetraploid + diploid parents → triploid zygotes
    o Triploid is viable but has an odd number of chromosome sets → homologues cannot pair and segregate evenly in meiosis → unbalanced gametes
17
Q

Triploid example: wheat-rye hybrid

A
  • Cross good traits: high yield of wheat + disease tolerance of rye
  • Wheat (2n=28) + rye (2n=14) → F1 hybrid w/21 chromosomes (14 + 7) → not fertile
  • To overcome this:
    o Treatment w/colchicine (chemical) interferes w/spindle machinery of cells → doubles chromosomes in germ cells
    o Now have 42 chromosomes → fertile Triticale
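The chromosome arithmetic on this card can be checked with a small sketch. It treats 28 and 14 as somatic (2n) counts and uses the standard genome labels AABB for tetraploid wheat and RR for rye, with 7 chromosomes per set; those labels are background assumptions, not from the card.

```python
from collections import Counter

BASE = 7  # chromosomes per genome set (assumed x = 7 for wheat and rye)


def gamete(genomes: str) -> str:
    """Halve a balanced genome formula, e.g. 'AABB' -> 'AB'."""
    counts = Counter(genomes)
    if any(n % 2 for n in counts.values()):
        raise ValueError("cannot halve an unbalanced genome formula")
    return "".join(label * (n // 2) for label, n in sorted(counts.items()))


def chromosome_count(genomes: str) -> int:
    return BASE * len(genomes)


def is_balanced(genomes: str) -> bool:
    """True if every genome set has a pairing partner for meiosis."""
    return all(n % 2 == 0 for n in Counter(genomes).values())


wheat, rye = "AABB", "RR"
f1 = gamete(wheat) + gamete(rye)   # 'ABR': one copy of each set
doubled = f1 * 2                   # colchicine-style chromosome doubling

print(f1, chromosome_count(f1), is_balanced(f1))                 # ABR 21 False
print(doubled, chromosome_count(doubled), is_balanced(doubled))  # ABRABR 42 True
```

Doubling does not add new genetic material; it just gives every chromosome a partner, which is what restores fertility.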
18
Q

The effects of WGD

A
  • Cytogenetics = chromosome counts
    o Use dyes; do karyograms
  • Detect multivalent formation = chromosomes line up and undergo homologous recombination
    o Can undergo more diversity and local gene duplication
    o Genome size comparison, etc
    o Difficult to discern ‘auto’ vs ‘allo’-ploidy
  • Saccharomyces cerevisiae → brewer’s yeast
    o Compare every gene to every other gene
    o Duplicated sets of genes → can compare to ancestors, related yeast species, etc
    o Long time ago so evidence lost but estimate 10% of genes derive from WGD
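"Compare every gene to every other gene" is an all-vs-all search for similar pairs. A minimal sketch follows, using simple per-position identity on equal-length toy sequences; real analyses use alignment tools, and the gene names, sequences and 0.8 threshold here are all hypothetical.

```python
from itertools import combinations


def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this toy measure needs equal-length sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def candidate_paralogues(genes: dict, threshold: float = 0.8) -> list:
    """All-vs-all comparison: return gene pairs at or above the identity threshold."""
    return [(name_a, name_b)
            for (name_a, seq_a), (name_b, seq_b) in combinations(genes.items(), 2)
            if identity(seq_a, seq_b) >= threshold]


# Hypothetical toy genes: g1 and g2 differ at one position, g3 is unrelated.
genes = {
    "g1": "ATGGCTGAACGTAAAGGT",
    "g2": "ATGGCTGAACGAAAAGGT",
    "g3": "TTTCCCTTTCCCTTTCCC",
}

pairs = candidate_paralogues(genes)
print(pairs)  # [('g1', 'g2')]
```

In a genome that went through WGD, such pairs are expected to sit in matching order in duplicated blocks, which is the pattern the yeast comparison looks for.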
19
Q

Genome duplication in multicellular organisms

A
  • Genome duplication drives metazoan expansion
  • Increase in organisational complexity
  • Main controllers of body plan = homeobox genes (Hox genes)
    o Encode ‘homeodomain’ → DNA-binding domain (~60 amino acids long) → transcription factors that regulate genes
  • Studied a lot in fruit flies
    o E.g single homeotic mutation doubles number of wings in Drosophila (bithorax)
20
Q

Hox gene family

A
  • V. well organised
    o Spatial and temporal collinearity
    o Order of genes on chromosome reflects expression order
  • Expressed in different regions of developing embryo
  • Blueprint same across many different species
    o Insect only 1 Hox cluster
    o Vertebrates e.g. mouse have multiple Hox clusters (usually 4)
    o Number of segments corresponds to number of clusters and components within them
21
Q

2R/3R hypothesis of WGD

A
  • “Complexity in fish and vertebrate formation probably driven by WGD”
  • Evidence: looking at Hox clusters
  • B. lanceolatum (amphioxus) → 1 cluster w/15 genes
    o Thought to resemble the last common ancestor of all vertebrates
  • Sea lamprey (fish-like parasite) → 4 clusters before increase in body plan complexity
  • Hagfish (has spinal cord) → 4 clusters
  • Sharks → even more clusters
22
Q

WGD benefits

A
  • Raw material for evolutionary diversification
  • Potential for neofunctionalization, divergence, pseudogene formation, etc for single genes → large amount of substrate for WGD
  • Debate how beneficial it is in short-term → can get genomic shock from too much DNA
  • Extra copies of genes provides some protection against environment and extinction
  • Defence against mutation because have spare copy of every gene
    o Allows to do new things e.g. colonise new environments
  • Fitness consequences:
    o Increased cell size (polyploidy)
    o Increased organ size
    o Faster growth (more metabolic components)
    o Have to evolve dosage regulated gene expression
  • In allopolyploidy get heterosis (hybrid vigour) → unrelated sets of genes coming together give healthier, longer-lived, more robust offspring than highly inbred lines → provides larger combinations of wild-type and non-specialised genes
23
Q

Eukaryotic gene structure

A
  • Evolution = Increasing complexity → gene number; protein number; functions
  • Genes are split (‘genes in pieces’)
  • Described by Walter Gilbert in 1978
  • He invented the terms intron/exon
  • Predicted existence of:
    o Alternative splicing = pre-mRNA is processed in different ways, including/excluding different exons, so the same gene can make different proteins within the same cell
    o Exon shuffling = evolutionary process to increase/decrease number of introns/exons and swapping them around
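Alternative splicing can be sketched as choosing which optional ("cassette") exons to keep while the order stays fixed. The exon names and which ones are optional are hypothetical; real splicing has more modes than exon skipping.

```python
from itertools import combinations

# (exon name, is_cassette): cassette exons may be included or skipped.
EXONS = [("E1", False), ("E2", True), ("E3", True), ("E4", False)]


def isoforms(exons: list) -> list:
    """Enumerate every mRNA isoform obtainable by skipping cassette exons."""
    cassette = [name for name, optional in exons if optional]
    result = []
    for k in range(len(cassette) + 1):
        for kept in combinations(cassette, k):
            # Constitutive exons are always kept; gene order is preserved.
            result.append([name for name, optional in exons
                           if not optional or name in kept])
    return result


for isoform in isoforms(EXONS):
    print("-".join(isoform))
```

Two optional exons already give four transcripts from one gene, without any change to the DNA; exon shuffling, by contrast, changes the gene itself.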
24
Q

Exon shuffling theory

A
  • Introns/exons often border particular subfunctions within proteins
  • Eukaryotic proteins → ‘mosaic of motifs’
    o Domains 40-100aa = small motif building blocks for stabilization, binding, catalysis, etc
    o Discrete and modular → amenable for evolution
  • Primordial exons correspond to domains:
    o Duplication, permutation, rearrangement when in new genome positions could generate new genes and proteins w/diverse functions
  • Repetition in original gene has different outcomes:
    o Affect (increase?) stability, catalysis and modifies functions
25
Q

Illegitimate non-homologous recombination

A

  • Can get unequal crossing over between repeat sequences
  • For short motifs can get replication slippage
  • Microhomology can drive this illegitimate non-homologous recombination (N-HR)
  • E.g. αA-crystallin gene (hamster) transfected into mouse
    o Heat shock protein → topoisomerase I nicks DNA and ligates non-homologous ends → end result: shuffled domain with duplicated gene
  • Over evolutionary time, this process happens w/v. low but non-zero probability; if the domain provides a selective advantage, it can become fixed in the population
  • Domain shuffling:
    o Structural domains from different genes joined together
    o Mechanisms include illegitimate N-HR → rare process, but higher rate of shuffling in some organisms suggests retrotransposition also contributes

26
Q

LINES

A
  • In exon shuffling:
    1. Gene w/domains I, II and III → downstream have a LINE between 2 exons
    2. When LINE is transcribed might take bit of adjacent exon w/ it
    3. After reverse transcription to cDNA, the copy (LINE + exon) inserts elsewhere in the genome, possibly disrupting a gene
    4. Get chimeric transcript → potentially makes new protein w/different exons (adding one, replacing one, etc)
    5. Over time LINE is deleted; might undergo retro-transposition elsewhere in genome
    6. In evolution → get new gene
  • LINEs also induce double-strand (ds) breaks → DNA repair can then lead to domain shuffling
27
Q

Transposons

A
  • Transposon carries exon w/it
  • Mutator-like transposable elements (MULEs) → longer and can carry more things; contain flanking exons/introns
    o Found in plants (e.g. rice has 3000 MULEs in genome)
  • As MULEs move around the genome they collect sequence → can form internal hybrid genes
    o Exons lack translation initiation/termination signals; not full genes, just coding DNA w/o stop codon
28
Q

Phases

A
  • If shuffle exons in incompatible reading frames → end up adding extra domain in middle of protein → disrupts correct reading frame of all exons downstream
    o Perfect system needs exact multiples of 3 and needs to be in perfect frame to maintain protein shuffling compatibility during evolution
    o Need intron phases/classes to be the same
  • Phase 0 = intron lies between 2 codons; exons on either side contain whole codons; 0 phase shift
  • Phase 1 = intron located after 1st nucleotide of a codon
  • Phase 2 = intron located after 2nd nucleotide of a codon
    o Exon ends w/1 or 2 extra bases so next exon needs the corresponding number of bases to complete the codon and restore reading frame
  • Not all intron/exon classes are compatible w/each other → problem for shuffling
  • Not every shuffling event is successful; further evolution required to remove or repair the frameshift
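Intron phase follows directly from how many coding nucleotides come before the intron. A small sketch, assuming the exon lengths count coding nucleotides only and start at the first base of the start codon; the lengths themselves are made up.

```python
def intron_phases(coding_exon_lengths: list) -> list:
    """Phase of the intron after each exon (except the last).

    Phase = number of coding nucleotides upstream of the intron, modulo 3:
    0 -> between codons, 1 -> after 1st base, 2 -> after 2nd base of a codon.
    """
    phases, upstream = [], 0
    for length in coding_exon_lengths[:-1]:
        upstream += length
        phases.append(upstream % 3)
    return phases


exon_lengths = [90, 100, 50, 60]    # hypothetical coding lengths in nucleotides
print(intron_phases(exon_lengths))  # [0, 1, 0]
```

Here the second exon (100 nt) sits between a phase 0 and a phase 1 intron, so it is not a multiple of 3 long and could not be moved elsewhere without shifting the frame.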
29
Q

Splice frame rule

A
  • ‘Following a successful shuffling event a newly acquired exon will be flanked by 2 introns of the same phase, otherwise it will produce a frameshift in the resulting coding sequence’
  • Incompatible phases → frameshift → negative selection removes the variant unless further mutation restores the frame
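The rule can be written as a simple check. This is a sketch with hypothetical function names: an exon is described by the phases of its two flanking introns, and it is inserted into a host intron of a given phase.

```python
def exon_length_mod3(phase_5prime: int, phase_3prime: int) -> int:
    """Exon length modulo 3 implied by its flanking intron phases."""
    return (phase_3prime - phase_5prime) % 3


def is_symmetric(phase_5prime: int, phase_3prime: int) -> bool:
    """Symmetric exons (0-0, 1-1, 2-2) are a multiple of 3 nucleotides long."""
    return phase_5prime == phase_3prime


def insertion_keeps_frame(exon_phases: tuple, host_intron_phase: int) -> bool:
    """True if the exon can be inserted into the host intron without a frameshift.

    The exon must be symmetric (so downstream exons stay in frame) and its
    phase must match the host intron (so the exon itself is read in frame).
    """
    phase_5prime, phase_3prime = exon_phases
    return is_symmetric(phase_5prime, phase_3prime) and phase_5prime == host_intron_phase


print(insertion_keeps_frame((1, 1), host_intron_phase=1))  # True
print(insertion_keeps_frame((1, 1), host_intron_phase=0))  # False: exon read out of frame
print(insertion_keeps_frame((0, 1), host_intron_phase=0))  # False: downstream frameshift
```

This is why symmetric exons, and 1-1 exons in particular, are the ones expected to be over-represented among shuffled modules.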
30
Q

Evidence for exon shuffling

A
  • Multicellular structure:
    o Extracellular matrix
    o Cell adhesion
    o Cellular receptors
  • Shuffling essential for multicellular metazoans
  • 6.4% of human genes show evidence for exon shuffling
  • Phase 0 → most common intron phase
  • Excess of symmetric exons in simpler organisms
  • Over evolution, increase in 1-1 exon shuffling (exons flanked by phase 1 introns on both sides) since the first animals
    o Could be because phase 1 introns often split Gly codons, and Gly is necessary for multi-domain linkage
31
Q

Evidence of 1-1 exon shuffling

A
  • Phase 0 introns (most common)→ more ancient → higher frequency in earlier stages of eukaryotic evolution → explains prevalence in non-metazoan lineage
  • 1-1 associated w/emergence of necessary features of multi-cellularity
    o After divergence of metazoan, shuffling began
    o Acquisition of protein domain must overcome structural limitations
    o Small & flexible domains fold independently → must be linked
    o Phase 1 introns v. frequently interrupt glycine codons → common in linker regions
32
Q

Shuffling evidence

A
  1. Exon duplication: protein example α2 type I collagen
    o Highly repetitive sequence
    o Tripeptide → Gly – X (often Pro) – Y (often hydroxyproline)
    o Gly-Pro typical in linker regions → disrupts α-helix/β-sheet structures
    o Chicken gene has 52 exons → 42 of them have Gly-X-Y repeats
  2. Exon shuffling: tissue plasminogen activator (TPA)
    o Found in vertebrate blood → blood clotting system
    o 4 exons
    o Upstream exon encodes ‘finger module’ (shared w/fibronectin); other exons encode modules shared w/plasminogen and EGF (epidermal growth factor)
    o Each subdomain coded by an individual exon → enables 1 protein to interact via partial dimerization between modules → easy way to build complicated protein-protein interaction network
    o Blood clotting cascade:
      - Lots of clotting factors
      - Exons spread through entire family → some for function (e.g. proteases); some for interaction between members (e.g. kringle domain)
      - Evolution has duplicated exons, shuffled them together for right compatibility, correct introns between them to get spliced and form different variants → get large/complicated split genes
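The Gly-X-Y repeat described for collagen above is easy to spot in a sequence with a regular expression. A minimal sketch on a made-up peptide; hydroxyproline is written as plain P here because the one-letter code has no standard symbol for it.

```python
import re

# One or more consecutive Gly-X-Y triplets (X and Y = any residue).
GLY_X_Y_RUN = re.compile(r"(?:G[A-Z]{2})+")


def longest_gly_x_y_run(protein: str) -> str:
    """Return the longest stretch of consecutive Gly-X-Y triplets, or ''."""
    runs = [match.group(0) for match in GLY_X_Y_RUN.finditer(protein)]
    return max(runs, key=len, default="")


protein = "MKTGPPGPAGPPLLE"  # hypothetical sequence, for illustration only

run = longest_gly_x_y_run(protein)
print(run, len(run) // 3)    # GPPGPAGPP 3
```

Because each repeat unit is exactly 3 residues (9 nucleotides), exons built from whole Gly-X-Y units can be duplicated without disturbing the reading frame.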
33
Q

Protein-protein interaction

A
  • Evolution of new protein has domains of other proteins; interacts with itself and other proteins and becomes hub of PPI networks
  • Shuffling promotes self-interaction capacity
    o Many human PPI networks self-interact between components in network
    o Allow formation of dimeric proteins
    o Positive natural selection fixing this
    o Some domain types have v. specific type of interactions or v. promiscuous interactions → increase diversity in network → e.g. polyglutamines found in everything (sticky and unstructured)
34
Q

PPI example: Amyloid precursor protein (APP)

A
  • From Alzheimer’s disease
  • Has different isoforms w/different exons from alternative splicing
  • APP undergoes protease processing:
    o α-secretase cuts immature APP → soluble APP
    o Or β/γ-secretase gives mixture of soluble APP and β-amyloid protein (bad because it forms v. stable β-sheet structure; intermediates are toxic to cell → drives neurodegeneration)
  • Amyloid plaque formation and different proteases due to different alternative splicing and expression of different variants of ‘Kunitz-type protease inhibitor’ (KPI) domain and different secretase forms
  • Processing determined by:
    o KPI domain presence → inhibits α-secretase (binds to trypsin domain)
    o If inherited → more likely to get one type of familial related Alzheimer’s
  • Evidence for KPI domain being gained by shuffling:
    o Flanked by introns
    o Has homology w/other related proteins in genome also embedded in exons