Eukaryotic genomes and their evolution Flashcards

(125 cards)

1
Q

How big is the human genome?

A

Genome = the set of genetic material (DNA) present in a cell or organism
Also includes non-coding sections
About 3 billion base pairs per haploid set; a diploid cell holds roughly 2 metres of DNA
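
The link between base pairs and metres can be checked with a quick calculation. This is only a sketch: it assumes the textbook rise of about 0.34 nm per base pair for B-form DNA, a figure that is not stated on the card, and `dna_length_m` is a made-up helper name.

```python
# Rise per base pair in B-form DNA, in nanometres (textbook value, assumed here).
RISE_PER_BP_NM = 0.34

def dna_length_m(base_pairs):
    """Contour length in metres of a double helix with the given number of base pairs."""
    return base_pairs * RISE_PER_BP_NM * 1e-9

haploid = dna_length_m(3_000_000_000)       # one copy of the genome
diploid = dna_length_m(2 * 3_000_000_000)   # both copies in a diploid cell
print(f"haploid: {haploid:.2f} m, diploid: {diploid:.2f} m")
```

Under this assumption the 2 metre figure corresponds to the two genome copies of a diploid cell, not to a single 3 billion base pair set.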

2
Q

Explain variability in genomes between humans and bacteria

A

A lot of variability in composition and size of genomes between species
The human genome is mostly non-coding DNA
Bacterial genomes are mostly coding DNA

3
Q

What is the C-value paradox (C-paradox)?

A

Genome size doesn’t correlate with organism complexity
Example: an amoeba has 600 billion base pairs but a human has only 3 billion
The amoeba is less complex but has a larger genome than humans
Gene number is also highly variable
A worm has about the same number of genes as a human but is much simpler

4
Q

What is the composition of the human genome?

A

1% of the human genome consists of exons (coding DNA that makes proteins)
24% is introns
Exons make up roughly 4-5% of each gene, so genes (exons + introns) comprise 25% of the genome
The human genome has about 20,000 protein-coding genes
Repetitive DNA (transposable elements) makes up just under 50%
Regulatory elements (found in introns and intergenic DNA): switches that activate/deactivate genes
The non-coding genome contains about 1 million enhancers

5
Q

What is gene duplication and what does it lead to?

A

Gene duplication is how new genes evolve
38% of human genes are derived from gene duplication
Gene duplication leads to gene families (paralogous genes)
Are sister genes that share a common ancestor
Very similar sequence
Found in the same genome
Can be found on different or the same chromosome
May be clustered together or dispersed through the genome with diverse function
Duplicates can degenerate into pseudogenes: they come from the same ancestor but have lost function

6
Q

Paralogous vs Orthologous genes

A

Paralogous: 2 sister genes or gene clusters in the same organism; they arise from gene duplication, share structural similarity, and come from a common ancestor but have diverged since
Orthologous: the same gene found in 2 different genomes, usually with the same function. Example: humans and chimpanzees both have a specific gene (usually with the same name)

7
Q

Why is gene duplication rewarded by evolution?

A

More protein production
If gene doesn’t work anymore, sister gene can produce a similar protein

8
Q

What is synteny?

A

Pieces of genome/chromosomal regions of different species where homologous genes occur in the same order
Come from the same ancestor
Example: between the mouse and human genomes, most functional genes lie in syntenic regions

9
Q

Explain the two different approaches to the human genome project

A

Public (Watson/Collins) approach: planned to sequence the genome in 15 years at a cost of 3 billion dollars
Celera Genomics aimed to sequence it in 3 years for about 300 million dollars, using shotgun sequencing
The HGP was declared complete in 2003
But about 8% of the genome was left unsequenced at that point, mostly heterochromatin
Next-generation sequencing techniques (e.g. Illumina) now sequence quickly and cheaply

10
Q

How are genomic elements conserved among species? How can we use bioinformatics?

A

Conservation between species varies depending on what we are looking at: coding genes, enhancers/promoters, transcription factor binding sites
Bioinformatics: Uses sequence alignment tools to study conservation of the genome

11
Q

How are coding genes conserved between species?

A

Sequence conservation predicts conservation of function
Orthologues are the most likely to retain the common ancestral function
80% of human genes are found in mice
So a specific disease gene can be expressed in mice to study its effect
Mice are used as model organisms

12
Q

How are regulatory elements conserved between species (transcription factors)?

A

The rule that sequence conservation predicts function does not apply to cis-regulatory elements
Transcription factor binding preferences and binding-site motifs are conserved
But only a small fraction of actual transcription factor binding events is conserved among species

13
Q

How are enhancers conserved across species?

A

Enhancers with conserved sequences across species are NOT equally functional
Most enhancers are not functional across species
80% of human genes are conserved in mice
But humans and mice have different enhancers that regulate which genes are expressed
So function differently even though genes are the same
This also applies to primates
Humans and chimpanzees are 98% genetically similar but have different enhancers

14
Q

Compare genome similarity of Humans vs Chimps

A

About 1% divergence between shared genes (98-99% identical)
6% of genes are not shared between humans and chimps
Large amount of loss and gain of genes since evolutionary split
Human chromosome 2 is a result of the fusion of the chimp chromosomes 2A and 2B
Humans have lost many olfactory genes (humans don’t need to smell as much)

15
Q

What are molecular clocks?

A

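LUCA (last universal common ancestor) lived about 3.8 billion years ago; it is the most recent ancestor of all current life, not necessarily the first form of life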
We know due to molecular clocks
Uses fossils and rate of mutations to deduce when a species diverged
Nucleotide or amino acid sequences are compared among species to date when they last shared a common ancestor
Rate of mutation assumed to be constant
Rate may differ from gene to gene
Genes that are responsible for basic functions mutate more slowly
Mitochondria arose through endosymbiosis: a bacterium was incorporated into the cell and kept because of its essential function
The mitochondrial genome is used to measure mutation rate because its rate is roughly constant
Most of its mutations are effectively neutral
Circular DNA with only a few genes
Inherit mitochondrial DNA from the mother without recombination
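
The clock idea above can be sketched as a one-line calculation. This is a minimal illustration, assuming a strictly constant substitution rate and using made-up numbers; `divergence_time` and its parameters are hypothetical names, not from the notes.

```python
def divergence_time(differences, sites, rate_per_site_per_year):
    """Estimate years since two lineages split, assuming a constant substitution rate.

    Both lineages accumulate changes independently after the split,
    hence the factor of 2 in the denominator.
    """
    distance = differences / sites  # proportion of aligned sites that differ
    return distance / (2 * rate_per_site_per_year)

# Hypothetical values for illustration only.
years = divergence_time(differences=20, sites=1000, rate_per_site_per_year=1e-8)
print(f"{years:,.0f} years since the common ancestor")
```

In practice the rate is calibrated against fossils of known age, and the simple proportion of differing sites underestimates distance once sites start mutating more than once.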

16
Q

What are mitochondrial haplogroups?

A

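Haplogroup: a group of lineages defined by specific shared mutations in mitochondrial DNA
All haplogroups trace back to a single maternal ancestor (mitochondrial Eve), who lived about 200,000 years ago in Africa
Supports the out of Africa hypothesis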

17
Q

How does a genome acquire new genes?

A

Horizontal gene transfer
Exon shuffling
Duplication and divergence - this is very rare (1% chance for 1 gene in 1 million years)

18
Q

What are the 3 different outcomes of gene duplication?

A

Duplication of one gene leads to 2 similar genes
Selective pressure on both genes: genes stay similar (More genes = more proteins)
Selective pressure on just one of the genes: one copy degrades (Accumulates mutations and generates pseudogenes)
Selective pressure on just one of the genes: the free copy acquires a new or specialized function (the gene is important, but the spare copy can tolerate change. Neo-functionalization: the copy gains a new function; sub-functionalization: the copies divide the original function = specialization)

19
Q

How does gene duplication occur during DNA replication/meiosis?

A

Gene duplication can occur during chromosomal recombination (crossing over)
Crossing over occurs during meiosis and leads to new combinations of alleles
Error in chromatid pairing leads to duplication of regions

Gene duplication can also occur during DNA replication, through DNA polymerase slippage
DNA replication occurs via DNA polymerase
Ex. 15 CA repeats originally
Polymerase pauses in CA repeat domain
Newly formed strand melts and reanneals incorrectly (slipping)
Mutation is repaired incorrectly = duplication
Ex. now 17 CA repeats
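
The slippage steps above can be mimicked with a toy model. This is a sketch under simplifying assumptions: the template is a pure CA repeat, the polymerase pauses once, and the new strand reanneals a fixed number of repeat units behind where it stopped. The function and parameter names are made up for illustration.

```python
def replicate_with_slippage(repeats, pause_at, slip_back, unit="CA"):
    """Copy a repeat tract, letting the new strand slip back once during a pause."""
    template = [unit] * repeats
    new_strand = template[:pause_at]        # units copied before the pause
    resume_from = pause_at - slip_back      # template position after misaligned reannealing
    new_strand += template[resume_from:]    # the units in between get copied a second time
    return "".join(new_strand)

product = replicate_with_slippage(repeats=15, pause_at=8, slip_back=2)
print(len(product) // 2, "CA repeats in the new strand")
```

With a slip of two units the 15-repeat template yields a 17-repeat product, matching the example on the card; a slip of zero reproduces the template unchanged.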

20
Q

What is Neo (sub) functionalization? Give an example

A

After gene duplication, two genes with identical function are unlikely to be maintained in the genome
Each daughter gene adopts a part of the function of the parental gene
Changes occur in expression pattern of two genes
Gains mutations
Leads to genes having similar but not identical functions (specialization)
Genes are expressed at different times and in different cell types
Example: trypsin vs chymotrypsin
Duplicated 1500 million years ago
Proteases
Trypsin: cuts after arginine and lysine
Chymotrypsin: cuts after phenylalanine, tryptophan and tyrosine
Example: transcription factor families (SOX genes)
Many paralogues of SOX with similar functions
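
The different specificities of the two proteases can be illustrated with a simple in-silico digest. This is a simplified sketch: it cuts after every listed residue and ignores real-world exceptions (for example, cleavage being blocked by a following proline). The peptide is invented for the example.

```python
import re

TRYPSIN = "KR"         # cuts after lysine (K) and arginine (R)
CHYMOTRYPSIN = "FWY"   # cuts after phenylalanine (F), tryptophan (W), tyrosine (Y)

def digest(protein, cut_after):
    """Cleave a protein after every residue listed in cut_after (one-letter codes)."""
    # Zero-width split at each position that directly follows a target residue.
    pieces = re.split(f"(?<=[{cut_after}])", protein)
    return [p for p in pieces if p]  # drop the empty piece left by a cut at the very end

peptide = "MAKFGRWLYSK"  # made-up sequence for illustration
print("trypsin:     ", digest(peptide, TRYPSIN))
print("chymotrypsin:", digest(peptide, CHYMOTRYPSIN))
```

The same substrate gives different fragments for each enzyme, which is the sense in which the two paralogues have similar but not identical functions.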

21
Q

What are pseudogenes?

A

Pseudogenes: gene duplicates and one copy completely degrades
Occurs in the first million years after duplication if the gene is not under selection
Gene duplication generates function redundancy
Not advantageous to keep identical copies of the same gene
Mutations that disrupt structure and function are not deleterious, because the other copy still works
Accumulate until gene becomes non-functional pseudogene
Time frame = 4 million years
Pseudogenes can still be transcribed to mRNA but will not produce a functional protein

22
Q

What are the non-processed pseudogenes?

A

Tandem duplication of genomic region (from a normal duplication event)
1 copy faces lack of selection
Inactivating mutations or incomplete duplication
Missing regulatory regions

23
Q

What are processed pseudogenes?

A

Reverse transcriptase activity (LINE, retrovirus, transposons): parasitic elements with a copy paste mechanism
Gene is transcribed to RNA
RNA is reverse transcribed to cDNA and re-integrated into the genome
Lack of regulatory regions/introns (mRNA source) = non functional
Contain polyA tail/flanking repeats (responsible for transcription termination)
Can integrate into the same or different chromosome

24
Q

What are ribosomal protein pseudogenes in humans and how are they conserved across primates?

A

20,000 human pseudogenes in genome
Many are ribosomal protein pseudogenes
Large family (2000 copies)
Processed pseudogenes
Generated by the L1 retrotransposon machinery
Highly transcribed / high expression rate
Highly conserved across primates
About two-thirds of human RP pseudogenes are also in the chimpanzee genome
Fewer than 12 are shared with rodents
Implies recent origin

25
What are multigene families?
Multigene family: a group of similar genes formed when duplication is beneficial
Genes in a family can have slightly different functions, so they become specialized
Example: rRNA genes (Mycoplasma genitalium: 2, Xenopus laevis: > 500)
Tandem gene family: members of the multigene family are on the same chromosome
Dispersed gene family: members of the multigene family are on different chromosomes
26
What are HOX genes and what is their function?
HOX genes are a multigene family
They encode homeotic proteins
These are transcription factors that bind DNA and can regulate activation or inactivation of genes during embryonic development
Important for development and patterning of limbs / appendages
Control the pattern of body formation during early embryonic development
Control compartmentalization / regionalization of body parts in animals along the head-to-tail (anterior-posterior) axis
27
What is the homeodomain in HOX genes?
Homeodomain (the protein domain) / homeobox (the DNA sequence that encodes it)
Domain: functional unit of a protein
60 amino acid domain that forms a helix-turn-helix, highly conserved in animals
It is the DNA binding domain
Zinc finger domains are another important type of DNA binding domain
28
How was the function of Antennapedia discovered?
The normal Antennapedia gene is expressed in the second segment of a fly's thorax and helps in the development of the second pair of legs
A mutation changes where the gene is expressed and causes legs to grow from the fly's head in place of the antennae
What matters is not how much the genes are expressed but where they are expressed
If HOX TFs are expressed in the wrong location, appendages / limbs grow in the wrong place
Homeotic = something has changed to resemble something else
29
Explain the composition and function of HOX genes in insects
Insects have one cluster of HOX genes consisting of 8 genes
Each of the 8 genes is expressed in a specific region of the body
The cluster is divided into 2 complexes
Antennapedia complex: 5 genes responsible for the head and the first and second thoracic segments
Bithorax complex: 3 genes responsible for the third thoracic segment and the 8 abdominal segments
Homeotic transformations in insects: mutations in insect HOX genes result in one body segment taking on the identity of another
30
Explain human HOX genes
Humans have 4 clusters of HOX genes
Each cluster has up to 13 gene positions, but some have been lost, giving 39 HOX genes in total (not 4 x 13 = 52)
Each cluster is on a different chromosome (4 in total)
HOXA, HOXB, HOXC, HOXD (HOXA1, HOXA2 for gene number)
Gene duplication and neo-functionalization led to 39 TFs in humans = specialization = more complex structure / function in humans than in insects
31
How are HOX genes conserved between species?
HOX genes are conserved between Drosophila and humans
Many human HOX genes were already present in Drosophila (ancestral versions)
But neo-functionalization allows humans to be more complex than a fly (8 vs 39 genes)
32
Give some examples of mutation of HOX genes and their impact
HOXD13: patterning of fingers is impaired
HOXA2: impacts ear development
HOXB1: eye and face development
33
What is the HOX vertebrate common ancestor?
Branchiostoma lanceolatum (lancelet): oldest chordate lineage known to have 1 cluster of HOX genes, related to the ancestor of humans
Marine fish-like chordate (an invertebrate, not a true vertebrate)
Displays features of the last common ancestor
1 cluster (15 genes) of HOX genes: barely has an appendage/mouth
34
What HOX genes does a Sea lamprey display?
Sea lamprey: oldest vertebrate that has 4 clusters of HOX genes like humans
The clusters arose before the increase in body plan complexity
More HOX genes = more complexity
35
What is genome duplication? Why is it more tolerated than single chromosome duplication?
Duplications larger than genes and segments are possible
Genome duplication: duplicating the entire genome (incl. transposons, regulatory elements)
Duplication of a single chromosome is not tolerated well
Examples: Down syndrome (trisomy 21), Edwards syndrome (trisomy 18), Patau syndrome (trisomy 13)
Leads to gene product imbalance and reduced life expectancy
Whole genome duplications (WGD) could be a source of speciation
Duplicating the entire genome is better tolerated because the relative dosage of gene products stays balanced
Eukaryotes contain 2 haploid gene sets (diploid)
Polyploidy: having multiple complete sets of chromosomes (the entire genome is duplicated, not only 1 chromosome)
36
Where is polyploidy common and what are the 2 types?
Polyploidy is widespread in plants
80% of flowering plant species originated via polyploidy
Ex. oats, cotton, potatoes, banana, coffee
Polyploidy is common in invertebrates, fish and amphibians but rare in mammals
2 main types of polyploidy
Autopolyploidy: happens within the same species. A mistake during meiosis makes diploid gametes instead of haploid gametes (offspring have 4 copies of each chromosome instead of 2)
Allopolyploidy: occurs between different species, hybrid reproduction
37
What is autopolyploidy and what issues does it produce?
Multiplication of identical chromosome sets within a single (sub)species
Fertilisation by unreduced gametes
An error in meiosis accidentally produces diploid gametes
1-40% frequency of formation
Very common in plants
Can reproduce successfully but can't breed with the parent species (2n + n = 3n)
Allows speciation
Autopolyploids are more viable than allopolyploids (especially in plants) because each chromosome has a homologous partner and can form a bivalent in meiosis
Issues = can induce disease symptoms
Genomic shock: widespread activation of transposons, gene expression, recombination
Things are repressed / activated that are not meant to be
38
What is allopolyploidy?
Hybridization between 2 reproductively compatible species that are very similar, ex. only recently split in evolution
One step model (most common route): both / one parent(s) have unreduced (diploid) gametes due to an error in meiosis = polyploid offspring (diploid + diploid = tetraploid)
Two step model: hybridization between haploid gametes followed by somatic doubling (a duplication event after mating)
39
What are the benefits of whole genome duplication?
Raw material for evolutionary diversification
Functional gene divergence
Defence against mutation (if one gene loses its function, another gene can replace it)
Buffer against environment (and extinction)
Colonise new environments
Fitness consequences (increased cell size, organ size, faster growth, dosage-regulated gene expression)
40
Locus
each gene has a locus which is a specific position on a pair of homologous chromosomes
41
Allele
alternative form of a gene. each parent donates one allele for every gene
42
Homozygous
alleles are identical. Same genetic variant in the two alleles in gene locus
43
Heterozygous
alleles are different. Different genetic variants in the two alleles in a gene locus
44
Genotype
combination of two alleles (maternal and paternal) for each gene
45
Dominant alleles
written in upper case by convention. The allele that is expressed if the two alleles are different
46
Recessive alleles
written in lower case by convention. Masked if the two alleles are different
47
Phenotype
Physical manifestation of genotype
48
What is an SNP and how often do they occur?
SNPs: DNA sequence variations that occur when a single nucleotide (A, T, C, G) in the genome sequence is altered
Example: AATCGAC --> AAGCGAC
For a variation to be considered an SNP, it must occur in at least 1% of the population
SNPs make up 90% of all human genetic variation
SNPs occur approximately every 1000 bases
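
The example substitution on this card can be located by comparing the two sequences position by position. This is a minimal sketch that assumes the sequences are already aligned and of equal length; `find_snps` is a hypothetical helper name.

```python
def find_snps(reference, sample):
    """Return (position, reference base, sample base) for every mismatch, 1-based."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]

print(find_snps("AATCGAC", "AAGCGAC"))
```

A single mismatch only counts as an SNP at the population level if the variant reaches the 1% frequency threshold, which this per-individual comparison cannot show.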
49
Why are SNPs important?
Can affect how humans develop diseases
Can affect how an individual responds to pathogens
Can affect how an individual responds to drugs, etc
Used in biomedical research for comparing regions of the genome between cohorts
50
Where do SNPs occur in the genome?
Intergenic region: can fall in a transcription factor binding site or an enhancer/regulatory sequence
In a promoter or transcription factor binding region
In an exon: affects the amino acid sequence = affects the protein (example: a premature stop codon truncates the protein)
In an intron: can be a regulatory region; example: a mutation in a splice site affects splicing
51
What are the 2 categories of disease associated SNPs?
SNPs may be a direct cause of a disease or signal an increased chance of disease
Disease associated SNPs fall into 2 categories:
Monogenic: one SNP in one gene. One nucleotide change leads to disease. Easy to detect / analyse. Simple traits
Polygenic: SNPs in multiple genes. Multiple nucleotide changes affect the chance of disease. Hard to detect. Complex traits
Example: familial vs sporadic Alzheimer's
Familial: 5-10% of cases, SNP in a single gene (APP)
Sporadic: late onset, due to polygenic SNPs
52
What are the two types of coding SNPs?
Coding SNPs: can be disease causing as they can affect the amino acid sequence and the protein
2 types of coding SNPs
Synonymous (silent): the affected codon still codes for the same amino acid, so the mutation is silent
Non-synonymous: the affected codon codes for a different amino acid - can be detrimental and change the protein
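
Classifying a coding SNP comes down to translating the codon before and after the change. The sketch below builds the standard genetic code table and compares the two amino acids; the `classify` helper and the example codons are chosen for illustration and are not from the notes.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify(ref_codon, alt_codon):
    """Label a single-codon change as synonymous or non-synonymous."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "non-synonymous (premature stop)"
    return "non-synonymous"

print(classify("GAA", "GAG"))  # both codons encode glutamate
print(classify("GAG", "GTG"))  # glutamate becomes valine
print(classify("TGG", "TGA"))  # tryptophan becomes a stop codon
```

Most synonymous changes sit in the third codon position, which is why that position tolerates more variation.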
53
Explain transition vs transversion with SNPs
Transition: most common substitution. Replacement of a purine by another purine (example A --> G) or a pyrimidine by another pyrimidine (example T --> C)
Transversion: less common. Replacement of a purine by a pyrimidine or vice versa (A --> C, A --> T, G --> C). Changes the biochemistry and structure of DNA
The transition : transversion ratio varies within the genome and is used to assess GWAS data quality
Across the entire genome it averages around 2 (in stable genomes)
In protein coding regions it is usually higher, around 3, because transversions in the third base of a codon are more likely to change the encoded amino acid
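
The two substitution classes and their ratio can be computed directly from a list of changes. This is a small sketch with an invented SNP list; `substitution_type` and `ti_tv_ratio` are hypothetical helper names.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(ref, alt):
    """Classify a single-base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("reference and alternative base are identical")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def ti_tv_ratio(snps):
    """Ratio of transitions to transversions in a list of (ref, alt) pairs."""
    kinds = [substitution_type(ref, alt) for ref, alt in snps]
    transversions = kinds.count("transversion")
    if transversions == 0:
        return float("inf")
    return kinds.count("transition") / transversions

# Made-up SNP list: 4 transitions and 2 transversions.
snps = [("A", "G"), ("C", "T"), ("T", "C"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(snps))
```

A ratio far from the expected value for the region being studied is a hint that the variant calls contain errors, which is why it is used as a quality check.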
54
Explain how SNPs in Apolipoprotein E can lead to Alzheimer's disease
SNPs are not always absolute indicators of disease
Apolipoprotein E (ApoE) contains 2 SNPs that result in 3 possible alleles for the gene: E2, E3, E4
The protein product of each allele differs by one amino acid
A person who inherits at least one E4 allele is more likely to develop Alzheimer's
A person who inherits at least one E2 allele is less likely to develop Alzheimer's
E3 is neutral / has no effect on inducing disease
Someone who has inherited two E4 alleles may never develop Alzheimer's
Someone who has inherited two E2 alleles may still develop Alzheimer's
Alzheimer's (like heart disease, diabetes, cancer) is caused by variations in several genes
Sporadic Alzheimer's: the ApoE gene alone is not responsible. Multiple genes are needed to develop the disease
55
Where are disease associated SNPs in the non-coding genome located? Give an example of a disease.
Disease associated SNPs are usually in regulatory DNA sequences: enhancers, promoters, TAD boundaries, long non-coding RNAs
There are more than one million enhancers in the human genome
98% of type 2 diabetes associated SNPs are non-coding
Most of the genome is non-coding: 80% of SNPs are found in non-coding regions
56
How can SNPs disrupt splice sites? Give an example
Introns are spliced out by activation of splice sites (found 2-3 nucleotides in front of the exon)
SNPs can affect splice sites and splicing --> affects exons --> affects the protein produced
Example: the OAS1 gene is associated with type 1 diabetes
Intron 6 AG --> AA variant shifts the 3' splice site by 1 nucleotide
This changes the reading frame of exon 7, resulting in a longer protein
At least 10% of all mutations causing human inherited disease disrupt splice site consensus sequences
Can cause total loss of the associated exon
An SNP can also introduce a cryptic splice site
Auxiliary sequences: stimulate splicing and are found in exons (exonic splicing enhancers, ESE) and introns (intronic splicing enhancers, ISE)
SNPs in auxiliary sequences impair splicing
57
What are insertions and deletions (indels) and what do they cause?
Indels: more likely to change the function of a protein than an SNP
90% of variation in the genome is due to SNPs, the rest is indels
Can cause: disrupted start codon, disrupted stop codon, disrupted splice site, frameshift
Frameshift mutation (a base is inserted or deleted): alters the codon and changes the reading frame so that all downstream codons are out of frame
Can produce a protein that is too long or too short, depending on where the next stop codon falls
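
A frameshift is easiest to see by splitting a sequence into codons before and after a one-base insertion. The sequence below is invented for the example, and `codons` is a hypothetical helper.

```python
def codons(seq):
    """Split a sequence into complete codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

original = "ATGGCCTTTAAAGGGTGA"                 # made-up open reading frame
inserted = original[:4] + "C" + original[4:]    # one extra base after position 4

print(codons(original))
print(codons(inserted))
```

In this example every codon from the insertion onwards changes, and the shifted frame happens to contain an early TAA stop codon, so the protein would be cut short.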
58
What is a GWAS and what is it used for?
Genome Wide Association Study
Requires sequencing of thousands of genomes
Example design: 500 people who have the disease and 500 people who don't
Scan for SNPs that are at higher frequency in people with the disease and at lower frequency / not present in people without it
GWAS has been able to identify genetic variations that contribute to risk of: type 2 diabetes, Parkinson's disease, heart disorders, obesity, Crohn's disease, prostate cancer
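
For a single SNP, the case versus control comparison can be sketched as a chi-square test on allele counts. This is an illustration only: the counts are invented (500 cases and 500 controls give 1000 alleles per group), the test choice is an assumption rather than something stated in the notes, and real studies use dedicated software with corrections for confounders.

```python
import math

def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square statistic (1 degree of freedom) for a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: the variant allele is seen 300 times in cases, 200 times in controls.
chi2 = allele_chi_square(300, 700, 200, 800)
print(f"chi2 = {chi2:.2f}, p = {p_value_1df(chi2):.2e}")
```

The function assumes both alleles are observed at least once overall; a column of zeros would make the expected counts zero.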
59
What are the challenges of GWAS?
Accurately identifying the SNPs
Coding mutations alter the amino acid sequence of a protein = effect is clear
Usually SNPs in GWAS are in non-coding regions (since the genome is 98% non-coding), so the effect is less clear
Not all genes are functional
50% of human genes show tissue specific expression
GWAS can identify candidate SNPs but confirmation requires additional work
GWAS is a starting point
60
How are GWAS results visualised?
Use a Manhattan plot to visualise GWAS findings
-log10 of the P-value for each SNP's association (y-axis) vs chromosomal position (x-axis)
The more significant the P-value (the higher the point), the higher the chance the SNP is associated with the disease
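
The y-axis transformation can be sketched without any plotting library. The SNP names and P-values below are invented, and the 5e-8 cut-off is the commonly used genome-wide significance threshold, which is an assumption here because the notes do not state one.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # conventional cut-off (assumed, not from the notes)

def manhattan_height(p_value):
    """Height of a SNP on a Manhattan plot: smaller P-values give taller points."""
    return -math.log10(p_value)

# Hypothetical results: (SNP id, chromosome, P-value).
results = [("rs_example_1", 1, 3e-3), ("rs_example_2", 6, 2e-9), ("rs_example_3", 11, 0.4)]

for snp, chromosome, p in results:
    flag = "significant" if p < GENOME_WIDE_THRESHOLD else "not significant"
    print(f"{snp} chr{chromosome}: height {manhattan_height(p):.2f} ({flag})")
```

The threshold is so strict because hundreds of thousands of SNPs are tested at once, so many small P-values are expected by chance alone.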
61
What was the 100,000 genomes project?
Genomics England project in collaboration with the NHS
Aimed to sequence the genomes of 70,000 people
Participants have a rare disease or cancer
Genomes of family members are also sequenced
Identified variants associated with the disease
Patients can be offered a diagnosis when this wasn't possible before
Problem with the project: diversity is missing, participants are majority white Caucasian
62
How does linkage disequilibrium affect GWAS?
Linkage disequilibrium (LD): non-random association of alleles at two or more loci within a population
During crossing over some parts of the chromosome tend to travel together (bias)
SNPs close to each other on the same chromosome are in LD
Haplotypes don't occur at the expected frequencies - they are not random
If 5 SNPs are associated more frequently than expected and one is the cause of the disease - we don't know which one is causing the disease
Would need to test every single SNP
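
How far a haplotype deviates from its expected frequency is usually summarised by the coefficient D and the squared correlation r². The sketch below uses the standard definitions with invented frequencies; the function and parameter names are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: observed frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B on their own
    """
    d = p_ab - p_a * p_b  # observed minus expected-under-independence
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared

# Independent loci: the haplotype occurs exactly as often as expected.
print(linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5))
# Complete LD: allele A is always inherited together with allele B.
print(linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5))
```

When r² is close to 1, the two SNPs carry almost the same information, which is exactly why a GWAS hit cannot say which of them is causal.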
63
What is expression Quantitative Trait Loci (eQTL)?
Most SNPs are non-coding (98% of DNA is non-coding) so they are found in regulatory elements
Combines GWAS (identifies SNPs in non-coding regions) and RNA-seq (next generation sequencing that measures the amount of mRNA produced by a gene / measures gene expression)
Identifies SNPs in non-coding regions that are responsible for changes in gene expression
These SNPs will most likely be in enhancers or promoters that regulate gene expression
A disease associated SNP which is an eQTL can be responsible for disease
eQTL mapping allows regulated genes to be identified, as the regulated gene is not necessarily the one closest to the disease associated SNP
Cis eQTL: gene affected by the SNP is found on the same chromosome
Trans eQTL: gene affected by the SNP is on a different chromosome
64
What is SNP genotyping?
SNP genotyping: uses microarrays (chips) to identify the presence of SNPs in an individual
The array contains probes for all possible SNP alleles of the gene
Specific SNP = specific study
For an Affy (Affymetrix) SNP array: one probe contains ATTCATG
On the array there will be another probe for the alternative allele: ATTTATG
65
What factors/elements are needed for gene regulation?
Non-coding regions: enhancers and promoters. They act in cis (on the same chromosome as the gene)
Enhancer region: distal to the gene, 10-100 kb away (1 kb = 1000 nucleotides), bound by activators/repressors, acts as a switch
Promoter: proximal to the gene
Genome forms a loop: mediated by DNA-bending proteins and transcription factors
DNA binding proteins (transcription factors): bind enhancer and promoter DNA and activate them (enhancer = switch, TF = flips the switch)
Mediator proteins: recruit RNA polymerase II and start transcription
66
Why do eukaryotic organisms have multiple different cell types?
Eukaryotic organisms have multiple different cell types
Different shapes, different functions, populate different areas of the body
All cells have the same genome but not all genes are expressed in every cell due to gene regulation
67
What are transcription factors? Give an example. When specifically are they important?
Example: HOX genes
DNA binding proteins with a DNA binding domain (zinc-finger, homeodomain). Bind specific DNA sequences (motifs) with high sequence specificity
Non-coding SNPs can be detrimental because one change in the enhancer site can affect TF binding
Can activate or repress gene expression by modulating promoter and enhancer activity
Some TFs are only repressors or only activators but some can do both
Determine whether switches (enhancers/promoters) are ON or OFF
Very important during development: during cell differentiation, genes need to be activated or repressed at a specific time
68
What is the nucleosome?
DNA is condensed in chromatin
Nucleosome: 146 bp of DNA wrapped around 8 histone proteins
Histone octamer: 2 copies of each core histone (H2A, H2B, H3, H4)
Interactions between DNA and histones are sequence independent: hydrogen bonding + ionic interactions with the sugar phosphate backbone
Chromatin is made of repeating units of nucleosomes
Nucleosomes are disassembled during replication
69
What is chromatin?
Chromatin is made of repeating units of nucleosomes
There are 2 types of chromatin
Euchromatin: active form, uncondensed, TFs can bind
Heterochromatin: silent form, condensed/compact, TFs can't bind
70
What are pioneer transcription factors?
Not all TFs bind DNA in the same way
Pioneer factor mechanism
Can bind condensed heterochromatin by recognizing a motif (usually TFs can't bind heterochromatin) and recruit other TFs
ATP hydrolysis by chromatin remodellers (BAF, NuRF, ISWI) makes DNA accessible so other TFs can now bind
Pioneer factors have a dual role
Passive role: permanently bound, which speeds inductive responses
Active role: pioneer factor only binds when needed
71
What are topologically associated domains (TADs)?
Enhancers are cell type specific and can't interact with all promoters
TADs determine which enhancers can interact with which promoters
They are fundamental units of three-dimensional (3D) nuclear organisation
Regions bordering TADs are called TAD boundaries
Enhancers and promoters can only interact within TAD boundaries
A mutation disrupting a TAD boundary = enhancers can activate genes (e.g. PAX3) that are not meant to be activated
72
How do cohesin and CTCF define TADs?
Loop extrusion model
The transcription factor CTCF binds CTCF binding sites
The cohesin ring pulls DNA through itself until 2 CTCF molecules meet and the TAD boundary is closed
The loop extrusion model explains the formation of the TAD
TADs can be very long (880 kb in mice) and have similar sizes in non-mammalian species
Enclosure of genes with their respective enhancers and promoters: brings them closer together and optimizes the use of space
73
Explain phosphorylation of the CTD in RNA polymerase II
RNA polymerase II: transcribes DNA to synthesize mRNA
Pre-initiation complex recruits RNA pol II --> paused RNA pol II --> elongation --> termination
The C-terminal domain (CTD) of RNA pol II is important in transcription
Long domain with many repeats, highly conserved
Phosphorylation of the repeats in the CTD is important in transcription
Stage dependent phosphorylation: different phosphorylation patterns label different stages of transcription
Phosphorylation is species specific but the repeats are conserved
74
Explain promoter proximal pausing of RNA pol II
Widespread genome wide (not only early response genes)
Regulated by many proteins, ex. NELF and DSIF, that block RNA pol II
Important for a quick transition to productive elongation
Keeps genes poised/prepared to be activated when needed
Only a few phosphorylations are needed to activate genes quickly at a specific time
Important in development and for heat shock genes
Transcription is paused after about 50 bp
Ser5 of the CTD is phosphorylated
75
What is epigenetics?
Epigenetics: modifications to DNA and chromatin that regulate gene expression without changing the DNA sequence
Example: DNA and histone methylation
Epigenetic modifications can be inherited
They are reversible and self-perpetuating
76
What is the function of histone H1?
Histone H1: additional histone that keeps the nucleosome together
77
How are histones covalently modified?
A methyl / acetyl group can be added to a histone
Histone methylation is the most common epigenetic modification
This is NOT the same as DNA methylation
Histone methylation can either repress or activate gene expression depending on which residues are methylated
78
How are heterochromatin and euchromatin distinguished in terms of their histone modifications?
Histones carry different modifications in heterochromatin and euchromatin
They act as flags to label the different states of chromatin
Euchromatin has lysine acetylation and arginine methylation
Heterochromatin has lysine methylation and lysine ubiquitination
79
What is constitutive heterochromatin?
Methylation of lysine 9 of histone 3 (H3K9me2/me3)
Up to 3 methyl groups on lysine 9
DNA is always kept as heterochromatin (inactive)
Found in regions that you never want to activate
Example: transposable elements (parasitic elements)
80
What is facultative heterochromatin?
Methylation of lysine 27 of histone 3 (H3K27me2/me3)
Up to 3 methyl groups (trimethylation)
Temporary heterochromatin
Genes that don't need to be expressed at the moment but need to be activated later on
81
What is the H3K27 methylation and what protein deposits it?
H3K27 methylation is deposited by PRC2
PRC2 is a 3-protein complex found in regions that are methylated temporarily (to inactivate gene expression)
EZH2, EED and SUZ12 proteins are always present
EZH2 is the most important: it is the catalytic subunit (enzyme)
Other subunits may also be involved in the complex
82
Explain how histone modifications affect enhancers?
Different types of enhancers (active/repressed/poised) have different types of histone modifications
Active enhancers have different modifications than active promoters
ChIP-seq = used to map active/repressed/poised enhancers and promoters
H3K27 acetylation is common in active enhancers
If this is removed, enhancers still function and genes are still activated
Its exact role is not known: it may be needed to keep enhancers active
Histone modifications are conserved between species
83
How are epigenetics related to cancer?
Epigenetics plays a role in the development of cancers Epigenetic change that silences a tumour suppressor gene (gene that controls growth of cell) can lead to uncontrolled cell growth Change that turns off genes that repair damaged DNA = increase in DNA damage = increases cancer risk
84
What is X-inactivation?
Females have 2 X chromosomes so either the maternal or paternal X chromosome is randomly inactivated/silenced
Occurs in embryonic development around gastrulation in mammals
Occurs after initial cell division so a different X can be inactivated in individual cells/tissues
Once it has occurred in a cell, all its descendants will maintain the same inactivation
Inactive X chromosome (Barr body) is silenced by histone modifications --> DNA condenses as heterochromatin --> transcriptionally inactive
Inactivation carried out by Xist gene - a long non-coding RNA that recruits PRC2, which deposits H3K27me3 to form facultative heterochromatin
85
Explain how X chromosome inactivation leads to tortoiseshell cats
Tortoiseshell cats have a unique pattern of coat colour due to XCI
Almost all tortoiseshell cats are female
Black and orange alleles of fur colour gene are on the X chromosome
If cat is heterozygous, fur colour in each patch depends on which X chromosome is inactivated (random)
If you clone a tortoiseshell cat you won't get the same coat pattern
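The random-then-clonal logic on this card can be sketched in a few lines of Python. This is only an illustrative toy model, not a description of real embryology: the function names, the number of founder cells and the number of divisions are all assumptions made up for the example.

```python
import random

def inactivate(n_founders, rng):
    """Each founder cell randomly silences its maternal ('m') or paternal ('p') X."""
    return [rng.choice("mp") for _ in range(n_founders)]

def grow_patches(founders, divisions):
    """Every descendant keeps its founder's choice, so each founder gives one clonal patch."""
    return [[choice] * (2 ** divisions) for choice in founders]

def patch_colours(founders, colour_on_x):
    """The coat colour of a patch comes from the X that was NOT silenced."""
    active = {"m": "p", "p": "m"}
    return [colour_on_x[active[choice]] for choice in founders]

# Hypothetical heterozygous cat: orange allele on maternal X, black on paternal X.
rng = random.Random()
founders = inactivate(8, rng)
print(patch_colours(founders, {"m": "orange", "p": "black"}))
```

Running it twice usually gives a different list of patch colours, which is the point of the cloning statement: the genotype is the same, but the random choice is made again.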
86
What are agouti mice?
Agouti gene is associated with bodyweight and fur colour
Two mice, mother is light + fat and offspring is dark + skinny, are genetically identical
The difference is the mother was provided with a methyl-rich diet 2 weeks before mating
When mouse agouti gene is unmethylated: yellow coat, obese, prone to diabetes
When agouti gene is methylated: brown coat colour and low disease risk
Methylation leads to repression of the agouti promoter = gene not expressed
87
Explain how epigenetic changes affects twins
Genes in identical twins are identical so differences are due to epigenetic changes Environment and diet can affect epigenetics The older the twins, the more epigenetic changes, the more different Can label histone modifications on fluorescent probes Chromosome pairs in each set of twins are superimposed One twin's epigenetic tags are dyed red, the other's green When red and green overlap, the region shows up as yellow (same epigenetic changes)
88
Explain how epigenetic changes were inherited during the Dutch famine
Epigenetic changes can be inherited During the Dutch famine, diet was poor in methyl groups People who were then conceived had less methyl groups on insulin-like growth factor II (IGF2) Long term effect: children suffered from obesity and cardiovascular diseases F2 generation had higher weights and BMI in offspring of exposed F1 fathers than in unexposed F1
89
Explain the erasing of methylation during fertilisation
DNA of sperm is highly methylated and eggs also methylated but less Once egg is fertilized, most of methylation is erased, especially from paternal genome Methylations are converted to hydroxymethylation which is diluted out as cells divide As embryo develops, methylation is lost further from maternal genome up to blastocyst stage Not all methylation is erased so inheritance is possible After this stage, cells differentiate and DNA becomes methylated again to produce specialized cells
90
How is methylation inherited?
Methylation patterns are usually erased in primordial germ cells Methylations are converted to hydroxymethylation which is diluted out as cells divide Some residual DNA and Histone methylations persist in the fertilized egg that signal how to remethylate once cell division starts
91
What is imprinting?
Inherit 2 working copies of a gene, one maternal and one paternal Imprinted genes: only inherit one working copy that is either maternal or paternal The other copy is silenced by epigenetics (permanently methylated) Epigenetics tags on imprinted genes stay for the life of the organism 80 imprinted genes in humans and mice (out of 22,000) Imprinted gene is at an increased risk of disease because only one working copy is present
92
What is the genetic conflict hypothesis?
Only a hypothesis
Males can father multiple offspring with multiple partners at the same time with low cost of personal resources
Females can only produce one set of offspring at a time and it is very resource costly
Costs are greater for the mother than the father (mother has to carry the baby)
Father wants offspring to be big to ensure survival
Mother needs to balance big offspring with costs to herself so wants to limit size
Imprinted genes are involved in growth and metabolism
93
How does epigenetics regulate processes in plants?
Flowering and colour is regulated by epigenetics in plants Flowering is controlled by genes affected by environmental conditions (temperature, humidity, light) that change epigenetics Ensures production of flowers even when plants are growing under adverse conditions
94
What is DNA methylation and where does it occur?
Histone and DNA methylation are different
DNA methylation is an epigenetic modification
It is reversible
Methylation of position 5 of cytosine by methyltransferases
In mammals: occurs at CpG sites, which cluster in CpG islands (in GC rich regions)
Methylation of CpG islands represses gene expression: forms compacted chromatin which prevents binding of TFs and represses transcription
Learn to draw image in notes
95
How can DNA methylation be used to identify promoter regions?
Promoters are rich in GCs so CpG islands can be used to identify promoter regions
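A rough sketch of how a sequence could be screened for CpG-island-like composition is given below. The thresholds (at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) are commonly quoted heuristics and are an assumption here, not something stated on these cards; the function names are made up for illustration.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")                 # number of CpG dinucleotides
    expected = (c * g) / n if n else 0.0       # expected count if C and G were independent
    gc_fraction = (c + g) / n if n else 0.0
    ratio = observed / expected if expected else 0.0
    return gc_fraction, ratio

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed heuristic thresholds to one candidate region."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio > min_ratio

print(looks_like_cpg_island("CG" * 150))   # artificial CpG-dense region
print(looks_like_cpg_island("AT" * 150))   # artificial AT-rich region
```

In practice a genome would be scanned with a sliding window and the flagged windows treated as candidate promoter regions to check further.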
96
What are the 3 types of methyltransferases in DNA methylation?
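3 types of methyltransferases in mammals:
DNMT3a and DNMT3b: de novo methyltransferases that add new methylation in normal conditions
DNMT1: maintenance methyltransferase that works during DNA replication
Replication creates hemimethylated DNA, where the newly copied strand is unmethylated
DNMT1 reproduces the methylation of the parental strand on the new strand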
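The hemimethylated-DNA step can be pictured with a toy model in which each strand is just the set of CpG positions carrying a methyl group. This is a minimal sketch under that simplifying assumption; the function names are hypothetical and no real enzyme kinetics are modelled.

```python
def replicate(parent_marks):
    """Copying the DNA: the old strand keeps its marks, the new strand starts with none."""
    old_strand = set(parent_marks)
    new_strand = set()
    return old_strand, new_strand

def maintain(old_strand, new_strand):
    """Maintenance step: every mark on the old strand is reproduced on the new strand."""
    return old_strand, new_strand | old_strand

old, new = replicate({3, 10, 42})
print(old, new)            # hemimethylated: only the old strand is marked
old, new = maintain(old, new)
print(old == new)          # pattern restored on both strands
```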
97
How is loss of DNA methylation related to disease?
DNA methylation patterns in disease tissues are different to those in normal tissues
Permanent loss of methylation in: cancer, Alzheimer's, neurodegenerative diseases
Loss of methylation = activating promoters = activating genes that are meant to be repressed ex. oncogenes
Abnormal methylation silences tumour suppressor genes
98
In which genomic regions does DNA methylation occur?
Promoter regions - loss of methylation leads to disease
Transposable elements are methylated as they need to be repressed via histone and DNA methylation
Intergenic regions - usually methylated ex. enhancers
Repetitive elements - usually methylated
Gene upstream regions - usually unmethylated
99
What is the effect of DNA methylation at splice sites
Methylation in a splice site deactivates it (silencing) Genes have different isoforms
100
What is the effect of DNA methylation at a promoter region
Methylation of promoter deactivates it (silencing) and prevents transcription
Genes have more than one promoter to be able to produce proteins of different lengths with different functions
Main promoter is unmethylated and other promoters are methylated
Aberrant promoter = created by the insertion of repetitive elements, so the cell wants to repress it
If both promoters bind RNA pol II they could collide
101
What is the effect of DNA methylation at repetitive elements (transposons)
Transposable elements are highly mutagenic so methylation protects genome from them Methylation prevents recombination and translocation Methylated C mutates to T over time so prevents transposition
102
How is DNA methylation linked to cancer? Transposable elements
Genomic instability: unmethylated transposable elements can move in the genome and disrupt existing genes
Disruption/deactivation of a tumour suppressor gene (ex. p53) leads to uncontrolled cell growth/cancer
Can also lead to epithelial-mesenchymal transition: immobile cell converts to mobile cell, which causes metastasis
Transposable elements can land inside a tumour suppressor gene OR in its promoter or enhancer
Hypomethylation at intergenic regions, repeats and transposable elements causes genomic instability and is found in all cancers
103
How can DNA methylation be identified using MeDIP-Seq?
Methylated DNA immunoprecipitation
DNA of interest (disease and normal) is fragmented by sonication (sound waves) and denatured
Separate non-methylated from methylated DNA by using an antibody that binds to the methylated cytosine (5-methylcytosine)
Antibody in solution is bound to magnetic beads so a magnet can be used to isolate methylated DNA
Separate by immunoprecipitation
Isolated fraction (methylated) is sequenced by next generation sequencing
Sequences mapped back onto reference genome by alignment software
Example: identifying breast cancer
104
How can DNA methylation be identified using Bisulphite sequencing?
Sample is treated with bisulphite
Converts unmethylated cytosine to uracil but 5-methylcytosine is unaffected (methylated can't be converted)
After PCR amplification, uracil is read as thymine, so unmethylated cytosine appears as thymine and methylated cytosine still appears as cytosine
Treated samples are sequenced and compared/mapped to genome to determine methylation in cancer vs normal cells
Higher cost, greater resolution
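The read-out logic of bisulphite sequencing can be sketched on a toy sequence. The sketch assumes a single strand, perfect conversion, and that uracil is read as thymine once the treated DNA is amplified and sequenced; the function names and the example sequence are made up for illustration.

```python
def bisulphite_convert(seq, methylated_positions):
    """Unmethylated C becomes U; methylated C (5-methylcytosine) stays C."""
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

def amplify(converted):
    """Assumed amplification step: U is copied and read as T."""
    return converted.replace("U", "T")

def call_methylation(reference, read):
    """At each reference C: a C in the read means methylated, a T means unmethylated."""
    return {i: read[i] == "C" for i, base in enumerate(reference) if base == "C"}

reference = "ACGTCGA"
read = amplify(bisulphite_convert(reference, methylated_positions={1}))
print(read)                               # ACGTTGA
print(call_methylation(reference, read))  # {1: True, 4: False}
```

Comparing the calls from a tumour sample with those from normal tissue, position by position, is what gives the single-base resolution mentioned on the card.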
105
What are long non coding RNA sequences and what is their function?
Non-coding RNA is a sequence that does NOT have a start codon, a reading frame or a stop codon
Are transcribed and polyadenylated, but don't code for a protein
ncRNAs longer than 200 nucleotides are lncRNAs
Can help regulate gene expression, target different aspects of gene transcription mechanism
Co-regulators to modify transcription factor activity
Can help stabilize TADs
106
What is Xist?
Xist is a long non-coding RNA, 17Kb long (17,000 nucleotides) Xist is randomly expressed on one of the two X chromosomes Xist works in Cis (functions on same chromosome on which it is expressed) Xist RNA coats the inactive X chromosome Expression of Xist is the first detectable event in X inactivation Xist contains 6 repeats (RepA, RepB, …) RepA is required for the silencing function of Xist Rep A binds Xist to histone methyltransferase complex Polycomb Repressive Complex 2 Protein complex with many subunits Deposits histone methylation (H3K27me3) along the chromosome
107
What is HOTAIR?
HOTAIR: Regulator of HOX transcription factors
Works in Trans: expressed from HOXC locus on chromosome 12 but represses HOXD locus on chromosome 2
Binds to PRC2 and LSD1 (H3K4me3 demethylase) so acts as a scaffold
PRC2 adds repressive H3K27me
LSD1 removes active H3K4me
Usually methylation is repressive except lysine 4 which is an active methylation
Combined function produces repressive chromatin structure to repress HOX genes when they are not needed
HOX genes need to be expressed at specific time points in specific regions in the embryo
In cancer: HOTAIR acts on regions other than HOXD (represses regions that are not meant to be repressed)
108
How are long non coding RNAs correlated with disease?
Abnormal activity of lncRNAs is often associated with disease mechanisms
Aberrantly active lncRNA can mis-regulate genomic loci in trans and in cis
Example: abnormal neuronal death in Alzheimer's
Aberrant activity of lncRNA MEG3 triggers necroptosis which induces neuron death
MALAT-1 and NEAT-1 in ALS and Huntington disease. In ALS they affect neuronal structure and function of RNA binding proteins
MALAT-1 and HOTAIR in Parkinson's
FMR4, 5, 6 in fragile X syndrome
109
What are transposable elements and how much of the genome do they compose?
Transposable elements and other repeats account for 50% of human genome Parasitic elements that want to propagate Move from one location in the genome to another
110
What are the sources of transposons?
Retroviruses inserted back into the host genome Old sequences that integrate back into the genome Degenerated ribosomal RNA
111
What are class 1 retrotransposons?
Copy and paste mechanism to propagate
Transcribe DNA sequence using RNA polymerase from host
Use reverse transcriptase to convert RNA back to DNA
Reverse transcriptase is encoded by the DNA of the TE
Use an integrase or endonuclease to cut genome at a different location and insert the new DNA copy of the transposable element
112
What are the 4 types of retrotransposons?
Autonomous retrotransposons: LINEs (long interspersed nuclear elements) and ERVs (endogenous retroviruses)
Autonomous because they have ORFs that encode the enzymes they need
Non-autonomous retrotransposons: SINEs (short interspersed nuclear elements, ex. Alu elements) and SVAs don't have the enzymes
Only function when there is an active LINE that is producing the retrotransposition enzymes
112
What are class 2 DNA transposons?
Cut and paste mechanism
Excision of DNA element out of genome
Use transposase enzyme that cuts DNA out of region and inserts it in a different region
Integration into target DNA
Don't multiply but can still move in the genome
Autonomous
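The difference between the two classes comes down to what happens to the copy number, which a toy model makes explicit. Here a genome is reduced to a set of insertion positions; the positions and function names are hypothetical and chosen only for illustration.

```python
def copy_and_paste(insertions, new_site):
    """Class 1 (retrotransposon): the original stays in place and a new copy is added."""
    return insertions | {new_site}

def cut_and_paste(insertions, old_site, new_site):
    """Class 2 (DNA transposon): the element is excised from old_site and lands at new_site."""
    return (insertions - {old_site}) | {new_site}

start = {100}
print(len(copy_and_paste(start, 500)))      # 2 copies: the element has multiplied
print(len(cut_and_paste(start, 100, 500)))  # 1 copy: the element has only moved
```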
113
Explain the activity of transposons in the human genome over time and how many transposons are still active
Genome represses TEs but sometimes they escape methylation = dangerous
So most TEs lose their ability to transpose over time
By accumulation of mutations in the ORFs that encode the proteins that regulate the transposition
Currently only a few LINEs, Alus and SVAs are still active in the human genome
Not a single ERV or DNA transposon is still active
20% of genome is LINEs
114
How are transposable elements silenced?
TE's are primarily parasites and must be silenced By DNA methylation or Histone methylation (primarily H3K9me3) Different types of small/short non-coding RNAs (especially pi-RNAs) piRNAs target transposable element for degradation KRAB-ZINC-FINGER proteins - TF's that bind transposons and recruit methyltransferases
115
How are timing and context of transposition important?
Transposon in somatic cell ex. skin cell --> somatic polymorphism --> not inherited Can lead to cancer ex. if it lands in a tumour suppressor gene Transposon in germline cell ex. sperm --> germline polymorphism --> inherited More detrimental as it is inherited by every cell
116
How is mutualism present in transposable elements and what are the 4 mechanisms?
Some TEs present interesting features which make them functionally useful for the host genomes so are not repressed / methylated
Transposons are domesticated / recruited / co-opted
Mutualistic relationship
4 mechanisms of co-option to keep transposons unrepressed
1. TE-derived promoters and enhancers
2. TEs act as TAD boundaries
3. TE-derived lncRNA
4. Transposase transcription factor fusions
117
Explain TE-derived promoters and enhancers by co-option/mutualism
Transposons can be co-opted as functional promoters / enhancers
Transposon moves into enhancer or promoter by chance and contains a sequence recognized by a TF which boosts expression of the gene
Many examples of TEs co-opted as functional promoters and enhancers, example SVAs
118
Explain SVAs as an example of co-option of TE as promoters/enhancers
SVAs (transposons) are important in the development of human brain
Most SVAs are repressed except in the hippocampus
Hippocampus is the part of the brain responsible for memory, communication, learning, navigation AND neurogenesis (makes new neurons)
Human hippocampus is bigger than expected
Differentiation: Stem cell --> intermediate neuronal progenitors (INPs) --> neuron
Human hippocampus is bigger because INPs proliferate more than in other species
INPs express TBR2 transcription factor
Human specific SVAs have TBR2 binding sites so are co-opted as TBR2-modulated enhancers in human INPs
119
What are SVAs?
SVAs make up 0.3% of the human genome
Youngest group of transposons so can still move in genome because have not accumulated enough mutations
Hominid specific (found in ancestors of the great apes)
Made of 2 SINE-type elements (SINE-R and Alu) with a variable number of tandem repeats (VNTR) between them
Divided into subgroups (A, B, C, D, E, F)
3000 SVAs in humans: E and F subgroups are human specific
120
How can transposable elements act as TAD boundaries?
Most CTCF binding sites come from SINEs and ERVs
121
How are lncRNAs derived from transposable elements?
Example LINE produces lncRNA important for cortical development Developmentally timed expression of TE-derived lnc-RNAs
122
How do transposable elements lead to transcription factor fusions?
Transcription factors need a DNA binding domain Transposases have DNA binding domains (as they need to bind, cut, insert DNA in new location) Transposase domain of transposon lands next to ancestral gene = fusion to get a 2-domain gene with DNA binding abilities Example: PAX family of TF's evolved from fusion. PAX6 produces eyes
123
Explain the tail loss evolution in humans and apes
Tail-loss evolution in humans and apes
Evolutionary advantage of not having a tail in humans
All primates have an Alu element in the TBXT gene
Only apes (no tail) have another Alu element (random insertion)
During splicing, the two Alu elements pair up so exon 6 gets trapped in a loop and the mature RNA does not contain exon 6
When exon 6 is removed in a mouse, the mouse will have no tail
Completely random insertion/coincidence of Alu element caused humans to have no tail
124
What are some examples of TE derived traits?
Amylase in saliva in hominoids - insertion of ERV Prolactin in endometrium (MER39) Corticotropin releasing hormone and platencin in the placenta (THE1B)