Eukaryotic genomes and their evolution Flashcards
(125 cards)
How big is the human genome?
Genome=Set of genetic material (DNA) present in a cell or organism
Also includes non-coding sections
3 billion base pairs; a diploid cell contains about 2 metres of DNA
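A quick check of the length figure. This is rough arithmetic only, assuming the textbook B-DNA spacing of about 0.34 nm per base pair (a value not given on the card) and that the 2 metres refers to the diploid DNA content of one cell:

```python
# RISE_PER_BP_NM is an assumed textbook value for B-DNA, not taken from the card.
RISE_PER_BP_NM = 0.34
HAPLOID_BP = 3_000_000_000  # 3 billion base pairs, from the card


def dna_length_m(base_pairs: int) -> float:
    """Contour length in metres of a stretch of double-stranded DNA."""
    return base_pairs * RISE_PER_BP_NM * 1e-9  # nm -> m


haploid_m = dna_length_m(HAPLOID_BP)       # one copy of the genome
diploid_m = dna_length_m(2 * HAPLOID_BP)   # two copies, as in a diploid cell
print(f"haploid: {haploid_m:.2f} m, diploid: {diploid_m:.2f} m")
```

Under these assumptions one genome copy is roughly 1 m long, so the 2 m figure matches the two copies in a diploid cell.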
Explain variability in genomes between humans and bacteria
A lot of variability in composition and size of genomes between species
The human genome is mostly non-coding DNA
Bacterial genomes are mostly coding DNA
What is the C-value paradox?
Genome size does not correlate with organismal complexity
Example: an amoeba has 600 billion base pairs but a human has only 3 billion base pairs
The amoeba is less complex but has a much larger genome than humans
Gene number is also highly variable
A worm has roughly the same number of genes as a human but is much simpler
What is the composition of the human genome?
1% of human genome consists of exons (coding DNA that makes proteins)
24% is introns
Exons comprise 5% of each gene, so genes (exons + introns) comprise 25% of the genome
Human genome has 20,000 genes
Repetitive DNA (transposable elements) (<50%)
Regulatory elements (introns/other intergenic DNA): switches that activate/deactivate genes
The non-coding genome contains about 1 million enhancers
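The percentages on this card can be cross-checked with simple arithmetic. The sketch below uses only the card's own rounded figures; the mean gene length it derives is a back-of-envelope consequence of those figures, not a measured value:

```python
GENOME_BP = 3_000_000_000      # from the genome-size card
GENE_FRACTION = 0.25           # genes (exons + introns) as a share of the genome
EXON_FRACTION_OF_GENE = 0.05   # exons as a share of each gene
GENE_COUNT = 20_000

exon_fraction = GENE_FRACTION * EXON_FRACTION_OF_GENE    # share of genome that is exon
intron_fraction = GENE_FRACTION - exon_fraction          # remainder of genic DNA
mean_gene_bp = GENOME_BP * GENE_FRACTION / GENE_COUNT    # implied average gene length

print(f"exons:   {exon_fraction:.2%} of the genome")
print(f"introns: {intron_fraction:.2%} of the genome")
print(f"implied mean gene length: {mean_gene_bp:,.0f} bp")
```

The exon share comes out slightly above 1% and the intron share slightly below 24%, which is consistent with the rounded numbers on the card.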
What is gene duplication and what does it lead to?
Gene duplication is how new genes evolve
38% of human genes are derived from gene duplication
Gene duplication leads to gene families (paralogous genes)
Paralogues are sister genes that share a common ancestor
Very similar sequence
Found in the same genome
Can be found on different or the same chromosome
May be clustered together or dispersed through the genome with diverse function
Some copies degenerate into pseudogenes: they come from the same ancestor but have lost their function
Paralogous vs Orthologous genes
Paralogous: 2 sister genes or gene clusters in the same organism; they arise from gene duplication, are structurally similar, and come from a common ancestor but have diverged since
Orthologous: the same gene found in 2 different genomes with the same function. Example: humans and chimpanzees both have a specific gene (usually with the same name)
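The distinction on this card can be written as a small decision rule. This is a simplified sketch: the `Gene` class and its `family` label are hypothetical, and the rule ignores the case of copies that were duplicated before two species split. The globin genes are used only as a familiar example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    species: str
    family: str  # hypothetical label for the ancestral gene this copy descends from


def relationship(a: Gene, b: Gene) -> str:
    """Classify two genes using the flashcard definitions (simplified)."""
    if a.family != b.family:
        return "not homologous"
    if a.species == b.species:
        return "paralogous"    # sister genes within one genome
    return "orthologous"       # the same gene in two different genomes


human_hba1 = Gene("HBA1", "human", "globin")
human_hbb = Gene("HBB", "human", "globin")
chimp_hbb = Gene("HBB", "chimpanzee", "globin")

print(relationship(human_hba1, human_hbb))  # same genome, same family
print(relationship(human_hbb, chimp_hbb))   # different genomes, same family
```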
Why is gene duplication rewarded by evolution?
More protein production
If gene doesn’t work anymore, sister gene can produce a similar protein
What is synteny?
Pieces of genome/chromosomal regions of different species where homologous genes occur in the same order
Come from the same ancestor
Example: between the mouse and human genomes, most functional genes lie in syntenic regions
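One way to picture synteny is as runs of genes that sit next to each other, in the same order, in two genomes. The sketch below is a toy illustration under strong assumptions: each gene appears once per genome, gene names are already matched between species, and inverted blocks are ignored. The gene labels are made up.

```python
def syntenic_blocks(order_a, order_b, min_len=2):
    """Return maximal runs of genes that are adjacent and in the same order in both lists."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}  # assumes unique gene names
    blocks, current = [], []
    for gene in order_a:
        extends = (
            gene in pos_b
            and current
            and pos_b[gene] == pos_b[current[-1]] + 1
        )
        if extends:
            current.append(gene)
        else:
            if len(current) >= min_len:
                blocks.append(current)
            current = [gene] if gene in pos_b else []
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical gene orders along one chromosome of each species.
species_1 = ["A", "B", "C", "X", "D", "E"]
species_2 = ["D", "E", "Q", "A", "B", "C"]
print(syntenic_blocks(species_1, species_2))
```

Here the two genomes share two blocks whose internal gene order is preserved even though the blocks themselves have moved relative to each other.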
Explain the two different approaches to the human genome project
Public (Watson/Collins) approach: planned to sequence the genome in 15 years at a cost of 3 billion dollars
Celera Genomics aimed to sequence it in 3 years for about 300 million dollars, using shotgun sequencing
The HGP was declared complete in 2003
But about 8% of the genome was left unsequenced, mostly in heterochromatin
Next-generation sequencing techniques (e.g. Illumina) now sequence quickly and cheaply
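The idea behind shotgun sequencing is to break the DNA into many short overlapping reads and reassemble them by their overlaps. The following is a toy greedy assembler to illustrate that idea only; it is not the method Celera used, and it ignores sequencing errors, repeats, and reverse-complement reads. The sequences are invented.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; leave separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


fragments = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]  # made-up reads
print(greedy_assemble(fragments))
```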
How are genomic elements conserved among species? How can we use bioinformatics?
Conservation between species varies depending on what we are looking at: coding genes, enhancers/promoters, transcription factor binding sites
Bioinformatics: Uses sequence alignment tools to study conservation of the genome
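A common summary of conservation is percent identity between two aligned sequences. The sketch below assumes the alignment has already been produced by some alignment tool and that gaps are written as `-`; the function name and the example sequences are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of matching positions, ignoring columns that contain a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up aligned fragments from two species.
print(round(percent_identity("ATGC-TA", "ATGAGTA"), 1))
```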
How are coding genes conserved between species?
Sequence conservation predicts conservation of function
Orthologues are the most likely to retain the common ancestral function
80% of human genes are found in mice
So a specific disease gene can be expressed in mice to study its effect
Mice are therefore used as model organisms
How are regulatory elements conserved between species (transcription factors)?
The rule that sequence conservation predicts function does not apply to cis-regulatory elements
Transcription factor binding preferences and binding sites are conserved
But only a small fraction of transcription factor binding events is conserved among species
How are enhancers conserved across species?
Enhancers with conserved sequences across species are NOT equally functional
Most enhancers are not functional across species
80% of human genes are conserved in mice
But humans and mice have different enhancers that regulate which genes are expressed
So function differently even though genes are the same
This also applies to primates
Humans and chimpanzees are 98% genetically similar but have different enhancers
Compare genome similarity of Humans vs Chimps
About 1% sequence divergence between shared genes (98-99% identical)
6% of genes are not shared between humans and chimps
Large amount of loss and gain of genes since evolutionary split
Human chromosome 2 is the result of the fusion of two ancestral chromosomes that remain separate in chimps as 2A and 2B
Humans have lost many olfactory genes (humans don’t need to smell as much)
What are molecular clocks?
LUCA (last universal common ancestor of all living organisms) lived about 3.8 billion years ago
We know due to molecular clocks
Uses fossils and rate of mutations to deduce when a species diverged
Nucleotide or amino acid sequences are compared among species to date when they last shared a common ancestor
Rate of mutation assumed to be constant
Rate may differ from gene to gene
Genes that are responsible for basic functions mutate more slowly
Mitochondria originated through endosymbiosis: a bacterium was incorporated into the cell and retained because of its essential function
The mitochondrial genome is used to measure mutation rate because its mutation rate is roughly constant
Most of its mutations are effectively neutral
Circular DNA with only a few genes
Inherit mitochondrial DNA from the mother without recombination
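The dating logic of a molecular clock can be sketched as: count the differences between two aligned sequences, then divide by twice the substitution rate, because both lineages accumulate mutations after the split. The rate used below is a made-up placeholder, and the sketch assumes a constant rate with no correction for repeated mutations at the same site.

```python
def divergence_time_years(seq_a: str, seq_b: str, subs_per_site_per_year: float) -> float:
    """Estimate time since two sequences shared an ancestor, assuming a constant clock."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    differences = sum(x != y for x, y in zip(seq_a, seq_b))
    d = differences / len(seq_a)                 # differences per site
    return d / (2 * subs_per_site_per_year)      # two lineages diverge in parallel


# Made-up data: 2 differences over 100 sites, hypothetical rate of 1e-8 per site per year.
seq_1 = "A" * 100
seq_2 = "A" * 98 + "GG"
HYPOTHETICAL_RATE = 1e-8
print(f"{divergence_time_years(seq_1, seq_2, HYPOTHETICAL_RATE):,.0f} years")
```

In practice the rate is calibrated against fossils of known age, as the card notes, and differs from gene to gene.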
What are mitochondrial haplogroups?
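Haplogroup: a group of people who share specific mutations in their mitochondrial DNA, inherited from a common maternal ancestor
The most recent common maternal ancestor of all living humans (mitochondrial Eve) lived about 200,000 years ago in Africa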
Supports out of Africa hypothesis
How does a genome acquire new genes?
Horizontal gene transfer
Exon shuffling
Duplication and divergence - this is very rare (1% chance for 1 gene in 1 million years)
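The rate on this card can be turned into rough genome-wide numbers. The sketch below combines it with the 20,000-gene figure from the composition card and assumes every gene duplicates independently at the same constant rate, which is a simplification.

```python
RATE_PER_GENE_PER_MY = 0.01   # 1% chance per gene per million years, from the card
GENE_COUNT = 20_000           # human gene count, from the composition card

# Expected number of duplication events across the whole genome per million years.
expected_per_my = GENE_COUNT * RATE_PER_GENE_PER_MY


def prob_duplicated_at_least_once(million_years: float) -> float:
    """Chance that one given gene has duplicated at least once in the period."""
    return 1 - (1 - RATE_PER_GENE_PER_MY) ** million_years


print(f"expected duplications per million years: {expected_per_my:.0f}")
print(f"P(one gene duplicated within 100 My): {prob_duplicated_at_least_once(100):.2f}")
```

So an event that is very rare for any single gene still happens regularly somewhere in the genome on evolutionary timescales.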
What are the 3 different outcomes of gene duplication?
Duplication of one gene leads to 2 similar genes
Selective pressure on both genes: genes stay similar (More genes = more proteins)
Selective pressure on just one of the genes: one copy degrades (Accumulates mutations and generates pseudogenes)
Selective pressure on just one of the genes: one copy acquires a new function (Gene is important but can tolerate a new function. sub-functionalization: new copy of gene is slightly different = specialization)
How does gene duplication occur during DNA replication/meiosis?
Gene duplication can occur during chromosomal recombination (crossing over)
Crossing over occurs during meiosis and leads to new combinations of alleles
Error in chromatid pairing leads to duplication of regions
Gene duplication can also occur during DNA replication through DNA polymerase slippage
DNA replication occurs via DNA polymerase
Ex. 15 CA repeats originally
Polymerase pauses in CA repeat domain
Newly formed strand melts and reanneals incorrectly (slipping)
Mutation is repaired incorrectly = duplication
Ex. now 17 CA repeats
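The slippage example above (15 CA repeats becoming 17) can be mimicked with a toy model. It only tracks repeat units, not real base pairing or strand complementarity, and the pause position in the middle of the tract is an arbitrary assumption.

```python
def replicate_with_slippage(unit: str, template_repeats: int, slipped_units: int) -> str:
    """Copy a repeat tract; after one pause the new strand re-anneals
    slipped_units upstream, so those template units are copied twice."""
    pause_at = template_repeats // 2  # hypothetical pause site mid-tract
    if not 0 <= slipped_units <= pause_at:
        raise ValueError("slipped_units must be between 0 and the pause position")
    copied_units = []
    template_pos = 0
    has_slipped = False
    while template_pos < template_repeats:
        copied_units.append(unit)
        template_pos += 1
        if not has_slipped and template_pos == pause_at:
            template_pos -= slipped_units  # polymerase resumes on units already copied
            has_slipped = True
    return "".join(copied_units)


new_tract = replicate_with_slippage("CA", template_repeats=15, slipped_units=2)
print(len(new_tract) // len("CA"), "CA repeats in the new strand")
```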
What is Neo (sub) functionalization? Give an example
After gene duplication, two genes with identical function are unlikely to be maintained in the genome
Each daughter gene adopts a part of the function of the parental gene
Changes occur in expression pattern of two genes
Gains mutations
Leads to genes having similar but not identical functions (specialization)
Genes are expressed at different times and in different cell types
Example: trypsin vs chymotrypsin
Duplicated 1500 million years ago
Proteases
Trypsin: cuts at arginine and lysine
Chymotrypsin: cuts at phenylalanine, tryptophan and tyrosine
Example: transcription factor families (SOX genes)
Many paralogues of SOX with similar functions
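The trypsin versus chymotrypsin example shows two paralogues doing the same job with different specificity. The sketch below applies the residue preferences from the card to a made-up peptide, assuming both enzymes cut on the C-terminal side of the target residue and ignoring any finer cleavage rules.

```python
TRYPSIN_SITES = "KR"        # lysine, arginine
CHYMOTRYPSIN_SITES = "FWY"  # phenylalanine, tryptophan, tyrosine


def cleave(peptide: str, target_residues: str) -> list:
    """Split a peptide after every residue found in target_residues."""
    fragments = []
    start = 0
    for i, residue in enumerate(peptide):
        if residue in target_residues:
            fragments.append(peptide[start:i + 1])
            start = i + 1
    if start < len(peptide):
        fragments.append(peptide[start:])  # trailing piece with no cut site
    return fragments


peptide = "AKGFRWLYE"  # invented sequence in one-letter amino acid code
print("trypsin:     ", cleave(peptide, TRYPSIN_SITES))
print("chymotrypsin:", cleave(peptide, CHYMOTRYPSIN_SITES))
```

The same cutting function with a different site list gives a different set of fragments, which mirrors how the two duplicated proteases specialised.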
What are pseudogenes?
Pseudogenes: after a gene duplicates, one copy degrades completely and loses its function
Occurs in the first million years after duplication if the gene is not under selection
Gene duplication generates functional redundancy
Not advantageous to keep identical copies of the same gene
Mutations disrupting structure and function are not deleterious
Accumulate until gene becomes non-functional pseudogene
Time frame = 4 million years
Pseudogenes can still be transcribed to mRNA but will not produce a functional protein
What are the non-processed pseudogenes?
Tandem duplication of genomic region (from a normal duplication event)
1 copy faces lack of selection
Inactivating mutations or incomplete duplication
Missing regulatory regions
What are processed pseudogenes?
Reverse transcriptase activity (LINE, retrovirus, transposons): parasitic elements with a copy paste mechanism
Gene is transcribed to RNA
RNA is reverse transcribed to cDNA and re-integrated into the genome
Lack of regulatory regions/introns (mRNA source) = non functional
Contain polyA tail/flanking repeats (responsible for transcription termination)
Can integrate into the same or different chromosome
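The copy-paste route described on this card can be sketched in a few steps. This toy model stays in the DNA alphabet throughout (it skips the RNA and reverse-transcription chemistry), and the gene parts and sequences are invented. It only shows why the inserted copy ends up without a promoter or introns but with a poly-A tract.

```python
def processed_copy(gene_parts, poly_a_len=10):
    """Build the spliced, poly-A tailed copy that gets reverse transcribed.

    gene_parts: list of (kind, sequence) with kind in {'promoter', 'exon', 'intron'}.
    """
    # Transcription starts after the promoter and splicing removes introns,
    # so only the exons survive into the mature transcript.
    spliced = "".join(seq for kind, seq in gene_parts if kind == "exon")
    return spliced + "A" * poly_a_len


def integrate(genome: str, insert: str, position: int) -> str:
    """Paste the copy back into the genome at an arbitrary position."""
    return genome[:position] + insert + genome[position:]


# Hypothetical parent gene.
parent_gene = [
    ("promoter", "TATAAA"),
    ("exon", "ATGGCC"),
    ("intron", "GTAAGT"),
    ("exon", "TTTTGA"),
]
copy = processed_copy(parent_gene)
print(copy)
print(integrate("CCCCGGGG", copy, position=4))
```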
What are ribosomal protein pseudogenes in humans and how are they conserved across primates?
20,000 human pseudogenes in genome
Many are ribosomal protein pseudogenes
Large family (2000 copies)
Processed pseudogenes
Formed by a specific L1 retrotransposon
Highly transcribed / high expression rate
Highly conserved across primates
2/3rds human RP pseudogenes also in chimpanzee genome
<12 shared with rodents
Implies recent origin