Flashcards in The Human Genome Deck (31):
What are the goals of the human genome project?
Stated goals were achieved ahead of schedule and were:
To identify all genes in human DNA
To determine the sequence of the 3 billion chemical base pairs that make up human DNA (estimated to take 15 years), starting with euchromatin
Develop tools for meta analysis
Post-genome studies that use the info:
Comparative studies of different species' genomes
Variation between individuals
All the strategies used to sequence the human genome relied on a shotgun method for sequence assembly. How does this method work?
DNA is fragmented into pieces of about 500 bp. Each fragment is sequenced individually, and the overall sequence is then reconstructed by finding the overlaps between fragments. Markers are used to align the assembled sequences to specific regions of the genome. The location of the markers is determined by classic linkage analysis.
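The overlap step in this answer can be sketched in a few lines. This is a toy greedy assembler, not the software used by any genome project; the function names `overlap` and `greedy_assemble`, the `min_len` cutoff, and the example fragments are all made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap; return the contigs."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more: the rest stay as separate contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three overlapping toy fragments (real fragments would be hundreds of bp long).
fragments = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
contigs = greedy_assemble(fragments)
print(contigs)
```

With these fragments the sketch recovers a single 15-base contig. A repeated stretch longer than a fragment would give several equally good overlaps, which is why the approach struggles with repetitive genomes.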
What problems arise from the shotgun method?
Works well for small genomes but not for large or highly repetitive ones, because repeated sequences produce ambiguous overlaps between fragments
In the shotgun approach, markers on the genome allow the assembled sequences to be aligned to specific regions of the genome. How is the location of the markers determined?
By classic linkage analysis in family pedigrees
What were the first markers to be used? What is the limitation with these?
Restriction Fragment Length Polymorphisms (RFLPs)- polymorphic sites in which a single base difference either creates or destroys a restriction enzyme site. They are limited because not many are known and because they are not very polymorphic (i.e. the restriction site is either present or absent)
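A small sketch of why an RFLP has only two states. The two allele sequences are invented; the enzyme modelled is EcoRI, which recognises GAATTC and cuts after the first G. The `digest` helper is hypothetical and handles one strand only.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return the fragment lengths produced by cutting seq at every occurrence of site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)  # EcoRI cuts G^AATTC, i.e. one base into the site
        pos = seq.find(site, pos + 1)
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]


# Invented alleles that differ by one base (A to G) inside the second site.
allele_a = "TTTTT" + "GAATTC" + "CCCCCCC" + "GAATTC" + "AAAA"
allele_b = "TTTTT" + "GAATTC" + "CCCCCCC" + "GAGTTC" + "AAAA"

print(digest(allele_a))  # site present: three fragments
print(digest(allele_b))  # site destroyed: two fragments
```

The single base change removes one cut, so the two alleles give different fragment-length patterns, but there are only ever two possibilities at that site.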
Microsatellites are repeats of short nucleotide sequences. Why might these make better markers than RFLPs?
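They are much more polymorphic than RFLPs (alleles differ in the number of repeats), and the different alleles can be detected by PCR using primers that flank the repeat region, because the PCR products differ in length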
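To illustrate the contrast with RFLPs, the sketch below builds three invented alleles of a CA-repeat microsatellite and reports the repeat number and the length of the product that flanking primers would amplify. The flank sequences, repeat counts and the `repeat_count` helper are assumptions chosen for the example.

```python
import re


def repeat_count(seq: str, unit: str = "CA") -> int:
    """Largest number of back-to-back copies of unit found anywhere in seq."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Invented flanking sequences where the PCR primers would anneal.
LEFT_FLANK, RIGHT_FLANK = "GGTACCTT", "TTGGATCC"

alleles = {n: LEFT_FLANK + "CA" * n + RIGHT_FLANK for n in (12, 15, 19)}

for n, seq in alleles.items():
    print(f"{repeat_count(seq)} repeats -> PCR product of {len(seq)} bp")
```

Each repeat number gives a product of a different size, so one locus can have many distinguishable alleles rather than just "site present" or "site absent".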
Two competing techniques were used to sequence the human genome- the clone contig approach and the shotgun approach. How was the clone contig approach carried out?
Genomic libraries that contain large inserts were made, and the individual clones of the libraries were assembled into contigs (overlapping sets of clones that form a contiguous stretch of DNA). It was then possible to arrange the clones into linear arrays by looking for overlaps. Each clone of a contig was then sequenced individually by the shotgun approach.
It is useful to make libraries of all genes/DNA derived from a chromosome of interest. Individual chromosomes can be separated by flow cytometry. Describe this process.
Dividing cells with condensed chromosomes are broken open
The intact chromosomes are stained with fluorescent dye- each chromosome binds a different amount of dye depending on its size
The mixture of chromosomes is diluted and passed in droplets through a fine aperture (each droplet will contain only one chromosome)
A detector measures the fluorescence in each droplet
A charge is applied to droplets that have a certain amount of fluorescence
These droplets are then passed through deflecting plates; only those carrying a charge are deflected, so the chromosome of interest can be collected
Bacterial artificial chromosomes (BACs) and Yeast artificial chromosomes (YACs) are used as vectors in creating genomic libraries. Which one is used more and why?
BACs are used more, because BAC vectors are much more stable than YACs
The human genome was sequenced once a good map and high density BAC library had been produced.
Which two methods were used?
Does the reference genome contain one or multiple sequences from different people?
Clone contig (public) and whole genome shotgun (private)
16 institutes worked on it (USA, Europe, China and Japan)
>50 ethnically diverse volunteers- all male
How did Celera (private company) sequence the human genome? How many volunteers?
By whole genome shotgun sequencing, using 21 ethnically diverse, mixed-gender individuals
What do parallel sequencing, high-throughput sequencing and next-generation sequencing produce?
Short 'reads' of sequences which are not suitable for assembling genome sequence de novo but can be used to compare to the reference genome
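A minimal sketch of comparing a short read with a reference rather than assembling it de novo. It slides the read along the reference without gaps and keeps the placement with the fewest mismatches. Real aligners use indexed references and allow gaps; the sequences and the `best_alignment` helper here are invented.

```python
def best_alignment(read: str, reference: str):
    """Return (position, mismatches) for the ungapped placement with the fewest mismatches.

    Each mismatch is (reference position, reference base, read base).
    Returns None if the read is longer than the reference.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = [
            (pos + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        if best is None or len(mismatches) < len(best[1]):
            best = (pos, mismatches)
    return best


reference = "ATGGCGTACGTTAGCCTAGGA"  # invented reference sequence
read = "CGTTCGCC"                   # invented read carrying one base change

position, variants = best_alignment(read, reference)
print(position, variants)
```

The read is placed by its similarity to the reference, and the one remaining mismatch is reported as a candidate variant. This is the sense in which short reads are compared with, rather than used to build, a genome sequence.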
What sort of mutation do many melanomas show?
BRAF mutations- these can be detected by whole genome sequencing, e.g. next-generation sequencing
Sometimes it is useful to sequence just target areas (e.g. exomes or genes involved in therapeutic targets). How was this done previously and what are the newer approaches?
Previously: by designing PCR primers to flank the desired regions and then sequencing the PCR products
Newer approaches are more high-throughput and are called sequence capture technology, e.g. using pools of biotinylated oligonucleotides (which correspond to the regions of interest) in shotgun analysis. Streptavidin beads are used to pull down the complexes of capture oligos and genomic DNA fragments
What can the modification status of histones allow for?
It gives an indication of the function of the underlying DNA and can be used to predict transcription start sites
Chromatin immunoprecipitation (ChIP) is a technique used to study protein-DNA interactions, e.g. which DNA sequences are bound by a specific transcription factor or are associated with a modified histone. Describe the process.
Cells are treated with formaldehyde to cross-link proteins to the DNA
Chromatin is isolated and the DNA is sheared into small pieces by sonication
The mixture is incubated with an antibody that recognises the transcription factor or modification of interest
DNA is isolated from the precipitated material, and the sequences of the DNA associated with the protein of interest are determined
Start sites for transcription are usually found where?
In CpG islands
Where are RNA splicing sites normally found?
At exon/intron boundaries, and also at a conserved sequence (the branch site) within the intron
Where are transcription start sites normally found? Are these found a lot?
In DNA that has a greater than average GC content and contains a cluster of CpG dinucleotide sequences (CpG islands). Across the genome as a whole, CpG dinucleotides are found at a frequency below what would be expected by chance. A region in which the ratio of observed to expected CpG dinucleotides is greater than 0.6 is indicative of an island
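A short sketch of how the 0.6 figure could be applied, reading it as an observed/expected CpG ratio (an assumption about what the card means). The expected count is taken as (number of C x number of G) / length, a commonly used definition; the `cpg_stats` helper and both test sequences are invented, and real island calling is done over windows of at least a couple of hundred bases.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so a plain count is exact
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: c * g / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


for name, seq in {"CpG-rich": "CGCGCGCGAT", "CpG-depleted": "GGCCTTAAGC"}.items():
    gc, ratio = cpg_stats(seq)
    verdict = "island-like" if gc > 0.5 and ratio > 0.6 else "not island-like"
    print(f"{name}: GC={gc:.2f}, obs/exp={ratio:.2f} -> {verdict}")
```

Both toy sequences are GC-rich, but only the first has its Cs and Gs arranged as CpG dinucleotides, which is what the ratio captures.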
Where are CpG islands, Cfp1 binding sites and H3K4me3 found/colocalised?
At gene promoters
Is the human genome well conserved? Do we have many repetitive sequences?
No, not compared with other animals (only about 5% is conserved), and only around 1% encodes protein
Yes, about 50% of our genome consists of repetitive DNA
Have many chromosome rearrangements occurred during mammalian genome evolution?
Yes e.g human chromosome 17 corresponds to mouse chromosome 11 but with extensive rearrangements
What is the relationship between gene density and the complexity of the organism?
Gene density DECREASES with the complexity of the organism, e.g. humans have a lower gene density than E. coli. (This does not mean we have a smaller genome. We have a bigger genome but fewer genes per unit length of DNA)
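The comparison can be made concrete with some rough arithmetic. The genome sizes and gene counts below are rounded, approximate figures supplied for illustration; they are not taken from the card, and published estimates vary.

```python
# Approximate, rounded figures for illustration only (not from the flashcard).
GENOMES = {
    "Human": {"size_mb": 3100.0, "genes": 20000},
    "E. coli": {"size_mb": 4.6, "genes": 4300},
}


def genes_per_mb(size_mb: float, genes: int) -> float:
    """Gene density expressed as genes per megabase of DNA."""
    return genes / size_mb


densities = {name: genes_per_mb(**g) for name, g in GENOMES.items()}

for name, density in densities.items():
    print(f"{name}: {density:.1f} genes per Mb")
```

On these figures the human genome is several hundred times larger and has more genes in total, yet E. coli packs over a hundred times more genes into each megabase.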
Can you find genes-within-genes/ overlapping genes? Can you give examples?
Yes- e.g. genes in the class III region of the HLA complex on the short arm of chromosome 6
3 small genes lie within a large intron of the NF1 gene- these are transcribed in the opposite direction to the NF1 gene itself
Higher eukaryotes have many interrupted genes (i.e. genes with many exons). Why does length vary between individual human genes?
Due to differences in the length of introns and the number of exons (exons are typically similar in size)
There are gene families and unique genes (those which do not belong to any family). As the genome size increases, which one increases and which decreases?
Family sizes increase over evolution as additional genes are obtained, and the proportion of unique genes decreases
The genomes of higher eukaryotes are characterised by the presence of what two things?
Repetitive DNA and introns; some repetitive DNA is found WITHIN introns
What can lead to duplication within a genome? What happens when there is selective pressure on both of the duplications? What happens when there is selective pressure on only one?
When a gene is flanked by regions of repetitive DNA, these repeats can misalign during recombination- this can take place between homologous chromosomes or between sister chromatids, leading to duplication
Selective pressure on both results in both genes staying similar
Pressure on only one means the other may acquire mutations that make it a non-functional pseudogene, or sometimes it acquires a new function that may benefit the organism
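The misalignment described in this answer can be sketched by treating a chromosome as a list of labelled segments. The labels "R" (repeat) and "gene", and the `unequal_crossover` helper, are made up for the example; it models a single exchange between two misaligned copies of the same repeat.

```python
def unequal_crossover(chrom_a: list[str], chrom_b: list[str], i: int, j: int):
    """Exchange arms after chrom_a[i] and chrom_b[j], which must be copies of the same repeat."""
    if chrom_a[i] != chrom_b[j]:
        raise ValueError("crossover points must lie in matching repeat elements")
    recombinant_1 = chrom_a[: i + 1] + chrom_b[j + 1:]
    recombinant_2 = chrom_b[: j + 1] + chrom_a[i + 1:]
    return recombinant_1, recombinant_2


# A gene flanked by two copies of the same repeat, on both chromatids.
chromatid = ["R", "gene", "R"]

# Misalignment: the downstream repeat of one copy pairs with the upstream repeat of the other.
duplicated, deleted = unequal_crossover(chromatid, chromatid, i=2, j=0)
print(duplicated)
print(deleted)
```

One product carries two copies of the gene and the other has lost it, which is how misaligned repeats can give rise to a duplication.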
What are orthologs and paralogs?
Orthologs- genes present in different organisms that evolved from a common ancestral gene by speciation
Paralogs- genes present in the same organism that resulted from a duplication event
What can result from domain duplication?
Domain duplication can occur when recombination takes place between repeat sequences found in introns. The duplicated domain may confer a beneficial function (e.g. duplication of a DNA binding site might lead to a stronger interaction with DNA) or it may evolve a new function. Recombination between intronic repeats can also lead to the evolution of new proteins. Tissue plasminogen activator (TPA) has acquired domains that enable it to bind to fibrin, as well as a domain from the epidermal growth factor gene that enables it to stimulate cell proliferation