Genomic Sequencing Flashcards
Why sequence genomes?
- To understand genetic variation with respect to phenotypic variation
- Inheritance
- Comparative genomics (ancestry/evolution)
- Forensics
- Understand genetics of extinct species
- Gives insight into normal functions of genes
- Pharmacogenomics: drug treatments tailored to an individual’s genome
To sequence the human genome:
Whole Genome Shotgun Approach
- Mass cloning of genome fragments into cloning vectors.
Whole Genome Shotgun Approach Step 1
Extract DNA from cells
Whole Genome Shotgun Approach Step 2
Cut DNA into small, overlapping fragments with restriction enzymes
- Reaction is performed under suboptimal conditions, so the enzymes do not cut every site (partial digestion)
- Different copies of the DNA are cut at different sites, which is why the fragments overlap
- Fragments are called “contigs” for contiguous sequence
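The partial digestion described above can be sketched as a small simulation. This is only an illustration: the EcoRI site (GAATTC, cut after the first G), the 50% cut probability, and the function name `partial_digest` are assumptions chosen for the example, not details from these cards.

```python
import random

def partial_digest(dna, site="GAATTC", cut_offset=1, cut_probability=0.5, rng=None):
    """Cut one copy of `dna` at each recognition site with some probability.

    cut_probability < 1 models the suboptimal reaction conditions:
    some sites are skipped, so each copy is cut differently.
    """
    rng = rng or random.Random(0)
    fragments, last = [], 0
    pos = dna.find(site)
    while pos != -1:
        if rng.random() < cut_probability:
            cut = pos + cut_offset          # EcoRI cuts G^AATTC
            fragments.append(dna[last:cut])
            last = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[last:])            # whatever is left after the last cut
    return fragments

dna = "AAGAATTCCCGAATTCTT"
rng = random.Random(1)
# Digest several copies; because different sites are missed in each copy,
# fragments from different copies can span the same cut site and overlap.
for copy_number in range(3):
    print(partial_digest(dna, rng=rng))
```

Setting `cut_probability=1.0` models a complete digest, where every copy gives the same non-overlapping pieces and there is nothing to puzzle together later.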
Whole Genome Shotgun Approach Step 3
Clone contigs into a cloning vector to make a genomic library.
Whole Genome Shotgun Approach Step 4
Sequence each clone using Sanger Sequencing technique
Whole Genome Shotgun Approach Step 5
Use computers to reassemble sequences of the contigs by puzzling together the overlapping sequences
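A minimal sketch of the "puzzling together" idea, assuming error-free reads from one strand and a simple greedy strategy (always merge the pair with the longest overlap). Real assemblers are far more involved; the function names and the `min_len` parameter here are illustrative only.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))      # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break                           # no overlaps left; a gap remains
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

print(greedy_assemble(["ATGGCGT", "GCGTACC", "ACCTTAG"]))
# prints ['ATGGCGTACCTTAG']
```

If more than one sequence comes back, the reads did not overlap enough to join, which is the situation long repeated stretches tend to create.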
Whole Genome Shotgun Approach Step 6
Deposit sequence information into NCBI GenBank Database
- The public can use this because it’s paid for by tax dollars.
- AKA “dideoxy sequencing” or “chain-termination sequencing”
- Based on DNA replication/PCR of a DNA template (what you want to sequence)
- Template can be circular or linear
- Polymerase adds nucleotides starting from a primer, based on complementary base pairing with the template
Sanger method of sequencing
If you don’t know the sequence, how can you design a primer?
Use a universal primer.
1) Can’t design a primer against an unknown sequence
2) A universal primer binds known vector sequence next to the insert, so the same primer can be used for all clones.
Deoxynucleotide vs. Dideoxynucleotide
- A deoxynucleotide (dNTP) has an OH group on the 3’ carbon, so it can form a phosphodiester bond with the next nucleotide
- A dideoxynucleotide (ddNTP) has an H on the 3’ carbon, so it cannot form that bond
- Incorporating a ddNTP stops synthesis of that new strand
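Chain termination can be illustrated with a toy simulation. It assumes a fixed chance (`dd_fraction`, a made-up parameter) that a ddNTP rather than a dNTP is added at each position; the template is written in the order the polymerase reads it, so the new strand comes out 5’ to 3’.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_once(template, dd_fraction, rng):
    """Build one new strand until a ddNTP happens to be incorporated."""
    strand = ""
    for base in template:
        strand += COMPLEMENT[base]
        if rng.random() < dd_fraction:
            return strand, True     # ddNTP added: no 3' OH, synthesis stops
    return strand, False            # reached the end using only dNTPs

def run_reaction(template, copies=1000, dd_fraction=0.2, seed=0):
    """Return the distinct ddNTP-terminated products, shortest first."""
    rng = random.Random(seed)
    products = [extend_once(template, dd_fraction, rng) for _ in range(copies)]
    return sorted({s for s, terminated in products if terminated}, key=len)

for product in run_reaction("TACGGATC"):
    print(len(product), product)
```

With enough copies, the reaction tends to contain a terminated product for every position, each ending in the ddNTP that stopped it.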
What’s happening in the PCR tube?
There are:
- polymerase, plasmid, primer, dNTPs, ddNTPs
- fluorescent molecules on the ddNTPs tag the end of each terminated strand
After reaction is complete
The array of products with fluorescent molecules attached is separated by size using capillary gel electrophoresis
Gel-filled capillary
- When a charge is applied, smaller products move through the gel faster, so larger products stay near the top and smaller products collect near the bottom
- Smaller products come off the bottom first, which is where the fluorescent molecules are detected.
Capillary Gel Electrophoresis
Reading a capillary gel electrophoresis
- each peak color represents a different base
- read the sequence by the order of the colored peaks
- neighboring peaks can overlap somewhat
- read left to right
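Reading the peaks amounts to sorting the labeled products by size and translating each color to a base. The sketch below assumes a particular dye-to-base mapping (`DYE_TO_BASE`); actual color assignments depend on the sequencing chemistry and are not given in these cards.

```python
# Assumed color assignments, for illustration only.
DYE_TO_BASE = {"green": "A", "red": "T", "blue": "C", "black": "G"}

def read_peaks(peaks):
    """Turn (fragment_length, dye_color) detections into a base sequence.

    The shortest fragment exits the capillary first, so sorting by
    length reproduces the left-to-right order of the peaks.
    """
    ordered = sorted(peaks)
    return "".join(DYE_TO_BASE[color] for _, color in ordered)

detections = [(3, "black"), (1, "green"), (2, "red"), (4, "blue")]
print(read_peaks(detections))   # prints ATGC
```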
Final Step: reassembling the sequence
Repeat Sanger sequencing for each clone in the library and then reassemble the contigs using overlapping sequences.
Things we have learned:
- The sequences of “simpler” organisms like yeast, bacteria, flies, and mice
- The human genome is 3.2 billion base pairs
- About 20,000 protein-coding genes
- About 5,000 genes do not code for protein
  (they code for microRNA, exRNA, tRNA, rRNA, etc.)
- Introns are large (can be >100kb)
- Genome is only 2% genes (but 98% isn’t junk!)
- Average gene is 3,000 bp (largest is dystrophin, 2.4 million bp)
- Genes are clustered together on chromosomes
- People have 99.9% of their sequence in common.
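A rough arithmetic check on the figures above, assuming the 3,000 bp average applies to the roughly 20,000 protein-coding genes. It only reuses numbers from this card; the variable names are illustrative.

```python
GENOME_BP = 3_200_000_000        # 3.2 billion base pairs
PROTEIN_CODING_GENES = 20_000
AVERAGE_GENE_BP = 3_000

# Total sequence occupied by protein-coding genes, and its share of the genome.
gene_bp = PROTEIN_CODING_GENES * AVERAGE_GENE_BP
gene_percent = 100 * gene_bp / GENOME_BP

# 99.9% shared sequence leaves 0.1% (1 in 1,000 bp) that can differ.
differing_bp = GENOME_BP // 1000

print(f"{gene_bp:,} bp in genes ({gene_percent:.1f}% of the genome)")
print(f"{differing_bp:,} bp differ between two people")
```

The result is about 60 million bp in genes, close to the 2% figure, and about 3.2 million bp that differ between any two people.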
What we haven’t learned:
- Long stretches of repeated DNA sequences that were hard to reassemble
- genes vs. pseudogenes vs. dubious ORFs
- What a gene product actually does
Can find out by:
- comparing it to a known gene product
- mutating the gene product and studying it
Looks like a gene but doesn’t make a gene product.
Dubious ORFs
Mutated so much that it can no longer make anything.
Pseudogene
How do we find protein coding genes (versus all the other sequences in the genome?)
1. Compare the cDNA library to the genomic library
2. Use computer algorithms to look for consensus sequences.
Use computer algorithms to predict Open Reading Frames (ORFs)
- Looks for TATA boxes, start codons, stop codons, and a certain percentage of GC (genes tend to have more GC than noncoding regions)
Use of Computers to annotate genes
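A simplified sketch of ORF prediction: scan each forward reading frame for an ATG start codon followed in-frame by a stop codon, then report the GC percentage of each hit. It leaves out the reverse strand, TATA box detection, and introns, and the `min_codons` cutoff is an arbitrary choice for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}

def gc_percent(seq):
    """Percentage of bases in `seq` that are G or C."""
    return 100 * sum(base in "GC" for base in seq) / len(seq) if seq else 0.0

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...stop in the 3 forward frames."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first start codon in this frame
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                    # look for the next ORF
    return orfs

for start, end, seq in find_orfs("CCATGGCGCCCTAAGG"):
    print(start, end, seq, f"GC={gc_percent(seq):.1f}%")
```

Predictions like these are candidates only; a hit can still turn out to be a pseudogene or a dubious ORF.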
Identification and description of genes and their important sequences
Goal: assign functions to all of the genes of an organism
- Understand variation within and among organisms
- Identify where traits come from
Annotation
Alternatives to shotgun sequencing
- Next generation sequencing
- Exome sequencing
- Analyze genetic markers throughout the genome (SNPs)