Genomic Sequencing Flashcards Preview

Flashcards in Genomic Sequencing Deck (34):

Why Sequence genomes?

-To understand genetic variation with respect to phenotypic variation
-Comparative genomics (ancestry/evolution)
-Understand genetics of extinct species
-Gives insight into normal functions of genes
-Pharmacogenomics: Tailored drug treatments for specific genome


To sequence the human genome:

Whole Genome Shotgun Approach
-Mass cloning of fragments into cloning vectors.


Whole Genome Shotgun Approach Step 1

Extract DNA from cells


Whole Genome Shotgun Approach Step 2

Cut DNA into small, overlapping fragments with restriction enzymes
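-Rxn performed in suboptimal conditions, which don't let the enzymes cut at every site
*This is why fragments overlap
-This deck calls the fragments "contigs", short for contiguous sequence (strictly, a contig is the longer sequence assembled from overlapping fragments)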
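
A minimal sketch of why a partial digest gives overlapping fragments. It assumes an EcoRI-like site (GAATTC) and, as a simplification, places each cut at the start of the site. The function name `partial_digest` and the 50% cutting probability are illustrative choices, not part of the deck.

```python
import random

def partial_digest(dna, site, cut_prob, rng):
    """Cut one DNA copy at each recognition site with probability cut_prob.

    Simplification: the cut is placed at the start of the site.
    Returns (start_position, fragment) pairs.
    """
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        # Skip a site at the very start so no empty fragment is produced.
        if pos > start and rng.random() < cut_prob:
            fragments.append((start, dna[start:pos]))
            start = pos
        pos = dna.find(site, pos + 1)
    fragments.append((start, dna[start:]))
    return fragments

dna = "AAGAATTCTTGGAATTCCCGAATTCAA"
rng = random.Random(0)
# Each copy of the genome is cut at a different subset of sites,
# so fragments from different copies cover overlapping regions.
for copy in range(3):
    print(partial_digest(dna, "GAATTC", 0.5, rng))
```

With `cut_prob=1.0` every copy would be cut at the same places and no two fragments would share sequence, which is the situation the suboptimal conditions avoid.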


Whole Genome Shotgun Approach Step 3

Clone contigs into a cloning vector to make a genomic library.


Whole Genome Shotgun Approach Step 4

Sequence each clone using Sanger Sequencing technique


Whole Genome Shotgun Approach Step 5

Use computers to reassemble sequences of the contigs by puzzling together the overlapping sequences
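
The "puzzling together" step can be sketched as a greedy merge: repeatedly join the two reads with the longest suffix/prefix overlap. This is a toy illustration under stated assumptions (error-free reads, same strand, no repeats), not the assembler actually used for the human genome. The names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps; leave the remaining pieces separate
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))
```

Repeated sequences break this approach, because a repeat gives the same overlap in more than one place.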


Whole Genome Shotgun Approach Step 6

Deposit sequence information into NCBI GenBank Database
-Public can use this because it's paid for by tax dollars.


-AKA "dideoxy sequencing" or "chain-termination sequencing"
-Based on DNA replication/PCR of a DNA template (what you want to sequence)
*Can be circular or linear
-Polymerase adds nucleotides starting from a primer based on complementary sequences

Sanger method of sequencing


If you don't know the sequence, how can you design a primer?

Use a universal primer.
1) Can't design a primer against an unknown sequence
2) Can have a universal primer that can be used for all clones.


Deoxynucleotide vs. Dideoxynucleotide

-Deoxynuc. (dNTP) has an OH group on the 3' C, so it can form a phosphodiester bond with the next nucleotide
-Dideoxynuc. (ddNTP) has an H on the 3' C, so it cannot form that bond
-Incorporation of ddNTP causes synthesis of that new strand to stop
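
A small sketch of chain termination: because a ddNTP can land at any position after the primer, the reaction yields one product for every possible stopping point. The template is written in the direction the polymerase reads it, and the function names are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def new_strand(template):
    """Strand the polymerase would build along the whole template."""
    return "".join(COMPLEMENT[base] for base in template)

def terminated_products(template, primer_len):
    """One product per position where a ddNTP could stop synthesis.

    The first primer_len bases come from the primer itself, so the
    shortest product is the primer plus one ddNTP.
    """
    full = new_strand(template)
    return [full[:k] for k in range(primer_len + 1, len(full) + 1)]

products = terminated_products("TACGGATC", primer_len=3)
for p in products:
    # The last base of each product is the ddNTP that ended it.
    print(len(p), p, "ends in dd" + p[-1])
```

Reading the final base of each product from shortest to longest spells out the sequence after the primer.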


What's happening in the PCR tube?

There are:
-polymerase, plasmid, primer, dNTPs, ddNTPs
*Each ddNTP carries a fluorescent molecule, so every terminated strand is tagged at its end


After reaction is complete

The array of products with fluorescent molecules attached is separated by size using capillary gel electrophoresis


Gel-filled capillary
-When charge is applied, products separate by size: smaller products move faster toward the bottom, larger products lag near the top
-Smaller products come off the bottom first, and the fluorescent molecules are detected as they exit

Capillary Gel Electrophoresis


Reading a capillary gel electrophoresis

-each peak color represents a different base
-read the sequence by the order of the colored peaks
-can be some overlap
-read left to right
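
Reading the trace amounts to sorting the detected products by size and translating each dye color to its base. A minimal sketch; the color-to-base mapping below is an assumed example (real instruments define their own), and `call_bases` is a hypothetical name.

```python
# Assumed dye colors for illustration only.
DYE_TO_BASE = {"green": "A", "blue": "C", "black": "G", "red": "T"}

def call_bases(detections):
    """detections: (product_length, dye_color) pairs in any order.

    Shortest products exit the capillary first, so sorting by length
    gives the left-to-right order of the peaks.
    """
    ordered = sorted(detections, key=lambda d: d[0])
    return "".join(DYE_TO_BASE[color] for _, color in ordered)

detections = [(6, "red"), (4, "blue"), (8, "black"), (5, "blue"), (7, "green")]
print(call_bases(detections))
```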


Final Step: reassembling the sequence

Repeat Sanger sequencing for each clone in the library and then reassemble the contigs using overlapping sequences.


Things we have learned:
-The sequences of "simpler" organisms like yeast, bacteria, flies, and mice
-Human genome is 3.2 billion base pairs
-About 20,000 protein-coding genes
-About 5,000 genes do not code for protein
*These code for: microRNA, exRNA, tRNA, rRNA, etc.

-Introns are large (can be >100kb)
-Genome is only 2% genes (but 98% isn't junk!)
-Average gene is 3,000 bp (the largest, dystrophin, is 2.4 million bp)
-Genes are clustered together on chromosomes
-People have 99.9% of their sequence in common.
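
A back-of-the-envelope check that the numbers on this card fit together, using only the card's own rounded figures. The variable names are illustrative.

```python
GENOME_BP = 3_200_000_000      # 3.2 billion base pairs
PROTEIN_CODING_GENES = 20_000  # about 20,000 genes
AVERAGE_GENE_BP = 3_000        # average gene length from the card

# Total sequence taken up by protein-coding genes, and its share of the genome.
gene_bp = PROTEIN_CODING_GENES * AVERAGE_GENE_BP
gene_fraction = gene_bp / GENOME_BP
print(f"{gene_bp:,} bp in genes = {gene_fraction:.1%} of the genome")

# 99.9% shared means roughly 1 base in 1,000 differs between two people.
differing_bp = GENOME_BP // 1_000
print(f"about {differing_bp:,} differing positions")
```

The gene share comes out just under 2%, in line with the card, and a 0.1% difference still means a few million positions where two people differ.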


What we haven't learned:
-Long stretches of repeated DNA sequences that were hard to reassemble
-genes vs. pseudogenes vs. dubious ORFs

-What a gene product actually does
Can find out by:
-compare to a known gene product
-mutate gene product and study it


Looks like a gene but doesn't make a gene product.

Dubious ORFs


Mutated so much that it can no longer make a gene product.

Pseudogenes


How do we find protein-coding genes (versus all the other sequences in the genome)?

1. Compare the cDNA library to genomic library
2. Use computer algorithms to look for consensus sequences.


Use computer algorithms to predict Open Reading Frames (ORFs)
-Looks for a TATA box, a start codon, a stop codon, and a certain percentage of GC (genes tend to have more GC than noncoding regions)

Use of Computers to annotate genes
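
A minimal sketch of ORF prediction: scan each forward reading frame for a start codon (ATG) followed in-frame by a stop codon, then report GC content. It assumes a DNA string on one strand only and skips the TATA-box search, the reverse strand, and introns, so it is far simpler than real annotation software. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for ATG...stop in the 3 forward frames."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs

for start, end, seq in find_orfs("CCATGGCCGCGTAACC"):
    print(start, end, seq, f"GC={gc_content(seq):.2f}")
```

An ORF found this way is only a candidate; the "dubious ORFs" card covers the ones that look like genes but make no product.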


Identification and description of genes and their important sequences
Goal: assign functions to all of the genes of an organism
-Understand variation within and among organisms
-Identify where traits come from

Annotation


Alternatives to shotgun sequencing

1. Next generation sequencing
2. Exome sequencing
3. Analyze genetic markers throughout the genome (SNPs)


-Fast and Cheap sequencing method

Next Generation Sequencing


General steps for Next Gen. Sequencing

1. Extract DNA
2. Cut into overlapping fragments
3. Affix DNA to a solid support
4. Wash dNTPs across the DNA one type at a time
5. If that known dNTP is incorporated, light is emitted
6. Reassemble by overlapping sequences
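
Steps 4 and 5 can be sketched as a loop that washes one known dNTP at a time over a fixed template and records whether light is emitted. One assumption beyond the card: when the same base occurs several times in a row, all of them are incorporated in a single wash and the signal is taken to scale with the count (as in pyrosequencing-style chemistry). The name `flow_sequence` and the A, C, G, T wash order are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def flow_sequence(template, flow_order="ACGT"):
    """Simulate washing dNTPs one type at a time across a template.

    Returns the read (the newly built strand) and a list of
    (dNTP, number_incorporated) signals; 0 means no light.
    flow_order must contain all four bases or the loop cannot finish.
    """
    pos = 0
    signals = []
    while pos < len(template):
        for nt in flow_order:
            incorporated = 0
            while pos < len(template) and COMPLEMENT[template[pos]] == nt:
                incorporated += 1
                pos += 1
            signals.append((nt, incorporated))
    read = "".join(nt * n for nt, n in signals)
    return read, signals

read, signals = flow_sequence("TTGCA")
print(read)
print(signals)
```

Because the identity of each washed dNTP is known, the pattern of light and no light is enough to read the sequence without separating products by size.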


A specific region of DNA that varies among individuals
ex. SNPs are present at about 1 in every 1,000 bp of DNA
-used to create a detailed map of the individual's genome.

DNA Markers
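
A minimal sketch of spotting SNPs: compare two already-aligned sequences of equal length and list the positions where a single base differs. The sequences and the function name `find_snps` are made up for illustration; real pipelines also handle alignment, insertions, and deletions.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]

print(find_snps("ATGCCTAGGA", "ATGCTTAGGA"))
```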


Set of SNPs that are close together on a chromosome

Haplotype


Within a family, haplotypes are rarely scrambled by:

genetic recombination


Group of individuals that share a common ancestor because they all have similar haplotypes

Haplogroup


SNP used to represent an entire haplotype
AKA diagnostic SNP

Tag SNP


A way to look for many SNPs at once in a genome.

SNP Chips/ Array


-More to do with a population than with individuals
-Is a collection of all the combinations of haplotypes present in a population
-Used to study inheritance of complex traits
-Used to study evolutionary relatedness

Haplotype map (hapmap)


Ethical concerns?

-Misconceptions about genetics by the layperson?
-oversight of personal genotyping services
-Insurance regulations?
-Patenting of genes?
-Are some people "better-suited" for certain careers based upon their DNA?
-Should certain people be discouraged from having children?