Shane - Lecture 3 Flashcards
(38 cards)
How many genes do humans have?
22,000 genes
How long did it take to sequence the full human genome?
About 10 years
How much of your genetic material is the exact same as a random stranger?
99% of it is identical
Why did it take so long to sequence the human genome?
Because we have 3 billion base pairs but only 22,000 genes
What is computational gene prediction?
Identifying the genes in a DNA sequence, i.e. which regions of an uncharacterised sequence code for proteins
What information can be found via computational gene prediction?
(6)
Which regions code for protein
Which DNA strand encodes the gene
Which reading frame is used
Where does the gene start and end
Where are the exon-intron boundaries in eukaryotes
Where are the regulatory sequences for that gene
What often acts as the start codon?
ATG
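Worked example for the cards above: a minimal sketch of how strand, reading frame, and gene start/end fall out of a simple open reading frame (ORF) scan. It assumes ATG is the only start codon and TAA/TAG/TGA are the stop codons; the function names and the `min_codons` parameter are made up for illustration, and real gene finders use far more evidence than this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Return (strand, frame, start, end) for each ATG...stop ORF.

    start/end are 0-based positions on the scanned strand, end exclusive.
    min_codons counts the start and stop codons (hypothetical cut-off).
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):                      # three frames per strand
            start = None
            for i in range(frame, len(s) - 2, 3):   # step codon by codon
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                       # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None                    # look for the next ORF
    return orfs


print(find_orfs("CCATGAAATAGCC"))
```

Scanning both strands in all three frames gives six reading frames in total, which is why overlapping ORFs can appear in the same stretch of sequence.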
What are the benefits of gene finding on prokaryotes?
(3)
Small genomes
High coding density
No introns
What is the gene-level accuracy of gene finding in prokaryotes?
99%
What are the characteristics of eukaryotic genes?
Large genomes
Low coding density
Intron/exon structure
What is the gene-level accuracy of gene finding in eukaryotes?
About 50% accuracy
What are the problems associated with gene finding on prokaryotes?
(3)
Overlapping open reading frames
Very short genes - protein might be only a few dozen amino acids
Finding transcription start sites (TSS) and promoters
What is a TSS?
The point at which RNA polymerase starts transcribing
What are the four ways we can predict the location of genes in genomic sequences?
Searching by signal
Searching by content
Similarity-based methods
Comparative genomics
What is it called when searching by signal and searching by content are done simultaneously?
Ab initio or intrinsic methods
What are intrinsic methods of gene prediction used for?
To look for specific sequence features associated with genes
What is it called if similarity-based methods and comparative genomics are used together?
Extrinsic methods
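What is meant by searching by signal in gene prediction?
Analysing sequence signals involved in gene specification, e.g. start and stop codons, splice sites, and promoters
What is meant by searching by content in gene prediction?
Detecting statistical properties of the sequence, such as codon bias, that correlate with coding regions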
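Worked example for searching by content: a minimal sketch that scores a stretch of DNA by how closely its codon usage matches codon frequencies learned from known coding sequence. The training string, the pseudocount, and the function names are all made up for illustration; the log-odds against a uniform 1/64 background is one simple scoring choice, not the method from the lecture.

```python
import math
from collections import Counter

ALL_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def codon_frequencies(coding_seq, pseudocount=1.0):
    """Estimate codon frequencies from an in-frame coding sequence."""
    counts = Counter(
        coding_seq[i:i + 3] for i in range(0, len(coding_seq) - 2, 3)
    )
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def coding_score(window, freqs):
    """Average log-odds per codon: learned frequencies vs. uniform 1/64."""
    codons = [window[i:i + 3] for i in range(0, len(window) - 2, 3)]
    codons = [c for c in codons if c in freqs]   # skip codons with N etc.
    if not codons:
        return 0.0
    return sum(math.log(freqs[c] * 64) for c in codons) / len(codons)


def best_frame(window, freqs):
    """Pick the forward frame (0, 1 or 2) with the highest coding score."""
    return max(range(3), key=lambda f: coding_score(window[f:], freqs))


# Toy training sequence (invented, far too short for real use).
freqs = codon_frequencies("ATGGCTGCTGCTGAAGAAGCTTAA")
print(best_frame("AGCTGCTGCTG", freqs))
```

A positive score means the window uses codons the way the training genes do; a negative score means it looks more like background sequence.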
What is meant by similarity-based methods of gene prediction?
Use of similarity to known annotated sequences
What is meant by comparative genomics?
Aligning genomic sequences from different species
What is meant by extrinsic methods of gene prediction?
(2)
Asking whether our unknown gene is similar to other known gene sequences
This relies on pre-existing gene information
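Worked example for extrinsic methods: a minimal sketch that slides a known, annotated gene along an unknown sequence and reports the best ungapped percent identity. The function name and both sequences are made up for illustration; real similarity searches use gapped alignment tools rather than this brute-force comparison.

```python
def best_ungapped_match(query, known):
    """Slide `known` along `query`; return (offset, identity) of the best fit.

    identity is the fraction of matching positions (0.0 to 1.0).
    Returns (-1, 0.0) if `known` is empty or longer than `query`.
    """
    if not known or len(known) > len(query):
        return -1, 0.0
    best_offset, best_identity = -1, 0.0
    for offset in range(len(query) - len(known) + 1):
        # zip stops at the end of `known`, so only the overlap is compared
        matches = sum(q == k for q, k in zip(query[offset:], known))
        identity = matches / len(known)
        if identity > best_identity:
            best_offset, best_identity = offset, identity
    return best_offset, best_identity


# Invented sequences: the known gene differs from the query at one base.
offset, identity = best_ungapped_match("TTTATGGCCTAATTT", "ATGGCGTAA")
print(offset, round(identity, 2))
```

Because the result depends entirely on what is already in the reference set, a gene with no annotated relative cannot be found this way.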
How does ab initio gene finding work?
(4)
We input a DNA string of letters (A, C, G, T)
We get out an annotation of the string of letters showing for every nucleotide whether it is coding or non-coding
Colour key for the annotation: red = stop and start codons, blue = exons, black = introns
Identifies coding exons of protein-coding genes
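Worked example for the input/output of ab initio gene finding: a minimal sketch that takes a DNA string and returns one label per nucleotide, 'C' for coding and 'N' for non-coding. The labelling rule here (call the longest forward-strand ATG...stop ORF coding) is a toy assumption chosen to show the shape of the output; it ignores introns and the reverse strand, and actual ab initio tools rely on statistical models instead.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def annotate(seq):
    """Label every nucleotide 'C' (coding) or 'N' (non-coding).

    Toy rule: the longest forward-strand ATG...stop ORF is called coding.
    """
    best_start, best_end = 0, 0                 # end is exclusive
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best_end - best_start:
                    best_start, best_end = start, i + 3
                start = None
    labels = ["N"] * len(seq)
    labels[best_start:best_end] = ["C"] * (best_end - best_start)
    return "".join(labels)


dna = "CCATGAAATAGCC"
print(dna)
print(annotate(dna))
```

The output string has the same length as the input, so each position in the DNA lines up with its coding or non-coding call.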