Gene Identification Flashcards
Eukaryotes
- Eukaryotic genes are hard to identify because protein-coding genes are sparsely distributed across the genome and are interrupted by introns.
- Alternative splicing is a further challenge: a single gene can give rise to more than one transcript.
Prokaryotes
- Prokaryotic genes are easier to identify because the genome is smaller and contains fewer genes.
- Genes are contiguous because they lack introns, and the intergenic regions between them are short.
A priori method
The method works by recognizing sequence patterns within expressed genes and the regions flanking them.
- Protein-coding regions are recognized by their codon statistics and by the absence of in-frame stop codons.
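The two signals named above can be sketched in a few lines of Python. This is only an illustration, not a real gene finder: the function names `longest_open_stretch` and `codon_counts` are hypothetical, only forward reading frames are scanned, and turning the codon counts into a score would require a reference codon-usage table that is not included here.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_open_stretch(seq, frame):
    """Return (start, end) of the longest run of codons without a stop
    codon in one forward reading frame (end is exclusive)."""
    best = (frame, frame)
    run_start = frame
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            # A stop codon closes the current stop-free run.
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = i + 3
    # Close the final run at the last complete codon of this frame.
    end = frame + ((len(seq) - frame) // 3) * 3
    if end - run_start > best[1] - best[0]:
        best = (run_start, end)
    return best

def codon_counts(seq, start, end):
    """Count codons in seq[start:end]; raw input for codon statistics."""
    return Counter(seq[i:i + 3] for i in range(start, end, 3))

seq = "ATGGCTGCTTAAGCT"
start, end = longest_open_stretch(seq, 0)
print(start, end, dict(codon_counts(seq, start, end)))
```

A long stop-free stretch whose codon counts resemble those of known genes in the same organism is a better coding candidate than a short stretch with atypical codon usage.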
‘Been there, seen that’ method
The method works by recognizing regions corresponding to previously known genes.
- Regions are recognized by the similarity of their translated amino acid sequence to known proteins in other species, or by matching expressed sequence tags (ESTs).
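As a rough sketch of the idea, the Python below translates a DNA sequence in its three forward frames and looks for a known peptide. It makes strong simplifying assumptions: exact substring matching stands in for a real similarity search (which would use alignment and tolerate mismatches), the reverse strand is ignored, and `translate` and `find_known_peptide` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(seq, frame=0):
    """Translate one forward frame; '*' marks stops, 'X' unknown codons."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))

def find_known_peptide(dna, peptide):
    """Return (frame, dna_position) for every exact occurrence of a known
    peptide in the three forward translations of dna."""
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        pos = protein.find(peptide)
        while pos != -1:
            hits.append((frame, frame + 3 * pos))
            pos = protein.find(peptide, pos + 1)
    return hits

print(find_known_peptide("CCATGGCTAAATGA", "MAK"))
```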
Describe useful features of gene identification in addition to codon usage: what to look for in the beginning, middle and end of genes.
- The initial (5’) exon starts at the transcription start site (TSS), which is preceded by a core promoter site (e.g., the TATA box, typically ~30 bp upstream). The exon is free of in-frame stop codons and ends immediately before a GT splice signal.
- Internal exons are free of in-frame stop codons; they begin immediately after an AG splice signal and end immediately before a GT splice signal.
- The final exon starts immediately after an AG splice signal and ends with a stop codon (i.e., TAA, TAG, or TGA), followed by a polyadenylation (poly-A) signal sequence.
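The internal-exon rule can be written as a simple check, shown here as a minimal Python sketch. The function names and the `phase` parameter (how many bases to skip before the first complete codon) are assumptions made for illustration. Real splice-site recognition relies on weighted models of the surrounding sequence rather than on the bare AG and GT dinucleotides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_inframe_stop(seq, start, end, phase=0):
    """True if seq[start:end] contains a stop codon in the given phase."""
    return any(seq[i:i + 3] in STOP_CODONS
               for i in range(start + phase, end - 2, 3))

def looks_like_internal_exon(seq, start, end, phase=0):
    """Check a candidate exon seq[start:end] (end exclusive): AG just
    before it, GT just after it, and no in-frame stop codon inside."""
    acceptor = start >= 2 and seq[start - 2:start] == "AG"
    donor = seq[end:end + 2] == "GT"
    return acceptor and donor and not has_inframe_stop(seq, start, end, phase)

clean = "TTAG" + "GCTGCAGAA" + "GTAA"    # exon GCT GCA GAA
stopped = "TTAG" + "GCTTAAGAA" + "GTAA"  # exon GCT TAA GAA
print(looks_like_internal_exon(clean, 4, 13))
print(looks_like_internal_exon(stopped, 4, 13))
print(looks_like_internal_exon(stopped, 4, 13, phase=1))
```

The third call shows why the reading frame matters: the same candidate exon is rejected in phase 0 but accepted in phase 1, because TAA is no longer read as a codon.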