Shane - Lecture 3 Flashcards

(38 cards)

1
Q

How many genes do humans have?

A

22,000 genes

2
Q

How long did it take to sequence the full human genome?

A

About 10 years

3
Q

How much of your genetic material is exactly the same as a random stranger's?

A

99% of it is identical

4
Q

Why did it take so long to sequence the human genome?

A

Because we have 3 billion base pairs but only 22,000 genes

5
Q

What is computational gene prediction?

A

Identifying which genes lie on a DNA sequence, i.e. which regions of an uncharacterised sequence code for proteins

6
Q

What information can be found via computational gene prediction?
(6)

A

Which regions code for protein

Which DNA strand encodes the gene

Which reading frame is used

Where does the gene start and end

Where are the exon-intron boundaries in eukaryotes

Where are the regulatory sequences for that gene
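
Several of the items above (strand, reading frame, start and end) can be illustrated with a toy open-reading-frame scan. This is only a minimal sketch, not how real gene finders work: the function names and the `min_codons` cut-off are hypothetical, and it simply reports every ATG-to-stop stretch in the six reading frames.

```python
# Toy ORF scan over both strands and all three reading frames.
# Names and the min_codons threshold are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Return (strand, frame, start, end) for each ATG...stop stretch.

    Coordinates are 0-based, end-exclusive, on the strand that was scanned.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs
```

For example, `find_orfs("CCATGAAATAGCC")` reports one ORF on the forward strand in frame 2, spanning positions 2 to 11.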

7
Q

What often acts as the start codon?

A

ATG

8
Q

What are the advantages of gene finding in prokaryotes?
(3)

A

Small genomes

High coding density

No introns

9
Q

What is the gene-level accuracy of gene finding in prokaryotes?

A

99%

10
Q

What are the characteristics of eukaryotic genes?

A

Large genomes

Low coding density

Intron/exon structure

11
Q

What is the gene-level accuracy of gene finding in eukaryotes?

A

About 50% accuracy

12
Q

What are the problems associated with gene finding in prokaryotes?
(3)

A

Overlapping open reading frames

Very short genes - protein might be only a few dozen amino acids

Finding transcription start sites (TSS) and promoters

13
Q

What is a TSS?

A

The point at which RNA polymerase starts transcribing

15
Q

What are the four ways we can predict the location of genes in genomic sequences?

A

Searching by signal

Searching by content

Similarity-based methods

Comparative genomics

16
Q

What is it called when searching by signal and searching by content are done simultaneously?

A

Ab initio or intrinsic methods

17
Q

What are intrinsic methods of gene prediction used for?

A

For looking for very specific features associated with genes

18
Q

What is it called if similarity-based methods and comparative genomics are used together?

A

Extrinsic methods

19
Q

What is meant by searching by signal in gene prediction?

A

The analysis of sequence signals involved in gene specification

20
Q

What is meant by searching by content in gene prediction?

A

The analysis of content features, such as codon bias, that correlate with coding regions

21
Q

What is meant by similarity-based methods of gene prediction?

A

Use of similarity to known annotated sequences

22
Q

What is meant by comparative genomics?

A

Aligning genomic sequences from different species

23
Q

What is meant by extrinsic methods of gene prediction?
(2)

A

Is our unknown gene similar to other known gene sequences?

This relies on pre-existing gene information

24
Q

How does ab initio gene finding work?
(4)

A

We input a DNA string of letters (A, C, G, T)

We get out an annotation of the string of letters showing for every nucleotide whether it is coding or non-coding

Red = stop and start codons
Blue = exons
Black = introns

Identifies coding exons of protein-coding genes
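
The input and output described above can be sketched as a function that takes a DNA string and returns one label per nucleotide. This is only an illustration of the output format, not a gene finder: the `exons` argument is a hypothetical list of intervals standing in for whatever a real ab initio tool would predict.

```python
def annotate(seq, exons):
    """Label each nucleotide 'C' (coding) or 'N' (non-coding).

    `exons` is a hypothetical list of 0-based, end-exclusive (start, end)
    intervals; a real gene finder would predict these from the sequence.
    """
    labels = ["N"] * len(seq)
    for start, end in exons:
        for i in range(start, end):
            labels[i] = "C"
    return "".join(labels)
```

The returned string has the same length as the input, so it can be printed directly underneath the sequence.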

25
Q

Give an example of one of the most common stop codons

A

TAA

26
Q

How does searching by signal work?
(4)

A

It looks for four different signals found at different sites:

Translation start codon (ATG)

5' splice donor site

3' splice acceptor site

Translation stop codon (TAA, TAG or TGA)

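A minimal sketch of a signal search, listing every candidate position for each of the four signals. The start and stop codons come from the card; the GT and AG dinucleotides used for the donor and acceptor sites are the commonly cited canonical intron boundaries and are an assumption here, since the card does not name them. The `scan_signals` name is hypothetical.

```python
import re

# Start and stop codons are from the card; GT/AG are an added assumption
# (the canonical first and last two bases of an intron).
SIGNALS = {
    "start codon": "ATG",
    "donor site": "GT",
    "acceptor site": "AG",
    "stop codon": "TAA|TAG|TGA",
}


def scan_signals(seq):
    """Return {signal name: [0-based positions]} for every candidate site."""
    hits = {}
    for name, pattern in SIGNALS.items():
        # A lookahead reports overlapping matches as well.
        regex = f"(?=(?:{pattern}))"
        hits[name] = [m.start() for m in re.finditer(regex, seq)]
    return hits
```

Most hits from a scan like this are false positives, which is one reason signal searches are combined with content-based evidence.
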
27
Q

List the three stop codons

A

TAA

TAG

TGA

28
Q

What can be used to look up the donor and acceptor splice sites of a sequence?

A

Consensus sequences can be used to find splice sites

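One way to picture this: build a consensus from a set of aligned example sites, then slide it along a new sequence and report near matches. This is a sketch under stated assumptions; the function names and the one-mismatch tolerance are hypothetical, and any example sites passed in would be made up for illustration rather than real splice-site data.

```python
from collections import Counter


def consensus(sites):
    """Most common base in each column of equal-length aligned sites."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sites))


def match_positions(seq, motif, max_mismatches=1):
    """0-based positions where seq matches motif within max_mismatches."""
    width = len(motif)
    positions = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            positions.append(i)
    return positions
```

For example, the made-up sites `["CAGGTA", "AAGGTG", "CAGGTA", "CAGGTG", "TAGGTA"]` give the consensus `CAGGTA`, which can then be passed to `match_positions` as the motif.
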
29
Q

What can be used to help identify a stop signal?

A

The Cs and Ts found running up to the stop

30
Q

What does searching by content do?

A

It predicts exons accurately using content-based features, and can identify the type of exon

31
Q

What are the three types of exons?

A

Initial exons

Internal exons

Terminal exons

32
Q

What are initial exons?

A

Open reading frames delimited by a start site and a 5' donor site

33
Q

What are internal exons?

A

Open reading frames delimited by a 3' acceptor site and a 5' donor site

34
Q

What are terminal exons?

A

Open reading frames delimited by a 3' acceptor site and a stop codon

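The three definitions can be summarised as a lookup from the pair of delimiting signals to the exon type. This is a minimal sketch; the `exon_type` name and the signal labels are hypothetical.

```python
# Map (left boundary, right boundary) to the exon type it delimits.
EXON_TYPES = {
    ("start", "donor"): "initial",
    ("acceptor", "donor"): "internal",
    ("acceptor", "stop"): "terminal",
}


def exon_type(left_signal, right_signal):
    """Return the exon type for a pair of boundary signals, or None."""
    return EXON_TYPES.get((left_signal, right_signal))
```

Any other pair of boundaries returns `None`, since it does not correspond to one of the three types listed here.
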
35
Q

Where is codon bias mostly found?

A

In exons, more so than in introns

36
Q

What is codon bias?

A

The uneven usage of synonymous codons: some codons for an amino acid are used more frequently than others

37
Q

How can codon bias be useful?

A

It can be used to differentiate between coding and non-coding regions, as some codons are used much more often in coding regions than elsewhere

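Codon bias can be made concrete by counting codons in a reading frame and comparing the usage of codons that encode the same amino acid. This is a sketch with hypothetical function names; the six leucine codons listed are from the standard genetic code, and any sequence passed in would be a toy example rather than real data.

```python
from collections import Counter

# The six codons that encode leucine in the standard genetic code.
LEUCINE_CODONS = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"]


def codon_counts(seq, frame=0):
    """Count the complete codons in one reading frame of seq."""
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))


def relative_usage(seq, synonymous, frame=0):
    """Fraction of each synonymous codon among those present in seq."""
    counts = codon_counts(seq, frame)
    total = sum(counts[c] for c in synonymous)
    if total == 0:
        return {c: 0.0 for c in synonymous}
    return {c: counts[c] / total for c in synonymous}
```

A strongly uneven result, such as one leucine codon accounting for most of the usage, is the kind of pattern a content-based method looks for.
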
38
Q

What are coding statistics?

A

A function that, for a given DNA sequence, computes the likelihood that the sequence codes for a protein
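
One simple form such a function could take is a log-likelihood ratio over codons: compare how often each codon is seen in known coding sequence with a background frequency, and sum the logs along the sequence. This is a sketch under stated assumptions, not a specific published statistic: the `coding_score` name, the uniform 1/64 background and the frequency table passed in are all hypothetical.

```python
import math


def coding_score(seq, coding_freq, background=1 / 64, frame=0):
    """Sum of log(coding frequency / background frequency) over codons.

    `coding_freq` is a hypothetical {codon: frequency} table; codons missing
    from it fall back to the background value and contribute zero.
    A positive score suggests coding, a negative score non-coding.
    """
    score = 0.0
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        score += math.log(coding_freq.get(codon, background) / background)
    return score
```

With a toy table such as `{"CTG": 0.04, "TTA": 0.005}`, a run of CTG codons scores above zero and a run of TTA codons scores below zero.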