Genome sequencing and sequence analysis II Flashcards

1
Q

What’s the core information contained in protein data banks?

A

The X, Y and Z coordinates for the atoms in proteins.

There are 150 000 proteins logged; many of them have similar structures.

2
Q

Define sequence similarity.

A
  1. Sequence similarity is the number of identical and similar residues.
  2. “Similar” residues have similar chemical properties, and thus interact in the same way.
  3. Similarity combines sequence identity (invariance) and sequence (functional!) conservation.
3
Q

Define sequence homology.

A

Two sequences are homologous if it can be concluded that they derive from a common evolutionary ancestor.

4
Q

Define sequence identity.

A

The extent to which two aligned sequences are invariant (same nt at the same position).

5
Q

Define sequence conservation.

A

The sequences may differ in their nucleotide bases, but the encoded amino acids have similar chemical interactions, so the functionality is conserved.

6
Q

What’s the general threshold of DNA sequence similarity required to claim that two proteins share the same general structure?

A

20-30%

Form gives rise to function. The proteins may very well differ at a greater level of detail.

7
Q

Why is a global alignment not necessarily the best choice?

A

Global alignments score similarity over the entire length of both sequences. DNA segments that encode different protein domains have often been shuffled around, so the same domains appear in a different order. Such reshuffling does not change what the domains do, only the order of the DNA sequence, yet a global alignment will score the pair poorly.

8
Q

Why is it logical to penalise the alignment score of 1nt gaps proportionally more than nucleotides in a larger gap?

The first nt of a gap is penalised harder than the ones that follow.
TATCTAAA
vs
TATC**AT

A

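It makes sense because opening the gap is the critical event: a 1nt gap likely has the same functional consequence as a longer gap, so each additional nt in the gap should add only a small extra penalty.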
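
One common way to express this idea is an affine gap penalty, where the first nt of a gap costs more than each further nt. The sketch below is only an illustration: the function name and the values `gap_open=5` and `gap_extend=1` are assumptions, not figures from these cards.

```python
def affine_gap_penalty(gap_length, gap_open=5, gap_extend=1):
    """Penalty for one gap: the first nt costs gap_open,
    every further nt costs gap_extend (assumed example values)."""
    if gap_length <= 0:
        return 0
    return gap_open + gap_extend * (gap_length - 1)


# Total penalty and penalty per nt for gaps of different lengths.
for n in (1, 2, 4):
    total = affine_gap_penalty(n)
    print(n, total, total / n)
```

With these assumed values, the cost per gapped nt falls as the gap gets longer, which is the behaviour the card describes.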

9
Q

What’s the concept behind amino acid substitution matrices?

A

A substitution matrix gives a score for every pair of amino acids, so you can read off how well a position is conserved. The score reflects a chemical comparison of the two residues: can they interact in similar ways? If not, the penalty is high. If they can, the score may still be positive even though the residues differ.
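
A minimal sketch of how such a matrix is used to score an ungapped alignment. The scores in `TOY_SCORES` are invented for illustration and are not taken from a published matrix; the function names are hypothetical as well.

```python
# Toy scores, invented for illustration only.
TOY_SCORES = {
    ("L", "L"): 4,
    ("D", "D"): 6,
    ("L", "I"): 2,   # both hydrophobic: differs, but still positive
    ("D", "E"): 2,   # both acidic: differs, but still positive
    ("L", "D"): -4,  # hydrophobic vs. acidic: high penalty
}


def pair_score(a, b, scores=TOY_SCORES):
    """Look up a residue pair; the matrix is symmetric."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]


def alignment_score(seq1, seq2):
    """Sum the pair scores of two equal-length, ungapped sequences."""
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


print(alignment_score("LD", "IE"))  # similar residues at both positions
print(alignment_score("LL", "LD"))  # one identical pair, one dissimilar pair
```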

10
Q

What do you use blastn for?

A

DNA –> DNA search (nucleotide query against a nucleotide database).

11
Q

What do you use blastp for?

A

Protein –> protein search (protein query against a protein database).

12
Q

What do you use blastx for?

A

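DNA –> protein search (the nucleotide query is translated in all six reading frames and searched against a protein database).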

13
Q

What do you use tblastn for?

A

Protein –> DNA search (protein query against a nucleotide database that is translated in all six reading frames).

14
Q

What do you use tblastx for?

A

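Translated DNA –> translated DNA search (both the nucleotide query and the nucleotide database are translated in all six reading frames and compared at the protein level).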
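
The five BLAST programs can be summarised as a lookup on query type and database type. This is only a memory aid: `pick_blast_program` and its `translate_both` flag are hypothetical names, not part of BLAST itself.

```python
def pick_blast_program(query, database, translate_both=False):
    """Return the BLAST program name for a query/database type pair.

    query and database are "nucleotide" or "protein". translate_both
    only matters for nucleotide vs. nucleotide searches.
    """
    if query == "nucleotide" and database == "nucleotide":
        return "tblastx" if translate_both else "blastn"
    table = {
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",
        ("protein", "nucleotide"): "tblastn",
    }
    return table[(query, database)]


print(pick_blast_program("nucleotide", "protein"))
```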

15
Q

Explain the blast search strategy (protein-protein search).

A
  1. Make a list of k-letter words from the query (words of 2, 3 or 6 aas).
  2. Set a word score threshold (the alignment score a word needs before it counts as aligned).
  3. Search for your words in other protein sequences.
  4. When BLAST identifies a word match, the search extends in both directions until the alignment score drops below a certain level. (90% of BLAST processing is spent here.)
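
The steps above can be sketched as a toy seed-and-extend search. This is a heavy simplification and not how BLAST is actually implemented: it seeds only on exact word matches (so step 2 is skipped), uses assumed match/mismatch scores instead of a substitution matrix, and extends without gaps. All names and the `drop` cut-off are assumptions.

```python
MATCH, MISMATCH = 2, -1  # assumed toy scores


def score(a, b):
    return MATCH if a == b else MISMATCH


def make_words(query, k=3):
    """Step 1: every k-letter word in the query with its start position."""
    return [(query[i:i + k], i) for i in range(len(query) - k + 1)]


def extend_one_way(query, subject, q, s, step, drop):
    """Walk from (q, s) in direction step (+1 or -1).
    Stop when the score falls `drop` below the best seen.
    Return (best score gain, number of positions kept)."""
    gain = best_gain = 0
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        gain += score(query[q], subject[s])
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain >= drop:
            break
        q += step
        s += step
    return best_gain, best_len


def seed_and_extend(query, subject, k=3, drop=3):
    """Steps 3 and 4: find exact word hits, extend each in both directions.
    Return the best (score, query_start, query_end, subject_start) or None."""
    hits = []
    for word, q in make_words(query, k):
        s = subject.find(word)
        while s != -1:
            right_gain, right_len = extend_one_way(query, subject, q + k, s + k, 1, drop)
            left_gain, left_len = extend_one_way(query, subject, q - 1, s - 1, -1, drop)
            total = k * MATCH + right_gain + left_gain
            hits.append((total, q - left_len, q + k + right_len, s - left_len))
            s = subject.find(word, s + 1)
    return max(hits) if hits else None


print(seed_and_extend("MKTAYIAKQR", "GGMKTAYLAKQRGG"))
```

In the example, the single I/L mismatch does not stop the extension because the score never falls `drop` below its best value.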
16
Q

Define ‘E-value’ and ‘S’ in the context of BLAST-searching.

A

E-value: The number of alignments with a score of at least S that are expected to OCCUR BY CHANCE in a search of this size.

S: The alignment score used as the threshold.
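
For intuition, the E-value is commonly written as E = K · m · n · e^(−λS), where m is the query length and n the database length. The sketch below evaluates that formula; the default values of `lam` and `k` are illustrative assumptions, since the real values depend on the scoring system used.

```python
import math


def expected_hits(score, query_len, db_len, lam=0.267, k=0.041):
    """E = K * m * n * exp(-lambda * S).
    lam and k are assumed example values, not universal constants."""
    return k * query_len * db_len * math.exp(-lam * score)


# A higher score S gives a lower E-value; a larger database gives a higher one.
print(expected_hits(50, 300, 1_000_000))
print(expected_hits(60, 300, 1_000_000))
print(expected_hits(50, 300, 2_000_000))
```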

17
Q

What are multiple sequence alignment (MSA) methods?

A

Methods that align three or more sequences at once. They are used in bioinformatic evolutionary biology.

18
Q

What’s a heuristic method?

A

A method which relies on an algorithm that finds short high-scoring matches and then extends its search in both directions.

Heuristic methods are simple: they aim to find a good solution quickly rather than guaranteeing the best one.

19
Q

What’s a spike-in transcript, and how is it used?

A

Spike-in transcripts are used when performing RNAseq. Without spike-ins, which have known characteristics and concentrations, RNAseq can only give relative measurements. With spike-ins, RNAseq can be used to quantify the absolute levels of transcripts.
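
A rough sketch of the idea, under strong simplifying assumptions: every read counts equally regardless of transcript length, and the relationship between molecules and reads is linear. The function names and all numbers are hypothetical.

```python
def molecules_per_read(spike_ins):
    """spike_ins: list of (known molecules added, reads observed).
    Returns a single conversion factor from reads to molecules."""
    total_molecules = sum(molecules for molecules, _ in spike_ins)
    total_reads = sum(reads for _, reads in spike_ins)
    return total_molecules / total_reads


def absolute_abundance(reads, factor):
    """Convert a transcript's read count into estimated molecules."""
    return reads * factor


# Hypothetical spike-ins: known input amounts and the reads they produced.
factor = molecules_per_read([(1000, 50), (4000, 200)])
print(absolute_abundance(30, factor))
```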

20
Q

In RNAseq, what is the read depth?

A

Read depth is the total number of reads that have been sequenced.

21
Q

Is the following statement true? “Long and short transcripts with the same number of reads mapped to them are equally active.”

A

No. A long transcript offers more positions for reads to map to, so each copy of it will likely be matched by multiple reads. If a short transcript has the same number of reads as a long transcript, the short one is likely more transcriptionally active.

long transcript, four reads.
————————–transcr.
—- —- — —-

short transcript, four reads
—— transcript
——
——
——
——
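
One common way to correct for this is to normalise read counts by transcript length and read depth, for example as RPKM (reads per kilobase of transcript per million mapped reads). The sketch below uses that measure; the transcript lengths and read totals are hypothetical.

```python
def rpkm(reads, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return reads * 1e9 / (transcript_length_bp * total_mapped_reads)


# Same four reads, different (hypothetical) transcript lengths.
long_transcript = rpkm(4, 4000, 1_000_000)
short_transcript = rpkm(4, 500, 1_000_000)
print(long_transcript, short_transcript)
```

After normalisation the short transcript scores higher than the long one, matching the answer on the card.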