Genome sequencing and sequence analysis II Flashcards

Question 1

Q

What’s the core information contained in protein data banks?

Answer

A

The X, Y and Z coordinates for the atoms in proteins.

There are 150 000 proteins logged; many of them have similar structures.

Question 2

Q

Define sequence similarity.

Answer

A

Sequence similarity is the number of identical and similar residues in an alignment.
“Similar” residues have similar chemical properties and thus interact in the same way.
Similarity is the product of sequence invariance and sequence (functional!) conservation.

Question 3

Q

Define sequence homology.

Answer

A

Two sequences are homologous if it can be concluded that they derive from a common evolutionary ancestor.

Question 4

Q

Define sequence identity.

Answer

A

The extent to which two sequences are invariant (same nt at the same aligned position).
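
A minimal sketch of how identity could be computed for two sequences that are already aligned. The function name and the choice of denominator (all aligned columns, gaps included) are assumptions for illustration; other conventions exist.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns that hold the same symbol in both sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to equal length")
    # Ignore columns where both sequences have a gap ('-').
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A gap opposite a residue counts as a non-identical column.
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


print(percent_identity("TATCTAAA", "TATCTAAT"))  # 7 of 8 columns identical
```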

Question 5

Q

Define sequence conservation.

Answer

A

The sequences may have different nucleotide bases; however, the encoded amino acids have similar chemical interactions, and thus the functionality is conserved.

Question 6

Q

What’s the general threshold of DNA sequence similarity required to claim that two proteins share the same general structure?

Answer

A

20-30%

Form gives rise to function. The proteins may very well differ at a greater level of detail.

Question 7

Q

Why are global alignments not necessarily positive?

Answer

A

Global alignments look at sequence similarity over the entire sequence. Often, DNA segments (which correspond to different protein domains) have been shuffled around, still creating the same product but with the domains in a different order. These reshufflings don’t change the protein functionality, only the DNA sequence, yet they lower the global alignment score.

Question 8

Q

Why is it logical to penalise a 1nt gap proportionally more than each nucleotide in a longer gap?

The first gapped nt is penalised hardest.
TATCTAAA
vs
TATC**AT

Answer

A

It makes sense because a 1nt gap likely has the same functional consequence as a longer gap. Opening a gap is therefore penalised more heavily than extending one.
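
This idea is usually implemented as an affine gap penalty. The sketch below assumes one common convention (opening cost for the first gapped nt, a smaller extension cost for each further one); the default values are placeholders, not settings from any particular tool.

```python
def affine_gap_penalty(length: int, gap_open: float = 10.0, gap_extend: float = 1.0) -> float:
    """Penalty for one gap of `length` nt: the first nt costs gap_open,
    every additional nt costs gap_extend (assumed convention)."""
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


for n in (1, 2, 5):
    total = affine_gap_penalty(n)
    # The per-nt cost shrinks as the gap gets longer.
    print(n, total, total / n)
```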

Question 9

Q

What’s the concept behind amino acid substitution matrices?

Answer

A

A substitution matrix gives a score for every pair of amino acids, so you can read off how well conserved a position is. The pairs are compared chemically, to see whether the two amino acids can interact in similar manners. If not, the penalty is high. If yes, the score may still be positive even though the residues differ.
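
A small sketch of how such a matrix is used to score an ungapped alignment. The scores below are invented toy values chosen only to show the principle; they are NOT taken from BLOSUM, PAM or any other published matrix.

```python
# Toy scores, invented for illustration only (not a real substitution matrix).
TOY_SCORES = {
    ("L", "I"): 2,   # both hydrophobic: similar, so still positive
    ("D", "E"): 2,   # both acidic
    ("K", "R"): 2,   # both basic
    ("L", "D"): -4,  # hydrophobic vs. acidic: heavily penalised
}
MATCH = 4              # assumed score for identical residues
DEFAULT_MISMATCH = -2  # assumed score for pairs not listed above


def pair_score(a: str, b: str) -> int:
    """Look up the score for one residue pair; the matrix is symmetric."""
    if a == b:
        return MATCH
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), DEFAULT_MISMATCH))


def alignment_score(s1: str, s2: str) -> int:
    """Sum the pair scores over an ungapped alignment of equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(s1, s2))


print(alignment_score("LKD", "IRE"))  # all substitutions are conservative
print(alignment_score("LKD", "DKL"))  # two non-conservative substitutions
```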

Question 10

Q

What do you use blastn for?

Answer

A

DNA –> DNA search

Question 11

Q

What do you use blastp for?

Answer

A

Protein –> protein search.

Question 12

Q

What do you use blastx for?

Answer

A

DNA –> protein search (a translated DNA query against a protein database).

Question 13

Q

What do you use tblastn for?

Answer

A

Protein –> DNA search (a protein query against a translated DNA database).

Question 14

Q

What do you use tblastx for?

Answer

A

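Translated DNA –> translated DNA search (both the DNA query and the DNA database are translated to protein before they are compared).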

Question 15

Q

Explain the blast search strategy (protein-protein search).

Answer

A

Make a list of k-letter words from the query (words of 2, 3 or 6 aa).
Set a word score threshold (the alignment score a word needs to be counted as aligned).
Search for your words in other protein sequences.
When BLAST identifies a word match, the search extends in both directions until the alignment score drops below a certain level (90% of BLAST processing is spent here).
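
The steps above can be sketched as a toy seed-and-extend search. This is a simplification, not BLAST itself: it only accepts exact word matches (no word score threshold), scores with +1/-1 instead of a substitution matrix, and extends without gaps. All function names and parameter values are hypothetical.

```python
def words(seq: str, k: int = 3):
    """All overlapping k-letter words in seq, with their start positions."""
    return [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]


def find_seeds(query: str, subject: str, k: int = 3):
    """Positions (q, s) where a query word occurs exactly in the subject."""
    index = {}
    for word, pos in words(subject, k):
        index.setdefault(word, []).append(pos)
    return [(q, s) for word, q in words(query, k) for s in index.get(word, [])]


def extend(query, subject, q, s, k=3, match=1, mismatch=-1, drop=2):
    """Extend a seed in both directions without gaps. Each direction stops
    once the running score falls `drop` below the best score seen so far."""
    def one_direction(qi, si, step):
        best = score = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score >= drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = one_direction(q + k, s + k, +1)
    left_score, left_len = one_direction(q - 1, s - 1, -1)
    total = k * match + left_score + right_score
    return total, query[q - left_len:q + k + right_len]


query, subject = "MKTAYIAK", "GGMKTAYLAKGG"
for q, s in find_seeds(query, subject):
    print((q, s), extend(query, subject, q, s))
```

In this example every seed extends to the same segment, which is why real implementations avoid re-extending seeds that fall inside an alignment already found.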

Question 16

Q

Define ‘E-value’ and ‘S’ in the context of BLAST-searching.

Answer

A

E-value: The number of sequence alignments with an alignment score at least as high as the threshold (S) that are EXPECTED TO OCCUR BY CHANCE in a search of this size.
S: The alignment score used as that threshold.
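
The relation between E and S is commonly written with the Karlin-Altschul formula, E = K * m * n * exp(-lambda * S), where m is the query length and n the database length. The sketch below only illustrates that formula; the values of K and lambda are placeholders (in practice they depend on the scoring system), and the function name is hypothetical.

```python
import math


def expected_chance_hits(score: float, query_len: int, db_len: int,
                         K: float = 0.1, lam: float = 0.3) -> float:
    """E = K * m * n * exp(-lambda * S); K and lam are placeholder values."""
    return K * query_len * db_len * math.exp(-lam * score)


# A higher score S gives a smaller E-value; a larger database gives a larger one.
print(expected_chance_hits(50, 300, 1_000_000))
print(expected_chance_hits(100, 300, 1_000_000))
```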

Question 17

Q

What are multiple sequence alignment (MSA) methods?

Answer

A

Methods that align three or more sequences at once. They are used for bioinformatic evolutionary biology.

Question 18

Q

What’s a heuristic method?

Answer

A

A method which relies on an algorithm that finds short high-scoring matches and then extends its search in both directions.

Heuristic methods are simple: they aim at finding a good solution quickly, without guaranteeing that it is the best one.

Question 19

Q

What’s a spike-in transcript, and how is it used?

Answer

A

Spike-in transcripts are used when performing RNAseq. Without spike-ins, which have known characteristics and concentrations, RNAseq can only give relative measurements. With spike-ins, RNAseq can be used to quantify the absolute levels of transcripts.

Question 20

Q

In RNAseq, what is the read depth?

Answer

A

Read depth is the total number of reads that have been sequenced.

Question 21

Q

Is the following statement true? “Long and short transcripts with the same number of reads mapped to them are equally active.”

Answer

A

No. A long transcript will likely be matched by more reads simply because it is longer. A short transcript with the same number of reads as a long transcript is therefore likely more transcriptionally active.

long transcript, four reads.
————————–transcr.
—- —- — —-

short transcript, four reads
—— transcript
——
——
——
——
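
The comparison above can be made concrete by normalising read counts for transcript length, and for read depth as in RPKM-style measures. The transcript lengths and read totals below are hypothetical numbers chosen for illustration, and the function names are not from any library.

```python
def reads_per_kilobase(read_count: int, transcript_length_nt: int) -> float:
    """Reads mapped to a transcript, scaled to a length of 1000 nt."""
    return read_count / (transcript_length_nt / 1000)


def rpkm(read_count: int, transcript_length_nt: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    per_kb = reads_per_kilobase(read_count, transcript_length_nt)
    return per_kb / (total_mapped_reads / 1_000_000)


# Four reads each, but the short transcript is covered far more densely.
print(reads_per_kilobase(4, 4000))  # long transcript (hypothetical 4000 nt)
print(reads_per_kilobase(4, 500))   # short transcript (hypothetical 500 nt)
```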