Lecture 3 DA Flashcards by Rodwan Halimi

What is the purpose of an alignment (4)?

Find homologues
See if a homologue has an associated protein structure
Determine function
Determine evolutionary relationships

What is the purpose of multiple sequence alignments (2)?

Elucidate functional information within protein sequences.
Perform evolutionary analysis.

In a pairwise alignment, what does positioning of two amino acids at the same point imply? What is the best way to determine the implication, and can this always be done?

The two residues are equivalent: they perform the same role in the homologous proteins.
This is best determined by structural alignment, in which the amino acids are aligned in 3D. It is not always possible, because a 3D structure may not be available.

What can be done to increase the accuracy of pairwise alignments? What does this reveal? What alignment to perform is usually worthwhile?

Add more sequences to the alignment. This can reveal patterns that are not obvious in a pairwise alignment, so performing a multiple sequence alignment (MSA) is usually worthwhile.

Describe how an MSA is done (6).

Find sequences of interest (e.g. with BLAST).
Prune the set if necessary.
Run a multiple alignment algorithm.
Inspect the output.
Remove disruptive sequences and repeat.
Identify key conserved amino acids.

How is scoring done in MSA (3)?

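The alignment is arranged so that the maximum number of characters in each sequence is matched.
Scores are based on the sum of pairs (SP).
Each column is scored by summing the scores of all possible pairs of characters in it (matches, mismatches, and gaps), and the column scores are summed over the whole alignment.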
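
A minimal Python sketch of sum-of-pairs (SP) scoring, assuming simple illustrative scores (match +1, mismatch -1, gap -2, gap against gap 0) rather than a real substitution matrix such as BLOSUM62. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations

# Illustrative scores (assumed, not from the lecture).
MATCH, MISMATCH, GAP = 1, -1, -2

def pair_score(a: str, b: str) -> int:
    """Score one pair of characters taken from the same column."""
    if a == "-" and b == "-":
        return 0  # gap against gap is conventionally ignored
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def column_score(column: str) -> int:
    """Sum the scores of every possible pair of characters in one column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))

def sum_of_pairs(alignment: list[str]) -> int:
    """Total SP score: column scores summed over all columns."""
    return sum(column_score("".join(col)) for col in zip(*alignment))

alignment = ["GAT-", "GAC-", "G-TA"]
print([column_score("".join(c)) for c in zip(*alignment)])
print(sum_of_pairs(alignment))
```

For this toy alignment the column scores are [3, -3, -1, -4], giving an SP score of -5.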

What are some disadvantages of MSA (2)?

MSA is computationally expensive.
Optimal alignment is already difficult for 4 sequences, and for more than 20 it is infeasible.
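
To see why the cost explodes, assume exact dynamic programming (Needleman-Wunsch extended to N dimensions): for N sequences of length L the table has roughly L^N cells, and each cell is filled from up to 2^N - 1 neighbours. The helper names and the length of 300 are assumptions for illustration.

```python
def dp_cells(n_seqs: int, length: int) -> int:
    """Approximate number of cells in the N-dimensional DP table."""
    return length ** n_seqs

def dp_operations(n_seqs: int, length: int) -> int:
    """Each cell is filled from up to 2^N - 1 neighbouring cells."""
    return dp_cells(n_seqs, length) * (2 ** n_seqs - 1)

# Hypothetical protein length of 300 residues.
for n in (2, 3, 4, 20):
    print(f"{n:>2} sequences: ~{dp_operations(n, 300):.1e} operations")
```

Each extra sequence multiplies the work by roughly 2L, which is why exact MSA quickly becomes infeasible and heuristic methods are used instead.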

How does the CLUSTAL algorithm work (4)?

Begin with pairwise alignments of all the sequences.
Build a phylogenetic guide tree from the pairwise scores.
Align the two most closely related sequences, forming a consensus.
Repeat, adding the next most closely related sequence.
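
A minimal sketch of CLUSTAL's first step and of choosing the first pair to align. As a simplifying assumption, the pairwise distance is a length-normalised edit (Levenshtein) distance rather than the score of a full pairwise alignment, and the guide tree itself is not built here. The sequences and function names are hypothetical.

```python
from itertools import combinations

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]

def pairwise_distances(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Step 1: a distance for every pair, normalised by the longer sequence."""
    return {
        (x, y): edit_distance(seqs[x], seqs[y]) / max(len(seqs[x]), len(seqs[y]))
        for x, y in combinations(seqs, 2)
    }

seqs = {"s1": "MKVLAAGIV", "s2": "MKVLSAGIV", "s3": "MRVLTEGLV", "s4": "QRHLTEPLW"}
dists = pairwise_distances(seqs)
first_pair = min(dists, key=dists.get)  # the most closely related pair
print(first_pair, round(dists[first_pair], 3))
```

Here s1 and s2 differ at one of nine positions, so they would be aligned first; the remaining sequences would then be added in guide-tree order.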

What is an advantage and disadvantage of the CLUSTAL algorithm?

Advantage
- Results in a near-optimal alignment.
Disadvantage
- An error made early on is preserved in the final alignment.

What is a major problem with the CLUSTAL algorithm?

Selecting an appropriate scoring matrix when the alignment contains both divergent and closely related sequences.

From an MSA, what do highly conserved residues suggest?

They may correspond to an active site.
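
One simple way (an assumption, not the lecture's method) to flag highly conserved columns is the fraction of sequences sharing the most common residue in each column. The 0.8 threshold, the toy alignment, and the function names are illustrative.

```python
from collections import Counter

def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences carrying the most common residue in each column."""
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        # A column dominated by gaps is not treated as conserved.
        scores.append(count / n if residue != "-" else 0.0)
    return scores

def conserved_positions(alignment: list[str], threshold: float = 0.8) -> list[int]:
    """1-based column numbers at or above the conservation threshold."""
    return [i + 1 for i, s in enumerate(column_conservation(alignment)) if s >= threshold]

alignment = ["HGDSCK", "HGESCR", "HAD-CK", "HGDSCQ", "HGNTCK"]
print(conserved_positions(alignment))
```

Columns 1 (H), 2 (mostly G) and 5 (C) are flagged; in a real protein family such columns would be candidates for active-site residues.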

From an MSA, where are insertions and deletions often found?

In surface loops.

From an MSA, what do conserved patterns of hydrophobicity with a spacing of 2 indicate? What about 4?

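A spacing of 2 indicates a β-sheet (side chains alternate between the two faces of a strand).

A spacing of 4 indicates an α-helix (about 3.6 residues per turn, so every third or fourth residue lies on the same face).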
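
A crude sketch of reading secondary structure from hydrophobic spacing. The hydrophobic residue set is an assumption (published hydrophobicity scales differ), and spacings of 3 as well as 4 are accepted for helices because a helix has about 3.6 residues per turn. The function names and segments are hypothetical.

```python
# Assumed hydrophobic set for illustration only.
HYDROPHOBIC = set("AVILMFW")

def hydrophobic_spacings(segment: str) -> list[int]:
    """Gaps between successive hydrophobic positions in a conserved segment."""
    positions = [i for i, aa in enumerate(segment) if aa in HYDROPHOBIC]
    return [b - a for a, b in zip(positions, positions[1:])]

def guess_element(segment: str) -> str:
    """Rule of thumb: spacing 2 -> beta-sheet, spacing 3-4 -> alpha-helix."""
    spacings = hydrophobic_spacings(segment)
    if spacings and all(s == 2 for s in spacings):
        return "beta-sheet"
    if spacings and all(s in (3, 4) for s in spacings):
        return "alpha-helix"
    return "unclear"

print(guess_element("VKVEVKV"))    # alternating hydrophobic residues
print(guess_element("AEEEAEEEA"))  # one hydrophobic residue per helical turn
```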

What is a terminal node?

An end point of the tree (a leaf), representing an observed sequence or taxon.

What is an internal node?

Hypothetical ancestor.

What is a root?

The common ancestor of all the taxa in the tree.
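
One hypothetical way to represent these terms in code: a rooted tree as nested tuples, where strings are terminal nodes, every tuple is an internal node, and the outermost tuple is the root. The taxa and helper names are illustrative.

```python
# Strings are terminal nodes; tuples are internal nodes; the outer tuple is the root.
tree = (("human", "chimp"), ("cow", "pig"))

def terminal_nodes(node) -> list[str]:
    """Collect leaf names (the observed taxa or sequences)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in terminal_nodes(child)]

def internal_node_count(node) -> int:
    """Count internal nodes (hypothetical ancestors), including the root."""
    if isinstance(node, str):
        return 0
    return 1 + sum(internal_node_count(child) for child in node)

print(terminal_nodes(tree))       # ['human', 'chimp', 'cow', 'pig']
print(internal_node_count(tree))  # 3
```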

What are the three basic assumptions of cladistics?

Any group of organisms is related by descent from a common ancestor.
There is a bifurcating pattern of cladogenesis (each lineage splits into two).
Changes in characteristics occur in lineages over time.

Can phylogenetic trees be rooted, or unrooted? What sequence is better to use? What can be done with this sequence, and what is it called?

They can be either rooted or unrooted.
It is better to include a sequence that is more divergent from all the other sequences than they are from each other. The tree can be rooted on this sequence, which is called an outgroup.

What is characteristic of fully resolved trees?

They are binary: each internal node splits into exactly 2 branches, with no unresolved multi-way nodes.

What is the problem with increasing the number of taxa?

The number of possible trees increases faster than exponentially, making it hard to know whether the tree drawn is the true tree.
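
The growth can be made concrete with the standard counts for fully resolved trees: (2n - 5)!! unrooted and (2n - 3)!! rooted trees for n taxa, where k!! is the product of every other integer down to 1. A short sketch (function names are illustrative):

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * ... down to 1 (taken as 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, fully resolved trees: (2n - 5)!! for n >= 3."""
    return double_factorial(2 * n_taxa - 5)

def rooted_trees(n_taxa: int) -> int:
    """Number of rooted, fully resolved trees: (2n - 3)!! for n >= 2."""
    return double_factorial(2 * n_taxa - 3)

for n in (4, 5, 10, 20):
    print(n, unrooted_trees(n), rooted_trees(n))
```

Already at 10 taxa there are 2,027,025 possible unrooted trees.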

What are three ways to build trees?

Distance matrix method
Maximum parsimony method
Maximum likelihood method

Describe the distance matrix method.

It is a clustering method for a set of species: choose the two most similar, join them with a node, then add the next most similar, and so on.
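
The clustering described above matches UPGMA (average linkage), though the lecture does not name it, so treat that as an assumption. A minimal sketch that returns only the topology in Newick form (branch lengths omitted); the distance matrix and function name are hypothetical.

```python
def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Average-linkage (UPGMA-style) clustering; returns the topology in Newick form."""
    size = {name: 1 for name in names}  # cluster label -> number of taxa in it
    d = {(a, b): dist[i][j]
         for i, a in enumerate(names) for j, b in enumerate(names) if i < j}

    def get(x: str, y: str) -> float:
        return d[(x, y)] if (x, y) in d else d[(y, x)]

    while len(size) > 1:
        a, b = min(d, key=d.get)        # the two most similar clusters
        new = f"({a},{b})"              # add a node joining them
        sa, sb = size.pop(a), size.pop(b)
        for c in size:                  # size-weighted average distance to the new node
            d[(new, c)] = (get(a, c) * sa + get(b, c) * sb) / (sa + sb)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        size[new] = sa + sb
    return next(iter(size)) + ";"

names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, dist))  # (((A,B),C),D);
```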

What is a disadvantage of the distance matrix method?

It is very simplistic and makes assumptions that may not be true (e.g. that all lineages evolve at the same rate).

Describe neighbour joining. What does it assume?

It builds the tree stepwise and does not assume that all taxa have the same evolutionary rate: it detects rate differences and corrects for them.
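
Neighbour joining chooses which pair to join with the Q-criterion, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), rather than the raw distance; this is how rate differences are corrected for. A sketch of that selection step only (branch lengths and the matrix update are omitted), on a toy matrix; the function name is hypothetical.

```python
from itertools import combinations

def nj_pick_pair(names: list[str], dist: list[list[float]]) -> tuple[str, str, float]:
    """Return the pair minimising the neighbour-joining Q-criterion, and its Q value."""
    n = len(names)
    totals = [sum(row) for row in dist]  # total distance from each taxon to all others
    best = None
    for i, j in combinations(range(n), 2):
        # Subtracting the totals discounts taxa that are far from everything (fast evolvers).
        q = (n - 2) * dist[i][j] - totals[i] - totals[j]
        if best is None or q < best[2]:
            best = (names[i], names[j], q)
    return best

names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
i, j = min(combinations(range(len(names)), 2), key=lambda p: dist[p[0]][p[1]])
print("closest by raw distance:", names[i], names[j])   # d e
print("joined first by NJ:", nj_pick_pair(names, dist))  # ('a', 'b', -50)
```

The raw closest pair (d, e) is not the pair neighbour joining joins first, which shows the correction at work.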

What kind of tree does neighbour joining create?

It creates an unrooted tree. If a rooted tree is needed, an outgroup must be specified.

What is the character-based method based on?

The sequences themselves (individual alignment columns) rather than pairwise distances.

How are trees constructed in character-based methods?

By searching all tree topologies for the one that requires the fewest changes.
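
For one fixed topology, the number of changes is commonly counted with Fitch's small-parsimony algorithm (not named in the lecture, so an assumption here). The sketch scores a single alignment column on rooted binary trees written as nested tuples; a maximum parsimony search repeats this for every column and every candidate topology, which is where the cost comes from. The taxa, states, and function name are hypothetical.

```python
def fitch(node, states: dict[str, str]) -> tuple[set[str], int]:
    """Return (possible ancestral states, number of changes) for a subtree."""
    if isinstance(node, str):  # terminal node: the observed residue
        return {states[node]}, 0
    left, right = node
    s1, c1 = fitch(left, states)
    s2, c2 = fitch(right, states)
    common = s1 & s2
    if common:
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # children disagree: one change is needed

column = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(fitch(tree_1, column)[1], fitch(tree_2, column)[1])
```

Tree 1 needs one change and tree 2 needs two, so tree 1 is the more parsimonious explanation of this column.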

What is a disadvantage of character-based methods?

They are computationally expensive.

What principle is the character-based method based on?

Parsimony (Occam's razor): the simplest explanation, here the tree with the fewest changes, is preferred.

How does maximum likelihood work?

Searches for the tree, under a given evolutionary model, that has the highest likelihood of producing the observed data. Every position in the alignment is scored, and the per-site log-likelihoods are summed.
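
A small worked case, assuming the Jukes-Cantor (JC69) substitution model, which the lecture does not specify: for two aligned DNA sequences, the log-likelihood of a branch length t is the sum of per-site log-likelihoods, and a grid search finds the t that maximises it. For JC69 the result can be checked against the closed form d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. The sequences and function names are made up.

```python
import math

def site_prob(x: str, y: str, t: float) -> float:
    """JC69 probability of bases x and y at the two ends of a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    p_same, p_diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return 0.25 * (p_same if x == y else p_diff)  # 0.25 = equilibrium base frequency

def log_likelihood(s1: str, s2: str, t: float) -> float:
    """Score every alignment position, then sum the log-likelihoods."""
    return sum(math.log(site_prob(x, y, t)) for x, y in zip(s1, s2))

def ml_distance(s1: str, s2: str) -> float:
    """Grid search for the branch length with the highest likelihood."""
    grid = [i / 1000 for i in range(1, 3001)]  # t from 0.001 to 3.0
    return max(grid, key=lambda t: log_likelihood(s1, s2, t))

s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGTACGAACGTTCGTACCT"
p = sum(a != b for a, b in zip(s1, s2)) / len(s1)
print(ml_distance(s1, s2), -0.75 * math.log(1 - 4 * p / 3))
```

Both values come out at about 0.167 substitutions per site (3 differences in 20 sites). Real ML tree searches do this jointly over all branches and topologies, which is why they are expensive.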

What method does maximum likelihood use, and what is its disadvantage?

It uses a probabilistic substitution model. It is computationally expensive.

What is bootstrapping?

A way of statistically validating the tree.

How does bootstrapping work?

The alignment columns are resampled with replacement (usually 1,000 times), a tree is built from each replicate, and the number of times each node appears is reported.
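
A minimal sketch of nonparametric bootstrapping. As a simplifying assumption, the tree built for each replicate is reduced to one question, which pair of sequences is closest by Hamming distance, so the count is the support for that single grouping; a real analysis rebuilds the whole tree each time. The alignment, seed, and function names are hypothetical.

```python
import random
from itertools import combinations

def resample_columns(alignment: list[str], rng: random.Random) -> list[str]:
    """One bootstrap replicate: draw columns with replacement, keeping the length."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def closest_pair(names: list[str], alignment: list[str]) -> tuple[str, str]:
    """Stand-in for tree building: the pair of sequences with the fewest differences."""
    def hamming(i: int, j: int) -> int:
        return sum(a != b for a, b in zip(alignment[i], alignment[j]))
    # min() breaks ties in favour of the first pair listed.
    i, j = min(combinations(range(len(names)), 2), key=lambda p: hamming(*p))
    return names[i], names[j]

def bootstrap_support(names, alignment, pair, replicates=1000, seed=1) -> int:
    """Number of replicates in which `pair` comes out as the closest pair."""
    rng = random.Random(seed)
    return sum(closest_pair(names, resample_columns(alignment, rng)) == pair
               for _ in range(replicates))

names = ["A", "B", "C", "D"]
alignment = ["ACGTTGCAAC", "ACGTTGCATC", "ACCTAGGATC", "TCCTAGGTTG"]
print(bootstrap_support(names, alignment, ("A", "B")), "/ 1000")
```

A and B differ at a single column, so they group together in most replicates; the exact count depends on the seed.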

What bootstrap values give a 95% probability that a node is in the correct position?

The statistics are hard to interpret. If a node is present in 700 to 1,000 of 1,000 replicates (70% or more), there is a 95% probability that it is in the correct position.

What are α-globin sequence analyses used for?

To estimate divergence times using a molecular clock.

What is used as a calibration point for divergence times?

The human-cow split, about 80 million years ago (mya).

What is assumed when estimating divergence times? Is it true?

A linear relationship between time and the accumulation of mutations. This is not entirely true, but it works for models and forms the basis of the molecular clock.
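
Under the linear-clock assumption, divergence times follow by simple proportion from the human-cow calibration. A sketch with hypothetical distances (substitutions per site between two sequences, so both lineages are counted); these are placeholders, not measured α-globin values, and the function names are illustrative.

```python
CALIBRATION_MYA = 80.0  # human-cow split, from the lecture

def clock_rate(calibration_distance: float, calibration_mya: float = CALIBRATION_MYA) -> float:
    """Pairwise distance accumulated per million years (both lineages combined)."""
    return calibration_distance / calibration_mya

def divergence_time(distance: float, rate: float) -> float:
    """Time since divergence (mya) implied by a pairwise distance at that rate."""
    return distance / rate

# Hypothetical pairwise distances between alpha-globin sequences.
rate = clock_rate(0.16)               # human vs cow
print(divergence_time(0.08, rate))    # half the distance implies half the time
```

With these made-up numbers, a distance of 0.08 corresponds to about 40 mya; any deviation from a constant rate translates directly into an error in the estimated time.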

(37 cards)