Lecture 3 - RH Flashcards Preview


Flashcards in Lecture 3 - RH Deck (14):

Why do we perform alignments?

To find homologues

To see whether a homologue is associated with a known protein structure

To determine function

To determine evolutionary relationships


How much information can be obtained from aa sequence, pair of homologues, and many aligned sequences?

aa sequence = very little

pair of homologues = some info

many aligned sequences = a lot of information


Why are MSAs performed?

To elucidate functional info within protein sequence

To perform evolutionary analysis


In a pairwise alignment what does the positioning of 2 aa's at the same point imply?

That they have the same role in homologous proteins


What happens when more sequences are added to an alignment?

The alignment becomes more accurate and reveals features that are not obvious in a pairwise alignment


How is a MSA performed?

Find sequences you wish to align

Prune them if necessary

Run multi-alignment algorithm

Inspect the output

Remove disruptive sequences and repeat

Identify conserved aa's
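
The last step above (identifying conserved aa's) can be sketched as a simple column scan. This is a minimal illustration only: the function name `conserved_columns`, the `threshold` parameter, and the toy sequences are assumptions made for the example, not part of the lecture.

```python
from collections import Counter

def conserved_columns(alignment, threshold=1.0):
    """Return 0-based indices of columns where the most common residue
    (gaps excluded) occurs in at least `threshold` of all sequences."""
    n_seqs = len(alignment)
    hits = []
    for i, column in enumerate(zip(*alignment)):
        counts = Counter(res for res in column if res != "-")
        if not counts:
            continue  # column contains only gaps
        top_count = counts.most_common(1)[0][1]
        if top_count / n_seqs >= threshold:
            hits.append(i)
    return hits

# Toy alignment (made-up sequences of equal length).
toy = [
    "MKH-LE",
    "MRHALE",
    "MKHGLD",
]
print(conserved_columns(toy))        # fully conserved columns
print(conserved_columns(toy, 0.6))   # columns conserved in at least 60% of sequences
```

Lowering the threshold trades strictness for sensitivity, which may help when one divergent sequence would otherwise hide a conserved position.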


How are alignments scored?

Alignment arranged so a maximum number of characters in each sequence are matched

Scoring is done according to the Sum of Pairs (SP)

Each column is scored by summing all possible matches, mismatches, and gaps.
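
A minimal sketch of Sum of Pairs scoring, assuming a toy scheme of +1 for a match, -1 for a mismatch, and -2 for a residue paired with a gap (real tools would normally use a substitution matrix instead). Gap-gap pairs are scored 0 here, which is a common convention but still an assumption.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 1, -1, -2  # assumed toy values, not from the lecture

def pair_score(a, b):
    """Score one pair of characters taken from the same column."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are ignored in this sketch
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def column_sp(column):
    """Sum the scores of all possible pairs within one column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))

def sp_score(alignment):
    """Sum of Pairs score of the whole alignment (sum over columns)."""
    return sum(column_sp(col) for col in zip(*alignment))

toy = ["AC-T", "AG-T", "ACGT"]
for i, col in enumerate(zip(*toy)):
    print(i, "".join(col), column_sp(col))
print("total SP:", sp_score(toy))
```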


What does the sum of pairs result indicate?

2^n, where n is the final Sum of Pairs score. This number represents how many times more likely it is that the sequences match because of homology than by pure chance.
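
As a quick arithmetic illustration, assuming the score n is read as a log-odds value in bits (base 2), the conversion to an odds ratio is just a power of two. The function name `odds_from_bits` is made up for this example.

```python
def odds_from_bits(score_bits):
    """Convert a score in bits to an odds ratio (homology : chance)."""
    return 2 ** score_bits

for n in (1, 5, 10):
    print(f"score {n} -> {odds_from_bits(n)} times more likely than chance")
```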


What can MSAs tell us?

Most highly conserved residues may correspond to the active site.

Insertions and deletions are probably in surface loops

Conserved pattern of hydrophobicity with spacing 2 may indicate beta sheet

The same pattern with spacing 4 may indicate an alpha helix


What are some ways to construct phylogenetic trees?

Distance matrix methods

Maximum parsimony methods

Maximum likelihood methods


What is Neighbour Joining?

Similar to UPGMA using stepwise build

Corrects for evolutionary rate

Creates an unrooted tree
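
A sketch of the first step of Neighbour Joining, using the standard Q criterion Q(i,j) = (n-2)·d(i,j) - Σd(i,k) - Σd(j,k). The row sums are what correct for unequal evolutionary rates. The distance matrix and the function name `nj_first_join` are illustrative assumptions; a full implementation would repeat this step on a reduced matrix until the tree is complete.

```python
def nj_first_join(names, d):
    """Pick the first pair of taxa that Neighbour Joining would merge.

    Returns (name_i, name_j, q_value, branch_len_i, branch_len_j)."""
    n = len(names)
    row_sums = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q penalises pairs that are far from all other taxa.
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    # Branch lengths from each joined taxon to the new internal node.
    len_i = d[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = d[i][j] - len_i
    return names[i], names[j], q, len_i, len_j

# Toy symmetric distance matrix for five taxa (illustrative values).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(names, dist))
```

Note that d and e have the smallest raw distance (3), yet a and b are joined first. UPGMA would pick d and e, which is where the rate correction makes a difference.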


What is character based maximum parsimony? What is the problem with this method?

Based on sequence characters rather than distances

Trees are constructed by searching all possible tree topologies and looking for one with the least changes

Problem: Computationally expensive and so not all sites are used
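
To make "least changes" concrete, here is a small sketch that counts the minimum number of changes on a given tree using the Fitch algorithm (a standard way to score parsimony, named here as an assumption since the cards do not specify one) and compares the three possible topologies for four taxa. The sequences and tree encoding (nested tuples) are made up for the example.

```python
def fitch(tree, states):
    """Return (possible_states, n_changes) for one site.

    tree is a leaf name (str) or a tuple (left_subtree, right_subtree)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Total number of changes over all sites of the alignment."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total

# Toy alignment (made-up sequences).
aln = {"A": "AGCT", "B": "AGCA", "C": "TGTA", "D": "TGTT"}

# The three possible groupings of four taxa.
topologies = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
for t in topologies:
    print(t, parsimony_score(t, aln))
```

With four taxa there are only three topologies to check; the number grows very quickly with more taxa, which is why the exhaustive search is expensive.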


What is maximum likelihood based on?

Searches for the evolutionary model that has the highest likelihood of producing the observed data

Uses a substitution model that incorporates probability.

In practice every position in alignment is scored based on probability.
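
A minimal sketch of scoring every position by probability, reduced to the simplest case: two DNA sequences and one parameter, the distance d between them. It assumes the Jukes-Cantor (JC69) substitution model, which the cards do not name; real maximum likelihood tree building does the same kind of calculation over whole trees. All names and sequences here are illustrative.

```python
import math

def jc69_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned DNA sequences separated by distance d
    (expected substitutions per site) under the JC69 model."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability a site is unchanged
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    total = 0.0
    for x, y in zip(seq1, seq2):
        # 0.25 is the equal base frequency assumed by JC69.
        total += math.log(0.25 * (p_same if x == y else p_diff))
    return total

seq1 = "ACGTACGTAC"
seq2 = "ACGTTCGTAA"  # differs from seq1 at 2 of 10 sites

# Crude search: try a grid of distances and keep the most likely one.
grid = [i / 100 for i in range(1, 201)]
best_d = max(grid, key=lambda d: jc69_log_likelihood(seq1, seq2, d))

# Closed-form JC69 estimate for comparison.
p = sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)
analytic_d = -0.75 * math.log(1 - 4 * p / 3)

print("grid search:", best_d)
print("analytic:   ", round(analytic_d, 4))
```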


What is bootstrapping and what is it used for?

A method of statistically validating a tree.

Data is resampled (generally 1000 times), each time after being slightly altered

The statistics are hard to define, but as a rule of thumb, if a node is present 700 times out of 1000, that is taken to mean a 95% probability that it is in the correct position.

*Low bootstrap numbers are bad news
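
A sketch of the resampling idea, assuming the usual scheme in which alignment columns are drawn at random with replacement to build each replicate. Building a tree for every replicate is beyond a short example, so `first_two_closest` is a hypothetical stand-in that only asks whether sequences 0 and 1 are each other's closest relatives; all names and sequences are made up.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Build one replicate by sampling columns with replacement."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

def bootstrap_support(alignment, has_clade, n_reps=1000, seed=0):
    """Percentage of replicates in which has_clade(replicate) is True."""
    rng = random.Random(seed)
    hits = sum(bool(has_clade(bootstrap_replicate(alignment, rng)))
               for _ in range(n_reps))
    return 100.0 * hits / n_reps

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def first_two_closest(aln):
    """Stand-in for tree building (an assumption of this sketch): True if
    sequences 0 and 1 are closer to each other than to any other sequence."""
    d01 = hamming(aln[0], aln[1])
    others = [hamming(aln[i], aln[j]) for i in (0, 1) for j in range(2, len(aln))]
    return d01 < min(others)

toy = [
    "ACGTACGTACGT",
    "ACGTACGAACGT",
    "ACCTTCGAAGGT",
    "TCCTTCGAAGGA",
]
print("support (%):", bootstrap_support(toy, first_two_closest))
```

Fixing the seed makes the run repeatable; a different seed would give a slightly different percentage.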