Struggle Flashcards
(36 cards)
What is bioinformatics?
It is the analysis and conceptualisation of complex biological information.
What is BLOSUM62?
BLOSUM62 is a substitution matrix used to score protein sequence alignments. BLOSUM matrices score alignments between evolutionarily divergent protein sequences and are derived from ungapped local alignments (conserved blocks). The 62 means that sequences sharing 62% identity or more were clustered together before substitution frequencies were counted.
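A minimal sketch of how a substitution matrix scores an ungapped alignment. Only a small hand-copied subset of BLOSUM62 entries is included, and the names `BLOSUM62_SUBSET`, `pair_score` and `ungapped_score` are made up for this card; a real analysis would load the full 20x20 matrix.

```python
# A few BLOSUM62 entries (the matrix is symmetric, so each pair is stored once,
# with the two residues in alphabetical order). This is NOT the full matrix.
BLOSUM62_SUBSET = {
    ("A", "A"): 4,
    ("W", "W"): 11,
    ("I", "L"): 2,
    ("D", "E"): 2,
    ("K", "R"): 2,
    ("A", "W"): -3,
}


def pair_score(a, b):
    """Look up the substitution score for residues a and b."""
    return BLOSUM62_SUBSET[tuple(sorted((a, b)))]


def ungapped_score(seq1, seq2):
    """Sum the substitution scores over an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


print(ungapped_score("AWLD", "AWIE"))  # 4 + 11 + 2 + 2 = 19
```

Conservative swaps such as L/I or D/E still score positively, while an unlikely swap such as A/W is penalised.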
Explain Affine Gap Penalties
A scheme for penalising insertions/deletions in which opening a gap and extending it are charged separately, so the total penalty depends on the length of the gap. Opening a gap costs more than extending it.
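A small sketch of the affine penalty for a single gap. The values 10 and 1 are illustrative assumptions, and this version charges the opening cost for the first gap position and the extension cost for each further position; some tools instead charge opening plus extension for every position.

```python
def affine_gap_penalty(length, gap_open=10, gap_extend=1):
    """Penalty for one gap spanning `length` positions."""
    if length <= 0:
        return 0
    # Pay the opening cost once, then the cheaper extension cost for the rest.
    return gap_open + gap_extend * (length - 1)


one_long_gap = affine_gap_penalty(3)          # 10 + 1 + 1 = 12
three_short_gaps = 3 * affine_gap_penalty(1)  # 10 + 10 + 10 = 30
print(one_long_gap, three_short_gaps)
```

One long gap is cheaper than several short ones, which matches the idea that a single insertion or deletion event can span many residues.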
What is In Silico?
An analysis, such as ligand analysis, performed on a computer rather than in the lab or in a living organism.
Explain BLAST
BLAST (Basic Local Alignment Search Tool) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences. It uses a heuristic to speed up computation.
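A simplified sketch of the seeding idea behind the heuristic: instead of aligning everything, index short words of the subject and look up the words of the query. This only finds exact word matches; the real tool also scores neighbouring words and extends the seeds, which is not shown. The function names are made up for this card.

```python
from collections import defaultdict


def index_words(subject, k):
    """Map every length-k word in the subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = index_words(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


print(find_seeds("ACGTT", "TTACGTA", k=3))  # [(0, 2), (1, 3)]
```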
What is Dynamic Programming?
Dynamic programming solves a complex problem by breaking it into smaller subproblems and reusing their solutions. In sequence alignment it scores every cell of a matrix to find the optimal alignment. It is slow compared with heuristic methods. The steps are: 1. initialisation 2. scoring the matrix 3. traceback
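The three steps can be sketched as a global (Needleman-Wunsch style) alignment. The scoring values (match 1, mismatch -1, linear gap -2) are assumptions chosen to keep the example short; a protein alignment would normally use a substitution matrix and affine gaps.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty. Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # 1. Initialisation: first row and column are all-gap prefixes.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # 2. Scoring the matrix: each cell takes the best of three moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # 3. Traceback: walk back from the bottom-right cell.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The matrix has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths, which is why database searches fall back on heuristics.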
Protein vs DNA
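Protein has an alphabet of 20 characters rather than 4. The genetic code is degenerate (several codons encode the same amino acid), so the DNA can change while the protein stays the same. Protein comparison therefore offers a longer look back in time.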
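A short illustration of degeneracy: two DNA sequences that differ at several positions but translate to the same protein. `CODON_SUBSET` holds only the handful of standard-code codons needed here, and the two sequences are invented for the example.

```python
# Subset of the standard genetic code (only the codons used below).
CODON_SUBSET = {
    "ATG": "M",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
    "AAA": "K", "AAG": "K",
    "GAA": "E", "GAG": "E",
}


def translate(dna):
    """Translate a DNA string codon by codon using CODON_SUBSET."""
    return "".join(CODON_SUBSET[dna[i:i + 3]] for i in range(0, len(dna), 3))


def identity(s1, s2):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(s1, s2)) / len(s1)


dna1 = "ATGCTTAAAGAA"
dna2 = "ATGTTGAAGGAG"
print(translate(dna1), translate(dna2))   # MLKE MLKE
print(identity(dna1, dna2))               # below 1.0 at the DNA level
print(identity(translate(dna1), translate(dna2)))  # 1.0 at the protein level
```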
Paralogs
Homologous genes that arose from a gene duplication event.
Why is DNA used?
To identify cDNA and non-coding regions of DNA, and to identify DNA polymorphisms.
Types of Algorithms?
- Uninformative
- Ungapped
- Gapped
Describe a hierarchical approach?
- Different groups are given a chromosome to sequence
- The groups generate bacterial artificial chromosomes (BACs)
- Each BAC is divided and shotgun sequenced
- High-fidelity maps identify motifs and allow detection of overlapping sequences.
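The overlap-detection step in the list above can be sketched as a suffix-prefix comparison between two reads. This is a toy illustration with made-up reads and function names; real assemblers tolerate sequencing errors and work on many reads at once.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a, b, min_len=3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None


print(overlap("ATTAGACC", "GACCTGA"))  # 4 (shared "GACC")
print(merge("ATTAGACC", "GACCTGA"))    # ATTAGACCTGA
```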
How many genes were found?
51k
How many genes are protein-coding?
20k
How many genes are non-coding?
20k
What are pseudogenes?
Genes that appear to be protein-coding but that mutation has rendered non-coding. 18k found.
How many mRNAs were found, and what does this mean?
98k, meaning that on average about 5 mRNAs are made for every protein-coding gene.
Why are MSA done?
To elucidate functional information about proteins and to perform evolutionary analysis.
How are alignments scored?
- The goal is to match as many residues as possible across all the sequences.
- Scoring is done with Sum of Pairs
- Each column is scored by summing the match, mismatch and gap scores of every possible pair of sequences in that column.
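The Sum of Pairs scoring described above can be sketched as follows. The match, mismatch and gap values are assumptions for illustration, and a gap paired with a gap is scored 0 here, which is a common convention but not the only one.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols taken from a single alignment column."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def column_score(column):
    """Sum the scores of every possible pair of symbols in the column."""
    return sum(pair_score(x, y) for x, y in combinations(column, 2))


def sum_of_pairs(alignment):
    """Total Sum of Pairs score of an alignment given as equal-length strings."""
    return sum(column_score(col) for col in zip(*alignment))


msa = ["ACG", "A-G", "ATG"]
print([column_score(col) for col in zip(*msa)])  # [3, -5, 3]
print(sum_of_pairs(msa))                         # 1
```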
What is the E-value?
The Expect value (E) is a parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size. It decreases exponentially as the Score (S) of the match increases.
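The exponential relationship can be sketched with the Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m is the query length and n the database length. The default `k` and `lam` values below are illustrative assumptions; in practice they depend on the scoring matrix and gap penalties.

```python
import math


def e_value(score, query_len, db_len, k=0.041, lam=0.267):
    """Expected number of chance hits scoring at least `score` (illustrative parameters)."""
    return k * query_len * db_len * math.exp(-lam * score)


for s in (30, 40, 50, 60):
    print(s, e_value(s, query_len=100, db_len=1_000_000))
```

A higher score gives an exponentially smaller E-value, and doubling the database size doubles the E-value for the same score.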
What is the Affine Gap penalty?
A large penalty for opening a gap and a smaller penalty for extending it.
Neighbour Joining
Similar to UPGMA, but corrects for unequal evolutionary rates between lineages. Creates an unrooted tree.
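A sketch of the rate correction for one joining step: rather than picking the smallest raw distance as UPGMA does, Neighbour Joining picks the pair with the smallest Q value, Q(i,j) = (n - 2) * d(i,j) - sum of row i - sum of row j. The distance matrix is an invented five-taxon example, and updating the matrix after the join is not shown.

```python
def q_matrix(dist):
    """Neighbour Joining Q matrix for a symmetric distance matrix."""
    n = len(dist)
    totals = [sum(row) for row in dist]
    return [
        [0 if i == j else (n - 2) * dist[i][j] - totals[i] - totals[j] for j in range(n)]
        for i in range(n)
    ]


def closest_pair(dist):
    """Indices (i, j) of the pair with the lowest Q value, joined first."""
    q = q_matrix(dist)
    n = len(dist)
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    return min(pairs, key=lambda p: q[p[0]][p[1]])


# Invented distances between taxa a, b, c, d, e.
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(closest_pair(dist))  # (0, 1)
```

Taxa d and e have the smallest raw distance (3), yet a and b are joined first because their Q value is lower; that is the correction at work.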
What are some ways to create a tree?
- Distance Matrix Method
- Maximum parsimony method
- Maximum likelihood method
What is bootstrapping?
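A way of statistically validating a tree.
The data (alignment columns) is resampled with replacement and the tree is rebuilt for each resample; branches that recur in most resamples are well supported.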
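A sketch of the resampling step only: columns of the alignment are drawn with replacement to build pseudo-alignments of the same size. The function names and the toy alignment are made up; rebuilding a tree from each replicate and counting how often each branch appears is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original width."""
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate several resampled alignments from one seeded random generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


msa_rows = ["ACGTAC", "ACGTTC", "AGGTAC"]
replicates = bootstrap_replicates(msa_rows, n_replicates=3)
for rep in replicates:
    print(rep)
```

Each replicate would then be passed to the same tree-building method as the original data.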
How is MSA measured?
ClustalW is used to build the alignment, guided by a phylogenetic tree. The alignment is scored with Sum of Pairs. The method is a heuristic.