JEFFRIES Flashcards

(108 cards)

1
Q

name 3 purposes of phylogenetic trees:

A

visualisation of evolution
test hypotheses
track trait/gene changes over time

2
Q

what is tokogeny?

A

the non-hierarchical (reticulate) genetic relationships between individuals within a species, such as parent-offspring

3
Q

how do phylogenies differ from pedigrees?

A

phylogenies represent species evolution over time whereas pedigrees track familial relationships within a species

4
Q

define anagenesis:

A

evolutionary change within a single lineage without branching (traits change over time but no new lineage splits off)

5
Q

define cladogenesis:

A

branching evolution which leads to speciation and increased biodiversity

6
Q

what is a monophyletic group?

A

a group containing a common ancestor and all of its descendants

7
Q

what is a paraphyletic group?

A

a group that includes a common ancestor but not all of its descendants

8
Q

what is a polyphyletic group?

A

a group whose members descend from different ancestors and which excludes their most recent common ancestor

9
Q

name 3 tree construction methods:

A

phenetics: similarity
cladistics: shared derived traits
molecular approaches: DNA/protein

10
Q

what are terminal and internal nodes?

A

terminal nodes = current species/genes
internal nodes = common ancestors

11
Q

what is a clade?

A

a monophyletic group consisting of an ancestor and all of its descendants

12
Q

what is a cladogram?

A

a tree showing only branching order; branch lengths do not represent the amount of evolutionary change

13
Q

what is a phylogram?

A

a tree with branch lengths representing evolutionary change

14
Q

what is an ultrametric tree?

A

a tree in which all tips are equidistant from the root, so branch lengths represent time

15
Q

what distinguishes additive trees?

A

branch lengths can be summed to show total evolutionary distance

16
Q

what is an OTU?

A

an operational taxonomic unit: the study-defined unit placed at the tips of the tree, e.g. a species or a strain

17
Q

what is a fully resolved tree?

A

a strictly bifurcating tree with no polytomies, so every relationship is resolved

18
Q

what is the difference between hard and soft polytomies?

A

soft = uncertainty in data
hard = true rapid lineage divergence

19
Q

when are network trees used?

A

for complex evolution e.g. horizontal gene transfer

20
Q

what’s the difference between rooted and unrooted trees?

A

rooted trees show direction (ancestry); unrooted show relationships only

21
Q

name 3 methods to root a tree:

A

outgroup rooting
clock rooting
paralogue rooting

22
Q

what do branch lengths in a phylogram represent?

A

amount of evolutionary change e.g. substitutions per site

23
Q

what is paleogenetics?

A

study of ancient DNA to understand extinct species and their evolutionary relationships

24
Q

what are the 2 molecular evolution approaches?

A

species to molecule: use known species histories to study how molecules evolve
molecule to species: infer species evolution from molecules

25
list 3 types of molecular data used in evolution:
DNA sequences, protein sequences and genetic rearrangements
26
what determines the utility of loci in phylogenetics?
their rate of change: fast-evolving loci suit recent events and slow-evolving loci suit deep divergences
27
how do nuclear and organelle genomes differ in evolutionary studies?
nuclear = varied rates and broad utility
Mt/Chl (mitochondrial/chloroplast) = stable, maternal and suited for recent/intermediate divergence
28
what is the difference between orthologues and paralogues?
orthologues arise from speciation and paralogues from gene duplication
29
what are out-paralogues?
gene copies from a duplication event predating speciation and inherited by both species
30
what problem can multiple duplications cause in phylogenetics?
complex relationships and misleading evolutionary trees
31
how do different evolution rates mislead phylogenies?
slowly evolving genes may look more similar within species than to orthologues in other species
32
why might gene trees and species trees differ?
due to HGT, incomplete lineage sorting and hybrid speciation
33
what is incomplete lineage sorting?
when genetic variation present in a common ancestor is not fully sorted into descendant species
34
what is the difference between homology and homoplasy?
homology = similarity due to shared ancestry
homoplasy = similar traits not due to shared ancestry, e.g. from convergence
35
define apomorphy, plesiomorphy and symplesiomorphy:
apomorphy: a derived character state
plesiomorphy: an ancestral character state
symplesiomorphy: a shared ancestral character state
36
how do autapomorphy and synapomorphy differ?
autapomorphy is unique to one taxon whereas synapomorphy is shared among related taxa and informs phylogeny
37
what's the difference between discrete and continuous characters?
discrete = categorical (e.g. base type)
continuous = measurable (e.g. length in cm)
38
what is a character?
a heritable trait (e.g. an alignment site or a morphological feature) that can take different states across taxa
39
What defines ordered (stepwise evolution) and unordered character states?
ordered = evolutionary sequence (1 to 2 to 3)
unordered = no sequence (base types e.g. A,T,C,G)
40
what is a polyphyletic group?
a group with members from different ancestors and not forming a true clade
41
what are 2 types of weighting used in phylogenetic analysis?
character weighting = gives more influence to characters considered reliable
state change weighting = gives more influence to evolutionary changes deemed important (e.g. transversions over transitions)
42
what are the main types of sequence changes during evolution?
invariance, substitution, deletion and insertion
43
why is molecular data superior to morphological data for phylogenetic inference?
it is more objective, universal and available in larger quantities
44
what is the difference between pairwise and multiple sequence alignments?
pairwise aligns 2 sequences
MSA aligns 3 or more and is more complex
45
why is MSA computationally challenging?
the number of possible alignments grows exponentially with sequence number and length
46
what is the key difference between global and local alignments?
global aligns entire sequences while local finds the most similar regions
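To make the global/local contrast concrete, here is a minimal score-only sketch of the two classic dynamic-programming schemes (Needleman-Wunsch for global, Smith-Waterman for local). The scoring values (match, mismatch, linear gap) and the function name are illustrative assumptions, not values from the course.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score by dynamic programming.

    local=False: global (Needleman-Wunsch), both sequences aligned end to end.
    local=True:  local (Smith-Waterman), best-scoring pair of subsequences.
    Scores are hypothetical; a linear gap penalty is assumed.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)   # and may end anywhere
            score[i][j] = cell
    return best if local else score[-1][-1]


a, b = "TTTTACGTACGT", "ACGTACGTCCCC"
print(alignment_score(a, b, local=True))   # score of the shared ACGTACGT region
print(alignment_score(a, b, local=False))  # end-to-end score, dragged down by the flanks
```

The local score only rewards the shared region, whereas the global score must also account for the unrelated flanking residues.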
47
why is protein alignment generally better than nucleotide alignment?
proteins have 20 informative states and evolve slower, nucleotides have only 4 states and are more prone to homoplasy
48
what are the benefits of aligning protein sequences before nucleic acids?
protein alignments are more reliable: indels in coding DNA must fall in codon triplets to avoid frameshifts, a constraint many tools don't handle
49
what is a heuristic approach in sequence alignment?
a rule-of-thumb method to simplify and speed up alignment though it may sacrifice accuracy
50
what is hill climbing in the context of alignment heuristics?
an iterative method that adjusts the solution based on feedback to improve alignment score
51
why can early errors in progressive alignment be problematic?
errors propagate through later alignment steps, which reduces overall accuracy
52
what is the most conservative hypothesis for inferring homology in sequence alignment?
maximising sequence identity
53
how does identity scoring differ between nucleic acids and proteins?
nucleic acids use simple scoring (equal weight) whereas proteins use complex scoring matrices (functional roles)
54
What identity threshold suggests high-confidence homology in long protein sequences?
30–100% identity
55
What identity is typically required for nucleic acid homology inference?
≥70%
56
what principle governs indel placement in sequence alignment?
minimise the number of indels - each counts as one evolutionary event
57
What is the role of a Gap Opening Penalty (GOP) vs. Gap Extension Penalty (GEP)?
GOP is high to discourage new gaps; GEP is low to allow gap extension
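A minimal sketch of an affine gap cost, showing why one long gap is cheaper than several short ones. The penalty values and the convention that the opening penalty covers the first gap position are assumptions for illustration; alignment tools differ on both.

```python
def affine_gap_cost(gap_length, gop=10.0, gep=0.5):
    """Cost of one contiguous gap under an affine scheme.

    Assumed convention: GOP is charged once for the first gap position,
    GEP for each additional position. gop/gep values are hypothetical.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0.0
    return gop + gep * (gap_length - 1)


one_long_gap = affine_gap_cost(3)          # a single indel event of length 3
three_short_gaps = 3 * affine_gap_cost(1)  # three separate indel events
print(one_long_gap, three_short_gaps)
```

Because each new gap pays the high opening penalty again, the scheme favours fewer, longer indels, which matches the idea that each indel is one evolutionary event.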
58
why are end gaps penalised less (EGP)?
to account for natural sequence variation or incomplete sequencing
59
Define % identity and % similarity
% identity = exact matches
% similarity = includes chemically similar amino acids
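A rough sketch of how the two percentages could be computed for an aligned protein pair. The similarity groups below are purely illustrative (real tools take similarity from a substitution matrix such as BLOSUM or PAM), and using only ungapped columns as the denominator is one of several conventions.

```python
# Illustrative similarity groups only; not taken from any real scoring matrix.
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("STNQ")]


def identity_and_similarity(seq_a, seq_b):
    """Return (% identity, % similarity) over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = similar = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if "-" in (a, b):
            continue  # assumption: gapped columns are excluded from both measures
        compared += 1
        if a == b:
            identical += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100 * identical / compared, 100 * (identical + similar) / compared


identity, similarity = identity_and_similarity("MKVLDE", "MRILDQ")
print(identity, similarity)
```

Similarity is always at least as high as identity, since every identical column also counts as similar.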
60
what is the transition to transversion ratio used in nucleotide scoring?
2:1
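For reference, transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) changes, and transversions swap between the two classes. A small sketch that counts both for an aligned pair (the function name is hypothetical):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count (transitions, transversions) between two aligned DNA sequences."""
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical sites, gaps and ambiguity codes are skipped
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


print(count_ts_tv("AGCT", "GACA"))  # A>G and G>A are transitions, T>A is a transversion
```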
61
what is a unitary matrix?
score = 1 (match), 0 (mismatch), -1 (gap)
62
compare PAM and BLOSUM matrices:
PAM: based on evolutionary distance
BLOSUM: based on observed blocks of conserved sequences
63
what is the difference between empirical and theoretical substitution matrices?
empirical: observed evolutionary data
theoretical: based on physicochemical properties
64
what is dynamic weighting in alignments?
adjusting gap penalties based on local sequence context
65
why are protein structures good for alignment?
3D structure gives the most accurate inference of homology
66
what might an outlier sequence indicate?
unrelated sequence, a paralogue or a reverse complement
67
name 3 uses of sequence alignment:
phylogenetics, comparative genomics and clinical genetics
68
what do distance-based phylogenetic methods use to build trees?
pairwise genetic distances stored in a matrix
69
how do character based phylogenetic methods differ from distance based methods?
they use multiple sequence alignment directly, evaluating individual characters
70
what is the optimality criterion in maximum parsimony?
it minimises the number of evolutionary changes
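One common way to count the minimum number of changes for a single character on a fixed tree is Fitch's algorithm. The sketch below assumes a rooted binary tree written as nested 2-tuples and unordered, equally weighted states; the tree and states are toy data.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one character (Fitch's algorithm).

    tree:   nested 2-tuples with tip names (strings) as leaves
    states: dict mapping tip name -> observed character state
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common      # children agree: no change needed here
        changes += 1           # children disagree: one change on this part of the tree
        return left | right

    visit(tree)
    return changes


site = {"A": "G", "B": "G", "C": "T", "D": "T"}
print(fitch_changes((("A", "B"), ("C", "D")), site))  # tree grouping like states
print(fitch_changes((("A", "C"), ("B", "D")), site))  # tree splitting like states
```

Under maximum parsimony, the tree needing fewer changes summed over all characters is preferred.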
71
what do maximum likelihood and bayesian inference aim to maximise?
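ML: the likelihood, i.e. the probability of the observed data given a tree and a model of evolution
Bayesian inference: the posterior probability of the tree given the data, the model and priors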
72
What is genetic distance?
A numerical estimate of evolutionary change between two sequences
73
Why doesn’t genetic distance always reflect evolutionary time?
Due to variable mutation rates, convergent evolution, and multiple substitutions
74
What is patristic distance?
The total branch length between two sequences on a tree
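A small sketch of patristic distance on a toy rooted tree, stored as a child-to-parent map with branch lengths. The tree, its node names and its lengths are hypothetical, and branch lengths are assumed non-negative.

```python
# child -> (parent, branch length); a hypothetical toy tree rooted at "root"
TREE = {
    "A": ("n1", 0.1),
    "B": ("n1", 0.2),
    "n1": ("root", 0.3),
    "C": ("root", 0.5),
}


def distances_to_ancestors(tree, node):
    """Map node and each of its ancestors to the path length from node."""
    total, out = 0.0, {node: 0.0}
    while node in tree:
        parent, length = tree[node]
        total += length
        out[parent] = total
        node = parent
    return out


def patristic_distance(tree, x, y):
    """Sum of branch lengths on the path between x and y."""
    up_x = distances_to_ancestors(tree, x)
    up_y = distances_to_ancestors(tree, y)
    # With non-negative lengths, the most recent common ancestor gives the minimum.
    return min(up_x[n] + up_y[n] for n in up_x if n in up_y)


print(patristic_distance(TREE, "A", "B"))
print(patristic_distance(TREE, "A", "C"))
```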
75
What is the P-distance (Hamming distance)?
The proportion of observed differences per site between two sequences
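A minimal sketch of the p-distance for two aligned sequences. Skipping columns that contain a gap is an assumption; other treatments of gaps exist.

```python
def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 2 differences over 10 sites
```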
76
Why is the observed genetic distance an underestimate of true changes?
Due to multiple hits (substitutions at the same site)
77
Name three types of homoplasy.
Convergent evolution, parallel evolution, reversion (back mutation)
78
What is saturation in sequence evolution?
When sites undergo multiple substitutions, masking true divergence
79
Why do proteins retain phylogenetic signal longer than DNA?
Proteins have 20 character states vs. 4 in DNA, reducing chance of repeated substitutions
80
What is the main assumption of distance matrix methods?
Additivity – the distance between two taxa equals the sum of the branch lengths on the path connecting them
81
How do models correct for multiple hits?
They adjust observed distances to estimate true evolutionary change
82
What is the Jukes-Cantor (JC69) model?
A one-parameter model assuming equal base frequencies and substitution rates
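Under JC69 the corrected distance is d = -(3/4) ln(1 - (4/3) p), where p is the observed p-distance. A short sketch (the function name is hypothetical) that also shows why the correction fails as p approaches 0.75, the level expected between random sequences:

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected distance (substitutions per site) from p-distance p."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 correction is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.2, 0.5, 0.7):
    print(p, round(jc69_distance(p), 3))
```

The corrected value is always at least as large as p, and the gap widens rapidly as p grows, reflecting the increasing number of hidden multiple hits.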
83
When does Jukes-Cantor model break down?
At low identity levels (≤30%), where saturation makes the correction unreliable
84
What improvements do advanced models offer over JC69?
They allow unequal substitution probabilities, unequal base frequencies, and among-site rate variation
85
What is among-site rate variation and how is it modeled?
Evolutionary rate differences across sites, often modeled with a gamma distribution
86
What are the five main phylogenetic inference methods?
Distance, Maximum Parsimony, Maximum Likelihood, Bayesian Inference, and Mixed approaches
87
Which distance method is discouraged and why?
UPGMA – assumes constant molecular clock, often unrealistic
88
Why are distance methods considered fast and efficient?
They handle large datasets (e.g., 50+ sequences) and provide quick estimations, suitable for preliminary analyses
89
When can distance methods perform as well as character-based methods?
When the phylogenetic signal is strong
90
Why are distance methods seen as reliable?
They have a long history of use with well-established practices
91
When are distance methods the only option?
With continuous data (e.g., nucleic acid hybridization values), where discrete character methods can’t apply
92
What information is lost in distance methods?
Site-specific data like indels and character states
93
What makes biological interpretation hard in distance methods?
Possible odd/negative branch lengths and the assumption of a constant molecular clock
94
Why do distance methods become less accurate with greater evolutionary distance?
Due to homoplasy and accumulated error; character-based methods are better for deep divergences.
95
What is the two-step process in distance matrix methods?
1. Compute pairwise distances → 2. Build a tree whose patristic distances fit those distances
96
What does triangle inequality ensure in phylogenetics?
That the direct distance between two taxa is never greater than the sum of their distances via a third taxon; violations help identify non-additivity
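A small sketch that lists triples violating the triangle inequality, d(a, c) ≤ d(a, b) + d(b, c), in a pairwise distance matrix. Storing distances in a dict keyed by `frozenset` pairs is a choice made here for illustration.

```python
from itertools import permutations


def triangle_violations(dist, tol=1e-9):
    """Return triples (a, b, c) where d(a, c) > d(a, b) + d(b, c)."""
    taxa = sorted({taxon for pair in dist for taxon in pair})

    def d(x, y):
        return dist[frozenset((x, y))]

    return [
        (a, b, c)
        for a, b, c in permutations(taxa, 3)
        if d(a, c) > d(a, b) + d(b, c) + tol
    ]


ok = {frozenset("AB"): 0.3, frozenset("AC"): 0.9, frozenset("BC"): 1.0}
bad = {frozenset("AB"): 0.1, frozenset("BC"): 0.1, frozenset("AC"): 0.5}
print(triangle_violations(ok))
print(triangle_violations(bad))
```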
97
What type of tree does neighbour joining produce?
A single unrooted additive tree with branch lengths
98
Does NJ assume a molecular clock?
No, unlike UPGMA
99
When is NJ guaranteed to return the correct tree?
When data is perfectly additive
100
What does UPGMA assume?
A molecular clock
101
What kind of tree does UPGMA produce?
Ultrametric tree (equal distances from root to tips)
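A compact sketch of UPGMA clustering, to show where the ultrametric property comes from: each merge places the new node at half the distance between the two clusters, so both sides are the same height above the tips. The data structures (a dict keyed by `frozenset` pairs, nested tuples for clusters) are illustrative choices, and ties are broken arbitrarily.

```python
def upgma(dist):
    """UPGMA on {frozenset({a, b}): distance}.

    Returns (root, height), where clusters are nested tuples and
    height maps every tip and cluster to its height above the tips.
    """
    sizes = {name: 1 for pair in dist for name in pair}
    height = {name: 0.0 for name in sizes}
    d = dict(dist)
    while len(sizes) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = pair
        merged = (a, b)
        height[merged] = d[pair] / 2      # equal depth on both sides (clock assumption)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # size-weighted average = mean over all tip pairs between clusters
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        sizes[merged] = size_a + size_b
    (root,) = sizes
    return root, height


toy = {frozenset("AB"): 2.0, frozenset("AC"): 6.0, frozenset("BC"): 6.0}
root, height = upgma(toy)
print(root, height[root])
```

If the true rates differ between lineages, forcing equal depths in this way is what can produce an incorrect topology.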
102
Is UPGMA reliable?
Not usually; it often gives incorrect trees if the molecular clock is violated
103
What does least squares aim to minimize?
The sum of squared differences between observed and expected (tree) distances (Q value)
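A minimal sketch of the unweighted least-squares criterion, comparing observed matrix distances with the distances implied by a candidate tree. The numbers are made up, and weighted variants (which scale each term) are not shown.

```python
def least_squares_q(observed, tree_distances):
    """Sum of squared differences between observed and tree-implied distances."""
    return sum((observed[pair] - tree_distances[pair]) ** 2 for pair in observed)


observed = {("A", "B"): 0.30, ("A", "C"): 0.95, ("B", "C"): 1.00}
candidate = {("A", "B"): 0.30, ("A", "C"): 0.90, ("B", "C"): 1.00}
print(least_squares_q(observed, candidate))
```

The tree (topology plus branch lengths) with the smallest Q is preferred; Q = 0 means the tree reproduces the matrix exactly.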
104
What does minimum evolution aim to minimize?
The total length of the tree.
105
What is “tree space”?
All possible tree topologies for a given number of taxa
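To give a sense of how large tree space is, the number of strictly bifurcating topologies follows a double factorial: (2n - 5)!! unrooted trees and (2n - 3)!! rooted trees for n taxa. A short sketch (function names are hypothetical):

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating topologies for n_taxa >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def n_rooted_trees(n_taxa):
    """Number of rooted bifurcating topologies for n_taxa >= 2."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


for n in (4, 10, 20):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

The count grows so fast that exhaustive search is only feasible for a handful of taxa, which is why heuristic searches are used.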
106
What are local optima in tree space?
Clusters of trees that fit well but aren't the best globally
107
What's the difference between expected and realised trees?
Expected: infinite data; Realised: estimated from actual (finite) data
108
How do you identify an ancestral state?
Compare with outgroups: a state shared with the outgroup is inferred to be ancestral.