Trees - cluster analysis Flashcards

Question 1

Q

What is the earliest quantitative method of tree construction?

Answer

A

cluster analysis

Question 2

Q

What does cluster analysis look at?

Answer

A

overall similarity - how much like each other are things

Question 3

Q

What is the assumption behind cluster analysis?

Answer

A

species that share a more recent common ancestor should be more similar to each other than to any other species

Question 4

Q

How do you do a cluster analysis from a character matrix?

Answer

A

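you have to convert it to a similarity matrix or a dissimilarity (distance) matrix. A common distance is the number of characters at which two taxa differ (the Hamming distance), or that count divided by the number of characters compared (the p-distance)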
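The conversion can be sketched in a few lines of Python. This is only an illustration: the taxa, the 0/1 character states, and the function names are made up for the example and are not part of the original card.

```python
from itertools import combinations

# Hypothetical character matrix: one row per taxon, one column per character.
characters = {
    "A": [0, 0, 1, 1, 0],
    "B": [0, 0, 1, 0, 0],
    "C": [1, 1, 0, 0, 0],
}

def hamming(x, y):
    """Number of characters at which two taxa differ."""
    return sum(a != b for a, b in zip(x, y))

def p_distance(x, y):
    """Proportion of characters at which two taxa differ."""
    return hamming(x, y) / len(x)

# Distance matrix stored as a dict keyed by taxon pair.
distances = {
    (s, t): p_distance(characters[s], characters[t])
    for s, t in combinations(characters, 2)
}

for pair, value in distances.items():
    print(pair, value)
```

Here A and B differ at one character out of five, so they end up closest in the distance matrix, while A and C differ at four.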

Question 5

Q

What are some methods of performing a cluster analysis?

Answer

A

least squares method
NJ (neighbor joining)
UPGMA

Question 6

Q

What do cladists criticize about using a distance matrix for cluster analysis?

Answer

A

there is a loss of information: a distance matrix makes no distinction between shared derived characters (synapomorphies) and shared primitive characters (symplesiomorphies)

Question 7

Q

The mean character difference used for cluster analysis is also called what?

Answer

A

Manhattan distance (also called city-block distance or taxicab geometry), here averaged over the characters. From the same per-character differences you can also find the hypotenuse of the triangle, which is the Euclidean distance: sqrt(character 1 difference squared + the other character differences squared)
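A small numeric sketch may help keep the two distances apart. The character values below are invented for illustration, and the function names are assumptions rather than anything from the original card.

```python
import math

# Hypothetical measurements of three characters in two taxa.
x = [1.0, 4.0, 2.0]
y = [4.0, 0.0, 2.0]

def manhattan(x, y):
    """Sum of absolute per-character differences (taxicab distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def mean_character_difference(x, y):
    """Manhattan distance divided by the number of characters."""
    return manhattan(x, y) / len(x)

def euclidean(x, y):
    """Square root of the sum of squared per-character differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(manhattan(x, y))                  # 3 + 4 + 0
print(mean_character_difference(x, y))  # the same total spread over 3 characters
print(euclidean(x, y))                  # hypotenuse of the 3-4 triangle
```

The per-character differences are 3, 4, and 0, so the taxicab route is 7 units long while the straight-line (Euclidean) route is 5.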

Question 8

Q

The UPGMA method for clustering is usually attributed to whom?

Answer

A

Sokal and Michener

Question 9

Q

What is a major problem with the UPGMA method?

Answer

A

it assumes that all lineages evolve at the same rate, which is often not true, so it does not account for unequal divergence rates
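To see where the equal-rate assumption enters, here is a minimal UPGMA sketch. It is not a reference implementation: the distances, taxon names, and the `upgma` function are hypothetical. The key line places each new node at half the distance between the two clusters it joins, which puts both clusters equally far from their common ancestor.

```python
from itertools import combinations

def upgma(names, dist):
    """Return the merges as (cluster, node height) in the order they happen.

    names: list of taxon labels.
    dist:  dict mapping (a, b) label pairs to distances, each pair given once.
    """
    d = {}
    for (a, b), value in dist.items():
        d[(a, b)] = d[(b, a)] = value
    size = {name: 1 for name in names}  # cluster -> number of taxa inside it
    merges = []
    while len(size) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(size, 2), key=lambda pair: d[pair])
        new = (a, b)
        for c in size:
            if c != a and c != b:
                # Average over all taxon pairs, weighting by cluster size.
                avg = (size[a] * d[(a, c)] + size[b] * d[(b, c)]) / (size[a] + size[b])
                d[(new, c)] = d[(c, new)] = avg
        # Equal-rate assumption: the node sits halfway between a and b.
        merges.append((new, d[(a, b)] / 2))
        size[new] = size.pop(a) + size.pop(b)
    return merges

# Hypothetical clock-like distances between four taxa.
example = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
for cluster, height in upgma(["A", "B", "C", "D"], example):
    print(cluster, height)
```

With clock-like input such as this, the heights reproduce the distances exactly. When one lineage has evolved faster than its sister, the halfway placement no longer matches the data.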

Question 10

Q

Which clustering algorithms, unlike UPGMA, try to compensate for unequal divergence rates?

Answer

A

least-squares methods: here the best tree is the one that minimizes the sum of the squared differences between the observed distances Dij and the distances dij predicted by the tree
neighbor-joining method (Saitou and Nei): this works by clustering but does not assume a clock. This seems to perform better than UPGMA
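The card does not say how neighbor joining avoids the clock assumption. One way to see it is the pair-selection criterion, sketched below for a single step on made-up distances: instead of joining the closest pair, NJ joins the pair with the smallest Q value, Q(i, j) = (n - 2) * d(i, j) - (sum of distances from i) - (sum of distances from j). The taxa, distances, and function names are assumptions for illustration only.

```python
from itertools import combinations

taxa = ["a", "b", "c", "d", "e"]

# Hypothetical pairwise distances, each unordered pair listed once.
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7,
    ("d", "e"): 3,
}

def d(x, y):
    """Look up a distance regardless of the order of the pair."""
    return pairs.get((x, y), pairs.get((y, x)))

def q_value(i, j):
    """Neighbor-joining criterion for the pair (i, j)."""
    n = len(taxa)
    row_i = sum(d(i, k) for k in taxa if k != i)
    row_j = sum(d(j, k) for k in taxa if k != j)
    return (n - 2) * d(i, j) - row_i - row_j

# Pair that neighbor joining would join first (smallest Q).
nj_pair = min(combinations(taxa, 2), key=lambda p: q_value(*p))
# Pair that a plain closest-pair rule such as UPGMA would join first.
closest_pair = min(combinations(taxa, 2), key=lambda p: d(*p))

print(nj_pair, q_value(*nj_pair))
print(closest_pair, d(*closest_pair))
```

On these numbers the two rules disagree: d and e are the closest pair, but a and b have the smallest Q value because both are far from everything else. Correcting each distance by how far the two taxa sit from the rest is what lets NJ cope with lineages that evolve at different rates.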

Question 11

Q

Describe the least-squares method

Answer

A

this is a clustering algorithm where the best tree is the one that minimizes the sum of the squared differences between the observed Dij values (for example, Euclidean distance values) and the dij values predicted by the tree
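The quantity being minimized can be sketched as follows. This only scores candidate trees; a full least-squares method would also search over tree shapes and branch lengths. The observed distances, the three-taxon star tree, and the branch lengths are all hypothetical.

```python
# Hypothetical observed distances D_ij between three taxa.
observed = {("A", "B"): 0.30, ("A", "C"): 0.50, ("B", "C"): 0.60}

def tree_distances(branch):
    """Predicted distances d_ij on a three-taxon star tree.

    branch maps each taxon to the length of its branch to the central node,
    so the path between two taxa is the sum of their two branch lengths.
    """
    return {(a, b): branch[a] + branch[b] for (a, b) in observed}

def least_squares_score(observed, predicted):
    """Sum over all pairs of (D_ij - d_ij) squared; lower is better."""
    return sum((observed[p] - predicted[p]) ** 2 for p in observed)

# Two candidate sets of branch lengths for the same tree shape.
good_fit = {"A": 0.10, "B": 0.20, "C": 0.40}
poor_fit = {"A": 0.20, "B": 0.20, "C": 0.20}

good_score = least_squares_score(observed, tree_distances(good_fit))
poor_score = least_squares_score(observed, tree_distances(poor_fit))
print(good_score, poor_score)
```

The first candidate allows unequal branch lengths and reproduces the observed distances almost exactly, while the equal-length candidate scores worse. That freedom to fit unequal branches is why least-squares methods do not need the equal-rate assumption.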

Question 12

Q

What are the main criticisms of distance-based approaches?

Answer

A

some information about the data may be lost in the conversions: going from a character matrix to a distance matrix and then to a tree
the assumption of equal rates (made by UPGMA) is questionable

Question 13

Q

What is the main advantage of distance-based approaches?

Answer

A

they are fast, and some methods like UPGMA and NJ can give a precise single answer
