Mol. Evolution - Exam 2 Flashcards


created 7 months ago by baddie_956
updated 6 months ago by baddie_956

1

What was the main impetus behind the development of Bayesian analysis?

- Maximum likelihood is very computationally intense; Bayesian analysis is a faster way to use models of evolution within a statistical approach.

- Response to the increasing intractability of maximum likelihood analyses.

- Not an optimality method.

2

How is the statistical calculation for the tree score different from ML analyses?

Bayesian analysis scores Prob(Hypothesis|Data), the posterior probability. ML scores L = Prob(Data|Hypothesis). The hypothesis is the tree itself.
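For reference, Bayes' theorem links the Bayesian score to the ML likelihood. Here T stands for the tree (the hypothesis) and D for the data. This notation is chosen for this note only:

```latex
P(T \mid D) = \frac{P(D \mid T)\,P(T)}{P(D)}
\qquad
\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{probability of the data}}
```

ML works with the likelihood P(D | T) alone. Bayesian analysis also weights that likelihood by the prior P(T). The denominator P(D) requires summing over every possible tree. That sum is what makes exact calculation intractable, and it is why sampling methods such as MCMC are used instead.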

3

Accuracy

How close an estimate is to the truth (the right answer)

4

Precision

- How certain you are, i.e., how small your level of uncertainty is.

- Do people agree?

- The range of estimates may still be somewhat wide.

- How narrow the range around the answer is.

5

How do we know if our phylogenies are accurate?

- We never know 100%.

- We use simulated data sets that hopefully accurately represent genetic data.

6

If we can never know for sure, what two approaches are used to see if the methods we use are good at accurately determining historical phylogenetic relationships?

- Known phylogenies, such as laboratory bacterial lineages whose history was recorded

- Simulated data sets

- Congruence of methods

7

Prior probability

- Beginning guess: an estimate of the parameters of our model of evolution (the starting point)

- If I go into an analysis with some information, then I have more power to help inform my subsequent analysis

8

Posterior probability

- Updated probability that combines the prior probability with the information from the data

- Becomes the prior probability of the next tree

9

Burn-in period

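- The early part of the run, before the tree scores reach a plateau (after which, no matter how many more times we run it, the scores are maxed out). These early, bad trees are thrown away, and the good trees sampled after the plateau are saved

- The burn-in trees serve as stepping stones toward better estimates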

10

MCMC (Markov chain Monte Carlo)

- Way to rapidly estimate the statistical scores of trees

- Shortcut that speeds up the estimations we would otherwise have to do when there are too many (complex) possibilities to calculate

- A quick way to sample a complex search space
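The sketch below illustrates the idea. It assumes a toy search space of five hypothetical "trees" with made-up positive scores (likelihood x prior) in place of real trees. It uses the Metropolis rule, one common MCMC variant:

- A proposal with a better score is always accepted.
- A worse proposal is accepted with probability equal to the score ratio.
- Samples taken during the burn-in are dropped.

The function name `metropolis` and its parameters are illustrative and do not come from any phylogenetics package.

```python
import random
from collections import Counter


def metropolis(weights, n_steps, burn_in, seed=0):
    """Toy Metropolis MCMC over hypothetical "trees" 0..len(weights)-1.

    weights: positive, unnormalized scores (likelihood x prior) standing in for
    real tree scores. Returns the fraction of post-burn-in samples spent in each tree.
    """
    rng = random.Random(seed)
    n = len(weights)
    state = 0  # arbitrary, possibly poor, starting tree
    samples = []
    for step in range(n_steps):
        # Symmetric proposal: step to a neighbouring tree on a ring.
        proposal = (state + rng.choice((-1, 1))) % n
        # Better trees are always accepted; worse ones with probability = score ratio.
        if rng.random() < weights[proposal] / weights[state]:
            state = proposal
        if step >= burn_in:  # keep only samples taken after the burn-in
            samples.append(state)
    counts = Counter(samples)
    return {tree: counts[tree] / len(samples) for tree in range(n)}


freqs = metropolis([1, 2, 4, 2, 1], n_steps=200_000, burn_in=1_000)
```

The sample frequencies in `freqs` should come out close to the normalized scores (0.1, 0.2, 0.4, 0.2, 0.1). The chain gets there even though it only ever compares two scores at a time and never adds all of them up.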

11

What is the final outcome of a Bayesian analysis and how is this represented in a phylogeny?

- Hundreds of thousands of trees from the plateau phase, combined into one big summary tree

- Does the estimation faster than maximum likelihood, and the branching shows how often each relationship appeared among the saved trees

- A consensus tree created from a very large set of phylogenies saved from after the "burn-in" period with numbers above each node representing the posterior probabilities of each relationship

1. Hundreds of thousands of trees

2. Consensus tree for all trees

3. Support values
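Steps 1 to 3 can be sketched as follows. The sketch assumes each saved tree is simplified to a set of clades, with each clade a frozenset of taxon names. `clade_support` and `majority_rule` are hypothetical helper names. Real programs read trees from files and also summarize branch lengths.

```python
from collections import Counter


def clade_support(sampled_trees, burn_in):
    """Posterior probability of each clade: the fraction of post-burn-in trees containing it.

    sampled_trees: trees in the order the chain saved them; each tree is a set of
    clades and each clade is a frozenset of taxon names (a simplified representation).
    """
    kept = sampled_trees[burn_in:]  # discard the burn-in trees
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(support):
    """Clades found in more than half of the kept trees go on the consensus tree,
    labelled with their posterior probability."""
    return {clade: p for clade, p in support.items() if p > 0.5}


# Hypothetical chain output for taxa A-D; the first tree falls in the burn-in.
AB, BC, ABC = frozenset("AB"), frozenset("BC"), frozenset("ABC")
chain = [{BC}, {AB, ABC}, {AB, ABC}, {AB, ABC}, {BC, ABC}]
support = clade_support(chain, burn_in=1)
# support: AB -> 0.75, ABC -> 1.0, BC -> 0.25; majority_rule keeps AB and ABC
```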

12

Bayesian analysis

A primary method for figuring out relationships.

Advantages:

- Faster than maximum likelihood

- Gives both a result (the tree) and an estimate of support for it (posterior probability support values)

- Bayesian analyses incorporate models of evolution in a statistical framework, but are more efficient than maximum likelihood methods

13

Posterior probability values

- How confident we are about the relationship in each node

- Measures precision

- A value of 100 (or 1.0) means the relationship appeared in all sampled trees: very high support that they are related

14

Congruence

- Simply whether two phylogenies match

- Precision across our results

15

What is the best method to find the phylogenetic relationships of a group and what does congruence have to do with this?

- Bayesian analysis

- Congruence of methods: when different methods produce matching trees, we can be more confident that the relationships among the organisms are accurate

16

Support measures

Estimate precision: how confident we are in a relationship and how narrow our range of uncertainty about it is

17

What are branch support measures testing?

Precision more than accuracy

18

Bremer support value

- Only for parsimony analysis

- For each node on the best (shortest) tree, find the shortest tree that does NOT contain that clade

- The number is the difference in length (extra steps) between the best tree and the best tree that doesn't support the relationship

- Not confident in relationship if it's a small number

- Can't compare from one analysis to another
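The arithmetic behind the Bremer value can be sketched as follows. The tree lengths are invented, and `bremer_support` is a hypothetical helper. In practice, the "shortest tree without the clade" comes from an additional constrained search. Because the result is counted in steps, its size depends on the data set, which is why values can't be compared between analyses.

```python
def bremer_support(clade, trees):
    """Bremer (decay) value for one clade, in parsimony steps.

    trees: (clades, length) pairs, where clades is a set of frozensets of taxon
    names and length is the tree length in steps (hypothetical search results).
    """
    best = min(length for _, length in trees)
    without = [length for clades, length in trees if clade not in clades]
    if not without:
        raise ValueError("no searched tree lacks this clade")
    # Extra steps needed before the relationship disappears.
    return min(without) - best


AB, ABC, ABD, BC = (frozenset(s) for s in ("AB", "ABC", "ABD", "BC"))
trees = [({AB, ABC}, 120), ({AB, ABD}, 123), ({BC, ABC}, 121)]
print(bremer_support(AB, trees), bremer_support(ABC, trees))  # 1 3
```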

19

Jackknife support value

- Can be used with any analysis, not only parsimony

- Take out one taxon, redo the analysis, and see whether everything looks the same

- If it all looks the same, the removed species had no effect, and we can be very certain of the relationships

- Tests sensitivity to the taxa that are included

- Originally used in parsimony: run the analysis to get the best tree, then redo it with species taken out

20

Taxon sampling, what is the effect of poor taxon sampling on a phylogeny?

- Process of selecting representative taxa for a phylogenetic analysis

- Lack of information: the tree may not be accurate, because we may be missing so much information that we don't see the connections between species. Ideally we would include every species, but that isn't possible for diverse groups, so part of the connection between species can be missing

21

Bootstrap support value

- Generally applicable

- Pseudoreplicate: a replicate data set reconstituted from the original data set

- Recreate the data set by sampling characters at random, so some are sampled multiple times and some not at all

- Some pseudoreplicates will not contain the characters that support a given relationship

- The best trees from hundreds or thousands of pseudoreplicates are summarized in a consensus tree

- Values range from 0 to 100; usually only values of 50 or more are shown on the consensus tree

- Can compare one analysis to another because it's on a 0-100% scale

- Widely applicable, used a lot

- Forms pseudoreplicates by sampling with replacement, so data points can be sampled more than once

- Tells how consistently the data support each relationship across the resampled data sets

22

Bootstrap

1. Randomly resample characters with replacement to make a new data set the same size as the original (each character is a homology, i.e., a data point or piece of evidence)

2. Find the best topology (phylogeny) using the new data set (pseudoreplicate)

3. Repeat for many replicates, then take all the trees and make a consensus tree; each clade's support is the percentage of replicates that contain it
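Step 1 (building pseudoreplicates) can be sketched in Python, assuming the alignment is a dict of equal-length sequences. Whole columns are resampled so that every taxon gets the same characters. Steps 2 and 3 (tree search and consensus) are not shown, and the function names are illustrative.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One pseudoreplicate: draw alignment columns (characters) with replacement
    until the new data set has as many characters as the original."""
    n_chars = len(next(iter(alignment.values())))
    # Some columns are drawn several times, others not at all.
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    # The same columns are used for every taxon, so homologous sites stay aligned.
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


def bootstrap_replicates(alignment, n_reps=100, seed=1):
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_reps)]


alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGAAC"}  # made-up data
replicates = bootstrap_replicates(alignment, n_reps=100)
```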

23

Posterior probability

Support measure produced by a Bayesian analysis

24

Why is the bootstrap support value most widely used?

- Widely applicable to all methodologies

- Relatively easy

- Requires no new data

25

What are the bootstrap drawbacks?

1. Estimates precision, not accuracy

2. Tends to overestimate confidence

3. Assumes characters are independent

4. Computationally intensive

26

Supertrees

- Doesn't gather any new data

- Take analyses that have already been done and create a method to put them together

- A topology assembled from the results of different analyses (source trees), with or without some sort of formal analysis

- Allows us to combine results from incompatible data sets

- Finds areas of consensus, even when we don't agree on how to represent all species

- True or false: supertree methods ensure that we will find the true set of relationships as long as the underlying assumptions are not violated. (False)

27

Review what is meant by a consensus approach and a total evidence approach in making phylogenies and how this relates to supertree methods.

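- Consensus approach: analyze each data set separately, then combine the resulting trees; the supertree is the agreement among the separate analyses (supertree methods follow this approach)

- Total evidence approach: combine all the data into one data set and make a single tree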

28

Why did people start creating supertrees?

1. Analyses become unwieldy: bigger data sets get harder to work with

2. We'd like to summarize what has already been done, and supertrees are a more formal way to summarize analyses

- They were originally created as a way to combine results from separate analyses where the underlying data was not congruent enough to assemble into a single data set

29

What is an informal supertree?

- No objective way to put the trees together; no formal analysis occurs

- Cut and paste: we know from other studies how some species are related, and we paste that together with what is known about other species/groups

- Not foolproof; gives an overall picture but has no second analysis

30

Be able to outline the two processes used to make formal supertrees

- Agreement

- Optimization via matrix representation

(Formal methods don't do well when there's conflict.)

31

Agreement

- Making a consensus tree

- Here are phylogeny 1 and phylogeny 2; stick them together where they agree

Drawback: it is removed from the original data, so incorrect results in the source trees get piled together

32

Optimization via matrix representation

- A second round of analysis that synthesizes the original ones: make a big matrix that codes the trees as characters, then analyze it

(a more objective way to help decide when there is conflict)
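The sketch below shows the matrix-building step. It assumes rooted source trees written as lists of clades (frozensets of taxon names). Each clade becomes one binary character:

- 1 if the taxon is in the clade.
- 0 if the taxon is in that source tree but outside the clade.
- '?' if the taxon was not in that study.

`mrp_matrix` is a hypothetical name. The second-round analysis of the matrix (commonly parsimony) is not shown.

```python
def mrp_matrix(source_trees):
    """Code rooted source trees as a binary matrix (matrix representation).

    source_trees: (taxa, clades) pairs; taxa is the set of taxa in that study and
    clades is a list of frozensets, one per informative group in its tree.
    """
    all_taxa = sorted(set().union(*(taxa for taxa, _ in source_trees)))
    rows = {t: [] for t in all_taxa}
    for taxa, clades in source_trees:
        for clade in clades:  # each clade becomes one character (column)
            for t in all_taxa:
                if t not in taxa:
                    rows[t].append("?")  # taxon not sampled in this source tree
                else:
                    rows[t].append("1" if t in clade else "0")
    return {t: "".join(r) for t, r in rows.items()}


source = [(set("ABCD"), [frozenset("AB")]), (set("CDE"), [frozenset("DE")])]
matrix = mrp_matrix(source)
```

Consider one study with A-D that groups A with B, and a second study with C-E that groups D with E. Together they give the rows A = 1?, B = 1?, C = 00, D = 01, E = ?1.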

33

What are the major criticisms of supertree methods?

Meta-analysis: an analysis of previous analyses without the original data, so there is a lot of imprecision

1. No primary data

2. No "signal enhancement"

3. Novel clades not supported in the source data (the supertree can end up with new relationships that no source tree shows)

4. Inadvertent replication of source data (opposite of signal enhancement)

34

What is the more recent reason why people have proposed a supertree approach to building phylogenies?

There is so much data available that it can be impossible to analyze it all in a single phylogeny

35

Disk covering method

- Estimate relationships and then create supertree

- Need to have a vague idea of the relationships first

- Gather new data and then use supertree methods to put them all together

- Uses subsets with areas of overlap, which gives a better fit and requires less sampling

36

Biclique method

- Find large data sets among what is currently available and combine them into sequence analyses based on that data

- Reanalyze data that is already available and put them together

- Put together matrix that represents different groups

- Helps identify good data sets

37

Reconstructing ancestral states

Inferring the state of a character (e.g., the DNA base at a given site in a sequence) at the ancestral nodes of a phylogeny

38

Synapomorphy

- Shared derived character: a new trait (e.g., from a single mutation) that arose in a common ancestor and has been passed on to all of its descendants

- Ex: feathers for birds

- Ex: making milk in mammals

39

Symplesiomorphy

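- Shared ancestral character: inherited from a more distant ancestor, so it doesn't define the smaller group and may be lost in some descendants

- Ex: legs come from the tetrapod ancestor; whales have lost them, so having legs doesn't define mammals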

40

Convergence

- Similar characters derived independently

- Ex: bat, bird, and insect wings all derived independently but same function

41

Autapomorphy

A new characteristic found in only one species; not helpful for grouping

42

Which of the above character patterns provides direct evidence for classification of species into higher taxa?

Synapomorphies. All the other ones are noise/problems.

43

Fitch optimization

- Method that guarantees the most parsimonious mapping (fewest changes) of multistate characters, such as DNA bases

- Step-by-step mapping that tells us where mutations happened
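The bottom-up pass of Fitch optimization for one character can be sketched as follows, assuming a rooted binary tree written as nested tuples. At each node:

- If the children's state sets overlap, the node takes their intersection (no change).
- Otherwise the node takes their union, and one change is counted.

A second, top-down pass that picks a state for each internal node is not shown. `fitch` is an illustrative name.

```python
def fitch(tree, states):
    """Bottom-up Fitch pass for one character on a rooted binary tree.

    tree: nested 2-tuples with taxon names at the tips, e.g. (("A", "B"), ("C", "D")).
    states: taxon -> observed state (e.g. a DNA base).
    Returns (state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):  # tip: only the observed state
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:  # children can agree, so no extra change is needed here
        return shared, left_changes + right_changes
    # Children disagree: keep either state and count one change.
    return left_set | right_set, left_changes + right_changes + 1


root_states, changes = fitch((("A", "B"), ("C", "D")), {"A": "G", "B": "G", "C": "T", "D": "G"})
```

For the tree ((A, B), (C, D)) with bases G, G, T, G, this gives root_states = {'G'} and changes = 1. The single change falls on the branch leading to C.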

44

Dollo parsimony

- Once a trait is lost, it is not re-evolved

- Dollo parsimony is best applied to the origin and evolution of complex features such as wings

- Ex: the ancestors of stick insects belong to the winged insect group. The common ancestor of stick insects lost its wings, but some current stick insects have wings again. They "re-evolved" them, but basically they still carried the wing genes and just turned them back on

- Ex: loss of teeth in vertebrates: teeth evolved only once, at the origin of vertebrates, and were then lost multiple times in turtles, birds, and seahorses

45

Make sure you understand how mapping character traits onto a phylogeny allows us to reconstruct ancestral states

Once we map a trait onto a phylogeny, we can see how it varies among taxa and where on the tree it changed, which lets us infer the states of the ancestors

46

How is inference of ancestral states different under a maximum likelihood assumption?

Takes into account branch lengths and the model of evolution (which types of mutation are more likely)
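To show why branch length matters, the sketch below gives the probability of seeing a changed base at the end of a branch under the Jukes-Cantor (JC69) model. JC69 is used here only because it is the simplest model of evolution. Branch length is in expected substitutions per site, and the function name is illustrative.

```python
import math


def jc69_change_prob(branch_length):
    """Probability that a site ends a branch with a different base than it started
    with, under JC69; branch_length is in expected substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))


short_branch, long_branch = jc69_change_prob(0.01), jc69_change_prob(0.5)
```

A short branch (0.01) gives a change probability of about 0.01, while a long branch (0.5) gives about 0.36. An ML reconstruction therefore treats a change on a long branch as far less surprising than one on a short branch. Parsimony counts both as one step.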

47

Convergence

- Bird, bat, and insect wings evolved independently but all used to fly

- Complex eyes of vertebrates, cephalopods, jellyfish, and arthropods evolved separately but are associated with vision

48

Be able to briefly outline Shimodaira-Hasegawa (SH) test

1. Uses bootstrap procedure

- Creates spread of possibilities

- Range of trees to accept or reject

2. Tests whether an alternative hypothesis is significantly different from the best phylogeny

49

SH test drawbacks

- Has to create a distribution

- Anything that is a weakness of the bootstrap will be a weakness of SH test

- Tends to overestimate

- Not the best way to select which model of evolution to use for a genetic data set

50

Likelihood ratio test

- Is very flexible

- Uses a likelihood score

- Is widely applicable

51

What are the four primary uses of LRT?

1. Different phylogenies (is one phylogeny better than the other/significantly different?)

2. Molecular clock (can we use this data to estimate divergence time? only in limited situations, in data that's evolving naturally, can't predict in the short-term but can predict in the long-term)

3. Models of evolution (is this one better than this one?)

4. Looking for signs of natural selection in protein coding genes (is natural selection working on this gene or part of this gene?, purple= purifying selection, yellow= weak positive selection, red= strong positive selection)
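The LRT statistic is twice the difference in log-likelihoods. It is compared with a chi-square distribution whose degrees of freedom equal the number of extra free parameters. The sketch makes these assumptions:

- The hypotheses are nested (e.g., a simpler model of evolution that is a special case of a more complex one).
- The log-likelihoods are invented.
- The chi-square tail is computed with the standard library and works only for integer degrees of freedom. A statistics package would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    y = x / 2.0
    if df % 2 == 0:  # even df: closed form with a finite sum
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # odd df: erfc term plus a finite sum of half-integer terms
    return math.erfc(math.sqrt(y)) + math.exp(-y) * sum(
        y ** (i + 0.5) / math.gamma(i + 1.5) for i in range(df // 2)
    )


def likelihood_ratio_test(lnL_null, lnL_alt, df):
    """Statistic = 2 * (lnL_alt - lnL_null); df = extra free parameters in the alternative."""
    stat = 2.0 * (lnL_alt - lnL_null)
    return stat, chi2_sf(stat, df)


stat, p_value = likelihood_ratio_test(-1000.0, -995.0, df=1)
```

Here the statistic is 10.0 and the p-value is about 0.0016, so the extra parameter would be judged a significant improvement at the 0.05 level.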

52

How does one select a model of evolution?

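- Do a stepwise comparison of the different models to see which are statistically different, and choose accordingly

- Multiple tests to find the best one

- A more complex model always fits the data at least as well, but each extra parameter adds estimation error, so the most complex model is not automatically the most accurate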

53

Orthology

- Two genes that can trace their common history back to a speciation event

- Two homologous genes, their divergence can be traced back to an ancient speciation event that split the most recent common ancestor of the two species with these genes into separate branches

54

Paralogy

- Two genes whose common ancestor underwent a gene duplication instead of a speciation event

- Two homologous genes whose divergence can be traced back to a gene duplication event that predates the most recent common ancestor of the two species in which we find the genes

55

Xenology

- Horizontal gene transfer event can make a gene history not match the species history

- Huge problem for bacteria phylogenies

- Occurs but is very rare in eukaryotic genes

- Two homologous genes, one of them went through a horizontal gene transfer event and is now part of the genome of an organism very distantly related to the organism that has the other gene

56

Which of these subclasses are of use when trying to infer phylogenetic history?

Orthologous genes

57

Lineage sorting

- If the speciation process is short and coalescence is fast, then there is no problem

- The history of the alleles doesn't trace the species history

- When there is more than one allele in a population, all but one will eventually be lost through genetic drift

58

What are two processes that would make lineage sorting more likely?

Rapid speciation and long coalescence time
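For three species, a standard coalescent result makes this concrete. Suppose the two ancestral lineages fail to coalesce during the internode between speciation events. Then each of the three possible gene trees is equally likely, and two of them conflict with the species tree. The sketch assumes diploid organisms, a constant effective population size N, and an internode length in generations. The function name is illustrative.

```python
import math


def prob_gene_tree_discordant(internode_generations, effective_pop_size):
    """Three-taxon probability that a gene tree conflicts with the species tree
    through incomplete lineage sorting (diploid, constant population size)."""
    # Internode length in coalescent units of 2N generations.
    T = internode_generations / (2.0 * effective_pop_size)
    # No coalescence within the internode: probability exp(-T).
    # Then 2 of the 3 equally likely gene-tree topologies are discordant.
    return (2.0 / 3.0) * math.exp(-T)


rapid = prob_gene_tree_discordant(1_000, 100_000)
slow = prob_gene_tree_discordant(1_000_000, 10_000)
```

A short internode with a large population (1,000 generations, N = 100,000) gives a discordance probability of about 0.66, close to the maximum of 2/3. A long internode with a small population (1,000,000 generations, N = 10,000) gives essentially zero. Rapid speciation shortens the internode and a large N lengthens coalescence time, and both raise the probability.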

59

What are the three things that can cause a gene tree to conflict with a species tree (even when both trees are reconstructed accurately)?

- Gene duplication

- Horizontal gene transfer

- Coalescence (lineage sorting)

60

Gene duplication

Connected with paralogy

61

Horizontal gene transfer (transformation, transduction by viral vectors, and conjugation through a pilus, which often transfers plasmids)

Connected with xenology

62

Strict consensus tree

- Combines two, three, or more trees and keeps only the relationships that all the trees agree on

- Very little resolution

63

Majority consensus tree

- Keeps relationships found in more than half of the trees and shows how many trees support each one

- Better resolution

- D is more closely related to ABC than E
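Strict and majority-rule consensus can be sketched with each tree simplified to a set of clades (frozensets of taxon names). The function names are illustrative. Strict consensus keeps only the clades found in every tree. Majority rule keeps the clades found in more than half the trees and reports their percentages.

```python
from collections import Counter


def strict_consensus(trees):
    """Clades present in every input tree."""
    return set.intersection(*(set(t) for t in trees))


def majority_rule_consensus(trees):
    """Clades present in more than half of the trees, with the % of trees showing each."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: 100 * n / len(trees) for clade, n in counts.items() if n > len(trees) / 2}


AB, BC, ABC = frozenset("AB"), frozenset("BC"), frozenset("ABC")
ABCD, ABCE = frozenset("ABCD"), frozenset("ABCE")
trees = [{AB, ABC, ABCD}, {AB, ABC, ABCD}, {BC, ABC, ABCE}]
strict = strict_consensus(trees)
majority = majority_rule_consensus(trees)
```

In this example, two of the three trees group D with ABC (clade ABCD) and one groups E with ABC. Strict consensus keeps only ABC. Majority rule also keeps AB (67%) and ABCD (67%), which is the extra resolution described above.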

64

Pseudogene

Duplicated gene that no longer functions (still in the genome but is part of noncoding DNA)

65

Neofunctionalization

Duplicated gene now has a different function from what it did ancestrally (related to anagenesis)

66

Anagenesis

- Process of generating new potential and diversity within a species over time

- Change in function over time but without any genes being created

67

Cladogenesis

- Process of speciation events where we generate new clades from a single ancestral population carrying characteristics into lineages and new functions

- Speciation process creating new clades or new groups

68

Is there a single species definition that can define all species?

No because speciation is a process, not an event and different species may have different processes that help establish the separation of a population into different species

69

Does this mean that the concept of a species is a human idea and not a biological reality? How can we reconcile this discrepancy?

Recognize it's a process and there are slight differences in some groups compared to others

70

Morphological species concept

- Most widely used

- We can tell cats from dogs just by looking at them

- Weakness where there is not enough morphological diversity to tell them apart

Strengths:

- Simple and easy

- Don't need special equipment, just need observation skills

Weaknesses:

- Need education on terms

- Need to be careful when there's a wide range of characteristics

71

Biological species concept

- Groups of actually or potentially interbreeding populations which are reproductively isolated from other such groups

- Applied by measuring rates of gene flow

- Can they exchange genetic material, and at what level? Only applies to sexually reproducing species

Strengths:

- A little bit more scientific and objective

Weaknesses:

- Gene flow isn't 0 or 100, more in the middle

- Takes a lot of time and resources to get data set

- Can't use this for asexual species

72

Phylogenetic species concept

- Smallest monophyletic group distinguished by a shared derived character

- Used mainly when the other two concepts don't work

Strengths:

- Very objective methodology

Weaknesses:

- Difficulties for asexual species

- Takes time and effort but it is only one thing, not multiple

73

Review the Wheeler paper and his arguments for the Phylogenetic Species Concept (PSC) being the single, unifying species concept.

Even the PSC has its own weaknesses: it is a complex method, when the simple morphological species concept can often be applied instead

74

What biological process does the PSC have a particular problem with?

Asexual reproducers (e.g., E. coli), hybrids, and horizontal gene transfer: interbreeding and gene exchange would mess things up because trees would turn into networks