Introduction
Phylogenetic analysis has become an integral part of many biological research programs, in areas as diverse as human epidemiology, viral transmission, biogeography, and systematics. With the advent of high-throughput sequencing, an increasingly large volume of sequence data is becoming available. Scientists should be able to take advantage of these data and of the research that others have performed. For example, when a new virus is detected, it should be possible to estimate a phylogenetic tree (an evolutionary history) containing all related viruses and the unknown variant in order to answer questions such as:
- Where did this virus come from?
- When did this virus arrive in the human population?
- Which related species might have antibodies appropriate for testing in developing new treatments?
- Has this virus been genetically modified through natural or human-induced recombination?
- How is this virus evolving and what genetic changes occurred to allow it to successfully enter the human population?
Unfortunately, this kind of phylogenetic search is currently computationally infeasible. A complete search using maximum likelihood takes several months or more even with a small number of sequences (on the order of 100-200). In the case of the SARS epidemic, and others like it, key information must be available in days, or at most weeks, for appropriate action to be taken. Much of the problem comes from the culture around, and the design of, most phylogenetic software packages. These packages require the user to start a search from scratch every time a new sequence is added (exactly the situation when a new antigen is observed). They also do not allow users to share partial trees that could speed up the search. This creates a culture in which investigators see little or no benefit in collaborating on phylogenetic research.
What if it were possible to use trees from previous phylogenetic searches as a starting point for future searches? The jumpstarting algorithm presented in this thesis allows researchers to use previously generated phylogenetic trees to create better starting trees for future searches. With jumpstarting, it is possible to find better trees in less time than with a naive phylogenetic search.
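To make the idea concrete, here is a minimal sketch of reusing a prior tree when one new sequence arrives. It is not the algorithm from the thesis: it scores trees with Fitch parsimony rather than maximum likelihood, represents trees as nested Python tuples, and the function names (`fitch`, `insertions`, `jumpstart`) and the toy sequences are all hypothetical. It only illustrates that the new taxon can be attached to an existing tree instead of rebuilding the tree from scratch.

```python
def fitch(tree, seqs):
    """Return (per-site state sets, parsimony cost) for a rooted binary tree.

    A tree is either a taxon name (str) or a pair (left, right).
    """
    if isinstance(tree, str):                      # leaf: one state per site
        return [{c} for c in seqs[tree]], 0
    left, right = tree
    lsets, lcost = fitch(left, seqs)
    rsets, rcost = fitch(right, seqs)
    sets, cost = [], lcost + rcost
    for a, b in zip(lsets, rsets):
        common = a & b
        if common:                                 # children agree: no change
            sets.append(common)
        else:                                      # children disagree: one change
            sets.append(a | b)
            cost += 1
    return sets, cost


def insertions(tree, taxon):
    """Yield every tree obtained by attaching `taxon` to one edge of `tree`."""
    yield (tree, taxon)                            # attach on the edge above this subtree
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def jumpstart(prior_tree, seqs, new_taxon):
    """Pick the cheapest placement of `new_taxon` on the prior tree (assumed heuristic)."""
    return min(insertions(prior_tree, new_taxon),
               key=lambda t: fitch(t, seqs)[1])


# Toy data (made up for illustration): E shares its second site with B.
seqs = {"A": "AA", "B": "AC", "C": "TA", "D": "TA", "E": "AC"}
prior_tree = (("A", "B"), ("C", "D"))              # result of an earlier search
start_tree = jumpstart(prior_tree, seqs, "E")
print(start_tree)                                  # (('A', ('B', 'E')), ('C', 'D'))
```

The resulting tree would then serve as the start tree for a full search, which is where a real implementation would hand over to a maximum likelihood package.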
Although jumpstarting may seem like an intuitive concept, there are many factors that must be considered if prior trees are actually going to be of benefit. Through investigating the influence of these factors, researchers can make correct decisions when utilizing jumpstarting to speed up the generation of phylogenetic trees.
The following resources are available on jumpstarting:
- An M.S. thesis describing the system. The appendix includes SQL code for creating the database.
- Source code.