SEMPHY - Structural EM Phylogenetic Reconstruction
Version 2.0 is now available!
Don't miss the new iterative Neighbor Joining, which accounts for
among-site rate variation (see below and the user manual for more
information).
Summary
SEMPHY is a tool for data-intensive phylogenetic reconstruction. SEMPHY
infers phylogenies by Maximum Likelihood, the most established criterion
for finding the correct phylogenetic tree. SEMPHY searches for both the
most likely topology of the evolutionary tree and the optimal lengths of
its branches. It uses the algorithmic paradigm of Structural EM in a new
computational method for phylogenetic inference, making computation both
effective and efficient: SEMPHY can handle very large data sets with both
good accuracy and reasonable running time. We will refer to the EM
procedure as the "SEMPHY step". For a full description of the SEMPHY
algorithm, please see Friedman et al. 2002 in the publications section
below.
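To make Maximum Likelihood estimation of a branch length concrete, here
is a minimal sketch for the simplest possible case: two aligned
sequences under the Jukes-Cantor model. It is an illustration only and
not SEMPHY's Structural EM procedure. The function names and the site
counts in the usage note are assumptions made up for the example.

```python
import math


def jc69_log_likelihood(t, n_sites, n_diff):
    """Log-likelihood of a two-sequence alignment under Jukes-Cantor.

    t is the branch length in expected substitutions per site; n_diff of
    the n_sites aligned positions differ between the two sequences.
    """
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay  # P(base is unchanged after length t)
    p_diff = 0.25 - 0.25 * decay  # P(change to one specific other base)
    return (n_sites * math.log(0.25)  # uniform base frequencies
            + (n_sites - n_diff) * math.log(p_same)
            + n_diff * math.log(p_diff))


def ml_branch_length(n_sites, n_diff, lo=1e-9, hi=10.0, tol=1e-10):
    """Maximize the log-likelihood over t with a golden-section search.

    Assumes 0 < n_diff / n_sites < 0.75, so the maximum is interior.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if (jc69_log_likelihood(c, n_sites, n_diff)
                > jc69_log_likelihood(d, n_sites, n_diff)):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0


def jc69_closed_form(n_sites, n_diff):
    """Analytic maximizer of the same likelihood, used as a check."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, ml_branch_length(1000, 100) and jc69_closed_form(1000, 100)
should agree to several decimal places. On a full tree the topology must
be searched as well, which is the harder problem that the SEMPHY step
addresses.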
SEMPHY 2.0 uses Maximum Likelihood methods and EM to improve the accuracy
of pairwise distance estimation for Neighbor Joining (NJ) by taking into
account among-site rate variation. SEMPHY uses this novel variant of NJ
to construct an "initial guess" for the tree, which is used as the
starting point for the SEMPHY search. The new NJ can also be used by
itself where Maximum Likelihood methods like SEMPHY are not suitable, for
example, where more than 500 sequences are involved. For a full
description of the improved NJ, please see Ninio et al. 2006 in the
publications section below.
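As a rough illustration of why among-site rate variation matters for
pairwise distances, the sketch below compares the plain Jukes-Cantor
distance with the textbook closed-form correction for gamma-distributed
site rates. This is not the Bayesian per-site rate inference described
in Ninio et al. 2006; it is a simpler, standard formula, and the
function names and the shape parameter value are assumptions chosen for
the example.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor distance from the observed proportion p of differing
    sites (requires p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_gamma_distance(p, alpha):
    """Jukes-Cantor distance when site rates follow a gamma distribution
    with mean 1 and shape alpha (smaller alpha = more rate variation)."""
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


if __name__ == "__main__":
    alpha = 0.5  # hypothetical shape parameter, strong rate variation
    for p in (0.1, 0.3, 0.5):
        print(p, jc69_distance(p), jc69_gamma_distance(p, alpha))
```

Ignoring rate variation underestimates the distance, and the gap grows
with divergence. As alpha becomes large the gamma-corrected value
approaches the plain Jukes-Cantor distance.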
Main Features
Version 2.0b1 (beta testing, October, 2006)
- A user manual describes the options and parameters for SEMPHY and
gives examples of common usage (the manual is available with your
download after registration).
- Novel iterative variants of Neighbor Joining use Bayesian inference
of the rate at each site (Among-Site Rate Variation) to improve the
accuracy of NJ. See the --posterior option in the user manual. The
algorithm is described in Ninio et al. 2006 (referenced below).
- The new NJ can be used on its own, or as the initial guess of the
tree followed by the SEMPHY search for the Maximum Likelihood tree.
- Handles datasets of many thousands of sequences. See the section
on "Running on large datasets" in the user manual.
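For readers unfamiliar with Neighbor Joining, the following minimal
sketch shows the pair-selection step of the classic algorithm: given a
matrix of pairwise distances, it picks the two taxa to join next. It
illustrates standard NJ only, not SEMPHY's iterative variant, which
differs in how the distances themselves are estimated. The function
name and the example matrix are hypothetical.

```python
def nj_pick_pair(dist):
    """Return ((i, j), q) for the pair minimizing the classic NJ criterion

        Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)

    dist is a symmetric n x n matrix (list of lists) with n >= 3 and a
    zero diagonal.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical distances between five taxa, indexed 0..4.
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

if __name__ == "__main__":
    print(nj_pick_pair(example))
```

On this matrix the step selects taxa 0 and 1, even though taxa 3 and 4
have the smallest raw distance. The row-sum terms are what distinguish
NJ from simply joining the closest pair, and they are also why more
accurate distance estimates translate into better trees.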
Version 1.0a5 (alpha testing, March, 2004)
- Multiple types of sequences: amino acid or nucleotide sequences
are currently supported.
- Multiple evolutionary models, including Jukes-Cantor, Dayhoff,
Kimura 2-parameter, REV, JTT, WAG and cpREV.
- Supports user-supplied evolutionary models.
- Supported input formats: Phylip, Molphy, Clustal, FASTA, MASE.
- Handles gaps.
- Support for Among-Site Rate Variation in NJ.
- Can handle up to 100 taxa on a standard PC and several hundred
taxa with the improved NJ.
- Available for Linux, Windows, OS X and UNIX platforms.
Development Team
- Nir Friedman, Computer Science & Engineering, Hebrew University,
Jerusalem.
- Matan Ninio, Computer Science & Engineering, Hebrew University,
Jerusalem.
- Tal Pupko, Department of Cell Research and Immunology, George S. Wise
Faculty of Life Sciences, Tel Aviv University, Tel Aviv.
- Eyal Privman, Department of Cell Research and Immunology, George S.
Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv.
- Itshak Pe'er, Molecular Genetics, Weizmann Institute of Science.
Publications
- A Structural EM Algorithm for Phylogenetic Inference, by N. Friedman,
M. Ninio, I. Pe'er, and T. Pupko. Journal of Computational Biology,
2002; 9(2):331-53. PostScript, PDF.
- Earlier version appeared in Proc. Fifth Annual Inter. Conf. on
Computational Molecular Biology (RECOMB), 2001. PostScript, PDF.
- Presentation at RECOMB01: PowerPoint Presentation, HTML Presentation.
- Phylogeny Reconstruction: Increasing the Accuracy of Pairwise
Distance Estimation Using Bayesian Inference of Evolutionary Rates, by
M. Ninio*, E. Privman*, T. Pupko, and N. Friedman. ECCB 2006.
Bioinformatics 2007 23: e136-e141. *These authors contributed equally.
Please cite the appropriate reference if you use SEMPHY in your publications.
Availability
SEMPHY can be downloaded from the SEMPHY download page.
If you are interested in using SEMPHY, please join the SEMPHY
announcement mailing list by sending an email with the subject
"subscribe" to semphy-announce-request@cs.huji.ac.il. This is a very
low-volume list that is used only for announcements of new SEMPHY
versions, and your address will not be passed on to anyone. You may
unsubscribe at any time by sending an email with the subject
"unsubscribe" to the same address.
If you require assistance, you can contact us at semphy@cs.huji.ac.il, or
join the SEMPHY-users mailing list by sending an email with the subject
"subscribe" to semphy-users-request@cs.huji.ac.il and ask there. Again,
you may unsubscribe at any time by sending an email with the subject
"unsubscribe" to the same address.