SEMPHY
(Version 2.00)
This manual covers the following topics:
SEMPHY is always run by calling the executable semphy from a command prompt with some parameters:
semphy [parameters...]
Here we give a few examples of using SEMPHY for the most common tasks. Below is a table of options; the full list is also available by typing 'semphy -h' at the command prompt.
Neighbor Joining (NJ) trees
Running standard NJ on protein sequences using the JTT replacement matrix:
semphy -s prots.fasta -o out.txt -T prots.tree -l log.txt -a 20 --jtt -J -H
(Meaning: the input is a FASTA sequence file and output is written to three files: general output, tree file and log file. Use an alphabet of 20, i.e. amino acids. Use the JTT matrix. Do Neighbor Joining with a homogeneous rates model.)
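When runs like this are scripted, it can help to assemble the argument list programmatically. The following is a minimal sketch, assuming Python 3.8 or later; the function name nj_command and its parameter names are hypothetical, and only flags documented in this manual are used. Nothing is executed: the command is only built and printed.

```python
import shlex

def nj_command(seq_file, out_file, tree_file, log_file,
               alphabet=20, model="--jtt", dtme="-J", asrv="-H"):
    """Return the argument list for a SEMPHY NJ run (hypothetical helper).

    Defaults reproduce the standard NJ example: JTT matrix (--jtt),
    standard NJ (-J) and a homogeneous rates model (-H).
    """
    return [
        "semphy",
        "-s", seq_file,       # input sequence file
        "-o", out_file,       # general output file
        "-T", tree_file,      # output tree file
        "-l", log_file,       # log file
        "-a", str(alphabet),  # 20 = amino acids, 4 = nucleotides
        model, dtme, asrv,
    ]

cmd = nj_command("prots.fasta", "out.txt", "prots.tree", "log.txt")
print(shlex.join(cmd))
```

Once semphy is on the PATH, the same list could be passed to subprocess.run(cmd, check=True).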
Same thing, but using the new iterative NJ method with Bayesian estimation of the rate at each site (i.e. "posterior" estimates of the rates). -O requests optimization of the rate parameters. For a description of the algorithm, please see our paper referenced on the SEMPHY homepage (Ninio et al. 2006).
semphy -s prots.fasta -o out.txt -T prots.tree -l log.txt -a 20 --jtt --posteriorDTME -O
Running iterative NJ on DNA sequences, using the HKY model:
semphy -s genes.fasta -o out.txt -T genes.tree -l log.txt -a 4 --hky --posteriorDTME -O
(An alphabet of 4 indicates DNA or RNA)
Same thing, with 100 bootstrap iterations:
semphy -s genes.fasta -o out.txt -T genes.tree -l log.txt -a 4 --hky --posteriorDTME -O --BPrepeats 100
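For readers unfamiliar with bootstrapping, the sketch below illustrates the standard nonparametric bootstrap used in phylogenetics: each replicate is built by resampling alignment columns with replacement. This is an illustration of the general idea only; how SEMPHY implements --BPrepeats internally is not described in this manual, and the function name and toy alignment are hypothetical.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence.
    All sequences must have the same length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # Draw n_sites column indices uniformly at random, with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Apply the same column choice to every sequence so sites stay aligned.
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

# Toy alignment (made-up data, for illustration only).
toy = {"A": "ACGTAC", "B": "ACGTTC", "C": "ATGTAC"}
rng = random.Random(0)
replicates = [bootstrap_replicate(toy, rng) for _ in range(100)]
print(len(replicates), "replicates of length", len(replicates[0]["A"]))
```

A tree is then reconstructed from each replicate, and the fraction of replicates supporting a given split is reported as its bootstrap support.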
Maximum Likelihood (ML) trees using SEMPHY
Running SEMPHY to find the ML tree for a set of protein sequences using the JTT replacement matrix (standard NJ will be used for the initial tree):
semphy -s prots.fasta -o out.txt -T prots.tree -l log.txt -a 20 --jtt -S -O
(Meaning: run SEMPHY on a FASTA sequence file and write the output, tree and log files. Use an alphabet of 20, i.e. amino acids. Use the JTT matrix. Do SEMPHY steps; -O requests optimization of the rate parameters.)
Same thing, but using the new iterative NJ method for the initial tree:
semphy -s prots.fasta -o out.txt -T prots.tree -l log.txt -a 20 --jtt -S --posteriorDTME -O
The same ML search on DNA sequences with the HKY model, with 100 bootstrap iterations:
semphy -s genes.fasta -o out.txt -T genes.tree -l log.txt -a 4 --hky -S --posteriorDTME -O --BPrepeats 100
List of options and parameters
The following table lists most of the available options and parameters (the full list can be printed by typing 'semphy -h' at the command prompt).
| Flag | Full name | Description | Default |
|---|---|---|---|
| -h | --help | Print help and exit | |
| | --full-help | Print help, including advanced options, and exit | |
| -s [MSA file] | --sequence | The input sequence file. The following formats are supported: Mase, Molphy, Phylip, Clustal, Fasta | Obligatory |
| -t [tree file] | --tree | An initial input tree file (in Newick format) | Optional |
| -o [output file] | --outputfile | File for general outputs | Optional |
| -T [output tree file] | --treeoutputfile | Output of the final tree | Optional |
| -l [log file] | --Logfile | Log file | Optional |
| -v [verbosity level] | --verbose | Verbosity level of the log file (between 0 and 10) | Optional |
| -a [alphabet] | --alphabet | 4 - nucleotides; 20 - amino acids; 61 - codons | 20 |
| | --BPrepeats | Perform a number of bootstrap iterations | Optional |
| -S | --SEMPHY | Do SEMPHY steps to search for the ML tree | Optional |
| Distance Table Estimation Method (DTME) | | Choice of NJ variant to be used in SEMPHY, or by itself. Specifies the method used in NJ to calculate the distance table. Standard NJ is -J; the recommended iterative NJ method is --posteriorDTME. Simple pairwise methods: -J, standard NJ using ML distance with a homogeneous rates model (also invoked by --homogeneousRatesDTME); --pairwiseGammaDTME, ML distance with a Gamma-ASRV model (usually not recommended). Iterative distance-based tree reconstruction methods: --commonAlphaDTME, using the common alpha parameter; --rate4siteDTME, using the ML rate for each site; --posteriorDTME, using the posterior distribution of the rate at each site. | NJ is not run unless a method is requested; if -S is used with no method chosen, -J is implied |
| Evolutionary model | | The following models are supported: | JTT |
| Among Site Rate Variation | | NOTE: Either -H, -A, or -O must be given | |
| -H | --homogeneous | A homogeneous rates model (no Gamma ASRV) | See above |
| -A [alpha] | --alpha | Set the initial alpha parameter for Gamma ASRV | See above |
| -O | --optimizeAlpha | Optimize the alpha parameter for the reconstructed tree | See above |
| -C [categories number] | --categories | The number of discrete categories used in the approximation of the Gamma distribution of rates | 8 |
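Some of the constraints in the table can be checked before launching a long run. The following is a minimal sketch of such a pre-flight check; the function names are hypothetical and the check is not part of SEMPHY. It assumes the note on Among Site Rate Variation means that at least one of -H, -A or -O must be present, and it only recognizes the flag spellings listed in the sets below.

```python
ASRV_FLAGS = {"-H", "-A", "-O"}
DTME_FLAGS = {"-J", "--homogeneousRatesDTME", "--pairwiseGammaDTME",
              "--commonAlphaDTME", "--rate4siteDTME", "--posteriorDTME"}
ALPHABETS = {4, 20, 61}  # nucleotides, amino acids, codons

def check_options(flags, alphabet=20, verbosity=None):
    """Return a list of problems found in a planned semphy invocation."""
    problems = []
    # Assumption: "either -H, -A, or -O" is read as "at least one of them".
    if not ASRV_FLAGS & set(flags):
        problems.append("one of -H, -A or -O must be given")
    if alphabet not in ALPHABETS:
        problems.append("alphabet must be 4, 20 or 61")
    if verbosity is not None and not 0 <= verbosity <= 10:
        problems.append("verbosity must be between 0 and 10")
    return problems

def effective_dtme(flags):
    """Return the DTME flags in effect, applying the documented default."""
    chosen = [f for f in flags if f in DTME_FLAGS]
    if chosen:
        return chosen
    # With -S and no method chosen, -J is implied; otherwise NJ is not run.
    return ["-J"] if "-S" in flags else []

print(check_options(["-S", "-O"]), effective_dtme(["-S", "-O"]))
```

Returning a list of problems rather than raising on the first one lets a script report every issue in a single pass.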
Running on large datasets