FitchAln

INSTALLATION

After downloading FitchAln.tar to your computer, unpack it with the following command, which creates the FitchAln package:

  • tar xvf FitchAln.tar

FitchAln is free software developed on Linux. Compile it with g++ (4.4.1 or above) using the following command (tested on Fedora 11):

  • make

DESCRIPTION

    Given a binary tree T in Newick format and its corresponding multiple sequence alignment A of size n x m, where n is the number of leaf nodes in T and m is the length of the alignment, FitchAln generates an (n-1) x (m+1) Fitch score matrix giving the (Maximum Parsimony) number of mutations at each site for each internal node. The root of T is numbered 0, and the remaining internal nodes are numbered in Breadth First Order. Row i of the matrix represents the internal node numbered i, so row 0 represents the root and there are n-1 rows in total. Each column except the last represents a site of the alignment: for all 0<=i<=n-2 and 0<=j<=m-1, cell(i,j) is the number of mutations at site j for internal node i. The last column gives the total number of mutations over all sites for that node: for all 0<=i<=n-2, cell(i,m) equals the sum of cell(i,0), ..., cell(i,m-1).
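The per-site counting described above can be sketched with Fitch's small-parsimony recursion. This is only an illustration, not FitchAln's own code. It assumes that the score of an internal node is the Fitch score of the subtree rooted at that node, that a tree is a nested pair of leaf names, and that gap characters are treated like any other character. The names fitch and site_scores and the toy data are hypothetical.

```python
def fitch(node, column):
    """Return (state_set, score) for the subtree rooted at node at one site.

    node is a leaf name (str) or a pair (left, right);
    column maps each leaf name to its character at this site.
    """
    if isinstance(node, str):
        return {column[node]}, 0
    left_set, left_score = fitch(node[0], column)
    right_set, right_score = fitch(node[1], column)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra mutation.
        return common, left_score + right_score
    # The children disagree: take the union and count one mutation.
    return left_set | right_set, left_score + right_score + 1


def site_scores(node, alignment):
    """Return the list of per-site Fitch scores for the subtree rooted at node."""
    m = len(next(iter(alignment.values())))
    scores = []
    for j in range(m):
        column = {name: seq[j] for name, seq in alignment.items()}
        scores.append(fitch(node, column)[1])
    return scores


# Toy data (made up for illustration): n = 4 leaves, m = 3 sites.
tree = (("A", "B"), ("C", "D"))
alignment = {"A": "ACG", "B": "ACT", "C": "GCT", "D": "GCA"}

root_scores = site_scores(tree, alignment)
print(root_scores, sum(root_scores))  # [1, 0, 2] 3
```

Under these assumptions, the printed list corresponds to the first m cells of row 0 and the printed sum to its last cell.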

    Input:
    A binary tree T in Newick format and a multiple sequence alignment A (in ClustalW or FASTA format). T has n leaves in total, each leaf corresponds to exactly one sequence in A, and each sequence has m characters, so T has 2n-1 nodes in all and A has size n x m.
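Before running the program it can help to confirm that an alignment really is n x m, that is, that every sequence has the same length. The following is a minimal sketch for FASTA input only. The function names are hypothetical, and taking the first word of each header line as the sequence name is an assumption about how names are matched to tree leaves.

```python
def read_fasta(lines):
    """Parse FASTA text (an iterable of lines) into a dict of name -> sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the sequence name is the first word of the header.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def alignment_shape(records):
    """Return (n, m); raise ValueError if the sequences differ in length."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences in an alignment must all have the same length")
    return len(records), lengths.pop()


# Toy alignment (made up for illustration).
example = """>A
AC-G
>B
ACTG
>C
GCTG
""".splitlines()

records = read_fasta(example)
n, m = alignment_shape(records)
print(n, m)  # 3 4
```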
    Output:

    Format in FIRST PART:

    The first part has n-1 rows in total. Each row has two columns: node and subtree.
    node: [vi], where i is an integer ranging from 0 to n-2, represents the internal node of T numbered i. The root of T is numbered 0, and the remaining internal nodes are numbered in Breadth First Order. As T has n-1 internal nodes, they are numbered from 0 to n-2.
    subtree: the subtree rooted at internal node i, in Newick format.
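The numbering scheme above can be sketched as a breadth-first traversal that skips leaves. This is an illustration under assumptions, not the program's own code: trees are nested pairs of leaf names, siblings are visited left to right, and the printed separator and the absence of a trailing semicolon are guesses about the layout. The function names are hypothetical.

```python
from collections import deque


def to_newick(node):
    """Render a nested-pair tree as a Newick string without branch lengths."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def number_internal_nodes(root):
    """Return the internal nodes in breadth-first order; index i is node number i."""
    numbered = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if isinstance(node, str):
            continue  # leaves are not numbered
        numbered.append(node)
        queue.extend(node)  # children are visited left to right
    return numbered


# Toy tree (made up for illustration): n = 5 leaves, so 4 internal nodes.
tree = (("A", "B"), ("C", ("D", "E")))
for i, node in enumerate(number_internal_nodes(tree)):
    print(f"[v{i}] {to_newick(node)}")
```

For this toy tree the root is listed as [v0], its two children as [v1] and [v2], and the deeper node (D,E) as [v3].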

    Format in SECOND PART:

    This is an (n-1) x (m+1) Fitch score matrix giving the (Maximum Parsimony) number of mutations at each site of the alignment for each internal node. Row i of the matrix represents the internal node numbered i, and each column j, except the last one, represents site j of the alignment. For all 0<=i<=n-2 and 0<=j<=m-1, cell(i,j) is the Fitch score of site j of the alignment for internal node i; for all 0<=i<=n-2, cell(i,m) is the sum of cell(i,0), ..., cell(i,m-1). Note that row 0 always represents the root (which is numbered 0).
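The relationship between the site columns and the last column can be checked with a few lines. This sketch works on a matrix held in memory as a list of rows. It does not parse the program's output file, whose exact layout is not specified here, and the per-site values are made up for illustration.

```python
def append_totals(site_rows):
    """Append to each row of per-site scores its sum, giving m+1 columns."""
    return [row + [sum(row)] for row in site_rows]


def totals_are_consistent(matrix):
    """Check that the last cell of every row equals the sum of the other cells."""
    return all(row[-1] == sum(row[:-1]) for row in matrix)


# Hypothetical per-site scores for n-1 = 3 internal nodes and m = 3 sites.
site_rows = [
    [1, 0, 2],  # row 0: the root
    [0, 0, 1],  # row 1
    [0, 0, 1],  # row 2
]
matrix = append_totals(site_rows)
print(matrix)  # [[1, 0, 2, 3], [0, 0, 1, 1], [0, 0, 1, 1]]
print(totals_are_consistent(matrix))  # True
```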

PROGRAM

    ./FitchAln -i alnfile -f alnformat -t treefile -o outfile
  • alnfile: the input multiple sequence alignment (MSA) file
  • alnformat: format of alnfile, 0 for ClustalW, 1 for FASTA
  • treefile: the input phylogenetic tree (binary, Newick format), each leaf of which represents a distinct sequence in the input MSA
  • outfile: the file to which the result is written
  • example:

    ./FitchAln -i example/T1.aln -f 0 -t example/T1.tree -o tmpout

    Download:

    Download the FitchAln package here (updated on March 21, 2010).

    Questions?

    If you have any questions, please send an email to Yuan Li (liy@cs.ucf.edu).