Tired of what ape’s collapse.singles() does? Here’s a way to put those nodes back…

R logo

When doing phylogenetics with R one often uses (directly or indirectly) the ape library, but there’s a catch: many times one has to get rid of nodes that have a single descendant by calling the collapse.singles() function.

An example might help here:

> library(phytools)
Loading required package: ape
Loading required package: maps
> tree <- read.newick(text="((a:0.1)A:0.5,(b1:0.2,b2:0.1)B:0.2);")
> tree

Phylogenetic tree with 3 tips and 3 internal nodes.

Tip labels:
[1] "a"  "b1" "b2"
Node labels:
[1] ""  "A" "B"

Rooted; includes branch lengths.
> plot(tree)
Error in plot.phylo(tree) : 
  there are single (non-splitting) nodes in your tree; you may need to use collapse.singles()

The tree ((a:0.1)A:0.5,(b1:0.2,b2:0.1)B:0.2) can be displayed as:

<tree> (3 tips, 2 levels)
├─────A : 0.500
│     └─a : 0.100
└──B : 0.200
   ├──b1 : 0.200
   └─b2 : 0.100

and it can be seen that node A has only one descendant a.

By calling collapse.singles(tree) one obtains the reduced tree (a:0.6,(b1:0.2,b2:0.1)B:0.2) with the node A removed; this can now be plotted (among other things):

The tree with the single node A removed.
The tree with the single node A removed.

But one manipulation one might want to do (and I do it myself a lot!) is to change the branch length (using, for example, nnls.tree() in phangorn) but then how do you get back to the original tree topology (i.e., re-insert the collapsed nodes)?

 

This is where a piece of R code I wrote recently (part of a larger script that I will discuss later) might help. I basically extended the original collapse.singles()‘s code by also recording the information about the removed nodes (their ancestor — if any — and their unique descendant as well as the branch length from the ancestor — if any — and to the descendant). This information can be used later to re-insert back the removed singles.

The code is implemented as a pair of functions

collapse.singles.reversible(tree)

and

reverse.collapse.singles(collapsed.tree, reverse.info, restore.brlen.method=c("original.proportion","equal.proportion","first.zero"))

where tree and collapsed.tree are objects of class phylo (such as tree in our example), reverse.info is a list containing the information needed to re-insert the removed singles and is returned by collapse.singles.reversible(), and restore.brlen.method controls the manner in which the length of the collapsed branch is distributed when the single is re-inserted: "original.proportion" means that the proportion between (ancestor → single) and (single  → descendant) in the original tree is maintained, "equal.proportion" means that the length is divided equally between the two new branches, while "first.zero" simply sets the ancestor → single branch length to 0.

Again, an example might help:

> reverse.info <- collapse.singles.reversible( tree )
> reverse.info
$original.tree

Phylogenetic tree with 3 tips and 3 internal nodes.

Tip labels:
[1] "a"  "b1" "b2"
Node labels:
[1] ""  "A" "B"

Rooted; includes branch lengths.

$collapsed.tree

Phylogenetic tree with 3 tips and 2 internal nodes.

Tip labels:
[1] "a"  "b1" "b2"
Node labels:
[1] ""  "B"

Rooted; includes branch lengths.

$removed.singles
  node name prev.node next.node prev.brlen next.brlen
1    5    A         4         1        0.5        0.1

$original.root
[1] 4
> 

The list reverse.info contains the original tree (in the field original.tree), the tree with the singles removed (in the field collapsed.tree) and the information on what nodes have been removed (in the field removed.singles) as well as the original tree’s root (in original.root). You can then play with the new collapsed tree by, for example, changing the branch lengths:

> tree2 <- reverse.info$collapsed.tree

> plot(tree2)

> tree2$edge.length[1] <- 3.0 # alter the length of the branch leading to a

> plot(tree2)
collapsed-tree-new-brlen
> tree3 <- reverse.collapse.singles(tree2, reverse.info, "original.proportion")
> tree3

Phylogenetic tree with 3 tips and 3 internal nodes.

Tip labels:
[1] "a"  "b1" "b2"
Node labels:
[1] ""  "A" "B"

Rooted; includes branch lengths.
>

tree3 cannot of course be plotted using ape (because of the re-inserted single node A) but its structure is:

<tree3> (3 tips, 2 levels)
├─────────────────────────A : 2.500
│                         └─────a : 0.500
└──B : 0.200
   ├──b1 : 0.200
   └─b2 : 0.100

It can be seen that not only A is safely re-inserted between the root and a, but that the new branch lengths respect the original ratio: (root → A) is 2.5 (original was 0.5) and (Aa) is 0.5 (original was 0.1) giving a ratio of 5:1.

Likewise,

> tree4 <- reverse.collapse.singles(tree2, reverse.info, "equal.proportion")
> tree4

Phylogenetic tree with 3 tips and 3 internal nodes.

Tip labels:
[1] "a"  "b1" "b2"
Node labels:
[1] ""  "A" "B"

Rooted; includes branch lengths.
>

is:

<tree4> (3 tips, 2 levels)
   ├───────────────A : 1.500
   │                 └───────────────a : 1.500
   └──B : 0.200
      ├──b1 : 0.200
      └─b2 : 0.100

with the two new branches equal to 1.5.

 

The code is quite well tested but bugs are of course possible! Likewise, it is relatively fast but I guess it can always be optimized. If you find any bugs, have suggestions or comments, please drop me an e-mail.

 

Now, the actual code, released here under a GLPv2 license (ape is also GPL ≥ 2 and I used collapse.singles()‘s code as a foundation for my own), is given below. Feel free to us it!

# Extend ape's collapse.singles() to allow restoring the removed nodes.
# Copyright (C) 2015  Dan Dediu

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.


# Auxiliary functions required by reverse.collapse.singles():
# Insert a row in a matrix at a given position:
.insert.row <- function(m, row=rep(NA,ncol(m)), where=nrow(m))
{
  if( is.null(m) || !is.matrix(m) || where <= 0 || length(row) != ncol(m) ) return (m); # nothing to do
  rnames <- rownames(m);
  if( where == 1 )
  {
    m <- rbind(row,m);
  } else if( where <= nrow(m) )
  {
    m <- rbind(m[1:(where-1),], row, m[where:nrow(m),]);
  } else if( where == nrow(m)+1 )
  {
    m <- rbind(m,row);
  } else
  {
    m <- rbind(m,matrix(NA,ncol=ncol(m),nrow=where-nrow(m)-1),row);
  }
  if( is.null(rnames) ) rownames(m) <- NULL;
  return (m);
}

# Insert an element in a vector at a given position:
.insert.element <- function(v, element=NA, where=length(v))
{
  if( is.null(v) || where <= 0 ) return (v); # nothing to do
  rnames <- names(v);
  if( where == 1 )
  {
    v <- c(element,v);
  } else if( where <= length(v) )
  {
    v <- c(v[1:(where-1)], element, v[where:length(v)]);
  } else if( where == length(v)+1 )
  {
    v <- c(v,element);
  } else
  {
    v <- c(v,rep(NA,where-length(v)-1),element);
  }
  if( is.null(rnames) ) names(v) <- NULL;
  return (v);
}

# Collapse single nodes also returning the collapsed nodes so that the original topology can be reconstructed later on:
collapse.singles.reversible <- function(tree) # return a list containing the new tree and the collapsed singles
{
  # This is shamelessly inspired from ape's original collapse.singles() function
  # Cache several tree properties:
  elen <- tree$edge.length;
  xmat <- tree$edge;
  node.lab <- tree$node.label;
  nnode <- tree$Nnode;
  ntip <- length(tree$tip.label);
  
  # The original tree's root:
  root <- find.root(tree);
  
  # Start processing the singles:
  singles <- NA;
  removed.singles <- data.frame("node"=rep(NA,nnode), "name"=NA, "prev.node"=NA, "next.node"=NA, "prev.brlen"=NA, "next.brlen"=NA); k <- 1; tree.orig <- tree; 
  while( length(singles) > 0 )
  {
    tx <- tabulate( xmat[,1] );
    singles <- which( tx == 1 ); # singles are those nodes with just one descendant
    if( length(singles) > 0 )
    {
      i <- singles[1]; # focus on the first single
      
      prev.node <- which( xmat[,2] == i ); # the ancestor
      next.node <- which( xmat[,1] == i ); # the single descendant
      prev.single.edge <- which(xmat[,1] == i); # the (ancestor -> single) edge

      xmat[ xmat > i ] <- xmat[ xmat > i ] - 1L; # adjust the node numbering

      # Store this to be removed node:
      removed.singles$node[k] <- i;
      if( !is.null(node.lab) ){ removed.singles$name[k] <- node.lab[i - ntip]; };
      if( length(prev.node) > 0 ){ removed.singles$prev.node[k] <- xmat[prev.node,1]; removed.singles$prev.brlen[k] <- elen[prev.node]; }
      if( length(next.node) > 0 ){ removed.singles$next.node[k] <- xmat[next.node,2]; removed.singles$next.brlen[k] <- elen[next.node]; }
      
      # Connect the (ancestor) directly to the (descendant) removing the (single):
      xmat[prev.node, 2] <- xmat[next.node, 2]; # connect the ancestor directly to the descendant        
      xmat <- xmat[ -prev.single.edge,]; # delete the (ancestor -> single) edge

      elen[prev.node] <- elen[prev.node] + elen[next.node]; # the new direct branch's length is the sum of the (ancestor -> single) and the (single -> descendant)'s branch lengths 
      if( !is.null(node.lab) ) node.lab <- node.lab[-c(i - ntip)]; # adjust the node labels as well
      nnode <- nnode - 1L; # one less node
      elen <- elen[-next.node]; # and one less edge
      
      k <- k+1; # prepare for the (possible) next removed single
    }
  }
  removed.singles <- removed.singles[ !is.na(removed.singles$node), ]; # clean them
  
  # Update the tree...
  tree$edge <- xmat;
  tree$edge.length <- elen;
  tree$node.label <- node.lab;
  tree$Nnode <- nnode;
  
  # ...and return it
  list("original.tree"=tree.orig, "collapsed.tree"=tree, "removed.singles"=removed.singles, "original.root"=root);
}

# Reverse the collapsing of single nodes:
reverse.collapse.singles <- function(collapsed.tree, reverse.info, 
                                     restore.brlen.method=c("original.proportion","equal.proportion","first.zero")) # use the original.tree and collapsed.singles list to restore the collapsed.tree
{
  if( is.null(reverse.info) || is.null(reverse.info$removed.singles) || nrow(reverse.info$removed.singles) == 0 ) return (collapsed.tree); # nothing to restore!
  
  restore.brlen.method <- restore.brlen.method[1];
  
  # Cache several tree properties:
  elen <- collapsed.tree$edge.length;
  xmat <- collapsed.tree$edge;
  node.lab <- collapsed.tree$node.label;
  nnode <- collapsed.tree$Nnode;
  ntip <- length(collapsed.tree$tip.label);
  singles <- reverse.info$removed.singles; # less typing
  
  # The removed.singles list records the order of removal: start from the end
  for( i in nrow(singles):1 )
  {
    s <- singles[i,]; # save typing...
    # Try to locate the ancestor ("prev") and decendant ("next") nodes in the tree as we need to restore this single between them:
    prev.node <- s$prev.node; next.node <- s$next.node;
    if( is.na(prev.node) )
    {
      # This is the new root: insert it:
      xmat <- .insert.row(xmat, c(s$node, next.node), 1); # insert the link
      
      # Adjust the node numbering and names:
      xmat[ xmat >= s$node ] <- xmat[ xmat >= s$node ] + 1L; xmat[1,1] <- s$node; # but make sure to keep the right root node
      if( !is.null(node.lab) ) node.lab <- .insert.element(node.lab, s$name, s$node - ntip);
      nnode <- nnode + 1L;
      
      # Update the brlens:
      if( !is.null(elen) )
      {
        elen <- .insert.element(elen, (s$next.brlen), 1);
      }
    } else
    {
      # Regular internal node:
      dlink <- which(xmat[,1] == prev.node & xmat[,2] == next.node); # locate the (ancestor -> descendant) direct link
      if( length(dlink) != 1 )
      {
        warning(paste0("Error restoring collapsed single node ",s$orig.node,ifelse(!is.na(s$node.name),paste0(", '",s$node.name,"'"),"")," cannot locate branch!\n"));
        return (collapsed.tree);
      }
      
      # Adjust the node numbering and names:
      xmat[ xmat >= s$node ] <- xmat[ xmat >= s$node ] + 1L;
      if( !is.null(node.lab) ) node.lab <- .insert.element(node.lab, s$name, s$node - ntip);
      nnode <- nnode + 1L;
      
      # Replace the direct link by two links (ancestor -> single) and (single -> ancestor):
      xmat[dlink,2] <- s$node; 
      # Insert a new link (single -> descendant):
      xmat <- .insert.row(xmat, c(s$node, ifelse(next.node >= s$node, next.node+1L, next.node)), dlink+1);
      
      # Update the brlens:
      if( !is.null(elen) )
      {
        if( !(restore.brlen.method %in% c("original.proportion","equal.proportion","first.zero")) )
        {
          warning("Unknown branch length restoration method: defaulting to original proportion\n");
          restore.brlen.method <- "original.proportion";
        }
        k <- elen[dlink]; # the brlen to be split
        if( restore.brlen.method == "original.proportion" )
        {
          elen[dlink] <- k * s$next.brlen / (s$prev.brlen + s$next.brlen);
        } else if( restore.brlen.method == "equal.proportion" )
        {
          elen[dlink] <- k / 2;
        } else if( restore.brlen.method == "first.zero" )
        {
          elen[dlink] <- 0;
        } else
        {
          # This should never happen
        }
        elen <- .insert.element(elen, (k - elen[dlink]), dlink);
      }
    }
  }
  
  # Update the tree...
  collapsed.tree$edge <- xmat;
  collapsed.tree$edge.length <- elen;
  collapsed.tree$node.label <- node.lab;
  collapsed.tree$Nnode <- nnode;
  
  # ... and return it:
  return (collapsed.tree);
}

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s