Network analysis
MCL-edge: Analysis of networks with millions of nodes

MCL-edge is a collection of ready-made network analysis tools that includes the clustering program MCL. It is dedicated to the analysis of very large networks, scaling up to millions of nodes and hundreds of millions of edges. It comprises a small set of tools supporting algorithms that are both commonly used and scale well. The tools are ready to run and command-line based, and often allow multi-processing and job dispatching. MCL-edge is complementary to software packages such as R igraph and NetworkX, which support a wider assortment of algorithms.

In MCL-edge, network transformations are available to most programs as on-the-fly transforms, obviating the need to save multiple network representations on disk. Stream-based interfaces allow easy incorporation into bioinformatic (or other) pipelines, and a binary network format allows very fast reading of very large networks, e.g. 10 million edges per second on moderate hardware. Compute-intensive tasks (such as network centrality, and network creation from all pairwise correlations on large array data) can be tackled using both multi-core and multi-machine parallelism. In a nutshell, MCL-edge supports

i. flexible network loading, creation, and conversion (with mcxload, mcxarray, mcxrand, and mcxdump)
ii. network transformations (with mcx in modes query and alter, the -tf option in many programs)
iii. computation of network traits (with mcx in modes query, ctty, clcf and others)
iv. clustering (with mcl)
v. clustering comparison and reconciliation (with clm in modes meet, dist, order and others)

Example snippets

There is a preliminary manual page describing some of these protocols. A further resource is a list of tools in MCL-edge.
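To give a flavour of what clustering comparison involves, here is a minimal Python sketch of two standard notions: the meet of two clusterings (all non-empty pairwise intersections of their clusters) and the split/join distance between them. It is a conceptual illustration only, not the implementation used by clm. It assumes that both clusterings partition the same node set, the function names are hypothetical, and the quadratic loops are chosen for readability rather than scale.

```python
from itertools import product


def meet(clustering_a, clustering_b):
    """Return the meet of two partitions: every non-empty intersection
    of a cluster from the first with a cluster from the second."""
    parts = []
    for a, b in product(clustering_a, clustering_b):
        common = set(a) & set(b)
        if common:
            parts.append(common)
    return parts


def split_join_distance(clustering_a, clustering_b):
    """Split/join distance between two partitions of the same node set.

    For each cluster, count the nodes that fall outside its best-overlapping
    cluster in the other partition; sum these counts in both directions.
    """
    def projection(source, target):
        return sum(
            len(s) - max(len(set(s) & set(t)) for t in target)
            for s in source
        )

    return projection(clustering_a, clustering_b) + projection(clustering_b, clustering_a)


if __name__ == "__main__":
    # Toy clusterings of six nodes (made-up data).
    coarse = [{1, 2, 3}, {4, 5, 6}]
    fine = [{1, 2}, {3, 4}, {5, 6}]
    print(sorted(sorted(part) for part in meet(coarse, fine)))
    # [[1, 2], [3], [4], [5, 6]]
    print(split_join_distance(coarse, fine))  # 3
```

A distance of zero means the two clusterings are identical. The meet is the coarsest clustering that is a subclustering of both inputs.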

 
Source code

For reasons of continuity, the software source code still uses the tag mcl for now and is named using the format mcl-YY-DDD. Here YY indicates the number of years after 2000 and DDD is the day count within the year. All MCL-edge programs and documentation are nevertheless included and will be installed by default.
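The naming scheme can be decoded mechanically. The following Python sketch turns a tag of the form mcl-YY-DDD into a calendar date. It assumes that DDD is one-based (001 is 1 January), which the description above does not state explicitly. The function name is hypothetical and the tag in the example is purely illustrative.

```python
import re
from datetime import date, timedelta

# Tags look like "mcl-YY-DDD": two-digit year offset, three-digit day count.
TAG_PATTERN = re.compile(r"^mcl-(\d{2})-(\d{3})$")


def release_date(tag: str) -> date:
    """Convert an mcl-YY-DDD tag to a date (assumes DDD counts from 001)."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not an mcl-YY-DDD tag: {tag!r}")
    year = 2000 + int(match.group(1))
    day_of_year = int(match.group(2))
    # Reject day counts that do not exist in that year (365 or 366 days).
    days_in_year = (date(year + 1, 1, 1) - date(year, 1, 1)).days
    if not 1 <= day_of_year <= days_in_year:
        raise ValueError(f"day {day_of_year} is out of range for {year}")
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


if __name__ == "__main__":
    print(release_date("mcl-14-137"))  # 2014-05-17
```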

 
Book chapter

A book chapter on protocols using tools from MCL-edge has been published, authored by Stijn van Dongen and Cei Abreu-Goodger. It is the chapter Using MCL to extract clusters from networks (PubMed 22144159) in Bacterial Molecular Networks: Methods and Protocols, edited by Jacques van Helden, Ariane Toussaint and Denis Thieffry. The title of the chapter notwithstanding, the workflows it describes also cover other aspects, such as network creation and thresholding, as well as the analysis of clusterings once produced.

The first workflow in the chapter uses a protein sequence similarity network built from the proteomes of twenty-eight fully sequenced genomes of the order Rickettsiales. These intracellular alphaproteobacteria have complex life cycles, involving arthropod and mammalian hosts, and their highly streamlined genomes have become the paradigm of reductive evolution. They represent the closest extant relatives of the ancestor of mitochondria. Beyond having small genomes, Rickettsiales can still be quite different from one another, reflecting several hundred million years of evolution. This case study illustrates how orthologs are separated from paralogs with mcl. The second workflow uses an Escherichia coli co-expression network, derived from expression across 466 different conditions, allowing the extraction of clusters of diverse functions. If you like, have a look at a late-stage draft of the chapter.
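A co-expression network of the kind used in the second workflow is, in essence, a network created from all pairwise correlations between expression profiles, keeping only pairs above a cutoff. The Python sketch below illustrates that idea on a toy scale. It is not how mcxarray is implemented and is not taken from the chapter. The gene names, the data, the cutoff of 0.7 and the choice of Pearson correlation are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def correlation_network(profiles, cutoff):
    """Return (label, label, weight) edges for all pairs of profiles whose
    correlation is at least `cutoff`. `profiles` maps a label to a list of
    measurements taken under the same conditions for every label."""
    edges = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        r = pearson(prof_a, prof_b)
        if r >= cutoff:
            edges.append((name_a, name_b, r))
    return edges


if __name__ == "__main__":
    # Made-up expression values for three hypothetical genes, four conditions.
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.5],
        "geneC": [4.0, 3.0, 2.0, 1.0],
    }
    for source, target, weight in correlation_network(profiles, cutoff=0.7):
        print(f"{source}\t{target}\t{weight:.3f}")
```

With these toy values only the geneA and geneB pair survives the cutoff, since geneC is anti-correlated with both. The all-pairs loop is quadratic in the number of profiles, which is why this task benefits from parallelism on large array data.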

 
Related software

The igraph package offers a large array of network algorithms and is far broader in scope than MCL-edge. It integrates well with R and is thus more flexible by virtue of being programmable. I highly recommend it, but I also believe that MCL-edge has something of an edge when it comes to very large networks, ease of coupling components, and including them in larger (bioinformatic) pipelines. In other words, igraph and MCL-edge are by and large complementary.