The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs. It has found usage in bioinformatics and other disciplines. These pages contain descriptions, an animation, various bits and pieces, and mcl software. The software scales to millions of nodes and hundreds of millions of edges, running on a single machine and utilising multiple CPUs. The software is widely used, part of various Linux distributions, and has been stable for about a decade now. The mcl program comes with various network loading, creation and manipulation companions, and development is focused on these.
If you use MCL, please cite
van Dongen, Stijn, Graph clustering via a discrete uncoupling process, Siam Journal on Matrix Analysis and Applications 30-1, p121-141, 2008. ( https://doi.org/10.1137/040608635 ).
The basic interface to the algorithm is very simple - you need only one option (the -I flag) to get to the heart of it. The number of clusters cannot be specified. It is implicitly controlled using the inflation parameter. Inflation affects the granularity or resolution of the clustering outcome, with low values (1.3, 1.4) leading to fewer and larger clusters and high values (5, 6) leading to more and smaller clusters; the default value of 2 is a good starting point. For large graphs you should also be aware of the -scheme flag for regulating resources. The default approach is to vary the argument to -I over some interval (running mcl for each value).
MCL code and development can now be found
Do not use the tar releases that github automatically builds, but use the releases here, including the latest stable release 14-137. To build the source from github you will need GNU autotools and autoconf. This is only interesting if you wish to contribute to MCL development.
Recent MCL releases (starting with version numbers 22-ddd) require additional compilation of cimfomfa, a C utility library. Use the script install-this-mcl.sh to download and install the most recent MCL release, or use it as a simple installation recipe. For compilation/installation advice the github mcl page is the best source of information and also the place to open a discussion or issue.
For a full description of the MCL algorithm and process you are advised to read one of the technical reports among the publications. It is also possible to view a somewhat longer introduction or an introduction to some of the mathematics associated with MCL.
As of release 22-282 MCL ships with RCL or Restricted Contingency Linkage, a fast multi-resolution consensus clustering method. Visit github for more information or check out the preprint on bioRxiv.
The MCL algorithm was invented/discovered by Stijn van Dongen at the Centre for Mathematics and Computer Science (also known as CWI) in the Netherlands. The PhD thesis Graph clustering by flow simulation is centered around this algorithm, the main topics being the mathematical theory behind it, its position in cluster analysis and graph clustering, issues concerning scalability, implementation, and benchmarking, and performance criteria for graph clustering in general. The work for this thesis was carried out under supervision of Jan van Eijck and Michiel Hazewinkel. The thesis, technical reports, and preprints can be found in this section. For quickly getting an idea of how MCL operates, consider the flow pictorial at the top of this page, or even better, have a look at an animation of the MCL process.
MCL has been applied in a number of different domains, mostly in bioinformatics. Currently the number of papers citing core MCL publications is well over ten thousand. Get a quick impression from Google Scholar for the Enright/van Dongen/Ouzounis paper, my thesis, or the SIAM paper. Also of interest is the OrthoMCL paper.