(main page)
MCL - a cluster algorithm for graphs
Four iterands of MCL visualized
Characteristics of MCL

The algorithm simulates flow using (alternating) two simple algebraic operations on matrices. Its formulation is simple and elegant. There are no high-level procedural instructions for assembling, joining, or splitting of groups - cluster structure is bootstrapped via a flow process that is inherently affected by any cluster structure present.

The first operation used is expansion, which coincides with normal matrix multiplication. Expansion models the spreading out of flow, it becoming more homogeneous. The second is inflation, which is mathematically speaking a Hadamard power followed by a diagonal scaling. Inflation models the contraction of flow, it becoming thicker in regions of higher current and thinner in regions of lower current. The MCL process causes flow to spread out within natural clusters and evaporate inbetween different clusters. This animated example of an MCL process may give you an impression of how it works.


By varying a single parameter, clusterings on different scales of granularity can be found. The number of clusters can not and need not be specified in advance, but the algorithm can be adapted to different contexts.


The issue 'how many clusters?' is not dealt with in an arbitrary manner, but rather by strong internal logic. Cluster structure leaves its marks on the flow process simulated by the algorithm, and the flow parameters control the granularity of the cluster imprint.


The limit of the MCL process (the process simulated by the algorithm) is in general extremely sparse, and the iterands are sparse in a weighted sense. This gives the means to scale the algorithm drastically, leading to a worst-case complexity of order Nk^2, where N is the number of nodes of the input graph, and where k is a threshold for the number of resources allocated per node.


The iterands of the MCL process have structural properties which allow a cluster interpretation, and which generalize the mapping of limits onto clusterings. The mathematics associated with the process shows that there is an intrinsic relationship between the MCL process and cluster structure in graphs. This is valuable given the many heuristic approaches in cluster analysis.


An optimized MCL implementation, such as found on this page, should have complexity O(N k2), where N is the number of nodes in the graph, and k is the number of resources allocated per node. This number can be chosen surprisingly low without affecting clustering quality. The reason is that MCL computes very much a localized process, and consequently it is possible to implement a pruning regime that takes advantage of this. Disappointingly, a number of publications claim that MCL's complexity is O(N3), however, this is only true if only an extremely naive implementation is considered. The fact that MCL is naturally described in matrix algebra has perhaps led people to postulate a time complexity cubic in the size of the graph, disregarding the fact that these matrices are generally very sparse. For more information refer to the section about speed and memory.