This introduction was copied from the technical report A stochastic uncoupling process for graphs. There is also a less technical introduction to the MCL algorithm.

An MCL process is characterized by an
infinite row of pairs $(e$_{i}, r_{i}), where the $e$_{i} are integers greater
than one, and the $r$_{i} are real numbers greater than zero. An input matrix
M yields an infinite row of matrices $M$_{(i)} by
setting $M$_{1} = M, defining the even-labeled iterands
by setting $M$_{2i} to $M$_{2i-1} raised to the power $e$_{i},
and the odd-labeled iterands by $M$_{2i+1} = $\Gamma $_{r_i}(M_{2i}).
The operator $\Gamma $_{r_i} transforms a column-stochastic matrix into another
column-stochast matrix by raising each entry to the power $r$_{i}
and rescaling the result to be stochastic again. For stochastic matrices
diagonally similar to a symmetric matrix, the type of limit invariably found
is that of a doubly idempotent matrix; idempotent under both matrix squaring
and matrix inflation.

A nonnegative idempotent matrix $L$ without zero columns induces an overlapping clustering on the column indices with the property that each cluster has at least one element not contained within any of the other clusters. The number of clusters, say $k$, is equal to the multiplicity of the eigenvalue 1 in the spectrum of $L$. The sets of unique elements in the clusters form the strongly connected components in the associated graph of $L$, the number of which is also $k$. Experiments yield that initiating an MCL process with an input matrix, the associated graph of which has only one strongly connected component, may in general give an idempotent limit with a larger number of connected components. Interpreting the connected components as clustering yields a clustering whose distribution is invariably strongly related to the density characteristics of the input matrix. In this sense, the MCL process appears to be useful.

The MCL process converges quadratically in the neighbourhood of doubly idempotent stochastic matrices for which all columns have precisely one nonzero entry. It converges quadratically on a macroscopic scale (in terms of block structure) for doubly idempotent stochastic matrices in general. The clustering associated with such a matrix is stable under perturbations of the MCL process (that is, it is essentially defined by the block structure), except for the phenomenon of overlap. This is covered extensively in this technical report.

The limit resulting from an MCL process is in general extremely sparse. Current evidence suggests that limit matrices which have columns with more than one nonzero entry imply the existence of a nontrivial automorphism for the underlying graph of the input matrix. The sparseness of the limit, and the sparseness in a weighted sense of intermediate iterands have nice and important repercussions for the scalability of the cluster algorithm based on the MCL process.

The MCL process is interesting from a mathematical point of view,
since it apparently has the power to inflate the spectrum of a stochastic matrix, by
pressing large eigenvalues towards 1. This effect is strong enough to
overcome the effect of matrix exponentiation, which has the property of
exponentiating the associated eigenvalues.
The fundamental property established is
that $\Gamma $_{r} maps two nested classes of stochastic matrices with
real spectra onto themselves. The largest
class is that of diagonally symmetrizable stochastic matrices, i.e. matrices
which are diagonally similar to a symmetric matrix without further
constraints. This class is mapped onto itself by $\Gamma $_{r} for
arbitrary $r$ in $\mathbb{R}$. Defining diagonally positive semi-definite
(*dpsd*) as the property of being diagonally similar to a positive
semi-definite matrix, the second class is that of stochastic *dpsd*
matrices. This class is mapped onto itself by $\Gamma $_{r} for $r$ in $\mathbb{N}$.

Using the property that minors of a *dpsd* matrix $A$ are nonnegative, it is
shown that the relation $~~>$ defined on the nodes of its associated
graph by $q~~>\; p\; iff\; |A$_{pq}| > |A_{qq}|,
for $p$ and $q$ different, is a directed
acyclic graph (DAG) if indices of identical (modulo multiplication by a
scalar on the complex unit circle) columns, resp. rows are lumped together.
This generalizes the mapping from nonnegative idempotent column
allowable matrices onto overlapping clusterings,
and it sheds some light on the tendency of the MCL process limits to have a
larger number of strongly connected components than the input graph. It is
then shown that applying $\Gamma $_{inf} to a
stochastic *dpsd* matrix $M$ yields a
matrix that has spectrum of the form $\{0n-k,\; 1k\}$,
where $k$, the multiplicity of the eigenvalue 1, equals the number of
endclasses of the ordering of the columns of $M$ provided by the associated
DAG. It is not necessarily true that $\Gamma $_{inf}(M) is idempotent.
However, the observation is confirmed that $\Gamma $_{r} tends to inflate the
spectrum of M for $r>1$, as $\Gamma $_{r}(M) may be regarded as a function of
varying $r$ for fixed $M$, and as such is continuous.