cdlib.algorithms.markov_clustering¶

cdlib.algorithms.markov_clustering(g_original: object, expansion: int = 2, inflation: int = 2, loop_value: int = 1, iterations: int = 100, pruning_threshold: float = 0.001, pruning_frequency: int = 1, convergence_check_frequency: int = 1) → NodeClustering¶

The Markov clustering algorithm (MCL) is based on simulation of (stochastic) flow in graphs. The MCL algorithm finds cluster structure in graphs by a mathematical bootstrapping procedure. The process deterministically computes (the probabilities of) random walks through the graph, and uses two operators transforming one set of probabilities into another. It does so using the language of stochastic matrices (also called Markov matrices) which capture the mathematical concept of random walks on a graph. The MCL algorithm simulates random walks within a graph by alternation of two operators called expansion and inflation.

Supported Graph Types

Undirected	Directed	Weighted
Yes	No	No

Parameters:

g_original – a networkx/igraph object
expansion – The cluster expansion factor
inflation – The cluster inflation factor
loop_value – Initialization value for self-loops
iterations – Maximum number of iterations (actual number of iterations will be less if convergence is reached)
pruning_threshold – Threshold below which matrix elements will be set set to 0
pruning_frequency – Perform pruning every ‘pruning_frequency’ iterations.
convergence_check_frequency – Perform the check for convergence every convergence_check_frequency iterations

Returns:

NodeClustering object

Example:

>>> from cdlib import algorithms
>>> import networkx as nx
>>> G = nx.karate_club_graph()
>>> coms = algorithms.markov_clustering(G)

References:

Enright, Anton J., Stijn Van Dongen, and Christos A. Ouzounis. An efficient algorithm for large-scale detection of protein families. Nucleic acids research 30.7 (2002): 1575-1584.

Note

Reference implementation: https://github.com/GuyAllard/markov_clustering