markov_clustering(g_original: object, expansion: int = 2, inflation: int = 2, loop_value: int = 1, iterations: int = 100, pruning_threshold: float = 0.001, pruning_frequency: int = 1, convergence_check_frequency: int = 1) → cdlib.classes.node_clustering.NodeClustering

The Markov clustering algorithm (MCL) is based on simulation of (stochastic) flow in graphs. The MCL algorithm finds cluster structure in graphs by a mathematical bootstrapping procedure. The process deterministically computes (the probabilities of) random walks through the graph, and uses two operators transforming one set of probabilities into another. It does so using the language of stochastic matrices (also called Markov matrices) which capture the mathematical concept of random walks on a graph. The MCL algorithm simulates random walks within a graph by alternation of two operators called expansion and inflation.

Supported Graph Types

Undirected Directed Weighted
Yes No No
  • g_original – a networkx/igraph object
  • expansion – The cluster expansion factor
  • inflation – The cluster inflation factor
  • loop_value – Initialization value for self-loops
  • iterations – Maximum number of iterations (actual number of iterations will be less if convergence is reached)
  • pruning_threshold – Threshold below which matrix elements will be set set to 0
  • pruning_frequency – Perform pruning every ‘pruning_frequency’ iterations.
  • convergence_check_frequency – Perform the check for convergence every convergence_check_frequency iterations

NodeClustering object

>>> from cdlib import algorithms
>>> import networkx as nx
>>> G = nx.karate_club_graph()
>>> coms = algorithms.markov_clustering(G)

Enright, Anton J., Stijn Van Dongen, and Christos A. Ouzounis. An efficient algorithm for large-scale detection of protein families. Nucleic acids research 30.7 (2002): 1575-1584.


Reference implementation: