cdlib.algorithms.surprise_communities

surprise_communities(g_original: object, initial_membership: list = None, weights: list = None, node_sizes: list = None) → cdlib.classes.node_clustering.NodeClustering

Surprise_communities is a model where the quality function to optimize is:

\[Q = m D(q \parallel \langle q \rangle)\]

where \(m\) is the number of edges, \(q = \frac{\sum_c m_c}{m}\), is the fraction of internal edges, \(\langle q \rangle = \frac{\sum_c \binom{n_c}{2}}{\binom{n}{2}}\) is the expected fraction of internal edges, and finally

\(D(x \parallel y) = x \ln \frac{x}{y} + (1 - x) \ln \frac{1 - x}{1 - y}\) is the binary Kullback-Leibler divergence.

For directed graphs we can multiplying the binomials by 2, and this leaves \(\langle q \rangle\) unchanged, so that we can simply use the same formulation. For weighted graphs we can simply count the total internal weight instead of the total number of edges for \(q\) , while \(\langle q \rangle\) remains unchanged.

Supported Graph Types

Undirected Directed Weighted
Yes Yes Yes
Parameters:
  • g_original – a networkx/igraph object
  • initial_membership – list of int Initial membership for the partition. If None then defaults to a singleton partition. Deafault None
  • weights – list of double, or edge attribute Weights of edges. Can be either an iterable or an edge attribute. Deafault None
  • node_sizes – list of int, or vertex attribute Sizes of nodes are necessary to know the size of communities in aggregate graphs. Usually this is set to 1 for all nodes, but in specific cases this could be changed. Deafault None
Returns:

NodeClustering object

Example:
>>> from cdlib import algorithms
>>> import networkx as nx
>>> G = nx.karate_club_graph()
>>> coms = algorithms.surprise_communities(G)
References:

Traag, V. A., Aldecoa, R., & Delvenne, J.-C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816. 10.1103/PhysRevE.92.022816

Note

Reference implementation: https://github.com/vtraag/leidenalg