Synthetic Benchmarks

Evaluating community detection algorithms against ground-truth communities can be tricky when the annotation is derived from external semantic information rather than from the network topology.

For this reason, cdlib integrates synthetic network generators with planted community structures.
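To make "planted community structure" concrete, here is a minimal, standard-library-only sketch of a planted l-partition model. It is an illustration of the idea, not cdlib's implementation: the function name `planted_partition` and its return format (an edge list plus a list of node lists) are assumptions made for this example. The point it shows is that the ground truth is fixed before any edge is drawn, so the labels are topological by construction.

```python
import random
from itertools import combinations


def planted_partition(l, k, p_in, p_out, seed=None):
    """Hypothetical helper: l groups of k nodes, returns (edges, communities)."""
    rng = random.Random(seed)
    # The ground truth is decided first: node i belongs to group i // k.
    communities = [list(range(g * k, (g + 1) * k)) for g in range(l)]
    edges = []
    for u, v in combinations(range(l * k), 2):
        same_group = (u // k) == (v // k)
        # Intra-group pairs are linked with p_in, inter-group pairs with p_out.
        if rng.random() < (p_in if same_group else p_out):
            edges.append((u, v))
    return edges, communities


edges, communities = planted_partition(l=4, k=5, p_in=0.9, p_out=0.05, seed=42)
intra = sum(1 for u, v in edges if u // 5 == v // 5)
print(len(communities), "communities;", intra, "of", len(edges), "edges are intra-community")
```

With p_in well above p_out, most edges fall inside the planted groups, which is what makes the planted partition a meaningful reference for a detection algorithm.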

Note

The following lists are aligned to CD evaluation methods available in the GitHub main branch of cdlib.

Static Networks with Community Ground Truth

Benchmarks for plain static networks. All generators return a tuple: (networkx.Graph, cdlib.NodeClustering)

GRP(n, s, v, p_in, p_out, directed, seed) Generate a Gaussian random partition graph.
LFR(n, tau1, tau2, mu, average_degree, …) Returns the LFR benchmark graph and planted communities.
PP(l, k, p_in, p_out, seed, directed) Returns the planted l-partition graph.
RPG(sizes, p_in, p_out, seed, directed) Returns the random partition graph with a partition of sizes.
SBM(sizes, p, nodelist, seed, directed, …) Returns a stochastic block model graph.

Benchmarks for node-attributed static networks.

XMark(n, gamma, beta, m_cat, theta, mu, …) Returns the XMark benchmark annotated graph and planted communities.

Dynamic Networks with Community Ground Truth

Time evolving network topologies with planted community life-cycles. All generators return a tuple: (dynetx.DynGraph, cdlib.TemporalClustering)

RDyn(size, iterations, avg_deg, sigma, …) RDyn is a synthetic dynamic network generator with time-dependent ground-truth partitions having tunable quality (in terms of conductance).