Evaluation and Benchmarking

The evaluation of Community Discovery algorithms is not an easy task. cdlib implements two families of evaluation strategies:

  • Internal evaluation through fitness scores;
  • External evaluation through partitions comparison.

Moreover, cdlib integrates both standard synthetic network benchmarks and real networks with annotated ground truths, thus allowing for testing identified communities against ground-truths.

Finally, cdlib also provides a way to rank clustering results generated by a set of algorithms over a given input graph.


The following lists are aligned to CD evaluation methods available in the GitHub main branch of cdlib.

Internal Evaluation: Fitness scores

Fitness functions allows to summarize the characteristics of a computed set of communities. cdlib implements the following quality scores:

avg_distance(graph, communities, **kwargs) Average distance.
avg_embeddedness(graph, communities, **kwargs) Average embeddedness of nodes within the community.
average_internal_degree(graph, community, …) The average internal degree of the community set.
avg_transitivity(graph, communities, **kwargs) Average transitivity.
conductance(graph, community, summary) Fraction of total edge volume that points outside the community.
cut_ratio(graph, community, summary) Fraction of existing edges (out of all possible edges) leaving the community.
edges_inside(graph, community, summary) Number of edges internal to the community.
expansion(graph, community, summary) Number of edges per community node that point outside the cluster.
fraction_over_median_degree(graph, …) Fraction of community nodes of having internal degree higher than the median degree value.
hub_dominance(graph, communities, **kwargs) Hub dominance.
internal_edge_density(graph, community, summary) The internal density of the community set.
normalized_cut(graph, community, summary) Normalized variant of the Cut-Ratio
max_odf(graph, community, summary) Maximum fraction of edges of a node of a community that point outside the community itself.
avg_odf(graph, community, summary) Average fraction of edges of a node of a community that point outside the community itself.
flake_odf(graph, community, summary) Fraction of nodes in S that have fewer edges pointing inside than to the outside of the community.
scaled_density(graph, communities, **kwargs) Scaled density.
significance(graph, communities, **kwargs) Significance estimates how likely a partition of dense communities appear in a random graph.
size(graph, communities, **kwargs) Size is the number of nodes in the community
surprise(graph, communities, **kwargs) Surprise is statistical approach proposes a quality metric assuming that edges between vertices emerge randomly according to a hyper-geometric distribution.
triangle_participation_ratio(graph, …) Fraction of community nodes that belong to a triad.
purity(communities) Purity is the product of the frequencies of the most frequent labels carried by the nodes within the communities

Among the fitness function a well-defined family of measures is the Modularity-based one:

erdos_renyi_modularity(graph, communities, …) Erdos-Renyi modularity is a variation of the Newman-Girvan one.
link_modularity(graph, communities, **kwargs) Quality function designed for directed graphs with overlapping communities.
modularity_density(graph, communities, lmbd, …) The modularity density is one of several propositions that envisioned to palliate the resolution limit issue of modularity based measures.
modularity_overlap(graph, communities, weight) Determines the Overlapping Modularity of a partition C on a graph G.
newman_girvan_modularity(graph, communities, …) Difference the fraction of intra community edges of a partition with the expected number of such edges if distributed according to a null model.
z_modularity(graph, communities, **kwargs) Z-modularity is another variant of the standard modularity proposed to avoid the resolution limit.

Some measures will return an instance of FitnessResult that takes together min/max/mean/std values of the computed index.

FitnessResult(min, max, score, std)

External Evaluation: Partition Comparisons

It is often useful to compare different graph partition to assess their resemblance. cdlib implements the following partition comparisons scores:

adjusted_mutual_information(first_partition, …) Adjusted Mutual Information between two clusterings.
adjusted_rand_index(first_partition, …) Rand index adjusted for chance.
f1(first_partition, second_partition) Compute the average F1 score of the optimal algorithms matches among the partitions in input.
nf1(first_partition, second_partition) Compute the Normalized F1 score of the optimal algorithms matches among the partitions in input.
normalized_mutual_information(…) Normalized Mutual Information between two clusterings.
omega(first_partition, second_partition) Index of resemblance for overlapping, complete coverage, network clusterings.
overlapping_normalized_mutual_information_LFK(…) Overlapping Normalized Mutual Information between two clusterings.
overlapping_normalized_mutual_information_MGH(…) Overlapping Normalized Mutual Information between two clusterings.
variation_of_information(first_partition, …) Variation of Information among two nodes partitions.

Some measures will return an instance of MatchingResult that takes together mean and standard deviation values of the computed index.

MatchingResult(score, std)

Synthetic Benchmarks

External evaluation scores can be fruitfully used to compare alternative clusterings of the same network, but also to asses to what extent an identified node clustering matches a known ground truth partition.

To facilitate such standard evaluation task, cdlib exposes a set of standard synthetic network generators providing topological community ground truth annotations.

In particular, cdlib make available benchmarks for:

  • static community discovery;
  • dynamic community discovery;
  • feature-rich (i.e., node-attributed) community discovery.

All details can be found in the dedicated page.

Networks With Annotated Communities

Although evaluating a topological partition against an annotated “semantic” one is not among the safest path to follow [Peel17], cdlib natively integrates well-known medium-size network datasets with ground-truth communities.

Due to the non-negligible sizes of such datasets, we designed a simple API to gather them from a dedicated remote repository transparently.

All details on remote datasets can be found on the dedicated page.

Ranking Algorithms

Once a set of alternative clusterings have been extracted from a given network, is there a way to select the best one given a set of target fitness functions?

cdlib exposes a few standard techniques to address such an issue: all details can be found in the dedicated documentation page.

[Peel17]Peel, Leto, Daniel B. Larremore, and Aaron Clauset. “The ground truth about metadata and community detection in networks.” Science advances 3.5 (2017): e1602548.