Node Clustering¶
Overview¶

class NodeClustering(communities, graph, method_name, method_parameters=None, overlap=False)¶
Node Communities representation.
Parameters:  communities – list of communities
 graph – a networkx/igraph object
 method_name – community discovery algorithm name
 method_parameters – configuration for the community discovery algorithm used
 overlap – boolean, whether the partition is overlapping or not

adjusted_mutual_information
(clustering)¶ Adjusted Mutual Information between two clusterings.
Adjusted Mutual Information (AMI) is an adjustment of the Mutual Information (MI) score to account for chance. It accounts for the fact that the MI is generally higher for two clusterings with a larger number of clusters, regardless of whether there is actually more information shared. For two clusterings \(U\) and \(V\), the AMI is given as:
AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [max(H(U), H(V)) - E(MI(U, V))]
This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won’t change the score value in any way.
This metric is furthermore symmetric: switching label_true with label_pred will return the same score value. This can be useful to measure the agreement of two independent label assignment strategies on the same dataset when the real ground truth is not known. Be mindful that this function is an order of magnitude slower than other metrics, such as the Adjusted Rand Index.
Parameters: clustering – NodeClustering object Returns: AMI score Example: >>> from cdlib.algorithms import louvain, leiden >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> leiden_communities = leiden(g) >>> score = communities.adjusted_mutual_information(leiden_communities)
Reference:  Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct), 2837-2854.

adjusted_rand_index
(clustering)¶ Rand index adjusted for chance.
The Rand Index computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted and true clusterings.
The raw RI score is then “adjusted for chance” into the ARI score using the following scheme:
ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)
The adjusted Rand index is thus ensured to have a value close to 0.0 for random labeling independently of the number of clusters and samples and exactly 1.0 when the clusterings are identical (up to a permutation).
ARI is a symmetric measure:
adjusted_rand_index(a, b) == adjusted_rand_index(b, a)
Parameters: clustering – NodeClustering object Returns: ARI score Example: >>> from cdlib.algorithms import louvain, leiden >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> leiden_communities = leiden(g) >>> score = communities.adjusted_rand_index(leiden_communities)
Reference:  Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193-218.

average_internal_degree
(**kwargs)¶ The average internal degree of the community set.
\[f(S) = \frac{2m_S}{n_S}\]where \(m_S\) is the number of community internal edges and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.average_internal_degree()
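The formula above is simple enough to sketch directly. The following is an illustrative pure-Python reimplementation (not cdlib's code); the helper name, edge list and community are made-up examples:

```python
# Sketch of the average internal degree f(S) = 2*m_S / n_S for a single
# community S, computed from a plain edge list. Hypothetical helper,
# not part of cdlib.

def average_internal_degree(edges, S):
    S = set(S)
    m_S = sum(1 for u, v in edges if u in S and v in S)  # internal edges
    return 2 * m_S / len(S)

# A triangle {1, 2, 3} with one extra edge leaving it.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(average_internal_degree(edges, {1, 2, 3}))  # 2*3/3 = 2.0
```

Note that the boundary edge (3, 4) does not contribute: only edges with both endpoints inside \(S\) count toward \(m_S\).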

avg_odf
(**kwargs)¶ Average fraction of edges of a node of a community that point outside the community itself.
\[\frac{1}{n_S} \sum_{u \in S} \frac{|\{(u,v)\in E: v \not\in S\}|}{d(u)}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\), \(d(u)\) is the degree of \(u\) and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.avg_odf()

conductance
(**kwargs)¶ Fraction of total edge volume that points outside the community.
\[f(S) = \frac{c_S}{2 m_S+c_S}\]where \(c_S\) is the number of edges on the community boundary and \(m_S\) is the number of community internal edges.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.conductance()
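To make the roles of \(c_S\) (boundary edges) and \(m_S\) (internal edges) concrete, here is a minimal pure-Python sketch of the conductance formula; the function and the tiny graph are illustrative, not cdlib internals:

```python
# Sketch of conductance f(S) = c_S / (2*m_S + c_S), where m_S counts edges
# with both endpoints in S and c_S counts edges with exactly one endpoint
# in S. Hypothetical helper, not part of cdlib.

def conductance(edges, S):
    S = set(S)
    m_S = sum(1 for u, v in edges if u in S and v in S)      # internal edges
    c_S = sum(1 for u, v in edges if (u in S) != (v in S))   # boundary edges
    return c_S / (2 * m_S + c_S)

# Two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(conductance(edges, {1, 2, 3}))  # 1 / (2*3 + 1) = 1/7
```

The well-separated triangle scores a low conductance, as expected: only one of its seven incident edge endpoints-worth of volume leaves the community.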

cut_ratio
(**kwargs)¶ Fraction of existing edges (out of all possible edges) leaving the community.
\[f(S) = \frac{c_S}{n_S (n - n_S)}\]
where \(c_S\) is the number of edges on the community boundary, \(n_S\) is the number of community nodes and \(n\) is the number of graph nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.cut_ratio()

edges_inside
(**kwargs)¶ Number of edges internal to the community.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.edges_inside()

erdos_renyi_modularity
()¶ Erdos-Renyi modularity is a variation of the Newman-Girvan one. It assumes that vertices in a network are connected randomly with a constant probability \(p\).
\[Q(S) = \frac{1}{m}\sum_{c \in S} (m_c - \frac{m n_c(n_c - 1)}{n(n-1)})\]where \(m\) is the number of graph edges, \(m_c\) is the number of internal edges of community \(c\), \(n_c\) is the number of nodes of community \(c\) and \(n\) is the number of graph nodes.
Returns: the Erdos-Renyi modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.erdos_renyi_modularity()
References: Erdos, P., & Renyi, A. (1959). On random graphs I. Publ. Math. Debrecen, 6, 290-297.

expansion
(**kwargs)¶ Number of edges per community node that point outside the cluster.
\[f(S) = \frac{c_S}{n_S}\]where \(c_S\) is the number of edges on the community boundary and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.expansion()

f1
(clustering)¶ Compute the average F1 score of the optimal community matches among the partitions in input. Works on overlapping/non-overlapping, complete/partial coverage partitions.
Parameters: clustering – NodeClustering object Returns: F1 score (harmonic mean of precision and recall) Example: >>> from cdlib.algorithms import louvain, leiden >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> leiden_communities = leiden(g) >>> score = communities.f1(leiden_communities)
Reference:  Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate community detection internal on ground truth. In Complex Networks VII (pp. 133-144). Springer, Cham.

flake_odf
(**kwargs)¶ Fraction of nodes in S that have fewer edges pointing inside than to the outside of the community.
\[f(S) = \frac{|\{ u: u \in S, |\{(u,v) \in E: v \in S\}| < d(u)/2 \}|}{n_S}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\), \(d(u)\) is the degree of \(u\) and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.flake_odf()

fraction_over_median_degree
(**kwargs)¶ Fraction of community nodes having internal degree higher than the median degree value.
\[f(S) = \frac{|\{u: u \in S, |\{(u,v): v \in S\}| > d_m\}| }{n_S}\]where \(d_m\) is the median internal degree of the community and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.fraction_over_median_degree()

get_description
(parameters_to_display=None, precision=3)¶ Return a description of the clustering, with the name of the method and its numeric parameters.
Parameters:  parameters_to_display – parameters to display. By default, all float parameters.
 precision – precision used to plot parameters. default: 3
Returns: a string description of the method.

internal_edge_density
(**kwargs)¶ The internal density of the community set.
\[f(S) = \frac{m_S}{n_S(n_S - 1)/2}\]where \(m_S\) is the number of community internal edges and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.internal_edge_density()

link_modularity
()¶ Quality function designed for directed graphs with overlapping communities.
Returns: the link modularity score Example: >>> from cdlib import evaluation >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.link_modularity()

max_odf
(**kwargs)¶ Maximum fraction of edges of a node of a community that point outside the community itself.
\[max_{u \in S} \frac{|\{(u,v)\in E: v \not\in S\}|}{d(u)}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\) and \(d(u)\) is the degree of \(u\).
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.max_odf()

modularity_density
()¶ The modularity density is one of several propositions envisioned to palliate the resolution limit issue of modularity-based measures. The idea of this metric is to include information about community size into the expected density of a community, so as not to neglect small and dense communities. For each community \(C\) in partition \(S\), it uses the average modularity degree, calculated as \(d(C) = d^{int}(C) - d^{ext}(C)\) where \(d^{int}(C)\) and \(d^{ext}(C)\) are the average internal and external degrees of \(C\) respectively, to evaluate the fitness of \(C\) in its network. Finally, the modularity density can be calculated as follows:
\[Q(S) = \sum_{C \in S} \frac{1}{n_C} ( \sum_{i \in C} k^{int}_{iC} - \sum_{i \in C} k^{out}_{iC})\]where \(n_C\) is the number of nodes in \(C\), \(k^{int}_{iC}\) is the degree of node \(i\) within \(C\) and \(k^{out}_{iC}\) is the degree of node \(i\) outside \(C\).
Returns: the modularity density score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.modularity_density()
References: Li, Z., Zhang, S., Wang, R. S., Zhang, X. S., & Chen, L. (2008). Quantitative function for community detection. Physical Review E, 77(3), 036109.

newman_girvan_modularity
()¶ Difference between the fraction of intra-community edges of a partition and the expected number of such edges if distributed according to a null model.
In the standard version of modularity, the null model preserves the expected degree sequence of the graph under consideration. In other words, the modularity compares the real network structure with a corresponding one where nodes are connected without any preference about their neighbors.
\[Q(S) = \frac{1}{m}\sum_{c \in S}(m_c - \frac{(2 m_c + l_c)^2}{4m})\]where \(m\) is the number of graph edges, \(m_c\) is the number of community internal edges and \(l_c\) is the number of edges from nodes in \(c\) to nodes outside \(c\).
Returns: the NewmanGirvan modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.newman_girvan_modularity()
References: Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69, 026113.
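The formula above can be evaluated by hand on a toy graph. The following is an illustrative pure-Python sketch (not cdlib's implementation); the helper, edge list and partition are made-up examples:

```python
# Sketch of Newman-Girvan modularity,
#   Q(S) = (1/m) * sum_c ( m_c - (2*m_c + l_c)^2 / (4*m) ),
# where m_c counts edges internal to community c and l_c counts edges
# crossing its boundary. Hypothetical helper, not part of cdlib.

def newman_girvan_modularity(edges, partition):
    m = len(edges)
    q = 0.0
    for community in partition:
        c = set(community)
        m_c = sum(1 for u, v in edges if u in c and v in c)      # internal edges
        l_c = sum(1 for u, v in edges if (u in c) != (v in c))   # boundary edges
        q += m_c - (2 * m_c + l_c) ** 2 / (4 * m)
    return q / m

# Two triangles joined by a single bridge edge (3, 4); m = 7.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(newman_girvan_modularity(edges, [{1, 2, 3}, {4, 5, 6}]))  # 2.5/7 ≈ 0.357
```

Each triangle contributes \(3 - 7^2/28 = 1.25\), giving \(Q = 2.5/7\), a clearly positive score for this obviously modular toy graph.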

nf1
(clustering)¶ Compute the Normalized F1 score of the optimal community matches among the partitions in input. Works on overlapping/non-overlapping, complete/partial coverage partitions.
Parameters: clustering – NodeClustering object Returns: MatchingResult instance Example: >>> from cdlib.algorithms import louvain, leiden >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> leiden_communities = leiden(g) >>> score = communities.nf1(leiden_communities)
Reference:  Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate community detection internal on ground truth.
 Rossetti, G. (2017). RDyn: graph benchmark handling community dynamics. Journal of Complex Networks, 5(6), 893-912.

normalized_cut
(**kwargs)¶ Normalized variant of the Cut-Ratio.
\[f(S) = \frac{c_S}{2m_S+c_S} + \frac{c_S}{2(m - m_S)+c_S}\]where \(m\) is the number of graph edges, \(m_S\) is the number of community internal edges and \(c_S\) is the number of edges on the community boundary.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.normalized_cut()

normalized_mutual_information
(clustering)¶ Normalized Mutual Information between two clusterings.
Normalized Mutual Information (NMI) is a normalization of the Mutual Information (MI) score that scales the result between 0 (no mutual information) and 1 (perfect correlation). In this function, mutual information is normalized by
sqrt(H(labels_true) * H(labels_pred))
Parameters: clustering – NodeClustering object Returns: normalized mutual information score Example: >>> from cdlib.algorithms import louvain, leiden >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> leiden_communities = leiden(g) >>> score = communities.normalized_mutual_information(leiden_communities)

omega
(clustering)¶ Index of resemblance for overlapping, complete coverage, network clusterings.
Parameters: clustering – NodeClustering object Returns: omega index Example: >>> from cdlib.algorithms import louvain, leiden >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> leiden_communities = leiden(g) >>> score = communities.omega(leiden_communities)
Reference:  Murray, G., Carenini, G., & Ng, R. (2012). Using the omega index for evaluating abstractive community detection. In Proceedings of the Workshop on Evaluation Metrics and System Comparison for Automatic Summarization (pp. 10-18). Association for Computational Linguistics, Stroudsburg, PA, USA.

overlapping_normalized_mutual_information_LFK
(clustering)¶ Overlapping Normalized Mutual Information between two clusterings.
Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions. This is the version proposed by Lancichinetti et al.
Parameters: clustering – NodeClustering object Returns: onmi score Example: >>> from cdlib.algorithms import louvain, leiden >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> leiden_communities = leiden(g) >>> score = communities.overlapping_normalized_mutual_information_LFK(leiden_communities)
Reference:  Lancichinetti, A., Fortunato, S., & Kertesz, J. (2009). Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3), 033015.

overlapping_normalized_mutual_information_MGH
(clustering, normalization='max')¶ Overlapping Normalized Mutual Information between two clusterings.
Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions. This is the version proposed by McDaid et al. using a different normalization than the original LFR one. See ref. for more details.
Parameters:  clustering – NodeClustering object
 normalization – one of “max” or “LFK”. Default “max” (corresponds to the main method described in the article)
Returns: onmi score
Example: >>> from cdlib import evaluation, algorithms >>> g = nx.karate_club_graph() >>> louvain_communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> evaluation.overlapping_normalized_mutual_information_MGH(louvain_communities, leiden_communities)
Reference:  McDaid, A. F., Greene, D., & Hurley, N. (2011). Normalized mutual information to evaluate overlapping community finding algorithms. arXiv preprint arXiv:1110.2515.

significance
()¶ Significance estimates how likely it is that a partition of dense communities appears in a random graph.
Returns: the significance score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.significance()
References: Traag, V. A., Aldecoa, R., & Delvenne, J. C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816.

size
(**kwargs)¶ Size is the number of nodes in the community.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example:
>>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.size()

surprise
()¶ Surprise is a statistical approach that proposes a quality metric under the assumption that edges between vertices emerge randomly according to a hypergeometric distribution.
According to the Surprise metric, the higher the score of a partition, the less likely it is the result of a random realization, and the better the quality of the community structure.
Returns: the surprise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.surprise()
References: Traag, V. A., Aldecoa, R., & Delvenne, J. C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816.

to_json
()¶ Generate a JSON representation of the clustering object
Returns: a JSON formatted string representing the object

to_node_community_map
()¶ Generate a <node, list(communities)> representation of the current clustering
Returns: dict of the form <node, list(communities)>
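The mapping returned here can be sketched in a few lines of pure Python. This is an illustrative reimplementation of the idea (the helper name and input are made-up, not cdlib internals); the value is a list precisely so that overlapping partitions, where a node belongs to several communities, are representable:

```python
# Sketch of the <node, list(communities)> mapping: each node maps to the
# indices of the communities that contain it. Hypothetical helper, not
# part of cdlib.

def node_community_map(communities):
    mapping = {}
    for cid, community in enumerate(communities):
        for node in community:
            mapping.setdefault(node, []).append(cid)
    return mapping

# Node 3 belongs to both communities, so its list has two entries.
print(node_community_map([[1, 2, 3], [3, 4]]))
```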

triangle_participation_ratio
(**kwargs)¶ Fraction of community nodes that belong to a triad.
\[f(S) = \frac{ |\{ u: u \in S, \{(v,w): v, w \in S, (u,v) \in E, (u,w) \in E, (v,w) \in E \} \not = \emptyset \}| }{n_S}\]where \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise scores Returns: a FitnessResult object/a list of community-wise scores Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.triangle_participation_ratio()
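The set-builder notation above reads more easily as code. Below is an illustrative pure-Python sketch (not cdlib's implementation) that counts, for each community node, whether it closes a triangle with two other community nodes; the helper and the toy graph are made-up examples:

```python
# Sketch of the triangle participation ratio: the fraction of nodes of S
# that sit in at least one triangle whose three vertices all lie in S.
# Hypothetical helper, not part of cdlib.
from itertools import combinations

def triangle_participation_ratio(edges, S):
    S = set(S)
    adj = {u: set() for u in S}
    for u, v in edges:
        if u in S and v in S:          # keep only community-internal edges
            adj[u].add(v)
            adj[v].add(u)
    in_triad = set()
    for u, v, w in combinations(sorted(S), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:  # all three pairs linked
            in_triad.update((u, v, w))
    return len(in_triad) / len(S)

# Triangle {1, 2, 3} plus a pendant node 4: three of the four nodes are in a triad.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(triangle_participation_ratio(edges, {1, 2, 3, 4}))  # 3/4 = 0.75
```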

variation_of_information
(clustering)¶ Variation of Information among two node partitions.
\[VI(p, q) = H(p) + H(q) - 2 MI(p, q)\]
where \(MI\) is the mutual information, \(H\) the partition entropy and \(p\), \(q\) are the two community sets.
Parameters: clustering – NodeClustering object Returns: VI score Example: >>> from cdlib.algorithms import louvain, leiden >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> leiden_communities = leiden(g) >>> score = communities.variation_of_information(leiden_communities)
Reference:  Meila, M. (2007). Comparing clusterings - an information based distance. Journal of Multivariate Analysis, 98, 873-895. doi:10.1016/j.jmva.2006.11.013

z_modularity
()¶ Z-modularity is another variant of the standard modularity, proposed to avoid the resolution limit. This version is based on the observation that the difference between the fraction of edges inside communities and the expected number of such edges in a null model should not be considered as the only contribution to the final quality of the community structure.
Returns: the z-modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.z_modularity()
References: Miyauchi, A., & Kawase, Y. (2016). Z-score-based modularity for community detection in networks. PloS One, 11(1), e0147805.
Methods¶
Data transformation and IO¶
NodeClustering.to_json () 
Generate a JSON representation of the clustering object 
NodeClustering.to_node_community_map () 
Generate a <node, list(communities)> representation of the current clustering 
Evaluating Node Clustering¶
NodeClustering.link_modularity () 
Quality function designed for directed graphs with overlapping communities. 
NodeClustering.normalized_cut (**kwargs) 
Normalized variant of the Cut-Ratio 
NodeClustering.internal_edge_density (**kwargs) 
The internal density of the community set. 
NodeClustering.average_internal_degree (**kwargs) 
The average internal degree of the community set. 
NodeClustering.fraction_over_median_degree (…) 
Fraction of community nodes having internal degree higher than the median degree value. 
NodeClustering.expansion (**kwargs) 
Number of edges per community node that point outside the cluster. 
NodeClustering.cut_ratio (**kwargs) 
Fraction of existing edges (out of all possible edges) leaving the community. 
NodeClustering.edges_inside (**kwargs) 
Number of edges internal to the community. 
NodeClustering.conductance (**kwargs) 
Fraction of total edge volume that points outside the community. 
NodeClustering.max_odf (**kwargs) 
Maximum fraction of edges of a node of a community that point outside the community itself. 
NodeClustering.avg_odf (**kwargs) 
Average fraction of edges of a node of a community that point outside the community itself. 
NodeClustering.flake_odf (**kwargs) 
Fraction of nodes in S that have fewer edges pointing inside than to the outside of the community. 
NodeClustering.triangle_participation_ratio (…) 
Fraction of community nodes that belong to a triad. 
NodeClustering.newman_girvan_modularity () 
Difference between the fraction of intra-community edges of a partition and the expected number of such edges under a null model. 
NodeClustering.erdos_renyi_modularity () 
Erdos-Renyi modularity is a variation of the Newman-Girvan one. 
NodeClustering.modularity_density () 
The modularity density is one of several propositions envisioned to palliate the resolution limit issue of modularity-based measures. 
NodeClustering.z_modularity () 
Z-modularity is another variant of the standard modularity proposed to avoid the resolution limit. 
NodeClustering.surprise () 
Surprise is a statistical approach that proposes a quality metric assuming that edges between vertices emerge randomly according to a hypergeometric distribution. 
NodeClustering.significance () 
Significance estimates how likely it is that a partition of dense communities appears in a random graph. 
Comparing Node Clusterings¶
NodeClustering.normalized_mutual_information (…) 
Normalized Mutual Information between two clusterings. 
NodeClustering.overlapping_normalized_mutual_information_MGH (…) 
Overlapping Normalized Mutual Information between two clusterings. 
NodeClustering.overlapping_normalized_mutual_information_LFK (…) 
Overlapping Normalized Mutual Information between two clusterings. 
NodeClustering.omega (clustering) 
Index of resemblance for overlapping, complete coverage, network clusterings. 
NodeClustering.f1 (clustering) 
Compute the average F1 score of the optimal community matches among the partitions in input. 
NodeClustering.nf1 (clustering) 
Compute the Normalized F1 score of the optimal community matches among the partitions in input. 
NodeClustering.adjusted_rand_index (clustering) 
Rand index adjusted for chance. 
NodeClustering.adjusted_mutual_information (…) 
Adjusted Mutual Information between two clusterings. 
NodeClustering.variation_of_information (…) 
Variation of Information among two nodes partitions. 