CDlib - Community Discovery Library¶
CDlib is a Python software package that allows users to extract, compare, and evaluate communities from complex networks.
The library provides a standardized input/output for several existing Community Discovery algorithms. The implementations of all CD algorithms are inherited from existing projects, each acknowledged on the dedicated method reference page.
Date | Python Versions | Main Author | GitHub | PyPI |
2020-09-22 | 3.7-3.8 | Giulio Rossetti | Source | Distribution |
CDlib Dev Team¶
Name | Contribution |
Giulio Rossetti | Library Design/Documentation |
Letizia Milli | Community Models Integration |
Rémy Cazabet | Visualization |
Salvatore Citraro | Community Models Integration |
Overview¶
CDlib is a Python software package for the extraction, comparison, and evaluation of communities from complex networks.
Who uses CDlib?¶
The potential audience for CDlib includes mathematicians, physicists, biologists, computer scientists, and social scientists.
Goals¶
CDlib is built upon the NetworkX Python library and is intended to provide:
- a standard programming interface and community discovery implementations that are suitable for many applications,
- a rapid development environment for collaborative, multidisciplinary projects.
The Python CDlib library¶
CDlib is a powerful Python package that allows simple and flexible partitioning of complex networks.
Most importantly, CDlib, like the Python programming language itself, is free, well-supported, and a joy to use.
Free software¶
CDlib is free software; you can redistribute it and/or modify it under the terms of the BSD License.
We welcome contributions from the community.
Download¶
Software¶
Source and binary releases: https://pypi.python.org/pypi/cdlib
Github (latest development): https://github.com/GiulioRossetti/cdlib
Documentation¶
Installing CDlib¶
Before installing CDlib, you need to have setuptools installed.
Quick install¶
Get CDlib from the Python Package Index at PyPI, or install it with
pip install cdlib
and an attempt will be made to find and install an appropriate version that matches your operating system and Python version.
You can install the development version with
pip install git+https://github.com/GiulioRossetti/cdlib.git
Optional Dependencies¶
CDlib relies on a few packages that call C code (namely: python-igraph, leidenalg, angel_cd and infomap).
The default installation does not set up these requirements, since their configuration under non-Unix-like systems is not trivial and cannot be easily automated.
This choice has been made to allow (even) Windows users to install the library and get access to its core functionalities.
To make (most of) the optional packages available you can either:
- (Windows) manually install the optional packages (version details are specified in requirements_optional.txt) following the original projects' guidelines, or
- (Linux/OSX) run the command:
pip install cdlib[C]
This command will install everything that can be easily automated under Linux/OSX.
(Advanced) Graph-tool¶
The only optional dependency that will remain unsatisfied following the previous procedures will be graph-tool (used to add SBM models). If you need it up and running, refer to the official documentation.
Installing from source¶
You can install from source by downloading a source archive file (tar.gz or zip) or by checking out the source files from the GitHub source code repository.
CDlib is a pure Python package; you don’t need a compiler to build or install it.
Source archive file¶
Download the source (tar.gz or zip file) from PyPI or get the latest development version from GitHub.
Unpack and change directory to the source directory (it should have the files README.txt and setup.py).
Run python setup.py install to build and install
GitHub¶
Clone the CDlib repository (see GitHub for options)
git clone https://github.com/GiulioRossetti/cdlib.git
Change directory to CDlib
Run python setup.py install to build and install
If you don’t have permission to install software on your system, you can install into another directory using the --user, --prefix, or --home flags to setup.py.
For example
python setup.py install --prefix=/home/username/python
or
python setup.py install --home=~
or
python setup.py install --user
If you didn’t install in the standard Python site-packages directory you will need to set your PYTHONPATH variable to the alternate location. See http://docs.python.org/2/install/index.html#search-path for further details.
Requirements¶
Python¶
To use CDlib you need Python 3.6 or later.
The easiest way to get Python and most optional packages is to install the Enthought “Canopy” Python distribution or Anaconda.
There are several other distributions that contain the key packages you need for scientific computing.
Tutorial¶
CDlib is built upon networkx and is designed to extract, compare and evaluate network partitions.
You can find a few basic examples to get started with cdlib in this notebook.
Reference¶
CDlib is composed of several modules, each fulfilling a different task related to community detection.
Community Objects¶
cdlib provides data structures and methods for storing communities.
The choice of community class depends on the structure of the community generated by the selected algorithm.
Which community should I use?¶
Community Type | cdlib class |
---|---|
Node Partition | NodeClustering, FuzzyNodeClustering, AttrNodeClustering, BiNodeClustering |
Edge Partition | EdgeClustering |
Community Types¶
Node Clustering¶
-
class
NodeClustering
(communities, graph, method_name, method_parameters=None, overlap=False)¶ Node Communities representation.
Parameters: - communities – list of communities
- graph – a networkx/igraph object
- method_name – community discovery algorithm name
- method_parameters – configuration for the community discovery algorithm used
- overlap – boolean, whether the partition is overlapping or not
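To make the constructor parameters above concrete, here is a minimal plain-Python stand-in (the class and its toy usage are illustrative only; the real implementation ships with cdlib):

```python
# Minimal stand-in mirroring the NodeClustering constructor signature;
# illustrative only -- the real class lives in cdlib.
class ToyNodeClustering:
    def __init__(self, communities, graph, method_name,
                 method_parameters=None, overlap=False):
        self.communities = communities            # list of communities (lists of nodes)
        self.graph = graph                        # networkx/igraph object in cdlib
        self.method_name = method_name            # community discovery algorithm name
        self.method_parameters = method_parameters or {}
        self.overlap = overlap                    # True for overlapping partitions

clu = ToyNodeClustering([[0, 1, 2], [3, 4]], graph=None,
                        method_name="louvain",
                        method_parameters={"resolution": 1.0})
print(clu.method_name, len(clu.communities))  # louvain 2
```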
-
adjusted_mutual_information
(clustering)¶ Adjusted Mutual Information between two clusterings.
Adjusted Mutual Information (AMI) is an adjustment of the Mutual Information (MI) score to account for chance. It accounts for the fact that the MI is generally higher for two clusterings with a larger number of clusters, regardless of whether there is actually more information shared. For two clusterings \(U\) and \(V\), the AMI is given as:
\[AMI(U, V) = \frac{MI(U, V) - E[MI(U, V)]}{\max(H(U), H(V)) - E[MI(U, V)]}\]
This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won’t change the score value in any way.
This metric is furthermore symmetric: switching label_true with label_pred will return the same score value. This can be useful to measure the agreement of two independent label assignment strategies on the same dataset when the real ground truth is not known.
Be mindful that this function is an order of magnitude slower than other metrics, such as the Adjusted Rand Index.
Parameters: clustering – NodeClustering object Returns: AMI score Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> score = communities.adjusted_mutual_information(leiden_communities)
Reference: - Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct), 2837-2854.
-
adjusted_rand_index
(clustering)¶ Rand index adjusted for chance.
The Rand Index computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted and true clusterings.
The raw RI score is then “adjusted for chance” into the ARI score using the following scheme:
ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)
The adjusted Rand index is thus ensured to have a value close to 0.0 for random labeling independently of the number of clusters and samples and exactly 1.0 when the clusterings are identical (up to a permutation).
ARI is a symmetric measure:
adjusted_rand_index(a, b) == adjusted_rand_index(b, a)
Parameters: clustering – NodeClustering object Returns: ARI score Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> score = communities.adjusted_rand_index(leiden_communities)
Reference: - Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of classification, 2(1), 193-218.
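The adjustment scheme above can be sketched in plain Python via pair counting over the contingency table (a hedged sketch, independent of cdlib; the helper name is hypothetical):

```python
from collections import Counter
from math import comb  # Python 3.8+

def adjusted_rand_index(labels_a, labels_b):
    """ARI = (RI - E[RI]) / (max(RI) - E[RI]), computed via pair counts."""
    n = len(labels_a)
    ab = Counter(zip(labels_a, labels_b))   # contingency table cells
    a, b = Counter(labels_a), Counter(labels_b)
    sum_comb = sum(comb(c, 2) for c in ab.values())   # agreeing pairs
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_comb - expected) / (max_index - expected)

# Identical partitions up to a label permutation score exactly 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```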
-
average_internal_degree
(**kwargs)¶ The average internal degree of the community set.
\[f(S) = \frac{2m_S}{n_S}\]where \(m_S\) is the number of community internal edges and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.average_internal_degree()
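As a sanity check, \(f(S) = 2m_S/n_S\) can be computed directly on a toy edge list without cdlib (the helper function and edge list are illustrative):

```python
# Average internal degree of community S: twice the internal edge count
# divided by the number of community nodes.
def average_internal_degree(edges, community):
    S = set(community)
    m_S = sum(1 for u, v in edges if u in S and v in S)  # internal edges
    return 2 * m_S / len(S)

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]  # a triangle plus a pendant edge
print(average_internal_degree(edges, [0, 1, 2]))  # 2.0
```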
-
avg_odf
(**kwargs)¶ Average fraction of edges of a node of a community that point outside the community itself.
\[\frac{1}{n_S} \sum_{u \in S} \frac{|\{(u,v)\in E: v \not\in S\}|}{d(u)}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\), \(d(u)\) is the degree of \(u\) and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.avg_odf()
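The per-node averaging above can be traced on a toy edge list in plain Python (a sketch independent of cdlib; the helper name is hypothetical):

```python
# avg_odf: for each node u in S, the fraction of u's edges leaving S,
# averaged over all nodes of S.
def avg_odf(edges, community):
    S = set(community)
    total = 0.0
    for u in S:
        incident = [(a, b) for a, b in edges if u in (a, b)]
        out = sum(1 for a, b in incident if (a if b == u else b) not in S)
        total += out / len(incident)
    return total / len(S)

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
# Only node 2 has an outgoing edge (1 of its 3 edges): (0 + 0 + 1/3) / 3
print(avg_odf(edges, [0, 1, 2]))
```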
-
conductance
(**kwargs)¶ Fraction of total edge volume that points outside the community.
\[f(S) = \frac{c_S}{2 m_S+c_S}\]where \(c_S\) is the number of edges on the community boundary and \(m_S\) is the number of community internal edges.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.conductance()
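The ratio \(c_S / (2 m_S + c_S)\) can likewise be checked on a toy edge list (a plain-Python sketch, not cdlib's implementation):

```python
# Conductance of community S: boundary edges over total edge volume of S.
def conductance(edges, community):
    S = set(community)
    m_S = sum(1 for u, v in edges if u in S and v in S)      # internal edges
    c_S = sum(1 for u, v in edges if (u in S) != (v in S))   # boundary edges
    return c_S / (2 * m_S + c_S)

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]  # triangle {0,1,2} plus edge (2,3)
print(conductance(edges, [0, 1, 2]))  # 1 / (2*3 + 1) ≈ 0.143
```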
-
cut_ratio
(**kwargs)¶ Fraction of existing edges (out of all possible edges) leaving the community.
\[f(S) = \frac{c_S}{n_S (n - n_S)}\]
where \(c_S\) is the number of edges on the community boundary, \(n_S\) is the number of community nodes and \(n\) is the number of graph nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.cut_ratio()
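The normalization by all possible boundary edges, \(n_S (n - n_S)\), can be illustrated the same way (plain-Python sketch; helper name hypothetical):

```python
# Cut ratio: boundary edges of S over all node pairs between S and the rest.
def cut_ratio(edges, community, n):
    S = set(community)
    c_S = sum(1 for u, v in edges if (u in S) != (v in S))  # boundary edges
    n_S = len(S)
    return c_S / (n_S * (n - n_S))

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(cut_ratio(edges, [0, 1, 2], n=4))  # 1 / (3 * 1) ≈ 0.333
```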
-
edges_inside
(**kwargs)¶ Number of edges internal to the community.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.edges_inside()
-
erdos_renyi_modularity
()¶ Erdos-Renyi modularity is a variation of the Newman-Girvan one. It assumes that vertices in a network are connected randomly with a constant probability \(p\).
\[Q(S) = \frac{1}{m}\sum_{c \in S} (m_S - \frac{m n_S(n_S - 1)}{n(n-1)})\]where \(m\) is the number of graph edges, \(m_S\) is the number of community edges, \(n_S\) is the number of community nodes and \(n\) is the number of graph nodes.
Returns: the Erdos-Renyi modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.erdos_renyi_modularity()
References: Erdos, P., & Renyi, A. (1959). On random graphs I. Publ. Math. Debrecen, 6, 290-297.
-
expansion
(**kwargs)¶ Number of edges per community node that point outside the cluster.
\[f(S) = \frac{c_S}{n_S}\]where \(c_S\) is the number of edges on the community boundary and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.expansion()
-
f1
(clustering)¶ Compute the average F1 score of the optimal community matches among the input partitions. Works on overlapping/non-overlapping complete/partial coverage partitions.
Parameters: clustering – NodeClustering object Returns: F1 score (harmonic mean of precision and recall) Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> score = communities.f1(leiden_communities)
Reference: - Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate community detection internal on ground truth. In Complex Networks VII (pp. 133-144). Springer, Cham.
-
flake_odf
(**kwargs)¶ Fraction of nodes in S that have fewer edges pointing inside than to the outside of the community.
\[f(S) = \frac{| \{ u:u \in S,| \{(u,v) \in E: v \in S \}| < d(u)/2 \}|}{n_S}\]where \(E\) is the graph edge set, \(d(u)\) is the degree of \(u\) and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.flake_odf()
-
fraction_over_median_degree
(**kwargs)¶ Fraction of community nodes having internal degree higher than the median degree value.
\[f(S) = \frac{|\{u: u \in S,| \{(u,v): v \in S\}| > d_m\}| }{n_S}\]where \(d_m\) is the median of the internal degrees and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.fraction_over_median_degree()
-
get_description
(parameters_to_display=None, precision=3)¶ Return a description of the clustering, with the name of the method and its numeric parameters.
Parameters: - parameters_to_display – parameters to display. By default, all float parameters.
- precision – precision used to plot parameters. default: 3
Returns: a string description of the method.
-
internal_edge_density
(**kwargs)¶ The internal density of the community set.
\[f(S) = \frac{m_S}{n_S(n_S-1)/2}\]where \(m_S\) is the number of community internal edges and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.internal_edge_density()
-
link_modularity
()¶ Quality function designed for directed graphs with overlapping communities.
Returns: the link modularity score Example: >>> from cdlib import evaluation >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.link_modularity()
-
max_odf
(**kwargs)¶ Maximum fraction of edges of a node of a community that point outside the community itself.
\[max_{u \in S} \frac{|\{(u,v)\in E: v \not\in S\}|}{d(u)}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\) and \(d(u)\) is the degree of \(u\)
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.max_odf()
-
modularity_density
()¶ Modularity density is one of several measures proposed to mitigate the resolution limit of modularity-based approaches. The idea of this metric is to include information about community size into the expected density of communities, so that small and dense communities are not neglected. For each community \(C\) in partition \(S\), it uses the average modularity degree, \(d(C) = d^{int}(C) - d^{ext}(C)\), where \(d^{int}(C)\) and \(d^{ext}(C)\) are the average internal and external degrees of \(C\) respectively, to evaluate the fitness of \(C\) in its network. Finally, the modularity density can be calculated as follows:
\[Q(S) = \sum_{C \in S} \frac{1}{n_C} ( \sum_{i \in C} k^{int}_{iC} - \sum_{i \in C} k^{out}_{iC})\]where \(n_C\) is the number of nodes in \(C\), \(k^{int}_{iC}\) is the degree of node \(i\) within \(C\) and \(k^{out}_{iC}\) is the degree of node \(i\) outside \(C\).
Returns: the modularity density score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.modularity_density()
References: Li, Z., Zhang, S., Wang, R. S., Zhang, X. S., & Chen, L. (2008). Quantitative function for community detection. Physical Review E, 77(3), 036109.
-
newman_girvan_modularity
()¶ Difference between the fraction of intra-community edges of a partition and the expected number of such edges under a null model.
In the standard version of modularity, the null model preserves the expected degree sequence of the graph under consideration. In other words, modularity compares the real network structure with a corresponding one where nodes are connected without any preference about their neighbors.
\[Q(S) = \frac{1}{m}\sum_{c \in S}(m_S - \frac{(2 m_S + l_S)^2}{4m})\]where \(m\) is the number of graph edges, \(m_S\) is the number of community edges and \(l_S\) is the number of edges from nodes in S to nodes outside S.
Returns: the Newman-Girvan modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.newman_girvan_modularity()
References: Newman, M. E. J. & Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004).
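The formula above can be verified by hand on a tiny graph, here with plain Python and no cdlib (helper name and toy graph are illustrative):

```python
# Newman-Girvan modularity: for each community S, its edge fraction minus the
# expected fraction under the configuration null model.
def newman_girvan_modularity(edges, partition):
    m = len(edges)
    q = 0.0
    for community in partition:
        S = set(community)
        m_S = sum(1 for u, v in edges if u in S and v in S)      # internal edges
        l_S = sum(1 for u, v in edges if (u in S) != (v in S))   # boundary edges
        q += (m_S - (2 * m_S + l_S) ** 2 / (4 * m)) / m
    return q

# Two triangles joined by a single bridge edge: a clear two-community structure.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(newman_girvan_modularity(edges, [[0, 1, 2], [3, 4, 5]]))  # ≈ 0.357
```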
-
nf1
(clustering)¶ Compute the Normalized F1 score of the optimal community matches among the input partitions. Works on overlapping/non-overlapping complete/partial coverage partitions.
Parameters: clustering – NodeClustering object Returns: MatchingResult instance Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> score = communities.nf1(leiden_communities)
Reference: - Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate community detection internal on ground truth.
- Rossetti, G. (2017). RDyn: graph benchmark handling community dynamics. Journal of Complex Networks, 5(6), 893-912.
-
normalized_cut
(**kwargs)¶ Normalized variant of the Cut-Ratio.
\[f(S) = \frac{c_S}{2m_S+c_S} + \frac{c_S}{2(m-m_S)+c_S}\]where \(m\) is the number of graph edges, \(m_S\) is the number of community internal edges and \(c_S\) is the number of edges on the community boundary.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.normalized_cut()
-
normalized_mutual_information
(clustering)¶ Normalized Mutual Information between two clusterings.
Normalized Mutual Information (NMI) is a normalization of the Mutual Information (MI) score that scales the result between 0 (no mutual information) and 1 (perfect correlation). In this function, mutual information is normalized by
sqrt(H(labels_true) * H(labels_pred))
Parameters: clustering – NodeClustering object Returns: normalized mutual information score Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> score = communities.normalized_mutual_information(leiden_communities)
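The normalization by sqrt(H(labels_true) * H(labels_pred)) can be sketched in plain Python on label vectors (illustrative only; cdlib's implementation may differ internally):

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_a, labels_b):
    """NMI = MI(a, b) / sqrt(H(a) * H(b)), computed from label counts."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    pab = Counter(zip(labels_a, labels_b))  # joint counts

    def entropy(counts):
        return -sum(c / n * log(c / n) for c in counts.values())

    mi = sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())
    return mi / sqrt(entropy(pa) * entropy(pb))

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # ≈ 1.0: label permutations do not matter
```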
-
omega
(clustering)¶ Index of resemblance for overlapping, complete coverage, network clusterings.
Parameters: clustering – NodeClustering object Returns: omega index Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> score = communities.omega(leiden_communities)
Reference: - Gabriel Murray, Giuseppe Carenini, and Raymond Ng. 2012. Using the omega index for evaluating abstractive community detection. In Proceedings of the Workshop on Evaluation Metrics and System Comparison for Automatic Summarization. Association for Computational Linguistics, Stroudsburg, PA, USA, 10-18.
-
overlapping_normalized_mutual_information_LFK
(clustering)¶ Overlapping Normalized Mutual Information between two clusterings.
Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions. This is the version proposed by Lancichinetti et al.
Parameters: clustering – NodeClustering object Returns: onmi score Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> score = communities.overlapping_normalized_mutual_information_LFK(leiden_communities)
Reference: - Lancichinetti, A., Fortunato, S., & Kertesz, J. (2009). Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3), 033015.
-
overlapping_normalized_mutual_information_MGH
(clustering, normalization='max')¶ Overlapping Normalized Mutual Information between two clusterings.
Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions. This is the version proposed by McDaid et al., using a different normalization than the original LFK one. See the reference for details.
Parameters: - clustering – NodeClustering object
- normalization – one of “max” or “LFK”. Default “max” (corresponds to the main method described in the article)
Returns: onmi score
Example: >>> from cdlib import evaluation, algorithms >>> g = nx.karate_club_graph() >>> louvain_communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> evaluation.overlapping_normalized_mutual_information_MGH(louvain_communities,leiden_communities) :Reference:
- McDaid, A. F., Greene, D., & Hurley, N. (2011). Normalized mutual information to evaluate overlapping community finding algorithms. arXiv preprint arXiv:1110.2515. Chicago
-
significance
()¶ Significance estimates how likely a partition of dense communities is to appear in a random graph.
Returns: the significance score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.significance()
References: Traag, V. A., Aldecoa, R., & Delvenne, J. C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816.
-
size
(**kwargs)¶ Size is the number of nodes in the community
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example:
>>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.size()
-
surprise
()¶ Surprise is a statistical approach that proposes a quality metric assuming that edges between vertices emerge randomly according to a hyper-geometric distribution.
According to the Surprise metric, the higher the score of a partition, the less likely it is to have resulted from a random realization, and the better the quality of the community structure.
Returns: the surprise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.surprise()
References: Traag, V. A., Aldecoa, R., & Delvenne, J. C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816.
-
to_json
()¶ Generate a JSON representation of the clustering object
Returns: a JSON formatted string representing the object
-
to_node_community_map
()¶ Generate a <node, list(communities)> representation of the current clustering
Returns: dict of the form <node, list(communities)>
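The mapping is easy to reconstruct from a community list alone, as this plain-Python sketch shows (function name hypothetical; cdlib builds it from the clustering object):

```python
# Build a <node, list(communities)> map: community ids are list indices,
# and overlapping nodes accumulate several ids.
def to_node_community_map(communities):
    mapping = {}
    for cid, nodes in enumerate(communities):
        for node in nodes:
            mapping.setdefault(node, []).append(cid)
    return mapping

print(to_node_community_map([[0, 1], [1, 2]]))  # {0: [0], 1: [0, 1], 2: [1]}
```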
-
triangle_participation_ratio
(**kwargs)¶ Fraction of community nodes that belong to a triad.
\[f(S) = \frac{ | \{ u: u \in S,\{(v,w):v, w \in S,(u,v) \in E,(u,w) \in E,(v,w) \in E \} \not = \emptyset \} |}{n_S}\]where \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.triangle_participation_ratio()
-
variation_of_information
(clustering)¶ Variation of Information among two nodes partitions.
\[VI(p, q) = H(p) + H(q) - 2 MI(p, q)\]
where \(MI\) is the mutual information, \(H\) the partition entropy, and \(p\), \(q\) the two community partitions being compared.
Parameters: clustering – NodeClustering object Returns: VI score Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> score = communities.variation_of_information(leiden_communities)
Reference: - Meila, M. (2007). Comparing clusterings - an information based distance. Journal of Multivariate Analysis, 98, 873-895. doi:10.1016/j.jmva.2006.11.013
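The entropy/mutual-information decomposition above can be computed on label vectors in plain Python (an illustrative sketch, independent of cdlib):

```python
from collections import Counter
from math import log

def variation_of_information(labels_p, labels_q):
    """VI(p, q) = H(p) + H(q) - 2*MI(p, q), computed from label counts."""
    n = len(labels_p)
    cp, cq = Counter(labels_p), Counter(labels_q)
    cpq = Counter(zip(labels_p, labels_q))  # joint counts

    def entropy(counts):
        return -sum(v / n * log(v / n) for v in counts.values())

    mi = sum(v / n * log(v * n / (cp[x] * cq[y])) for (x, y), v in cpq.items())
    return entropy(cp) + entropy(cq) - 2 * mi

print(variation_of_information([0, 0, 1, 1], [0, 0, 1, 1]))  # ≈ 0.0 for identical partitions
```

VI is a true metric on the space of partitions: it is zero exactly when the two partitions coincide.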
-
z_modularity
()¶ Z-modularity is another variant of the standard modularity proposed to avoid the resolution limit. This version is based on the observation that the difference between the fraction of edges inside communities and the expected number of such edges in a null model should not be the only contribution to the final quality of the community structure.
Returns: the z-modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.z_modularity()
References: Miyauchi, Atsushi, and Yasushi Kawase. Z-score-based modularity for community detection in networks. PloS ONE 11.1 (2016): e0147805.
NodeClustering.to_json () |
Generate a JSON representation of the clustering object |
NodeClustering.to_node_community_map () |
Generate a <node, list(communities)> representation of the current clustering |
NodeClustering.link_modularity () |
Quality function designed for directed graphs with overlapping communities. |
NodeClustering.normalized_cut (**kwargs) |
Normalized variant of the Cut-Ratio. |
NodeClustering.internal_edge_density (**kwargs) |
The internal density of the community set. |
NodeClustering.average_internal_degree (**kwargs) |
The average internal degree of the community set. |
NodeClustering.fraction_over_median_degree (…) |
Fraction of community nodes having internal degree higher than the median degree value. |
NodeClustering.expansion (**kwargs) |
Number of edges per community node that point outside the cluster. |
NodeClustering.cut_ratio (**kwargs) |
Fraction of existing edges (out of all possible edges) leaving the community. |
NodeClustering.edges_inside (**kwargs) |
Number of edges internal to the community. |
NodeClustering.conductance (**kwargs) |
Fraction of total edge volume that points outside the community. |
NodeClustering.max_odf (**kwargs) |
Maximum fraction of edges of a node of a community that point outside the community itself. |
NodeClustering.avg_odf (**kwargs) |
Average fraction of edges of a node of a community that point outside the community itself. |
NodeClustering.flake_odf (**kwargs) |
Fraction of nodes in S that have fewer edges pointing inside than to the outside of the community. |
NodeClustering.triangle_participation_ratio (…) |
Fraction of community nodes that belong to a triad. |
NodeClustering.newman_girvan_modularity () |
Difference between the fraction of intra-community edges of a partition and the expected number of such edges under a null model. |
NodeClustering.erdos_renyi_modularity () |
Erdos-Renyi modularity is a variation of the Newman-Girvan one. |
NodeClustering.modularity_density () |
Modularity density is one of several measures proposed to mitigate the resolution limit of modularity-based approaches. |
NodeClustering.z_modularity () |
Z-modularity is another variant of the standard modularity proposed to avoid the resolution limit. |
NodeClustering.surprise () |
Surprise is a statistical quality metric assuming that edges between vertices emerge randomly according to a hyper-geometric distribution. |
NodeClustering.significance () |
Significance estimates how likely a partition of dense communities is to appear in a random graph. |
NodeClustering.normalized_mutual_information (…) |
Normalized Mutual Information between two clusterings. |
NodeClustering.overlapping_normalized_mutual_information_MGH (…) |
Overlapping Normalized Mutual Information between two clusterings. |
NodeClustering.overlapping_normalized_mutual_information_LFK (…) |
Overlapping Normalized Mutual Information between two clusterings. |
NodeClustering.omega (clustering) |
Index of resemblance for overlapping, complete coverage, network clusterings. |
NodeClustering.f1 (clustering) |
Compute the average F1 score of the optimal community matches among the input partitions. |
NodeClustering.nf1 (clustering) |
Compute the Normalized F1 score of the optimal community matches among the input partitions. |
NodeClustering.adjusted_rand_index (clustering) |
Rand index adjusted for chance. |
NodeClustering.adjusted_mutual_information (…) |
Adjusted Mutual Information between two clusterings. |
NodeClustering.variation_of_information (…) |
Variation of Information among two node partitions. |
Fuzzy Node Clustering¶
-
class
FuzzyNodeClustering
(communities, node_allocation, graph, method_name, method_parameters=None, overlap=False)¶ Fuzzy Node Communities representation.
Parameters: - communities – list of communities
- node_allocation – dictionary specifying for each node the allocation of probability toward the communities it is placed in
- graph – a networkx/igraph object
- method_name – community discovery algorithm name
- method_parameters – configuration for the community discovery algorithm used
- overlap – boolean, whether the partition is overlapping or not
-
adjusted_mutual_information
(clustering)¶ Adjusted Mutual Information between two clusterings.
Adjusted Mutual Information (AMI) is an adjustment of the Mutual Information (MI) score to account for chance. It accounts for the fact that the MI is generally higher for two clusterings with a larger number of clusters, regardless of whether there is actually more information shared. For two clusterings \(U\) and \(V\), the AMI is given as:
AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [max(H(U), H(V)) - E(MI(U, V))]
This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won’t change the score value in any way.
This metric is furthermore symmetric: switching
label_true
withlabel_pred
will return the same score value. This can be useful to measure the agreement of two independent label assignment strategies on the same dataset when the real ground truth is not known. Be mindful that this function is an order of magnitude slower than other metrics, such as the Adjusted Rand Index.
Parameters: clustering – NodeClustering object Returns: AMI score Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> ami = communities.adjusted_mutual_information(leiden_communities)
Reference: - Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct), 2837-2854.
-
adjusted_rand_index
(clustering)¶ Rand index adjusted for chance.
The Rand Index computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted and true clusterings.
The raw RI score is then “adjusted for chance” into the ARI score using the following scheme:
ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)
The adjusted Rand index is thus ensured to have a value close to 0.0 for random labeling independently of the number of clusters and samples and exactly 1.0 when the clusterings are identical (up to a permutation).
ARI is a symmetric measure:
adjusted_rand_index(a, b) == adjusted_rand_index(b, a)
Parameters: clustering – NodeClustering object Returns: ARI score Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> ari = communities.adjusted_rand_index(leiden_communities)
Reference: - Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of classification, 2(1), 193-218.
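The pair-counting adjustment above can be sketched in plain Python (a toy illustration, not CDlib's implementation; it assumes each partition is given as a node-to-label dict over the same node set):

```python
from math import comb
from collections import Counter

def adjusted_rand_index(part_a, part_b):
    """Pair-counting ARI from the contingency table of two partitions."""
    nodes = list(part_a)
    n = len(nodes)
    contingency = Counter((part_a[v], part_b[v]) for v in nodes)
    a = Counter(part_a[v] for v in nodes)  # row sums
    b = Counter(part_b[v] for v in nodes)  # column sums

    index = sum(comb(nij, 2) for nij in contingency.values())
    sum_a = sum(comb(ai, 2) for ai in a.values())
    sum_b = sum(comb(bj, 2) for bj in b.values())
    expected = sum_a * sum_b / comb(n, 2)    # Expected_RI term
    max_index = (sum_a + sum_b) / 2          # max(RI) term
    return (index - expected) / (max_index - expected)

# identical partitions (up to label permutation) score exactly 1.0
p = {1: "x", 2: "x", 3: "y", 4: "y"}
q = {1: 0, 2: 0, 3: 1, 4: 1}
assert adjusted_rand_index(p, q) == 1.0
```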
-
average_internal_degree
(**kwargs)¶ The average internal degree of the community set.
\[f(S) = \frac{2m_S}{n_S}\]where \(m_S\) is the number of community internal edges and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.average_internal_degree()
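As a sketch of the formula above (an illustrative toy implementation, not the library's code; it assumes an undirected graph given as an edge list):

```python
def average_internal_degree(edges, community):
    """f(S) = 2 * m_S / n_S for one community, given the graph edge list."""
    S = set(community)
    m_s = sum(1 for u, v in edges if u in S and v in S)  # internal edges
    return 2 * m_s / len(S)

# a 4-node community forming a square: 4 internal edges, internal degree 2 each
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (4, 5)]
assert average_internal_degree(edges, [1, 2, 3, 4]) == 2.0
```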
-
avg_odf
(**kwargs)¶ Average fraction of the edges of a community node that point outside the community itself.
\[\frac{1}{n_S} \sum_{u \in S} \frac{|\{(u,v)\in E: v \not\in S\}|}{d(u)}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\), \(d(u)\) is the degree of \(u\) and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.avg_odf()
-
conductance
(**kwargs)¶ Fraction of total edge volume that points outside the community.
\[f(S) = \frac{c_S}{2 m_S+c_S}\]where \(c_S\) is the number of community boundary edges and \(m_S\) is the number of community internal edges.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.conductance()
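The same edge-list convention gives a compact sketch of conductance (illustrative only, not CDlib's implementation; \(c_S\) counts boundary edges, \(m_S\) internal ones):

```python
def conductance(edges, community):
    """f(S) = c_S / (2 * m_S + c_S): boundary edges over total edge volume."""
    S = set(community)
    m_s = sum(1 for u, v in edges if u in S and v in S)      # internal edges
    c_s = sum(1 for u, v in edges if (u in S) != (v in S))   # boundary edges
    return c_s / (2 * m_s + c_s)

# a triangle plus one outgoing edge: c_S = 1, m_S = 3
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
assert conductance(edges, [1, 2, 3]) == 1 / 7
```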
-
cut_ratio
(**kwargs)¶ Fraction of existing edges (out of all possible edges) leaving the community.
\[f(S) = \frac{c_S}{n_S (n - n_S)}\]where \(c_S\) is the number of community boundary edges, \(n_S\) is the number of community nodes and \(n\) is the number of graph nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.cut_ratio()
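A minimal sketch of the cut ratio under the same edge-list assumption (a toy illustration, not the library's implementation; `n` is the total number of graph nodes):

```python
def cut_ratio(edges, community, n):
    """f(S) = c_S / (n_S * (n - n_S)); n is the total number of graph nodes."""
    S = set(community)
    c_s = sum(1 for u, v in edges if (u in S) != (v in S))  # boundary edges
    n_s = len(S)
    return c_s / (n_s * (n - n_s))

# one of the 3 * 2 possible boundary edges exists
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]
assert cut_ratio(edges, [1, 2, 3], n=5) == 1 / 6
```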
-
edges_inside
(**kwargs)¶ Number of edges internal to the community.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.edges_inside()
-
erdos_renyi_modularity
()¶ Erdos-Renyi modularity is a variation of the Newman-Girvan one. It assumes that vertices in a network are connected randomly with a constant probability \(p\).
\[Q(S) = \frac{1}{m}\sum_{c \in S} (m_S - \frac{m n_S(n_S - 1)}{n(n-1)})\]where \(m\) is the number of graph edges, \(m_S\) is the number of community edges, \(n_S\) is the number of community nodes and \(n\) is the number of graph nodes.
Returns: the Erdos-Renyi modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.erdos_renyi_modularity()
References: Erdos, P., & Renyi, A. (1959). On random graphs I. Publ. Math. Debrecen, 6, 290-297.
-
expansion
(**kwargs)¶ Number of edges per community node that point outside the community.
\[f(S) = \frac{c_S}{n_S}\]where \(c_S\) is the number of community boundary edges and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.expansion()
-
f1
(clustering)¶ Compute the average F1 score of the optimal community matches between the input partitions. Works on overlapping/non-overlapping complete/partial coverage partitions.
Parameters: clustering – NodeClustering object Returns: F1 score (harmonic mean of precision and recall) Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> score = communities.f1(leiden_communities)
Reference: - Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate community detection algorithms on ground truth. In Complex Networks VII (pp. 133-144). Springer, Cham.
-
flake_odf
(**kwargs)¶ Fraction of nodes in S that have fewer edges pointing inside than outside of the community.
\[f(S) = \frac{| \{ u:u \in S,| \{(u,v) \in E: v \in S \}| < d(u)/2 \}|}{n_S}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\), \(d(u)\) is the degree of \(u\) and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.flake_odf()
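The condition above (a node keeping fewer than half its edges inside) can be sketched as follows (a toy illustration, not CDlib's code; the graph is again an undirected edge list):

```python
from collections import defaultdict

def flake_odf(edges, community):
    """Fraction of community nodes with internal degree below half their total degree."""
    S = set(community)
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if u in S and v in S:
            internal[u] += 1
            internal[v] += 1
    bad = sum(1 for u in S if internal[u] < degree[u] / 2)
    return bad / len(S)

# node 1 has 2 internal and 3 external edges, so 1 of 3 nodes fails the test
edges = [(1, 2), (2, 3), (3, 1), (1, 4), (1, 5), (1, 6)]
assert flake_odf(edges, [1, 2, 3]) == 1 / 3
```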
-
fraction_over_median_degree
(**kwargs)¶ Fraction of community nodes having internal degree higher than the median degree value.
\[f(S) = \frac{|\{u: u \in S,| \{(u,v): v \in S\}| > d_m\}| }{n_S}\]where \(d_m\) is the internal degree median value
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.fraction_over_median_degree()
-
get_description
(parameters_to_display=None, precision=3)¶ Return a description of the clustering, with the name of the method and its numeric parameters.
Parameters: - parameters_to_display – parameters to display. By default, all float parameters.
- precision – precision used to plot parameters. default: 3
Returns: a string description of the method.
-
internal_edge_density
(**kwargs)¶ The internal density of the community set.
\[f(S) = \frac{m_S}{n_S(n_S - 1)/2}\]where \(m_S\) is the number of community internal edges and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.internal_edge_density()
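A sketch of the density formula under the same edge-list assumption (illustrative only, not the library's implementation):

```python
def internal_edge_density(edges, community):
    """f(S) = m_S / (n_S * (n_S - 1) / 2): internal edges over possible ones."""
    S = set(community)
    m_s = sum(1 for u, v in edges if u in S and v in S)  # internal edges
    n_s = len(S)
    return m_s / (n_s * (n_s - 1) / 2)

# a triangle community is fully dense
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
assert internal_edge_density(edges, [1, 2, 3]) == 1.0
```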
-
link_modularity
()¶ Quality function designed for directed graphs with overlapping communities.
Returns: the link modularity score Example: >>> from cdlib import evaluation >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.link_modularity()
-
max_odf
(**kwargs)¶ Maximum fraction of the edges of a community node that point outside the community itself.
\[max_{u \in S} \frac{|\{(u,v)\in E: v \not\in S\}|}{d(u)}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\) and \(d(u)\) is the degree of \(u\)
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.max_odf()
-
modularity_density
()¶ The modularity density is one of several measures proposed to mitigate the resolution limit of modularity-based metrics. The idea is to include information about community size in the expected density of a community, so that small and dense communities are not neglected. For each community \(C\) in partition \(S\), it uses the average modularity degree \(d(C) = d^{int}(C) - d^{ext}(C)\), where \(d^{int}(C)\) and \(d^{ext}(C)\) are the average internal and external degrees of \(C\) respectively, to evaluate the fitness of \(C\) in its network. Finally, the modularity density can be calculated as follows:
\[Q(S) = \sum_{C \in S} \frac{1}{n_C} ( \sum_{i \in C} k^{int}_{iC} - \sum_{i \in C} k^{out}_{iC})\]where \(n_C\) is the number of nodes in \(C\), \(k^{int}_{iC}\) is the degree of node \(i\) within \(C\) and \(k^{out}_{iC}\) is the degree of node \(i\) outside \(C\).
Returns: the modularity density score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.modularity_density()
References: Li, Z., Zhang, S., Wang, R. S., Zhang, X. S., & Chen, L. (2008). Quantitative function for community detection. Physical Review E, 77(3), 036109.
-
newman_girvan_modularity
()¶ Difference between the fraction of intra-community edges of a partition and the expected number of such edges under a null model.
In the standard version of modularity, the null model preserves the expected degree sequence of the graph under consideration. In other words, the modularity compares the real network structure with a corresponding one where nodes are connected without any preference about their neighbors.
\[Q(S) = \frac{1}{m}\sum_{c \in S}(m_S - \frac{(2 m_S + l_S)^2}{4m})\]where \(m\) is the number of graph edges, \(m_S\) is the number of community internal edges and \(l_S\) is the number of edges from nodes in \(S\) to nodes outside \(S\).
Returns: the Newman-Girvan modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.newman_girvan_modularity()
References: Newman, M.E.J. & Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004).
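The formula above translates almost directly into plain Python (a toy sketch, not CDlib's implementation; the graph is an undirected edge list and communities are node lists):

```python
def newman_girvan_modularity(edges, communities):
    """Q = (1/m) * sum_S (m_S - (2*m_S + l_S)**2 / (4*m))."""
    m = len(edges)
    q = 0.0
    for community in communities:
        S = set(community)
        m_s = sum(1 for u, v in edges if u in S and v in S)      # internal
        l_s = sum(1 for u, v in edges if (u in S) != (v in S))   # boundary
        q += m_s - (2 * m_s + l_s) ** 2 / (4 * m)
    return q / m

# two triangles joined by a single bridge edge: a clearly modular partition
edges = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4), (3, 4)]
q = newman_girvan_modularity(edges, [[1, 2, 3], [4, 5, 6]])
assert abs(q - 2.5 / 7) < 1e-12
```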
-
nf1
(clustering)¶ Compute the Normalized F1 score of the optimal community matches between the input partitions. Works on overlapping/non-overlapping complete/partial coverage partitions.
Parameters: clustering – NodeClustering object Returns: MatchingResult instance Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> score = communities.nf1(leiden_communities)
Reference: - Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate community detection algorithms on ground truth.
- Rossetti, G. (2017). RDyn: graph benchmark handling community dynamics. Journal of Complex Networks, 5(6), 893-912.
-
normalized_cut
(**kwargs)¶ Normalized variant of the Cut-Ratio
\[f(S) = \frac{c_S}{2m_S+c_S} + \frac{c_S}{2(m-m_S)+c_S}\]where \(m\) is the number of graph edges, \(m_S\) is the number of community internal edges and \(c_S\) is the number of community boundary edges.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.normalized_cut()
-
normalized_mutual_information
(clustering)¶ Normalized Mutual Information between two clusterings.
Normalized Mutual Information (NMI) is a normalization of the Mutual Information (MI) score that scales the result between 0 (no mutual information) and 1 (perfect correlation). In this function, mutual information is normalized by
sqrt(H(labels_true) * H(labels_pred))
Parameters: clustering – NodeClustering object Returns: normalized mutual information score Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> nmi = communities.normalized_mutual_information(leiden_communities)
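The normalization above can be sketched from a contingency table in plain Python (a toy illustration, not CDlib's implementation; partitions are node-to-label dicts over the same node set):

```python
from math import log, sqrt
from collections import Counter

def normalized_mutual_information(part_a, part_b):
    """NMI = MI / sqrt(H(A) * H(B)) over a shared node set."""
    nodes = list(part_a)
    n = len(nodes)
    a = Counter(part_a[v] for v in nodes)
    b = Counter(part_b[v] for v in nodes)
    joint = Counter((part_a[v], part_b[v]) for v in nodes)
    h_a = -sum(c / n * log(c / n) for c in a.values())   # H(labels_true)
    h_b = -sum(c / n * log(c / n) for c in b.values())   # H(labels_pred)
    mi = sum(c / n * log((c / n) / ((a[i] / n) * (b[j] / n)))
             for (i, j), c in joint.items())
    return mi / sqrt(h_a * h_b)

# identical partitions (up to relabeling) reach the maximum of 1
p = {1: "a", 2: "a", 3: "b", 4: "b"}
q = {1: 0, 2: 0, 3: 1, 4: 1}
assert abs(normalized_mutual_information(p, q) - 1.0) < 1e-12
```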
-
omega
(clustering)¶ Index of resemblance for overlapping, complete coverage, network clusterings.
Parameters: clustering – NodeClustering object Returns: omega index Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> omega = communities.omega(leiden_communities)
Reference: - Gabriel Murray, Giuseppe Carenini, and Raymond Ng. 2012. Using the omega index for evaluating abstractive community detection. In Proceedings of Workshop on Evaluation Metrics and System Comparison for Automatic Summarization. Association for Computational Linguistics, Stroudsburg, PA, USA, 10-18.
-
overlapping_normalized_mutual_information_LFK
(clustering)¶ Overlapping Normalized Mutual Information between two clusterings.
Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions. This is the version proposed by Lancichinetti et al.
Parameters: clustering – NodeClustering object Returns: onmi score Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> onmi = communities.overlapping_normalized_mutual_information_LFK(leiden_communities)
Reference: - Lancichinetti, A., Fortunato, S., & Kertesz, J. (2009). Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3), 033015.
-
overlapping_normalized_mutual_information_MGH
(clustering, normalization='max')¶ Overlapping Normalized Mutual Information between two clusterings.
Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions. This is the version proposed by McDaid et al., using a different normalization than the original LFK one. See the reference for more details.
Parameters: - clustering – NodeClustering object
- normalization – one of “max” or “LFK”. Default “max” (corresponds to the main method described in the article)
Returns: onmi score
Example: >>> import networkx as nx >>> from cdlib import evaluation, algorithms >>> g = nx.karate_club_graph() >>> louvain_communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> evaluation.overlapping_normalized_mutual_information_MGH(louvain_communities, leiden_communities)
Reference:
- McDaid, A. F., Greene, D., & Hurley, N. (2011). Normalized mutual information to evaluate overlapping community finding algorithms. arXiv preprint arXiv:1110.2515.
-
significance
()¶ Significance estimates how likely a partition of dense communities is to appear in a random graph.
Returns: the significance score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.significance()
References: Traag, V. A., Aldecoa, R., & Delvenne, J. C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816.
-
size
(**kwargs)¶ Size is the number of nodes in the community.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example:
>>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.size()
-
surprise
()¶ Surprise is a statistical approach that proposes a quality metric assuming that edges between vertices emerge randomly according to a hypergeometric distribution.
According to the Surprise metric, the higher the score of a partition, the less likely it is to result from a random realization, and the better the quality of the community structure.
Returns: the surprise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.surprise()
References: Traag, V. A., Aldecoa, R., & Delvenne, J. C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816.
-
to_json
()¶ Generate a JSON representation of the clustering object
Returns: a JSON formatted string representing the object
-
to_node_community_map
()¶ Generate a <node, list(communities)> representation of the current clustering
Returns: dict of the form <node, list(communities)>
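The inversion described above is a small dictionary transformation; a sketch of the idea (illustrative only, not CDlib's implementation) is:

```python
def to_node_community_map(communities):
    """Invert a list of communities into a <node, list(community ids)> dict."""
    node_map = {}
    for cid, community in enumerate(communities):
        for node in community:
            node_map.setdefault(node, []).append(cid)
    return node_map

# node 3 sits in both communities of this overlapping clustering
assert to_node_community_map([[1, 2, 3], [3, 4]]) == {1: [0], 2: [0], 3: [0, 1], 4: [1]}
```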
-
triangle_participation_ratio
(**kwargs)¶ Fraction of community nodes that belong to a triad.
\[f(S) = \frac{ | \{ u: u \in S,\{(v,w):v, w \in S,(u,v) \in E,(u,w) \in E,(v,w) \in E \} \not = \emptyset \} |}{n_S}\]where \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.triangle_participation_ratio()
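A brute-force sketch of the ratio above (a toy illustration, not the library's implementation; fine for small communities, cubic in community size):

```python
from itertools import combinations

def triangle_participation_ratio(edges, community):
    """Fraction of community nodes that belong to at least one internal triad."""
    S = set(community)
    internal = {frozenset(e) for e in edges if e[0] in S and e[1] in S}
    in_triad = set()
    for u, v, w in combinations(S, 3):
        if {frozenset((u, v)), frozenset((v, w)), frozenset((u, w))} <= internal:
            in_triad.update((u, v, w))
    return len(in_triad) / len(S)

# nodes 1-3 form a triangle; node 4 hangs off it, so 3 of 4 nodes participate
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
assert triangle_participation_ratio(edges, [1, 2, 3, 4]) == 0.75
```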
-
variation_of_information
(clustering)¶ Variation of Information between two node partitions.
\[H(p)+H(q)-2 MI(p, q)\]where \(MI\) is the mutual information, \(H\) the partition entropy and \(p\), \(q\) are the two node partitions.
Parameters: clustering – NodeClustering object Returns: VI score Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> vi = communities.variation_of_information(leiden_communities)
Reference: - Meila, M. (2007). Comparing clusterings - an information based distance. Journal of Multivariate Analysis, 98, 873-895. doi:10.1016/j.jmva.2006.11.013
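The formula above can be sketched directly from the joint label distribution (a toy illustration, not CDlib's implementation; partitions are node-to-label dicts over the same node set):

```python
from math import log
from collections import Counter

def variation_of_information(part_a, part_b):
    """VI = H(A) + H(B) - 2 * MI(A, B), computed over a shared node set."""
    nodes = list(part_a)
    n = len(nodes)

    def entropy(labels):
        return -sum(c / n * log(c / n) for c in Counter(labels).values())

    h_a = entropy(part_a[v] for v in nodes)
    h_b = entropy(part_b[v] for v in nodes)
    a = Counter(part_a[v] for v in nodes)
    b = Counter(part_b[v] for v in nodes)
    joint = Counter((part_a[v], part_b[v]) for v in nodes)
    mi = sum(c / n * log((c / n) / ((a[i] / n) * (b[j] / n)))
             for (i, j), c in joint.items())
    return h_a + h_b - 2 * mi

# identical partitions are at distance 0 (VI is a true metric)
p = {1: "a", 2: "a", 3: "b", 4: "b"}
assert abs(variation_of_information(p, p)) < 1e-12
```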
-
z_modularity
()¶ Z-modularity is another variant of the standard modularity proposed to avoid the resolution limit. It is based on the observation that the difference between the fraction of edges inside communities and the expected number of such edges in a null model should not be the only contribution to the final quality of the community structure.
Returns: the z-modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.z_modularity()
References: Miyauchi, Atsushi, and Yasushi Kawase. Z-score-based modularity for community detection in networks. PloS one 11.1 (2016): e0147805.
FuzzyNodeClustering.to_json () |
Generate a JSON representation of the clustering object |
FuzzyNodeClustering.to_node_community_map () |
Generate a <node, list(communities)> representation of the current clustering |
FuzzyNodeClustering.link_modularity () |
Quality function designed for directed graphs with overlapping communities. |
FuzzyNodeClustering.normalized_cut (**kwargs) |
Normalized variant of the Cut-Ratio |
FuzzyNodeClustering.internal_edge_density (…) |
The internal density of the community set. |
FuzzyNodeClustering.average_internal_degree (…) |
The average internal degree of the community set. |
FuzzyNodeClustering.fraction_over_median_degree (…) |
Fraction of community nodes having internal degree higher than the median degree value. |
FuzzyNodeClustering.expansion (**kwargs) |
Number of edges per community node that point outside the community. |
FuzzyNodeClustering.cut_ratio (**kwargs) |
Fraction of existing edges (out of all possible edges) leaving the community. |
FuzzyNodeClustering.edges_inside (**kwargs) |
Number of edges internal to the community. |
FuzzyNodeClustering.conductance (**kwargs) |
Fraction of total edge volume that points outside the community. |
FuzzyNodeClustering.max_odf (**kwargs) |
Maximum fraction of the edges of a community node that point outside the community itself. |
FuzzyNodeClustering.avg_odf (**kwargs) |
Average fraction of the edges of a community node that point outside the community itself. |
FuzzyNodeClustering.flake_odf (**kwargs) |
Fraction of nodes in S that have fewer edges pointing inside than outside of the community. |
FuzzyNodeClustering.triangle_participation_ratio (…) |
Fraction of community nodes that belong to a triad. |
FuzzyNodeClustering.newman_girvan_modularity () |
Difference between the fraction of intra-community edges of a partition and the expected number of such edges under a null model. |
FuzzyNodeClustering.erdos_renyi_modularity () |
Erdos-Renyi modularity is a variation of the Newman-Girvan one. |
FuzzyNodeClustering.modularity_density () |
The modularity density is one of several measures proposed to mitigate the resolution limit of modularity-based metrics. |
FuzzyNodeClustering.z_modularity () |
Z-modularity is another variant of the standard modularity proposed to avoid the resolution limit. |
FuzzyNodeClustering.surprise () |
Surprise is a statistical approach that proposes a quality metric assuming that edges between vertices emerge randomly according to a hypergeometric distribution. |
FuzzyNodeClustering.significance () |
Significance estimates how likely a partition of dense communities is to appear in a random graph. |
Attributed Node Clustering¶
-
class
AttrNodeClustering
(communities, graph, method_name, coms_labels=None, method_parameters=None, overlap=False)¶ Attribute Node Communities representation.
Parameters: - communities – list of communities
- graph – a networkx/igraph object
- method_name – community discovery algorithm name
- coms_labels – dictionary specifying for each community the frequency of the attribute values
- method_parameters – configuration for the community discovery algorithm used
- overlap – boolean, whether the partition is overlapping or not
-
adjusted_mutual_information
(clustering)¶ Adjusted Mutual Information between two clusterings.
Adjusted Mutual Information (AMI) is an adjustment of the Mutual Information (MI) score to account for chance. It accounts for the fact that the MI is generally higher for two clusterings with a larger number of clusters, regardless of whether there is actually more information shared. For two clusterings \(U\) and \(V\), the AMI is given as:
AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [max(H(U), H(V)) - E(MI(U, V))]
This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won’t change the score value in any way.
This metric is furthermore symmetric: switching
label_true
withlabel_pred
will return the same score value. This can be useful to measure the agreement of two independent label assignment strategies on the same dataset when the real ground truth is not known. Be mindful that this function is an order of magnitude slower than other metrics, such as the Adjusted Rand Index.
Parameters: clustering – NodeClustering object Returns: AMI score Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> ami = communities.adjusted_mutual_information(leiden_communities)
Reference: - Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct), 2837-2854.
-
adjusted_rand_index
(clustering)¶ Rand index adjusted for chance.
The Rand Index computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted and true clusterings.
The raw RI score is then “adjusted for chance” into the ARI score using the following scheme:
ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)
The adjusted Rand index is thus ensured to have a value close to 0.0 for random labeling independently of the number of clusters and samples and exactly 1.0 when the clusterings are identical (up to a permutation).
ARI is a symmetric measure:
adjusted_rand_index(a, b) == adjusted_rand_index(b, a)
Parameters: clustering – NodeClustering object Returns: ARI score Example: >>> import networkx as nx >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> ari = communities.adjusted_rand_index(leiden_communities)
Reference: - Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of classification, 2(1), 193-218.
-
average_internal_degree
(**kwargs)¶ The average internal degree of the community set.
\[f(S) = \frac{2m_S}{n_S}\]where \(m_S\) is the number of community internal edges and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.average_internal_degree()
-
avg_odf
(**kwargs)¶ Average fraction of the edges of a community node that point outside the community itself.
\[\frac{1}{n_S} \sum_{u \in S} \frac{|\{(u,v)\in E: v \not\in S\}|}{d(u)}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\), \(d(u)\) is the degree of \(u\) and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> score = communities.avg_odf()
-
conductance
(**kwargs)¶ Fraction of total edge volume that points outside the community.
\[f(S) = \frac{c_S}{2 m_S+c_S}\]where \(c_S\) is the number of community boundary edges and \(m_S\) is the number of community internal edges.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.conductance()
-
cut_ratio
(**kwargs)¶ Fraction of existing edges (out of all possible edges) leaving the community.
\[f(S) = \frac{c_S}{n_S (n - n_S)}\]where \(c_S\) is the number of community boundary edges, \(n_S\) is the number of community nodes and \(n\) is the number of graph nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.cut_ratio()
-
edges_inside
(**kwargs)¶ Number of edges internal to the community.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.edges_inside()
-
erdos_renyi_modularity
()¶ Erdos-Renyi modularity is a variation of the Newman-Girvan one. It assumes that vertices in a network are connected randomly with a constant probability \(p\).
\[Q(S) = \frac{1}{m}\sum_{c \in S} (m_S - \frac{m n_S(n_S - 1)}{n(n-1)})\]where \(m\) is the number of graph edges, \(m_S\) is the number of community edges, \(n_S\) is the number of community nodes and \(n\) is the number of graph nodes.
Returns: the Erdos-Renyi modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.erdos_renyi_modularity()
References: Erdos, P., & Renyi, A. (1959). On random graphs I. Publ. Math. Debrecen, 6, 290-297.
-
expansion
(**kwargs)¶ Number of edges per community node that point outside the community.
\[f(S) = \frac{c_S}{n_S}\]where \(c_S\) is the number of community boundary edges and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.expansion()
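A sketch of the expansion score under an edge-list representation of the graph (illustrative only, not CDlib's implementation):

```python
def expansion(edges, community):
    """f(S) = c_S / n_S: boundary edges per community node."""
    S = set(community)
    c_s = sum(1 for u, v in edges if (u in S) != (v in S))  # boundary edges
    return c_s / len(S)

# a triangle with two edges leaving it: 2 boundary edges over 3 nodes
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (1, 5)]
assert expansion(edges, [1, 2, 3]) == 2 / 3
```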
-
f1
(clustering)¶ Compute the average F1 score of the optimal community matches between the input partitions. Works on overlapping/non-overlapping complete/partial coverage partitions.
Parameters: clustering – NodeClustering object Returns: F1 score (harmonic mean of precision and recall) Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.f1(leiden_communities)
Reference: - Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate algorithms detection internal on ground truth. In Complex Networks VII (pp. 133-144). Springer, Cham.
-
flake_odf
(**kwargs)¶ Fraction of nodes in S that have fewer edges pointing inside than to the outside of the community.
\[f(S) = \frac{| \{ u:u \in S,| \{(u,v) \in E: v \in S \}| < d(u)/2 \}|}{n_S}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\), \(d(u)\) is the degree of \(u\) and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.flake_odf()
-
fraction_over_median_degree
(**kwargs)¶ Fraction of community nodes having an internal degree higher than the median degree value.
\[f(S) = \frac{|\{u: u \in S,| \{(u,v): v \in S\}| > d_m\}| }{n_S}\]where \(d_m\) is the internal degree median value
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.fraction_over_median_degree()
-
get_description
(parameters_to_display=None, precision=3)¶ Return a description of the clustering, with the name of the method and its numeric parameters.
Parameters: - parameters_to_display – parameters to display. By default, all float parameters.
- precision – precision used to plot parameters. default: 3
Returns: a string description of the method.
-
internal_edge_density
(**kwargs)¶ The internal density of the community set.
\[f(S) = \frac{m_S}{n_S(n_S-1)/2}\]where \(m_S\) is the number of community internal edges and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.internal_edge_density()
-
link_modularity
()¶ Quality function designed for directed graphs with overlapping communities.
Returns: the link modularity score Example: >>> from cdlib import evaluation >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.link_modularity()
-
max_odf
(**kwargs)¶ Maximum fraction of edges of a node of a community that point outside the community itself.
\[max_{u \in S} \frac{|\{(u,v)\in E: v \not\in S\}|}{d(u)}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\) and \(d(u)\) is the degree of \(u\)
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.max_odf()
-
modularity_density
()¶ The modularity density is one of several propositions envisioned to palliate the resolution limit issue of modularity-based measures. The idea of this metric is to include the information about community size into the expected density of communities, so as not to neglect small and dense communities. For each community \(C\) in partition \(S\), it uses the average modularity degree, \(d(C) = d^{int}(C) - d^{ext}(C)\), where \(d^{int}(C)\) and \(d^{ext}(C)\) are the average internal and external degrees of \(C\) respectively, to evaluate the fitness of \(C\) in its network. Finally, the modularity density can be calculated as follows:
\[Q(S) = \sum_{C \in S} \frac{1}{n_C} ( \sum_{i \in C} k^{int}_{iC} - \sum_{i \in C} k^{out}_{iC})\]where \(n_C\) is the number of nodes in \(C\), \(k^{int}_{iC}\) is the degree of node \(i\) within \(C\) and \(k^{out}_{iC}\) is the degree of node \(i\) outside \(C\).
Returns: the modularity density score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.modularity_density()
References: Li, Z., Zhang, S., Wang, R. S., Zhang, X. S., & Chen, L. (2008). Quantitative function for community detection. Physical Review E, 77(3), 036109.
-
newman_girvan_modularity
()¶ Difference between the fraction of intra-community edges of a partition and the expected fraction of such edges if they were distributed according to a null model.
In the standard version of modularity, the null model preserves the expected degree sequence of the graph under consideration. In other words, the modularity compares the real network structure with a corresponding one where nodes are connected without any preference about their neighbors.
\[Q(S) = \frac{1}{m}\sum_{c \in S}(m_S - \frac{(2 m_S + l_S)^2}{4m})\]where \(m\) is the number of graph edges, \(m_S\) is the number of community edges and \(l_S\) is the number of edges from nodes in \(S\) to nodes outside \(S\).
Returns: the Newman-Girvan modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.newman_girvan_modularity()
References: Newman, M.E.J. & Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004).
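The Newman-Girvan formula can be checked by hand on a toy graph. The sketch below is a stdlib-only illustration of the formula above, not cdlib's implementation (which operates on NodeClustering objects):

```python
def newman_girvan_modularity(edges, communities):
    """Q(S) = sum over S of [ m_S/m - ((2*m_S + l_S) / (2*m))**2 ],
    i.e. the formula above with both terms divided through by m."""
    m = len(edges)
    q = 0.0
    for comm in communities:
        s = set(comm)
        m_s = sum(1 for u, v in edges if u in s and v in s)     # intra-community edges
        l_s = sum(1 for u, v in edges if (u in s) != (v in s))  # boundary edges
        q += m_s / m - ((2 * m_s + l_s) / (2 * m)) ** 2
    return q

# Two triangles joined by a single bridge edge: a natural 2-community split.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(newman_girvan_modularity(edges, [[0, 1, 2], [3, 4, 5]]))  # 5/14 ≈ 0.357
```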
-
nf1
(clustering)¶ Compute the Normalized F1 score of the optimal community matches among the partitions in input. Works on overlapping/non-overlapping complete/partial coverage partitions.
Parameters: clustering – NodeClustering object Returns: MatchingResult instance Example: >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.nf1(leiden_communities)
Reference: - Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate community detection algorithms on ground truth.
- Rossetti, G. (2017). RDyn: graph benchmark handling community dynamics. Journal of Complex Networks, 5(6), 893-912.
-
normalized_cut
(**kwargs)¶ Normalized variant of the Cut-Ratio.
\[f(S) = \frac{c_S}{2m_S+c_S} + \frac{c_S}{2(m-m_S)+c_S}\]where \(m\) is the number of graph edges, \(m_S\) is the number of community internal edges and \(c_S\) is the number of edges on the community boundary.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.normalized_cut()
-
normalized_mutual_information
(clustering)¶ Normalized Mutual Information between two clusterings.
Normalized Mutual Information (NMI) is a normalization of the Mutual Information (MI) score that scales the result between 0 (no mutual information) and 1 (perfect correlation). In this function, mutual information is normalized by
sqrt(H(labels_true) * H(labels_pred))
Parameters: clustering – NodeClustering object Returns: normalized mutual information score Example: >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.normalized_mutual_information(leiden_communities)
-
omega
(clustering)¶ Index of resemblance for overlapping, complete coverage, network clusterings.
Parameters: clustering – NodeClustering object Returns: omega index Example: >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.omega(leiden_communities)
Reference: - Gabriel Murray, Giuseppe Carenini, and Raymond Ng. 2012. Using the omega index for evaluating abstractive community detection. In Proceedings of Workshop on Evaluation Metrics and System Comparison for Automatic Summarization. Association for Computational Linguistics, Stroudsburg, PA, USA, 10-18.
-
overlapping_normalized_mutual_information_LFK
(clustering)¶ Overlapping Normalized Mutual Information between two clusterings.
Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions. This is the version proposed by Lancichinetti et al.
Parameters: clustering – NodeClustering object Returns: onmi score Example: >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.overlapping_normalized_mutual_information_LFK(leiden_communities)
Reference: - Lancichinetti, A., Fortunato, S., & Kertesz, J. (2009). Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3), 033015.
-
overlapping_normalized_mutual_information_MGH
(clustering, normalization='max')¶ Overlapping Normalized Mutual Information between two clusterings.
Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions. This is the version proposed by McDaid et al., using a different normalization than the original LFK one. See ref. for more details.
Parameters: - clustering – NodeClustering object
- normalization – one of “max” or “LFK”. Default “max” (corresponds to the main method described in the article)
Returns: onmi score
Example: >>> from cdlib import evaluation, algorithms >>> g = nx.karate_club_graph() >>> louvain_communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> evaluation.overlapping_normalized_mutual_information_MGH(louvain_communities,leiden_communities)
Reference: - McDaid, A. F., Greene, D., & Hurley, N. (2011). Normalized mutual information to evaluate overlapping community finding algorithms. arXiv preprint arXiv:1110.2515.
-
purity
()¶ Purity is the product of the frequencies of the most frequent labels carried by the nodes within the communities.
Returns: a FitnessResult object
-
significance
()¶ Significance estimates how likely a partition of dense communities is to appear in a random graph.
Returns: the significance score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.significance()
References: Traag, V. A., Aldecoa, R., & Delvenne, J. C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816.
-
size
(**kwargs)¶ Size is the number of nodes in the community
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example:
>>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.size()
-
surprise
()¶ Surprise is a statistical quality metric that assumes edges between vertices emerge randomly according to a hypergeometric distribution.
According to the Surprise metric, the higher the score of a partition, the less likely it is to result from a random realization, and the better the quality of the community structure.
Returns: the surprise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.surprise()
References: Traag, V. A., Aldecoa, R., & Delvenne, J. C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816.
-
to_json
()¶ Generate a JSON representation of the clustering object
Returns: a JSON formatted string representing the object
-
to_node_community_map
()¶ Generate a <node, list(communities)> representation of the current clustering
Returns: dict of the form <node, list(communities)>
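The shape of that mapping can be sketched from plain community lists; the helper below is a stdlib-only illustration of what a `<node, list(communities)>` map looks like, not cdlib's implementation:

```python
from collections import defaultdict

def to_node_community_map(communities):
    """Map each node to the list of community ids it belongs to.
    A node can appear under several ids when the clustering overlaps."""
    node_map = defaultdict(list)
    for cid, comm in enumerate(communities):
        for node in comm:
            node_map[node].append(cid)
    return dict(node_map)

# Overlapping example: node 2 belongs to both communities.
print(to_node_community_map([[0, 1, 2], [2, 3, 4]]))
```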
-
triangle_participation_ratio
(**kwargs)¶ Fraction of community nodes that belong to a triad.
\[f(S) = \frac{ | \{ u: u \in S,\{(v,w):v, w \in S,(u,v) \in E,(u,w) \in E,(v,w) \in E \} \not = \emptyset \} |}{n_S}\]where \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.triangle_participation_ratio()
-
variation_of_information
(clustering)¶ Variation of Information between two node partitions.
\[VI(p, q) = H(p) + H(q) - 2 MI(p, q)\]where \(MI\) is the mutual information, \(H\) the partition entropy and \(p\), \(q\) are the two community partitions.
Parameters: clustering – NodeClustering object Returns: VI score Example: >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.variation_of_information(leiden_communities)
Reference: - Meila, M. (2007). Comparing clusterings - an information based distance. Journal of Multivariate Analysis, 98, 873-895. doi:10.1016/j.jmva.2006.11.013
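Meila's formula can be reproduced directly from label lists; the sketch below is a stdlib-only illustration of \(H(p)+H(q)-2MI(p,q)\), not cdlib's implementation:

```python
from collections import Counter
from math import log

def entropy(labels):
    # Shannon entropy of a flat label assignment.
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    # MI from the joint and marginal label distributions.
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    return sum(c / n * log(c * n / (pa[x] * pb[y]))
               for (x, y), c in Counter(zip(a, b)).items())

def variation_of_information(a, b):
    # VI = H(p) + H(q) - 2*MI(p, q); zero iff the partitions coincide.
    return entropy(a) + entropy(b) - 2 * mutual_information(a, b)

# Identical partitions: VI is 0 (up to floating point).
print(variation_of_information([0, 0, 1, 1], [0, 0, 1, 1]))
```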
-
z_modularity
()¶ Z-modularity is another variant of the standard modularity proposed to avoid the resolution limit. It is based on the observation that the difference between the fraction of edges inside communities and the expected fraction of such edges in a null model should not be considered the only contribution to the final quality of the community structure.
Returns: the z-modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.z_modularity()
References: Miyauchi, Atsushi, and Yasushi Kawase. Z-score-based modularity for community detection in networks. PloS one 11.1 (2016): e0147805.
Bipartite Node Clustering¶
-
class
BiNodeClustering
(left_communities, right_communities, graph, method_name, method_parameters=None, overlap=False)¶ Bipartite Node Communities representation.
Parameters: - left_communities – list of left communities
- right_communities – list of right communities
- graph – a networkx/igraph object
- method_name – community discovery algorithm name
- method_parameters – configuration for the community discovery algorithm used
- overlap – boolean, whether the partition is overlapping or not
-
adjusted_mutual_information
(clustering)¶ Adjusted Mutual Information between two clusterings.
Adjusted Mutual Information (AMI) is an adjustment of the Mutual Information (MI) score to account for chance. It accounts for the fact that the MI is generally higher for two clusterings with a larger number of clusters, regardless of whether there is actually more information shared. For two clusterings \(U\) and \(V\), the AMI is given as:
AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [max(H(U), H(V)) - E(MI(U, V))]
This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won’t change the score value in any way.
This metric is furthermore symmetric: switching label_true with label_pred will return the same score value. This can be useful to measure the agreement of two independent label assignment strategies on the same dataset when the real ground truth is not known. Be mindful that this function is an order of magnitude slower than other metrics, such as the Adjusted Rand Index.
Parameters: clustering – NodeClustering object Returns: AMI score Example: >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.adjusted_mutual_information(leiden_communities)
Reference: - Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct), 2837-2854.
-
adjusted_rand_index
(clustering)¶ Rand index adjusted for chance.
The Rand Index computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted and true clusterings.
The raw RI score is then “adjusted for chance” into the ARI score using the following scheme:
ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)
The adjusted Rand index is thus ensured to have a value close to 0.0 for random labeling independently of the number of clusters and samples and exactly 1.0 when the clusterings are identical (up to a permutation).
ARI is a symmetric measure:
adjusted_rand_index(a, b) == adjusted_rand_index(b, a)
Parameters: clustering – NodeClustering object Returns: ARI score Example: >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.adjusted_rand_index(leiden_communities)
Reference: - Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of classification, 2(1), 193-218.
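The adjustment scheme above can be computed from pair counts over the contingency table; the sketch below is a stdlib-only illustration of the standard ARI formula, not cdlib's implementation:

```python
from collections import Counter

def comb2(n):
    # Number of unordered pairs among n items: C(n, 2).
    return n * (n - 1) // 2

def adjusted_rand_index(a, b):
    """ARI = (Index - Expected_Index) / (Max_Index - Expected_Index),
    with pair counts taken from the contingency table of the two labelings."""
    n = len(a)
    index = sum(comb2(c) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb2(c) for c in Counter(a).values())
    sum_b = sum(comb2(c) for c in Counter(b).values())
    expected = sum_a * sum_b / comb2(n)
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)

# Identical clusterings up to a permutation of labels score exactly 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```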
-
average_internal_degree
(**kwargs)¶ The average internal degree of the community set.
\[f(S) = \frac{2m_S}{n_S}\]where \(m_S\) is the number of community internal edges and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.average_internal_degree()
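The \(2m_S/n_S\) formula is easy to verify on a toy community; the helper below is a stdlib-only illustration, not cdlib's implementation:

```python
def average_internal_degree(edges, community):
    """f(S) = 2*m_S / n_S, with m_S the intra-community edges
    and n_S the number of community nodes."""
    s = set(community)
    m_s = sum(1 for u, v in edges if u in s and v in s)
    return 2 * m_s / len(s)

# A triangle: every node has internal degree 2, so the average is 2.0.
print(average_internal_degree([(0, 1), (1, 2), (0, 2)], [0, 1, 2]))
```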
-
avg_odf
(**kwargs)¶ Average fraction of edges of a node of a community that point outside the community itself.
\[\frac{1}{n_S} \sum_{u \in S} \frac{|\{(u,v)\in E: v \not\in S\}|}{d(u)}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\), \(d(u)\) is the degree of \(u\) and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.avg_odf()
-
conductance
(**kwargs)¶ Fraction of total edge volume that points outside the community.
\[f(S) = \frac{c_S}{2 m_S+c_S}\]where \(c_S\) is the number of edges on the community boundary and \(m_S\) is the number of community internal edges.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.conductance()
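Conductance relates boundary edges to the community's total edge volume; the sketch below is a stdlib-only illustration of the formula above, not cdlib's implementation:

```python
def conductance(edges, community):
    """f(S) = c_S / (2*m_S + c_S): boundary edges over total edge volume of S."""
    s = set(community)
    m_s = sum(1 for u, v in edges if u in s and v in s)     # internal edges
    c_s = sum(1 for u, v in edges if (u in s) != (v in s))  # boundary edges
    return c_s / (2 * m_s + c_s)

# Triangle {0, 1, 2} linked to node 3 by one bridge edge: 1 / (2*3 + 1).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(conductance(edges, [0, 1, 2]))
```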
-
cut_ratio
(**kwargs)¶ Fraction of existing edges (out of all possible edges) leaving the community.
\[f(S) = \frac{c_S}{n_S (n - n_S)}\]where \(c_S\) is the number of edges on the community boundary, \(n_S\) is the number of community nodes and \(n\) is the number of graph nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.cut_ratio()
-
edges_inside
(**kwargs)¶ Number of edges internal to the community.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.edges_inside()
-
erdos_renyi_modularity
()¶ Erdos-Renyi modularity is a variation of the Newman-Girvan one. It assumes that vertices in a network are connected randomly with a constant probability \(p\).
\[Q(S) = \frac{1}{m}\sum_{c \in S} (m_S - \frac{m n_S(n_S - 1)}{n(n-1)})\]where \(m\) is the number of graph edges, \(m_S\) is the number of community edges, \(n_S\) is the number of community nodes and \(n\) is the number of graph nodes.
Returns: the Erdos-Renyi modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.erdos_renyi_modularity()
References: Erdos, P., & Renyi, A. (1959). On random graphs I. Publ. Math. Debrecen, 6, 290-297.
-
expansion
(**kwargs)¶ Number of edges per community node that point outside the cluster.
\[f(S) = \frac{c_S}{n_S}\]where \(c_S\) is the number of edges on the community boundary and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.expansion()
-
f1
(clustering)¶ Compute the average F1 score of the optimal community matches among the partitions in input. Works on overlapping/non-overlapping complete/partial coverage partitions.
Parameters: clustering – NodeClustering object Returns: F1 score (harmonic mean of precision and recall) Example: >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.f1(leiden_communities)
Reference: - Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate community detection algorithms on ground truth. In Complex Networks VII (pp. 133-144). Springer, Cham.
-
flake_odf
(**kwargs)¶ Fraction of nodes in S that have fewer edges pointing inside than to the outside of the community.
\[f(S) = \frac{| \{ u:u \in S,| \{(u,v) \in E: v \in S \}| < d(u)/2 \}|}{n_S}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\), \(d(u)\) is the degree of \(u\) and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.flake_odf()
-
fraction_over_median_degree
(**kwargs)¶ Fraction of community nodes having an internal degree higher than the median degree value.
\[f(S) = \frac{|\{u: u \in S,| \{(u,v): v \in S\}| > d_m\}| }{n_S}\]where \(d_m\) is the internal degree median value
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.fraction_over_median_degree()
-
get_description
(parameters_to_display=None, precision=3)¶ Return a description of the clustering, with the name of the method and its numeric parameters.
Parameters: - parameters_to_display – parameters to display. By default, all float parameters.
- precision – precision used to plot parameters. default: 3
Returns: a string description of the method.
-
internal_edge_density
(**kwargs)¶ The internal density of the community set.
\[f(S) = \frac{m_S}{n_S(n_S-1)/2}\]where \(m_S\) is the number of community internal edges and \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.internal_edge_density()
-
link_modularity
()¶ Quality function designed for directed graphs with overlapping communities.
Returns: the link modularity score Example: >>> from cdlib import evaluation >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.link_modularity()
-
max_odf
(**kwargs)¶ Maximum fraction of edges of a node of a community that point outside the community itself.
\[max_{u \in S} \frac{|\{(u,v)\in E: v \not\in S\}|}{d(u)}\]where \(E\) is the graph edge set, \(v\) is a node in \(S\) and \(d(u)\) is the degree of \(u\)
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.max_odf()
-
modularity_density
()¶ The modularity density is one of several propositions envisioned to palliate the resolution limit issue of modularity-based measures. The idea of this metric is to include the information about community size into the expected density of communities, so as not to neglect small and dense communities. For each community \(C\) in partition \(S\), it uses the average modularity degree, \(d(C) = d^{int}(C) - d^{ext}(C)\), where \(d^{int}(C)\) and \(d^{ext}(C)\) are the average internal and external degrees of \(C\) respectively, to evaluate the fitness of \(C\) in its network. Finally, the modularity density can be calculated as follows:
\[Q(S) = \sum_{C \in S} \frac{1}{n_C} ( \sum_{i \in C} k^{int}_{iC} - \sum_{i \in C} k^{out}_{iC})\]where \(n_C\) is the number of nodes in \(C\), \(k^{int}_{iC}\) is the degree of node \(i\) within \(C\) and \(k^{out}_{iC}\) is the degree of node \(i\) outside \(C\).
Returns: the modularity density score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.modularity_density()
References: Li, Z., Zhang, S., Wang, R. S., Zhang, X. S., & Chen, L. (2008). Quantitative function for community detection. Physical Review E, 77(3), 036109.
-
newman_girvan_modularity
()¶ Difference between the fraction of intra-community edges of a partition and the expected fraction of such edges if they were distributed according to a null model.
In the standard version of modularity, the null model preserves the expected degree sequence of the graph under consideration. In other words, the modularity compares the real network structure with a corresponding one where nodes are connected without any preference about their neighbors.
\[Q(S) = \frac{1}{m}\sum_{c \in S}(m_S - \frac{(2 m_S + l_S)^2}{4m})\]where \(m\) is the number of graph edges, \(m_S\) is the number of community edges and \(l_S\) is the number of edges from nodes in \(S\) to nodes outside \(S\).
Returns: the Newman-Girvan modularity score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.newman_girvan_modularity()
References: Newman, M.E.J. & Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004).
-
nf1
(clustering)¶ Compute the Normalized F1 score of the optimal community matches among the partitions in input. Works on overlapping/non-overlapping complete/partial coverage partitions.
Parameters: clustering – NodeClustering object Returns: MatchingResult instance Example: >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.nf1(leiden_communities)
Reference: - Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate community detection algorithms on ground truth.
- Rossetti, G. (2017). RDyn: graph benchmark handling community dynamics. Journal of Complex Networks, 5(6), 893-912.
-
normalized_cut
(**kwargs)¶ Normalized variant of the Cut-Ratio.
\[f(S) = \frac{c_S}{2m_S+c_S} + \frac{c_S}{2(m-m_S)+c_S}\]where \(m\) is the number of graph edges, \(m_S\) is the number of community internal edges and \(c_S\) is the number of edges on the community boundary.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False a list of community-wise score Returns: a FitnessResult object/a list of community-wise score Example: >>> from cdlib.algorithms import louvain >>> g = nx.karate_club_graph() >>> communities = louvain(g) >>> mod = communities.normalized_cut()
-
normalized_mutual_information
(clustering)¶ Normalized Mutual Information between two clusterings.
Normalized Mutual Information (NMI) is a normalization of the Mutual Information (MI) score that scales the result between 0 (no mutual information) and 1 (perfect correlation). In this function, mutual information is normalized by
sqrt(H(labels_true) * H(labels_pred))
Parameters: clustering – NodeClustering object Returns: normalized mutual information score Example: >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.normalized_mutual_information(leiden_communities)
-
omega
(clustering)¶ Index of resemblance for overlapping, complete coverage, network clusterings.
Parameters: clustering – NodeClustering object Returns: omega index Example: >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.omega(leiden_communities)
Reference: - Gabriel Murray, Giuseppe Carenini, and Raymond Ng. 2012. Using the omega index for evaluating abstractive community detection. In Proceedings of Workshop on Evaluation Metrics and System Comparison for Automatic Summarization. Association for Computational Linguistics, Stroudsburg, PA, USA, 10-18.
-
overlapping_normalized_mutual_information_LFK
(clustering)¶ Overlapping Normalized Mutual Information between two clusterings.
Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions. This is the version proposed by Lancichinetti et al.
Parameters: clustering – NodeClustering object Returns: onmi score Example: >>> from cdlib import algorithms >>> g = nx.karate_club_graph() >>> communities = algorithms.louvain(g) >>> leiden_communities = algorithms.leiden(g) >>> mod = communities.overlapping_normalized_mutual_information_LFK(leiden_communities)
Reference: - Lancichinetti, A., Fortunato, S., & Kertesz, J. (2009). Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3), 033015.
-
overlapping_normalized_mutual_information_MGH
(clustering, normalization='max')¶ Overlapping Normalized Mutual Information between two clusterings.
Extension of the Normalized Mutual Information (NMI) score to cope with overlapping partitions. This is the version proposed by McDaid et al., using a different normalization than the original LFK one. See the reference for more details.
Parameters: - clustering – NodeClustering object
- normalization – one of “max” or “LFK”. Default “max” (corresponds to the main method described in the article)
Returns: onmi score
Example:
>>> import networkx as nx
>>> from cdlib import evaluation, algorithms
>>> g = nx.karate_club_graph()
>>> louvain_communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> evaluation.overlapping_normalized_mutual_information_MGH(louvain_communities, leiden_communities)
Reference: - McDaid, A. F., Greene, D., & Hurley, N. (2011). Normalized mutual information to evaluate overlapping community finding algorithms. arXiv preprint arXiv:1110.2515.
-
significance
()¶ Significance estimates how likely it is that a partition featuring dense communities appears in a random graph.
Returns: the significance score
Example:
>>> import networkx as nx
>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.significance()
References: Traag, V. A., Krings, G., & Van Dooren, P. (2013). Significant scales in community structure. Scientific Reports, 3, 2930.
-
size
(**kwargs)¶ Size is the number of nodes in the community.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False, a list of community-wise scores
Returns: a FitnessResult object / a list of community-wise scores
Example:
>>> import networkx as nx
>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.size()
-
surprise
()¶ Surprise is a statistical approach that proposes a quality metric assuming that edges between vertices emerge randomly according to a hypergeometric distribution.
According to the Surprise metric, the higher the score of a partition, the less likely it is to result from a random realization, and the better the quality of the community structure.
Returns: the surprise score
Example:
>>> import networkx as nx
>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.surprise()
References: Traag, V. A., Aldecoa, R., & Delvenne, J. C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816.
-
to_json
()¶ Generate a JSON representation of the clustering object
Returns: a JSON formatted string representing the object
-
to_node_community_map
()¶ Generate a <node, list(communities)> representation of the current clustering
Returns: dict of the form <node, list(communities)>
-
triangle_participation_ratio
(**kwargs)¶ Fraction of community nodes that belong to a triad.
\[f(S) = \frac{ | \{ u: u \in S, \{(v,w): v,w \in S, (u,v) \in E, (u,w) \in E, (v,w) \in E \} \neq \emptyset \} |}{n_S}\]
where \(n_S\) is the number of community nodes.
Parameters: summary – (optional, default True) if True, an overall summary is returned for the partition (min, max, avg, std); if False, a list of community-wise scores
Returns: a FitnessResult object / a list of community-wise scores
Example:
>>> import networkx as nx
>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.triangle_participation_ratio()
-
variation_of_information
(clustering)¶ Variation of Information between two node partitions.
\[VI(p, q) = H(p) + H(q) - 2\,MI(p, q)\]
where MI is the mutual information, H the partition entropy, and p, q are the two community sets.
Parameters: clustering – NodeClustering object
Returns: VI score
Example:
>>> import networkx as nx
>>> from cdlib import algorithms
>>> g = nx.karate_club_graph()
>>> communities = algorithms.louvain(g)
>>> leiden_communities = algorithms.leiden(g)
>>> mod = communities.variation_of_information(leiden_communities)
Reference: - Meila, M. (2007). Comparing clusterings - an information based distance. Journal of Multivariate Analysis, 98, 873-895. doi:10.1016/j.jmva.2006.11.013
-
z_modularity
()¶ Z-modularity is another variant of the standard modularity, proposed to avoid the resolution limit. This version is based on the observation that the difference between the fraction of edges inside communities and the expected number of such edges in a null model should not be considered the only contribution to the final quality of the community structure.
Returns: the z-modularity score
Example:
>>> import networkx as nx
>>> from cdlib.algorithms import louvain
>>> g = nx.karate_club_graph()
>>> communities = louvain(g)
>>> mod = communities.z_modularity()
References: Miyauchi, Atsushi, and Yasushi Kawase. Z-score-based modularity for community detection in networks. PloS one 11.1 (2016): e0147805.
Edge Clustering¶
-
class
EdgeClustering
(communities, graph, method_name, method_parameters=None, overlap=False)¶ Edge Clustering representation.
Parameters: - communities – list of communities
- graph – a networkx/igraph object
- method_name – community discovery algorithm name
- method_parameters – configuration for the community discovery algorithm used
- overlap – boolean, whether the partition is overlapping or not
-
get_description
(parameters_to_display=None, precision=3)¶ Return a description of the clustering, with the name of the method and its numeric parameters.
Parameters: - parameters_to_display – parameters to display. By default, all float parameters.
- precision – precision used to plot parameters. default: 3
Returns: a string description of the method.
-
to_edge_community_map
()¶ Generate a <edge, list(communities)> representation of the current clustering
Returns: dict of the form <edge, list(communities)>
-
to_json
()¶ Generate a JSON representation of the clustering object
Returns: a JSON formatted string representing the object
EdgeClustering.to_json () |
Generate a JSON representation of the clustering object |
EdgeClustering.to_edge_community_map () |
Generate a <edge, list(communities)> representation of the current clustering |
Community Discovery algorithms¶
CDlib
collects implementations of several Community Discovery algorithms.
To keep the library organization as clean and resilient as possible, the approaches are grouped according to a simple, two-level rationale:
- The first distinction is made on the object clustered, thus separating Node Clustering and Edge Clustering algorithms;
- The second distinction is made on the specific kind of partition each one of them generates: Crisp, Overlapping or Fuzzy.
This documentation follows the same rationale.
Node Clustering¶
Algorithms falling in this category generate communities composed of nodes. The communities can represent a neat, crisp partition, as well as overlapping or even fuzzy ones.
Crisp Communities¶
A clustering is said to be a partition if each node belongs to one and only one community.
Methods in this subclass return as result a NodeClustering
object instance.
agdl (g_original, number_communities, …) |
AGDL is a graph-based agglomerative algorithm, for clustering high-dimensional data. |
aslpaw (g_original) |
ASLPAw can be used for disjoint and overlapping community detection and works on weighted/unweighted and directed/undirected networks. |
async_fluid (g_original, k) |
Fluid Communities (FluidC) is based on the simple idea of fluids (i.e., communities) interacting in an environment (i.e., a non-complete graph), expanding and contracting. |
cpm (g_original[, initial_membership, …]) |
CPM is a model where the quality function to optimize is: |
chinesewhispers (g_original[, weighting, …]) |
Fuzzy graph clustering that (i) creates an intermediate representation of the input graph, which reflects the “ambiguity” of its nodes, and (ii) uses hard clustering to discover crisp clusters in such “disambiguated” intermediate graph. |
der (g_original[, walk_len, threshold, …]) |
DER is a Diffusion Entropy Reducer graph clustering algorithm. |
edmot (g_original[, component_count, cutoff]) |
The algorithm first creates the graph of higher order motifs. |
eigenvector (g_original) |
Newman’s leading eigenvector method for detecting community structure based on modularity. |
em (g_original, k) |
EM is based on a mixture model. |
gdmp2 (g_original[, min_threshold]) |
Gdmp2 is a method for identifying a set of dense subgraphs of a given sparse graph. |
girvan_newman (g_original, level) |
The Girvan–Newman algorithm detects communities by progressively removing edges from the original graph. |
greedy_modularity (g_original[, weight]) |
The CNM algorithm uses modularity to find community structures. |
infomap (g_original) |
Infomap is based on ideas of information theory. |
label_propagation (g_original) |
The Label Propagation algorithm (LPA) detects communities using network structure alone. |
leiden (g_original[, initial_membership, weights]) |
The Leiden algorithm is an improvement of the Louvain algorithm. |
louvain (g_original[, weight, resolution, …]) |
Louvain maximizes a modularity score for each community. |
markov_clustering (g_original[, expansion, …]) |
The Markov clustering algorithm (MCL) is based on simulation of (stochastic) flow in graphs. |
rber_pots (g_original[, initial_membership, …]) |
rber_pots is a model where the quality function to optimize is: |
rb_pots (g_original[, initial_membership, …]) |
Rb_pots is a model where the quality function to optimize is: |
scan (g_original, epsilon, mu) |
SCAN (Structural Clustering Algorithm for Networks) is an algorithm which detects clusters, hubs and outliers in networks. |
significance_communities (g_original[, …]) |
Significance_communities is a model where the quality function to optimize is: |
spinglass (g_original) |
Spinglass relies on an analogy between a very popular statistical mechanic model called Potts spin glass, and the community structure. |
surprise_communities (g_original[, …]) |
Surprise_communities is a model where the quality function to optimize is: |
walktrap (g_original) |
walktrap is an approach based on random walks. |
sbm_dl (g_original[, B_min, B_max, deg_corr]) |
Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. |
sbm_dl_nested (g_original[, B_min, B_max, …]) |
Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. |
Overlapping Communities¶
A clustering is said to be overlapping if any generic node can be assigned to more than one community.
Methods in this subclass return as result a NodeClustering
object instance.
angel (g_original, threshold[, …]) |
Angel is a node-centric bottom-up community discovery algorithm. |
big_clam (g_original[, dimensions, …]) |
BigClam is an overlapping community detection method that scales to large networks. |
conga (g_original, number_communities) |
CONGA (Cluster-Overlap Newman Girvan Algorithm) is an algorithm for discovering overlapping communities. |
congo (g_original, number_communities[, height]) |
CONGO (CONGA Optimized) is an optimization of the CONGA algorithm. |
danmf (g_original[, layers, pre_iterations, …]) |
The procedure uses telescopic non-negative matrix factorization in order to learn a cluster membership distribution over nodes. |
demon (g_original, epsilon[, min_com_size]) |
Demon is a node-centric bottom-up overlapping community discovery algorithm. |
ego_networks (g_original[, level]) |
Ego-networks returns overlapping communities centered at each node within a given radius. |
egonet_splitter (g_original[, resolution]) |
The method first creates the egonets of nodes. |
kclique (g_original, k) |
Find k-clique communities in graph using the percolation method. |
lais2 (g_original) |
LAIS2 is an overlapping community discovery algorithm based on the density function. |
lemon (g_original, seeds[, min_com_size, …]) |
Lemon is a large scale overlapping community detection method based on local expansion via minimum one norm. |
lfm (g_original, alpha) |
LFM is based on the local optimization of a fitness function. |
multicom (g_original, seed_node) |
MULTICOM is an algorithm for detecting multiple local communities, possibly overlapping, by expanding the initial seed set. |
nmnf (g_original[, dimensions, clusters, …]) |
The procedure uses joint non-negative matrix factorization with modularity-based regularization in order to learn a cluster membership distribution over nodes. |
nnsed (g_original[, dimensions, iterations, seed]) |
The procedure uses non-negative matrix factorization in order to learn an unnormalized cluster membership distribution over nodes. |
node_perception (g_original, threshold, …) |
Node perception is based on the idea of joining together small sets of nodes. |
overlapping_seed_set_expansion (g_original, seeds) |
OSSE is an overlapping community detection algorithm optimizing the conductance community score. The algorithm uses a seed set expansion approach; the key idea is to find good seeds and then expand these seed sets using the personalized PageRank clustering procedure. |
percomvc (g_original) |
The PercoMVC approach consists of two steps. |
slpa (g_original[, t, r]) |
SLPA is an overlapping community discovery algorithm that extends the LPA. |
wCommunity (g_original[, min_bel_degree, …]) |
Algorithm to identify overlapping communities in weighted graphs |
Fuzzy Communities¶
A clustering is said to be fuzzy if each node can belong (with a different degree of likelihood) to more than one community.
Methods in this subclass return as result a FuzzyNodeClustering
object instance.
frc_fgsn (g_original, theta, eps, r) |
Fuzzy-Rough Community Detection on Fuzzy Granular model of Social Network. |
Node Attribute¶
Methods in this subclass return as result an AttrNodeClustering
object instance.
eva (g_original, labels[, weight, …]) |
|
ilouvain (g_original, labels, id) |
|
Bipartite Graph Communities¶
Methods in this subclass return as result a BiNodeClustering
object instance.
bimlpa (g_original[, theta, lambd]) |
BiMLPA is designed to detect the many-to-many correspondence community in bipartite networks using multi-label propagation algorithm. |
Antichain Communities¶
Methods in this subclass are designed to extract communities from Directed Acyclic Graphs (DAG) and return as result a NodeClustering
object instance.
siblinarity_antichain (g_original[, …]) |
The algorithm extracts communities from a DAG that (i) respect its intrinsic order and (ii) are composed of similar nodes. |
Edge Clustering¶
Algorithms falling in this category generate communities composed of edges.
They return as result an EdgeClustering
object instance.
hierarchical_link_community (g_original) |
HLC (hierarchical link clustering) is a method to classify links into topologically related groups. |
Ensemble Methods¶
Methods to automate the execution of multiple instances of community detection algorithm(s).
Configuration Objects¶
Ranges can be specified to automate the execution of the same method while varying (part of) its inputs.
Parameter
allows one to specify ranges for numeric parameters, while BoolParameter
for boolean ones.
Parameter (name, start, end, step) |
|
BoolParameter (name, value) |
Multiple Instantiation¶
Two scenarios often arise when applying community discovery algorithms to a graph:
1. the need to compare the results obtained by a given algorithm while varying its parameters;
2. the need to compare multiple algorithms.
cdlib
allows one to do so by leveraging, respectively, grid_execution
and pool
.
grid_execution (graph, method, parameters) |
Instantiate the specified community discovery method performing a grid search on the parameter set. |
pool (graph, methods, configurations) |
Execute a pool of community discovery algorithms on the input graph. |
Optimal Configuration Search¶
In some scenarios it can be helpful to delegate to the library the selection of the method parameters that yield a partition optimizing a given quality function.
cdlib
allows one to do so using the methods grid_search
and random_search
.
Finally, pool_grid_filter
generalizes this approach, allowing one to obtain the optimal partitions from a pool of different algorithms.
grid_search (graph, method, parameters, …) |
Returns the optimal partition of the specified graph w.r.t. |
random_search (graph, method, parameters, …) |
Returns the optimal partition of the specified graph w.r.t. |
pool_grid_filter (graph, methods, …[, …]) |
Execute a pool of community discovery algorithms on the input graph. |
Evaluation¶
The evaluation of Community Discovery algorithms is not an easy task.
CDlib
implements two families of evaluation strategies:
- Internal evaluation through quality scores
- External evaluation through partitions comparison
Fitness Functions¶
Fitness functions allow one to summarize the characteristics of a computed set of communities. CDlib
implements the following quality scores:
avg_distance (graph, communities, **kwargs) |
Average distance. |
avg_embeddedness (graph, communities, **kwargs) |
Average embeddedness of nodes within the community. |
average_internal_degree (graph, community, …) |
The average internal degree of the community set. |
avg_transitivity (graph, communities, **kwargs) |
Average transitivity. |
conductance (graph, community, **kwargs) |
Fraction of total edge volume that points outside the community. |
cut_ratio (graph, community, **kwargs) |
Fraction of existing edges (out of all possible edges) leaving the community. |
edges_inside (graph, community, **kwargs) |
Number of edges internal to the community. |
expansion (graph, community, **kwargs) |
Number of edges per community node that point outside the cluster. |
fraction_over_median_degree (graph, …) |
Fraction of community nodes having internal degree higher than the median degree value. |
hub_dominance (graph, communities, **kwargs) |
Hub dominance. |
internal_edge_density (graph, community, **kwargs) |
The internal density of the community set. |
normalized_cut (graph, community, **kwargs) |
Normalized variant of the Cut-Ratio |
max_odf (graph, community, **kwargs) |
Maximum fraction of edges of a node of a community that point outside the community itself. |
avg_odf (graph, community, **kwargs) |
Average fraction of edges of a node of a community that point outside the community itself. |
flake_odf (graph, community, **kwargs) |
Fraction of nodes in S that have fewer edges pointing inside than to the outside of the community. |
scaled_density (graph, communities, **kwargs) |
Scaled density. |
significance (graph, communities, **kwargs) |
Significance estimates how likely a partition of dense communities appear in a random graph. |
size (graph, communities, **kwargs) |
Size is the number of nodes in the community |
surprise (graph, communities, **kwargs) |
Surprise is a statistical approach that proposes a quality metric assuming that edges between vertices emerge randomly according to a hypergeometric distribution. |
triangle_participation_ratio (graph, …) |
Fraction of community nodes that belong to a triad. |
purity (communities) |
Purity is the product of the frequencies of the most frequent labels carried by the nodes within the communities |
Among the fitness function a well-defined family of measures is the Modularity-based one:
erdos_renyi_modularity (graph, communities, …) |
Erdos-Renyi modularity is a variation of the Newman-Girvan one. |
link_modularity (graph, communities, **kwargs) |
Quality function designed for directed graphs with overlapping communities. |
modularity_density (graph, communities, **kwargs) |
The modularity density is one of several propositions envisioned to palliate the resolution limit issue of modularity-based measures. |
newman_girvan_modularity (graph, communities, …) |
Difference between the fraction of intra-community edges of a partition and the expected number of such edges under a null model. |
z_modularity (graph, communities, **kwargs) |
Z-modularity is another variant of the standard modularity proposed to avoid the resolution limit. |
Some measures will return an instance of FitnessResult
that gathers the min/max/mean/std values of the computed index.
FitnessResult (min, max, score, std) |
Partition Comparisons¶
It is often useful to compare different graph partitions to assess their resemblance (i.e., to perform ground truth testing).
CDlib
implements the following partition comparison scores:
adjusted_mutual_information (first_partition, …) |
Adjusted Mutual Information between two clusterings. |
adjusted_rand_index (first_partition, …) |
Rand index adjusted for chance. |
f1 (first_partition, second_partition) |
Compute the average F1 score of the optimal community matches among the partitions in input. |
nf1 (first_partition, second_partition) |
Compute the Normalized F1 score of the optimal community matches among the partitions in input. |
normalized_mutual_information (…) |
Normalized Mutual Information between two clusterings. |
omega (first_partition, second_partition) |
Index of resemblance for overlapping, complete coverage, network clusterings. |
overlapping_normalized_mutual_information_LFK (…) |
Overlapping Normalized Mutual Information between two clusterings. |
overlapping_normalized_mutual_information_MGH (…) |
Overlapping Normalized Mutual Information between two clusterings. |
variation_of_information (first_partition, …) |
Variation of Information among two nodes partitions. |
Some measures will return an instance of MatchingResult
that gathers the mean and standard deviation values of the computed index.
MatchingResult (score, std) |
Input-Output¶
Functions to save/load CDlib
communities to/from file.
CSV format¶
The easiest way to save the result of a community discovery algorithm is to organize it in a .csv file. The following methods allow reading/writing communities from/to CSV.
read_community_csv (path[, delimiter, nodetype]) |
Read community list from comma separated value (csv) file. |
write_community_csv (communities, path[, …]) |
Save community structure to comma separated value (csv) file. |
Note
The CSV format only allows saving/retrieving NodeClustering objects, losing most of the metadata present in the CD computation result - e.g., algorithm name, parameters, coverage…
JSON format¶
The JSON format allows storing/loading community discovery algorithm results in a more comprehensive way.
read_community_json (path) |
Read community list from JSON file. |
write_community_json (communities, path) |
Generate a JSON representation of the clustering object |
Note
The JSON format allows saving/retrieving all kinds of Clustering objects while maintaining all their metadata - except for the graph object instance.
Visual Analytics¶
At the end of the analytical process it is often useful to visualize the obtained results.
CDlib
provides a few built-in facilities to ease this task.
Network Visualization¶
Visualizing a graph is always a good idea (if its size is reasonable).
plot_network_clusters (graph, partition[, …]) |
Plot a graph with node color coding for communities. |
plot_community_graph (graph, partition[, …]) |
Plot a community-graph with node color coding for communities. |
Analytics plots¶
Community evaluation outputs can be easily used to generate a visual representation of the main partition characteristics.
plot_sim_matrix (clusterings, scoring) |
Plot a similarity matrix between a list of clusterings, using the provided scoring function. |
plot_com_stat (com_clusters, com_fitness) |
Plot the distribution of a property among all communities for a clustering, or a list of clusterings (violin-plots) |
plot_com_properties_relation (com_clusters, …) |
Plot the relation between two properties/fitness function of a clustering |
plot_scoring (graphs, ref_partitions, …[, …]) |
Plot the scores obtained by a list of methods on a list of graphs. |
Utilities¶
CDlib
exposes a few utilities to manipulate graph objects generated with igraph
and networkx
.
Graph Transformation¶
Transform igraph
to/from networkx
objects.
convert_graph_formats (graph, desired_format) |
Converts from/to networkx/igraph |
Identifier mapping¶
Remapping of graph nodes. To limit memory usage, it is often a good idea to use progressive integers as node labels.
CDlib
automatically - and transparently - makes the conversion for the user; however, this step can be costly. For this reason the library also exposes facilities to directly pre/post-process the network/community data.
nx_node_integer_mapping (graph) |
Maps node labels from strings to integers. |
remap_node_communities (communities, node_map) |
Apply a map to the obtained communities to retrieve the original node labels |
Developer Guide¶
Bibliography¶
CDlib
was developed for research purposes.
- Reference algorithms:
- Crisp Partition:
- Girvan-Newman: Girvan, Michelle, and Mark EJ Newman. Community structure in social and biological networks. Proceedings of the national academy of sciences 99.12 (2002): 7821-7826.
- EM: Newman, Mark EJ, and Elizabeth A. Leicht. Mixture models and exploratory analysis in networks. Proceedings of the National Academy of Sciences 104.23 (2007): 9564-9569.
- SCAN: Xu, X., Yuruk, N., Feng, Z., & Schweiger, T. A. (2007, August). Scan: a structural clustering algorithm for networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 824-833)
- GDMP2: Chen, Jie, and Yousef Saad. Dense subgraph extraction with application to community detection. IEEE Transactions on Knowledge and Data Engineering 24.7 (2012): 1216-1230.
- Spinglass: Reichardt, Jörg, and Stefan Bornholdt. Statistical mechanics of community detection. Physical Review E 74.1 (2006): 016110.
- Eigenvector: Newman, Mark EJ. Finding community structure in networks using the eigenvectors of matrices. Physical review E 74.3 (2006): 036104.
- AGDL: Zhang, W., Wang, X., Zhao, D., & Tang, X. (2012, October). Graph degree linkage: Agglomerative clustering on a directed graph. In European Conference on Computer Vision (pp. 428-441). Springer, Berlin, Heidelberg.
- Louvain: Blondel, Vincent D., et al. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment 2008.10 (2008): P10008.
- Leiden: Traag, Vincent, Ludo Waltman, and Nees Jan van Eck. From Louvain to Leiden: guaranteeing well-connected communities. arXiv preprint arXiv:1810.08473 (2018).
- Rb_pots:
- Reichardt, J., & Bornholdt, S. (2006). Statistical mechanics of community detection. Physical Review E, 74(1), 016110. 10.1103/PhysRevE.74.016110
- Leicht, E. A., & Newman, M. E. J. (2008). Community Structure in Directed Networks. Physical Review Letters, 100(11), 118703. 10.1103/PhysRevLett.100.118703
- Rber_pots: Reichardt, J., & Bornholdt, S. (2006). Statistical mechanics of community detection. Physical Review E, 74(1), 016110. 10.1103/PhysRevE.74.016110
- CPM: Traag, V. A., Van Dooren, P., & Nesterov, Y. (2011). Narrow scope for resolution-limit-free community detection. Physical Review E, 84(1), 016114. 10.1103/PhysRevE.84.016114
- Significance_communities: Traag, V. A., Krings, G., & Van Dooren, P. (2013). Significant scales in community structure. Scientific Reports, 3, 2930. 10.1038/srep02930
- Surprise_communities: Traag, V. A., Aldecoa, R., & Delvenne, J.-C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816. 10.1103/PhysRevE.92.022816
- Greedy_modularity: Clauset, A., Newman, M. E., & Moore, C. Finding community structure in very large networks. Physical Review E 70(6), 2004
- Infomap: Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad SciUSA 105(4):1118–1123
- Walktrap: Pons, Pascal, and Matthieu Latapy. Computing communities in large networks using random walks. J. Graph Algorithms Appl. 10.2 (2006): 191-218.
- Label_propagation: Raghavan, U. N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical review E, 76(3), 036106.
- Async_fluid: Ferran Parés, Dario Garcia-Gasulla, Armand Vilalta, Jonatan Moreno, Eduard Ayguadé, Jesús Labarta, Ulises Cortés, Toyotaro Suzumura T. Fluid Communities: A Competitive and Highly Scalable Community Detection Algorithm.
- DER: M. Kozdoba and S. Mannor, Community Detection via Measure Space Embedding, NIPS 2015
- FRC_FGSN: Kundu, S., & Pal, S. K. (2015). Fuzzy-rough community in social networks. Pattern Recognition Letters, 67, 145-152.
- SBM_dl: Tiago P. Peixoto, Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models, Phys. Rev. E 89, 012804 (2014), DOI: 10.1103/PhysRevE.89.012804, arXiv: 1310.4378.
- SBM_dl_nested: Tiago P. Peixoto, Hierarchical block structures and high-resolution model selection in large networks ,Physical Review X 4.1 (2014): 011047
- Edge clustering:
- hierarchical_link_community: Ahn, Yong-Yeol, James P. Bagrow, and Sune Lehmann. Link communities reveal multiscale complexity in networks. nature 466.7307 (2010): 761.
- Markov_clustering: Enright, Anton J., Stijn Van Dongen, and Christos A. Ouzounis. An efficient algorithm for large-scale detection of protein families. Nucleic acids research 30.7 (2002): 1575-1584.
- Overlapping partition:
- Demon:
- Coscia, M., Rossetti, G., Giannotti, F., & Pedreschi, D. (2012, August). Demon: a local-first discovery method for overlapping communities. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 615-623). ACM.
- Coscia, M., Rossetti, G., Giannotti, F., & Pedreschi, D. (2014). Uncovering hierarchical and overlapping communities with a local-first approach. ACM Transactions on Knowledge Discovery from Data (TKDD), 9(1), 6.
- Angel: Rossetti, G. (2019) Exorcising the Demon: Angel, Efficient Node-Centric Community Discovery. International Conference on Complex Networks and Their Applications. Springer, Cham.
- Node_perception: Sucheta Soundarajan and John E. Hopcroft. 2015. Use of Local Group Information to Identify Communities in Networks. ACM Trans. Knowl. Discov. Data 9, 3, Article 21 (April 2015), 27 pages. DOI=http://dx.doi.org/10.1145/2700404
- Overlapping_seed_set_expansion: Whang, J. J., Gleich, D. F., & Dhillon, I. S. (2013, October). Overlapping community detection using seed set expansion. In Proceedings of the 22nd ACM international conference on Conference on information & knowledge management (pp. 2099-2108). ACM.
- Kclique: Gergely Palla, Imre Derényi, Illés Farkas, and Tamás Vicsek, Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 814-818, 2005, doi:10.1038/nature03607
- LFM: Lancichinetti, Andrea, Santo Fortunato, and János Kertész. Detecting the overlapping and hierarchical community structure in complex networks New Journal of Physics 11.3 (2009): 033015.
- Lais2: Baumes, Jeffrey, Mark Goldberg, and Malik Magdon-Ismail. Efficient identification of overlapping communities. International Conference on Intelligence and Security Informatics. Springer, Berlin, Heidelberg, 2005.
- Congo: Gregory, Steve. A fast algorithm to find overlapping communities in networks. Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, Heidelberg, 2008.
- Conga: Gregory, Steve. An algorithm to find overlapping community structure in networks. European Conference on Principles of Data Mining and Knowledge Discovery. Springer, Berlin, Heidelberg, 2007.
- Lemon: Yixuan Li, Kun He, David Bindel, John Hopcroft Uncovering the small community structure in large networks: A local spectral approach. Proceedings of the 24th international conference on world wide web. International World Wide Web Conferences Steering Committee, 2015.
- SLPA: Xie, Jierui, Boleslaw K. Szymanski, and Xiaoming Liu. SLPA: Uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process. Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on. IEEE, 2011.
- Multicom: Hollocou, Alexandre, Thomas Bonald, and Marc Lelarge. Multiple Local Community Detection. ACM SIGMETRICS Performance Evaluation Review 45.2 (2018): 76-83.
- Big_clam: Yang, J., & Leskovec, J. (2013, February). Overlapping community detection at scale: a nonnegative matrix factorization approach. In Proceedings of the sixth ACM international conference on Web search and data mining (pp. 587-596). ACM.
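As an illustration of one method cited above, the k-clique percolation approach of Palla et al. (2005) builds communities as unions of adjacent k-cliques, so nodes may belong to several communities at once. A minimal sketch using NetworkX (the library CDlib builds upon; CDlib wraps the same method behind its standardized interface):

```python
# Sketch of k-clique percolation (Palla et al., 2005) via NetworkX.
import networkx as nx
from networkx.algorithms.community import k_clique_communities

g = nx.karate_club_graph()  # classic 34-node benchmark graph

# Communities are unions of adjacent k-cliques (here k = 3);
# the result is an overlapping cover, not a partition.
communities = [frozenset(c) for c in k_clique_communities(g, 3)]
for com in communities:
    print(sorted(com))
```

Every community produced this way contains at least k nodes, since each one includes at least one full k-clique.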
- Evaluation references:
- Comparison:
- Omega: Gabriel Murray, Giuseppe Carenini, and Raymond Ng. 2012. Using the omega index for evaluating abstractive community detection. In Proceedings of Workshop on Evaluation Metrics and System Comparison for Automatic Summarization. Association for Computational Linguistics, Stroudsburg, PA, USA, 10-18.
- f1: Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate community detection algorithms on ground truth. In Complex Networks VII (pp. 133-144). Springer, Cham.
- nf1:
- Rossetti, G., Pappalardo, L., & Rinzivillo, S. (2016). A novel approach to evaluate community detection algorithms on ground truth. In Complex Networks VII (pp. 133-144). Springer, Cham.
- Rossetti, G. (2017). RDyn: graph benchmark handling community dynamics. Journal of Complex Networks, 5(6), 893-912.
- Adjusted_rand_index: Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of classification, 2(1), 193-218.
- Adjusted_mutual_information: Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct), 2837-2854.
- Variation_of_information: Meila, M. (2007). Comparing clusterings - an information based distance. Journal of Multivariate Analysis, 98, 873-895. doi:10.1016/j.jmva.2006.11.013
- Overlapping_normalized_mutual_information_MGH: McDaid, A. F., Greene, D., & Hurley, N. (2011). Normalized mutual information to evaluate overlapping community finding algorithms. arXiv preprint arXiv:1110.2515.
- Overlapping_normalized_mutual_information_LFK: Lancichinetti, A., Fortunato, S., & Kertesz, J. (2009). Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3), 033015.
- Fitness:
- Newman_girvan_modularity: Newman, M.E.J. & Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004).
- Erdos_renyi_modularity: Erdos, P., & Renyi, A. (1959). On random graphs I. Publ. Math. Debrecen, 6, 290-297.
- Modularity_density: Li, Z., Zhang, S., Wang, R. S., Zhang, X. S., & Chen, L. (2008). Quantitative function for community detection. Physical Review E, 77(3), 036109.
- Z_modularity: Miyauchi, Atsushi, and Yasushi Kawase. Z-score-based modularity for community detection in networks. PloS one 11.1 (2016): e0147805.
- Surprise & Significance: Traag, V. A., Aldecoa, R., & Delvenne, J. C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2), 022816.
- average_internal_degree: Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., & Parisi, D. (2004). Defining and identifying communities in networks. Proceedings of the National Academy of Sciences, 101(9), 2658-2663.
- conductance: Shi, J., Malik, J.: Normalized cuts and image segmentation. Departmental Papers (CIS), 107 (2000)
- cut_ratio: Fortunato, S.: Community detection in graphs. Physics reports 486(3-5), 75–174 (2010)
- edges_inside: Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., & Parisi, D. (2004). Defining and identifying communities in networks. Proceedings of the National Academy of Sciences, 101(9), 2658-2663.
- expansion: Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., & Parisi, D. (2004). Defining and identifying communities in networks. Proceedings of the National Academy of Sciences, 101(9), 2658-2663.
- internal_edge_density: Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., & Parisi, D. (2004). Defining and identifying communities in networks. Proceedings of the National Academy of Sciences, 101(9), 2658-2663.
- normalized_cut: Shi, J., Malik, J.: Normalized cuts and image segmentation. Departmental Papers (CIS), 107 (2000)
- fraction_over_median_degree: Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems 42(1), 181–213 (2015)
- max_odf: Flake, G.W., Lawrence, S., Giles, C.L., et al.: Efficient identification of web communities. In: KDD, vol. 2000, pp. 150–160 (2000)
- avg_odf: Flake, G.W., Lawrence, S., Giles, C.L., et al.: Efficient identification of web communities. In: KDD, vol. 2000, pp. 150–160 (2000)
- flake_odf: Flake, G.W., Lawrence, S., Giles, C.L., et al.: Efficient identification of web communities. In: KDD, vol. 2000, pp. 150–160 (2000)
- triangle_participation_ratio: Yang, J and Leskovec, J.: Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems 42(1), 181–213 (2015)
- link_modularity: Nicosia, V., Mangioni, G., Carchiolo, V., Malgeri, M.: Extending the definition of modularity to directed graphs with overlapping communities. Journal of Statistical Mechanics: Theory and Experiment 2009(03), 03024 (2009)
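For instance, the Newman–Girvan modularity listed above scores a partition by comparing the fraction of intra-community edges against the expectation under a degree-preserving random model. A minimal sketch using NetworkX's implementation (CDlib exposes the same fitness score through its evaluation module; the partition algorithm below is used only to obtain a node partition to score):

```python
# Scoring a partition with Newman-Girvan modularity via NetworkX.
import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    modularity,
)

g = nx.karate_club_graph()

# Any non-overlapping partition of the nodes can be scored;
# greedy modularity maximization is one way to produce one.
partition = greedy_modularity_communities(g)

q = modularity(g, partition)  # higher Q = stronger community structure
print(round(q, 3))
```

On the karate club graph this yields a clearly positive Q, reflecting the graph's well-known two-faction structure.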
So far, CDlib has been used to support the following publications:
- Hubert, M. (2020). Crawling and Analysing Code Review Networks on Industry and Open Source Data. Master Thesis.
- Pister, A., Buono, P., Fekete, J. D., Plaisant, C., & Valdivia, P. (2020). Integrating Prior Knowledge in Mixed Initiative Social Network Clustering. arXiv preprint arXiv:2005.02972.
- Mohammadmosaferi, K. K., & Naderi, H. (2020). Evolution of communities in dynamic social networks: An efficient map-based approach. Expert Systems with Applications, 147, 113221.