cdlib.benchmark.XMark

XMark(n: int = 2000, gamma: float = 3, beta: float = 2, m_cat: tuple = ('auto', 'auto'), theta: float = 0.3, mu: float = 0.5, avg_k: int = 10, min_com: int = 20, type_attr: str = 'categorical') → [<class 'object'>, <class 'object'>]

Returns the XMark benchmark annotated graph and planted communities.

Parameters:
  • n – Number of nodes in the created graph.
  • gamma – Power law exponent for the degree distribution of the created graph. This value must be strictly greater than one.
  • beta – Power law exponent for the community size distribution in the created graph. This value must be strictly greater than one.
  • m_cat – If the attribute type is categorical, it is the number of values in the domain of the attribute.
  • m_cont – If the attribute type is continuous, it is the number of peaks in the distribution (at least a bimodal distribution, i.e., m_cont=2).
  • theta – If the attribute type is categorical, it specifies the percentage of noise within a cluster.
  • sigma – If the attribute type is continuous, it is the standard deviation.
  • mu – Fraction of inter-community edges incident to each node (the LFR mixing parameter). This value must be in the interval [0, 1].
  • avg_k – Desired average degree of nodes in the created graph. This value must be in the interval [0, n]. Exactly one of this and min_degree must be specified, otherwise a NetworkXError is raised.
  • min_com – Minimum size of communities in the graph. If not specified, this is set to min_degree.
  • type_attr – The attribute type. It can be “categorical” or “continuous”.
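
The constraints listed above can be checked before calling the generator. The following is a minimal sketch, not part of cdlib: the helper name check_xmark_params is hypothetical, and it only encodes the ranges stated in this parameter list.

```python
def check_xmark_params(n, gamma, beta, mu, avg_k, type_attr):
    """Return a list of problems found in the arguments (empty if none).

    Hypothetical helper: it mirrors the documented ranges only and does not
    reproduce any validation that cdlib or networkx may perform internally.
    """
    problems = []
    if gamma <= 1:
        problems.append("gamma must be strictly greater than 1")
    if beta <= 1:
        problems.append("beta must be strictly greater than 1")
    if not 0 <= mu <= 1:
        problems.append("mu must be in the interval [0, 1]")
    if not 0 <= avg_k <= n:
        problems.append("avg_k must be in the interval [0, n]")
    if type_attr not in ("categorical", "continuous"):
        problems.append("type_attr must be 'categorical' or 'continuous'")
    return problems


# The default values from the signature pass every documented check.
default_problems = check_xmark_params(
    n=2000, gamma=3, beta=2, mu=0.5, avg_k=10, type_attr="categorical"
)
print(default_problems)
```

Returning a list rather than raising on the first failure lets a caller report every out-of-range argument at once.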
Returns:

A networkx synthetic graph and the set of planted communities (a NodeClustering object).
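
For categorical attributes, one way to sanity-check a generated benchmark is to measure how many nodes carry a label that differs from the majority label of their community. The sketch below is an illustration on toy data, not cdlib code: the function label_noise is hypothetical, and reading "noise within a cluster" as "deviation from the community's majority label" is an assumption about what theta controls.

```python
from collections import Counter


def label_noise(communities, labels):
    """Fraction of nodes whose label differs from their community's majority label.

    communities: iterable of node lists (one list per community).
    labels: dict mapping each node to its categorical label.
    """
    mismatched = 0
    total = 0
    for members in communities:
        if not members:
            continue  # skip empty communities to avoid an empty Counter
        counts = Counter(labels[node] for node in members)
        majority_count = counts.most_common(1)[0][1]
        mismatched += len(members) - majority_count
        total += len(members)
    return mismatched / total if total else 0.0


# Toy data standing in for the generator's output (assumed shapes).
toy_communities = [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]
toy_labels = {0: "a", 1: "a", 2: "a", 3: "b",
              4: "b", 5: "b", 6: "b", 7: "b", 8: "a", 9: "c"}
noise = label_noise(toy_communities, toy_labels)
print(noise)
```

With real output, the community lists are available as coms.communities on the NodeClustering object; the node labels would have to be read from the graph's node attributes, whose names are not stated on this page.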

Example:
>>> from cdlib.benchmark import XMark
>>> N = 2000
>>> gamma = 3
>>> beta = 2
>>> m_cat = ["auto", "auto"]
>>> theta = 0.3
>>> mu = 0.5
>>> avg_k = 10
>>> min_com = 20
>>> g, coms = XMark(n=N, gamma=gamma, beta=beta, mu=mu,
...                 m_cat=m_cat,
...                 theta=theta,
...                 avg_k=avg_k, min_com=min_com,
...                 type_attr="categorical")
References:

Salvatore Citraro and Giulio Rossetti. “XMark: A Benchmark For Node-Attributed Community Discovery Algorithms”, 2021 (to appear).

Note

Reference implementation: https://github.com/dsalvaz/XMark