***************************
Evaluation and Benchmarking
***************************

The evaluation of Community Discovery algorithms is not an easy task.
``cdlib`` implements two families of evaluation strategies:

- *Internal* evaluation through fitness scores;
- *External* evaluation through partition comparison.

Moreover, ``cdlib`` integrates both standard *synthetic network benchmarks* and *real networks with annotated ground truths*, making it possible to test identified communities against known ground truths.

Finally, ``cdlib`` also provides a way to *rank* the clustering results of different algorithms on a given input graph.

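As a rough picture of what ranking involves, the sketch below orders several clusterings of the same graph by their mean rank across a few fitness scores, assuming higher is better for each score. It is plain Python, not ``cdlib`` code: the clustering names and score values are hypothetical toy inputs, and mean-rank aggregation is only one simple strategy, not necessarily the one the library applies.

.. code-block:: python

    def mean_rank(scores):
        """Order methods by their mean rank across metrics (rank 1 = highest score).

        ``scores`` maps a method name to a dict of metric -> value; every method
        must report the same metrics. Tied scores are not averaged: they simply
        receive consecutive ranks.
        """
        methods = list(scores)
        metrics = list(scores[methods[0]])
        totals = dict.fromkeys(methods, 0)
        for metric in metrics:
            # Best score first, so a method's position in the list is its rank.
            ordered = sorted(methods, key=lambda m: scores[m][metric], reverse=True)
            for rank, method in enumerate(ordered, start=1):
                totals[method] += rank
        # List of (mean rank, method), best (lowest mean rank) first.
        return sorted((totals[m] / len(metrics), m) for m in methods)


    # Hypothetical toy scores for three clusterings of the same graph.
    toy_scores = {
        "clustering_a": {"modularity": 0.41, "density": 0.62},
        "clustering_b": {"modularity": 0.38, "density": 0.80},
        "clustering_c": {"modularity": 0.30, "density": 0.70},
    }
    print(mean_rank(toy_scores))
    # [(1.5, 'clustering_b'), (2.0, 'clustering_a'), (2.5, 'clustering_c')]

Here ``clustering_b`` comes first because it is ranked first on one score and second on the other, even though ``clustering_a`` has the highest modularity.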

.. note::
    The following lists reflect the community discovery (CD) evaluation methods available in the *GitHub main branch* of `cdlib`_.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Internal Evaluation: Fitness scores
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Fitness functions summarize the characteristics of a computed set of communities. ``cdlib`` implements the following quality scores:

.. automodule:: cdlib.evaluation

.. autosummary::
    :toctree: generated/

    avg_distance
    avg_embeddedness
    average_internal_degree
    avg_transitivity
    conductance
    cut_ratio
    edges_inside
    expansion
    fraction_over_median_degree
    hub_dominance
    internal_edge_density
    normalized_cut
    max_odf
    avg_odf
    flake_odf
    scaled_density
    significance
    size
    surprise
    triangle_participation_ratio
    purity

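To make two of these scores concrete, the sketch below recomputes them from scratch on a toy graph made of two triangles joined by a single edge, then summarizes a per-community score over a whole partition by its min, max, mean, and standard deviation. It assumes the common definitions internal edge density = ``m_S / (n_S (n_S - 1) / 2)`` and conductance = ``c_S / (2 m_S + c_S)``, where ``n_S``, ``m_S``, and ``c_S`` are the nodes, internal edges, and boundary edges of community ``S``. It is illustrative plain Python, not ``cdlib`` code, and the library's implementations may treat edge cases differently.

.. code-block:: python

    from statistics import mean, pstdev

    # Toy undirected graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
    EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]


    def edge_counts(edges, community):
        """Return (internal edges, boundary edges) of a node set."""
        nodes = set(community)
        inside = sum(1 for u, v in edges if u in nodes and v in nodes)
        boundary = sum(1 for u, v in edges if (u in nodes) != (v in nodes))
        return inside, boundary


    def internal_edge_density(edges, community):
        n = len(set(community))
        inside, _ = edge_counts(edges, community)
        # Assumed convention: a single-node community has density 0.
        return inside / (n * (n - 1) / 2) if n > 1 else 0.0


    def conductance(edges, community):
        inside, boundary = edge_counts(edges, community)
        volume = 2 * inside + boundary
        return boundary / volume if volume else 0.0


    def summarize(values):
        """Min, max, mean and population standard deviation of per-community scores."""
        return min(values), max(values), mean(values), pstdev(values)


    partition = [[0, 1, 2], [3, 4], [5]]
    per_community = [conductance(EDGES, c) for c in partition]
    print(per_community)             # conductance of each community
    print(summarize(per_community))  # (min, max, mean, std)

The triangle ``{0, 1, 2}`` is fully connected (density 1) and leaks a single edge (conductance 1/7), whereas the isolated node ``5`` has every edge on its boundary (conductance 1).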

Among the fitness functions, a well-defined family of measures is the modularity-based one:

.. autosummary::
    :toctree: generated/

    erdos_renyi_modularity
    link_modularity
    modularity_density
    modularity_overlap
    newman_girvan_modularity
    z_modularity

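As a reference point for this family, the sketch below computes the classic Newman-Girvan modularity ``Q = sum_c [L_c / m - (d_c / 2m)^2]``, where ``m`` is the number of edges, ``L_c`` the number of edges inside community ``c``, and ``d_c`` the total degree of its nodes. It is a from-scratch illustration assuming a simple undirected graph and a non-overlapping partition, not ``cdlib``'s implementation.

.. code-block:: python

    from collections import Counter

    # Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
    EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]


    def modularity(edges, partition):
        """Newman-Girvan modularity of a non-overlapping partition of a simple graph."""
        m = len(edges)
        degree = Counter()
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        q = 0.0
        for community in partition:
            nodes = set(community)
            inside = sum(1 for u, v in edges if u in nodes and v in nodes)  # L_c
            total_degree = sum(degree[n] for n in nodes)                    # d_c
            q += inside / m - (total_degree / (2 * m)) ** 2
        return q


    print(modularity(EDGES, [[0, 1, 2], [3, 4, 5]]))  # two triangles
    print(modularity(EDGES, [list(range(6))]))        # every node in one community

Splitting the graph into its two triangles scores 5/14 (about 0.357), while lumping every node into a single community scores exactly 0.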

Some measures return an instance of ``FitnessResult``, which collects the min, max, mean, and standard deviation of the computed index.

.. autosummary::
    :toctree: generated/

    FitnessResult

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
External Evaluation: Partition Comparisons
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is often useful to compare different graph partitions to assess their resemblance.
``cdlib`` implements the following partition comparison scores:

.. autosummary::
    :toctree: generated/

    adjusted_mutual_information
    mi
    rmi
    normalized_mutual_information
    overlapping_normalized_mutual_information_LFK
    overlapping_normalized_mutual_information_MGH
    variation_of_information
    rand_index
    adjusted_rand_index
    omega
    f1
    nf1
    southwood_index
    rogers_tanimoto_index
    sorensen_index
    dice_index
    czekanowski_index
    fowlkes_mallows_index
    jaccard_index
    sample_expected_sim
    overlap_quality
    geometric_accuracy
    classification_error
    ecs

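As an illustration of what these scores compute, the sketch below implements normalized mutual information for two crisp (non-overlapping) partitions, each encoded as one community label per node, using the arithmetic-mean normalization ``2 I(X; Y) / (H(X) + H(Y))``. It is a from-scratch version for intuition only: ``cdlib``'s ``normalized_mutual_information`` may normalize differently, and the overlapping variants listed above rely on a different formulation.

.. code-block:: python

    from collections import Counter
    from math import log


    def entropy(labels):
        """Shannon entropy (in nats) of a labelling."""
        n = len(labels)
        return -sum((c / n) * log(c / n) for c in Counter(labels).values())


    def mutual_information(a, b):
        """Mutual information (in nats) between two labellings of the same nodes."""
        n = len(a)
        count_a, count_b = Counter(a), Counter(b)
        joint = Counter(zip(a, b))
        return sum((c / n) * log(c * n / (count_a[x] * count_b[y]))
                   for (x, y), c in joint.items())


    def nmi(a, b):
        """NMI with arithmetic-mean normalization: 2 I(a; b) / (H(a) + H(b))."""
        h_a, h_b = entropy(a), entropy(b)
        if h_a + h_b == 0:
            return 1.0  # both labellings put every node in one cluster
        return 2 * mutual_information(a, b) / (h_a + h_b)


    # One community label per node: node i belongs to community a[i].
    truth = [0, 0, 0, 1, 1, 1]
    found = [1, 1, 1, 0, 0, 0]  # same split, different label names
    print(nmi(truth, found))                 # identical up to relabelling
    print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent splits

Renaming communities does not change the score (the first call gives 1.0 up to rounding), while two splits that carry no information about each other score 0.0.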

Some measures return an instance of ``MatchingResult``, which collects the mean and standard deviation of the computed index.

.. autosummary::
    :toctree: generated/

    MatchingResult

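To illustrate how a comparison can be reported as a mean and a standard deviation, the sketch below matches each detected community to its most similar ground-truth community by Jaccard similarity and summarizes the best-match values. The ``best_match_jaccard`` helper and the ``(score, std)`` pair it returns are hypothetical; they only mirror the shape described above and are not the definition of any specific ``cdlib`` measure.

.. code-block:: python

    from statistics import mean, pstdev


    def jaccard(a, b):
        """Jaccard similarity of two node sets."""
        a, b = set(a), set(b)
        return len(a & b) / len(a | b)


    def best_match_jaccard(detected, truth):
        """Mean and population std of each detected community's best Jaccard match."""
        best = [max(jaccard(c, t) for t in truth) for c in detected]
        return mean(best), pstdev(best)


    detected = [[0, 1, 2], [3, 4, 5]]
    truth = [[0, 1], [2, 3, 4, 5]]
    score, std = best_match_jaccard(detected, truth)
    print(score, std)

Here the two best matches are 2/3 and 3/4, giving a mean of 17/24 (about 0.708) and a standard deviation of 1/24 (about 0.042).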

^^^^^^^^^^^^^^^^^^^^
Synthetic Benchmarks
^^^^^^^^^^^^^^^^^^^^

External evaluation scores can be fruitfully used to compare alternative clusterings of the same network and to assess to what extent an identified node clustering matches a known *ground truth* partition.

To support this evaluation task, ``cdlib`` exposes a set of standard synthetic network generators that provide ground-truth annotations of topological communities.

In particular, ``cdlib`` makes benchmarks available for:

- *static* community discovery;
- *dynamic* community discovery;
- *feature-rich* (i.e., node-attributed) community discovery.

All details can be found on the dedicated page.

.. toctree::
   :maxdepth: 1

   benchmark.rst

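The idea behind such generators can be sketched with the planted partition model, one of the simplest benchmarks of this kind: nodes are split into equally sized groups, and each pair of nodes is linked with probability ``p_in`` if both nodes share a group and ``p_out`` otherwise, so the groups serve as the ground truth. The ``planted_partition`` function below is a hypothetical stdlib-only illustration, not one of ``cdlib``'s generators, whose actual names and parameters are documented on the dedicated benchmark page.

.. code-block:: python

    import random
    from itertools import combinations


    def planted_partition(groups, group_size, p_in, p_out, seed=None):
        """Return (edges, ground_truth) for a planted partition random graph."""
        rng = random.Random(seed)
        n = groups * group_size
        membership = [node // group_size for node in range(n)]  # group id of each node
        edges = []
        for u, v in combinations(range(n), 2):
            # Denser wiring inside a group than across groups when p_in > p_out.
            p = p_in if membership[u] == membership[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
        ground_truth = [list(range(g * group_size, (g + 1) * group_size)) for g in range(groups)]
        return edges, ground_truth


    edges, truth = planted_partition(groups=3, group_size=10, p_in=0.5, p_out=0.05, seed=42)
    print(len(edges), truth[0])

The generated edges can be fed to any CD algorithm, and the returned ``truth`` compared with the detected communities through the external scores listed above.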

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Networks With Annotated Communities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Although evaluating a topological partition against an annotated "semantic" one is not among the safest paths to follow [Peel17]_, ``cdlib`` natively integrates well-known medium-size network datasets with ground-truth communities.

Due to the non-negligible sizes of such datasets, we designed a simple API to gather them transparently from a dedicated remote repository.

All details on remote datasets can be found on the dedicated page.

.. toctree::
   :maxdepth: 1

   datasets.rst


.. _`cdlib`: https://github.com/GiulioRossetti/cdlib

.. [Peel17] Peel, Leto, Daniel B. Larremore, and Aaron Clauset. "The ground truth about metadata and community detection in networks." Science Advances 3.5 (2017): e1602548.