Community Events and LifeCycle¶
Community events describe the changes in the community structure of a network over time. The community structure of a network can change due to the arrival or departure of nodes, the creation or dissolution of communities, or the merging or splitting of communities.
The cdlib library provides a set of tools to analyze the evolution of communities over time, including the detection of community events and the analysis of community life cycles.
The interface of the library is designed to be as simple as possible, allowing users to easily analyze the evolution of communities in their networks.
Check the LifeCycle class for more details:
Clustering with Explicit LifeCycle¶
Some dynamic community detection algorithms (e.g., Temporal trade-off ones) provide an explicit representation of the life cycle of communities. In this case it is not necessary to detect community events as post-processing, as the life cycle of communities is already available.
To analyze such pre-computed events apply the following snippet:
from cdlib import LifeCycle
from cdlib import algorithms
import dynetx as dn
dg = dn.DynGraph()
for x in range(10):
g = nx.erdos_renyi_graph(200, 0.05)
dg.add_interactions_from(list(g.edges()), t=x)
coms = algorithms.tiles(dg, 2)
lc = LifeCycle(coms)
lc.compute_events_from_explicit_matching()
Clustering without Explicit LifeCycle¶
In case the dynamic community detection algorithm does not provide an explicit representation of the life cycle of communities, the library provides a set of tools to detect community events and analyze the life cycle of communities. In particular, the library allows to identify events following four different strategies:
Facets events definition [Failla24]
Greene events definition [Greene2010]
Asur events definition [Asur2009]
Custom events definition
The first three strategies are based on the definition of community events proposed in the literature, while the last one allows users to define their own events.
To apply one of the first three strategies, use the following snippet:
from cdlib import LifeCycle
from networkx.generators.community import LFR_benchmark_graph
tc = TemporalClustering()
for t in range(0, 10):
g = LFR_benchmark_graph(
n=250,
tau1=3,
tau2=1.5,
mu=0.1,
average_degree=5,
min_community=20,
seed=10,
)
coms = algorithms.louvain(g) # here any CDlib algorithm can be applied
tc.add_clustering(coms, t)
events = LifeCycle(tc)
events.compute_events("facets") # or "greene" or "asur"
Note
Each strategy has its parameters that can be specified passing a dictionary to the compute_events method.
In particular, the facets strategy requires the specification of the min_branch_size parameter (default 1), while greene and asur require the specification of the threshold parameter (default 0.1).
To define custom events, use the following snippet:
from cdlib import LifeCycle
from networkx.generators.community import LFR_benchmark_graph
tc = TemporalClustering()
for t in range(0, 10):
g = LFR_benchmark_graph(
n=250,
tau1=3,
tau2=1.5,
mu=0.1,
average_degree=5,
min_community=20,
seed=10,
)
coms = algorithms.louvain(g) # here any CDlib algorithm can be applied
tc.add_clustering(coms, t)
events = LifeCycle(tc)
jaccard = lambda x, y: len(set(x) & set(y)) / len(set(x) | set(y))
events.compute_events_with_custom_matching(jaccard, threshold=0.3, two_sided=True)
In the above snippet, the jaccard function is used to define the similarity between two communities.
The threshold parameter is used to define the minimum similarity required to consider two communities one an evolution of the other.
Changing the similarity function and the threshold allows users to define their own matching strategies.
Analyzing Events and Flows¶
Once the community events have been detected, the library provides a set of tools to analyze them. Each event is characterized by a set of properties, such as the type of event, the communities involved, the nodes involved, and the time of occurrence.
Note
The library assigns a unique identifier to each community of the form t_c where t is the time of occurrence and c is the community identifier.
E.g., the community with identifier 2_3 is the community with identifier 3 at time 2.
Each tracking strategy defines a different set of events (e.g., creation, dissolution, merging, splitting).
However, cdlib generalize the concept of event breaking it down into four components. For each generic temporal community t_c it provide access to:
In flow: the set of nodes that have entered the community
t_cfrom clusters of timet-1;Out flow: the set of nodes that will leave the community
t_cat timet+1;From Events: the set of events that generate the community observed at
ttha involved clusters at timet-1;To Events: the set of events community
t_cstarts at timetthat will affect clusters at timet+1;
All these information can be summarized in a community temporal-dependency digraph called polytree.
Here an example of how to analyze community events and flows:
from cdlib import LifeCycle
from networkx.generators.community import LFR_benchmark_graph
tc = TemporalClustering()
for t in range(0, 10):
g = LFR_benchmark_graph(
n=250,
tau1=3,
tau2=1.5,
mu=0.1,
average_degree=5,
min_community=20,
seed=10,
)
coms = algorithms.louvain(g) # here any CDlib algorithm can be applied
tc.add_clustering(coms, t)
events = LifeCycle(tc)
events.compute_events("facets") # or "greene" or "asur"
event_types = events.get_event_types() # provide the list of available events for the detected method (in this case for 'facets')
ev = events.get_event("1_2") # to compute events for all communities use the get_events() method
print(ev.out_flow) # to get the out flow of the community 1_2
print(ev.in_flow) # to get the in flow of the community 1_2
print(ev.from_event) # to get the from events of the community 1_2
print(ev.to_event) # to get the to events of the community 1_2
out_flow = events.analyze_flow("1_2", "+") # if the community id is not specified all the communities are considered
in_flow = events.analyze_flow("1_2", "-")
Each event is characterized by its degree of importance for the actual status of the community.
In particular, facets events are fuzzy events (more than one can occur at the same time) while greene and asur events are crisp events (only one can occur at the same time).
Note
Following the facets terminology, analyze_flow and analyze_flows returns a dictionary describing the flow in terms of its Unicity, Identity and Outflow.
For a detailed description of such measures refer to [Failla24]
In addition, if the temporal network comes with attributes associated to the nodes (either dynamically changing or not - i.e., political leanings), the library provides a set of tools to analyze the typicality of the events.
Setting and retreiving node attributes is straightforward:
from cdlib import LifeCycle
from networkx.generators.community import LFR_benchmark_graph
def random_leaning():
attrs = {}
for i in range(250): # 250 nodes
attrs[i] = {}
for t in range(10): # 10 time steps
attrs[i][t] = random.choice(["left", "right"])
return attrs
tc = TemporalClustering()
for t in range(0, 10):
g = LFR_benchmark_graph(
n=250,
tau1=3,
tau2=1.5,
mu=0.1,
average_degree=5,
min_community=20,
seed=10,
)
coms = algorithms.louvain(g) # here any CDlib algorithm can be applied
tc.add_clustering(coms, t)
events = LifeCycle(tc)
events.compute_events("facets") # or "greene" or "asur"
events.set_attribute(random_leaning(), "political_leaning")
attrs = events.get_attribute("political_leaning")
events.analyze_flow("1_1", "+", attr="political_leaning") # to analyze the flow of political leaning in the community 1_1
Attributes are stored as a dictionary of dictionaries where the first key is the node id and the second key is the time step.
If such information is available, the analyze_flow method will integrate in its analysis an evaluation of flow-attribute entropy.
Visualizing Events and Flows¶
The library provides a set of tools to visualize the events and flows detected in the community structure of a network.
Note
The library uses the networkx library to represent the community structure of a network and the matplotlib / plotly library to visualize it.
Here an example of how to visualize community events, flows and polytree:
from cdlib import LifeCycle
from cdlib.viz import (
plot_flow,
plot_event_radar,
plot_event_radars,
typicality_distribution,
)
from networkx.generators.community import LFR_benchmark_graph
tc = TemporalClustering()
for t in range(0, 10):
g = LFR_benchmark_graph(
n=250,
tau1=3,
tau2=1.5,
mu=0.1,
average_degree=5,
min_community=20,
seed=10,
)
coms = algorithms.louvain(g) # here any CDlib algorithm can be applied
tc.add_clustering(coms, t)
events = LifeCycle(tc)
events.compute_events("facets") # or "greene" or "asur"
fig = plot_flow(events)
fig.show()
fig = plot_event_radar(events, "1_2", direction="+") # only out events
fig.show()
fig = plot_event_radars(events, "1_2") # both in and out events
fig.show()
fig = typicality_distribution(events, "+")
fig.show()
dg = events.polytree()
fig = nx.draw_networkx(dg, with_labels=True)
fig.show()
For a detailed description of the available methods and parameters, check the Visual Analytics section of the cdlib reference guide.
Validating Flows¶
The library provides a set of tools to statistically validate the observed flows against null models.
Here an example of how to validate the observed flows:
from cdlib import LifeCycle
from cdlib.lifecycles.validation import validate_flow, validate_all_flows
from networkx.generators.community import LFR_benchmark_graph
tc = TemporalClustering()
for t in range(0, 10):
g = LFR_benchmark_graph(
n=250,
tau1=3,
tau2=1.5,
mu=0.1,
average_degree=5,
min_community=20,
seed=10,
)
coms = algorithms.louvain(g) # here any CDlib algorithm can be applied
tc.add_clustering(coms, t)
events = LifeCycle(tc)
events.compute_events("facets") # or "greene" or "asur"
cf = events.flow_null("1_2", "+", iterations=1000) # validate the out flow of community 1_2. Iterations define the number of randomizations to perform.
vf = events.all_flows_null("+", iterations=1000) # validate all out flows
Both validation methods return a dictionary keyed by set identifier and valued by mean, std, and p-value of the observed flow against the null model.
|
Compare the flow with a null model. |
|
Compare all flows with null models. |
Andrea Failla, Rémy Cazabet, Giulio Rossetti, Salvatore Citraro . “Redefining Event Types and Group Evolution in Temporal Data.”, arXiv preprint arXiv:2403.06771. 2024
Sitaram Asur, Parthasarathy Srinivasan, Ucar Duygu. “An event-based framework for characterizing the evolutionary behavior of interaction graphs.” ACM Transactions on Knowledge Discovery from Data (TKDD) 3.4 (2009): 1-36.
Derek Greene, Doyle Donal, Cunningham, Padraig. “Tracking the evolution of communities in dynamic social networks.” 2010 international conference on advances in social networks analysis and mining. IEEE, 2010.