Community detection in a network in a nutshell
The network is a branch of the complex system. Let us have a look at the basics of a network first.
- What is a network?
A network is made of node and link, or vertex and edge, e.g., social network, internet, power grid. it can be classified based on if the link has direction or weight
2. Network representation
an adjacency matrix is a simple way to represent a network, and most adjacency matrix of real-world example is the sparse matrix:
Aij = 1 if there is a link pointing from node j to node i
Aij = 0 if nodes i and j are not connected
3. The properties of a network
a) degree distribution. see the following example, where Pk is the ratio of the number of nodes with k links over the total node number
Fig1, from Network Science by Albert-László Barabási
another example, where c is a log-log plot
Fig2, from Network Science by Albert-László Barabási
b) clustering coefficient, a variable on a node and its neighborhood, defined like this:
Where Li is the total links between all the neighbors of node I, Ki is the degree of node i.
C of a graphic the average of Ci, see the following examples:
Fig3, from Network Science by Albert-László Barabási
Besides these two properties, there is something else, like the ‘hub,’ ‘giant component,’ and so on.
3. Random network and real word network
the random network was proposed by Pál Erdős (random network also be called Pál Erdős network). The model says a random network G (N, L) has N node and L link, and nodes are connected with a probability P.
Easy to prove, Pk of a random network follows a binomial distribution.
when N is huge, it becomes Poisson distribution
However, as you can see from the table, a real-world network does not follow a Poisson distribution.
Table1, From Network Science by Albert-László Barabási
Instead, most networks have a power law.
4. What else can we get from a network?
Community detection!
Some nodes have more connections in a network, and just like a sub-group, we usually say it is a community. A natural question is how to find the community from a network, precisely what community detection does.
2003, MJ Newman published a paper, Fast algorithm for detecting community structure in networks, which used modularity to partition/cut a network
modularity is defined as follows, quoted from the article.
with modularity Q defined, we can have a greedy procedure like the following
Initially, every single node is a community. For every step, we combine a pair of communities that maximize the Q, and finally, we look back, finding out the community structure with a max Q
Newman’s greedy algorithm is commonly used in network libs like networkx and igraph.
5. Can we do better?
Yes, stochastic block model!
Stochastic blockmodels and community structure in networks, PHYSICAL REVIEW E 83, 016107 (2011)
A more modern method for community detection.
It has a likelihood function.
We need to know how many communities there are first, unfortunately. Let’s assume n community. mrs is the link number between community r and s,kr and ks are the degrees of r and s,g is community assignment, G is the graph. We randomly assign nodes into n communities and then swap nodes to maximize L (G | g).