Title: | Hierarchical Community Detection by Recursive Partitioning |
---|---|
Description: | Hierarchical community detection on networks by a recursive spectral partitioning strategy, which is shown to be effective and efficient in Li, Lei, Bhattacharyya, Sarkar, Bickel, and Levina (2018) <arXiv:1810.01509>. The package also includes a data generating function for a binary tree stochastic block model, a special case of stochastic block model that admits hierarchy between communities. |
Authors: | Tianxi Li [aut, cre], Lihua Lei [aut], Sharmodeep Bhattacharyya [aut], Purna Sarkar [aut], Peter Bickel [aut], Elizeveta Levina [aut] |
Maintainer: | Tianxi Li <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0 |
Built: | 2024-12-19 03:52:34 UTC |
Source: | https://github.com/cran/HCD |
The package provides the implementation of the recursive partitioning strategy to clustering network nodes in a hierarchical way. It also includes the mechanism of generating networks from a binary tree stochastic block model.
Package: | HCD |
Type: | Package |
Version: | 1.0 |
Date: | 2024-01-28 |
License: | GPL (>= 2) |
Tianxi Li, Lihua Lei, Sharmodeep Bhattacharyya, Purnamrita Sarkar, Peter Bickel, and Elizaveta Levina.
Maintainer: Tianxi Li <[email protected]>
Li, T., Lei, L., Bhattacharyya, S., Van den Berge, K., Sarkar, P., Bickel, P.J. and Levina, E., 2022. Hierarchical community detection by recursive partitioning. Journal of the American Statistical Association, 117(538), pp.951-968.
Generates networks from binary tree stochastic block model, with provided sequence of connection probability along the tree
BTSBM(n, d, a.seq, lambda, alpha = NULL, N = 1)
BTSBM(n, d, a.seq, lambda, alpha = NULL, N = 1)
n |
number of nodes in the network |
d |
number of layers until leaves (excluding the root) |
a.seq |
the connection probability sequence along the tree, a_r, see details in the paper |
lambda |
average node degree, only used when alpha is not provided |
alpha |
the common scaling of the a_r sequence. So at the end, essentially the a_r sequence is a.seq*alpha |
N |
the number of networks to generate from the same model |
A list of objections of
A.list |
the generated network adjacency matrices |
B |
the connection probability matrix between K communities, where K = 2^d |
label |
the vector of community labels for n nodes |
P |
the connection probability matrix between the n nodes. It is the expectation of adjacency matrices, except on the diagonal |
comm.sim.mat |
the binary string similarity matrix between communities |
node.sim.mat |
the binary string similarity matrix between nodes |
Tianxi Li, Lihua Lei, Sharmodeep Bhattacharyya, Purnamrita Sarkar, Peter Bickel, and Elizaveta Levina.
Maintainer: Tianxi Li <[email protected]>
Li, T., Lei, L., Bhattacharyya, S., Van den Berge, K., Sarkar, P., Bickel, P.J. and Levina, E., 2022. Hierarchical community detection by recursive partitioning. Journal of the American Statistical Association, 117(538), pp.951-968.
dt <- BTSBM(n=1600,d=4,a.seq=0.2^seq(0,4),lambda=50) A <- dt$A.list[[1]]
dt <- BTSBM(n=1600,d=4,a.seq=0.2^seq(0,4),lambda=50) A <- dt$A.list[[1]]
Generates an adjacency matrix from a given probability matrix, according independent Bernoulli – the so-called inhomogeneous Erdos-Renyi model. It is used to generate new networks from a given model.
gen.A.from.P(P, undirected = TRUE)
gen.A.from.P(P, undirected = TRUE)
P |
connection probability between nodes |
undirected |
logic value. FALSE (default) if the network is undirected, so the adjacency matrix will be symmetric with only upper diagonal entries being generated as independent Bernoulli. |
An adjacency matrix
Tianxi Li, Lihua Lei, Sharmodeep Bhattacharyya, Purnamrita Sarkar, Peter Bickel, and Elizaveta Levina.
Maintainer: Tianxi Li <[email protected]>
Li, T., Lei, L., Bhattacharyya, S., Van den Berge, K., Sarkar, P., Bickel, P.J. and Levina, E., 2022. Hierarchical community detection by recursive partitioning. Journal of the American Statistical Association, 117(538), pp.951-968.
Hierarchical community by recursive spectral partitioning. It includes the splitting methods of spectral clustering and sign splitting, as well stopping rules for fixed stopping, non-backtracking matrix checking and edge cross-validation.
HCD(A, method = "SS", stopping = "NB", reg = FALSE, n.min = 25, D = NULL,notree=TRUE)
HCD(A, method = "SS", stopping = "NB", reg = FALSE, n.min = 25, D = NULL,notree=TRUE)
A |
adjacency matrix. Can be standard R matrix or dsCMatrix (or other type in package Matrix) |
method |
splitting method. "SS" (default) for sign splitting, "SC" for spectral clustering |
stopping |
stopping rule. "NB" (default) for non-backtracking matrix spectrum, "ECV" for edge cross-validation, "Fix"for fixed D layers of partitioning (needs D value) |
reg |
logic value on whether regularization is needed. By default it is FALSE.Set it to be TRUE will add reguarlization, which help the performance on sparse networks, but it will make the computation slower. |
n.min |
integer number. The algorithm will stop splitting if the current size is <= 2*n.min. |
D |
the number of layers to partition, if stopping=="Fix". |
notree |
logical value on whether the tree and the corresponding similarity will be computed. If TRUE (default), will not produce the data.tree object or the community similarity matrix. Only the cluster label and the tree path strings will be returned. This typically makes the runing faster. |
For stopping rules, ECV is nonparametric rank evaluation by cross-validation, a more generally applicable approach without assuming SBM or its variants. ECV is also applicable for weighted networks.So it is believed to be more robust than NB but less effective if the true model is close to BTSBM. However, the ECV is computationally much more intensive.
Notice that the algorithm does not reply on the assumption of the BTSBM. But the estimated probabiilty matrix from the output is based on the BTSBM.
A list of the following objects:
labels |
detected community labels of nodes |
ncl |
number of clusters from the algorithm |
cluster.tree |
a data.tree object for the binary tree between communities |
P |
estimated connection probability matrix between n nodes, according to BTSBM |
node.bin.sim.mat |
binary string similarity between nodes |
comm.bin.sim.mat |
binary string similarity between communities |
tree.path |
a list of strings to describe the path from root to each community along the tree |
Tianxi Li, Lihua Lei, Sharmodeep Bhattacharyya, Purnamrita Sarkar, Peter Bickel, and Elizaveta Levina.
Maintainer: Tianxi Li <[email protected]>
Li, T., Lei, L., Bhattacharyya, S., Van den Berge, K., Sarkar, P., Bickel, P.J. and Levina, E., 2022. Hierarchical community detection by recursive partitioning. Journal of the American Statistical Association, 117(538), pp.951-968.
dt <- BTSBM(n=1600,d=4,a.seq=0.2^seq(0,4),lambda=50) A <- dt$A.list[[1]] # you can try various versions of the algorithm as below: the Fix is fastest and ECV is slowest. system.time(HCD.result <- HCD(A,method="SC",stopping="Fix",D=4))
dt <- BTSBM(n=1600,d=4,a.seq=0.2^seq(0,4),lambda=50) A <- dt$A.list[[1]] # you can try various versions of the algorithm as below: the Fix is fastest and ECV is slowest. system.time(HCD.result <- HCD(A,method="SC",stopping="Fix",D=4))
Generate dendrogram of the HCD result.
HCDplot(hcd,mode="community",labels=NULL,main=NULL,label.cex=1)
HCDplot(hcd,mode="community",labels=NULL,main=NULL,label.cex=1)
hcd |
The result of an HCD call. |
mode |
plotting community hierarchy or node hierarchy. The default value is "community", indicating plotting hierarchy between communities. Alternatively, the plot is for all nodes, which is not recommended because usually there are too many of them. |
labels |
the labels of the each leaf of the tree. By default, the community/node index is used. The user can also specify another sequence of characters. |
main |
title of the plot. |
label.cex |
size of the leaf label in the plot. When plotting node hierarchy, typically there are too many nodes so the labels will seriously overlap. Use a smaller size (say, label.cex=0.3) may help. |
No return value, called for visualization.
Tianxi Li, Lihua Lei, Sharmodeep Bhattacharyya, Purnamrita Sarkar, Peter Bickel, and Elizaveta Levina.
Maintainer: Tianxi Li <[email protected]>
Li, T., Lei, L., Bhattacharyya, S., Van den Berge, K., Sarkar, P., Bickel, P.J. and Levina, E., 2022. Hierarchical community detection by recursive partitioning. Journal of the American Statistical Association, 117(538), pp.951-968.
dt <- BTSBM(n=80,d=4,a.seq=0.2^seq(0,4),lambda=20) A <- dt$A.list[[1]] system.time(HCD.result <- HCD(A,method="SC",stopping="Fix",D=4,notree=FALSE,n.min=5)) HCDplot(HCD.result,mode="community",main="Community Tree")
dt <- BTSBM(n=80,d=4,a.seq=0.2^seq(0,4),lambda=20) A <- dt$A.list[[1]] system.time(HCD.result <- HCD(A,method="SC",stopping="Fix",D=4,notree=FALSE,n.min=5)) HCDplot(HCD.result,mode="community",main="Community Tree")