| Title: | Multivariate Inference of Network Moments by Subsampling |
|---|---|
| Description: | Implements node subsampling methods for multivariate inference on network moments (rescaled motif counts), including: uniform node subsampling to approximate the joint distribution of multiple network moments (Algorithm 1); externally sparsified moments for density-matched comparisons (Algorithm 2); and a two-sample test for unmatchable networks with unequal edge densities via a split-and-sparsify subsampling procedure (Algorithm 3). Built-in support for V-shape (2-star), triangle, and 3-star motifs, with a user-extensible interface for arbitrary additional motifs. Parallel execution is supported via 'doParallel' and 'foreach'. Based on Qi, Hua, Li and Zhou (2024) <doi:10.48550/arXiv.2409.01599>. |
| Authors: | Mingyu Qi [aut], Chen-Wei Hua [aut], Tianxi Li [aut, cre], Wen Zhou [aut] |
| Maintainer: | Tianxi Li <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.0 |
| Built: | 2026-05-22 08:05:19 UTC |
| Source: | https://github.com/cran/netsubsamp |
netsubsamp implements the three subsampling algorithms from Qi, Hua, Li and Zhou (2024) for multivariate inference on network moments (rescaled counts of graph motifs such as V-shapes, triangles, and stars) under a sparse graphon model.
Algorithm 1 – subsample_moments: Uniform node
subsampling.
Draws random node-induced subgraphs of size
from a single network and computes the vector of network
moments for each subsample. The resulting empirical distribution
consistently approximates the joint distribution of multiple network moments
for networks of size drawn from the same graphon. This provides
the foundation for any downstream inference task: confidence regions,
goodness-of-fit tests, or visualization of the joint moment distribution.
Algorithm 2 – sparsify_moments: Externally
sparsified moments.
Given two independently sampled subgraphs and and a
target density , uses to estimate the
required edge-removal probability and then randomly removes edges from
to produce a graph at the target density. Returns the
density-normalised moment vector
. This building block handles
the practical complication that the true graphon density is unknown.
Algorithm 3 – two_sample_test: Two-sample test for
unmatchable networks with unequal densities.
Tests for two unmatchable networks (size
) and (size ) that may have different edge
densities. The nodes of are split randomly into two halves;
Algorithm 2 is applied to compute an observed statistic. Node subsampling
from (again via Algorithm 2) generates a reference distribution.
A final multivariate test—either a Mahalanobis distance test or the
Cauchy combination test of Liu and Xie (2020)—yields a p-value. This
is the first subsampling-based procedure that is valid for unmatchable
networks with unequal densities.
Three motifs are built in: "v_shape" (2-star, 2 edges),
"triangle" (3 edges), and "star3" (3-star, 3 edges).
All three functions accept a motifs argument that can include
user-supplied specifications: a list with element fn, a
function(A) returning a named numeric vector of one or more
moment values, and element n_edges, an integer vector of the
corresponding edge counts used for density normalisation.
The subsampling loop in subsample_moments
and two_sample_test can be parallelised by setting
parallel = TRUE (uses doParallel and foreach).
Parallel mode requires the netsubsamp package to be installed on
all worker nodes.
Maintainer: Tianxi Li [email protected]
Authors:
Mingyu Qi
Chen-Wei Hua
Wen Zhou
Qi M, Hua C-W, Li T, Zhou W (2024). Multivariate Inference of Network Moments by Subsampling. Biometrika, 111(1), 1–18. arXiv:2409.01599.
Liu Y, Xie J (2020). Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. Journal of the American Statistical Association, 115, 393–402.
library(randnet) ## ---- Algorithm 1: approximate the joint moment distribution ---------- set.seed(1) G <- BlockModel.Gen(lambda = 15, n = 200, K = 3)$A res <- subsample_moments(G, b = 60, N_sub = 500) # Estimated density res$rho # First few rows of the N_sub x 3 moment matrix head(res$moments) ## ---- Algorithm 2: externally sparsified moments ---------------------- set.seed(2) G1 <- BlockModel.Gen(lambda = 15, n = 100, K = 3)$A G2 <- BlockModel.Gen(lambda = 15, n = 100, K = 3)$A sparsify_moments(G1, G2, rho_target = 0.08) ## ---- Algorithm 3: two-sample test under unequal densities ----------- set.seed(3) G <- BlockModel.Gen(lambda = 20, n = 400, K = 3)$A G_prime <- BlockModel.Gen(lambda = 15, n = 100, K = 3)$A # Choose rho_target as 0.7 * min(density(G), density(G_prime)) rho_G <- sum(G) / (400 * 399) rho_G_prime <- sum(G_prime) / (100 * 99) rho_target <- 0.7 * min(rho_G, rho_G_prime) res <- two_sample_test(G, G_prime, rho_target = rho_target, N_sub = 500, test = "mahalanobis") res$p_valuelibrary(randnet) ## ---- Algorithm 1: approximate the joint moment distribution ---------- set.seed(1) G <- BlockModel.Gen(lambda = 15, n = 200, K = 3)$A res <- subsample_moments(G, b = 60, N_sub = 500) # Estimated density res$rho # First few rows of the N_sub x 3 moment matrix head(res$moments) ## ---- Algorithm 2: externally sparsified moments ---------------------- set.seed(2) G1 <- BlockModel.Gen(lambda = 15, n = 100, K = 3)$A G2 <- BlockModel.Gen(lambda = 15, n = 100, K = 3)$A sparsify_moments(G1, G2, rho_target = 0.08) ## ---- Algorithm 3: two-sample test under unequal densities ----------- set.seed(3) G <- BlockModel.Gen(lambda = 20, n = 400, K = 3)$A G_prime <- BlockModel.Gen(lambda = 15, n = 100, K = 3)$A # Choose rho_target as 0.7 * min(density(G), density(G_prime)) rho_G <- sum(G) / (400 * 399) rho_G_prime <- sum(G_prime) / (100 * 99) rho_target <- 0.7 * min(rho_G, rho_G_prime) res <- two_sample_test(G, G_prime, rho_target = rho_target, N_sub = 500, test = "mahalanobis") res$p_value
Uses one adjacency matrix (G1) to estimate the sparsification
probability and a second independent adjacency matrix (G2) to
compute density-normalised network moments after random edge removal to a
common target density. The resulting statistic
is the building block for the
two-sample test in two_sample_test.
sparsify_moments( G1, G2, rho_target, motifs = list("v_shape", "triangle", "star3") )sparsify_moments( G1, G2, rho_target, motifs = list("v_shape", "triangle", "star3") )
G1 |
A square symmetric binary adjacency matrix (dense or sparse
|
G2 |
A square symmetric binary adjacency matrix from which moments are computed after edge removal. |
rho_target |
Numeric in |
motifs |
A list of motif specifications; see
|
A named numeric vector of density-normalised network moments, or
NA values if the graph after edge removal has zero density.
Estimate .
Randomly remove edges from : each edge is retained
independently with probability , yielding
.
Return for each motif .
Qi, Hua, Li and Zhou (2024). Multivariate Inference of Network Moments by Subsampling. Biometrika, 111(1), 1–18. doi:10.1093/biomet/asad056.
library(randnet) set.seed(2) G1 <- BlockModel.Gen(lambda = 15, n = 100, K = 3)$A G2 <- BlockModel.Gen(lambda = 15, n = 100, K = 3)$A sparsify_moments(G1, G2, rho_target = 0.08)library(randnet) set.seed(2) G1 <- BlockModel.Gen(lambda = 15, n = 100, K = 3)$A G2 <- BlockModel.Gen(lambda = 15, n = 100, K = 3)$A sparsify_moments(G1, G2, rho_target = 0.08)
Repeatedly draws random node-induced subgraphs of size b from
network G and computes network moments for each subsample. The
resulting empirical distribution approximates the joint distribution of
network moments for networks of size b drawn from the same
underlying graphon model, providing the foundation for downstream inference.
subsample_moments( G, b, N_sub, motifs = list("v_shape", "triangle", "star3"), parallel = FALSE, n_cores = NULL )subsample_moments( G, b, N_sub, motifs = list("v_shape", "triangle", "star3"), parallel = FALSE, n_cores = NULL )
G |
A square symmetric binary adjacency matrix (dense |
b |
Integer; the subsampling size. Must satisfy |
N_sub |
Integer; number of subsamples to draw. |
motifs |
A list of motif specifications. Each element is either:
Defaults to |
parallel |
Logical; if |
n_cores |
Integer or |
A list with:
rhoEstimated edge density of G.
momentsAn numeric matrix
of subsampled network moments. Rows for which the subsampled graph has
zero density contain NA.
motif_namesCharacter vector of moment names, in column
order of moments.
bThe subsampling size used.
Qi, Hua, Li and Zhou (2024). Multivariate Inference of Network Moments by Subsampling. Biometrika, 111(1), 1–18. doi:10.1093/biomet/asad056.
library(randnet) set.seed(1) G <- BlockModel.Gen(lambda = 15, n = 200, K = 3)$A res <- subsample_moments(G, b = 60, N_sub = 500) head(res$moments) res$rholibrary(randnet) set.seed(1) G <- BlockModel.Gen(lambda = 15, n = 200, K = 3)$A res <- subsample_moments(G, b = 60, N_sub = 500) head(res$moments) res$rho
Tests whether two unmatchable networks — differing in node set, size, and possibly edge density — arise from the same underlying graphon model. The procedure avoids the density-mismatch problem via a split-and-sparsify strategy: each network is randomly split into two halves, one used to estimate the sparsification probability and the other to compute density-normalised moments at a shared target density. Node subsampling from the larger network then generates a reference distribution against which the observed statistic is compared.
two_sample_test( G, G_prime, rho_target, N_sub = 2000L, motifs = list("v_shape", "triangle", "star3"), test = c("mahalanobis", "cauchy"), parallel = FALSE, n_cores = NULL )two_sample_test( G, G_prime, rho_target, N_sub = 2000L, motifs = list("v_shape", "triangle", "star3"), test = c("mahalanobis", "cauchy"), parallel = FALSE, n_cores = NULL )
G |
A square symmetric binary adjacency matrix (dense or sparse
|
G_prime |
A square symmetric binary adjacency matrix of size
|
rho_target |
Numeric in |
N_sub |
Integer; number of subsampling replicates. Default 2000. |
motifs |
A list of motif specifications; see
|
test |
Character; the multivariate test to apply. Either
|
parallel |
Logical; whether to parallelise the subsampling loop.
Default |
n_cores |
Integer or |
A list with:
p_valueThe p-value of the test.
test_statisticObserved test statistic: Mahalanobis
distance when test = "mahalanobis", or Cauchy
combination statistic when test = "cauchy".
observed_momentsNamed numeric vector; the observed
statistic.
null_moments matrix of
null statistics (including any NA rows from
degenerate subsamples).
null_statisticNumeric vector of per-replicate null
Mahalanobis distances (or marginal p-values for "cauchy").
marginal_p_values(Only for test = "cauchy".) Named
numeric vector of per-motif marginal p-values.
methodThe test method used.
motif_namesCharacter vector of moment names.
Randomly partition the nodes of G_prime into two equal
halves and ; compute the observed statistic
via Algorithm 2.
For : independently draw
two sets of nodes from and compute
via Algorithm 2.
Compare the observed statistic against the empirical null
distribution using the selected test.
rho_target
A recommended default is
,
with . Values close to 0 discard too many
edges and inflate variance; values too close to
leave little room for edge removal
and can break the density-matching property. A value of
(corresponding to
rho_target = 0.7 * min(rho_hat_G, rho_hat_G_prime)) works well
across a range of settings.
Computes a multivariate distance that fully exploits the joint distribution of all moments. The p-value is the proportion of null distances exceeding the observed distance. This test is more powerful when the signal lies in the joint rather than the marginal distribution of the moments.
Combines marginal p-values via the Cauchy combination test of Liu & Xie (2020). Controls type I error under arbitrary dependence but ignores cross-moment correlation.
Qi, Hua, Li and Zhou (2024). Multivariate Inference of Network Moments by Subsampling. Biometrika, 111(1), 1–18. doi:10.1093/biomet/asad056.
Liu, Y. and Xie, J. (2020). Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. Journal of the American Statistical Association, 115, 393–402. doi:10.1080/01621459.2018.1554485.
library(randnet) set.seed(3) G <- BlockModel.Gen(lambda = 20, n = 400, K = 3)$A G_prime <- BlockModel.Gen(lambda = 15, n = 100, K = 3)$A # Choose rho_target as 0.7 * min(density(G), density(G_prime)) rho_G <- sum(G) / (400 * 399) rho_G_prime <- sum(G_prime) / (100 * 99) rho_target <- 0.7 * min(rho_G, rho_G_prime) res <- two_sample_test(G, G_prime, rho_target = rho_target, N_sub = 500, test = "mahalanobis") res$p_valuelibrary(randnet) set.seed(3) G <- BlockModel.Gen(lambda = 20, n = 400, K = 3)$A G_prime <- BlockModel.Gen(lambda = 15, n = 100, K = 3)$A # Choose rho_target as 0.7 * min(density(G), density(G_prime)) rho_G <- sum(G) / (400 * 399) rho_G_prime <- sum(G_prime) / (100 * 99) rho_target <- 0.7 * min(rho_G, rho_G_prime) res <- two_sample_test(G, G_prime, rho_target = rho_target, N_sub = 500, test = "mahalanobis") res$p_value