TY - JOUR

T1 - Robust subgraph counting with distribution-free random graph analysis

AU - Van Leeuwaarden, Johan S.H.

AU - Stegehuis, Clara

PY - 2021/10

Y1 - 2021/10

N2 - Subgraphs such as cliques, loops, and stars play a crucial role in real-world networks. Random graph models can provide estimates for how often certain subgraphs appear, which can be tested against real-world networks. These estimated subgraph counts, however, crucially depend on the assumed degree distribution. Fitting a degree distribution to network data is challenging, in particular, for scale-free networks with power-law degrees. Therefore, in this paper we develop robust subgraph counts that do not depend on the entire degree distribution but only on its mean and mean absolute deviation (MAD), summary statistics that are easy to obtain for most real-world networks. By solving an optimization problem, we provide tight (the sharpest possible) bounds for the subgraph counts, for all possible subgraphs, and for all networks with degree distributions that share the same mean and MAD. We identify the extremal random graph that attains these tight bounds as the graph with a specific three-point degree distribution. We leverage the bounds on the maximal subgraph counts to obtain robust scaling laws for how the number of subgraphs grows as a function of the network size. The scaling laws indicate that sparse power-law networks are not the most extreme networks in terms of subgraph counts but dense power-law networks are. The robust bounds are also shown to hold for several real-world data sets.

AB - Subgraphs such as cliques, loops, and stars play a crucial role in real-world networks. Random graph models can provide estimates for how often certain subgraphs appear, which can be tested against real-world networks. These estimated subgraph counts, however, crucially depend on the assumed degree distribution. Fitting a degree distribution to network data is challenging, in particular, for scale-free networks with power-law degrees. Therefore, in this paper we develop robust subgraph counts that do not depend on the entire degree distribution but only on its mean and mean absolute deviation (MAD), summary statistics that are easy to obtain for most real-world networks. By solving an optimization problem, we provide tight (the sharpest possible) bounds for the subgraph counts, for all possible subgraphs, and for all networks with degree distributions that share the same mean and MAD. We identify the extremal random graph that attains these tight bounds as the graph with a specific three-point degree distribution. We leverage the bounds on the maximal subgraph counts to obtain robust scaling laws for how the number of subgraphs grows as a function of the network size. The scaling laws indicate that sparse power-law networks are not the most extreme networks in terms of subgraph counts but dense power-law networks are. The robust bounds are also shown to hold for several real-world data sets.

U2 - 10.1103/PhysRevE.104.044313

DO - 10.1103/PhysRevE.104.044313

M3 - Article

SN - 2470-0045

VL - 104

JO - Physical Review E

JF - Physical Review E

IS - 4

M1 - 044313

ER -