CS224W Machine Learning with Graphs: Scaling Up GNNs

Standard SGD cannot effectively train GNNs

$\ell(\theta) = \frac{1}{N}\sum_{i=0}^{N-1}\ell_i(\theta)$

In mini-batch SGD, we sample $M$ ($\ll N$) nodes independently:
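The plain mini-batch SGD step above can be sketched as follows. This is a minimal illustration, not the lecture's code: `per_node_grad` is a hypothetical callback standing in for an autograd engine, and for GNNs this naive step is exactly what breaks, since $\ell_i$ actually depends on node $i$'s neighbors, not on node $i$ alone.

```python
import random

def minibatch_sgd_step(node_ids, per_node_grad, theta, lr, M):
    """One mini-batch SGD step: sample M (<< N) nodes independently and
    average their per-node gradients. `per_node_grad(i, theta)` is a
    hypothetical callback returning d l_i / d theta for node i."""
    batch = random.sample(node_ids, M)  # M nodes without replacement
    grad = sum(per_node_grad(i, theta) for i in batch) / M
    return theta - lr * grad
```

For a GNN, computing `per_node_grad(i, theta)` would require node $i$'s multi-hop neighborhood, which motivates the sampling methods below.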

Naive full-batch implementation: Generate embeddings of all the nodes at the same time:

$H^{(k+1)} = \sigma(\tilde{A}H^{(k)}W_k^T + H^{(k)}B_k^T)$
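The full-batch layer update is a pair of dense matrix products. A minimal NumPy sketch, assuming $\sigma$ is ReLU (the lecture leaves the nonlinearity generic):

```python
import numpy as np

def gcn_layer(A_tilde, H, W, B):
    """One full-batch GNN layer: H <- sigma(A_tilde H W^T + H B^T).

    A_tilde: (N, N) normalized adjacency matrix
    H:       (N, d_in) node embeddings at layer k
    W, B:    (d_out, d_in) weight matrices for layer k
    Assumes sigma = ReLU for illustration."""
    return np.maximum(0.0, A_tilde @ H @ W.T + H @ B.T)
```

All $N$ rows of $H$ are updated at once, which is why every embedding (and $\tilde{A}$ itself) must sit in GPU memory simultaneously.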

Full-batch implementation is not feasible for large graphs. → We want to use a GPU for fast training, but GPU memory is extremely limited (10GB - 80GB); the entire graph and its features cannot be loaded onto the GPU.

Two methods perform message passing over small subgraphs in each mini-batch; only the subgraphs need to be loaded onto the GPU at a time.

One method simplifies a GNN into a feature-preprocessing operation (which can be performed efficiently even on a CPU).

Sampling: Scaling up GNNs

Recall: Computational Graph

Computing Node Embeddings

Stochastic Training of GNNs

Issue with Stochastic Training

Neighborhood Sampling

Key idea: Construct the computational graph by (randomly) sampling at most $H$ neighbors at each hop.

We can use the pruned computational graph to compute node embeddings more efficiently (a pruning operation).

Neighborhood Sampling Algorithm

Neighbor sampling for a $K$-layer GNN
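The $K$-hop construction above can be sketched recursively. This is an illustrative sketch with an assumed adjacency-list representation (`dict` of neighbor lists), not the lecture's implementation; it samples at most `H` neighbors at every hop, so the computational graph of a node has at most $H^K$ leaves instead of the full neighborhood.

```python
import random

def sample_computation_graph(adj, v, K, H):
    """Build a pruned computational graph (a tree rooted at v) for a
    K-layer GNN by sampling at most H neighbors at each hop.

    adj: dict mapping node -> list of neighbors.
    Returns {v: [subtree, ...]} nested dicts; leaves have no children."""
    if K == 0:
        return {v: []}  # input layer: raw features of v, no more hops
    sampled = random.sample(adj[v], min(H, len(adj[v])))
    return {v: [sample_computation_graph(adj, u, K - 1, H) for u in sampled]}
```

Because the sampling is random, each epoch sees a different pruned computational graph for the same node, which acts as a form of stochastic regularization.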

Remarks on Neighbor Sampling

Scaling Up GNNs

Issues with Neighbor Sampling

Recall: Full Batch GNN

Update for all $v \in V$:

$h_v^{(l)} = \mathrm{COMBINE}\left(h_v^{(l-1)}, \mathrm{AGGR}\left(\{h_u^{(l-1)}\}_{u \in N(v)}\right)\right)$
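The COMBINE/AGGR form above is generic; as an illustrative assumption (not the only choice), take AGGR to be the mean of neighbor embeddings and COMBINE to be a sum:

```python
import numpy as np

def gnn_update(h, adj):
    """One message-passing layer in COMBINE/AGGR form, assuming
    AGGR = mean over neighbors and COMBINE = sum:
        h_v <- h_v + mean({h_u : u in N(v)}).

    h:   dict node -> np.ndarray embedding
    adj: dict node -> list of neighbors"""
    return {v: h[v] + np.mean([h[u] for u in adj[v]], axis=0) for v in h}
```

Note that every node's update reuses its neighbors' embeddings, which is the computation-sharing insight that neighbor sampling throws away and subgraph sampling tries to recover.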

Insight from Full-batch GNN

Subgraph Sampling

Subgraph Sampling: Case Study

Exploiting Community Structure

Real-world graphs exhibit community structure.

Key insight: Sample a community as a subgraph. Each subgraph retains the essential local connectivity patterns of the original graph.
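Sampling a community as a subgraph means taking the induced subgraph on its node set: keep only edges whose endpoints are both inside the community. A minimal sketch over an assumed adjacency-list representation:

```python
def induced_subgraph(adj, nodes):
    """Induced subgraph on a sampled node set (e.g. one community):
    keep a node's edges only when the other endpoint is also in `nodes`.

    adj: dict node -> list of neighbors."""
    inside = set(nodes)
    return {v: [u for u in adj[v] if u in inside] for v in nodes}
```

Edges leaving the community are dropped, which is precisely the information loss that the advanced variant below tries to reduce.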

Cluster-GCN: Overview

Cluster-GCN: Pre-processing

Cluster-GCN: Mini-batch Training

$l_{sub}(\theta) = \frac{1}{|V_c|}\cdot \sum_{v\in V_c} l_v(\theta)$

Issues with Cluster-GCN

Advanced Cluster-GCN

Similar to vanilla Cluster-GCN, advanced Cluster-GCN also follows a two-step approach.
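The key difference in the advanced variant is the batch construction: instead of one cluster, sample several clusters and take the induced subgraph on their union, so between-cluster edges are kept. A sketch under the same assumed adjacency-list representation as before:

```python
import random

def sample_multi_cluster_batch(adj, clusters, q):
    """Advanced Cluster-GCN batch: sample q clusters and return the
    induced subgraph on their union, so edges between the sampled
    clusters are retained (unlike single-cluster batches).

    adj:      dict node -> list of neighbors
    clusters: list of node lists (pre-computed partition)"""
    chosen = random.sample(clusters, q)
    nodes = {v for cluster in chosen for v in cluster}
    return {v: [u for u in adj[v] if u in nodes] for v in nodes}
```

Aggregating several small clusters also makes the mini-batches less biased, since each batch covers a more diverse portion of the graph.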

Comparison of Time Complexity

Scaling up by Simplifying GNN Architecture

Roadmap of Simplifying GCN

Quick Overview of LightGCN

Differences from LightGCN

Simplified GCN: “SimplGCN”
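The core of simplified GCN is that removing the nonlinearities collapses a $K$-layer GCN into a single pre-computable matrix power, $X_{pre} = \tilde{A}^K X$; training then reduces to an ordinary linear model (e.g. logistic regression) on the pre-processed features. A NumPy sketch of the pre-processing step:

```python
import numpy as np

def simplified_gcn_features(A_tilde, X, K):
    """Simplified GCN pre-processing: with nonlinearities removed, a
    K-layer GCN reduces to X_pre = A_tilde^K X, computed once up front
    (feasible even on a CPU, since it is K sparse matrix products).

    A_tilde: (N, N) normalized adjacency matrix
    X:       (N, d) raw node features"""
    X_pre = X
    for _ in range(K):  # apply A_tilde K times instead of materializing A^K
        X_pre = A_tilde @ X_pre
    return X_pre
```

After this one-time pass, mini-batch training needs no graph at all: each node's pre-processed feature row is a self-contained training example, so standard SGD applies directly.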

Comparison with Other Methods

Potential Issue of Simplified GCN

Graph Homophily
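Homophily can be quantified as the fraction of edges whose endpoints share a label; simplified GCN relies on this ratio being high, since all it does is smooth features over neighborhoods. A minimal sketch of that edge-level measure (one common definition among several):

```python
def edge_homophily(edges, labels):
    """Edge homophily ratio: fraction of edges connecting two nodes
    with the same label. High values favor simplified GCN, whose
    pre-processing averages features over graph neighborhoods.

    edges:  iterable of (u, v) pairs
    labels: sequence or dict mapping node -> class label"""
    edges = list(edges)
    same_label = sum(labels[u] == labels[v] for u, v in edges)
    return same_label / len(edges)
```

On heterophilous graphs (low ratio), neighborhood averaging mixes features across classes, and simplified GCN is expected to underperform.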

When does Simplified GCN Work?