CS224W-Machine Learning with Graph-Fast Neural Subgraph

Subgraphs and Motifs

Definition:

The best definition depends on the domain!

The preceding definitions define subgraphs when VVV' \subseteq V and EEE' \subseteq E, i.e. nodes and edges are taken from the original graph GG.

Graph Isomorphism

Graph isomorphism problem: Check whether two graphs are identical:

  • G1=(V1,E1)G_1 = (V_1,E_1) and G2=(V2,E2)G_2 = (V_2 ,E_2) are isomorphic if there exists a bijection f:V1V2f: V_1 \longrightarrow V_2 such that (u,v)E1(u,v) \in E_1 iff (f(u),f(v))E2(f(u),f(v)) \in E_2
    • ff is called the isomorphism

We do not know if graph isomorphism is NP-hard, nor is any polynomial algorithm found for solving graph isomorphism.

  • G2G_2 is subgraph-isomorphic to G1G_1 if some subgraph of G2G_2 is isomorphic to G1G_1
    • We also commonly say G1G_1 is a subgraph of G2G_2
    • We can use either the node-induced or edge-induced definition of subgraph
    • This problem is NP-hard
  • Case Example of Subgraphs
    • All non-isomorphic, connected, undirected graphs of size 4
    • All non-isomorphic, connected, directed graphs of size 3

Network Motifs

Motifs: Induced Subgraphs

the red region has 3 edges.

Why do we need motifs?

Subgraphs Frequency

Frequency of GQG_Q in GTG_T: number of unique subsets of nodes VTV_T of GTG_T for which the subgraph of GTG_T induced by the nodes VTV_T is isomorphic to GQG_Q

What if the dataset contains multiple graphs, and we want to compute frequency of subgraphs in the dataset?

Defining Motif Significance

Defining Random Graphs

Erdos-Renyi (ER) random graphs:

New Model: Configuration Model

Alternative for Spokes: Switching

Motif Significance Overview

Z-score for Statistical Significance

Significance Profile

Neural Subgraph Representations

Subgraph Matching

Overview of the Approach

We are going to work with node-anchored definitions:

We are going to work with node-anchored neighborhoods:

Use GNN to obtain representations of uu and vv

Predict if node uu’s neighborhood is isomorphic to node vsv's neighborhood:

Why Anchor?

Decomposing GTG_T into Neighborhoods

Idea: Order Embedding Space

Map graph AA to a point zAz_A into a high-dimensional (e.g. 64-dim) embedding space, such that zAz_A is non-negative in all dimensions.

Subgraph Order Embedding Space

Why Order Embedding Space?

Subgraph isomorphism relationship can be nicely encoded in order embedding space

Order Constraint

Order Constraint: Loss Function

Training Neural Subgraph Matching

Training Example Construction

Details:

Subgraph Predictions on New Graphs

Finding Frequent Subgraphs

Generally, finding the most frequent size-kk motifs requires solving two challenges:

Finding frequent subgraph patterns is computationally hard

Representation learning can tackle these challenges:

Representation learning can tackle these challenges:

Problem Setup: Frequent Motif Mining

SPMiner\textbf{SPMiner}

SPMiner: Key Idea

Motif Frequency Estimation

SPMiner Search Procedure

Results: Small Motifs