CS224W Machine Learning with Graphs: Heterogeneous Graphs

How do we handle graphs with multiple node or edge types (a.k.a. heterogeneous graphs)?

Goal: Learning with heterogeneous graphs

Motivation

Heterogeneous Graphs

A heterogeneous graph is defined as

$$G = (V, E, \tau, \phi)$$

where $\tau$ maps each node to its node type and $\phi$ maps each edge to its relation type.

There are other definitions of heterogeneous graphs as well; they all describe graphs with node and edge types.
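The definition $G = (V, E, \tau, \phi)$ can be sketched directly in code. Below is a minimal illustration with an assumed toy academic graph ("author"/"paper" nodes, "writes"/"cites" edges); grouping edges by relation type is the first step relational models such as RGCN need.

```python
from collections import defaultdict

# Toy heterogeneous graph G = (V, E, tau, phi); all names are illustrative.
V = ["a1", "a2", "p1", "p2"]                                 # nodes
E = [("a1", "p1"), ("a2", "p2"), ("p1", "p2")]               # edges
tau = {"a1": "author", "a2": "author",
       "p1": "paper", "p2": "paper"}                         # node-type map
phi = {("a1", "p1"): "writes", ("a2", "p2"): "writes",
       ("p1", "p2"): "cites"}                                # relation-type map

# Group edges by relation type, as relation-aware GNN layers require:
edges_by_relation = defaultdict(list)
for e in E:
    edges_by_relation[phi[e]].append(e)

print(dict(edges_by_relation))
```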

Many Graphs are Heterogeneous Graphs

Observation: We can also treat types of nodes and edges as features

When do we need a heterogeneous graph?

Heterogeneous graph is a more expressive graph representation!

But it also comes with costs

There are many ways to convert a heterogeneous graph to a standard graph (that is, a homogeneous graph).
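One such conversion follows the observation above that node/edge types can be treated as features: one-hot encode each node's type and append it to the feature matrix. A minimal sketch with assumed toy data:

```python
import numpy as np

# Fold node types into features: one-hot encode tau(v) and concatenate.
node_types = ["author", "author", "paper", "paper"]      # tau for 4 toy nodes
type_index = {"author": 0, "paper": 1}

X = np.random.rand(4, 3)                                 # original 3-dim features
T = np.zeros((4, len(type_index)))                       # one-hot type features
for i, t in enumerate(node_types):
    T[i, type_index[t]] = 1.0

X_homogeneous = np.concatenate([X, T], axis=1)           # now type-aware features
print(X_homogeneous.shape)
```

After this step a standard (homogeneous) GNN can be run on the graph, at the cost of losing per-type weight matrices.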

Classical GNN Layers: GCN

$$\textbf{h}_v^{(l)} = \sigma \left(\textbf{W}^{(l)} \sum_{u\in N(v)}\frac{\textbf{h}_u^{(l-1)}}{|N(v)|}\right)$$
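This layer can be written in a few lines of NumPy. A minimal sketch (ReLU stands in for $\sigma$; the adjacency matrix, features, and weights below are illustrative):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: h_v = ReLU(W @ mean of neighbor embeddings).

    A: (n, n) adjacency matrix, H: (n, d_in) embeddings, W: (d_out, d_in).
    """
    deg = A.sum(axis=1, keepdims=True)        # |N(v)| for each node
    H_agg = (A @ H) / np.maximum(deg, 1)      # mean over neighbors
    return np.maximum(H_agg @ W.T, 0)         # ReLU as the nonlinearity sigma

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H = np.eye(3)                                 # one-hot node features
W = np.ones((2, 3))
print(gcn_layer(A, H, W))
```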

Relational GCN

We will extend GCN to handle heterogeneous graphs with multiple edge/relation types.

We start with a directed graph with one relation.

  • How do we run GCN and update the representation of the target node A on this graph?

What if the graph has multiple relation types?

Relational GCN: Definition

$$\textbf{h}_v^{(l+1)} = \sigma \left( \sum_{r\in R}\sum_{u\in N_v^r} \frac{1}{c_{v,r}}\textbf{W}^{(l)}_r \textbf{h}_u^{(l)}+\textbf{W}_0^{(l)}\textbf{h}_v^{(l)}\right)$$

Normalized by the node degree of the relation: $c_{v,r} = |N_v^r|$
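The RGCN update above, with one weight matrix per relation plus a self-loop weight $\textbf{W}_0$, can be sketched in NumPy as follows (ReLU stands in for $\sigma$; the toy graph and all names are illustrative):

```python
import numpy as np

def rgcn_layer(adj_per_relation, H, W_r, W0):
    """h_v^(l+1) = ReLU( sum_r sum_{u in N_v^r} (1/c_{v,r}) W_r h_u + W0 h_v ).

    adj_per_relation: dict relation -> (n, n) adjacency
    W_r: dict relation -> (d_out, d_in); W0: (d_out, d_in) self-loop weight.
    """
    out = H @ W0.T                                       # self-loop term W0 h_v
    for r, A in adj_per_relation.items():
        c = np.maximum(A.sum(axis=1, keepdims=True), 1)  # c_{v,r} = |N_v^r|
        out += (A @ (H @ W_r[r].T)) / c                  # per-relation messages
    return np.maximum(out, 0)

# Toy graph: node 0 points to node 1 via relation "r1".
adj = {"r1": np.array([[0.0, 1, 0], [0, 0, 0], [0, 0, 0]])}
H = np.array([[1.0, 0], [0, 1], [1, 1]])
out = rgcn_layer(adj, H, {"r1": 2 * np.eye(2)}, np.eye(2))
print(out)
```

Note the scalability cost hinted at below: the number of parameters grows linearly with the number of relations, since each relation gets its own $\textbf{W}_r$.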

RGCN: Scalability

Example: Entity/Node Classification
Example: Link Prediction

…… See other notes for more details.

Benchmark for Heterogeneous Graphs

Summary of RGCN

Heterogeneous Graph Transformer

Graph Attention Networks(GAT)

$$\textbf{h}_v^{(l)} = \sigma \left(\sum_{u\in N(v)}\alpha_{vu}\textbf{W}^{(l)} \textbf{h}_u^{(l-1)}\right)$$

Not all of a node's neighbors are equally important.
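A single-head GAT layer can be sketched in NumPy: the attention weights $\alpha_{vu}$ are a softmax over $v$'s neighbors of $\text{LeakyReLU}(a^T[\textbf{W}\textbf{h}_v \,\Vert\, \textbf{W}\textbf{h}_u])$, following the standard GAT formulation. The attention vector `a`, slope 0.2, and toy graph below are illustrative.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(A, H, W, a):
    Z = H @ W.T                                            # W h_u for all u
    out = np.zeros_like(Z)
    for v in range(A.shape[0]):
        nbrs = np.nonzero(A[v])[0]
        pair = np.concatenate([np.tile(Z[v], (len(nbrs), 1)), Z[nbrs]], axis=1)
        e = leaky_relu(pair @ a)                           # unnormalized scores
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()                               # softmax over N(v)
        out[v] = alpha @ Z[nbrs]                           # weighted neighbor sum
    return np.maximum(out, 0)                              # sigma = ReLU

A = np.array([[0.0, 1, 1], [1, 0, 1], [1, 1, 0]])
H = np.array([[1.0, 0], [0, 1], [1, 1]])
out = gat_layer(A, H, np.eye(2), np.zeros(4))  # a = 0 -> uniform attention
print(out)
```

With `a = 0` every neighbor gets equal weight and the layer reduces to mean aggregation, which makes the role of learned attention easy to see.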

Heterogeneous Graph Transformer

Basics: Attention in Transformer

$$\text{Attention}(Q,K,V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

How do we obtain $Q, K, V$?
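In a Transformer, $Q$, $K$, $V$ come from three learned linear projections of the same input. A minimal NumPy sketch (the projection matrices are random here; in practice they are trained):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)       # row-wise softmax
    return w @ V

rng = np.random.default_rng(0)
X = rng.random((4, 8))                       # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = rng.random((3, 8, 8))           # learned in practice, random here
out = attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)
```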

Heterogeneous Mutual Attention

Recall: Applying GAT to a homogeneous graph

$$H^{(l)}[t] \leftarrow \underset{\forall s \in N(t),\, \forall e\in E(s,t)}{\text{Aggregate}}\left(\text{Attention}(s,t)\cdot \text{Message}(s)\right)$$

$$\begin{align*} \text{ATT-head}^i(s,e,t) &= \left(K^i(s)\,W^{ATT}_{\phi(e)}\,Q^{i}(t)^T\right) \\ K^i(s) &= \text{K-Linear}^i_{\tau(s)}\left(H^{(l-1)}[s]\right)\\ Q^i(t) &= \text{Q-Linear}^i_{\tau(t)}\left(H^{(l-1)}[t]\right) \end{align*}$$

A full HGT layer

$$\widetilde{H}^{(l)}[t] = \mathop{\oplus}_{\forall s \in N(t)}\left(\textbf{Attention}_{HGT}(s,e,t)\cdot \textbf{Message}_{HGT}(s,e,t)\right)$$

Similarly, HGT decomposes weights with node & edge types in the message computation

$$\textbf{Message}_{HGT}(s,e,t) = \mathop{\Vert}_{i \in [1,h]} \ \text{MSG-head}^i(s,e,t)$$
$$\text{MSG-head}^i(s,e,t) = \text{M-Linear}^i_{\tau(s)} \left(H^{(l-1)}[s]\right)W^{MSG}_{\phi(e)}$$
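The weight decomposition above can be sketched as code: each source node type $\tau(s)$ gets its own M-Linear projection, and each edge type $\phi(e)$ gets its own $W^{MSG}$. A single-head sketch with assumed toy types and shapes:

```python
import numpy as np

d = 4
# One projection per node type and one matrix per edge type (illustrative names).
M_linear = {"author": np.random.rand(d, d), "paper": np.random.rand(d, d)}
W_msg = {"writes": np.random.rand(d, d), "cites": np.random.rand(d, d)}

def msg_head(h_s, src_type, edge_type):
    """MSG-head(s, e, t) = M-Linear_{tau(s)}(H[s]) @ W^MSG_{phi(e)}."""
    return (h_s @ M_linear[src_type].T) @ W_msg[edge_type]

h_s = np.random.rand(d)
print(msg_head(h_s, "author", "writes").shape)
```

Because weights are decomposed by node type and edge type rather than by full relation triple, the parameter count grows with (#node types + #edge types) instead of their product.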

Design Space of Heterogeneous GNNs

Heterogeneous message computation

Heterogeneous Aggregation

Heterogeneous GNN Layers

Heterogeneous Graph Manipulation

Heterogeneous Prediction Heads

$$\hat{\textbf{y}}_v = \text{Head}_{\text{node},T(v)}(\textbf{h}_v^{(L)}) = \textbf{W}_{T(v)}^{(H)}\textbf{h}_v^{(L)}$$
$$\hat{\textbf{y}}_{uv} = \text{Head}_{\text{edge},r} (\textbf{h}_u^{(L)},\textbf{h}_v^{(L)}) = \text{Linear}_r \left(\text{Concat}(\textbf{h}_u^{(L)}, \textbf{h}_v^{(L)})\right)$$
$$\hat{\textbf{y}}_G = \text{AGG}\left(\text{Head}_{\text{graph},i}\left(\{\textbf{h}_v^{(L)}\in \mathbb{R}^d \mid T(v)=i\}\right)\right)$$
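The node-level and edge-level heads above can be sketched concretely: one weight matrix per node type, and one linear layer per relation applied to the concatenated endpoint embeddings. All names and shapes below are illustrative.

```python
import numpy as np

d, n_classes = 4, 3
# A separate head per node type (illustrative types "author"/"paper").
W_head = {"author": np.random.rand(n_classes, d),
          "paper": np.random.rand(n_classes, d)}

def node_head(h_v, node_type):
    """y_v = W_{T(v)} h_v  -- type-specific node-level prediction."""
    return W_head[node_type] @ h_v

def edge_head(h_u, h_v, W_r):
    """y_uv = Linear_r(Concat(h_u, h_v)) -- relation-specific edge head."""
    return W_r @ np.concatenate([h_u, h_v])

h_u, h_v = np.random.rand(d), np.random.rand(d)
print(node_head(h_u, "author").shape)
print(edge_head(h_u, h_v, np.random.rand(1, 2 * d)).shape)
```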