CS224W - Machine Learning with Graphs: GNNs for Recommender Systems

Recommender Systems: Task and Evaluation

A recommender system can be naturally modeled as a bipartite graph: users and items form the two node sets, and observed interactions form the edges.
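As a concrete picture of this bipartite structure, here is a minimal sketch (toy data, names are illustrative) that stores user-item interactions as an interaction matrix:

```python
import numpy as np

# Observed user-item interactions (user_id, item_id); toy data for illustration.
interactions = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 3)]

num_users, num_items = 3, 4

# Bipartite interaction matrix: rows are users, columns are items,
# R[u, i] = 1 if user u interacted with item i.
R = np.zeros((num_users, num_items), dtype=np.float32)
for u, i in interactions:
    R[u, i] = 1.0

print(R)
```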

Recommendation Task

Modern Recommender System

Top-$K$ Recommendation

For each user, we recommend $K$ items.

The goal is to include as many positive items as possible in the top-$K$ recommended items.

Evaluation metric: Recall@$K$ (defined next)
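Recall@$K$ for a user is the fraction of that user's held-out positive items that appear among the $K$ recommended items, averaged over all users. A small sketch of the per-user metric (function name and toy data are mine):

```python
import numpy as np

def recall_at_k(scores, positives, k):
    """Recall@K for a single user.

    scores: 1-D array of predicted scores over all items.
    positives: set of held-out positive item ids for this user.
    """
    if not positives:
        return 0.0
    top_k = np.argsort(-scores)[:k]                 # indices of the K highest-scoring items
    hits = len(set(top_k.tolist()) & positives)     # positives recovered in the top-K
    return hits / len(positives)

# Toy example: 6 items, the user's positives are items 1 and 4.
scores = np.array([0.1, 0.9, 0.3, 0.2, 0.8, 0.05])
print(recall_at_k(scores, {1, 4}, k=2))             # both positives in the top-2 -> 1.0
```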

Recommender Systems: Embedding-Based Models

Notation:

Score Function

To get the top-$K$ items, we need a score function for user-item interactions:

Embedding-Based Models

We consider embedding-based models for scoring user-item interactions.
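A minimal sketch of a shallow embedding-based scorer, assuming one learnable vector per user and per item and an inner-product score (consistent with the score function used later in these notes):

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim = 8

# Shallow encoder: one free embedding vector per user and per item (assumed setup).
user_emb = rng.normal(size=(3, emb_dim)).astype(np.float32)   # 3 users
item_emb = rng.normal(size=(4, emb_dim)).astype(np.float32)   # 4 items

def score(u, i):
    """Inner-product score for a user-item pair."""
    return float(user_emb[u] @ item_emb[i])

# Top-K recommendation for user 0: rank all items by score.
K = 2
all_scores = user_emb[0] @ item_emb.T
top_k_items = np.argsort(-all_scores)[:K]
print(top_k_items, all_scores[top_k_items])
```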

Training Objective

Embedding-based models have three kinds of parameters:

Training objective: Optimize the model parameters to achieve high Recall@$K$ on seen (i.e., training) user-item interactions.

Surrogate Loss Functions

The original training objective (Recall@$K$) is not differentiable.

Two surrogate loss functions are widely used to enable efficient gradient-based optimization.

Surrogate losses are differentiable and should align well with the original training objective.

Binary Loss
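A hedged PyTorch sketch of the binary-loss idea: treat observed interactions as positive examples, sampled non-interactions as negatives, and apply sigmoid cross-entropy to the scores (variable names are mine, not from the lecture):

```python
import torch
import torch.nn.functional as F

def binary_loss(pos_scores, neg_scores):
    """Binary cross-entropy over positive and (sampled) negative user-item scores."""
    labels = torch.cat([torch.ones_like(pos_scores), torch.zeros_like(neg_scores)])
    scores = torch.cat([pos_scores, neg_scores])
    return F.binary_cross_entropy_with_logits(scores, labels)

pos = torch.tensor([2.0, 1.5])    # scores of observed user-item pairs
neg = torch.tensor([0.3, -0.7])   # scores of sampled negative pairs
print(binary_loss(pos, neg))
```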

Desirable Surrogate Loss

Loss Function: BPR Loss
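BPR (Bayesian Personalized Ranking) is a pairwise loss: for each user, it pushes the score of a positive item above the score of a sampled negative item. A minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores, neg_scores):
    """BPR loss: -log sigmoid(score_positive - score_negative), averaged over pairs."""
    return -F.logsigmoid(pos_scores - neg_scores).mean()

pos = torch.tensor([2.0, 1.5], requires_grad=True)   # positive-pair scores
neg = torch.tensor([0.3, 1.8])                       # sampled negative-pair scores
loss = bpr_loss(pos, neg)
loss.backward()
print(loss.item())
```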

Why Do Embedding Models Work?

Neural Graph Collaborative Filtering

Conventional Collaborative Filtering

Limitations of Shallow Encoders

NGCF

NGCF: Overview

NGCF Framework

Initial Node Embeddings

Neighbor Aggregation

h_v^{(k+1)} = \text{COMBINE}(h_v^{(k)}, \text{AGGR}(\{h_u^{(k)}\}_{u \in N(v)})) \\ h_u^{(k+1)} = \text{COMBINE}(h_u^{(k)}, \text{AGGR}(\{h_v^{(k)}\}_{v \in N(u)}))

High-order graph structure is captured through iterative neighbor aggregation.

Different architecture choices are possible for AGGR and COMBINE; a simplified example layer is sketched below.
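A simplified PyTorch sketch of one such layer, using mean aggregation for AGGR and a sum of linear transforms for COMBINE; this illustrates the generic update above, not the exact NGCF layer from the paper:

```python
import torch
import torch.nn as nn

class SimpleAggrLayer(nn.Module):
    """One round of COMBINE(h_v, AGGR({h_u})) with mean aggregation."""
    def __init__(self, dim):
        super().__init__()
        self.lin_self = nn.Linear(dim, dim)
        self.lin_neigh = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # adj: dense 0/1 adjacency of the bipartite graph (users and items as one node set).
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh_mean = (adj @ h) / deg                         # AGGR: mean of neighbor embeddings
        return torch.relu(self.lin_self(h) + self.lin_neigh(neigh_mean))  # COMBINE

# Toy graph: 3 users + 4 items = 7 nodes, 16-dim embeddings.
h = torch.randn(7, 16)
adj = torch.zeros(7, 7)
adj[0, 3] = adj[3, 0] = 1.0     # user 0 -- item 0 (items are nodes 3..6)
layer = SimpleAggrLayer(16)
print(layer(h, adj).shape)
```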

Final Embeddings and Score Function

u \leftarrow h_u^{(K)}, \quad v \leftarrow h_v^{(K)} \\ \text{score}(u, v) = u^T v

LightGCN

Adjacency and Embedding Matrices

Matrix Formulation of GCN

\tilde{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}} \\ E^{(k+1)} = \text{ReLU}(\tilde{A} E^{(k)} W^{(k)})

Note: Different from the original GCN, self-connection is omitted here.
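A small PyTorch sketch of this matrix-form propagation step, assuming a dense adjacency for readability:

```python
import torch

def normalized_adj(A):
    """Symmetrically normalized adjacency D^{-1/2} A D^{-1/2} (no self-loops added)."""
    deg = A.sum(dim=1).clamp(min=1)
    d_inv_sqrt = deg.pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)

# One matrix-form GCN propagation step: E^{(k+1)} = ReLU(Ã E^{(k)} W^{(k)}).
num_nodes, dim = 7, 16
A = torch.zeros(num_nodes, num_nodes)
A[0, 3] = A[3, 0] = 1.0                 # toy edge between a user and an item node
E = torch.randn(num_nodes, dim)         # current embedding matrix E^{(k)}
W = torch.randn(dim, dim) * 0.1         # layer weight W^{(k)}
A_tilde = normalized_adj(A)
E_next = torch.relu(A_tilde @ E @ W)
print(E_next.shape)
```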

Simplifying GCN

E^{(k+1)} = \tilde{A} E^{(k)} W^{(k)} \\ E^{(K)} = \tilde{A}^K E W, \quad \text{where } W = W^{(0)} \cdots W^{(K-1)}

Multi-Scale Diffusion

\alpha_0 E^{(0)} + \alpha_1 E^{(1)} + \cdots + \alpha_K E^{(K)}

LightGCN: Model Overview

Given:

Iteratively diffuse the embedding matrix $E$ using $\tilde{A}$: for $k = 0, \cdots, K-1$, compute $E^{(k+1)} = \tilde{A} E^{(k)}$.

Average the embedding matrices at different scales.

Score function:
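Putting the pieces together, a minimal PyTorch sketch of the LightGCN forward pass, assuming uniform weights $\alpha_k = 1/(K+1)$ and the inner-product score from before:

```python
import torch

def lightgcn_embeddings(A_tilde, E, K):
    """LightGCN diffusion: repeatedly multiply by the normalized adjacency and
    average the embeddings across all scales (uniform alpha_k = 1/(K+1) here)."""
    embs = [E]
    for _ in range(K):
        E = A_tilde @ E          # E^{(k+1)} = Ã E^{(k)}  (no ReLU, no W)
        embs.append(E)
    return torch.stack(embs, dim=0).mean(dim=0)

# Toy setup: 3 users + 4 items stacked into one embedding matrix.
num_users, num_items, dim, K = 3, 4, 16, 3
E0 = torch.randn(num_users + num_items, dim)
A = torch.zeros(num_users + num_items, num_users + num_items)
A[0, 3] = A[3, 0] = 1.0
deg = A.sum(dim=1).clamp(min=1)
A_tilde = deg.pow(-0.5).unsqueeze(1) * A * deg.pow(-0.5).unsqueeze(0)

E_final = lightgcn_embeddings(A_tilde, E0, K)
user_final, item_final = E_final[:num_users], E_final[num_users:]
scores = user_final @ item_final.T       # score(u, v) = u^T v
print(scores.shape)
```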

LightGCN: Intuition

LightGCN and GCN

h_v^{(k+1)} = \sum_{u \in N(v)} \frac{1}{\sqrt{d_u}\sqrt{d_v}} \cdot h_u^{(k)}

LightGCN and MF: Comparison

PinSAGE

Motivation

P2P (pin-to-pin) recommendation

PinSAGE: Pin Embedding

Application: Pinterest

PinSAGE graph convolutional network:

Harnessing Pins and Boards

PinSAGE: Graph Neural Network

PinSAGE: Methods for Scaling Up

In addition to the GNN model, PinSAGE introduces several methods to scale the GNN to a billion-scale recommender system (e.g., Pinterest).

PinSAGE Model

Training Data

Shared Negative Samples

PinSAGE: Curriculum Learning

Hard Negatives 1

Hard Negatives 2

PinSAGE: Negative Sampling

Positive pairs $(q, p)$ are given, but various methods can be used to sample negatives $n$ to form training triples $(q, p, n)$.
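A simple sketch of forming $(q, p, n)$ triples with uniformly sampled random negatives; the shared-negative and hard-negative strategies above would replace the sampling step (names and toy data are mine):

```python
import random

def sample_triples(positive_pairs, all_items, num_neg=1, seed=0):
    """Form (q, p, n) training triples from positive (q, p) pairs by sampling
    random negatives that are not known positives for the same query."""
    rng = random.Random(seed)
    pos_by_query = {}
    for q, p in positive_pairs:
        pos_by_query.setdefault(q, set()).add(p)

    triples = []
    for q, p in positive_pairs:
        candidates = [i for i in all_items if i not in pos_by_query[q]]
        for _ in range(num_neg):
            triples.append((q, p, rng.choice(candidates)))
    return triples

pairs = [("q0", "itemA"), ("q0", "itemB"), ("q1", "itemC")]
items = ["itemA", "itemB", "itemC", "itemD", "itemE"]
print(sample_triples(pairs, items))
```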