CS224W-Machine Learning with Graph-Introduction

Introduction to Machine Learning for Graphs

Definition Graphs are a general language for describing and analyzing entities with relations/interactions.

图的数据种类:

Event Graphs, Computer Networks, Disease Pathways, Food Webs, Particle Networks, Underground Networks, Social Networks, Economic Networks, Communication Networks, Citation Networks, Internet, Networks of Neurons, Knowledge Graphs, Regulatory Networks, Scene Graphs, Code Graphs, Molecules, 3D Shapes.

Complex domains have a rich relational structure, which can be represented as a relational graph.

为什么图深度学习难?难点在哪里?

Networks are complex.

Definition A heterogeneous graph is defined as

G=(V,E,R,T)G = (V,E,R,T)

Heterogeneous Graphs(异构图)

有向图和无向图 Directed vs. Undirected Graphs

Undirected

Directed

其他相关因素: Weights, Types, Properties, Attributes

二分图 Bipartite Graph

Graph Machine Learning Tasks

Graph Machine Learning Tasks

Node-Level Tasks

Node classification Example

Node-Level Network Structure

Goal: Characterize the structure and position of a node in the network

  • Node degree (节点度)
  • Node importance & position (重要性和位置)
    • Number of shortest paths passing through a node
    • Avg. shortest path length to other nodes
  • Substructures around the node (节点的周围的结构)

Node’s Subgraphs: Graphlets (图基元, 类似于同一种形状的子图的数量)

Graphlets are defined as small induced subgraphs of a large network that appear at any frequency.

Graphlets: A count vector of rooted subgraphs at a given node.

我们可以看到节点uu, 整个图GG, 所有的graphlets 的形状如右侧的3个小图。

So, Graphlets instances of node uu:

Graphlets of node uu:

a,b,c,d[2,1,0,2]\begin{align*} &a,b,c,d\\ &[2,1,0,2] \end{align*}

Example: AlphaFold-Solving Protein Folding

Key idea: “Spatial graph”

  • Nodes: Amino acids in a protein sequence
  • Edges: Proximity between amino acids (residues)

Link-Level Prediction Task

Link-level prediction task example

图的边预测 (Link Prediction as a Task)

Two formulations of the link prediction task:

  1. Links missing at random:

    Remove a random set of links and then aim to predict them

  1. Links over time:

    Given G[t0,t0]G[t_0,t'_0] a graph defined by edges up to time t0t'_0, output a ranked list LL of edges (not in G[t0,t0]G[t_0,t'_0]) that are predicted to appear in time G[t1,t1]G[t_1,t'_1]

    Evaluation:

    • n=Enewn = |E_{new}|: # new edges that appear during the test period [t1,t1][t_1,t'_1]
    • Take top nn elements of LL and count correct edges

Example 1: Recommender Systems

  • Users interacts with items
    • Watch movies, buy merchandise, listen to music
    • Nodes: Users and items
    • Edges: User-item interactions
  • Goal: Recommend items users might like

Example 2: Biomedical Graph Link Prediction

Query: How likely will Simvastatin and Ciprofloxacin, when taken together, break down muscle tissue?

  • Nodes: Drugs & Proteins
  • Edges: Interaction

Graph-Level Tasks

Graph-Level Features

Goal: We want make a prediction for an entire graph or a subgraph of the graph.

Example: Road Network as a Graph