CS224W Machine Learning with Graphs: Knowledge Graphs

Knowledge Graph Embeddings

Heterogeneous graph: a graph with multiple relation types.

Knowledge in graph form: captures entities, types, and relationships

Nodes are entities.

Nodes are labeled with their types.

Edges between two nodes capture relationships between entities.

A knowledge graph is an example of a heterogeneous graph.

KG Examples

Bibliographic Networks

Node types: paper, title, author, conference, year

Relation types: pubWhere, pubYear, hasTitle, hasAuthor, cite

Bio Knowledge Graphs

Node types: drug, disease, adverse event, protein, pathway

Relation types: has_func, causes, assoc, treats, is_a

Applications of Knowledge Graphs

Serving information

Question answering and conversation agents

Knowledge Graph Dataset

Publicly available KGs:
- Freebase, Wikidata, DBpedia, YAGO, NELL, etc.

Common characteristics:
- Massive: Millions of nodes and edges
- Incomplete: Many true edges are missing

Knowledge Graph Completion

KG Representation

Edges in a KG are represented as triples $(h,r,t)$
- head $(h)$ has relation $(r)$ with tail $(t)$

Key Idea:
- Model entities and relations in embedding space $\mathbb{R}^d$
  - Associate entities and relations with shallow embeddings (we do not learn a GNN here!)
- Given a triple $(h,r,t)$, the goal is that the embedding of $(h,r)$ should be close to the embedding of $t$.
  - How to embed $(h,r)$ ?
  - How to define score $f_r(h,t)$ ?
    - Score $f_r$ is high if $(h,r,t)$ exists, else $f_r$ is low.
Many KG embedding models exist; these notes cover TransE, TransR, DistMult, and ComplEx.

Knowledge Graph Completion: TransE

Intuition: Translation
- For a triple $(h,r,t)$, let $\textbf{h},\textbf{r},\textbf{t} \in \mathbb{R}^d$ be embedding vectors

TransE: $\textbf{h} + \textbf{r} \approx \textbf{t}$ if the given link exists, else $\textbf{h} + \textbf{r} \neq \textbf{t}$

Entity scoring function:
$f_r(h,t) = -||\textbf{h}+\textbf{r}-\textbf{t}||$
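A minimal sketch of this score in plain Python, assuming embeddings are stored as lists of floats. The function name `transe_score` and the toy vectors are hypothetical; the parameter `p` switches between the $L_1$ and $L_2$ norm, both of which are used with TransE.

```python
import math


def transe_score(h, r, t, p=2):
    """TransE score f_r(h, t) = -||h + r - t||_p (higher means more plausible)."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if p == 1:
        return -sum(abs(x) for x in diff)
    return -math.sqrt(sum(x * x for x in diff))


# Hypothetical 2-d embeddings chosen so that h + r == t exactly.
h, r, t = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
good = transe_score(h, r, t)            # zero distance: perfect translation
bad = transe_score(h, r, [3.0, 1.0])    # h + r misses this tail by 2.0
print(good, bad)
```

The best achievable score is 0; every other tail scores strictly below it.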

TransE: How to learn?
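The training procedure is not spelled out in these notes. A common recipe, used in the original TransE paper (Bordes et al., 2013), is a margin-based ranking loss that pushes each observed triple to be closer (under $L_2$ or $L_1$ distance) than a corrupted triple built by replacing its head or tail with a random entity. A minimal sketch of the negative sampling and the loss, assuming $L_2$ distance and dict-based embeddings; `distance`, `corrupt`, `margin_loss`, and the toy entities are hypothetical names.

```python
import math
import random


def distance(h, r, t):
    """L2 distance ||h + r - t||; the TransE score is its negative."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def corrupt(triple, entities, rng):
    """Negative sampling: replace the head or the tail with a random entity.

    The result can occasionally be a true triple; many implementations
    filter those out, which this sketch does not do.
    """
    h, r, t = triple
    if rng.random() < 0.5:
        return (rng.choice(entities), r, t)
    return (h, r, rng.choice(entities))


def margin_loss(pos, neg, ent, rel, margin=1.0):
    """Hinge loss max(0, margin + d(pos) - d(neg)).

    Zero once the observed triple is closer than the corrupted one by at
    least `margin`, positive otherwise.
    """
    (h, r, t), (h2, r2, t2) = pos, neg
    return max(0.0, margin
               + distance(ent[h], rel[r], ent[t])
               - distance(ent[h2], rel[r2], ent[t2]))


# Hypothetical toy embeddings: a + r lands exactly on b, far from c.
ent = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
rel = {"r": [1.0, 0.0]}
rng = random.Random(0)
negative = corrupt(("a", "r", "b"), list(ent), rng)
print(negative, margin_loss(("a", "r", "b"), negative, ent, rel))
```

Training would minimize the sum of this loss over mini-batches with gradient descent; the original TransE algorithm also re-normalizes entity embeddings to unit length at each iteration. Both parts are omitted here.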

Relations in a heterogeneous KG have different properties:

Example:
- Symmetry: If the edge $(h,\text{Roommate},t)$ exists in the KG, then the edge $(t,\text{Roommate},h)$ should also exist.
- Inverse relation: If the edge $(h,\text{Advisor},t)$ exists in the KG, then the edge $(t,\text{Advisee},h)$ should also exist.

Four Relation Patterns

Symmetric (Antisymmetric) Relation:
- Example:
  - Symmetric: Family, Roommate
  - Antisymmetric: Hypernym (a word with a broader meaning: poodle vs. dog)

$$r(h,t) \Longrightarrow r(t,h) \quad \left(r(h,t) \Longrightarrow \neg r(t,h)\right) \quad \forall h,t$$

Inverse Relation:
- Example: (Advisor, Advisee)

$$r_2(h,t) \Longrightarrow r_1(t,h)$$

Composition (Transitive) Relation:
- Example: My mother’s husband is my father

$$r_1(x,y) \land r_2(y,z) \Longrightarrow r_3(x,z) \quad \forall x,y,z$$

1-to-N relations:
- Example: $r$ is “StudentsOf”

$$r(h,t_1),\ r(h,t_2),\ \cdots,\ r(h,t_n)$$

are all true.
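The four patterns are statements about the KG itself, independent of any embedding model. A minimal sketch that checks them on a finite list of $(h,r,t)$ triples; the function names and the toy KG (built from the examples above) are hypothetical.

```python
def is_symmetric(triples, r):
    """r(h,t) => r(t,h) for every observed (h, r, t)."""
    facts = set(triples)
    return all((t, r, h) in facts for h, rel, t in facts if rel == r)


def is_antisymmetric(triples, r):
    """r(h,t) => not r(t,h) for every observed (h, r, t)."""
    facts = set(triples)
    return all((t, r, h) not in facts for h, rel, t in facts if rel == r)


def is_inverse(triples, r2, r1):
    """r2(h,t) => r1(t,h) for every observed (h, r2, t)."""
    facts = set(triples)
    return all((t, r1, h) in facts for h, rel, t in facts if rel == r2)


def is_composition(triples, r1, r2, r3):
    """r1(x,y) and r2(y,z) => r3(x,z) for every observed pair of edges."""
    facts = set(triples)
    return all((x, r3, z) in facts
               for x, a, y in facts if a == r1
               for y2, b, z in facts if b == r2 and y2 == y)


def is_one_to_n(triples, r):
    """Some head h has two or more distinct tails under r."""
    tails = {}
    for h, rel, t in triples:
        if rel == r:
            tails.setdefault(h, set()).add(t)
    return any(len(ts) > 1 for ts in tails.values())


# Hypothetical toy KG mirroring the examples above.
kg = [
    ("alice", "Roommate", "bob"), ("bob", "Roommate", "alice"),
    ("poodle", "Hypernym", "dog"),
    ("alice", "Advisor", "prof"), ("prof", "Advisee", "alice"),
    ("me", "Mother", "mom"), ("mom", "Husband", "dad"), ("me", "Father", "dad"),
    ("prof", "StudentsOf", "alice"), ("prof", "StudentsOf", "bob"),
]
```

Because real KGs are incomplete, such checks only describe the observed edges; the embedding models below aim to capture these patterns so that missing edges can be inferred.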

Antisymmetric Relations in TransE

Antisymmetric Relations:
- Example: Hypernym (a word with a broader meaning: poodle vs. dog)

$$r(h,t) \Longrightarrow \neg r(t,h) \quad \forall h,t$$

TransE can model antisymmetric relations.
- $\textbf{h}+ \textbf{r} = \textbf{t}$, but $\textbf{t} + \textbf{r} = \textbf{h} + 2\textbf{r} \neq \textbf{h}$ whenever $\textbf{r} \neq 0$

Inverse Relations in TransE

Inverse Relation:
- Example: (Advisor, Advisee)

$$r_2(h,t) \Longrightarrow r_1(t,h)$$

TransE can model inverse relations.
- If $\textbf{h} + \textbf{r}_2 = \textbf{t}$, we can set $\textbf{r}_1 = - \textbf{r}_2$, so that $\textbf{t} + \textbf{r}_1 = \textbf{h}$

Composition in TransE

Composition (Transitive) Relation:
- Example: My mother’s husband is my father.

$$r_1(x,y) \land r_2(y,z) \Longrightarrow r_3(x,z) \quad \forall x,y,z$$

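TransE can model composition relations.
- If $\textbf{x} + \textbf{r}_1 = \textbf{y}$ and $\textbf{y} + \textbf{r}_2 = \textbf{z}$, set $\textbf{r}_3 = \textbf{r}_1 + \textbf{r}_2$, so that $\textbf{x} + \textbf{r}_3 = \textbf{z}$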
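The three TransE properties above (antisymmetry, inverse, composition) can be checked directly with small integer vectors, which keeps the equality tests exact. A minimal sketch; the helper names and all vectors are hypothetical, and `r_a`, `r_b`, `r_c` stand in for $\textbf{r}_1$, $\textbf{r}_2$, $\textbf{r}_3$ of the composition rule.

```python
def add(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(c) for c in zip(*vectors)]


def neg(v):
    """Element-wise negation."""
    return [-x for x in v]


h, r = [0, 0], [1, 2]
t = add(h, r)                      # h + r = t

# Antisymmetry: for r != 0, t + r = h + 2r != h.
assert add(t, r) != h

# Inverse: r_1 = -r_2 translates t back to h.
r_2 = r
r_1 = neg(r_2)
assert add(t, r_1) == h

# Composition: x + r_a = y and y + r_b = z give x + (r_a + r_b) = z.
x, r_a, r_b = [0, 0], [1, 0], [0, 3]
y = add(x, r_a)
z = add(y, r_b)
r_c = add(r_a, r_b)                # plays the role of r_3 = r_1 + r_2
assert add(x, r_c) == z
```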

Symmetric Relation: Limitation

Symmetric Relation:
- Example: Family, Roommate

$$r(h,t) \Longrightarrow r(t,h) \quad \forall h,t$$

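TransE cannot model symmetric relations: $\textbf{h} + \textbf{r} = \textbf{t}$ and $\textbf{t} + \textbf{r} = \textbf{h}$ hold together only if $\textbf{r} = 0$ and $\textbf{h} = \textbf{t}$, which collapses two different entities into one vector.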

1-to-N Relations: Limitation

1-to-N Relations:
- Example: $(h,r,t_1)$ and $(h,r,t_2)$ both exist in the knowledge graph, e.g. $r$ is “StudentsOf”

TransE cannot model 1-to-N relations
- $\textbf{t}_1$ and $\textbf{t}_2$ would map to the same vector, although they are different entities:
- $\textbf{t}_1 = \textbf{h} + \textbf{r} = \textbf{t}_2$
- This contradicts $\textbf{t}_1 \neq \textbf{t}_2$

Knowledge Graph Completion: TransR

TransE models translation of any relation in the same embedding space.

TransR: models entities as vectors in the entity space $\mathbb{R}^d$ and each relation as a vector $\textbf{r} \in \mathbb{R}^k$ in the relation space, with a relation-specific projection matrix $\textbf{M}_r \in \mathbb{R}^{k \times d}$.

TransR

$\textbf{h}_{\perp} = \textbf{M}_r \textbf{h} , \ \textbf{t}_{\perp} = \textbf{M}_r \textbf{t}$

Score function: $f_r(h,t) = - ||\textbf{h}_{\perp} + \textbf{r} - \textbf{t}_{\perp}||$
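A minimal sketch of the TransR score in plain Python, assuming the $L_2$ norm and matrices stored as lists of rows. The names `matvec` and `transr_score` and the toy projection are hypothetical.

```python
import math


def matvec(M, v):
    """Multiply a k x d matrix (list of rows) by a d-dimensional vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transr_score(h, r, t, M_r):
    """TransR score -||M_r h + r - M_r t||_2 with h, t in R^d and r in R^k."""
    h_perp = matvec(M_r, h)
    t_perp = matvec(M_r, t)
    return -math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h_perp, r, t_perp)))


# Hypothetical d = 3, k = 2 projection that keeps the first two coordinates.
M_r = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]
h, r = [0.0, 0.0, 5.0], [1.0, 1.0]
print(transr_score(h, r, [1.0, 1.0, -7.0], M_r))   # perfect fit: third coordinate is projected away
print(transr_score(h, r, [2.0, 1.0, 0.0], M_r))    # -1.0
```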

Symmetric Relations in TransR

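TransR can model symmetric relations:

$$\textbf{r} = 0, \quad \textbf{h}_{\perp} = \textbf{M}_r\textbf{h} = \textbf{M}_r \textbf{t} = \textbf{t}_{\perp}$$

Unlike in TransE, $\textbf{h}$ and $\textbf{t}$ need not be equal; it suffices that $\textbf{M}_r$ maps them to the same point.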

Antisymmetric Relations in TransR

TransR can model antisymmetric relations:

$$\begin{aligned} \textbf{r} \neq 0, \quad \textbf{M}_r\textbf{h} + \textbf{r} &= \textbf{M}_r\textbf{t} \\ \Longrightarrow \textbf{M}_r\textbf{t} + \textbf{r} &\neq \textbf{M}_r \textbf{h} \end{aligned}$$

1-to-N Relations in TransR

TransR can model 1-to-N relations
- We can learn $\textbf{M}_r$ so that $\textbf{t}_{\perp} = \textbf{M}_r \textbf{t}_1 = \textbf{M}_r \textbf{t}_2$
- Note that $\textbf{t}_1$ does not need to be equal to $\textbf{t}_2$ !
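Both the symmetric case and the 1-to-N case rely on the projection sending different entity vectors to the same point in relation space. A minimal sketch with a hypothetical projection that drops the third coordinate; all vectors are made up for illustration.

```python
def matvec(M, v):
    """Multiply a k x d matrix (list of rows) by a d-dimensional vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical projection R^3 -> R^2 that drops the third coordinate,
# so every vector along (0, 0, 1) is mapped to zero.
M_r = [[1, 0, 0],
       [0, 1, 0]]

# 1-to-N: two different tails that differ only in the dropped coordinate
# land on the same point M_r h + r in relation space.
h, r = [0, 0, 0], [1, 1]
t1, t2 = [1, 1, 0], [1, 1, 9]
target = [a + b for a, b in zip(matvec(M_r, h), r)]
assert t1 != t2
assert matvec(M_r, t1) == target == matvec(M_r, t2)

# Symmetry: with r = 0, distinct h and t with M_r h == M_r t satisfy the
# relation in both directions.
h_s, t_s = [2, 3, -1], [2, 3, 4]
assert h_s != t_s and matvec(M_r, h_s) == matvec(M_r, t_s)
```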

Inverse Relations in TransR

TransR can model inverse relations. Set

$$\textbf{r}_2 = - \textbf{r}_1, \quad \textbf{M}_{r_1} = \textbf{M}_{r_2}$$

Then, if $r_2(h,t)$ holds,

$$\textbf{M}_{r_2} \textbf{h} + \textbf{r}_2 = \textbf{M}_{r_2}\textbf{t}$$

and rearranging gives $r_1(t,h)$:

$$\textbf{M}_{r_1} \textbf{t} + \textbf{r}_1 = \textbf{M}_{r_1} \textbf{h}$$

Composition Relations in TransR

TransR can model composition relations.

TransR models a triple with linear functions. Linear functions are chainable!
- If $f(x)$ and $g(x)$ are linear, then $f(g(x))$ is also linear:
  - Let $f(x) = a\cdot x+b$ and $g(x) = c \cdot x + d$; then $f(g(x)) = a(c \cdot x+d)+b = (ac)\cdot x + (ad+b)$

Background:
- Def: Kernel space of a matrix $M$:
$h \in Ker(M)$ if and only if $M \cdot h = 0$

Assume $M_{r_1}g_1 = r_1$ and $M_{r_2}g_2 =r_2$
- For $r_1(x,y)$:
$$\begin{aligned} r_1(x,y) \ \text{exists} &\Longrightarrow M_{r_1}x + r_1 = M_{r_1}y \\ &\Longrightarrow M_{r_1}(y-x) = r_1, \ y-x \in g_1 + Ker(M_{r_1}) \\ &\Longrightarrow y \in x+ g_1 + Ker(M_{r_1}) \end{aligned}$$
- Same for $r_2(y,z)$:
$$\begin{aligned} r_2(y,z) \ \text{exists} &\Longrightarrow M_{r_2}y+r_2 = M_{r_2}z \\ &\Longrightarrow M_{r_2}(z-y) = r_2, \ z-y \in g_2 + Ker(M_{r_2}) \\ &\Longrightarrow z \in y+g_2 + Ker(M_{r_2}) \end{aligned}$$
- Then, we have
$z \in x+g_1+g_2+Ker(M_{r_1}) + Ker(M_{r_2})$
- Construct $M_{r_3},$ s.t.
$Ker(M_{r_3}) = Ker(M_{r_1})+ Ker(M_{r_2})$
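- Since:
  - $dim(Ker(M_{r_3})) \geq dim(Ker(M_{r_1})) \geq d-k$ (a $k \times d$ matrix has rank at most $k$)
  - $M_{r_3}$ has the same shape as $M_{r_1}$, and any subspace of $\mathbb{R}^d$ with dimension at least $d-k$ is the kernel of some $k \times d$ matrix
  we know $M_{r_3}$ exists!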
- Set $r_3 = M_{r_3} (g_1+g_2)$
- Since $z - x \in g_1 + g_2 + Ker(M_{r_3})$, we have $M_{r_3}(z-x) = r_3$, i.e., $M_{r_3} x + r_3 = M_{r_3}z$
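The construction can be traced on a small concrete case with $d = 3$ and $k = 2$. The matrices, the vectors $g_1, g_2$, and the entities below are hypothetical choices; $M_{r_3}$ is just one valid matrix whose kernel equals $Ker(M_{r_1}) + Ker(M_{r_2})$.

```python
def matvec(M, v):
    """Multiply a k x d matrix (list of rows) by a d-dimensional vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vadd(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(c) for c in zip(*vectors)]


def holds(M, r, a, b):
    """A TransR triple (a, r, b) holds exactly when M a + r == M b."""
    return vadd(matvec(M, a), r) == matvec(M, b)


M1 = [[1, 0, 0], [0, 1, 0]]      # Ker(M1) = span{e3}
M2 = [[1, 0, 0], [0, 0, 1]]      # Ker(M2) = span{e2}
M3 = [[1, 0, 0], [0, 0, 0]]      # Ker(M3) = span{e2, e3} = Ker(M1) + Ker(M2)

g1, g2 = [1, 2, 0], [3, 0, 1]
r1, r2 = matvec(M1, g1), matvec(M2, g2)      # r_i = M_{r_i} g_i
r3 = matvec(M3, vadd(g1, g2))                # r_3 = M_{r_3}(g_1 + g_2)

x = [0, 0, 0]
y = vadd(x, g1, [0, 0, 7])      # y in x + g1 + Ker(M1)
z = vadd(y, g2, [0, -5, 0])     # z in y + g2 + Ker(M2)

assert holds(M1, r1, x, y) and holds(M2, r2, y, z)
assert holds(M3, r3, x, z)      # the composed relation r_3(x, z) holds
```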

Knowledge Graph Completion: DistMult

New Idea: Bilinear Modeling

So far: the scoring function $f_r(h,t)$ is the negative of the $L_1$/$L_2$ distance in TransE and TransR.

Idea: Use bilinear modeling. Score function:
$f_r(h,t) = h^{\top} A \, t, \quad h,t \in \mathbb{R}^k, \ A\in \mathbb{R}^{k\times k}$
- Problem: Too general and prone to overfitting
  - Matrix A is too expressive
- Fix: Limit A to be diagonal
  - This is called DistMult
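Restricting $A$ to a diagonal matrix cuts the per-relation parameters from $k^2$ to $k$, and the bilinear form collapses to an element-wise sum. A minimal sketch comparing the two on hypothetical integer vectors; the function names are illustrative.

```python
def bilinear_score(h, A, t):
    """General bilinear score h^T A t with a full k x k matrix A (k*k parameters)."""
    k = len(h)
    return sum(h[i] * A[i][j] * t[j] for i in range(k) for j in range(k))


def distmult_score(h, r, t):
    """DistMult: A restricted to diag(r), i.e. only k parameters per relation."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


h, r, t = [1, 2, 3], [2, -1, 1], [4, 0, 1]
diag_r = [[r[i] if i == j else 0 for j in range(3)] for i in range(3)]
print(bilinear_score(h, diag_r, t), distmult_score(h, r, t))   # 11 11
```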


DistMult: Entities & relations are vectors in $\mathbb{R}^k$

Score function:
- $h,r,t \in \mathbb{R}^k$

$$f_r(h,t) = \langle h,r,t \rangle = \sum_{i} h_i \cdot r_i \cdot t_i$$

Can be viewed as an (unnormalized) cosine similarity, i.e., a dot product, between $h \cdot r$ and $t$, where $h \cdot r$ is the element-wise product $[h \cdot r]_i = h_i \cdot r_i$.
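The view above can be made concrete: the DistMult score is a dot product of $h \cdot r$ (element-wise) with $t$, and dividing by both norms turns it into a cosine similarity. A minimal sketch with hypothetical vectors and helper names.

```python
import math


def hadamard(a, b):
    """Element-wise product [a * b]_i = a_i * b_i (written h . r in the notes)."""
    return [x * y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def distmult_score(h, r, t):
    """<h, r, t> = sum_i h_i r_i t_i, computed as dot(h * r, t)."""
    return dot(hadamard(h, r), t)


def cosine(a, b):
    """Cosine similarity: the dot product divided by both vector norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


h, r, t = [1.0, 2.0, 3.0], [2.0, -1.0, 1.0], [4.0, 0.0, 1.0]
score = distmult_score(h, r, t)          # 11.0
similarity = cosine(hadamard(h, r), t)   # normalized version, about 11/17
```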


1-to-N Relations in DistMult

1-to-N Relation:
- If $(h,r,t_1)$ and $(h,r,t_2)$ exist in the knowledge graph

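DistMult can model 1-to-N relations
- $t_1$ and $t_2$ can be different vectors yet receive the same score, because only their components along $h \cdot r$ matter: $\langle h,r,t_1 \rangle = \langle h,r,t_2 \rangle$ whenever $t_1 - t_2$ is orthogonal to $h \cdot r$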

Symmetric Relations in DistMult

DistMult can naturally model symmetric relations
- Due to the commutative property of multiplication

$$f_r(h,t) = \langle h,r,t \rangle = \sum_{i} h_i \cdot r_i \cdot t_i = \langle t,r,h \rangle = f_r(t,h)$$

Limitation: Antisymmetric Relations

DistMult cannot model antisymmetric relations
- $r(h,t)$ and $r(t,h)$ always have the same score!

$$f_r(h,t) = \langle h,r,t \rangle = \langle t,r,h \rangle = f_r(t,h)$$
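Both the symmetric-relation property and the antisymmetry limitation come from the same fact: swapping head and tail never changes the score. A minimal randomized check, assuming small integer embeddings so that the arithmetic is exact; the names are hypothetical.

```python
import random


def distmult_score(h, r, t):
    """DistMult score sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Random integer embeddings keep the arithmetic exact.
rng = random.Random(0)
for _ in range(100):
    h = [rng.randint(-5, 5) for _ in range(8)]
    r = [rng.randint(-5, 5) for _ in range(8)]
    t = [rng.randint(-5, 5) for _ in range(8)]
    # Swapping head and tail never changes the score, so r(h, t) can never
    # score high while r(t, h) scores low.
    assert distmult_score(h, r, t) == distmult_score(t, r, h)
```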

Limitation: Inverse Relations

DistMult cannot model inverse relations
- Assume DistMult does model inverse relations:
$f_{r_2}(h,t) = \langle h,r_2,t \rangle = \langle t,r_1,h \rangle = f_{r_1}(t,h)$
- For example, $r_2 =r_1$ solves this
- But semantically this does not make sense: the embedding of the “Advisor” relation should not be the same as that of the “Advisee” relation.

Limitation: Composition Relations

DistMult cannot model composition relations
- Because the product is commutative $(a\cdot b = b \cdot a)$, DistMult does not distinguish between head and tail entities, so it cannot model composition.

Knowledge Graph Completion: ComplEx

Based on DistMult, ComplEx embeds entities and relations in a complex vector space.

ComplEx: model entities and relations as vectors in $\mathbb{C}^k$

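Score function (where $\overline{t}_i$ is the complex conjugate of $t_i$):

$$\begin{aligned} f_r(h,t) &= Re\left(\sum_i h_i \cdot r_i \cdot \overline{t}_i\right) \\ &= \langle Re(h),Re(r),Re(t) \rangle + \langle Re(h),Im(r),Im(t) \rangle \\ &\quad + \langle Im(h),Re(r),Im(t) \rangle - \langle Im(h),Im(r),Re(t) \rangle \end{aligned}$$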
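Python's built-in `complex` type makes the score easy to check against its four-term real expansion. A minimal sketch; the helper names and the integer-valued embeddings are hypothetical (integer values keep the floating-point results exact).

```python
def complex_score(h, r, t):
    """ComplEx score Re(sum_i h_i * r_i * conj(t_i)) for lists of Python complex numbers."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def trilinear(a, b, c):
    """DistMult-style <a, b, c> = sum_i a_i b_i c_i for real vectors."""
    return sum(x * y * z for x, y, z in zip(a, b, c))


def real_part(v):
    return [z.real for z in v]


def imag_part(v):
    return [z.imag for z in v]


def complex_score_expanded(h, r, t):
    """The same score written as four real trilinear terms."""
    re_h, im_h = real_part(h), imag_part(h)
    re_r, im_r = real_part(r), imag_part(r)
    re_t, im_t = real_part(t), imag_part(t)
    return (trilinear(re_h, re_r, re_t) + trilinear(re_h, im_r, im_t)
            + trilinear(im_h, re_r, im_t) - trilinear(im_h, im_r, re_t))


h = [1 + 2j, -1 + 0j]
r = [2 - 1j, 1 + 3j]
t = [0 + 1j, 2 - 2j]
print(complex_score(h, r, t), complex_score_expanded(h, r, t))   # 7.0 7.0
```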

Antisymmetric Relations in ComplEx

ComplEx can model antisymmetric relations
- The model is expressive enough to learn
  - high $f_r(h,t) = Re(\sum_i h_i \cdot r_i \cdot \overline{t}_i)$
  - low $f_r(t,h) = Re(\sum_i t_i \cdot r_i \cdot \overline{h}_i)$
- This is possible because the complex conjugate makes the score asymmetric in $h$ and $t$.

Symmetric Relations in ComplEx

ComplEx can model symmetric relations
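- When $Im(r) = 0$, we have
$$\begin{aligned} f_r(h,t) &= Re\left(\sum_i h_i \cdot r_i \cdot \overline{t}_i\right) \\ &= \sum_i Re(r_i \cdot h_i \cdot \overline{t}_i) \\ &= \sum_i r_i \cdot Re(h_i\cdot \overline{t}_i) \\ &= \sum_i r_i \cdot Re(\overline{h}_i\cdot t_i) \\ &= \sum_i Re(r_i \cdot \overline{h}_i \cdot t_i) \\ &= f_r(t,h) \end{aligned}$$
- Replacing $Re(h_i \cdot \overline{t}_i)$ by $Re(\overline{h}_i \cdot t_i)$ is valid because $\overline{h}_i \cdot t_i$ is the complex conjugate of $h_i \cdot \overline{t}_i$, and conjugates have the same real part.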
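The two ComplEx cases above can be seen side by side: a real-valued relation scores both directions equally, while a relation with an imaginary part can score them differently. A minimal sketch with hypothetical integer-valued embeddings; the purely imaginary relation is just one choice that makes the asymmetry visible (here it flips the sign of the score).

```python
def complex_score(h, r, t):
    """ComplEx score Re(sum_i h_i * r_i * conj(t_i))."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


h = [1 + 2j, -1 + 1j]
t = [3 - 1j, 2 + 0j]

r_real = [2 + 0j, -1 + 0j]   # Im(r) = 0: the score ignores head/tail order
r_imag = [1j, 1j]            # Re(r) = 0: swapping head and tail flips the sign

print(complex_score(h, r_real, t), complex_score(t, r_real, h))   # 4.0 4.0
print(complex_score(h, r_imag, t), complex_score(t, r_imag, h))   # -9.0 9.0
```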

Inverse Relations in ComplEx

ComplEx can model inverse relations
- Set $r_1 = \overline{r_2}$
- The complex conjugate of
  - $r_2 = \operatorname{argmax}_r Re(\langle h,r,\overline{t} \rangle)$ is exactly
  - $r_1 = \operatorname{argmax}_r Re(\langle t,r,\overline{h} \rangle)$
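With $r_1 = \overline{r_2}$, each term of $f_{r_1}(t,h)$ is the complex conjugate of the matching term of $f_{r_2}(h,t)$, so the two real scores are always equal. A minimal check with hypothetical integer-valued embeddings standing in for “Advisor” and “Advisee”.

```python
def complex_score(h, r, t):
    """ComplEx score Re(sum_i h_i * r_i * conj(t_i))."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


h = [1 + 2j, -1 + 1j]
t = [3 - 1j, 2 + 0j]
r_2 = [2 + 1j, -1 - 3j]                 # e.g. "Advisor" (hypothetical values)
r_1 = [z.conjugate() for z in r_2]      # e.g. "Advisee" = conj(r_2)

# r_2(h, t) and r_1(t, h) always receive the same score.
print(complex_score(h, r_2, t), complex_score(t, r_1, h))   # 3.0 3.0
```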

Composition and 1-to-N in ComplEx

ComplEx shares the following properties with DistMult:
- Cannot model composition relations
- Can model 1-to-N relations