CS224W-Machine Learning with Graph-Reasoning over Knowledge Graphs

Goal: How to perform multi-hop reasoning over KGs?

Can we do multi-hop reasoning, i.e., answer complex queries on an incomplete, massive KG?

Predictive One-hop Queries

Path Queries

q=(va,(r1,,rn))q = (v_a,(r_1,\cdots,r_n))

Query Plan of q:q: (Query plan of path queries is a chain.)

Question: “What proteins are associated with adverse events caused by Fulvestrant?”

Traversing KG (遍历)

Start from the nodes {”Brain Bleeding”, ”Short of Breath”, ”Kidney Infection”, ”Headache”} and traverse the KG by the relation “Assoc”, we reach entities {”CASP8”, “BIRC2”, ”PIM1”}. These are the answers.

KGs are incomplete and unknown: Many relations between entities are missing or are incomplete

Due to KG incompleteness, one is not able to identify all the answer entities.

Can we first do KG completion and then traverse the completed (probabilistic) KG?

Given entity embeddings, how do we answer an arbitrary query?

How do we train the embeddings?

Answering Predictive Queries on Knowledge Graphs

General Idea

Map queries into embedding space. Learn to reason in that space.

Idea: Traversing KG in Vector Space

Embed path queries in vector space.

Follow the query plan:

Can we answer more complex queries with logic conjunction operation?

Question: Each intermediate node represents a set of entities, how do we represent it? How do we define the intersection operation in the latent space?

How can we use embeddings to implicitly impute the missing (ESR2, Assoc, Breast Cancer)?

ESR2 interacts with both BRCA1 and ESR1. Both proteins are associated with breast cancer.

Query2Box: Reasoning over KGs Using Box Embeddings

How can we answer more complex queries with logical conjunction operation?

Query plan:

  1. Each intermediate node represents a set of entities; how do we represent it?
  1. How do we define the intersection operation in the latent space?

Box Embeddings

For example, we can embed the adverse events of Fulvestrant with a box that enclose all the answer entities.

Key Insight: Intersection

Embed with Box Embedding

Things to figure out:

Notation: dd out degree, V:|V|: # entities, R|R|: # relations

Projection Operator

Cen(q)=Cen(q)+Cen(r)Off(q)=Off(q)+Off(r)\begin{align*} Cen(q') &= Cen(q) + Cen(r) \\ Off(q') &= Off(q) + Off(r) \end{align*}

“x” (cross) means the projection operator is a relation from any box and relation to a new box

How do we take intersection of boxes?

Geometric Intersection Operator

Cen(qinter)=iwiCen(qi)wi=exp(fcen(Cen(qi)))jexp(fcen(Cen(qj)))Cen(q_{inter}) = \sum_i w_i \odot Cen(q_i) \\ w_i = \frac{exp(f_{cen}(Cen(q_i)))}{\sum_j exp(f_{cen}(Cen(q_j)))}

Cen(qi)Rd,wiRdCen(q_i) \in \mathbb{R}^d, w_i \in \mathbb{R}^d

wiRdw_i \in \mathbb{R}^d is calculated by a neural network fcenf_{cen} (with trainabnle weights)

wiw_i represents a “self-attention” score for the center of each input Cen(qi)Cen(q_i)

Off(qinter)=min(Off(q1),,Off(qn))σ(foff(Off(q1),,Off(qn)))Off(q_{inter}) = min (Off(q_1),\cdots,Off(q_n)) \odot \sigma (f_{off}(Off(q_1),\cdots,Off(q_n)))

Entity-to-Box Distance

(fq(v)f_q(v) captures inverse distance of a node vv as answer to qq)

dbox(q,v)=dout(q,v)+αdin(q,v)d_{box}(q,v) = d_{out}(q,v) + \alpha \cdot d_{in}(q,v)

where 0<α<10< \alpha <1.

Extending to Union Operation

E.g.: “What drug can treat breast cancer or lung cancer?”

Example:

Example 2:

Can we embed AND-OR queries in low-dimensional vector space?

Since we cannot embed AND-OR queries in low-dimensional space, can we still handle them?

Disjunctive Normal Form

q=q1q2qmq = q_1 \lor q_2 \lor \cdots \lor q_m

where qiq_i is a conjunctive query.

Distance Between qq and an Entity

dbox(q,v)=min(dbox(q1,v),,dbox(qm,v))d_{box} (q,v) = min (d_{box}(q_1,v),\cdots,d_{box}(q_m,v))

How to train Query2box

Training Overview

Training: Details

Query Generation from Templates

How to instantiate a query template given a KG?

Overview: Start from instantiating the answer node of the query template and then iteratively instantiate the other edges and nodes until we ground all the anchor nodes.

Randomly pick one entity from KG as the root node, e.g., we pick Fulvestrant

Now we look at intersection. What we have is that the intersection of the sets of entities is Fulvestrant, then naturally the two should also contain Fulvestrant.

Example of Query2box

Example: “List male instrumentalists who play string instruments”