data science · graphs · ai · convolution

Graphs and Graph Neural Networks

Learn the basics of Graph Neural Networks (GNNs) and how they process graph-structured data. Explore GCNs, R-GCNs, and GATs with clear explanations and formulas.
4/15/2025 • 5 minutes

Graph Neural Networks (GNNs)

Graph Neural Networks (GNNs) are neural networks designed to operate directly on graph-structured data. They are powerful because many real-world systems can be naturally represented as graphs.

Examples of things that can be modeled as graphs include:

  • Social networks (users as nodes, friendships as edges).
  • Molecules (atoms as nodes, chemical bonds as edges).
  • Knowledge graphs (entities as nodes, relations as edges).
  • Transportation networks (stations as nodes, routes as edges).
  • Recommendation systems (users and items as nodes, interactions as edges).
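
To make this concrete, here is a minimal sketch of how such a graph can be represented numerically. The variable names (`A` for the adjacency matrix, `X` for node features) and the use of NumPy are assumptions made for illustration, not a fixed convention.

```python
import numpy as np

# 4 users (nodes) in a tiny social network; an edge means "is friends with".
# Adjacency matrix A: A[i, j] = 1 if there is an edge between nodes i and j.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

# Node feature matrix X: one row per node (here 3 arbitrary features per user).
X = np.random.rand(4, 3)

# The neighbors of node 0 are the nonzero entries of its row in A.
print(np.nonzero(A[0])[0])  # -> [1 2]
```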

In this blog post, we will focus on GNNs that work with graphs having a static structure and static features.

In st-gcn-introduction, we explore GNNs that handle a static structure but dynamic features, i.e., time-series graphs.

GNNs are also often referred to as Graph Convolutional Networks (GCNs), since they rely on a mechanism similar to Convolutional Neural Networks (CNNs).

Graph Convolutional Networks (GCNs)

The most basic type of GNN is the Graph Convolutional Network, which can be described by the following update rule:

$$h_i^{(l+1)} = \sigma \left( \sum_{j \in N(i)} \frac{1}{C_{ij}} \, h_j^{(l)} W^{(l)} \right)$$

Where:

  • $h_i^{(l)}$ is the feature vector of node $i$ at layer $l$.
  • $\sigma$ is a non-linear activation function.
  • $N(i)$ is the set of neighbors of node $i$.
  • $C_{ij}$ is a normalization constant (often based on node degrees).
  • $W^{(l)}$ is the learnable weight matrix at layer $l$.

Applying this operation to all nodes in the graph constitutes one GCN layer.
Stacking $L$ such layers allows information to propagate across the graph, so that after $L$ layers, each node's representation incorporates information from nodes up to $L$ hops away.
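
As a rough illustration of this update rule, the sketch below implements one GCN layer in NumPy. The choice of ReLU as $\sigma$, node degree as the normalization constant $C_{ij}$, and the name `gcn_layer` are assumptions made for the example, not the only possible choices.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def gcn_layer(A, H, W):
    """One GCN layer (illustrative sketch).

    A: (N, N) adjacency matrix, H: (N, d_in) node features at layer l,
    W: (d_in, d_out) learnable weights. C_ij is taken as the degree of
    node i, so the aggregation is a simple mean over neighbors.
    """
    deg = A.sum(axis=1, keepdims=True)        # number of neighbors of each node
    H_agg = (A @ H) / np.maximum(deg, 1)      # sum_j (1 / C_ij) * h_j
    return relu(H_agg @ W)                    # sigma( ... W )

# Stacking layers: after two calls each node sees information from 2-hop neighbors.
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
H = np.random.rand(4, 3)
W1, W2 = np.random.rand(3, 8), np.random.rand(8, 8)
H2 = gcn_layer(A, gcn_layer(A, H, W1), W2)
print(H2.shape)  # -> (4, 8)
```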

Relational Graph Convolutional Networks (R-GCNs)

When graphs contain multiple types of edges (relations), we use Relational GCNs (R-GCNs). Their update rule is:

$$h_i^{(l+1)} = \sigma \left( h_i^{(l)} W_0^{(l)} + \sum_{r \in R} \sum_{j \in N_i^r} \frac{1}{C_{i,r}} \, h_j^{(l)} W_r^{(l)} \right)$$

Where:

  • $R$ is the set of relation (edge) types.
  • $N_i^r$ is the set of neighbors of node $i$ connected via relation $r$.
  • $W_r^{(l)}$ is the weight matrix for relation type $r$ at layer $l$.
  • $W_0^{(l)}$ is a self-loop weight matrix, allowing a node to preserve and transform its own features.

Here, each relation type has its own weight matrix, since different edge types carry different semantic meanings.
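
A minimal sketch of this rule is given below, assuming one adjacency matrix per relation, degree-based normalization for $C_{i,r}$, and ReLU as $\sigma$; the names `rgcn_layer`, `A_per_relation`, and so on are illustrative only.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def rgcn_layer(A_per_relation, H, W_per_relation, W_self):
    """One R-GCN layer: one adjacency and one weight matrix per relation type.

    A_per_relation: list of (N, N) adjacency matrices, one per relation r in R.
    W_per_relation: list of matching (d_in, d_out) weight matrices W_r.
    W_self: (d_in, d_out) self-loop weight matrix W_0.
    C_{i,r} is taken as the number of r-neighbors of node i.
    """
    out = H @ W_self                                        # self-loop term: h_i W_0
    for A_r, W_r in zip(A_per_relation, W_per_relation):
        deg_r = A_r.sum(axis=1, keepdims=True)              # |N_i^r| for each node i
        out = out + (A_r @ H) / np.maximum(deg_r, 1) @ W_r  # relation-specific aggregation
    return relu(out)

# Example with two relation types on a 4-node graph.
A1 = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
A2 = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
H = np.random.rand(4, 3)
Ws = [np.random.rand(3, 8) for _ in range(2)]
W0 = np.random.rand(3, 8)
print(rgcn_layer([A1, A2], H, Ws, W0).shape)  # -> (4, 8)
```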

Graph Attention Networks (GATs)

Graph Attention Networks introduce an attention mechanism to learn the importance of neighboring nodes, instead of simply averaging their features. The update rule is:

$$h_i^{(l+1)} = \sigma \left( \sum_{j \in N(i)} \alpha_{ij}^{(l)} \, h_j^{(l)} W^{(l)} \right)$$

Where:

  • $\alpha_{ij}^{(l)}$ is the attention coefficient indicating the importance of node $j$'s features to node $i$.

Computing Attention Coefficients

The attention coefficients $\alpha_{ij}^{(l)}$ can be computed in different ways:

1. Simple dot-product attention (no learning):

$$\alpha_{ij}^{(l)} = h_i^{(l)} \cdot h_j^{(l)}$$

2. Learnable attention mechanism:

$$\alpha_{ij}^{(l)} = \text{Softmax}_{k \in N(i)} \left( e_{ij}^{(l)} \right), \quad e_{ij}^{(l)} = \text{LeakyReLU} \left( a^\top [z_i^{(l)} \, ; \, z_j^{(l)}] \right)$$

with

$$z_i^{(l)} = W^{(l)} h_i^{(l)}$$

Where:

  • $a^\top$ is a learnable attention vector.
  • $[z_i^{(l)} \, ; \, z_j^{(l)}]$ denotes the concatenation of the two vectors.

This mechanism allows the network to adaptively focus on the most relevant neighbors for each node.
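
Putting the two formulas together, here is a sketch of one single-head GAT layer with the learnable attention mechanism. Masking non-neighbors with $-\infty$ before the softmax and using ReLU as $\sigma$ are implementation choices assumed for the example, and it presumes every node has at least one neighbor; the name `gat_layer` is illustrative only.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(A, H, W, a):
    """One single-head GAT layer with learnable attention (illustrative sketch).

    A: (N, N) adjacency matrix (A[i, j] = 1 if j is a neighbor of i),
    H: (N, d_in) node features, W: (d_in, d_out), a: (2 * d_out,) attention vector.
    Assumes every node has at least one neighbor.
    """
    Z = H @ W                                    # z_i = W h_i
    d_out = Z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i ; z_j]) = LeakyReLU(a_left^T z_i + a_right^T z_j)
    e_src = Z @ a[:d_out]                        # contribution of z_i, shape (N,)
    e_dst = Z @ a[d_out:]                        # contribution of z_j, shape (N,)
    e = leaky_relu(e_src[:, None] + e_dst[None, :])
    # Softmax over the neighbors of each node: mask out non-edges first.
    e = np.where(A > 0, e, -np.inf)
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return np.maximum(0, alpha @ Z)              # sigma(sum_j alpha_ij z_j), sigma = ReLU

# Tiny usage example with random (untrained) weights, shapes only.
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
H = np.random.rand(4, 3)
W = np.random.rand(3, 8)
a = np.random.rand(2 * 8)
print(gat_layer(A, H, W, a).shape)  # -> (4, 8)
```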

Β© Raideno.