data science · graphs · ai · convolution

Graphs and Graph Neural Networks

Learn the basics of Graph Neural Networks (GNNs) and how they process graph-structured data. Explore GCNs, R-GCNs, and GATs with clear explanations and formulas.
4/15/2025 • 5 minutes

Graph Neural Networks (GNNs)

Graph Neural Networks (GNNs) are neural networks designed to operate directly on graph-structured data. They are powerful because many real-world systems can be naturally represented as graphs.

Examples of things that can be modeled as graphs include:

  • Social networks (users as nodes, friendships as edges).
  • Molecules (atoms as nodes, chemical bonds as edges).
  • Knowledge graphs (entities as nodes, relations as edges).
  • Transportation networks (stations as nodes, routes as edges).
  • Recommendation systems (users and items as nodes, interactions as edges).
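
To make this concrete, here is a minimal sketch of how such a graph can be represented numerically. The variable names (`A` for the adjacency matrix, `X` for node features) and the use of NumPy are assumptions made for illustration, not a fixed convention.

```python
import numpy as np

# 4 users (nodes) in a tiny social network; an edge means "is friends with".
# Adjacency matrix A: A[i, j] = 1 if there is an edge between nodes i and j.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

# Node feature matrix X: one row per node (here 3 arbitrary features per user).
X = np.random.rand(4, 3)

# The neighbors of node 0 are the nonzero entries of its row in A.
print(np.nonzero(A[0])[0])  # -> [1 2]
```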

In this blog post, we will focus on GNNs that work with graphs having a static structure and static features.

In st-gcn-introduction, we explore GNNs that handle a static structure but dynamic features, i.e., time-series graphs.

GNNs are also often referred to as Graph Convolutional Networks (GCNs), since they rely on a mechanism similar to Convolutional Neural Networks (CNNs).

Graph Convolutional Networks (GCNs)

The most basic type of GNN is the Graph Convolutional Network, which can be described by the following update rule:

$$h_i^{(l+1)} = \sigma \left( \sum_{j \in N(i)} \frac{1}{C_{ij}} \, h_j^{(l)} W^{(l)} \right)$$

Where:

  • $h_i^{(l)}$ is the feature vector of node $i$ at layer $l$.
  • $\sigma$ is a non-linear activation function.
  • $N(i)$ is the set of neighbors of node $i$.
  • $C_{ij}$ is a normalization constant (often based on node degrees).
  • $W^{(l)}$ is the learnable weight matrix at layer $l$.

Applying this operation to all nodes in the graph constitutes one GCN layer.
Stacking $L$ such layers allows information to propagate across the graph, so that after $L$ layers, each node's representation incorporates information from nodes up to $L$ hops away.
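
As a rough illustration of this update rule, the sketch below implements one GCN layer in NumPy. The choice of ReLU as $\sigma$, node degree as the normalization constant $C_{ij}$, and the name `gcn_layer` are assumptions made for the example, not the only possible choices.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def gcn_layer(A, H, W):
    """One GCN layer (illustrative sketch).

    A: (N, N) adjacency matrix, H: (N, d_in) node features at layer l,
    W: (d_in, d_out) learnable weights. C_ij is taken as the degree of
    node i, so the aggregation is a simple mean over neighbors.
    """
    deg = A.sum(axis=1, keepdims=True)        # number of neighbors of each node
    H_agg = (A @ H) / np.maximum(deg, 1)      # sum_j (1 / C_ij) * h_j
    return relu(H_agg @ W)                    # sigma( ... W )

# Stacking layers: after two calls each node sees information from 2-hop neighbors.
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
H = np.random.rand(4, 3)
W1, W2 = np.random.rand(3, 8), np.random.rand(8, 8)
H2 = gcn_layer(A, gcn_layer(A, H, W1), W2)
print(H2.shape)  # -> (4, 8)
```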

Relational Graph Convolutional Networks (R-GCNs)

When graphs contain multiple types of edges (relations), we use Relational GCNs (R-GCNs). Their update rule is:

$$h_i^{(l+1)} = \sigma \left( h_i^{(l)} W_0^{(l)} + \sum_{r \in R} \sum_{j \in N_i^r} \frac{1}{C_{i,r}} \, h_j^{(l)} W_r^{(l)} \right)$$

Where:

  • $R$ is the set of relation (edge) types.
  • $N_i^r$ is the set of neighbors of node $i$ connected via relation $r$.
  • $W_r^{(l)}$ is the weight matrix for relation type $r$ at layer $l$.
  • $W_0^{(l)}$ is a self-loop weight matrix, allowing a node to preserve and transform its own features.

Here, each relation type has its own weight matrix, since different edge types carry different semantic meanings.
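
A minimal sketch of this rule is given below, assuming one adjacency matrix per relation, degree-based normalization for $C_{i,r}$, and ReLU as $\sigma$; the names `rgcn_layer`, `A_per_relation`, and so on are illustrative only.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def rgcn_layer(A_per_relation, H, W_per_relation, W_self):
    """One R-GCN layer: one adjacency and one weight matrix per relation type.

    A_per_relation: list of (N, N) adjacency matrices, one per relation r in R.
    W_per_relation: list of matching (d_in, d_out) weight matrices W_r.
    W_self: (d_in, d_out) self-loop weight matrix W_0.
    C_{i,r} is taken as the number of r-neighbors of node i.
    """
    out = H @ W_self                                        # self-loop term: h_i W_0
    for A_r, W_r in zip(A_per_relation, W_per_relation):
        deg_r = A_r.sum(axis=1, keepdims=True)              # |N_i^r| for each node i
        out = out + (A_r @ H) / np.maximum(deg_r, 1) @ W_r  # relation-specific aggregation
    return relu(out)

# Example with two relation types on a 4-node graph.
A1 = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
A2 = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
H = np.random.rand(4, 3)
Ws = [np.random.rand(3, 8) for _ in range(2)]
W0 = np.random.rand(3, 8)
print(rgcn_layer([A1, A2], H, Ws, W0).shape)  # -> (4, 8)
```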

Graph Attention Networks (GATs)

Graph Attention Networks introduce an attention mechanism to learn the importance of neighboring nodes, instead of simply averaging their features. The update rule is:

$$h_i^{(l+1)} = \sigma \left( \sum_{j \in N(i)} \alpha_{ij}^{(l)} \, h_j^{(l)} W^{(l)} \right)$$

Where:

  • $\alpha_{ij}^{(l)}$ is the attention coefficient indicating the importance of node $j$'s features to node $i$.

Computing Attention Coefficients

The attention coefficients $\alpha_{ij}^{(l)}$ can be computed in different ways:

1. Simple dot-product attention (no learning):

$$\alpha_{ij}^{(l)} = h_i^{(l)} \cdot h_j^{(l)}$$

2. Learnable attention mechanism:

$$\alpha_{ij}^{(l)} = \text{Softmax}_{k \in N(i)} \left( e_{ij}^{(l)} \right), \quad e_{ij}^{(l)} = \text{LeakyReLU} \left( a^\top [z_i^{(l)} \, ; \, z_j^{(l)}] \right)$$

with

$$z_i^{(l)} = W^{(l)} h_i^{(l)}$$

Where:

  • $a^\top$ is a learnable attention vector.
  • $[z_i^{(l)} \, ; \, z_j^{(l)}]$ denotes the concatenation of the two vectors.

This mechanism allows the network to adaptively focus on the most relevant neighbors for each node.
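
Putting the two formulas together, here is a sketch of one single-head GAT layer with the learnable attention mechanism. Masking non-neighbors with $-\infty$ before the softmax and using ReLU as $\sigma$ are implementation choices assumed for the example, and it presumes every node has at least one neighbor; the name `gat_layer` is illustrative only.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(A, H, W, a):
    """One single-head GAT layer with learnable attention (illustrative sketch).

    A: (N, N) adjacency matrix (A[i, j] = 1 if j is a neighbor of i),
    H: (N, d_in) node features, W: (d_in, d_out), a: (2 * d_out,) attention vector.
    Assumes every node has at least one neighbor.
    """
    Z = H @ W                                    # z_i = W h_i
    d_out = Z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i ; z_j]) = LeakyReLU(a_left^T z_i + a_right^T z_j)
    e_src = Z @ a[:d_out]                        # contribution of z_i, shape (N,)
    e_dst = Z @ a[d_out:]                        # contribution of z_j, shape (N,)
    e = leaky_relu(e_src[:, None] + e_dst[None, :])
    # Softmax over the neighbors of each node: mask out non-edges first.
    e = np.where(A > 0, e, -np.inf)
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return np.maximum(0, alpha @ Z)              # sigma(sum_j alpha_ij z_j), sigma = ReLU

# Tiny usage example with random (untrained) weights, shapes only.
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
H = np.random.rand(4, 3)
W = np.random.rand(3, 8)
a = np.random.rand(2 * 8)
print(gat_layer(A, H, W, a).shape)  # -> (4, 8)
```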

Β© Raideno.