Topic 13: Graph Neural Networks Flashcards

(22 cards)

1
Q

Is real-world data often euclidean?

A

No, it is not. There is no single, natural distance that describes real-world data such as social networks or molecules; real-world data is irregular.
Non-Euclidean data lacks the regular grid structure that traditional neural network architectures require.

2
Q

Name the 3 geometric priors

A

Symmetry: for a stable representation, an ML/DL component should be invariant and/or equivariant, i.e. the transformations handled by such architectures are treated with invariance, equivariance, or both.
Geometric stability: around the stable symmetry group there is a measurable corridor of acceptable deformations, which we can quantify; within it we know that such distortions will not change the results.
Scale separation: the DL architecture is able to identify, extract and separate different scales of information. This can even lead to a multi-scale representation (different scales of resolution are extracted). Traditional tools like Fourier transforms capture global frequency content, while wavelets provide localised, multi-scale representations. If a neural network architecture supports this, it can more effectively model complex data with hierarchical structure, such as images, audio, or graphs.

3
Q

Name some architectures that follow the geometric priors

A

These geometric priors are fulfilled in the following architectures:
CNNs fulfil translation invariance and equivariance:
- Data lives on a regular grid
- Information flows through local neighbourhoods, so there is an inherent structure (a neighbouring pixel is related to the current one)
- CNNs are typically used in computer vision

RNNs and LSTMs allow time warping as a symmetry (they can treat two different time scales as equivalent):
- Data is an ordered sequence (a 1D grid)
- Information flows sequentially, reflecting the time characteristic of RNNs
- The time aspect is used in language processing

Transformers with attention have permutation invariance and equivariance, depending on how we define the task:
- Data is an unordered set, or a chain of transformations between unordered sets
- Information flows dynamically, controlled by the network itself (keys and queries)
- Used e.g. in language processing, where both invariance and equivariance may be needed in the same task

Graph Neural Networks also come with permutation invariance or equivariance, depending on how we define the graph (see the sketch after this list):
- Data is defined as a fixed graph structure
- Information flows along the fixed edges; the edges are meaningful in the structure, and so are the nodes, which carry specific vector embeddings
- Used in recommender systems or molecule prediction
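
A tiny numpy sketch of the permutation symmetries mentioned for Transformers and GNNs (all names and weights here are illustrative): a shared node-wise map is permutation-equivariant, while a global sum readout is permutation-invariant.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))           # 5 nodes, 3 features each
P = np.eye(5)[rng.permutation(5)]     # a random permutation of the nodes
W = rng.normal(size=(3, 3))           # weights shared across all nodes

def node_map(X):                      # applied identically to every node
    return np.tanh(X @ W)

def readout(X):                       # order-independent aggregation
    return X.sum(axis=0)

# Equivariance: permuting the nodes permutes the outputs in the same way.
assert np.allclose(node_map(P @ X), P @ node_map(X))
# Invariance: the global readout does not change at all under permutation.
assert np.allclose(readout(P @ X), readout(X))
```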

4
Q

What is the geometric deep learning blueprint?

A

The blueprint is the set of building blocks for learning stable representations:
(local) equivariant maps
(global) invariant maps
nonlinearity
coarsening operators
All of these can be composed together to create a GNN:
equivariant layer = computes node-wise features
local pooling layer = graph coarsening
equivariant layer = computes node-wise features
invariant global pooling = global readout
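
A minimal numpy sketch of the blueprint on a toy graph (weights, sizes and the node pairing used for coarsening are illustrative assumptions, not from the deck):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],            # adjacency of a toy 4-node graph
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)
X = rng.normal(size=(4, 3))             # node features
W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 8))

def equivariant_layer(A, X, W):
    # node-wise features from self + neighbours (permutation-equivariant)
    return np.tanh((X + A @ X) @ W)

def coarsen(A, X, clusters):
    # local pooling: merge nodes cluster-by-cluster (graph coarsening)
    S = np.zeros((len(clusters), X.shape[0]))
    for i, c in enumerate(clusters):
        S[i, c] = 1.0 / len(c)
    return (S @ A @ S.T > 0).astype(float), S @ X

H = equivariant_layer(A, X, W1)            # node-wise features
A2, H = coarsen(A, H, [[0, 1], [2, 3]])    # coarsened graph
H = equivariant_layer(A2, H, W2)           # node-wise features again
graph_vec = H.mean(axis=0)                 # invariant global readout
print(graph_vec.shape)                     # (8,) — one vector for the graph
```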

5
Q

Give a GNN example

A

Say we have an image; that image can be converted to a graph, whose values could be:
V = colour values (pixels)
E = connectivity grid (Euclidean 2D grid)
U = empty, or the “class”

Transformations then work much like in a CNN: an arbitrary stack of graph blocks in between maps one graph to another.
On top of that we can put a classification layer to get the prediction, i.e. the global prediction (classifying images: a grid of pixels as a graph with a global prediction).

6
Q

What is a Graph Neural Network (GNN)?

A

The U, V, E representations are the inherent embeddings.
Layer n has:
U_n = universal (global) attributes, e.g. number of nodes, longest path
V_n = vertex or node attributes, e.g. number of neighbours
E_n = edge or link attributes, e.g. edge weight

The next layer, n+1, produced by the update function f, has:
U_{n+1} = universal attributes for layer n+1
V_{n+1} = vertex or node attributes for layer n+1
E_{n+1} = edge or link attributes for layer n+1

The input and output can share the same representation (e.g. an adjacency list), and an arbitrary number of graph-transformation layers can be stacked.
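
A minimal sketch of one such layer in its simplest form, assuming each attribute set (U, V, E) is updated independently by its own function f (all sizes and weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(d_in, d_out):
    W = rng.normal(size=(d_in, d_out))
    return lambda x: np.tanh(x @ W)

f_V, f_E, f_U = mlp(3, 4), mlp(2, 4), mlp(5, 4)   # one update fn per attribute type

V = rng.normal(size=(6, 3))   # 6 node embeddings
E = rng.normal(size=(9, 2))   # 9 edge embeddings
U = rng.normal(size=(5,))     # one global (universal) embedding

# Layer n -> n+1: each attribute set is updated by its own function f.
V_next, E_next, U_next = f_V(V), f_E(E), f_U(U)
print(V_next.shape, E_next.shape, U_next.shape)   # (6, 4) (9, 4) (4,)
```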

7
Q

What is the task of graph classification?

A

It is the most straightforward of the graph prediction tasks.
Graph classification is a machine learning task where the model predicts overall properties of an entire graph structure.
- Input: A whole graph (e.g., a molecule)
- Goal: Predict a label or category for the entire graph

Examples:
- “Does this molecule smell?”
- “Does the graph contain two rings?”

Key operations:
- Pooling: Combines node embeddings into one graph-level vector
- Readout: Generates the final classification from the pooled vector

Analogy: Like image or document classification, but on graph-structured data

A practical example is predicting molecular properties like odor. Graphs that look similar may behave differently, and even small structural tweaks can completely change the outcome. So, GNNs must learn to capture subtle, task-relevant patterns, not just global similarity.

8
Q

What is the task of edge/link classification?

A

Goal: To predict/classify links, e.g. by connection strength or with a label. This is important for relation discovery/completion.

How: Use message passing/graph convolution to generate embeddings for all nodes.

Example: In a scene graph, classify the relationship between people and objects in an image (e.g., “watching”, “standing on”, “holding”).

Key concept: An edge’s label depends on both the features of the connected nodes and their local graph context, it’s about relationships, not just individuals.

Analogy: Like predicting relationships in a sentence (“subject–verb–object”) or connections in a network diagram, it’s about understanding how parts relate, not just what the parts are.
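
A minimal sketch of the “How” above, assuming node embeddings have already been produced by message passing; each candidate edge is scored from the concatenation of its two endpoint embeddings (a common choice, not the only one). All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(10, 8))            # node embeddings from message passing
edges = [(0, 3), (2, 7), (4, 5)]        # candidate links (u, v)
W = rng.normal(size=(16, 3))            # classifier over 3 relation labels

def edge_logits(H, u, v):
    # An edge's label depends on both endpoint embeddings (their local graph
    # context is already baked into H by the GNN layers).
    pair = np.concatenate([H[u], H[v]])
    return pair @ W

for u, v in edges:
    scores = edge_logits(H, u, v)
    print((u, v), "predicted relation:", int(np.argmax(scores)))
```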

9
Q

What is the task of node classification?

A

Node classification predicts the role or identity of each individual node in a graph using neighborhood structure and feature propagation.

  • Goal: Assign a label to each node based on its local context
  • How: Uses deep information propagation (e.g., message passing or graph convolution)

Example:
- In a social network, predict group membership
- In language, classify part-of-speech tags using word co-occurrence graphs

Key Concept: A node’s label often depends on its neighbors

Analogy: Like image segmentation (pixel labels) or POS tagging (word labels)

Case Example: Karate Club – predict which group (John A or Mr. Hi) a member joins

The point is to have a deep propagation of information (perhaps also in pooling mechanisms).

10
Q

How do you do GNN predictions using pooling information?

A

We want to produce an overall classification prediction
- Often done using a simple linear classifier applied to the final node or graph embeddings.
- Can use dense or task-constrained layers depending on the model architecture and objective.

The output could be:
- Node-level predictions or graph-level predictions

Pooling information on the graphs:
Pooling is how we combine embeddings from nodes, edges or other graph parts into a unified representation
- pooling might be very specific to a task
- We gather the embeddings for the pooled items (V/E/U) of interest into a concatenated matrix
- Then we aggregate the gathered embeddings
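
A small numpy sketch of this gather-then-aggregate pooling, plus the two kinds of prediction heads (all sizes and weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(6, 4))        # node embeddings
E = rng.normal(size=(7, 4))        # edge embeddings

# Gather: stack the embeddings of the items we want to pool over...
gathered = np.vstack([V, E])       # concatenated matrix of pooled items
# ...then aggregate them into a single graph-level vector.
graph_embedding = gathered.mean(axis=0)

# Node-level head vs. graph-level head on top of the pooled information:
W_node, W_graph = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
node_logits  = V @ W_node                       # one prediction per node
graph_logits = graph_embedding @ W_graph        # one prediction per graph
print(node_logits.shape, graph_logits.shape)    # (6, 3) (3,)
```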

11
Q

What are some graph tasks?

A

Node classification: we can infer some characteristic of the nodes; we can also integrate the structure, i.e. how the nodes are connected to each other
Link prediction: on the edges, we can identify new labels or new links between nodes
Graph classification: we can infer properties of e.g. molecules, for example whether they are toxic
Influence maximisation: we can infer specific structural characteristics, e.g. a node's degree relative to others, and possibly specific embeddings or connection strengths between nodes
Node clustering: we can look at the connectivity from another perspective and try to identify clusters of strongly connected nodes, or nodes connected with specific properties

12
Q

What are graph embeddings?

A

The main challenge of using graphs in machine learning is that graphs lack a natural, fixed representation.
- Graphs can have arbitrary size and structure, making standard inputs like vectors or grids inapplicable.
- Representing them with adjacency matrices is often space-inefficient, especially for large or sparse graphs.
- Furthermore, different node orderings can produce different matrix representations of the same graph, leading to inconsistent outputs if the model isn’t permutation-invariant.

Graph embeddings address this: they are learned, fixed-size vector representations of nodes, edges or whole graphs that preserve structural information and can be fed to standard ML models.
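
A quick numpy illustration of the node-ordering problem mentioned above: relabelling the nodes of the same graph gives a different adjacency matrix, even though graph properties such as the degrees stay the same.

```python
import numpy as np

A = np.array([[0, 1, 0],       # path graph 0-1-2 under one node ordering
              [1, 0, 1],
              [0, 1, 0]])
P = np.eye(3)[[2, 0, 1]]       # relabel (permute) the nodes
A_permuted = P @ A @ P.T       # same graph, different node ordering

print(np.array_equal(A, A_permuted))                            # False: matrices differ
print(sorted(A.sum(axis=0)) == sorted(A_permuted.sum(axis=0)))  # True: degrees agree
```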

13
Q

How do you do GNN predictions using pooling information for special cases?

A

We may, for example, only have node-level features but want to predict binary edge information; we then integrate (pool) node information into the edges. Conversely, we may have edge-level features and want to predict binary node information.
- Pooling is used to route the information to where it needs to go

14
Q

How does convolution on graphs work?

A

We use polynomial filters
Spectral-based: Based on graph signal processing
- Uses eigendecomposition of the graph Laplacian
- Filters are defined in the frequency domain
- Purpose: remove noise from graph signals
- Think: Like applying a low-pass filter on a graph

Spatial-based: More common in modern GNNs
- Operates directly on the node neighborhood
- Each node updates its representation by aggregating information from its neighbors
- Filters act via information propagation distance (e.g., 1-hop, 2-hop)
- Think: Like combining info from your friends and friends-of-friends

15
Q

Describe spectral convolution in a nutshell

A

We want to leverage the inherent structure in the graph to perform more meaningful and effective convolution operations.

Core Idea: Leverages the graph’s structure (via the Laplacian) to define convolution.
Operates in the frequency domain (like Fourier transforms for graphs).

Key Concept: Graph Laplacian 𝐿:
- Used to quantify how smooth a feature vector 𝑥 is across the graph 𝐺
- Smoothness = how much 𝑥 varies between connected nodes
- A smooth signal means similar values on neighboring nodes

Process Overview:
- Start in spatial domain: The graph is represented via adjacency or degree matrices
- Transform to spectral domain: Apply eigendecomposition of the Laplacian
𝐿 = 𝑈 Λ 𝑈^𝑇
(analogous to Fourier basis)
- Filter in spectral domain: Modify the signal (e.g., suppress noise or high frequencies)
- Transform back: Apply inverse transform to return to spatial domain

Spectral graph convolution transforms node features into the graph’s frequency space using the Laplacian, applies a filter there, and maps the result back — enabling control over signal smoothness and capturing global structure.
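
A numpy sketch of this pipeline on a toy graph, with a simple low-pass filter as the illustrative choice of spectral filter (the cutoff is an assumption for the example):

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],         # toy undirected graph
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], float)
D = np.diag(A.sum(axis=1))
L = D - A                               # graph Laplacian

lam, U = np.linalg.eigh(L)              # L = U diag(lam) U^T  (graph Fourier basis)
x = np.array([1.0, 1.2, 0.9, -1.0, -1.1])   # a signal on the nodes

x_hat = U.T @ x                         # forward transform (spatial -> spectral)
low_pass = lam <= lam.mean()            # keep only the low-frequency components
x_filtered = U @ (x_hat * low_pass)     # filter, then inverse transform

print(np.round(x_filtered, 2))          # smoothed signal, back in the spatial domain
```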

16
Q

What is the Graph Laplacian and why is it useful in GNNs?

A

The Graph Laplacian is a matrix that captures the structure of a graph and is used to measure the smoothness of signals over it.

Defined as 𝐿 = 𝐷 - 𝐴, where:
𝐷: degree matrix
𝐴: adjacency matrix

  • The normalised version helps with scale and interpretability
  • Symmetric for undirected graphs
  • Used in spectral methods to:
    Perform Fourier-like transforms on graphs
    Filter signals (e.g., remove noise, measure variation)
    Support spectral convolution in GNNs

The Laplacian’s eigenvectors act as a basis for smooth functions over the graph.
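
A short numpy check of the smoothness idea: for L = D - A, the quadratic form x^T L x sums the squared differences across edges, so smooth signals score low. The normalised variant mentioned above is shown at the end.

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], float)      # path graph 0-1-2
D = np.diag(A.sum(axis=1))
L = D - A                             # L = D - A

smooth = np.array([1.0, 1.1, 1.2])    # similar values on neighbouring nodes
bumpy  = np.array([1.0, -1.0, 1.0])   # alternating values

# x^T L x = sum over edges of (x_i - x_j)^2, i.e. a smoothness measure.
print(smooth @ L @ smooth)            # small (~0.02)
print(bumpy  @ L @ bumpy)             # large (8.0)

# Normalised version, which helps with scale and interpretability:
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_norm = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt
```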

17
Q

What is message passing in GNNs?

A

The basic approach is that each node updates itself by pooling info from neighbors. But stacking too many layers can cause the model to lose useful detail, so depth is limited when using only pooling.

Therefore we also add message passing, so we expand the spatial convolution:
1. We apply a specific information transformation to all the edges and nodes; we try to learn more about what happens in the intermediate layers
2. We also integrate some universal information, i.e. a global context vector acting as a master node. This makes the network’s updates aware of the global structure and not just the local neighbours

The advanced approach enhances message passing by combining local neighborhood features with a global context vector, enabling more expressive and globally-informed node updates.

Step 1: gather the embeddings from all the neighbouring nodes of the given node (these are the messages)
Step 2: Then we combine the gatherings, using an aggregation function, e.g. a sum
Step 3: Update the node’s embedding using the aggregated message and possibly the node’s previous embedding; the result is the new node embedding
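
A minimal numpy sketch of the three steps, using sum aggregation and a simple linear-plus-tanh update (weights and sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],             # adjacency of a toy graph
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], float)
H = rng.normal(size=(4, 5))             # current node embeddings
W_self, W_msg = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))

def message_passing_step(A, H):
    new_H = np.zeros_like(H)
    for v in range(A.shape[0]):
        neighbours = np.nonzero(A[v])[0]
        messages = H[neighbours]                # step 1: gather neighbour embeddings
        aggregated = messages.sum(axis=0)       # step 2: aggregate (here: sum)
        new_H[v] = np.tanh(H[v] @ W_self + aggregated @ W_msg)   # step 3: update
    return new_H

H = message_passing_step(A, H)          # one round; stack several for deeper propagation
print(H.shape)                          # (4, 5)
```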

18
Q

Name the advanced GNNs and how they work

A

Graph Convolutional Networks (GCN): consider a node embedding h of, e.g., an input node. The node embedding changes through a transformation of the graph, given by a specific function we define.
A GCN updates each node by averaging the features of its neighbours, applying a learned weight matrix, and combining the result with its own previous features, layer by layer (see the sketch after this card’s answer).
This is like saying: Each node listens to its neighbors, blends their opinions, and slightly adjusts its own.

GraphSAGE:
Step 1: Sample from the neighbourhood of each node, e.g. a fixed-size sample such as 3 neighbours per node
Step 2: Aggregate all the neighbours’ feature information. We could use the sum, but we could also use the mean
Step 3: Predict the graph’s context and labels

Graph Attention Networks (GAT):
This is essentially the attention mechanism from the Transformer applied to graphs: a small network assigns importance weights to each of a node’s neighbours, and these weights are used when aggregating their features.
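
A numpy sketch of the GCN update described above, using the usual self-loop plus symmetric normalisation; weights and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], float)
H = rng.normal(size=(4, 6))                       # node features / embeddings h
W = rng.normal(size=(6, 8))                       # learned layer weights

def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])                # add self-loops: keep own features
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt      # normalised neighbour averaging
    return np.maximum(A_norm @ H @ W, 0.0)        # aggregate, transform, ReLU

H_next = gcn_layer(A, H, W)                       # "listen to neighbours, blend, adjust"
print(H_next.shape)                               # (4, 8)
```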

19
Q

What is the full architecture of GNNs/GCNs?

A

Node Classification:
1. Graph + node features X are the input
2. It is passed through multiple GCN layers
- Each layer aggregates neighbour features (message passing)
- Then we apply an activation function
3. The final output is a prediction for each node
Think: “Each node is updated layer-by-layer using its neighborhood → then we classify each node individually.”

Graph classification:
1. Graph + node features X are the input
2. Pass it through one or more of the graph convolutional layers
3. We apply a pooling operation
- This reduces the graph by summarising local node groups
4. Then we apply a readout operation (like a sum or a mean)
- We convert all the node embeddings into a single vector which represents the entire graph
5. This vector goes into an MLP + softmax to predict the graph-level label

Think: “First learn node features → combine them into one vector → classify the entire graph.”
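
Putting the graph-classification path together in one compact, illustrative sketch (two GCN-style layers, a mean readout, and a linear layer with softmax; all weights are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

def classify_graph(A, X, W1, W2, W_out):
    H = gcn_layer(A, X, W1)          # learn node features (message passing)
    H = gcn_layer(A, H, W2)
    g = H.mean(axis=0)               # readout: one vector for the whole graph
    logits = g @ W_out               # classification head (single linear layer here)
    return np.exp(logits) / np.exp(logits).sum()   # softmax over graph labels

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
X = rng.normal(size=(3, 4))
W1, W2, W_out = rng.normal(size=(4, 8)), rng.normal(size=(8, 8)), rng.normal(size=(8, 2))
print(classify_graph(A, X, W1, W2, W_out))   # probabilities for 2 graph classes
```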

20
Q

What is a Graph AutoEncoder (GAE)?

A

A Graph AutoEncoder (GAE) is an unsupervised learning model designed to learn node embeddings or latent representations from graph-structured data. It is inspired by the classical autoencoder architecture but adapted to operate on graphs.

For an autoencoder, recall that all we need is an encoder and a decoder mapping an input x to an output (which could be a target y or x itself), so that we end up with a specific latent representation in between.
Ideally this representation is a bottleneck, which we can then use for a downstream task or as an input for generation, giving us a latent representation space that we can indeed sample from.

What the GAE adds to the AE is more complexity in the representation: the structured graph signal is inherently divided into components, and we apply convolution or pooling to whichever of those components holds the information we want to extract. The decoder could e.g. be a deconvolution.
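
A sketch of the idea, assuming the common GAE formulation (Kipf & Welling) with a GCN-style encoder and an inner-product decoder that reconstructs the adjacency matrix; the deconvolution decoder mentioned above is another option. Sizes and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)
X = rng.normal(size=(4, 5))
W = rng.normal(size=(5, 2))                        # encode into a 2-d latent space

def encode(A, X, W):                               # encoder: one GCN-style layer
    A_hat = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W

def decode(Z):                                     # decoder: inner product + sigmoid
    return 1.0 / (1.0 + np.exp(-Z @ Z.T))

Z = encode(A, X, W)                                # latent node representations (bottleneck)
A_reconstructed = decode(Z)                        # predicted edge probabilities
print(np.round(A_reconstructed, 2))
```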

21
Q

Explain AlphaFold2

A

AlphaFold2 is a deep learning model by DeepMind that predicts the 3D structure of proteins from their amino acid sequence. It uses:

  1. Database search for similar sequences
  2. Sequence alignment to find co-evolving residues
  3. A Transformer-based Evoformer to model residue interactions
  4. Iterative structure generation to output a likely 3D protein shape

It solves a 50-year biology challenge, enabling faster, cheaper, and highly accurate structure predictions, critical for biology, medicine, and drug design.

22
Q

Explain the mechanism of message passing in graph neural networks (in a form of choice)

A

The goal of GraphSAGE is to learn a representation h for every node, based on some combination of its neighbouring nodes’ representations.

GraphSAGE:
Step 1: Sample from the neighbourhood of each node, e.g. a fixed-size sample such as 3 neighbours per node
Step 2: Aggregate all the neighbours’ feature information. We could use the sum, but we could also use the mean
Step 3: Predict the graph’s context and labels

Each node sends and receives messages to and from its neighbors, aggregates these messages, and updates its feature based on the result.
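
A minimal numpy sketch of a GraphSAGE-style update as described: sample a fixed number of neighbours, mean-aggregate their features, concatenate with the node's own embedding, and apply learned weights (the sample budget, sizes and weights are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 0, 0, 1],
              [1, 1, 0, 0, 1],
              [0, 1, 1, 1, 0]], float)
H = rng.normal(size=(5, 4))                 # current node embeddings h
W = rng.normal(size=(8, 4))                 # weights over [self || aggregated]

def sage_update(A, H, W, num_samples=3):
    new_H = np.zeros_like(H)
    for v in range(A.shape[0]):
        neigh = np.nonzero(A[v])[0]
        # Step 1: sample a fixed-size neighbourhood
        sampled = rng.choice(neigh, size=min(num_samples, len(neigh)), replace=False)
        # Step 2: aggregate the sampled neighbours' features (mean here)
        agg = H[sampled].mean(axis=0)
        # Step 3: combine with the node's own embedding and transform
        new_H[v] = np.tanh(np.concatenate([H[v], agg]) @ W)
    return new_H

print(sage_update(A, H, W).shape)           # (5, 4)
```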