Graph Neural Networks (GNN) Flashcards
GNN pooling
- if node info is missing, pool incident edge info instead (and vice versa for missing edge info)
- aggregation - usually sum()
- global pooling is analogous to CNN global average pooling
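A minimal NumPy sketch of both ideas (function names here are illustrative, not from any library):

```python
import numpy as np

def pool_edges_to_nodes(edge_feats, edges, num_nodes):
    """If node features are missing, sum the features of incident edges."""
    node_feats = np.zeros((num_nodes, edge_feats.shape[1]))
    for (u, v), e in zip(edges, edge_feats):
        node_feats[u] += e  # each endpoint receives the edge's features
        node_feats[v] += e
    return node_feats

def global_pool(node_feats):
    """Graph-level readout, analogous to CNN global average pooling."""
    return node_feats.mean(axis=0)

edges = [(0, 1), (1, 2)]
edge_feats = np.array([[1.0, 0.0], [0.0, 2.0]])
print(global_pool(pool_edges_to_nodes(edge_feats, edges, num_nodes=3)))
```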
message passing between parts of graphs
pool neighboring info so nodes influence each other's updated embeddings
- gather all neighboring “messages”
- aggregate all messages via agg. function (like sum)
- messages passed through an update function
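One message-passing step as a NumPy sketch, assuming a dense adjacency matrix, sum aggregation, and a linear map + nonlinearity as the update function (all names hypothetical):

```python
import numpy as np

def message_passing_step(A, H, W):
    """One GNN layer. A: (n, n) adjacency, H: (n, d) embeddings, W: (d, d) weights."""
    messages = A @ H              # row i = sum of neighbor embeddings (gather + aggregate)
    return np.tanh(messages @ W)  # update function applied to aggregated messages

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.randn(3, 4)
W = np.random.randn(4, 4)
H_next = message_passing_step(A, H, W)
```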
XAI GNN Methods
- Gradient: use the gradient as an approximation of feature importance
- Perturbation: measure output variation under perturbations of the input (which attributes need to be kept for the prediction to hold)
- Surrogate: train a simpler, interpretable surrogate model on the neighborhood of the input
- Decomposition: decompose the prediction into several scores representing feature importance
- Generation: learn to generate graphs that achieve optimal prediction scores
Perturbation GNN method
measure output variation under perturbations of the input (which attributes need to be kept for the prediction to hold)
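A toy sketch of the idea; the `model` here is a stand-in one-layer GNN, not any particular published method:

```python
import numpy as np

def model(A, X):
    """Stand-in GNN: one propagation step followed by a mean readout."""
    return np.tanh(A @ X).mean()

def node_importance_by_perturbation(A, X):
    """Score each node by how much masking its features changes the output."""
    base = model(A, X)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        X_pert = X.copy()
        X_pert[i] = 0.0  # perturb: zero out node i's features
        scores[i] = abs(model(A, X_pert) - base)
    return scores        # high score = attribute worth keeping

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = np.random.randn(3, 2)
print(node_importance_by_perturbation(A, X))
```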
Surrogate GNN method
train a simpler, interpretable surrogate model on the neighborhood of the input
Decomposition GNN method
decompose the prediction into several scores representing feature importance
ordering GNN update
can be done in a weave fashion: node→node (linear), edge→edge (linear), edge→node (edge layer), node→edge (node layer)
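One possible reading of that weave order as a NumPy sketch (the weight matrices Wn, We, Wen, Wne are assumptions for illustration):

```python
import numpy as np

def weave_update(H, E, edges, Wn, We, Wen, Wne):
    """One weave-style layer. H: (n, d) node feats, E: (m, d) edge feats."""
    H = H @ Wn                           # node -> node (linear)
    E = E @ We                           # edge -> edge (linear)
    H_new, E_new = H.copy(), E.copy()
    for k, (u, v) in enumerate(edges):
        H_new[u] += E[k] @ Wen           # edge -> node
        H_new[v] += E[k] @ Wen
        E_new[k] += (H[u] + H[v]) @ Wne  # node -> edge
    return H_new, E_new

rng = np.random.default_rng(0)
H, E = rng.standard_normal((3, 4)), rng.standard_normal((2, 4))
Wn, We, Wen, Wne = (rng.standard_normal((4, 4)) for _ in range(4))
H1, E1 = weave_update(H, E, [(0, 1), (1, 2)], Wn, We, Wen, Wne)
```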
global representations
- nodes far apart cannot exchange info (k layers propagate information at most k steps)
- overcome this using a master node: a global context vector connected to all other nodes and edges
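A sketch of wiring in a master node by augmenting the adjacency matrix (initializing it as the mean of the node features is an assumption, not a fixed convention):

```python
import numpy as np

def add_master_node(A, H):
    """Append a master/global node connected to everything, so distant
    nodes can exchange information in a single propagation step."""
    n = A.shape[0]
    A_aug = np.zeros((n + 1, n + 1))
    A_aug[:n, :n] = A
    A_aug[n, :n] = 1.0  # master node -> every node
    A_aug[:n, n] = 1.0  # every node -> master node
    H_aug = np.vstack([H, H.mean(axis=0)])  # initialize master as mean of nodes
    return A_aug, H_aug

A = np.array([[0, 1], [1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0]])
A_aug, H_aug = add_master_node(A, H)
```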
multigraphs
multi-edge graphs where a pair of nodes can share multiple types of edges
nested graph
a hypernode graph where each node itself represents a graph
- useful for representing hierarchical info
- example - nodes represent molecules, with an edge between two molecules if there is a way (e.g. a reaction) of transforming one into the other
hypergraph
an edge can connect more than two nodes
- example - a hyper-edge that connects all nodes in a community
sampling/batching GNN
- poses a problem b/c context matters when sub-selecting nodes
method 1 (sketched after this card): to preserve the structure, randomly sample a uniform number of nodes (a node-set)
- then add neighboring nodes up to distance k, including their edges
- each neighborhood is treated as its own graph, and the GNN is trained on batches of these subgraphs
- mask so that only the original node-set contributes to the loss
method 2: randomly sample a single node, expand its neighborhood to distance k, then pick another node within the expanded set
- operations terminate once a certain number of edges or subgraphs has been constructed
- for constant-size neighborhoods: pick the initial node-set, then subsample a constant number of nodes (e.g. via a random walk or the Metropolis algorithm)
4 ways of sampling a graph
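A NumPy sketch of method 1 (seed count and hop depth k are illustrative parameters):

```python
import numpy as np

def sample_subgraph(A, num_seed, k, rng):
    """Sample a uniform node-set, then expand k hops to preserve structure."""
    n = A.shape[0]
    node_set = set(rng.choice(n, size=num_seed, replace=False))
    frontier = set(node_set)
    for _ in range(k):  # add neighboring nodes up to distance k
        frontier = {j for i in frontier for j in np.flatnonzero(A[i])}
        node_set |= frontier
    idx = sorted(node_set)
    # induced subgraph, including the edges between selected nodes;
    # in training, only the seed node-set would contribute to the loss (masking)
    return idx, A[np.ix_(idx, idx)]

rng = np.random.default_rng(0)
A = (rng.random((10, 10)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T  # symmetric, no self-loops
idx, A_sub = sample_subgraph(A, num_seed=3, k=1, rng=rng)
```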
inductive bias in graphs
- identify patterns in the data and add modeling components that take advantage of these attributes
- GNN should preserve explicit relationships (adjacency matrix)
- GNN should also preserve graph symmetries (permutation invariance)
- graph structure works great where interactions b/w entities are important
- GNNs should operate on sets → the order in which nodes and edges are processed should not matter
aggregation operation
- want similar inputs to produce similar aggregated outputs, and vice versa
- must take a variable number of inputs and give the same output regardless of their ordering
- sum, mean, max, variance
- mean() is good when you need normalized views
- max() is good when you want to highlight salient features
- sum() is a good balance of the two and is most often used in practice
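A quick check that these aggregators accept a variable number of inputs and ignore their ordering:

```python
import numpy as np

msgs = np.array([[1.0, 2.0], [3.0, 0.0], [0.5, 0.5]])  # messages from 3 neighbors
perm = msgs[[2, 0, 1]]                                  # same neighbors, reordered

for agg in (np.sum, np.mean, np.max, np.var):
    assert np.allclose(agg(msgs, axis=0), agg(perm, axis=0))  # order doesn't matter

print(np.sum(msgs, axis=0))   # sum: the common default in practice
print(np.mean(msgs, axis=0))  # mean: a normalized view of the neighborhood
print(np.max(msgs, axis=0))   # max: highlights salient features
```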
bag of subgraphs
graph dual
- edge and node predictions often reduce to the same problem: "an edge prediction task on a graph G can be phrased as a node-level prediction on G's dual"
- to obtain G's dual, convert edges to nodes (and nodes to edges)
- example - to solve edge classification on G, we can do graph convolutions on G's dual
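NetworkX exposes this edge↔node swap directly as `nx.line_graph` (the line graph is one standard formalization of this dual):

```python
import networkx as nx

G = nx.Graph([(0, 1), (1, 2), (2, 0)])  # a triangle
L = nx.line_graph(G)                    # each edge of G becomes a node of L

# an edge-classification task on G is now a node-classification task on L
print(list(L.nodes))  # nodes of L are labeled by the original edges of G
print(list(L.edges))  # L-nodes are adjacent if the original edges share an endpoint
```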
significance of matrix multiplication on a graph
- applying the multiplication multiple times propagates information further
- example - with A^2, entry (A^2)_ij = sum_k a_ik * a_kj is nonzero exactly when some node k connects i and j, so every node also receives information from its 2-hop neighbors
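Seeing the propagation in NumPy, using one-hot features on a path graph to make the flow visible:

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # path graph 0-1-2-3

X = np.eye(4)       # one-hot node features
print(A @ X)        # one multiplication: info from 1-hop neighbors
print(A @ A @ X)    # two multiplications: info reaches 2-hop neighbors
print(np.linalg.matrix_power(A, 2)[0, 2])  # (A^2)_{02} = 1: length-2 path 0->1->2
```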
graph attention network
- use weighted sum in permutation invariant fashion to communicate info between graph attributes
- method 1: a scalar scoring function assigns weights to node pairs (how relevant a neighboring node is to the center node)
- method 2: inner-product scoring, where nodes are first transformed into query and key vectors via a linear map (sketched below)
- scoring weights can be used as a measure of importance of an edge in relation to a task
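A minimal sketch of method 2: query/key inner-product scoring with a softmax over each node's neighborhood (all weight matrices are hypothetical):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_layer(A, H, Wq, Wk, Wv):
    """Weighted sum over neighbors; permutation invariant per neighborhood."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    H_out = np.zeros_like(V)
    for i in range(len(H)):
        nbrs = np.flatnonzero(A[i])
        scores = K[nbrs] @ Q[i]     # inner-product scoring (query vs keys)
        alpha = softmax(scores)     # alpha doubles as a per-edge importance measure
        H_out[i] = alpha @ V[nbrs]  # weighted sum of neighbor values
    return H_out

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
H = rng.standard_normal((3, 4))
Wq, Wk, Wv = (rng.standard_normal((4, 4)) for _ in range(3))
H_new = attention_layer(A, H, Wq, Wk, Wv)
```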
relationship b/w transformer and GNN
- the transformer models several elements (e.g. tokens) as nodes in a fully connected graph
- the attention mechanism assigns edge embeddings to each node pair, which are used to compute attention weights
- the difference: a GNN assumes a sparse connectivity pattern given by the graph, while the transformer attends over all pairs
GNN XAI
- explanations vary from graph to graph
- GNNExplainer extracts most relevant subgraph
- attribution assigns ranked importance values to parts of the graph
generative graph modeling
- method 1: generate new graphs by sampling from learned distributions or completing a graph given a starting point
- example - novel molecular graphs with desirable attributes
- example solution - GraphVAE learns connectivity by treating the adjacency matrix like an image
- method 2: build the graph sequentially, starting w/ a graph and repeatedly applying actions such as addition or removal of nodes and edges
node-order equivariance
- algorithms should not depend on the ordering of the nodes
- graphs have no inherent ordering among nodes (e.g. contrast with images, where pixels have coordinates)
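A quick equivariance check in NumPy: relabeling the nodes (via a permutation matrix P) permutes a propagation layer's output the same way:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = rng.standard_normal((3, 4))

P = np.eye(3)[[2, 0, 1]]   # permutation matrix: a node relabeling

layer = lambda A, H: A @ H  # a simple propagation layer
permute_first = layer(P @ A @ P.T, P @ H)
permute_after = P @ layer(A, H)
assert np.allclose(permute_first, permute_after)  # node-order equivariant
```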
problems that can be defined over graphs