Graph Neural Networks (GNN) Flashcards
GNN pooling
- if node info is missing, pool incident edge info instead (and vice versa for missing edge info)
- aggregation - usually sum()
- global pooling is analogous to CNN global average pooling
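A minimal NumPy sketch of both ideas (function names here are illustrative, not from any library):

```python
import numpy as np

def pool_edges_to_nodes(edge_feats, edges, num_nodes):
    """If node features are missing, sum the features of incident edges."""
    node_feats = np.zeros((num_nodes, edge_feats.shape[1]))
    for (u, v), e in zip(edges, edge_feats):
        node_feats[u] += e  # each endpoint receives the edge's features
        node_feats[v] += e
    return node_feats

def global_pool(node_feats):
    """Graph-level readout, analogous to CNN global average pooling."""
    return node_feats.mean(axis=0)

edges = [(0, 1), (1, 2)]
edge_feats = np.array([[1.0, 0.0], [0.0, 2.0]])
print(global_pool(pool_edges_to_nodes(edge_feats, edges, num_nodes=3)))
```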
message passing between parts of graphs
pool neighboring info so nodes influence each other's updated embeddings
- gather all neighboring “messages”
- aggregate all messages via agg. function (like sum)
- messages passed through an update function
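One message-passing step as a NumPy sketch, assuming a dense adjacency matrix, sum aggregation, and a linear map + nonlinearity as the update function (all names hypothetical):

```python
import numpy as np

def message_passing_step(A, H, W):
    """One GNN layer. A: (n, n) adjacency, H: (n, d) embeddings, W: (d, d) weights."""
    messages = A @ H              # row i = sum of neighbor embeddings (gather + aggregate)
    return np.tanh(messages @ W)  # update function applied to aggregated messages

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.randn(3, 4)
W = np.random.randn(4, 4)
H_next = message_passing_step(A, H, W)
```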
XAI GNN Methods
- Gradient: use the gradient as an approximation of feature importance
- Perturbation: measure output variation under perturbations of the input (which attributes need to be kept for the prediction to hold)
- Surrogate: train a simpler, interpretable surrogate model on the neighborhood of the input
- Decomposition: decompose the prediction into several scores representing feature importance
- Generation: learn to generate graphs that achieve optimal prediction scores
Perturbation GNN method
measure output variation under perturbations of the input (which attributes need to be kept for the prediction to hold)
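A toy sketch of the idea; the `model` here is a stand-in one-layer GNN, not any particular published method:

```python
import numpy as np

def model(A, X):
    """Stand-in GNN: one propagation step followed by a mean readout."""
    return np.tanh(A @ X).mean()

def node_importance_by_perturbation(A, X):
    """Score each node by how much masking its features changes the output."""
    base = model(A, X)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        X_pert = X.copy()
        X_pert[i] = 0.0  # perturb: zero out node i's features
        scores[i] = abs(model(A, X_pert) - base)
    return scores        # high score = attribute worth keeping

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = np.random.randn(3, 2)
print(node_importance_by_perturbation(A, X))
```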
Surrogate GNN method
train a simpler, interpretable surrogate model on the neighborhood of the input
Decomposition GNN method
decompose the prediction into several scores representing feature importance
ordering GNN update
can be done in a weave fashion: node→node (linear), edge→edge (linear), edge→node (edge layer), node→edge (node layer)
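One possible reading of that weave order as a NumPy sketch (the weight matrices Wn, We, Wen, Wne are assumptions for illustration):

```python
import numpy as np

def weave_update(H, E, edges, Wn, We, Wen, Wne):
    """One weave-style layer. H: (n, d) node feats, E: (m, d) edge feats."""
    H = H @ Wn                           # node -> node (linear)
    E = E @ We                           # edge -> edge (linear)
    H_new, E_new = H.copy(), E.copy()
    for k, (u, v) in enumerate(edges):
        H_new[u] += E[k] @ Wen           # edge -> node
        H_new[v] += E[k] @ Wen
        E_new[k] += (H[u] + H[v]) @ Wne  # node -> edge
    return H_new, E_new

rng = np.random.default_rng(0)
H, E = rng.standard_normal((3, 4)), rng.standard_normal((2, 4))
Wn, We, Wen, Wne = (rng.standard_normal((4, 4)) for _ in range(4))
H1, E1 = weave_update(H, E, [(0, 1), (1, 2)], Wn, We, Wen, Wne)
```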
global representations
- nodes far apart cannot exchange info (k layers propagate information at most k steps)
- overcome this using a master node: a global context vector connected to all other nodes and edges
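A sketch of wiring in a master node by augmenting the adjacency matrix (initializing it as the mean of the node features is an assumption, not a fixed convention):

```python
import numpy as np

def add_master_node(A, H):
    """Append a master/global node connected to everything, so distant
    nodes can exchange information in a single propagation step."""
    n = A.shape[0]
    A_aug = np.zeros((n + 1, n + 1))
    A_aug[:n, :n] = A
    A_aug[n, :n] = 1.0  # master node -> every node
    A_aug[:n, n] = 1.0  # every node -> master node
    H_aug = np.vstack([H, H.mean(axis=0)])  # initialize master as mean of nodes
    return A_aug, H_aug

A = np.array([[0, 1], [1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0]])
A_aug, H_aug = add_master_node(A, H)
```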
multigraphs
multi-edge graphs where a pair of nodes can share multiple types of edges
nested graph
a hypernode graph where each node itself represents a graph
- useful for representing hierarchical info
- example - nodes represent molecules, with an edge between two molecules if there is a way (e.g. a reaction) of transforming one into the other
hypergraph
an edge can connect more than two nodes
- example - a hyper-edge that connects all nodes in a community
sampling/batching GNN
- poses a problem b/c context matters when sub-selecting nodes
method 1 (sketched after this card): to preserve the structure, randomly sample a uniform number of nodes (a node-set)
- then add neighboring nodes up to distance k, including their edges
- each neighborhood is treated as its own graph, and the GNN is trained on batches of these subgraphs
- mask so that only the original node-set contributes to the loss
method 2: randomly sample a single node, expand its neighborhood to distance k, then pick another node within the expanded set
- operations terminate once a certain number of edges or subgraphs has been constructed
- for constant-size neighborhoods: pick the initial node-set, then subsample a constant number of nodes (e.g. via a random walk or the Metropolis algorithm)
4 ways of sampling a graph
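A NumPy sketch of method 1 (seed count and hop depth k are illustrative parameters):

```python
import numpy as np

def sample_subgraph(A, num_seed, k, rng):
    """Sample a uniform node-set, then expand k hops to preserve structure."""
    n = A.shape[0]
    node_set = set(rng.choice(n, size=num_seed, replace=False))
    frontier = set(node_set)
    for _ in range(k):  # add neighboring nodes up to distance k
        frontier = {j for i in frontier for j in np.flatnonzero(A[i])}
        node_set |= frontier
    idx = sorted(node_set)
    # induced subgraph, including the edges between selected nodes;
    # in training, only the seed node-set would contribute to the loss (masking)
    return idx, A[np.ix_(idx, idx)]

rng = np.random.default_rng(0)
A = (rng.random((10, 10)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T  # symmetric, no self-loops
idx, A_sub = sample_subgraph(A, num_seed=3, k=1, rng=rng)
```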
inductive bias in graphs
- identify patterns in the data and add modeling components that take advantage of these attributes
- GNN should preserve explicit relationships (adjacency matrix)
- GNN should also preserve graph symmetries (permutation invariance)
- graph structure works great where interactions b/w entities are important
- GNNs should operate on sets → the order in which nodes and edges are processed should not matter
aggregation operation
- want similar inputs to produce similar aggregated outputs, and vice versa
- must take a variable number of inputs and give the same output regardless of their ordering
- sum, mean, max, variance
- mean() is good when you need normalized views
- max() is good when you want to highlight salient features
- sum() is a good balance of the two and is most often used in practice
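A quick check that these aggregators accept a variable number of inputs and ignore their ordering:

```python
import numpy as np

msgs = np.array([[1.0, 2.0], [3.0, 0.0], [0.5, 0.5]])  # messages from 3 neighbors
perm = msgs[[2, 0, 1]]                                  # same neighbors, reordered

for agg in (np.sum, np.mean, np.max, np.var):
    assert np.allclose(agg(msgs, axis=0), agg(perm, axis=0))  # order doesn't matter

print(np.sum(msgs, axis=0))   # sum: the common default in practice
print(np.mean(msgs, axis=0))  # mean: a normalized view of the neighborhood
print(np.max(msgs, axis=0))   # max: highlights salient features
```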
bag of subgraphs
graph dual
- edge and node predictions often reduce to the same problem: "an edge prediction task on a graph G can be phrased as a node-level prediction on G's dual"
- to obtain G's dual, convert edges to nodes (and nodes to edges)
- example - to solve edge classification on G, we can do graph convolutions on G's dual
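NetworkX exposes this edge↔node swap directly as `nx.line_graph` (the line graph is one standard formalization of this dual):

```python
import networkx as nx

G = nx.Graph([(0, 1), (1, 2), (2, 0)])  # a triangle
L = nx.line_graph(G)                    # each edge of G becomes a node of L

# an edge-classification task on G is now a node-classification task on L
print(list(L.nodes))  # nodes of L are labeled by the original edges of G
print(list(L.edges))  # L-nodes are adjacent if the original edges share an endpoint
```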
significance of matrix multiplication on a graph
- applying the multiplication multiple times propagates information further
- example - with A^2, entry (A^2)_ij = sum_k a_ik * a_kj is nonzero exactly when some node k connects i and j, so every node also receives information from its 2-hop neighbors
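Seeing the propagation in NumPy, using one-hot features on a path graph to make the flow visible:

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # path graph 0-1-2-3

X = np.eye(4)       # one-hot node features
print(A @ X)        # one multiplication: info from 1-hop neighbors
print(A @ A @ X)    # two multiplications: info reaches 2-hop neighbors
print(np.linalg.matrix_power(A, 2)[0, 2])  # (A^2)_{02} = 1: length-2 path 0->1->2
```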
graph attention network
- use weighted sum in permutation invariant fashion to communicate info between graph attributes
- method 1: a scalar scoring function assigns weights to node pairs (how relevant a neighboring node is to the center node)
- method 2: inner-product scoring, where nodes are first transformed into query and key vectors via a linear map (sketched below)
- scoring weights can be used as a measure of importance of an edge in relation to a task
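A minimal sketch of method 2: query/key inner-product scoring with a softmax over each node's neighborhood (all weight matrices are hypothetical):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_layer(A, H, Wq, Wk, Wv):
    """Weighted sum over neighbors; permutation invariant per neighborhood."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    H_out = np.zeros_like(V)
    for i in range(len(H)):
        nbrs = np.flatnonzero(A[i])
        scores = K[nbrs] @ Q[i]     # inner-product scoring (query vs keys)
        alpha = softmax(scores)     # alpha doubles as a per-edge importance measure
        H_out[i] = alpha @ V[nbrs]  # weighted sum of neighbor values
    return H_out

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
H = rng.standard_normal((3, 4))
Wq, Wk, Wv = (rng.standard_normal((4, 4)) for _ in range(3))
H_new = attention_layer(A, H, Wq, Wk, Wv)
```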
relationship b/w transformer and GNN
- the transformer models several elements (e.g. tokens) as nodes in a fully connected graph
- the attention mechanism assigns edge embeddings to each node pair, which are used to compute attention weights
- the difference: a GNN assumes a sparse connectivity pattern given by the graph, while the transformer attends over all pairs
GNN XAI
- explanations vary from graph to graph
- GNNExplainer extracts most relevant subgraph
- attribution assigns ranked importance values to parts of the graph
generative graph modeling
- method 1: generate new graphs by sampling from learned distributions or completing a graph given a starting point
- example - novel molecular graphs with desirable attributes
- example solution - GraphVAE learns connectivity by treating the adjacency matrix like an image
- method 2: build the graph sequentially, starting w/ a graph and repeatedly applying actions such as addition or removal of nodes and edges
node-order equivariance
- algorithms should not depend on the ordering of the nodes
- graphs have no inherent ordering among nodes (e.g. contrast with images, where pixels have coordinates)
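A quick equivariance check in NumPy: relabeling the nodes (via a permutation matrix P) permutes a propagation layer's output the same way:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = rng.standard_normal((3, 4))

P = np.eye(3)[[2, 0, 1]]   # permutation matrix: a node relabeling

layer = lambda A, H: A @ H  # a simple propagation layer
permute_first = layer(P @ A @ P.T, P @ H)
permute_after = P @ layer(A, H)
assert np.allclose(permute_first, permute_after)  # node-order equivariant
```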
problems that can be defined over graphs