
Commit 2f69b04

semisupervised gcn example
1 parent bd24959 commit 2f69b04

5 files changed: +282 -6 lines changed

docs/make.jl

Lines changed: 4 additions & 0 deletions
@@ -21,6 +21,10 @@ makedocs(
          "Building layers" => "basics/layers.md",
          "Graph passing" => "basics/passgraph.md"],
      "Cooperate with Flux layers" => "cooperate.md",
+     "Tutorials" =>
+         [
+          "Semi-supervised learning with GCN" => "tutorials/semisupervised_gcn.md",
+         ],
      "Abstractions" =>
          ["Message passing scheme" => "abstractions/msgpass.md",
           "Graph network block" => "abstractions/gn.md"],
Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
# Semi-supervised Learning with Graph Convolutional Networks (GCN)

Graph convolutional networks (GCN) are often regarded as the first step into graph neural networks (GNN). This example walks through training a vanilla GCN.
## Semi-supervised Learning in Graph Neural Networks

In the semi-supervised setting, node features are available for the whole graph, but labels are given for only a subset of the nodes. We train the model on that labeled subset and test it on a disjoint subset of nodes in the same graph.
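As a toy illustration (the indices below are made up for exposition; the actual Planetoid splits are loaded in Step 1), features exist for every node but only the labeled subset contributes to the loss:

```julia
num_nodes = 10           # features are available for all 10 nodes
labeled_idx = [1, 2, 3]  # labels visible during training
heldout_idx = [8, 9, 10] # held-out nodes, used only for evaluation
```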
## Node Classification Task

Here we tackle node classification: learning a model that predicts a label for every node in a graph. The GCN takes node features as input and outputs a label for each node.
## Step 1: Load Dataset

GeometricFlux provides the Planetoid dataset through `GeometricFlux.Datasets`, which re-exports it from GraphMLDatasets. Planetoid contains three sub-datasets: Cora, Citeseer and PubMed; this example uses Cora. `traindata` loads training data from various datasets: the dataset is specified by the first argument and the sub-dataset by the second.
```julia
using GeometricFlux.Datasets

train_X, train_y = traindata(Planetoid(), :cora)
```
`traindata` returns the pre-defined training features and labels; these features are node features. They can be converted to dense matrices for training:

```julia
train_X, train_y = map(x->Matrix(x), traindata(Planetoid(), :cora))
```
The graph itself is loaded with `graphdata`; it comes preprocessed as a `SimpleGraph`, the graph type provided by Graphs.jl.

```julia
g = graphdata(Planetoid(), :cora)
train_idx = train_indices(Planetoid(), :cora)
```
We also need node indices to take a subgraph of the original graph; `train_indices` returns the node indices used for training.
## Step 2: Wrapping Graph and Features into `FeaturedGraph`

`FeaturedGraph`, provided by GraphSignals, is a container that holds a graph together with its node, edge and global features. To wrap a graph and node features into a `FeaturedGraph`, pass the graph `g` as the first argument and specify the node features with the keyword `nf`.
```julia
using GraphSignals

FeaturedGraph(g, nf=train_X)
```
To take a subgraph from a `FeaturedGraph` object, call `subgraph` with the node indices `train_idx` as the second argument.
```julia
subgraph(FeaturedGraph(g, nf=train_X), train_idx)
```
## Step 3: Build a GCN Model

The GCN model is composed of two `GCNConv` layers, with `relu` as the activation of the first layer and a `Dropout` layer in between. `GraphParallel` integrates a regular Flux layer into the graph pipeline: here it routes node features through `node_layer=Dropout(0.5)`.
```julia
model = Chain(
    GCNConv(input_dim=>hidden_dim, relu),
    GraphParallel(node_layer=Dropout(0.5)),
    GCNConv(hidden_dim=>target_dim),
    node_feature,
)
```
Since the model takes a `FeaturedGraph` object as input, each graph layer outputs a `FeaturedGraph` object as well. At the end of the chain, `node_feature` extracts the node features from the resulting `FeaturedGraph`.
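As a quick sanity check, the model can be applied directly to the training subgraph from Step 2. This is a minimal sketch, assuming `input_dim`, `hidden_dim` and `target_dim` were set before building `model` (e.g. to the Cora dimensions 1433, 16 and 7) and that `g`, `train_X` and `train_idx` are the objects loaded in Step 1:

```julia
fg_train = subgraph(FeaturedGraph(g, nf=train_X), train_idx)

# The graph layers pass a `FeaturedGraph` down the chain; the trailing
# `node_feature` extracts the class scores, one column per training node.
scores = model(fg_train)
size(scores)  # expected: (target_dim, length(train_idx))
```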
## Step 4: Loss Functions and Accuracy

Since this is a node classification task, the model loss is `logitcrossentropy` together with an L2 regularization term. Following the vanilla GCN, L2 regularization is applied only to the first layer, and its strength is controlled by the hyperparameter `λ`.
```julia
l2norm(x) = sum(abs2, x)

function model_loss(model, λ, batch)
    loss = 0.f0
    for (x, y) in batch
        loss += logitcrossentropy(model(x), y)
        loss += λ*sum(l2norm, Flux.params(model[1]))
    end
    return loss
end
```
Accuracy is defined both for a single batch and for a whole `DataLoader`.
```julia
function accuracy(model, batch::AbstractVector)
    return mean(mean(onecold(softmax(cpu(model(x)))) .== onecold(cpu(y))) for (x, y) in batch)
end

accuracy(model, loader::DataLoader, device) = mean(accuracy(model, batch |> device) for batch in loader)
```
## Step 5: Training the GCN Model

We train the model with the same workflow as any Flux model.
```julia
train_loader, test_loader = load_data(:cora, args.batch_size)

# optimizer
opt = ADAM(args.η)

# parameters
ps = Flux.params(model)

# training
train_steps = 0
@info "Start Training, total $(args.epochs) epochs"
for epoch = 1:args.epochs
    @info "Epoch $(epoch)"

    for batch in train_loader
        grad = gradient(() -> model_loss(model, args.λ, batch |> device), ps)
        Flux.Optimise.update!(opt, ps, grad)
        train_steps += 1
    end
end
```
This completes a basic tutorial on training a GCN model!

For the complete example, please check the script `examples/semisupervised_gcn.jl`.
## Acceleration by Pre-computing the Normalized Adjacency Matrix

Training can be slow in this example. Because the graph and the features are bundled in the same `FeaturedGraph` object, `GCNConv` has to recompute the normalized adjacency matrix during training, which adds substantial overhead. Training can be accelerated by pre-computing the normalized adjacency matrix for all `FeaturedGraph` objects: calling the following function computes it for `fg` in place before training, which reduces training time.
```julia
GraphSignals.normalized_adjacency_matrix!(fg)
```
Since `GCNConv` consumes the normalized adjacency matrix, it is safe to pre-compute it here. If a layer does not use a normalized adjacency matrix, this step will lead to an error.
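A minimal sketch of how this could slot into the data pipeline above (assuming the same `g`, `train_X`, `train_y` and `train_idx` from the earlier steps, and that `normalized_adjacency_matrix!` mutates the graph stored in the `FeaturedGraph`, so it only needs to run once before training):

```julia
fg = FeaturedGraph(g, nf=train_X)
GraphSignals.normalized_adjacency_matrix!(fg)  # pre-compute once, before training

train_data = [(subgraph(fg, train_idx), train_y) for _ in 1:100]
```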

examples/gcn_with_fixed_graph.jl

Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
using CUDA
using Flux
using Flux: onehotbatch, onecold
using Flux.Losses: logitcrossentropy
using Flux.Data: DataLoader
using GeometricFlux
using GeometricFlux.Datasets
using GraphSignals
using Logging: with_logger
using Parameters: @with_kw
using ProgressMeter: Progress, next!
using Statistics
using Random

function load_data(dataset, batch_size)
    # (train_X, train_y) dim: (num_features, target_dim) × 1708
    train_X, train_y = map(x -> Matrix(x), alldata(Planetoid(), dataset))
    # (test_X, test_y) dim: (num_features, target_dim) × 1000
    test_X, test_y = map(x -> Matrix(x), testdata(Planetoid(), dataset))
    g = graphdata(Planetoid(), dataset)
    train_idx = 1:size(train_X, 2)
    test_idx = test_indices(Planetoid(), dataset)

    # padding zeros
    tr_X = zeros(Float32, size(train_X, 1), size(train_X, 2) + size(test_X, 2))
    te_X = zeros(Float32, size(test_X, 1), size(train_X, 2) + size(test_X, 2))
    tr_y = zeros(Float32, size(train_y, 1), size(train_y, 2) + size(test_y, 2))
    te_y = zeros(Float32, size(test_y, 1), size(train_y, 2) + size(test_y, 2))
    tr_X[:, train_idx] .= train_X
    te_X[:, test_idx] .= test_X
    tr_y[:, train_idx] .= train_y
    te_y[:, test_idx] .= test_y

    fg = FeaturedGraph(g)
    train_data = (repeat(tr_X, outer=(1,1,256)), repeat(tr_y, outer=(1,1,256)))
    test_data = (repeat(te_X, outer=(1,1,32)), repeat(te_y, outer=(1,1,32)))
    train_loader = DataLoader(train_data, batchsize=batch_size, shuffle=true)
    test_loader = DataLoader(test_data, batchsize=batch_size, shuffle=true)
    return train_loader, test_loader, fg, train_idx, test_idx
end

@with_kw mutable struct Args
    η = 0.01            # learning rate
    λ = 5f-4            # regularization paramater
    batch_size = 32     # batch size
    num_nodes = 2708    # number of nodes for graph
    epochs = 200        # number of epochs
    seed = 0            # random seed
    cuda = true         # use GPU
    input_dim = 1433    # input dimension
    hidden_dim = 16     # hidden dimension
    target_dim = 7      # target dimension
end

## Loss: cross entropy with first layer L2 regularization
l2norm(x) = sum(abs2, x)
function model_loss(model, λ, X, y, idx)
    loss = logitcrossentropy(model(X)[:,idx,:], y[:,idx,:])
    loss += λ*sum(l2norm, Flux.params(model[1]))
    return loss
end

function accuracy(model, X::AbstractArray, y::AbstractArray, idx)
    return mean(onecold(softmax(cpu(model(X))[:,idx,:])) .== onecold(cpu(y)[:,idx,:]))
end

accuracy(model, loader::DataLoader, device, idx) = mean(accuracy(model, X |> device, y |> device, idx) for (X, y) in loader)

function train(; kws...)
    # load hyperparamters
    args = Args(; kws...)
    args.seed > 0 && Random.seed!(args.seed)

    # GPU config
    if args.cuda && CUDA.has_cuda()
        device = gpu
        @info "Training on GPU"
    else
        device = cpu
        @info "Training on CPU"
    end

    # load Cora from Planetoid dataset
    train_loader, test_loader, fg, train_idx, test_idx = load_data(:cora, args.batch_size)

    # build model
    model = Chain(
        WithGraph(fg, GCNConv(args.input_dim=>args.hidden_dim, relu)),
        Dropout(0.5),
        WithGraph(fg, GCNConv(args.hidden_dim=>args.target_dim)),
    ) |> device

    # ADAM optimizer
    opt = ADAM(args.η)

    # parameters
    ps = Flux.params(model)

    # training
    train_steps = 0
    @info "Start Training, total $(args.epochs) epochs"
    for epoch = 1:args.epochs
        @info "Epoch $(epoch)"
        progress = Progress(length(train_loader))

        for (X, y) in train_loader
            loss, back = Flux.pullback(ps) do
                model_loss(model, args.λ, X |> device, y |> device, train_idx |> device)
            end
            train_acc = accuracy(model, train_loader, device, train_idx)
            test_acc = accuracy(model, test_loader, device, test_idx)
            grad = back(1f0)
            Flux.Optimise.update!(opt, ps, grad)

            # progress meter
            next!(progress; showvalues=[
                (:loss, loss),
                (:train_accuracy, train_acc),
                (:test_accuracy, test_acc)
            ])

            train_steps += 1
        end
    end

    return model, args
end

model, args = train()

examples/gcn.jl renamed to examples/semisupervised_gcn.jl

Lines changed: 3 additions & 5 deletions
@@ -12,9 +12,7 @@ using ProgressMeter: Progress, next!
 using Statistics
 using Random
 
-CUDA.allowscalar(false)
-
-function load_data(dataset, batch_size)
+function load_data(dataset, batch_size, train_repeats=256, test_repeats=32)
     # (train_X, train_y) dim: (num_features, target_dim) × 140
     train_X, train_y = map(x->Matrix(x), traindata(Planetoid(), dataset))
     # (test_X, test_y) dim: (num_features, target_dim) × 1000
@@ -23,8 +21,8 @@ function load_data(dataset, batch_size)
     train_idx = train_indices(Planetoid(), dataset)
     test_idx = test_indices(Planetoid(), dataset)
 
-    train_data = [(subgraph(FeaturedGraph(g, nf=train_X), train_idx), train_y) for _ in 1:100];
-    test_data = [(subgraph(FeaturedGraph(g, nf=test_X), test_idx), test_y) for _ in 1:100];
+    train_data = [(subgraph(FeaturedGraph(g, nf=train_X), train_idx), train_y) for _ in 1:train_repeats]
+    test_data = [(subgraph(FeaturedGraph(g, nf=test_X), test_idx), test_y) for _ in 1:test_repeats]
     train_batch = Flux.batch(train_data)
     test_batch = Flux.batch(test_data)

src/layers/conv.jl

Lines changed: 11 additions & 1 deletion
@@ -39,6 +39,12 @@ end
 
 (l::GCNConv)(Ã::AbstractMatrix, x::AbstractMatrix) = l.σ.(l.weight * x * Ã .+ l.bias)
 
+function (l::GCNConv)(Ã::AbstractMatrix, X::AbstractArray)
+    z = NNlib.batched_mul(l.weight, NNlib.batched_mul(X, Ã))
+    return l.σ.(z .+ l.bias)
+end
+
+# For variable graph
 function (l::GCNConv)(fg::AbstractFeaturedGraph)
     nf = node_feature(fg)
     Ã = Zygote.ignore() do
@@ -47,9 +53,13 @@ function (l::GCNConv)(fg::AbstractFeaturedGraph)
     return ConcreteFeaturedGraph(fg, nf = l(Ã, nf))
 end
 
+# For fixed graph
+WithGraph(fg::AbstractFeaturedGraph, l::GCNConv) =
+    WithGraph(l, GraphSignals.normalized_adjacency_matrix!(fg, eltype(l.weight); selfloop=true))
+
 function (wg::WithGraph{<:GCNConv})(X::AbstractArray)
     Ã = Zygote.ignore() do
-        GraphSignals.normalized_adjacency_matrix(wg.fg, eltype(X); selfloop=true)
+        GraphSignals.normalized_adjacency_matrix(wg.fg)
     end
     return wg.layer(Ã, X)
 end
