PyTorch Graph Neural Networks in Practice (2): Custom Graph Data


Contents

- Data types
- A simple example
  - Creating a graph
  - Creating a Data instance
  - Built-in methods
  - Adding attributes
  - Node classification
- Full code

Data types

PyTorch Geometric defines its own data type for graphs.

A graph consists of nodes and the edges between them, so building a graph in PyTorch Geometric requires exactly those two ingredients: nodes and edges. PyTorch Geometric provides torch_geometric.data.Data for this purpose. A Data object has five main attributes, none of which is mandatory; any of them may be left empty:

- data.x: the node feature matrix, with shape [num_nodes, num_node_features].
- data.edge_index: the edges between nodes, with shape [2, num_edges].
- data.pos: the node coordinates, with shape [num_nodes, num_dimensions].
- data.y: the target labels. If every node has a label, the shape is [num_nodes, *]; if the whole graph has a single label, the shape is [1, *].
- data.edge_attr: the edge feature matrix, with shape [num_edges, num_edge_features].

In fact, a Data object is not limited to these attributes. For example, we can extend Data with data.face, a tensor that stores the connectivity of the triangles in a 3D mesh.
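To make these shapes concrete, here is a minimal sketch of a Data object for a three-node graph with two undirected edges; the feature values and labels are invented for illustration and are not part of this article's example:

import torch
from torch_geometric.data import Data

# Hypothetical toy graph: 3 nodes with 1-dimensional features and two
# undirected edges, 0-1 and 1-2. Each undirected edge is stored twice,
# once for each direction.
x = torch.tensor([[-1.0], [0.0], [1.0]])         # [num_nodes, num_node_features]
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]],
                          dtype=torch.long)       # [2, num_edges]
y = torch.tensor([0, 1, 0])                       # one label per node
data = Data(x=x, edge_index=edge_index, y=y)
print(data)  # roughly: Data(x=[3, 1], edge_index=[2, 4], y=[3])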

Other attributes can also be added, as shown below:

data = Data(x=x, edge_index=edge_index)
data.train_idx = torch.tensor([...], dtype=torch.long)
data.train_mask = torch.tensor([...], dtype=torch.bool)
data.test_mask = torch.tensor([...], dtype=torch.bool)

A simple example

Creating a graph

We create a graph with the networkx package and then convert it into a Data object with torch. Note that node numbering must start from 0 and the node indices must be integers.

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import community as community_louvain

# build a graph
G = nx.Graph()
edgelist = [(0, 1), (0, 2), (1, 3)]  # note the order of the edges
G.add_edges_from(edgelist)

# plot the graph
fig, ax = plt.subplots(figsize=(4, 4))
option = {'font_family': 'serif', 'font_size': '15', 'font_weight': 'semibold'}
nx.draw_networkx(G, node_size=400, **option)  # pos=nx.spring_layout(G)
plt.axis('off')
plt.show()

The resulting graph (figure not reproduced here) is a small four-node tree with edges (0, 1), (0, 2), and (1, 3).

Creating a Data instance

Next we build a Data object from the networkx graph.

import torch
from torch_geometric.data import InMemoryDataset, Data

x = torch.eye(G.number_of_nodes(), dtype=torch.float)
adj = nx.to_scipy_sparse_matrix(G).tocoo()
row = torch.from_numpy(adj.row.astype(np.int64)).to(torch.long)
col = torch.from_numpy(adj.col.astype(np.int64)).to(torch.long)
edge_index = torch.stack([row, col], dim=0)

# Compute communities.
partition = community_louvain.best_partition(G)
y = torch.tensor([partition[i] for i in range(G.number_of_nodes())])

# Select a single training node for each community
# (we just use the first one).
train_mask = torch.zeros(y.size(0), dtype=torch.bool)
for i in range(int(y.max()) + 1):
    train_mask[(y == i).nonzero(as_tuple=False)[0]] = True

data = Data(x=x, edge_index=edge_index, y=y, train_mask=train_mask)

Let's look at the value of each variable in turn (a print sketch follows the explanation below).

In the code above:

- x is the node feature matrix; here it is set to the identity matrix.
- adj is a sparse representation of the adjacency matrix of G: each node pair denotes an edge and the associated value is the edge weight. adj is symmetric.
- row and col are the row and column indices of the nonzero entries of adj.
- edge_index is PyTorch Geometric's edge-list representation. It stacks the two lists row and col; the elements at corresponding positions form one edge. Note that edge_index can encode edge direction: in an undirected graph every edge appears twice, so (0, 1) and (1, 0) refer to the same edge, whereas in a directed graph they would be two different edges.
- partition is the community partition of G produced by the Louvain algorithm; nodes 0 and 2 end up in one community and nodes 1 and 3 in the other.
- y holds the community label of each node.
- train_mask marks the training set for the semi-supervised node classification task: exactly one node per class is labelled as known (True) and the rest are unknown (False).
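The original post shows these values in screenshots that are not reproduced here; the print statements below sketch what you should see (the exact community ids assigned by Louvain can vary between runs, so the labels are indicative only):

print(x)
# tensor([[1., 0., 0., 0.],
#         [0., 1., 0., 0.],
#         [0., 0., 1., 0.],
#         [0., 0., 0., 1.]])
print(edge_index)
# tensor([[0, 0, 1, 1, 2, 3],
#         [1, 2, 0, 3, 0, 1]])   (each undirected edge appears twice)
print(partition)   # e.g. {0: 0, 1: 1, 2: 0, 3: 1}
print(y)           # e.g. tensor([0, 1, 0, 1])
print(train_mask)  # tensor([ True,  True, False, False])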

Built-in methods

Next, let's see what methods and properties the data instance itself provides.

As the sketch below shows, the data instance offers a lot of functionality: you can query its keys and values, the number of nodes, the number of edges, the number of node features, whether it contains isolated nodes, whether it contains self-loops, whether the graph is directed or undirected, and so on.
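The screenshots of these calls are not reproduced here; the snippet below is a sketch of that kind of inspection (attribute and method names differ slightly across PyTorch Geometric versions; newer releases, for example, also provide has_isolated_nodes() and has_self_loops()):

print(data.keys)                        # ['x', 'edge_index', 'y', 'train_mask']
print(data['x'])                        # the node feature tensor
print(data.num_nodes)                   # 4
print(data.num_edges)                   # 6 (each undirected edge counted twice)
print(data.num_node_features)           # 4
print(data.contains_isolated_nodes())   # False
print(data.contains_self_loops())       # False
print(data.is_directed())               # False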

Adding attributes

Let's try adding another attribute.

Following the pattern above, we can add a test_mask attribute that marks certain nodes as the test set, as the snippet below shows.
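This snippet is lightly condensed from the full code at the end of the article; it marks every node that is not in the training set as a test node:

# All nodes that are not training nodes become test nodes.
remaining = (~data.train_mask).nonzero(as_tuple=False).view(-1)
remaining = remaining[torch.randperm(remaining.size(0))]
data.test_mask = torch.zeros(y.size(0), dtype=torch.bool)
data.test_mask[remaining] = True
print(data.test_mask)  # tensor([False, False,  True,  True])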

Node classification

Let's use this simple graph for a node classification task. The classes are the communities found by the Louvain algorithm above; the training nodes are 0 and 1, and the test nodes are 2 and 3.

We build a graph convolutional network with two convolutional layers: the first takes inputs of dimension 4 and outputs dimension 16, and the second takes inputs of dimension 16 and outputs dimension 2. The first layer is followed by a ReLU activation and dropout.

import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(data.num_node_features, 16)
        self.conv2 = GCNConv(16, 2)

    def forward(self):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)

The optimizer, the training function, and the test function are defined as follows:

device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')
model, data = Net().to(device), data.to(device)
optimizer = torch.optim.Adam([
    dict(params=model.conv1.parameters(), weight_decay=5e-4),
    dict(params=model.conv2.parameters(), weight_decay=0)
], lr=0.01)  # Only perform weight-decay on first convolution.


def train():
    model.train()  # re-enable dropout after test() switches the model to eval mode
    optimizer.zero_grad()
    out = model()
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()


def test():
    model.eval()
    logits, accs = model(), []
    for _, mask in data('train_mask', 'test_mask'):
        pred = logits[mask].max(1)[1]
        acc = pred.eq(data.y[mask]).sum().item() / mask.sum().item()
        accs.append(acc)
    return accs

We train for ten epochs and print the accuracy on the training and test sets:

for epoch in range(1, 11):
    train()
    log = 'Epoch: {:03d}, Train: {:.4f}, Test: {:.4f}'
    print(log.format(epoch, *test()))

The output is:

Epoch: 001, Train: 0.5000, Test: 0.5000
Epoch: 002, Train: 0.5000, Test: 0.5000
Epoch: 003, Train: 0.5000, Test: 1.0000
Epoch: 004, Train: 1.0000, Test: 1.0000
Epoch: 005, Train: 1.0000, Test: 1.0000
Epoch: 006, Train: 1.0000, Test: 1.0000
Epoch: 007, Train: 1.0000, Test: 1.0000
Epoch: 008, Train: 1.0000, Test: 1.0000
Epoch: 009, Train: 1.0000, Test: 1.0000
Epoch: 010, Train: 1.0000, Test: 1.0000

On this simple graph, the graph convolutional network converges after about four training epochs and classifies every node correctly.

Full code

The full code is listed below:

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import community as community_louvain
import torch
import torch.nn.functional as F
from torch_geometric.data import InMemoryDataset, Data
from torch_geometric.nn import GCNConv

# build a graph
G = nx.Graph()
edgelist = [(0, 1), (0, 2), (1, 3)]  # note the order of the edges
G.add_edges_from(edgelist)

x = torch.eye(G.number_of_nodes(), dtype=torch.float)
adj = nx.to_scipy_sparse_matrix(G).tocoo()
row = torch.from_numpy(adj.row.astype(np.int64)).to(torch.long)
col = torch.from_numpy(adj.col.astype(np.int64)).to(torch.long)
edge_index = torch.stack([row, col], dim=0)

# Compute communities.
partition = community_louvain.best_partition(G)
y = torch.tensor([partition[i] for i in range(G.number_of_nodes())])

# Select a single training node for each community
# (we just use the first one).
train_mask = torch.zeros(y.size(0), dtype=torch.bool)
for i in range(int(y.max()) + 1):
    train_mask[(y == i).nonzero(as_tuple=False)[0]] = True

data = Data(x=x, edge_index=edge_index, y=y, train_mask=train_mask)

# All nodes that are not training nodes become test nodes.
remaining = (~data.train_mask).nonzero(as_tuple=False).view(-1)
remaining = remaining[torch.randperm(remaining.size(0))]
data.test_mask = torch.zeros(y.size(0), dtype=torch.bool)
data.test_mask.fill_(False)
data.test_mask[remaining[:]] = True


class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(data.num_node_features, 16)
        self.conv2 = GCNConv(16, 2)

    def forward(self):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)


device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')
model, data = Net().to(device), data.to(device)
optimizer = torch.optim.Adam([
    dict(params=model.conv1.parameters(), weight_decay=5e-4),
    dict(params=model.conv2.parameters(), weight_decay=0)
], lr=0.01)  # Only perform weight-decay on first convolution.


def train():
    model.train()  # re-enable dropout after test() switches the model to eval mode
    optimizer.zero_grad()
    out = model()
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()


def test():
    model.eval()
    logits, accs = model(), []
    for _, mask in data('train_mask', 'test_mask'):
        pred = logits[mask].max(1)[1]
        acc = pred.eq(data.y[mask]).sum().item() / mask.sum().item()
        accs.append(acc)
    return accs


for epoch in range(1, 11):
    train()
    log = 'Epoch: {:03d}, Train: {:.4f}, Test: {:.4f}'
    print(log.format(epoch, *test()))

Related articles

PyTorch Graph Neural Networks in Practice (1): Environment Setup
